CN112602094A - Data processing apparatus, data processing method, and accelerator - Google Patents

Data processing apparatus, data processing method, and accelerator

Info

Publication number
CN112602094A
Authority
CN
China
Prior art keywords
control instruction
instruction
module
data
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080004332.0A
Other languages
Chinese (zh)
Inventor
韩峰
李鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN112602094A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiments of the present application provide a data processing apparatus, a data processing method, and an accelerator. The apparatus comprises a control module, a data loading module, and a processing module. The data loading module, in response to a control instruction of the control module, loads data to be processed for processing by the processing module; the processing module, in response to a control instruction of the control module, processes the data to be processed; and the control module controls the data loading module and the processing module to execute different control instructions at the same time. This improves the utilization of processing resources and avoids the waste of processing resources caused by waiting.

Description

Data processing apparatus, data processing method, and accelerator
Technical Field
The present application relates to the field of computer data processing, and in particular, to a data processing apparatus, a data processing method, and an accelerator.
Background
As technology advances, the algorithms and programs implemented in various products have become increasingly refined, so their processing is complex and consumes substantial computing or processing resources. How to make full use of those processing or computing resources has become a technical problem that urgently needs to be solved.
As an example, a convolutional neural network (CNN) is a complex, nonlinear hypothesis model whose parameters are obtained through training and which is capable of fitting data. CNN algorithms can be applied to scenarios such as machine vision and natural language processing. When a CNN algorithm is implemented in an embedded system, neural network processing consumes substantial resources, so computing resources and real-time performance must be carefully considered. Therefore, there is a need to improve the utilization of computing resources in neural network processing.
Disclosure of Invention
In view of the above, it is an object of the embodiments of the present application to provide a data processing apparatus, a data processing method, and an accelerator.
First, according to a first aspect of an embodiment of the present application, a data processing apparatus is provided, which includes a control module, a data loading module, and a processing module;
the data loading module is used for responding to a control instruction of the control module and loading data to be processed for processing by the processing module;
the processing module responds to the control instruction of the control module and processes the data to be processed;
the control module controls the data loading module and the processing module to execute different control instructions at the same time.
According to a second aspect of the embodiments of the present application, there is provided a data processing method applied to a data processing apparatus, where the data processing apparatus includes a data loading module and a processing module; the method comprises the following steps:
loading data through the data loading module in response to a control instruction, for use by the processing module in data processing; and
performing data processing through the processing module in response to a control instruction; wherein the data loading module and the processing module execute different control instructions at the same time.
According to a third aspect of embodiments of the present application, there is provided an accelerator including the apparatus according to any implementation of the first aspect.
The embodiment of the application has the following beneficial effects:
in this embodiment, the control module controls the data loading module and the processing module to execute different control instructions at the same time, and when the processing module processes to-be-processed data corresponding to a current control instruction, the data loading module may load to-be-processed data corresponding to a next control instruction, so that the utilization rate of processing resources is improved, and waste of the processing resources caused by a waiting process is avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a first data processing apparatus according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of a second data processing apparatus according to an exemplary embodiment of the present application.
FIG. 3 is a flow chart illustrating the execution of control instructions according to an exemplary embodiment of the present application.
Fig. 4 is a schematic structural diagram of a third data processing apparatus according to an exemplary embodiment of the present application.
Fig. 5 is a schematic diagram illustrating a fourth data processing apparatus according to an exemplary embodiment of the present application.
FIG. 6 is a diagram illustrating execution of a first type of control instruction according to an exemplary embodiment.
FIG. 7 is a diagram illustrating execution of a second type of control instruction according to an exemplary embodiment.
Fig. 8 is a schematic structural diagram of a fifth data processing apparatus according to an exemplary embodiment of the present application.
FIG. 9 is a flow chart illustrating the execution of control instructions according to an exemplary embodiment of the present application.
FIG. 10 is a diagram illustrating execution of a third control instruction according to an exemplary embodiment.
Fig. 11 is a flow chart illustrating a data processing method according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To address the problems in the related art, an embodiment of the present application provides a data processing apparatus. Referring to Fig. 1, which is a schematic structural diagram of a first data processing apparatus according to an exemplary embodiment of the present application, the apparatus comprises a control module 11, a data loading module 12, and a processing module 13. In another embodiment, the apparatus may include more than one data loading module. For example, if the data processing apparatus is used for convolution operations, it may include two data loading modules, namely a feature map loading module and a weight loading module.
The data loading module 12, in response to the control instruction of the control module 11, loads data to be processed for processing by the processing module 13.
The processing module 13, in response to the control instruction of the control module 11, performs processing on data to be processed.
The control module 11 controls the data loading module 12 and the processing module 13 to execute different control instructions at the same time. In another embodiment, the control module 11 controls the data loading module 12 to begin handling a control instruction in advance, without waiting for the previous control instruction to finish processing throughout the data processing apparatus. For example, the control module 11 need not wait for control instruction x0 to finish in the processing module 13 before it directs the data loading module 12 to perform the operation corresponding to control instruction x1, where control instructions x0 and x1 are executed in sequence.
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12 and the processing module 13. The data loading module 12, in response to a control instruction, loads the data to be processed for processing by the processing module 13, and the processing module 13, in response to a control instruction, processes the data to be processed. To further improve the overall utilization of processing resources, the control module 11 may control the data loading module 12 and the processing module 13 to execute different control instructions at the same time. That is, after the data loading module 12 finishes loading the data to be processed corresponding to the current control instruction, it need not wait for the processing module 13 to finish processing that data; under the control of the control module 11, the data loading module 12 may directly load the data corresponding to the next control instruction. In other words, while the processing module 13 processes the data to be processed corresponding to the current control instruction, the data loading module 12 loads the data to be processed corresponding to the next control instruction. This improves the utilization of processing resources and avoids the waste of processing resources caused by waiting.
In an embodiment, the control module 11 sends the (i+1)th control instruction to the data loading module 12 in response to the data loading module 12 completing execution of the ith control instruction, and sends the (i+1)th control instruction to the processing module 13 in response to the processing module 13 completing execution of the ith control instruction, where i is an integer.
It should be noted that the fact that the data loading module 12 finishes executing the ith control instruction means that the data loading module 12 finishes loading the to-be-processed data corresponding to the ith control instruction; the completion of the execution of the ith control instruction by the processing module 13 means that the processing module 13 completes the processing of the data to be processed corresponding to the ith control instruction.
The data loading module 12 is configured to respond to the ith control instruction, and load to-be-processed data corresponding to the ith control instruction. The processing module 13 responds to the ith control instruction, and processes the data to be processed corresponding to the ith control instruction to obtain a processing result. When the processing module 13 processes to-be-processed data corresponding to the ith control instruction in response to the ith control instruction, if the data loading module 12 has already loaded the to-be-processed data corresponding to the ith control instruction, the data loading module 12 may directly receive the (i + 1) th control instruction sent by the control module 11, and load to-be-processed data corresponding to the (i + 1) th control instruction in response to the (i + 1) th control instruction without waiting for the processing module 13 to finish executing the ith control instruction; in this embodiment, the time for the data loading module 12 to wait for the next control instruction is further reduced, so as to avoid the waste of processing resources caused by the waiting time.
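The overlap described above can be illustrated in software. The following is only a minimal sketch, assuming two threads stand in for the data loading module and the processing module and an unbounded queue stands in for the buffer between them; the function name `run_overlapped` and the `load` and `process` callables are hypothetical and do not come from the application.

```python
import queue
import threading


def run_overlapped(instructions, load, process):
    """Run a loader and a processor concurrently on a list of control instructions.

    The loader moves on to instruction i+1 as soon as it has loaded the data
    for instruction i; it never waits for the processor to finish instruction i.
    """
    loaded = queue.Queue()  # assumed unbounded buffer between the two modules
    results = []

    def loader():
        for instr in instructions:
            loaded.put((instr, load(instr)))
        loaded.put(None)  # sentinel: no more instructions

    def processor():
        while True:
            item = loaded.get()
            if item is None:
                break
            instr, data = item
            results.append((instr, process(data)))

    threads = [threading.Thread(target=loader), threading.Thread(target=processor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical workload: "loading" builds a small list, "processing" sums it.
overlapped_results = run_overlapped([0, 1, 2], lambda i: [i, i + 1], sum)
```

Because a single processor thread consumes the queue in order, results come back in instruction order even though loading runs ahead of processing.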
In one embodiment, the processing module 13 comprises a systolic array. In response to the ith control instruction, the processing module 13 writes the data to be processed corresponding to the ith control instruction into the systolic array and performs the operation on that data through the systolic array to obtain the processing result. This embodiment implements the processing of the data to be processed in a hardware structure, which helps improve processing efficiency.
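The application does not specify the dataflow of the systolic array. As a rough illustration only, the sketch below simulates one common arrangement, an output-stationary array computing a matrix product, in which processing element (i, j) accumulates one output value and operands reach it with a skew of i + j cycles. The function name `systolic_matmul` and the choice of dataflow are assumptions.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle simulation of an output-stationary systolic array.

    a is n x k, b is k x m (lists of lists). PE (i, j) holds the accumulator
    for output element (i, j); at cycle t it receives a[i][t-i-j] and
    b[t-i-j][j], if that index is in range.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    total_cycles = n + m + k - 2  # last operand pair reaches PE (n-1, m-1) at cycle n+m+k-3
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                step = t - i - j  # which operand pair arrives at PE (i, j) in this cycle
                if 0 <= step < k:
                    acc[i][j] += a[i][step] * b[step][j]
    return acc


product = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Each processing element performs at most one multiply-accumulate per cycle, which is the property that makes such arrays attractive for convolution and matrix workloads.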
Fig. 2 is a schematic structural diagram of a second data processing apparatus according to an exemplary embodiment of the present application. The apparatus comprises: a control module 11, a data loading module 12, a processing module 13, and a data write-back module 14.
The data loading module 12, in response to the control instruction of the control module 11, loads data to be processed for processing by the processing module 13.
The processing module 13, in response to the control instruction of the control module 11, performs processing on data to be processed.
The data write-back module 14, in response to the control instruction of the control module 11, writes the processing result of the data to be processed into an external storage module.
The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time.
In this embodiment, the control module 11 receives control instructions from an external module and sends them to the data loading module 12, the processing module 13, and the data write-back module 14. The data loading module 12, in response to a control instruction, loads the data to be processed for processing by the processing module 13; the processing module 13, in response to a control instruction, processes the data to be processed to obtain a processing result; and the data write-back module 14, in response to a control instruction, writes the processing result into an external storage module.
To further improve the overall utilization of processing resources, the control module 11 may control the data loading module 12, the processing module 13, and the data write-back module 14 to execute different control instructions at the same time. After the data loading module 12 finishes loading the data to be processed corresponding to the current control instruction, it can directly load the data to be processed corresponding to the next control instruction under the control of the control module 11, without waiting for the processing module 13 to finish processing the data corresponding to the current control instruction. Likewise, after the processing module 13 finishes processing the data corresponding to the current control instruction, it can directly process the data to be processed corresponding to the next control instruction under the control of the control module 11, without waiting for the data write-back module 14 to write the processing result corresponding to the current control instruction into the external storage module. That is, while the data write-back module 14 writes the processing result corresponding to the previous control instruction into the external storage module, the processing module 13 may process the data corresponding to the current control instruction and the data loading module 12 may load the data corresponding to the next control instruction. This improves the utilization of processing resources and avoids the waste of processing resources caused by waiting.
In an embodiment, the control module 11 sends the (i+1)th control instruction to the data loading module 12 in response to the data loading module 12 completing execution of the ith control instruction; sends the (i+1)th control instruction to the processing module 13 in response to the processing module 13 completing execution of the ith control instruction; and sends the (i+1)th control instruction to the data write-back module 14 in response to the data write-back module 14 completing execution of the ith control instruction, where i is an integer.
It should be noted that the completion of the execution of the ith control instruction by the data loading module 12 means that the data loading module 12 has finished loading the data to be processed corresponding to the ith control instruction; the completion of the execution of the ith control instruction by the processing module 13 means that the processing module 13 has finished processing the data to be processed corresponding to the ith control instruction; and the completion of the execution of the ith control instruction by the data write-back module 14 means that the data write-back module 14 has finished writing the processing result corresponding to the ith control instruction into the external storage module.
The data loading module 12 is configured to respond to the ith control instruction, and load to-be-processed data corresponding to the ith control instruction. The processing module 13 responds to the ith control instruction, and processes the data to be processed corresponding to the ith control instruction to obtain a processing result. The data write-back module 14 responds to the ith control instruction, and writes a processing result corresponding to the ith control instruction into an external storage module.
When the processing module 13 responds to the ith control instruction to process the to-be-processed data corresponding to the ith control instruction, if the data loading module 12 has already loaded the to-be-processed data corresponding to the ith control instruction, the data loading module 12 may directly receive the (i + 1) th control instruction sent by the control module 11 and load the to-be-processed data corresponding to the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the processing module 13; the embodiment further reduces the waiting time of the data loading module 12, and avoids the waste of processing resources caused by the waiting time.
Further, while the data write-back module 14 is writing the processing result corresponding to the ith control instruction into the external storage module in response to the ith control instruction, if the processing module 13 has already processed the data to be processed corresponding to the ith control instruction, the processing module 13 may directly receive the (i+1)th control instruction sent by the control module 11 and, in response, process the data to be processed corresponding to the (i+1)th control instruction, without waiting for the data write-back module 14 to complete execution of the ith control instruction. This embodiment further reduces the waiting time of the processing module 13 and avoids the waste of processing resources caused by waiting.
In an example, refer to Fig. 3, which is a timing diagram of the data loading module 12, the processing module 13, and the data write-back module 14 processing control instructions. After the data loading module 12 finishes loading the data to be processed corresponding to control instruction 1, it can directly obtain control instruction 2 and load the corresponding data to be processed, without waiting for the processing module 13 to finish processing the data corresponding to control instruction 1 or for the data write-back module 14 to finish writing the processing result corresponding to control instruction 1. This reduces the time the data loading module 12 waits for the next control instruction and avoids the waste of processing resources caused by waiting. Correspondingly, after the processing module 13 finishes processing the data to be processed corresponding to control instruction 1 and obtains the processing result, it can directly obtain control instruction 2 and process the corresponding data to be processed, without waiting for the data write-back module 14 to write the processing result into the external storage module. This reduces the time the processing module 13 waits for the next control instruction and likewise avoids wasted processing resources.
It can be understood that the embodiments of the present application place no limitation on the specific type of the external storage module, which may be chosen according to the actual application scenario. For example, it may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
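The timing behaviour of this example can be approximated with a small scheduling model. This is a sketch under stated assumptions, not the timing of Fig. 3 itself: each module handles one control instruction at a time, a stage may start an instruction only after the same module has finished the previous instruction and the preceding stage has finished the same instruction, and buffering between stages is unlimited. The function names and the durations used below are hypothetical.

```python
def pipeline_schedule(durations):
    """Compute finish times for a pipelined load / process / write-back flow.

    durations[i][s] is the time stage s needs for control instruction i.
    Returns end[i][s], the time at which stage s finishes instruction i.
    """
    end = []
    for i, row in enumerate(durations):
        ends_i = []
        for s, d in enumerate(row):
            same_module_free = end[i - 1][s] if i > 0 else 0  # module finished instruction i-1
            input_ready = ends_i[s - 1] if s > 0 else 0       # previous stage finished instruction i
            start = max(same_module_free, input_ready)
            ends_i.append(start + d)
        end.append(ends_i)
    return end


def serial_total(durations):
    """Total time if each instruction runs through all stages before the next starts."""
    return sum(sum(row) for row in durations)


# Three instructions, each needing 2 time units to load, 3 to process, 1 to write back.
example = [[2, 3, 1]] * 3
finish = pipeline_schedule(example)
pipelined_total = finish[-1][-1]
```

With these made-up durations the pipelined flow finishes at time 12, against 18 for strictly serial execution; the saving comes entirely from the loader and processor not waiting on downstream modules.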
In an exemplary embodiment, the data processing apparatus may be applied to processing based on a convolutional neural network, a machine learning algorithm widely used for computer vision tasks such as object recognition, object detection, and semantic segmentation of images. The structure of a neural network typically includes an input layer, one or more hidden layers, and an output layer, where operations in the hidden layers include, but are not limited to, convolution operations, pooling operations, and activation operations. In general, the hidden layers of a convolutional neural network are named according to the type of operation: hidden layers performing convolution are called convolutional layers, hidden layers performing pooling are called pooling layers, and hidden layers performing activation are called activation layers. The data processing apparatus provided by the embodiments of the present application can perform the convolution operations of convolutional layers, the pooling operations of pooling layers, and the activation operations of activation layers, so that the operation of a deep neural network is accelerated in hardware, its running time is reduced, and its efficiency is improved.
The objects processed by the convolutional neural network include, but are not limited to, images, audio, and text. Different types of hidden layers in the convolutional neural network correspond to different operation parameters: for example, the operation parameters of a convolutional layer are convolution kernels, the operation parameter of an activation layer is an activation function, and the operation parameters of a pooling layer are pooling parameters.
The following description takes as an example the application of a convolutional neural network to image processing, where the data processing apparatus performs the convolution operations of the convolutional layers. The convolution operation of a convolutional layer performs vector inner product operations on the image to be processed based on a convolution kernel to obtain a feature map. In this scenario, the control module 11 sends convolution operation control instructions to the data loading module 12, the processing module 13, and the data write-back module 14, respectively. The data loading module 12, in response to a convolution operation control instruction of the control module 11, loads the image to be processed and the convolution kernel; the processing module 13, in response to a convolution operation control instruction of the control module 11, performs the convolution operation on the image to be processed using the convolution kernel to obtain a feature map; and the data write-back module 14, in response to a convolution operation control instruction of the control module 11, writes the feature map into an external storage module. The control module 11 controls the data loading module 12, the processing module 13, and the data write-back module 14 to execute different convolution operation control instructions at the same time. Processing the convolution operation control instructions in parallel in this way effectively improves operation efficiency and the utilization of computing resources.
Further, to improve data loading efficiency, the data loading module 12 may include an object loading unit and a parameter loading unit, where the object loading unit is configured to load the image to be processed and the parameter loading unit is configured to load the convolution kernel. The object loading unit and the parameter loading unit load simultaneously, which helps improve loading efficiency.
In one embodiment, the control module 11 includes at least one instruction slot for caching the control instruction. In one embodiment, the instruction slot includes an instruction cache flag and a set of control state signals. The instruction cache flag indicates whether the instruction cached in the instruction slot is valid; for example, it may indicate which control instruction is cached in the instruction slot and whether that cached control instruction is valid. The set of control state signals represents the operating state of the corresponding module or the operating state of other instruction slots related to this instruction slot. In one embodiment, the operating state refers to whether the processing or operation of the corresponding module is completed, or whether the instruction cache operation of the corresponding instruction slot is completed. With at least one instruction slot, the execution of each instruction can be coordinated so that the submodules of the data processing apparatus can perform operations corresponding to different instructions at the same time, which improves the efficiency of the data processing apparatus.
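The inner-product view of the convolution operation described above can be made concrete with a short sketch. It assumes a single-channel image, stride 1, no padding, and no kernel flipping (the convention commonly used in CNN frameworks); none of these details are stated in the application, and the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a vector inner product.

    image and kernel are lists of lists of numbers. The result is the
    feature map, of size (H - kh + 1) x (W - kw + 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    flat_kernel = [w for row in kernel for w in row]
    feature_map = []
    for r in range(out_h):
        out_row = []
        for c in range(out_w):
            # Flatten the image window under the kernel into a vector.
            window = [image[r + dr][c + dc] for dr in range(kh) for dc in range(kw)]
            out_row.append(sum(x * w for x, w in zip(window, flat_kernel)))
        feature_map.append(out_row)
    return feature_map


feature_map = conv2d_valid(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0], [0, 1]],
)
```

In the apparatus, the image and the kernel would be fetched by the object loading unit and the parameter loading unit respectively, and the inner products would be evaluated by the processing module rather than by nested loops.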
The control module 11 sends the (i + 1) th control instruction cached in the instruction slot to the data loading module 12 in response to the data loading module 12 completing the execution of the ith control instruction; responding to the fact that the processing module 13 finishes executing the ith control instruction, and sending the (i + 1) th control instruction cached in the instruction slot to the processing module 13; and in response to the data write-back module 14 completing the execution of the ith control instruction, sending the (i + 1) th control instruction cached in the instruction slot to the data write-back module 14.
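The per-module dispatch rule can be sketched as a small bookkeeping class. This is an illustrative model only: it tracks, for each module, the index of the next control instruction to send and advances that index when the module reports completion. It deliberately ignores data dependencies between modules, and the class name `ControlModule`, its methods, and the module names are all hypothetical.

```python
class ControlModule:
    """Sends control instructions to each module independently, in order."""

    def __init__(self, instructions, module_names):
        self.instructions = list(instructions)
        # Index of the next instruction to send to each module.
        self.next_index = {name: 0 for name in module_names}

    def dispatch(self, name):
        """Send the next control instruction to module `name`, or None if none remain."""
        i = self.next_index[name]
        if i >= len(self.instructions):
            return None
        self.next_index[name] = i + 1
        return self.instructions[i]

    def on_done(self, name):
        """Module `name` reports it finished instruction i; send it instruction i+1."""
        return self.dispatch(name)


ctrl = ControlModule(["x0", "x1", "x2"], ["load", "process", "write_back"])
first = [ctrl.dispatch(name) for name in ("load", "process", "write_back")]
load_second = ctrl.on_done("load")        # loader finished x0, receives x1
load_third = ctrl.on_done("load")         # loader finished x1, receives x2
process_second = ctrl.on_done("process")  # processor finished x0, receives x1
```

At this point the loader is two instructions ahead of the write-back module, which is exactly the situation in which the three modules execute different control instructions at the same time.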
Further, the instruction slot may record a state of the control instruction cached in the instruction slot, and the control module 11 may control an execution process of the control instruction according to the state of the control instruction recorded in at least one instruction slot. In an example, taking that the data loading module 12, the processing module 13, and the data writing back module 14 respectively correspond to an instruction slot, and taking an example that one of the instruction slots corresponds to the data loading module 12, after the control module 11 sends the ith control instruction in the instruction slot to the data loading module 12, correspondingly, the state of the control instruction recorded in the instruction slot represents that the ith control instruction is invalid, and at this time, the control module 11 may cache the (i + 1) th control instruction in the instruction slot. Accordingly, after the i +1 th control instruction caches the value of the instruction slot, the state of the control instruction recorded in the instruction slot is changed to indicate that the i +1 th control instruction is valid, which indicates that the i +1 th control instruction has not been sent to the data load module 12. In this embodiment, through the states of the control instructions recorded in the instruction slots, it is ensured that the control module 11 can sequentially cache different control instructions, so as to ensure that different control instructions are sequentially sent to the data loading module 12, the processing module 13, and the data write-back module 14.
In one implementation, the state of the control instruction recorded in the instruction slot may be represented by a control instruction state signal, where the control instruction state signal indicates whether the control instruction cached in the corresponding instruction slot is valid. In an example, when the control instruction state signal in the instruction slot indicates that the ith control instruction cached in the instruction slot is invalid, the control module 11 may cache the (i+1)th control instruction in the instruction slot and set the state value of the control instruction state signal to a value representing valid, indicating that the (i+1)th control instruction cached in the instruction slot is valid.
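The behaviour of a single instruction slot with a state signal can be sketched as follows. This is a minimal software model under stated assumptions, not the hardware described in the application: the class name `InstructionSlot`, its methods `try_cache` and `send`, and the instruction labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionSlot:
    """Hypothetical model of one instruction slot with a state (valid) signal."""
    instruction: Optional[str] = None
    valid: bool = False  # control instruction state signal

    def try_cache(self, instruction: str) -> bool:
        """Cache a new instruction only if the current one is invalid."""
        if self.valid:
            return False
        self.instruction = instruction
        self.valid = True
        return True

    def send(self) -> str:
        """Hand the cached instruction to the module and mark it invalid."""
        if not self.valid:
            raise RuntimeError("no valid instruction to send")
        self.valid = False
        return self.instruction


slot = InstructionSlot()
accepted_first = slot.try_cache("instr_i")
rejected_next = slot.try_cache("instr_i+1")  # slot still holds a valid instruction
sent = slot.send()                           # instr_i goes to the module
accepted_next = slot.try_cache("instr_i+1")  # slot is free again
```

In this sketch the second caching attempt is refused until the first instruction has been sent, which is the ordering guarantee the state signal is meant to provide.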
The data loading module 12, the processing module 13, and the data writing back module 14 respectively correspond to an instruction slot, and whether the control instruction cached in the instruction slot is valid is related to whether the module corresponding to the at least one instruction slot completes the operation of the control instruction corresponding to the instruction slot.
As one implementation manner, a control instruction completion signal may be used to indicate whether a module corresponding to the at least one instruction slot completes an operation of a control instruction corresponding to the at least one instruction slot; the "whether the module corresponding to the at least one instruction slot completes the operation of the control instruction corresponding to the at least one instruction slot" means whether the module corresponding to the at least one instruction slot completes the reception of the control instruction corresponding to the instruction slot.
The control module 11 comprises at least one instruction slot for caching the control instruction; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the corresponding module completes the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is valid. In this embodiment, through the control instruction status signal and the control instruction completion signal, it is ensured that the control module 11 can buffer different control instructions in order, so as to ensure that different control instructions are sent to the data loading module 12, the processing module 13, and the data write-back module 14 in order.
Further, the data loading module 12, the processing module 13, and the data write-back module 14 each correspond to an instruction slot, and a control instruction is passed from instruction slot to instruction slot in the order of the data loading module 12, the processing module 13, and the data write-back module 14. In one example, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. When the ith control instruction in the second instruction slot 111 is invalid, the (i+1)th control instruction is cached in the second instruction slot 111; when the ith control instruction in the third instruction slot 112 is invalid, the (i+1)th control instruction cached in the second instruction slot 111 is cached in the third instruction slot 112; and when the ith control instruction in the fourth instruction slot 113 is invalid, the (i+1)th control instruction cached in the third instruction slot 112 is cached in the fourth instruction slot 113. The (i+1)th control instruction is thereby cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 in sequence.
Whether the control instruction cached in the instruction slot is valid is related not only to whether the module corresponding to the at least one instruction slot completes the operation of the control instruction corresponding to the instruction slot, but also to whether the at least one instruction slot itself completes the operation of the corresponding control instruction; here, "whether the at least one instruction slot completes the operation of the corresponding control instruction" refers to whether the control instruction in the instruction slot has been cached to another instruction slot.
In an implementation manner, two control instruction completion signals may be set, where one control instruction completion signal is used to indicate whether a module corresponding to the at least one instruction slot completes an operation of the control instruction corresponding to the at least one instruction slot, and the other control instruction completion signal is used to indicate whether the at least one instruction slot completes an operation of the corresponding control instruction.
The control module 11 comprises at least one instruction slot for caching the control instruction; the instruction slot comprises a control instruction state signal and at least two control instruction completion signals. When one of the control instruction completion signals indicates that the corresponding module has completed the operation of the corresponding control instruction, and the other control instruction completion signal indicates that the at least one instruction slot has completed the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid; otherwise, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is valid. In this embodiment, the control instruction state signal and the two control instruction completion signals ensure that the control module 11 caches different control instructions in order, so that different control instructions are sent in order to the data loading module 12, the processing module 13, and the data write-back module 14.
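The rule relating the state signal to the two completion signals can be written as a small truth table. The sketch below is illustrative only; the function name `state_signal` and the argument names `module_done` and `slot_done` are hypothetical, and booleans stand in for whatever signal encoding the hardware uses.

```python
from itertools import product


def state_signal(module_done: bool, slot_done: bool) -> bool:
    """Return True (valid) unless both completion signals report completion.

    module_done: the corresponding module has received the instruction.
    slot_done:   the instruction has been cached to another instruction slot.
    """
    return not (module_done and slot_done)


# Enumerate all four combinations of the two completion signals.
truth_table = {
    (module_done, slot_done): state_signal(module_done, slot_done)
    for module_done, slot_done in product([False, True], repeat=2)
}
```

Only the combination in which both completion signals report completion yields an invalid slot; in the other three cases the cached instruction is still needed and remains valid.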
In one example, referring to fig. 4, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14. In one embodiment, the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are each configured to cache the corresponding (i+1)th control instruction after the control module 11 finishes sending the ith control instruction, which may happen at a different time for each instruction slot. In another embodiment, the instruction slot may cache the instruction control word obtained by decoding the control instruction instead of the control instruction itself. The second instruction slot 111 is configured to cache the (i+1)th control instruction to be sent to the data loading module 12 after the control module 11 finishes sending the ith control instruction; the third instruction slot 112 is configured to cache the (i+1)th control instruction to be sent to the processing module 13 after the control module 11 finishes sending the ith control instruction; and the fourth instruction slot 113 is configured to cache the (i+1)th control instruction to be sent to the data write-back module 14 after the control module 11 finishes sending the ith control instruction. It should be noted that the time points at which the (i+1)th control instruction is cached in the above instruction slots may differ.
The "sending the ith control instruction is completed" means that after the control module 11 sends the ith control instruction to the data loading module 12 corresponding to the second instruction slot 111, the processing module 13 corresponding to the third instruction slot 112, or the data write-back module 14 corresponding to the fourth instruction slot 113, the data loading module 12, the processing module 13, or the data write-back module 14 completes receiving the ith control instruction.
For a second instruction slot 111, in response to invalidation of the ith control instruction in the second instruction slot 111, the control module 11 caches the (i + 1) th control instruction to the second instruction slot 111; and in response to the data loading module 12 completing the execution of the ith control instruction, sending the (i + 1) th control instruction in the second instruction slot 111 to the data loading module 12.
The second instruction slot 111 includes a second valid signal, a data load module completion signal, and a third instruction slot completion signal; the second valid signal is used to indicate whether the control instruction cached in the second instruction slot 111 is valid; the data loading module completion signal is used to indicate whether the data loading module 12 completes receiving the control instruction (i.e. whether the control module 11 has sent the control instruction to the data loading module 12); the third instruction slot complete signal is used to indicate whether the third instruction slot 112 completes caching the control instruction (i.e. whether the control module 11 has cached the cached instruction in the second instruction slot 111 to the third instruction slot 112).
When the ith control instruction has been sent to the data loading module 12 and cached to the third instruction slot 112, the state values of the data loading module completion signal and the third instruction slot completion signal are set to values representing completion, and the state value of the second valid signal is set to a value representing invalid; at this time, the control module 11 may cache the (i+1)th control instruction to the second instruction slot 111. When the (i+1)th control instruction has just been cached to the second instruction slot 111, it has not yet been sent to the data loading module 12 or cached to the third instruction slot 112, so the state value of the second valid signal is set to a value representing valid, and the state values of the data loading module completion signal and the third instruction slot completion signal are set to values representing incomplete.
For a third instruction slot 112, in response to invalidation of the ith control instruction in the third instruction slot 112, the control module 11 caches the (i + 1) th control instruction in the second instruction slot 111 into the third instruction slot 112; and in response to the processing module 13 completing the execution of the ith control instruction, sending the (i + 1) th control instruction in the third instruction slot 112 to the processing module 13.
The third instruction slot 112 includes a third valid signal, a processing module completion signal, and a fourth instruction slot 113 completion signal. The third valid signal is used to indicate whether the control instruction cached in the third instruction slot 112 is valid; the processing module completion signal is used to indicate whether the processing module 13 completes receiving the control instruction (i.e. whether the control module 11 has sent the control instruction to the processing module 13); the fourth instruction slot 113 completion signal is used to indicate whether the fourth instruction slot 113 completes caching the control instruction (i.e. whether the control module 11 has cached the cached instruction in the third instruction slot 112 to the fourth instruction slot 113).
When the ith control instruction has been sent to the processing module 13 and cached to the fourth instruction slot 113, the state values of the processing module completion signal and the fourth instruction slot 113 completion signal are set to values representing completion, and the state value of the third valid signal is set to a value representing invalid; at this time, the control module 11 may cache the (i+1)th control instruction to the third instruction slot 112. When the (i+1)th control instruction has just been cached to the third instruction slot 112, it has not yet been sent to the processing module 13 or cached to the fourth instruction slot 113, so the state value of the third valid signal is set to a value representing valid, and the state values of the processing module completion signal and the fourth instruction slot 113 completion signal are set to values representing incomplete.
For the fourth instruction slot 113, in response to invalidation of the ith control instruction in the fourth instruction slot 113, the control module 11 caches the (i+1)th control instruction in the third instruction slot 112 to the fourth instruction slot 113; and in response to the data write-back module 14 completing execution of the ith control instruction, sends the (i+1)th control instruction in the fourth instruction slot 113 to the data write-back module 14.
Wherein, the fourth instruction slot 113 includes a fourth valid signal and a data write-back module 14 completion signal; the fourth valid signal is used to indicate whether the control instruction cached in the fourth instruction slot 113 is valid; the data write-back module 14 complete signal is used to indicate whether the data write-back module 14 completes receiving the control instruction (i.e. whether the control module 11 has sent the control instruction to the data write-back module 14).
When the ith control instruction is sent to the data write-back module 14, the state value of the completion signal of the data write-back module 14 is set to a value representing completion, and meanwhile, the state value of the fourth valid signal is set to a value representing invalidity, and the control module 11 may cache the (i + 1) th control instruction to the fourth instruction slot 113; when the (i + 1) th control instruction is cached in the fourth instruction slot 113, the (i + 1) th control instruction is not yet sent to the data write-back module 14 by the control module 11, at this time, the state value of the fourth valid signal is set to a value representing valid, and the state value of the data write-back module 14 completion signal is set to a value representing incomplete.
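The signal transitions described above for the second, third, and fourth instruction slots follow one pattern: caching a new instruction resets every completion signal, and the slot becomes invalid once all of its completion signals report completion. The sketch below models that pattern in software. The class `Slot`, its methods, and the signal names are hypothetical, and treating an empty slot as "all completed" is an assumption made so that the first instruction can be cached.

```python
class Slot:
    """Hypothetical slot with one valid signal and N completion signals."""

    def __init__(self, completion_signals):
        # Assumption: an empty slot is modelled as "everything completed".
        self.done = {name: True for name in completion_signals}
        self.instruction = None

    @property
    def valid(self):
        # Valid until every completion signal reports completion.
        return not all(self.done.values())

    def cache(self, instruction):
        if self.valid:
            raise RuntimeError("slot still holds a valid instruction")
        self.instruction = instruction
        for name in self.done:
            self.done[name] = False  # nothing completed yet for the new one

    def complete(self, name):
        self.done[name] = True


# Second slot: module completion plus next-slot completion.
second = Slot(["data_load_module", "third_slot"])
# Fourth slot: only the data write-back module completion signal.
fourth = Slot(["write_back_module"])

second.cache("instr_i")
second.complete("data_load_module")
valid_after_send_only = second.valid  # still valid: not yet in the third slot
second.complete("third_slot")
valid_after_both = second.valid       # invalid: instr_i+1 may be cached
second.cache("instr_i+1")

fourth.cache("instr_i")
fourth.complete("write_back_module")
fourth_valid_after_send = fourth.valid
```

The fourth slot has no successor slot, so a single completion signal is enough to invalidate it, whereas the second and third slots wait for both.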
In this embodiment, by setting the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data write-back module 14 in the control module 11, the control module 11 can respectively control the data loading module 12, the processing module 13, and the data write-back module 14 through the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, so as to ensure that the data loading module 12, the processing module 13, and the data write-back module 14 can execute different control instructions at the same time, thereby reducing corresponding waiting time, and facilitating improvement of utilization rate of processing resources.
In another example, please refer to fig. 5, which is a schematic structural diagram of a fourth data processing apparatus according to an exemplary embodiment of the present application. In the embodiment shown in fig. 5, on the basis of the embodiment shown in fig. 4, the control module 11 further includes a first instruction slot 114 for caching the decoded control instruction, where the decoded control instruction may be represented by a corresponding instruction control word.
For the first instruction slot 114, the control module 11 decodes the ith control instruction; in response to the decoded ith control instruction in the first instruction slot 114 being invalid, caching the decoded (i + 1) th control instruction into the first instruction slot 114; and in response to invalidation of the decoded ith control instruction in the second instruction slot 111, caching the decoded (i + 1) th control instruction in the first instruction slot 114 into the second instruction slot 111.
Wherein the first instruction slot 114 comprises a first valid signal and a second instruction slot complete signal; the first valid signal is used to indicate whether the control instruction cached in the first instruction slot 114 is valid; the second instruction slot complete signal is used to indicate whether the second instruction slot 111 completes caching the control instruction (i.e. whether the control module 11 has cached the cached instruction in the first instruction slot 114 to the second instruction slot 111).
When the decoded ith control instruction has been sent to the second instruction slot 111, the state value of the second instruction slot completion signal is set to a value representing completion, and meanwhile, the state value of the first valid signal is set to a value representing invalidity, and the control module 11 may cache the decoded (i + 1) th control instruction to the first instruction slot 114; when the decoded (i + 1) th control instruction is cached in the first instruction slot 114, the (i + 1) th control instruction is not cached in the second instruction slot 111 by the control module 11, the state value of the first valid signal is set to be a value representing valid, and the state value of the second instruction slot completion signal is set to be a value representing incomplete.
Accordingly, the control instructions buffered in the second instruction slot 111, the third instruction slot 112 and the fourth instruction slot 113 are decoded control instructions. For the execution process of the decoded control instruction cached in the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, reference may be made to the description in the embodiment shown in fig. 4, and details are not repeated here.
In this embodiment, by setting the first instruction slot 114, the second instruction slot 111 corresponding to the data loading module 12, the third instruction slot 112 corresponding to the processing module 13, and the fourth instruction slot 113 corresponding to the data writing-back module 14 in the control module 11, the control module 11 can respectively control the data loading module 12, the processing module 13, and the data writing-back module 14 through the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113, so as to ensure that the data loading module 12, the processing module 13, and the data writing-back module 14 can execute different control instructions at the same time, thereby reducing corresponding waiting time and facilitating improvement of utilization rate of processing resources.
In an exemplary embodiment, the data processing apparatus provided in the embodiment of the present application may be applied to a convolutional neural network to process a target object (the target object includes, but is not limited to, an image, audio, video, or text, etc.), and may perform a convolutional operation of a convolutional layer, a pooling operation of a pooling layer, or an activation operation of an activation layer, so as to accelerate an operation process of a deep neural network in a hardware manner, reduce an operation time of the deep neural network, and improve operation efficiency.
The following description will be given taking as an example the application of the convolutional neural network to the field of image processing, where the data processing apparatus is used to perform convolution operations of convolutional layers in the convolutional neural network: the control module 11 receives a convolution operation control instruction and distributes the convolution operation control instruction to the data loading module 12, the processing module 13 and the data writing back module 14. Referring to fig. 6, the control module 11 divides the distribution process of the control instruction into 4 stages, which are a decoding stage, a loading stage, an execution stage, and a storage stage. The embodiment realizes the control of the execution process of different control instructions through the above 4 stages.
The decoding stage corresponds to the first instruction slot 114, and the control module 11 caches the decoded control instruction in the first instruction slot 114.
The loading stage corresponds to the second instruction slot 111, and the control module 11 caches the decoded control instruction in the first instruction slot 114 to the second instruction slot 111, and then sends the decoded control instruction cached in the second instruction slot 111 to the data loading module 12.
The execution stage corresponds to the third instruction slot 112, and the control module 11 caches the decoded control instruction in the second instruction slot 111 to the third instruction slot 112, and then sends the decoded control instruction cached in the third instruction slot 112 to the processing module 13.
The storage stage corresponds to the fourth instruction slot 113, and the control module 11 caches the decoded control instruction in the third instruction slot 112 to the fourth instruction slot 113, and then sends the decoded control instruction cached in the fourth instruction slot 113 to the data write-back module 14.
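The four stages and their instruction slots can be summarised as a table, together with an idealized view of how instructions would occupy the stages if no module ever stalled. This is a simplified sketch: the reference numerals come from the description above, while the function `stage_occupancy` and the no-stall assumption (each instruction advances exactly one stage per period) are illustrative and do not model the valid and completion signals.

```python
# (stage name, instruction slot numeral, module numeral or None)
STAGES = [
    ("decode", 114, None),   # first instruction slot, no downstream module
    ("load", 111, 12),       # second instruction slot -> data loading module
    ("execute", 112, 13),    # third instruction slot -> processing module
    ("store", 113, 14),      # fourth instruction slot -> data write-back module
]


def stage_occupancy(instructions, periods):
    """Idealized pipeline: instruction n occupies stage s during period n + s."""
    table = []
    for t in range(periods):
        row = {}
        for s, (stage, _slot, _module) in enumerate(STAGES):
            n = t - s
            row[stage] = instructions[n] if 0 <= n < len(instructions) else None
        table.append(row)
    return table


table = stage_occupancy(["a", "b", "c"], periods=4)
```

In period 3 of this idealized run, instruction a is in the storage stage, b in the execution stage, and c in the loading stage, so all three modules would be working on different instructions at once.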
Different control instructions are cached in each instruction slot at different times, and the execution of the control instructions is controlled through the control instruction state signals and the control instruction completion signals in the instruction slots. For ease of understanding, please refer to fig. 7, which illustrates the control instructions cached in each instruction slot during different time periods:
At time period T0: the control module 11 decodes the control instruction a and caches the decoded control instruction a to the first instruction slot 114 corresponding to the decoding stage; at this time, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction a is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 does not complete the buffering of the decoded control instruction a.
At time period T1: the control module 11 caches the decoded control instruction a in the first instruction slot 114 to the second instruction slot 111 corresponding to the loading stage, and then sends the decoded control instruction a in the second instruction slot 111 to the data loading module 12; at this time, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction a is valid, the data loading module completion signal indicates that the data loading module 12 completes receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 does not complete buffering the decoded control instruction a.
Meanwhile, since the decoded control instruction a in the first instruction slot 114 is cached to the second instruction slot 111, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction a is invalid, the second instruction slot completion signal indicates that the second instruction slot 111 completes caching the decoded control instruction a, the control module 11 may decode the control instruction b and cache the decoded control instruction b in the first instruction slot 114, accordingly, in the first instruction slot 114, the first valid signal indicates that the decoded control instruction b is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 does not complete caching the decoded control instruction b.
At time period T2: the control module 11 caches the decoded control instruction a in the second instruction slot 111 to a third instruction slot 112 corresponding to the execution stage, and then sends the decoded control instruction a in the third instruction slot 112 to the processing module 13; at this time, in the third instruction slot, the third valid signal indicates that the decoded control instruction a is valid, the processing module completion signal indicates that the processing module 13 completes receiving the decoded control instruction a, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 does not complete buffering the decoded control instruction a.
Meanwhile, since the decoded control instruction a in the second instruction slot 111 has been cached to the third instruction slot 112, in the second instruction slot 111 the second valid signal indicates that the decoded control instruction a is invalid, the data loading module completion signal indicates that the data loading module 12 has completed receiving the decoded control instruction a, and the third instruction slot completion signal indicates that the third instruction slot 112 has completed caching the decoded control instruction a. The control module 11 may therefore cache the decoded control instruction b in the first instruction slot 114 into the second instruction slot 111. In one possible case, the data loading module 12 is still executing the decoded control instruction a, so the decoded control instruction b cannot yet be issued; in the second instruction slot 111, the second valid signal then indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has not completed receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not completed caching the decoded control instruction b.
Meanwhile, since the decoded control instruction b in the first instruction slot 114 is cached to the second instruction slot 111, the first valid signal in the first instruction slot indicates that the decoded control instruction b is invalid, the second instruction slot completion signal indicates that the second instruction slot 111 completes caching the decoded control instruction b, and then the control module 11 may decode the control instruction c and cache the decoded control instruction c into the first instruction slot 114, and accordingly, in the first instruction slot, the first valid signal indicates that the decoded control instruction c is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 does not complete caching the decoded control instruction c.
At time period T3: the control module 11 caches the decoded control instruction a in the third instruction slot 112 to the fourth instruction slot 113 corresponding to the storage stage, and then sends the decoded control instruction a in the fourth instruction slot 113 to the data write-back module 14.
Meanwhile, because the data loading module 12 is still executing the decoded control instruction a, the decoded control instruction b cannot be issued to the data loading module 12 and is still cached in the second instruction slot; at this time, in the second instruction slot, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 does not complete receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 does not complete buffering the decoded control instruction b.
At time period T4: after the data loading module 12 finishes executing the decoded control instruction a, the control module 11 sends the decoded control instruction b in the second instruction slot 111 to the data loading module 12; at this time, in the second instruction slot 111, the second valid signal indicates that the decoded control instruction b is valid, the data loading module completion signal indicates that the data loading module 12 has completed receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has not completed caching the decoded control instruction b.
At time period T5: the control module 11 caches the decoded control instruction b in the second instruction slot 111 to the third instruction slot 112, and then, after the processing module 13 finishes executing the decoded control instruction a, sends the decoded control instruction b cached in the third instruction slot 112 to the processing module 13; at this time, in the third instruction slot 112, the third valid signal indicates that the decoded control instruction b is valid, the processing module completion signal indicates that the processing module 13 has completed receiving the decoded control instruction b, and the fourth instruction slot completion signal indicates that the fourth instruction slot 113 has not completed caching the decoded control instruction b.
Meanwhile, since the decoded control instruction b in the second instruction slot 111 has been cached to the third instruction slot 112, in the second instruction slot 111 the second valid signal indicates that the decoded control instruction b is invalid, the data loading module completion signal indicates that the data loading module 12 has completed receiving the decoded control instruction b, and the third instruction slot completion signal indicates that the third instruction slot 112 has completed caching the decoded control instruction b. The control module 11 may therefore cache the decoded control instruction c in the first instruction slot 114 into the second instruction slot 111. In one possible case, the data loading module 12 is still executing the decoded control instruction b, so the decoded control instruction c cannot yet be issued to the data loading module 12; in the second instruction slot 111, the second valid signal then indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not completed receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not completed caching the decoded control instruction c.
Meanwhile, since the decoded control instruction c in the first instruction slot 114 is cached to the second instruction slot 111, the first valid signal in the first instruction slot indicates that the decoded control instruction c is invalid, the second instruction slot completion signal indicates that the second instruction slot 111 completes caching the decoded control instruction c, and then the control module 11 may decode the control instruction d and cache the decoded control instruction d into the first instruction slot 114, accordingly, the first valid signal in the first instruction slot 114 indicates that the decoded control instruction d is valid, and the second instruction slot completion signal indicates that the second instruction slot 111 does not complete caching the decoded control instruction d.
At time period T6: the control module 11 caches the decoded control instruction b in the third instruction slot 112 to a fourth instruction slot 113 corresponding to the storage stage, and then sends the decoded control instruction b cached in the fourth instruction slot 113 to the data write-back module 14 after the data write-back module 14 finishes executing the decoded control instruction a.
Meanwhile, because the data loading module 12 is still executing the decoded control instruction b, the decoded control instruction c cannot be issued and remains cached in the second instruction slot 111; the second valid signal in the second instruction slot 111 indicates that the decoded control instruction c is valid, the data loading module completion signal indicates that the data loading module 12 has not completed receiving the decoded control instruction c, and the third instruction slot completion signal indicates that the third instruction slot 112 has not completed caching the decoded control instruction c.
In an exemplary embodiment, the data to be processed includes an object to be processed and an operation parameter, wherein the object to be processed includes, but is not limited to, an image, audio or text; the operational parameters include, but are not limited to, convolution kernels, pooling parameters, or activation functions.
Wherein the processing module 13 comprises a systolic array; the processing module 13 responds to the ith control instruction, writes the object to be processed and the operation parameter corresponding to the ith control instruction into the systolic array, and performs an operation on the object to be processed and the operation parameter through the systolic array to obtain the processing result.
In one example, the following description will be given taking as an example the case where the data processing apparatus is used to perform a convolution operation of an image: the object to be processed is an image, the operation parameter is a convolution kernel, the processing module 13 writes the image and the convolution kernel into the systolic array, and performs product accumulation operation on the image and the convolution kernel through the systolic array to obtain a convolution image.
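The product accumulation described above can be illustrated with a minimal Python sketch. It only shows the arithmetic (a sliding-window multiply-accumulate with stride 1, no padding, and no kernel flipping, as is usual in convolutional neural networks); it does not model the dataflow of the systolic array, and the function name `conv2d_mac` and the sample values are assumptions made for illustration.

```python
def conv2d_mac(image, kernel):
    """Slide the kernel over the image and multiply-accumulate each window.

    Assumptions: stride 1, no padding, kernel not flipped.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0  # accumulator for one output pixel
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_mac(image, kernel)  # [[6, 8], [12, 14]]
```

Each output value is the sum of element-wise products of one image window with the convolution kernel, which is the operation the systolic array would carry out in hardware.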
In an embodiment, in consideration that in a convolutional neural network application scenario, to-be-processed data to be loaded at least includes two parts, namely, an object to be processed and an operating parameter, in order to further improve data loading efficiency, please refer to fig. 8, which is a schematic diagram of a fifth data processing apparatus according to an exemplary embodiment of the present application. The data loading module 12 includes an object loading unit 121 and a parameter loading unit 122. The control module 11, in response to the object loading unit 121 completing executing the ith control instruction, sends the (i + 1) th control instruction to the object loading unit 121; and in response to the parameter loading unit 122 completing the execution of the ith control instruction, sending the (i + 1) th control instruction to the parameter loading unit 122. In this embodiment, the object loading unit 121 and the parameter loading unit 122 are used to load the object to be processed and the operating parameter respectively, and load the object and the operating parameter simultaneously, which is beneficial to improving the loading efficiency.
It should be noted that the fact that the object loading unit 121 finishes executing the ith control instruction means that the object loading unit 121 finishes loading the object to be processed corresponding to the ith control instruction; the fact that the parameter loading unit 122 finishes executing the ith control instruction means that the parameter loading unit 122 finishes loading the operation parameters corresponding to the ith control instruction.
The object loading unit 121, in response to the ith control instruction, loads an object to be processed corresponding to the ith control instruction; the parameter loading unit 122 loads an operating parameter corresponding to the ith control instruction in response to the ith control instruction.
When the processing module 13 responds to the ith control instruction to process the to-be-processed data corresponding to the ith control instruction, if the object loading unit 121 has already loaded the to-be-processed object corresponding to the ith control instruction, the object loading unit 121 may directly receive the (i + 1) th control instruction sent by the control module 11 and load the to-be-processed object corresponding to the (i + 1) th control instruction without waiting for the processing module 13 to finish executing the ith control instruction; similarly, if the parameter loading unit 122 has already loaded the operation parameter corresponding to the ith control instruction, the parameter loading unit 122 may directly receive the (i + 1) th control instruction sent by the control module 11 and load the operation parameter corresponding to the (i + 1) th control instruction without waiting for the processing module 13 to finish executing the ith control instruction; in this embodiment, the waiting time of the object loading unit 121 and the parameter loading unit 122 for the (i + 1) th control instruction is further reduced, and the waste of processing resources caused by too long waiting time is avoided.
In one embodiment, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the data processing module 13, and a fourth instruction slot 113 corresponding to the data writing-back module 14; when the data loading module 12 includes the object loading unit 121 and the parameter loading unit 122, correspondingly, for the second instruction slot 111, the control module 11, in response to the invalidation of the ith control instruction in the second instruction slot 111, caches the (i + 1) th control instruction to the second instruction slot 111; and in response to the object loading unit 121 and the parameter loading unit 122 completing execution of the ith control instruction, sending the (i + 1) th control instruction in the second instruction slot 111 to the data loading module 12.
The second instruction slot 111 includes a second valid signal, an object load unit completion signal, a parameter load unit completion signal, and a third instruction slot completion signal; the second valid signal is used to indicate whether the control instruction cached in the second instruction slot 111 is valid; the object load unit completion signal is used to indicate whether the object load unit 121 completes receiving the control instruction (i.e. whether the control module 11 has sent the control instruction to the object load unit 121); the parameter loading unit completion signal is used to indicate whether the parameter loading unit 122 completes receiving the control instruction (i.e. whether the control module 11 has sent the control instruction to the parameter loading unit 122); the third instruction slot complete signal is used to indicate whether the third instruction slot 112 completes caching the control instruction (i.e. whether the control module 11 has cached the cached instruction in the second instruction slot 111 to the third instruction slot 112).
When the ith control instruction has been sent to the object loading unit 121 and the parameter loading unit 122 and has been cached to the third instruction slot 112, the state values of the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to the values representing completion, and the state value of the second valid signal is set to the value representing invalidity; at this time, the control module 11 may cache the (i + 1) th control instruction to the second instruction slot 111. When the (i + 1) th control instruction is cached to the second instruction slot 111, it has not yet been sent to the object loading unit 121 or the parameter loading unit 122, nor cached to the third instruction slot 112, so the state value of the second valid signal is set to the value representing valid, and the state values of the object loading unit completion signal, the parameter loading unit completion signal, and the third instruction slot completion signal are set to the values representing unfinished.
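The signal bookkeeping of the second instruction slot 111 can be sketched in Python as follows. This is only an illustrative software model under assumptions: the class name `SecondInstructionSlot`, the field names, and the string labels for the instructions are hypothetical and do not come from the application, which describes hardware signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondInstructionSlot:
    # Hypothetical model of the four signals described for slot 111.
    instruction: Optional[str] = None
    valid: bool = False               # second valid signal
    object_unit_done: bool = True     # object loading unit completion signal
    parameter_unit_done: bool = True  # parameter loading unit completion signal
    third_slot_done: bool = True      # third instruction slot completion signal

    def cache(self, instruction: str) -> None:
        """Cache a new instruction; only allowed when the old one is invalid."""
        if self.valid:
            raise RuntimeError("slot still holds a valid instruction")
        self.instruction = instruction
        self.valid = True
        self.object_unit_done = False
        self.parameter_unit_done = False
        self.third_slot_done = False

    def mark_done(self, signal: str) -> None:
        """Set one completion signal; invalidate the slot once all are set."""
        if signal not in ("object_unit_done", "parameter_unit_done", "third_slot_done"):
            raise ValueError(signal)
        setattr(self, signal, True)
        if self.object_unit_done and self.parameter_unit_done and self.third_slot_done:
            self.valid = False


slot = SecondInstructionSlot()
slot.cache("instruction i")
slot.mark_done("object_unit_done")
slot.mark_done("parameter_unit_done")
still_valid = slot.valid            # True: not yet cached to the third slot
slot.mark_done("third_slot_done")   # all three done, slot becomes invalid
slot.cache("instruction i+1")       # now the next instruction may be cached
```

In this sketch the slot accepts the (i + 1) th instruction only after all three completion signals for the ith instruction are set, mirroring the condition described above.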
For the execution process of the third instruction slot 112 and the fourth instruction slot 113, reference may be made to the embodiment shown in fig. 3, which is not described herein in detail in this embodiment of the application.
In an embodiment, in order to further improve the efficiency of data processing, target data to be processed may be divided into at least two parts, where the data to be processed is one part of the target data to be processed. The control module 11 performs control through at least two control instructions, and each control instruction indicates one part of the target data to be processed, so as to implement processing of the target data to be processed. Since the target data to be processed is divided into at least two parts, when the data to be processed is loaded based on one control instruction, the data loading module 12 loads only one part of the target data to be processed, which is beneficial to improving loading efficiency. The processing module 13 therefore does not need to wait for the data loading module 12 to completely load the target data to be processed and can process the loaded data more quickly, further improving the processing efficiency.
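The target data to be processed is divided into at least two parts, and the control module 11 performs control through at least two control instructions, each of which indicates one part of the target data to be processed, so that the target data to be processed is processed. Correspondingly, the control instructions include a result write-back control instruction and a result non-write-back control instruction.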
The result non-write-back control instruction is used to instruct the processing module 13 not to send the processing result to the data write-back module 14 after obtaining it, but to cache the processing result; the processing result corresponding to the result non-write-back control instruction is not the final processing result written into the external storage module by the data write-back module 14, but a part of that final processing result. After receiving the result non-write-back control instruction, the processing module 13 processes the corresponding data to be processed according to the instruction to obtain a processing result and caches it, and after the caching is completed generates an end signal to be sent to the control module 11. The control module 11 receives the end signal sent after the processing module 13 caches the processing result, and the end signal represents that the execution of the result non-write-back control instruction in the data processing device is completed.
The result write-back control instruction is used to instruct the processing module 13 to send all processing results related to the target data to be processed to the data write-back module 14. After receiving the result write-back control instruction, the processing module 13 processes the corresponding data to be processed according to the result write-back control instruction to obtain a processing result, integrates all processing results related to the target data to be processed, and sends the integrated processing results to the data write-back module 14, which writes them into an external storage module. After the write operation is completed, the data write-back module 14 generates an end signal and sends it to the control module 11; the control module 11 receives the end signal sent by the data write-back module 14 after the processing result is written into the external storage module, and the end signal represents that the execution of the result write-back control instruction in the data processing device is finished.
In this embodiment, since the target data to be processed is divided into at least two parts, and each part is indicated by a control instruction, the data loading module 12 only needs to load a part of the target data to be processed when loading the data to be processed corresponding to a control instruction. This is beneficial to improving the loading efficiency and reduces the time for which the processing module 13 waits for the data loading module 12 to load the target data to be processed, so that the processing module 13 can process the loaded data more quickly, which is beneficial to improving the processing efficiency.
Further, the control module 11 may return an end signal corresponding to the control instruction to the external control module to notify the external control module that the control instruction has been executed, so that the external control module performs a next processing step based on a final processing result written in the external storage module.
The two control instructions handle the obtained processing result differently and trigger the end signal at different points: for the result write-back control instruction, the end signal is sent after the data write-back module 14 completes the instruction, whereas for the result non-write-back control instruction, the end signal is sent after the processing module 13 completes the instruction. In an exemplary scenario, the data write-back module 14 is writing the final processing result A corresponding to the target data A to be processed into an external storage module; at this time, the data write-back module 14 has not finished executing, and therefore the end signal A to be sent to the control module 11 has not been generated. Meanwhile, the processing module 13 may have already processed a part of the target data B to be processed, namely the data B1 to be processed, and generated an end signal B1 sent to the control module 11. If the control module 11 returns the end signal B1 to the external control module at this time, the external control module may perform the next step directly based on the end signal B1 while ignoring the end signal A that has not yet been received; the external control module may thus skip the processing step based on the final processing result A, possibly causing a process flow error.
Therefore, in order to ensure the accuracy of the processing flow, the control module 11 returns the end signal corresponding to each control instruction to the external control module according to the receiving sequence of the control instructions and the first-in first-out principle. If it is determined, according to the receiving sequence of the control instructions and the first-in first-out principle, that the currently received end signal is not the one currently due to be sent, the currently received end signal is buffered until its turn comes, and is then returned to the external control module. In this embodiment, the end signals are subjected to order-preserving processing, which ensures that the end signal corresponding to the control instruction received first from the external control module is returned to the external control module first, thereby ensuring the accuracy and the ordered execution of the data processing flow.
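The order-preserving return of end signals can be sketched in Python as follows. This is a minimal software model under assumptions: the class name `EndSignalQueue`, its methods, and the instruction labels "A" and "B1" are hypothetical, chosen to mirror the exemplary scenario, and the application does not prescribe this data structure.

```python
from collections import deque


class EndSignalQueue:
    """Return end signals in the order the control instructions were received."""

    def __init__(self):
        self.pending = deque()  # instruction ids, in receiving order
        self.finished = set()   # end signals received but not yet returned

    def receive_instruction(self, instr_id):
        self.pending.append(instr_id)

    def receive_end_signal(self, instr_id):
        """Buffer the end signal, then release every signal whose turn has come."""
        self.finished.add(instr_id)
        released = []
        while self.pending and self.pending[0] in self.finished:
            head = self.pending.popleft()
            self.finished.remove(head)
            released.append(head)
        return released  # end signals returned to the external control module


queue = EndSignalQueue()
queue.receive_instruction("A")    # result write-back instruction, received first
queue.receive_instruction("B1")   # result non-write-back instruction
first = queue.receive_end_signal("B1")   # [] : B1 is buffered, A is still due
second = queue.receive_end_signal("A")   # ["A", "B1"] : released in FIFO order
```

Here the end signal B1 arrives first but is held back until the end signal A has been received, so the external control module never sees B1 before A.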
In an example, please refer to fig. 9, for example, the target data C to be processed is divided into two parts, including data C1 to be processed and data C2 to be processed, the control instruction includes a result write-back control instruction and a result non-write-back control instruction, the result non-write-back control instruction corresponds to the data C1 to be processed, and the result write-back control instruction corresponds to the data C2 to be processed.
In the embodiment shown in fig. 9, the data loading module 12 loads the data C1 to be processed based on the result non-write-back control instruction, and the processing module 13 processes the data C1 to be processed based on the result non-write-back control instruction to obtain a processing result C1 and caches the processing result C1; the data loading module 12 loads the data C2 to be processed based on the result write-back control instruction, the processing module 13 processes the data C2 to be processed based on the result write-back control instruction to obtain a processing result C2, and then integrates the processing result C1 with the processing result C2 to obtain a final processing result (C1, C2) corresponding to the target data C to be processed; the data write-back module 14 then writes the final processing result (C1, C2) into an external storage module.
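The handling of the two instruction types in this example can be sketched in Python as follows. This is an illustrative model only: the class `ProcessingSketch`, the use of `sum` as a stand-in for the real operation, the list standing in for the external storage module, and the returned strings are all assumptions, and the write-back step is folded into one method for brevity rather than modeled as a separate data write-back module.

```python
class ProcessingSketch:
    """Toy model of caching partial results and writing back the integrated result."""

    def __init__(self):
        self.cached_results = []    # partial results held by the processing module
        self.external_storage = []  # stand-in for the external storage module

    def execute(self, data, write_back):
        partial = sum(data)  # placeholder for the real operation (e.g. convolution)
        self.cached_results.append(partial)
        if not write_back:
            # Result non-write-back: keep the partial result cached.
            return "end signal from processing module"
        # Result write-back: integrate all partial results and write them out.
        final = tuple(self.cached_results)
        self.cached_results = []
        self.external_storage.append(final)
        return "end signal from data write-back module"


module = ProcessingSketch()
signal_c1 = module.execute([1, 2], write_back=False)  # data C1, result cached
storage_after_c1 = list(module.external_storage)      # still empty
signal_c2 = module.execute([3, 4], write_back=True)   # data C2, (C1, C2) written
```

After the second call, the stand-in storage holds the single integrated result `(3, 7)`, corresponding to the final processing result (C1, C2) in the text.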
In an exemplary embodiment, the data to be processed includes an object to be processed and an operation parameter, wherein the object to be processed includes, but is not limited to, an image, audio or text; the operation parameters include, but are not limited to, convolution kernels, pooling parameters, or activation functions. In order to further improve the efficiency of data processing, the target object to be processed may be divided into at least two parts and the target operation parameter may be divided into at least two parts; the object to be processed is one part of the target object to be processed, and the operation parameter is one part of the target operation parameter.
Correspondingly, the control instruction includes a result write-back control instruction and a result non-write-back control instruction, the result non-write-back control instruction is used for instructing the processing module 13 not to send the processing result to the data write-back module 14 after obtaining the processing result, but to cache the processing result, and the processing result corresponding to the result non-write-back control instruction is not the final processing result finally written into the external storage module by the data write-back module 14, but is a part of the final processing result; the result write-back control instruction is configured to instruct the processing module 13 to send all processing results related to the target object to be processed and the target operating parameter to the data write-back module 14, the processing module 13 integrates all processing results related to the target object to be processed and the target operating parameter according to the result write-back control instruction and then sends the integrated processing results to the data write-back module 14, and the data write-back module 14 writes the integrated processing results into an external storage module.
In this embodiment, the target object to be processed and the target operation parameter are each divided into at least two parts, and each part is indicated by a control instruction, so that the data loading module 12 only needs to load one part of the target object to be processed and of the target operation parameter when loading the object to be processed and the operation parameter corresponding to a control instruction. This is beneficial to improving the loading efficiency and reduces the time for which the processing module 13 waits for the data loading module 12 to load the target data to be processed, so that the processing module 13 can process the loaded data more quickly, which is beneficial to improving the processing efficiency.
It should be noted that, in the embodiments shown in fig. 4 and fig. 5, the control module 11 includes a second instruction slot 111 corresponding to the data loading module 12, a third instruction slot 112 corresponding to the processing module 13, and a fourth instruction slot 113 corresponding to the data write-back module 14; the second instruction slot 111, the third instruction slot 112, and the fourth instruction slot 113 are used for caching the control instruction. When the control instruction is a result non-write-back control instruction, it is not necessary to cache the result non-write-back control instruction in the fourth instruction slot 113, nor to send it to the data write-back module 14; that is, the data write-back module 14 does not need to execute the result non-write-back control instruction.
In an exemplary embodiment, the data processing apparatus provided in the embodiment of the present application may be applied to a convolutional neural network to process a target object (the target object includes, but is not limited to, an image, audio, video, or text, etc.), and may perform a convolutional operation of a convolutional layer, a pooling operation of a pooling layer, or an activation operation of an activation layer, so as to accelerate an operation process of a deep neural network in a hardware manner, reduce an operation time of the deep neural network, and improve operation efficiency.
The following description takes as an example the application of the convolutional neural network to the field of image processing, where the data processing apparatus is used to perform convolution operations of convolutional layers in the convolutional neural network: the control module 11 receives a convolution operation control instruction and distributes it to the data loading module 12, the processing module 13 and the data write-back module 14. Referring to fig. 10, the control module 11 divides the distribution process of the control instruction into 4 stages: a decoding stage, a loading stage, an execution stage, and a storage stage. The decoding stage corresponds to the first instruction slot 114, the loading stage corresponds to the second instruction slot 111, the execution stage corresponds to the third instruction slot 112, and the storage stage corresponds to the fourth instruction slot 113. When the control instruction is a result non-write-back control instruction, it does not need to be cached in the fourth instruction slot 113 or sent to the data write-back module 14; that is, the data write-back module 14 does not need to execute the result non-write-back control instruction. When the control instruction is a result write-back control instruction, it needs to be cached in the fourth instruction slot 113, sent to the data write-back module 14, and executed by the data write-back module 14.
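The mapping from instruction type to distribution stages described above can be sketched in Python as follows. The stage names follow the text, while the function name `stages_for` and the string labels for the two instruction types are hypothetical and used only for illustration.

```python
# Stage -> instruction slot, as described for fig. 10.
STAGE_SLOTS = {
    "decode": "first instruction slot 114",
    "load": "second instruction slot 111",
    "execute": "third instruction slot 112",
    "store": "fourth instruction slot 113",
}


def stages_for(instruction_kind):
    """Return the stages a control instruction passes through."""
    if instruction_kind == "result_write_back":
        return ["decode", "load", "execute", "store"]
    if instruction_kind == "result_non_write_back":
        # Never cached in the fourth slot nor sent to the write-back module.
        return ["decode", "load", "execute"]
    raise ValueError(f"unknown instruction kind: {instruction_kind}")


write_back_path = [STAGE_SLOTS[s] for s in stages_for("result_write_back")]
non_write_back_path = [STAGE_SLOTS[s] for s in stages_for("result_non_write_back")]
```

The only difference between the two paths in this sketch is the storage stage, which a result non-write-back control instruction skips.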
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement the method without creative effort.
Correspondingly, please refer to fig. 11, an embodiment of the present application further provides a data processing method, which is applied to a data processing apparatus, where the data processing apparatus includes a data loading module and a processing module; the method comprises the following steps:
in step S101, in response to a control instruction, data is loaded by the data loading module for data processing by the processing module.
In step S102, in response to the control instruction, performing data processing by the processing module; the data loading module and the processing module execute different control instructions at the same time.
In an embodiment, the method further comprises:
responding to the control instruction, and writing a processing result of the data to be processed into an external storage module; the data loading module, the processing module and the data writing back module execute different control instructions at the same time.
In an embodiment, the method further comprises:
responding to the fact that the data loading module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the data loading module; responding to the fact that the processing module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the processing module; wherein i is an integer.
The step S101 includes: and responding to the ith control instruction, and loading the data to be processed corresponding to the ith control instruction.
The step S102 includes: and responding to the ith control instruction, and processing the data to be processed corresponding to the ith control instruction to obtain a processing result.
In an embodiment, the apparatus further comprises a data write back module.
The method further comprises the following steps:
and responding to the fact that the data write-back module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the data write-back module.
And responding to the ith control instruction, and writing a processing result corresponding to the ith control instruction into an external storage module through the data write-back module.
In one embodiment, the method further comprises:
when the processing module responds to the ith control instruction to process the data to be processed corresponding to the ith control instruction, the data loading module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the processing module.
In one embodiment, the method further comprises:
when the data write-back module responds to the ith control instruction and writes the processing result corresponding to the ith control instruction into an external storage module, the processing module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the data write-back module.
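The overlap between modules described in the preceding embodiments can be illustrated with a small timing sketch in Python. It assumes a simplified model in which each module handles instructions in order, a stage of an instruction starts as soon as both the module is free and the previous stage of the same instruction has finished, and slot capacity is ignored; the function name `schedule` and the durations are hypothetical.

```python
def schedule(durations):
    """Compute (start, end) times per stage for each instruction.

    durations: list of (load, process, write_back) times, one tuple per
    control instruction, in issue order.
    """
    module_free_at = [0, 0, 0]  # loading, processing, write-back modules
    timeline = []
    for stage_times in durations:
        ready = 0  # time at which the previous stage of this instruction ended
        entry = []
        for module, duration in enumerate(stage_times):
            start = max(ready, module_free_at[module])
            end = start + duration
            module_free_at[module] = end
            ready = end
            entry.append((start, end))
        timeline.append(entry)
    return timeline


timeline = schedule([(2, 3, 1), (2, 3, 1)])
# instruction i:   load (0, 2), process (2, 5), write back (5, 6)
# instruction i+1: load (2, 4), process (5, 8), write back (8, 9)
```

In this sketch the data loading module starts the (i + 1) th instruction at time 2, while the processing module is still executing the ith instruction, so the two modules execute different control instructions at the same time.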
In one embodiment, the method further comprises:
and caching the control instruction through at least one instruction slot, and controlling the execution process of the control instruction according to the state of the control instruction recorded by the at least one instruction slot.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction state signal, wherein the control instruction state signal is used for indicating whether the control instruction cached in the corresponding instruction slot is valid or not.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction completion signal; the at least one control instruction completion signal is used to indicate whether a module corresponding to the at least one instruction slot completes an operation of the control instruction corresponding to the at least one instruction slot, or the at least one control instruction completion signal is used to indicate whether the at least one instruction slot completes an operation of the corresponding control instruction.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates the corresponding module to complete the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the at least one instruction slot completes the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
In one embodiment, the method further comprises:
caching the control instruction through at least one instruction slot; and the (i + 1) th control instruction sent to the data loading module, the processing module and the data writing-back module is acquired from the instruction slot.
In one embodiment, the method further comprises:
caching the control instruction corresponding to the data loading module through a second instruction slot, caching the control instruction corresponding to the processing module through a third instruction slot, and caching the control instruction corresponding to the data write-back module through a fourth instruction slot.
In one embodiment, the method further comprises:
in response to the ith control instruction in the second instruction slot being invalid, caching the (i + 1) th control instruction into the second instruction slot.
The step of sending the (i + 1) th control instruction to the data loading module in response to the data loading module finishing executing the ith control instruction comprises:
and sending the (i + 1) th control instruction in the second instruction slot to the data loading module in response to the completion of the execution of the ith control instruction by the data loading module.
In one embodiment, the method further comprises:
in response to the ith control instruction in the third instruction slot being invalid, caching the (i + 1) th control instruction in the second instruction slot into the third instruction slot.
The step of sending the (i + 1) th control instruction to the processing module in response to the processing module finishing executing the ith control instruction comprises:
and sending the (i + 1) th control instruction in the third instruction slot to the processing module in response to the fact that the processing module finishes executing the ith control instruction.
In one embodiment, the method further comprises:
in response to the invalidation of the ith control instruction in the fourth instruction slot, caching the (i + 1) th control instruction in the third instruction slot into the fourth instruction slot.
The step of sending the (i + 1) th control instruction to the data write-back module in response to the data write-back module finishing executing the ith control instruction comprises:
and in response to the completion of the execution of the ith control instruction by the data write-back module, sending the (i + 1) th control instruction in the fourth instruction slot to the data write-back module.
In one embodiment, the method further comprises:
decoding the ith control instruction; in response to invalidation of the decoded ith control instruction in the first instruction slot, caching the decoded (i + 1) th control instruction into the first instruction slot; and
in response to invalidation of the decoded ith control instruction in the second instruction slot, caching the decoded (i + 1) th control instruction in the first instruction slot into the second instruction slot.
In one embodiment, the control instructions include a result write-back control instruction and a result non-write-back control instruction.
The data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts.
The result non-write-back control instruction is used for indicating the processing module to cache the processing result;
the result write-back control instruction is used for instructing the processing module to send all processing results related to the target data to be processed to the data write-back module.
In one embodiment, the step S102 includes:
processing corresponding data to be processed according to the result non-write-back control instruction to obtain a processing result and caching the processing result; and
and processing the corresponding data to be processed according to the result write-back control instruction to obtain a processing result, and integrating all processing results related to the target data to be processed and then sending the integrated processing result to the data write-back module.
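The difference between the two instruction types can be sketched as follows. The application does not specify how the partial results are integrated; the sketch assumes, purely for illustration, that each part is reduced by summation and that integration is also a sum. The class name ProcessingModule and the method execute are hypothetical.

```python
from typing import List, Optional


class ProcessingModule:
    """Toy processing module that caches partial results."""

    def __init__(self) -> None:
        self._cached: List[int] = []  # partial results of the target data

    def execute(self, part: List[int], write_back: bool) -> Optional[int]:
        # Stand-in for the real operation on one part of the target data.
        partial = sum(part)
        self._cached.append(partial)
        if not write_back:
            # Result non-write-back instruction: keep the result cached.
            return None
        # Result write-back instruction: integrate all cached results
        # (assumed here to be a sum) and hand them to the write-back module.
        integrated = sum(self._cached)
        self._cached.clear()
        return integrated


if __name__ == "__main__":
    pm = ProcessingModule()
    print(pm.execute([1, 2], write_back=False))
    print(pm.execute([3, 4], write_back=False))
    print(pm.execute([5], write_back=True))
```

Only the last call returns a value, because only the result write-back control instruction releases the integrated result.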
In one embodiment, the method further comprises:
if the control instruction is the result non-write-back control instruction, receiving an end signal sent by the processing module after the processing result is cached; and
if the control instruction is the result write-back control instruction, receiving an end signal sent by the data write-back module after the processing result has been written; wherein the end signal indicates that execution of the result non-write-back control instruction or the result write-back control instruction has been completed in the data processing apparatus.
In one embodiment, the method further comprises:
returning the end signal corresponding to each control instruction to an external control module in the order in which the control instructions were received, according to a first-in first-out principle.
In one embodiment, the method further comprises:
if it is determined, according to the receiving order of the control instructions and the first-in first-out principle, that the currently received end signal is not the one to be sent now, caching the currently received end signal.
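The first-in first-out return of end signals, with caching of signals that arrive early, can be sketched as below. End signals may arrive out of order because a result non-write-back instruction ends at the processing module while a result write-back instruction ends at the data write-back module. The class EndSignalReorderer and its method names are hypothetical and only illustrate the ordering rule.

```python
from collections import deque
from typing import Deque, List, Set


class EndSignalReorderer:
    """Releases end signals in the order the instructions were received."""

    def __init__(self) -> None:
        self._order: Deque[int] = deque()  # instruction ids, oldest first
        self._early: Set[int] = set()      # cached end signals

    def receive_instruction(self, instr_id: int) -> None:
        self._order.append(instr_id)

    def receive_end_signal(self, instr_id: int) -> List[int]:
        """Record an end signal and return the ids that can be sent now."""
        self._early.add(instr_id)
        released: List[int] = []
        # Release from the head of the queue while the head has finished.
        while self._order and self._order[0] in self._early:
            head = self._order.popleft()
            self._early.remove(head)
            released.append(head)
        return released


if __name__ == "__main__":
    r = EndSignalReorderer()
    for i in range(3):
        r.receive_instruction(i)
    print(r.receive_end_signal(1))  # cached, nothing released yet
    print(r.receive_end_signal(0))  # releases 0, then the cached 1
    print(r.receive_end_signal(2))
```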
In an embodiment, the data to be processed includes an object to be processed and an operation parameter.
In one embodiment, the object to be processed includes any one of: images, audio, or text; the operating parameter includes any one of: convolution kernels, pooling parameters, or activation functions.
In one embodiment, the data loading module includes an object loading unit and a parameter loading unit.
The step S101 includes:
in response to the ith control instruction, loading the object to be processed corresponding to the ith control instruction through the object loading unit; and
in response to the ith control instruction, loading the operation parameters corresponding to the ith control instruction through the parameter loading unit.
In one embodiment, the step S102 includes:
in response to the ith control instruction, writing the data to be processed corresponding to the ith control instruction into a systolic array, and performing an operation on the data to be processed through the systolic array to obtain a processing result.
In one embodiment, the step S102 includes:
in response to the ith control instruction, writing the object to be processed and the operation parameters corresponding to the ith control instruction into the systolic array respectively, and performing an operation on the object to be processed and the operation parameters through the systolic array to obtain a processing result.
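How a systolic array can combine an object to be processed with operation parameters may be illustrated by a cycle-by-cycle simulation of a matrix product. The application does not describe the internal dataflow of the array; the sketch below assumes an output-stationary arrangement in which the object matrix a streams in from the left, the parameter matrix b streams in from the top, and each processing element accumulates one output value. The function name systolic_matmul is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b on a simulated m x n output-stationary systolic array."""
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]    # accumulator in each element
    a_reg = [[0] * n for _ in range(m)]  # values travelling to the right
    b_reg = [[0] * n for _ in range(m)]  # values travelling downwards
    for t in range(m + n + k - 2):
        # Shift the a values one element to the right; row i is fed
        # with a delay of i cycles (skewed input).
        for i in range(m):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i
            a_reg[i][0] = a[i][s] if 0 <= s < k else 0
        # Shift the b values one element down; column j is fed
        # with a delay of j cycles.
        for j in range(n):
            for i in range(m - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j
            b_reg[0][j] = b[s][j] if 0 <= s < k else 0
        # Every element multiplies the pair it currently holds.
        for i in range(m):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because of the skewed feeding, element (i, j) sees a[i][s] and b[s][j] in the same cycle for every s, so after m + n + k - 2 cycles each accumulator holds one entry of the product.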
For a specific implementation manner of the method embodiment, reference may be made to the description of the apparatus embodiment, which is not described herein again.
Correspondingly, the embodiment of the application also provides an accelerator, which comprises the device in any one of the above items.
The accelerator may be applied to various neural networks, such as convolutional neural networks.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method and apparatus provided by the embodiments of the present invention are described in detail above, and the principle and the embodiments of the present invention are explained in detail herein by using specific examples, and the description of the embodiments is only used to help understanding the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (57)

1. A data processing device is characterized by comprising a control module, a data loading module and a processing module;
the data loading module is used for responding to a control instruction of the control module and loading data to be processed for processing by the processing module;
the processing module responds to the control instruction of the control module and processes the data to be processed;
the control module controls the data loading module and the processing module to execute different control instructions at the same time.
2. The apparatus of claim 1, further comprising a data write-back module;
the data write-back module responds to a control instruction of the control module and writes a processing result of the data to be processed into an external storage module;
the control module controls the data loading module, the processing module and the data write-back module to execute different control instructions at the same time.
3. The apparatus of claim 1,
the control module is specifically configured to: responding to the fact that the data loading module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the data loading module; responding to the fact that the processing module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the processing module; wherein i is an integer;
the data loading module is specifically configured to: responding to the ith control instruction, and loading data to be processed corresponding to the ith control instruction;
the processing module is specifically configured to: and responding to the ith control instruction, and processing the data to be processed corresponding to the ith control instruction to obtain a processing result.
4. The apparatus of claim 3, further comprising a data write back module;
the control module is further configured to: responding to the completion of the execution of the ith control instruction by the data write-back module, and sending the (i + 1) th control instruction to the data write-back module;
the data write back module is to: and responding to the ith control instruction, and writing a processing result corresponding to the ith control instruction into an external storage module.
5. The apparatus of claim 3,
when the processing module responds to the ith control instruction to process the data to be processed corresponding to the ith control instruction, the data loading module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the processing module.
6. The apparatus of claim 4,
when the data write-back module responds to the ith control instruction and writes the processing result corresponding to the ith control instruction into an external storage module, the processing module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the data write-back module.
7. The apparatus of claim 1, wherein the control module comprises at least one instruction slot for caching the control instructions; and the control module controls the execution process of the control instruction according to the state of the control instruction recorded by at least one instruction slot.
8. The apparatus of claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction state signal, wherein the control instruction state signal is used for indicating whether the control instruction cached in the corresponding instruction slot is valid or not.
9. The apparatus of claim 1, wherein the control module comprises at least one instruction slot for caching control instructions; the at least one instruction slot comprises at least one control instruction completion signal; the at least one control instruction completion signal is used to indicate whether a module corresponding to the at least one instruction slot completes an operation of the control instruction corresponding to the at least one instruction slot, or the at least one control instruction completion signal is used to indicate whether the at least one instruction slot completes an operation of the corresponding control instruction.
10. The apparatus of claim 1, wherein the control module comprises an instruction slot to cache the control instruction; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates the corresponding module to complete the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
11. The apparatus of claim 1, wherein the control module comprises an instruction slot to cache the control instruction; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the at least one instruction slot completes the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
12. The apparatus of claim 4, wherein the control module comprises an instruction slot to cache the control instruction;
the control module is specifically configured to: responding to the fact that the data loading module finishes executing the ith control instruction, and sending the (i + 1) th control instruction cached in the instruction slot to the data loading module; and responding to the fact that the processing module finishes executing the ith control instruction, and sending the (i + 1) th control instruction cached in the instruction slot to the processing module; and responding to the completion of the execution of the ith control instruction by the data write-back module, and sending the (i + 1) th control instruction cached in the instruction slot to the data write-back module.
13. The apparatus of claim 4, wherein the control module comprises a second instruction slot corresponding to the data load module, a third instruction slot corresponding to the processing module, and a fourth instruction slot corresponding to the data write back module;
the second instruction slot, the third instruction slot, and the fourth instruction slot are all configured to: after the control module finishes sending the ith control instruction at different moments, caching the corresponding (i + 1) th control instruction respectively;
the control module is specifically configured to: sending the (i + 1) th control instruction cached in the second instruction slot to the data loading module in response to the completion of the execution of the ith control instruction by the data loading module; and in response to the processing module finishing executing the ith control instruction, sending the (i + 1) th control instruction cached in the third instruction slot to the processing module; and responding to the completion of the execution of the ith control instruction by the data write-back module, and sending the (i + 1) th control instruction cached in the fourth instruction slot to the data write-back module.
14. The apparatus of claim 4, wherein the control module comprises a second instruction slot corresponding to the data loading module;
the second instruction slot is used for caching the control instruction sent to the data loading module;
the control module is further configured to: in response to the ith control instruction in the second instruction slot being invalid, caching the (i + 1) th control instruction into the second instruction slot; and sending the (i + 1) th control instruction in the second instruction slot to the data loading module in response to the completion of the execution of the ith control instruction by the data loading module.
15. The apparatus of claim 14, wherein the control module further comprises a third instruction slot corresponding to the processing module;
the third instruction slot is used for caching the control instruction sent to the processing module;
the control module is further configured to: in response to the ith control instruction in the third instruction slot being invalid, caching the (i + 1) th control instruction in the second instruction slot into the third instruction slot; and responding to the fact that the processing module finishes executing the ith control instruction, and sending the (i + 1) th control instruction in the third instruction slot to the processing module.
16. The apparatus of claim 15, wherein the control module further comprises a fourth instruction slot corresponding to the data write back module;
the fourth instruction slot is used for caching the control instruction sent to the data write-back module;
the control module is further configured to: in response to the ith control instruction in the fourth instruction slot being invalid, cache the (i+1)th control instruction in the third instruction slot into the fourth instruction slot; and, in response to the data write-back module finishing executing the ith control instruction, send the (i+1)th control instruction in the fourth instruction slot to the data write-back module.
17. The apparatus of claim 14, wherein the control module further comprises a first instruction slot;
the first instruction slot is used for caching the decoded control instruction;
the control module is further configured to: decoding the ith control instruction; in response to invalidation of the decoded ith control instruction in the first instruction slot, caching the decoded (i + 1) th control instruction into the first instruction slot; and in response to invalidation of the ith decoded control instruction in the second instruction slot, caching the (i + 1) th decoded control instruction in the first instruction slot into the second instruction slot.
18. The apparatus of claim 2, wherein the control instructions comprise a result write-back control instruction and a result non-write-back control instruction;
the data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts;
the result non-write-back control instruction is used for indicating the processing module to cache the processing result;
the result write-back control instruction is used for instructing the processing module to send all processing results related to the target data to be processed to the data write-back module.
19. The apparatus of claim 18,
the processing module is specifically configured to: processing corresponding data to be processed according to the result non-write-back control instruction to obtain a processing result and cache the processing result; and processing the corresponding data to be processed according to the result write-back control instruction to obtain a processing result, and integrating all processing results related to the target data to be processed and then sending the integrated processing result to the data write-back module.
20. The apparatus of claim 18,
if the control instruction is the result non-write-back control instruction, the control module is further configured to: receive an end signal sent by the processing module after the processing result is cached; and
if the control instruction is the result write-back control instruction, the control module is further configured to: receive an end signal sent by the data write-back module after the processing result has been written;
wherein the end signal indicates that execution of the result non-write-back control instruction or the result write-back control instruction has been completed in the data processing apparatus.
21. The apparatus of claim 20,
the control module is further configured to: return the end signal corresponding to each control instruction to an external control module in the order in which the control instructions were received, according to a first-in first-out principle.
22. The apparatus of claim 21,
the control module is further configured to: if it is determined, according to the receiving order of the control instructions and the first-in first-out principle, that the currently received end signal is not the one to be sent now, cache the currently received end signal.
23. The apparatus of claim 1, wherein the data to be processed comprises objects to be processed and operational parameters.
24. The apparatus according to claim 23, wherein the object to be processed comprises any one of: images, audio, or text;
the operating parameter includes any one of: convolution kernels, pooling parameters, or activation functions.
25. The apparatus of claim 23, wherein the data loading module comprises an object loading unit and a parameter loading unit;
the control module is specifically configured to: in response to the completion of the execution of the ith control instruction by the object loading unit, sending the (i + 1) th control instruction to the object loading unit; and in response to the completion of the execution of the ith control instruction by the parameter loading unit, sending the (i + 1) th control instruction to the parameter loading unit;
the object loading unit is used for: responding to the ith control instruction, and loading an object to be processed corresponding to the ith control instruction;
the parameter loading unit is used for: and responding to the ith control instruction, and loading the operating parameters corresponding to the ith control instruction.
26. The apparatus of claim 1, wherein the processing module comprises a systolic array;
the processing module is specifically configured to: in response to the ith control instruction, write the data to be processed corresponding to the ith control instruction into the systolic array, and perform an operation on the data to be processed through the systolic array to obtain the processing result.
27. The apparatus of claim 23, wherein the processing module comprises a systolic array;
the processing module is specifically configured to: in response to the ith control instruction, write the object to be processed and the operation parameters corresponding to the ith control instruction into the systolic array respectively, and perform an operation on the object to be processed and the operation parameters through the systolic array to obtain the processing result.
28. The apparatus of claim 1,
the control module comprises an instruction slot for caching the control instruction; the instruction slot includes a set of instruction cache flags and control status signals;
wherein the instruction cache flag is used to indicate whether the instructions cached in the instruction slot are valid; the set of control state signals is used for representing the working state of the corresponding module or representing the working state of other instruction slots related to the instruction slot.
29. The data processing method is applied to a data processing device, wherein the data processing device comprises a data loading module and a processing module; the method comprises the following steps:
in response to a control instruction, loading data through the data loading module for the processing module to process; and
in response to the control instruction, performing data processing through the processing module; wherein the data loading module and the processing module execute different control instructions at the same time.
30. The method of claim 29, further comprising:
in response to the control instruction, writing a processing result of the data to be processed into an external storage module; wherein the data loading module, the processing module and the data write-back module execute different control instructions at the same time.
31. The method of claim 29, further comprising:
responding to the fact that the data loading module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the data loading module; responding to the fact that the processing module finishes executing the ith control instruction, and sending the (i + 1) th control instruction to the processing module; wherein i is an integer;
the responding to the control instruction, loading data through the data loading module for the processing module to perform data processing, including:
responding to the ith control instruction, and loading data to be processed corresponding to the ith control instruction;
the responding to the control instruction, the data processing is carried out by the processing module, and the processing comprises the following steps:
and responding to the ith control instruction, and processing the data to be processed corresponding to the ith control instruction to obtain a processing result.
32. The method of claim 31, wherein the device further comprises a data write back module;
the method further comprises the following steps:
responding to the completion of the execution of the ith control instruction by the data write-back module, and sending the (i + 1) th control instruction to the data write-back module;
and responding to the ith control instruction, and writing a processing result corresponding to the ith control instruction into an external storage module through the data write-back module.
33. The method of claim 31, further comprising:
when the processing module responds to the ith control instruction to process the data to be processed corresponding to the ith control instruction, the data loading module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the processing module.
34. The method of claim 32, further comprising:
when the data write-back module responds to the ith control instruction and writes the processing result corresponding to the ith control instruction into an external storage module, the processing module receives the (i + 1) th control instruction without waiting for the completion of the execution of the ith control instruction by the data write-back module.
35. The method of claim 29, further comprising:
and caching the control instruction through at least one instruction slot, and controlling the execution process of the control instruction according to the state of the control instruction recorded in the at least one instruction slot.
36. The method of claim 29, further comprising:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction state signal, wherein the control instruction state signal is used for indicating whether the control instruction cached in the corresponding instruction slot is valid or not.
37. The method of claim 29, further comprising:
caching the control instruction through at least one instruction slot; the at least one instruction slot comprises at least one control instruction completion signal; the at least one control instruction completion signal is used to indicate whether a module corresponding to the at least one instruction slot completes an operation of the control instruction corresponding to the at least one instruction slot, or the at least one control instruction completion signal is used to indicate whether the at least one instruction slot completes an operation of the corresponding control instruction.
38. The method of claim 29, further comprising:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates the corresponding module to complete the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
39. The method of claim 29, further comprising:
caching the control instruction through at least one instruction slot; the instruction slot comprises a control instruction state signal and a control instruction completion signal; when the control instruction completion signal indicates that the at least one instruction slot completes the operation of the corresponding control instruction, the control instruction state signal indicates that the control instruction cached in the corresponding instruction slot is invalid.
40. The method of claim 31, further comprising:
caching the control instruction through at least one instruction slot; and the (i + 1) th control instruction sent to the data loading module, the processing module and the data writing-back module is acquired from the instruction slot.
41. The method of claim 31, further comprising:
caching the control instruction corresponding to the data loading module through a second instruction slot, caching the control instruction corresponding to the processing module through a third instruction slot, and caching the control instruction corresponding to the data write-back module through a fourth instruction slot.
42. The method of claim 32, further comprising:
in response to the ith control instruction in the second instruction slot being invalid, caching the (i + 1) th control instruction into the second instruction slot;
the step of sending the (i + 1) th control instruction to the data loading module in response to the data loading module finishing executing the ith control instruction comprises:
and sending the (i + 1) th control instruction in the second instruction slot to the data loading module in response to the completion of the execution of the ith control instruction by the data loading module.
43. The method of claim 42, further comprising:
in response to the ith control instruction in the third instruction slot being invalid, caching the (i + 1) th control instruction in the second instruction slot into the third instruction slot;
the step of sending the (i + 1) th control instruction to the processing module in response to the processing module finishing executing the ith control instruction comprises:
and sending the (i + 1) th control instruction in the third instruction slot to the processing module in response to the fact that the processing module finishes executing the ith control instruction.
44. The method of claim 43, further comprising:
in response to the ith control instruction in the fourth instruction slot being invalid, caching the (i + 1) th control instruction in the third instruction slot into the fourth instruction slot;
the step of sending the (i+1)th control instruction to the data write-back module in response to the data write-back module finishing executing the ith control instruction comprises:
in response to the data write-back module finishing executing the ith control instruction, sending the (i+1)th control instruction in the fourth instruction slot to the data write-back module.
45. The method of claim 42, further comprising:
decoding the ith control instruction; in response to invalidation of the decoded ith control instruction in the first instruction slot, caching the decoded (i+1)th control instruction into the first instruction slot; and
in response to invalidation of the decoded ith control instruction in the second instruction slot, caching the decoded (i+1)th control instruction in the first instruction slot into the second instruction slot.
46. The method of claim 30, wherein the control instructions comprise a result write-back control instruction and a result non-write-back control instruction;
the data to be processed is one part of target data to be processed; the target data to be processed is divided into at least two parts;
the result non-write-back control instruction is used for indicating the processing module to cache the processing result;
the result write-back control instruction is used for instructing the processing module to send all processing results related to the target data to be processed to the data write-back module.
47. The method of claim 46, wherein said processing data by said processing module in response to said control instructions comprises:
processing the corresponding data to be processed according to the result non-write-back control instruction to obtain a processing result, and caching the processing result; and
processing the corresponding data to be processed according to the result write-back control instruction to obtain a processing result, integrating all processing results related to the target data to be processed, and sending the integrated processing result to the data write-back module.
48. The method of claim 47, further comprising:
if the control instruction is the result non-write-back control instruction, receiving an end signal sent by the processing module after the processing result is cached; and
if the control instruction is the result write-back control instruction, receiving an end signal sent by the data write-back module after the processing result has been written; wherein the end signal indicates that execution of the result non-write-back control instruction or the result write-back control instruction has been completed in the data processing apparatus.
49. The method of claim 48, further comprising:
returning the end signal corresponding to each control instruction to an external control module in the order in which the control instructions were received, according to a first-in first-out principle.
50. The method of claim 49, further comprising:
if it is determined, according to the receiving order of the control instructions and the first-in first-out principle, that the currently received end signal is not the one to be sent now, caching the currently received end signal.
51. The method of claim 29, wherein the data to be processed comprises objects to be processed and operational parameters.
52. The method according to claim 51, wherein the object to be processed comprises any one of: images, audio, or text;
the operating parameter includes any one of: convolution kernels, pooling parameters, or activation functions.
53. The method of claim 51, wherein the data loading module comprises an object loading unit and a parameter loading unit;
the responding to the control instruction, loading data through the data loading module for the processing module to perform data processing, including:
in response to the ith control instruction, loading the object to be processed corresponding to the ith control instruction through the object loading unit; and
in response to the ith control instruction, loading the operation parameters corresponding to the ith control instruction through the parameter loading unit.
54. The method of claim 29, wherein said processing data by said processing module in response to said control instructions comprises:
in response to the ith control instruction, writing the data to be processed corresponding to the ith control instruction into a systolic array, and operating on the data to be processed through the systolic array to obtain a processing result.
55. The method of claim 51, wherein said processing data by said processing module in response to said control instructions comprises:
in response to the ith control instruction, writing the object to be processed and the operating parameters corresponding to the ith control instruction into the systolic array respectively, and operating on the object to be processed and the operating parameters through the systolic array to obtain a processing result.
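As a non-limiting illustration of the array recited in claims 54 and 55, the following Python sketch simulates it as an output-stationary systolic array performing a matrix multiplication. The choice of matrix multiplication as the operation, the output-stationary dataflow, and the function name `systolic_matmul` are assumptions; the application does not fix any of them. The object to be processed enters from the left edge and the operating parameters from the top edge, each skewed by one cycle per row or column.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle simulation of an n x m grid of multiply-accumulate cells."""
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # each cell keeps its own partial sum
    a_reg = [[0] * m for _ in range(n)]  # values travelling left to right
    b_reg = [[0] * m for _ in range(n)]  # values travelling top to bottom

    for t in range(n + m + k_dim - 2):
        # Visit cells from bottom-right to top-left so that each cell still
        # sees the value its left/upper neighbour held in the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i  # row i is fed with a delay of i cycles
                    a_in = a[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # column j is fed with a delay of j cycles
                    b_in = b[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because of the input skew, the cell at row i and column j receives the matching pair `a[i][k]` and `b[k][j]` in cycle `i + j + k`, so every cell finishes its sum after `n + m + k_dim - 2` cycles without any global data broadcast.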
56. The apparatus of claim 29,
the control module comprises an instruction slot for caching the control instruction; the instruction slot includes an instruction cache flag and a set of control state signals;
wherein the instruction cache flag indicates whether the instruction cached in the instruction slot is valid; and the set of control state signals represents the working state of the corresponding module or the working state of other instruction slots related to the instruction slot.
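By way of example only, the instruction slot of claim 56 may be modelled in Python as a small record holding the cached instruction, the instruction cache flag, and a set of control state signals. The signal names `load_done` and `prev_slot_done`, and the methods `store`, `ready_to_process`, and `release`, are hypothetical; the application states only what the flag and the signals represent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class InstructionSlot:
    instruction: Optional[Any] = None
    valid: bool = False  # instruction cache flag
    control_state: Dict[str, bool] = field(default_factory=dict)

    def store(self, instruction):
        # Caching an instruction marks the slot valid and resets its signals.
        self.instruction = instruction
        self.valid = True
        self.control_state = {
            "load_done": False,       # hypothetical: state of the loading module
            "prev_slot_done": False,  # hypothetical: state of a related slot
        }

    def ready_to_process(self):
        # The slot may proceed only if it is valid and every signal is set.
        return self.valid and all(self.control_state.values())

    def release(self):
        self.instruction = None
        self.valid = False
        self.control_state = {}


slot = InstructionSlot()
slot.store("conv_layer_0")
ready_before = slot.ready_to_process()
slot.control_state["load_done"] = True
slot.control_state["prev_slot_done"] = True
ready_after = slot.ready_to_process()
slot.release()
```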
57. An accelerator comprising a data processing apparatus as claimed in any one of claims 1 to 28.
CN202080004332.0A 2020-03-11 2020-03-11 Data processing apparatus, data processing method, and accelerator Pending CN112602094A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/078876 WO2021179224A1 (en) 2020-03-11 2020-03-11 Data processing device, data processing method and accelerator

Publications (1)

Publication Number Publication Date
CN112602094A (en) 2021-04-02

Family

ID=75208096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004332.0A Pending CN112602094A (en) 2020-03-11 2020-03-11 Data processing apparatus, data processing method, and accelerator

Country Status (2)

Country Link
CN (1) CN112602094A (en)
WO (1) WO2021179224A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6351802B1 (en) * 1999-12-03 2002-02-26 Intel Corporation Method and apparatus for constructing a pre-scheduled instruction cache
JP6467743B2 (en) * 2013-08-19 2019-02-13 Shanghai Xinhao Microelectronics Co Ltd High performance processor system based on general purpose unit and its method
BR112019022916A2 (en) * 2017-05-17 2020-05-26 Google Llc LOW LATENCY MATRIX MULTIPLICATION UNIT
CN111095294A (en) * 2017-07-05 2020-05-01 Deep Vision Inc Deep vision processor
CN108475347A (en) * 2017-11-30 2018-08-31 SZ DJI Technology Co Ltd Neural network processing method, apparatus, accelerator, system, and movable device
US10963379B2 (en) * 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths

Also Published As

Publication number Publication date
WO2021179224A1 (en) 2021-09-16

Similar Documents

Publication Publication Date Title
EP3451162B1 (en) Device and method for use in executing matrix multiplication operations
CN111860812B (en) Apparatus and method for performing convolutional neural network training
CN109284825B (en) Apparatus and method for performing LSTM operations
CN107766079B (en) Processor and method for executing instructions on processor
US11144330B2 (en) Algorithm program loading method and related apparatus
CN112633505B (en) RISC-V based artificial intelligence reasoning method and system
WO2023082575A1 (en) Graph execution pipeline parallelism method and apparatus for neural network model computation
CN110991619A (en) Neural network processor, chip and electronic equipment
CN111651202A (en) Device for executing vector logic operation
US20180349058A1 (en) Buffer-based update of state data
CN111091181B (en) Convolution processing unit, neural network processor, electronic device and convolution operation method
Das et al. Enabling on-device smartphone GPU based training: Lessons learned
CN104011682B (en) Method and computer system for predictive processing of application event responses
US20140331021A1 (en) Memory control apparatus and method
US20210304010A1 (en) Neural network training under memory restraint
US11436486B2 (en) Neural network internal data fast access memory buffer
CN111667819B (en) Voice recognition method, system, storage medium and electronic equipment based on CRNN
CN112602094A (en) Data processing apparatus, data processing method, and accelerator
US7594080B2 (en) Temporary storage of memory line while waiting for cache eviction
CN111860772A (en) Device and method for executing artificial neural network pooling operation
CN111506522A (en) Data processing apparatus and method
US20220067872A1 (en) Graphics processing unit including delegator and operating method thereof
CN112214443B (en) Secondary unloading device and method arranged in graphic processor
US11947487B2 (en) Enabling accelerated processing units to perform dataflow execution
US11468304B1 (en) Synchronizing operations in hardware accelerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination