CN113918507A - Method and device for adapting deep learning framework to AI acceleration chip - Google Patents

Method and device for adapting deep learning framework to AI acceleration chip

Info

Publication number
CN113918507A
Authority
CN
China
Prior art keywords
chip
memory
deep learning
type
acceleration
Prior art date
Legal status
Granted
Application number
CN202111497148.7A
Other languages
Chinese (zh)
Other versions
CN113918507B (en)
Inventor
王拓
杨非
黄振华
鲍虎军
华炜
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202111497148.7A
Publication of CN113918507A
Application granted
Publication of CN113918507B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781On-chip cache; Off-chip memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7817Specially adapted for signal processing, e.g. Harvard architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Signal Processing (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method and a device for adapting a deep learning framework to an AI acceleration chip. The method is divided into three stages: chip type definition, chip type registration, and chip memory support. Chip type definition writes the chip types to be supported into a proto file as enumeration values, so that the framework can correctly identify the chip type. Chip type registration registers the infrastructure required by the chip into hash tables, so that the framework can conveniently look up the corresponding components by chip type when needed. Chip memory support moves the operations on the chip's memory into the framework, so that the framework can manage the chip's memory space in a unified way. The invention simplifies the work of adapting a deep learning framework to an AI acceleration chip.

Description

Method and device for adapting deep learning framework to AI acceleration chip
Technical Field
The invention belongs to the field of deep learning basic software, and relates to a method and a device for adapting a deep learning framework to an AI acceleration chip.
Background
A deep learning framework is the operating system of the artificial intelligence field. Through five core components, namely tensors, tensor operations (Ops), computation graphs, automatic differentiation tools, and hardware extension packages (such as cuBLAS and cuDNN), it helps users conveniently implement various deep learning algorithms and fully release the computing resources of the underlying hardware.
AI acceleration chips, also known as AI accelerators or computing cards, are hardware dedicated to handling the massive computing tasks in artificial intelligence applications. Compared with traditional chips, an AI chip has a larger scale, a more complex structure and stronger computing capability, and provides powerful support for computing power.
The variety of AI acceleration chips is growing by the day, and the field is flourishing with many competing designs. Supporting more types of AI accelerators at the bottom layer of a deep learning framework improves the framework's compatibility, allows the most suitable hardware to be chosen for each application scenario, and fully releases the hardware's computing power. However, because each AI acceleration chip has a different hardware architecture and a different way of operating, supporting each new device in a deep learning framework has so far required starting the whole process from scratch and repeating a great deal of work.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a method and a device for adapting a deep learning framework to an AI acceleration chip. The method simplifies the adaptation work through three main steps: chip type definition, chip type registration, and chip memory support. The specific technical scheme is as follows:
A method for adapting a deep learning framework to an AI acceleration chip mainly comprises three stages:
a chip type definition stage, which defines the AI acceleration chip types to be supported in a custom or newly written file based on a data transmission format such as Protobuf. The chip types are defined as an enumeration, which is used to distinguish different kinds of chips inside the deep learning framework so that the framework can process each chip according to its enumeration value. Infrastructure inside the framework, such as the device context manager, the device thread, the stream index generator and the computing core Kernel, is strongly bound to the chip type: different chips operate differently, so each piece of infrastructure has a different implementation per chip. Taking the computing Kernel as an example, an OpenBLAS library may be used on the CPU, a cuBLAS library may be used on the GPU, and the cnrt and cnnl libraries are used on the Cambricon MLU; after the chip type registration stage, the deep learning framework can automatically select the corresponding Kernel implementation according to the chip type (a minimal sketch of such an enumeration is given after the three stages below);
a chip type registration stage, in which the chip types and the device context managers, device threads, stream index generators and computing core Kernels related to the AI acceleration chip are registered in their respective hash tables; a registration mechanism based on the singleton pattern maps each chip type one-to-one to its context manager, device thread, stream index generator and computing Kernel, so that the framework can conveniently find the corresponding content by chip type when needed, the corresponding content being the context manager, device thread, stream index generator and computing Kernel of that chip type;
a chip memory support stage, in which the operations related to the AI acceleration chip's memory are placed into the deep learning framework, so that the framework can manage the chip's memory space in a unified way.
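As an illustration of the chip type definition stage referred to above, the following minimal sketch (C++; all names are assumptions for illustration, not the framework's actual identifiers) shows the kind of enumeration involved. In the actual framework the enumeration lives in a .proto file and the corresponding code is generated from it:

    // Sketch of the chip-type enumeration. In practice the entry is added to the
    // enum inside a .proto file, roughly:
    //   enum DeviceType { kInvalidDevice = 0; kCPU = 1; kCUDA = 2; kMLU = 3; kNewAIChip = 4; }
    // and Protobuf generates the equivalent of the C++ type below.
    enum class DeviceType : int {
      kInvalidDevice = 0,
      kCPU           = 1,  // host processor
      kCUDA          = 2,  // NVIDIA GPU
      kMLU           = 3,  // Cambricon MLU
      kNewAIChip     = 4   // the AI acceleration chip being adapted (assumed name)
    };

Adding the single enumeration value is all the definition stage requires; every later lookup (context manager, device thread, stream index generator, Kernel) is keyed by this value.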
Preferably, in the chip type definition stage, the type of the AI acceleration chip is added to a data structure related to the chip type definition.
Preferably, the key of each hash table is the chip type to be registered, and the value is the processing function corresponding to that chip, which completes operations such as creating various handles, computing on the chip, and managing memory;
the device context manager is a method capable of generating chip operation handles; during registration, the device context manager is registered in a hash table and corresponds one-to-one to the chip type, specifically: a context handle is created for generating the various handles used in the chip's computing process, and then a device context manager is created which, by calling the context handle, provides various handles to external callers and performs device synchronization, the handles including stream handles and chip operation handles;
the stream index generator is used to generate the corresponding stream index numbers for different operations; its registration likewise creates a hash table whose key is the chip type and whose value is the corresponding stream index generator.
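A minimal sketch of this registration mechanism follows (C++; the class names are illustrative assumptions, not the framework's actual API). The same shape of table, a process-wide singleton keyed by the chip type, is reused for the device thread, the stream index generator and the computing Kernel:

    #include <functional>
    #include <memory>
    #include <unordered_map>

    enum class DeviceType { kInvalidDevice, kCPU, kCUDA, kMLU, kNewAIChip };  // as in the chip type definition sketch

    // Illustrative interface of a device context manager: it creates the handles
    // used during on-chip computation and performs device synchronization.
    class DeviceCtxManager {
     public:
      virtual ~DeviceCtxManager() = default;
      virtual void* CreateStreamHandle() = 0;   // e.g. wraps a cudaStream_t on a GPU
      virtual void* CreateComputeHandle() = 0;  // e.g. wraps a cublasHandle_t on a GPU
      virtual void SyncDevice() = 0;
    };

    // Singleton registry: one hash table keyed by chip type, holding a factory
    // that builds the manager for that chip.
    class DeviceCtxRegistry {
     public:
      using Factory = std::function<std::unique_ptr<DeviceCtxManager>()>;
      static DeviceCtxRegistry& Get() {
        static DeviceCtxRegistry instance;  // the singleton pattern mentioned above
        return instance;
      }
      void Register(DeviceType dev, Factory f) { table_[dev] = std::move(f); }
      std::unique_ptr<DeviceCtxManager> New(DeviceType dev) const {
        return table_.at(dev)();  // throws if the chip type was never registered
      }
     private:
      std::unordered_map<DeviceType, Factory> table_;
    };

A concrete chip then needs only one Register call per table, for example DeviceCtxRegistry::Get().Register(DeviceType::kNewAIChip, []{ return std::make_unique<NewChipCtxManager>(); }) executed once at static-initialization time, NewChipCtxManager being a hypothetical implementation for the new chip; the stream index generator and the device thread are registered in exactly the same way, each in its own table with the chip type as the key.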
Preferably, device thread registration creates a device thread related to the chip type, which is used to create the thread that starts the on-chip computing process; after the device thread is created, it is registered in the device-thread hash table, completing the one-to-one correspondence between threads and chip types.
Preferably, computing Kernel registration first implements the computation logic inside the Kernel, then uses a two-tuple formed by the chip type and the data type as the key, and registers the Kernel as the value in the Kernel-related hash table.
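A sketch of this two-level key follows (C++; the names are illustrative assumptions). The point is that one operator may have a separate Kernel for every (chip type, data type) combination, and the framework finds the right one by looking up the pair:

    #include <functional>
    #include <map>
    #include <utility>

    enum class DeviceType { kInvalidDevice, kCPU, kCUDA, kMLU, kNewAIChip };  // as in the earlier sketch
    enum class DataType { kFloat, kFloat16, kInt8 };                          // illustrative data types

    // A computing Kernel is sketched here as a callable holding the op's computation logic.
    using Kernel = std::function<void()>;

    // Registry keyed by the (chip type, data type) two-tuple described above.
    class KernelRegistry {
     public:
      using Key = std::pair<DeviceType, DataType>;
      static KernelRegistry& Get() {
        static KernelRegistry instance;
        return instance;
      }
      void Register(DeviceType dev, DataType dtype, Kernel k) {
        table_[{dev, dtype}] = std::move(k);
      }
      const Kernel& Lookup(DeviceType dev, DataType dtype) const {
        return table_.at({dev, dtype});  // throws if no Kernel was registered for this pair
      }
     private:
      std::map<Key, Kernel> table_;  // std::map used so the pair key needs no custom hash
    };

For example, a float matmul implemented with the new chip's BLAS-like library would be registered once, KernelRegistry::Get().Register(DeviceType::kNewAIChip, DataType::kFloat, NewChipMatmulKernel), and found again whenever the graph places a matmul of that data type on that chip (NewChipMatmulKernel being a hypothetical function).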
Preferably, in the chip memory support stage, the memory types of the AI acceleration chip to be supported are defined based on different types of data transmission formats, and the on-chip memory is allocated, released, partitioned into segments, and copied to and from.
Preferably, the memory type of the AI acceleration chip that needs to be supported is defined in order to distinguish the memory types of different chips inside the framework, and the memory type of the AI acceleration chip is added to the data structure related to memory types.
Preferably, the unified management of the chip's storage space includes memory allocation, where the memory allocation includes: uniformly allocating a first storage space in the chip's storage space, and dividing the first storage space into segments of different sizes according to the space required by the different modules, such as convolution and pooling; the division is done either by adding an offset to the starting address or by calling a specific API (application programming interface) provided by the chip; memory release is used for unified release of the memory.
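A minimal sketch of the base-address-plus-offset division is given below (C++; the class and its parameters are illustrative assumptions, not the framework's actual code). It shows only the splitting itself; how the first storage space is obtained from the chip is left to the chip's runtime:

    #include <cstddef>

    // Carves segments for the individual modules (convolution, pooling, ...) out of
    // one large, uniformly allocated block by advancing an offset from its base address.
    class ChunkSplitter {
     public:
      ChunkSplitter(void* base, size_t total_bytes)
          : base_(static_cast<char*>(base)), total_(total_bytes) {}

      // Returns the start of a segment of `bytes` bytes aligned to `align`,
      // or nullptr if the pre-allocated block is exhausted.
      void* Split(size_t bytes, size_t align = 512) {
        size_t start = (offset_ + align - 1) / align * align;  // round up to alignment
        if (start + bytes > total_) { return nullptr; }
        offset_ = start + bytes;
        return base_ + start;
      }

     private:
      char*  base_;         // base address of the first storage space
      size_t total_;        // total size of that space
      size_t offset_ = 0;   // bytes already handed out
    };

With a pre-computed memory plan, each module simply receives its segment in turn, e.g. void* conv_buf = splitter.Split(conv_bytes); void* pool_buf = splitter.Split(pool_bytes); the alternative of calling a chip-specific partitioning API replaces this class entirely.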
Preferably, the unified management of the chip's storage space includes data copying for the on-chip memory; the AI acceleration chip copies data to and from the host before starting a computation and after completing it, specifically: while the AI acceleration chip is in use, data is copied from the host memory to the on-chip memory, and the result is copied back to the host memory after the computation completes; during the computation, data is also copied between the host and the chip and between chips; during a memory copy, the data is handled according to the different combinations of copy source and destination.
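A sketch of a copy routine dispatched on the source and destination follows (C++). The CUDA calls in the GPU branch are the real CUDA runtime API; the branch for the new chip is a placeholder to be filled with the vendor's own copy function:

    #include <cstddef>
    #include <cstring>
    #include <cuda_runtime.h>  // used only by the GPU branch of this sketch

    enum class DeviceType { kInvalidDevice, kCPU, kCUDA, kMLU, kNewAIChip };  // as in the earlier sketch
    enum class MemcpyKind { kHostToDevice, kDeviceToHost, kDeviceToDevice, kHostToHost };

    // Copies `bytes` bytes, dispatching on the chip type and on the source/destination
    // combination, which is the case analysis described in the text.
    void FrameworkMemcpy(DeviceType dev, MemcpyKind kind,
                         void* dst, const void* src, size_t bytes) {
      if (kind == MemcpyKind::kHostToHost) { std::memcpy(dst, src, bytes); return; }
      switch (dev) {
        case DeviceType::kCUDA: {
          // Real CUDA runtime calls; the direction enum maps one-to-one.
          cudaMemcpyKind k = (kind == MemcpyKind::kHostToDevice) ? cudaMemcpyHostToDevice
                           : (kind == MemcpyKind::kDeviceToHost) ? cudaMemcpyDeviceToHost
                                                                 : cudaMemcpyDeviceToDevice;
          cudaMemcpy(dst, src, bytes, k);
          break;
        }
        case DeviceType::kNewAIChip:
          // Placeholder: call the vendor runtime's copy function here, e.g.
          // new_chip_memcpy(dst, src, bytes, kind);  (assumed name, not a real API)
          break;
        default:
          std::memcpy(dst, src, bytes);  // plain host memory as a fallback
      }
    }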
The device for adapting a deep learning framework to an AI acceleration chip comprises one or more processors and is configured to implement the above method for adapting a deep learning framework to an AI acceleration chip.
The invention has the beneficial effects that:
the invention simplifies the work of the deep learning framework adaptive AI acceleration chip.
Drawings
FIG. 1 is a schematic diagram of the overall process of adapting a deep learning framework to an AI acceleration chip according to the present invention;
FIG. 2 is a diagram illustrating the registration of the device context manager and the device thread when adapting a deep learning framework to an AI acceleration chip according to the present invention;
FIG. 3 is a schematic diagram of the infrastructure to be registered in the chip type registration step when adapting a deep learning framework to an AI acceleration chip according to the present invention;
FIG. 4 is a flow chart of allocating and releasing on-chip memory space when adapting a deep learning framework to an AI acceleration chip according to the present invention;
FIG. 5 is a block diagram of an apparatus for adapting a deep learning framework to an AI acceleration chip according to the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1, the method for adapting a deep learning framework to an AI acceleration chip of the present invention mainly includes three stages: chip type definition, chip type registration, and chip memory support. Chip type definition writes the chip types to be supported, as enumeration values, into the relevant data structure of a proto file so that the framework can correctly identify them; Protobuf is used as the serialization tool to explain the whole process here, and tools of the same kind also include JSON, Hessian and the like, but Protobuf is what the current mainstream deep learning frameworks use. Chip type registration registers the infrastructure required by the chip into hash tables, so that the framework can conveniently find the corresponding content by chip type when needed. Chip memory support moves the operations on the chip's memory into the framework, so that the framework can manage the chip's memory space in a unified way.
The chip type definition phase defines, in a custom or newly written file based on a data transmission format such as Protobuf, the AI acceleration chip types to be supported as an enumeration. The enumeration is used to distinguish different kinds of chips inside the deep learning framework, so that the framework processes each chip according to its enumeration value. Infrastructure inside the framework, such as the device context manager, the device thread, the stream index generator and the computing core Kernel, is strongly bound to the chip type, because different chips operate differently and each piece of infrastructure therefore has a different implementation per chip. Taking the computing Kernel as an example, an OpenBLAS library may be used on the CPU, a cuBLAS library may be used on the GPU, and the cnrt and cnnl libraries are used on the Cambricon MLU. Through the chip type registration stage, the deep learning framework can automatically select the corresponding Kernel implementation according to the chip type.
As shown in fig. 3, chip registration covers several pieces of infrastructure: the chip type, the device context manager, the device thread, the stream index generator and the computing core Kernel. Using a registration mechanism based on the singleton pattern, the device context manager, device thread, stream index generator and computing Kernel corresponding to each chip are registered in their respective hash tables, so that the chip type maps one-to-one to each of them. The key of each hash table is the chip type to be registered, which is essentially an enumeration value; the value is the processing function corresponding to that chip, which completes operations such as creating handles, computing on the chip and managing memory. Operating an AI acceleration chip often requires handles; a handle is essentially a pointer to the resources needed to complete an operation. For example, multi-stream execution on a GPU uses the stream handle cudaStream_t, the cuBLAS library uses cublasHandle_t, and the cuDNN library uses cudnnHandle_t; the Enflame DTU chip uses a topsContext_t handle. The value stored in the hash table is a class type containing various member functions, and handle creation is implemented by these member functions. After registration, the hash table can be queried with the chip type, so that the corresponding processing function is easy to find. Specifically, the registration process here may use four hash tables whose keys are all chip types, such as CPU, GPU, MLU and so on; the values are the pieces of infrastructure listed above: the value of the first table is the context manager, the value of the second table is the device thread, the value of the third table is the stream index generator, and the value of the fourth table is the computing Kernel, as shown in Tables 1 to 4 below;
Table 1. Context manager registry
Key   Value
CPU   Context manager for the CPU
GPU   Context manager for the GPU
MLU   Context manager for the MLU

Table 2. Device thread registry
Key   Value
CPU   Device thread for the CPU
GPU   Device thread for the GPU
MLU   Device thread for the MLU

Table 3. Stream index generator registry
Key   Value
CPU   Stream index generator for the CPU
GPU   Stream index generator for the GPU
MLU   Stream index generator for the MLU

Table 4. Computing Kernel registry
Key   Value
CPU   Computing Kernel for the CPU
GPU   Computing Kernel for the GPU
MLU   Computing Kernel for the MLU
As shown in fig. 2, the registration of the two main pieces of infrastructure, the device context manager and the device thread, proceeds as follows. The device context manager provides the method for generating chip operation handles. Registration puts a key-value pair into a hash table, with the chip type as the key and the device context manager as the value; later, the corresponding method can be conveniently looked up through the chip type and the hash table to obtain the corresponding handle. The device thread contains the method for creating a thread associated with the chip type, and its registration process is identical to that of the device context manager, except that it uses another hash table.
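The lookup side of this registration can be sketched as follows (C++, continuing the illustrative names used in the registration sketch earlier; the device-thread registry named in the comment is likewise an assumed name):

    #include <memory>

    // Given only the chip type, the framework recovers everything else from the registries.
    void PrepareDevice(DeviceType dev) {
      // 1. Query the context-manager table and build the manager for this chip.
      std::unique_ptr<DeviceCtxManager> ctx = DeviceCtxRegistry::Get().New(dev);

      // 2. Ask the manager for the handles the chip's libraries require
      //    (on a GPU these would wrap cudaStream_t, cublasHandle_t and so on).
      void* stream_handle  = ctx->CreateStreamHandle();
      void* compute_handle = ctx->CreateComputeHandle();

      // 3. The device-thread table is queried with the same key in the same way,
      //    yielding the thread object that drives the on-chip computation:
      // DeviceThread* thread = DeviceThreadRegistry::Get().Lookup(dev);  (assumed name)

      (void)stream_handle; (void)compute_handle;  // handed to Kernels in the real framework
    }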
In the chip memory support step, the on-chip memory is managed in a unified way during training, that is, allocated and released in a unified way; a large, uniformly allocated storage space is cut into small segments for different purposes. Because the AI acceleration chip must copy data to and from the host before starting a computation and after completing it, methods for allocating, releasing and partitioning memory space and for copying data must be implemented for the current chip.
As shown in fig. 4, which is the flow chart of on-chip memory allocation, partitioning and release, the storage space is allocated and managed in a unified way while the program runs. Specifically, the program first determines whether the required space is on the chip. If not, no processing is performed. If it is, the required space, usually a large block, is allocated according to a pre-computed size and then divided according to the needs of the different parts. The division can be done in two ways: 1) base address plus offset; 2) calling a specific API of the chip; the concrete way depends on the particular chip. Once the memory space is allocated, the subsequent training task can run. After training finishes, the memory must be released before the program exits: the program first checks whether the space to be released is on the chip, releases it if so, and otherwise exits directly.
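The flow of fig. 4 can be condensed into the following sketch (C++; std::malloc and std::free stand in for the chip runtime's allocation calls, which differ from vendor to vendor, while the control flow stays the same):

    #include <cstddef>
    #include <cstdlib>

    class OnChipMemoryPool {
     public:
      // Allocation: only act if the required space lives on the chip, then grab the
      // whole block whose size was pre-computed before training starts.
      bool Allocate(bool required_space_is_on_chip, size_t precomputed_total_bytes) {
        if (!required_space_is_on_chip) { return false; }  // "not on chip": no processing
        base_  = std::malloc(precomputed_total_bytes);     // stand-in for the chip's malloc
        total_ = precomputed_total_bytes;
        // The block is then divided per module, either by base address + offset
        // (see the splitter sketched earlier) or by a chip-specific partitioning API.
        return base_ != nullptr;
      }

      // Release: after training, check again whether the space is on the chip,
      // free it before the program exits, otherwise exit directly.
      void Release(bool required_space_is_on_chip) {
        if (!required_space_is_on_chip || base_ == nullptr) { return; }
        std::free(base_);                                   // stand-in for the chip's free
        base_  = nullptr;
        total_ = 0;
      }

     private:
      void*  base_  = nullptr;
      size_t total_ = 0;
    };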
Corresponding to the foregoing embodiment of the method for adapting a deep learning framework to an AI acceleration chip, the invention also provides an embodiment of a device for adapting a deep learning framework to an AI acceleration chip.
Referring to fig. 5, an apparatus for adapting an AI acceleration chip by a deep learning framework according to an embodiment of the present invention includes one or more processors, and is configured to implement the method for adapting an AI acceleration chip by a deep learning framework in the foregoing embodiment.
The embodiment of the apparatus for adapting a deep learning framework to an AI acceleration chip can be applied to any device with data processing capability, such as a computer or another device or apparatus. The apparatus embodiment may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the device reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, fig. 5 shows a hardware structure diagram of the device with data processing capability on which the apparatus is located; besides the processor, memory, network interface and non-volatile memory shown in fig. 5, the device may also include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements the method for adapting the deep learning framework to the AI acceleration chip in the foregoing embodiments.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in the foregoing embodiments. The computer-readable storage medium may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of the device. The computer-readable storage medium is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The basic principles of the present disclosure have been described above in connection with specific embodiments. It should be noted, however, that those skilled in the art will understand that all or any of the steps or components of the method and apparatus of the present disclosure may be implemented in hardware, firmware, software or a combination thereof, in any computing device (including processors, storage media and the like) or in a network of computing devices, which those skilled in the art can accomplish with basic programming skills after reading the description of the present disclosure.
Thus, the objects of the present disclosure may also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. Thus, the object of the present disclosure can also be achieved merely by providing a program product containing program code for implementing the method or apparatus. That is, such a program product also constitutes the present disclosure, and a storage medium storing such a program product also constitutes the present disclosure. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.
It is also noted that in the apparatus and methods of the present disclosure, it is apparent that individual components or steps may be disassembled and/or re-assembled. These decompositions and/or recombinations are to be considered equivalents of the present disclosure. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
The above detailed description should not be construed as limiting the scope of the disclosure. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (10)

1. A method for adapting a deep learning framework to an AI acceleration chip, characterized by comprising three stages:
a chip type definition stage, in which the AI acceleration chip types to be supported are defined in a custom or newly written file based on different types of data transmission formats, the AI acceleration chip types comprising enumeration types used to distinguish different kinds of chips in the deep learning framework, so that the deep learning framework performs the corresponding processing according to the different enumeration values;
a chip type registration stage, in which the chip types and the device context managers, device threads, stream index generators and computing core Kernels related to the AI acceleration chip are registered in their respective hash tables, and a registration mechanism based on the singleton pattern maps the chip types one-to-one to the context managers, device threads, stream index generators and computing Kernels;
a chip memory support stage, in which the operations related to the AI acceleration chip's memory are placed into the deep learning framework, so that the framework manages the chip's memory space in a unified way.
2. The method for adapting a deep learning framework to an AI acceleration chip of claim 1, wherein in the chip type definition stage the type of the AI acceleration chip is added to the data structure related to the chip type definition.
3. The method for adapting a deep learning framework to an AI acceleration chip of claim 1, wherein the key of the hash table is the chip type to be registered and the value is the processing function corresponding to that chip, which completes operations such as creating various handles, computing on the chip and managing memory;
the device context manager is a method capable of generating chip operation handles; during registration the device context manager is registered in its hash table and corresponds one-to-one to the chip type, specifically: a context handle is created, and a device context manager is created by calling the context handle, the device context manager being used to provide various handles to external callers and to perform device synchronization, the handles comprising stream handles and chip operation handles;
the stream index generator is used to generate the corresponding stream index numbers for different operations; its registration likewise creates a hash table whose key is the chip type and whose value is the corresponding stream index generator.
4. The method for adapting a deep learning framework to an AI acceleration chip of claim 1, wherein device thread registration creates a device thread related to the chip type, which is used to create the thread that starts the on-chip computing process, and after creation the device thread is registered in its hash table, completing the one-to-one correspondence between threads and chip types.
5. The method for adapting a deep learning framework to an AI acceleration chip of claim 1, wherein computing Kernel registration first implements the computation logic inside the computing Kernel, then uses a two-tuple composed of the chip type and the data type as the key, and registers the computing Kernel as the value in the hash table related to the computing Kernel.
6. The method for adapting a deep learning framework to an AI acceleration chip of claim 1, wherein in the chip memory support phase the memory types of the AI acceleration chip to be supported are defined based on different types of data transmission formats, and the on-chip memory is allocated, released, partitioned and copied.
7. The method for adapting a deep learning framework to an AI acceleration chip of claim 6, wherein the memory type of the AI acceleration chip to be supported is defined in order to distinguish the memory types of different chips within the framework, and the memory type of the AI acceleration chip is added to the data structure related to memory types.
8. The method for adapting a deep learning framework to an AI acceleration chip of claim 6, wherein the unified management of the memory space of the chip comprises memory allocation, and the memory allocation comprises: uniformly allocating a first storage space in the memory space of the chip, and dividing the first storage space into segments of different sizes according to the space required by the different modules, the division being done either by adding an offset to the starting address or by calling a specific API provided by the chip; memory release is used for unified release of the memory.
9. The method for adapting a deep learning framework to an AI acceleration chip of claim 6, wherein the unified management of the memory space of the chip comprises data copying for the on-chip memory, and the AI acceleration chip copies data to and from the host before starting a computation and after completing it, specifically: data is copied from the host memory to the on-chip memory while the AI acceleration chip is in use, the result is copied back to the host memory after the computation completes, data is copied between the host and the chip and between chips during the computation, and during a memory copy the data is handled according to the different combinations of copy source and destination.
10. An apparatus for adapting a deep learning framework to an AI acceleration chip, characterized by comprising one or more processors configured to implement the method for adapting a deep learning framework to an AI acceleration chip of any one of claims 1-9.
CN202111497148.7A; priority date 2021-12-09; filing date 2021-12-09; Method and device for adapting deep learning framework to AI acceleration chip; status: Active; granted as CN113918507B

Priority Applications (1)

Application number: CN202111497148.7A (granted as CN113918507B); Title: Method and device for adapting deep learning framework to AI acceleration chip

Applications Claiming Priority (1)

Application number: CN202111497148.7A (granted as CN113918507B); Title: Method and device for adapting deep learning framework to AI acceleration chip

Publications (2)

Publication Number   Publication Date
CN113918507A (en)    2022-01-11
CN113918507B (en)    2022-04-08

Family

ID=79248860

Family Applications (1)

CN202111497148.7A (Active, granted as CN113918507B): Method and device for adapting deep learning framework to AI acceleration chip

Country Status (1)

Country Link
CN (1) CN113918507B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180341852A1 (en) * 2017-05-24 2018-11-29 International Business Machines Corporation Balancing memory consumption of multiple graphics processing units in deep learning
CN108388532A (en) * 2018-03-13 2018-08-10 算丰科技(北京)有限公司 The AI operations that configurable hardware calculates power accelerate board and its processing method, server
CN111400021A (en) * 2019-01-02 2020-07-10 ***通信有限公司研究院 Deep learning method, device and system
CN110955530A (en) * 2020-02-25 2020-04-03 深圳鲲云信息科技有限公司 Deep learning engine parallel processing data method, device, equipment and storage medium
CN112232497A (en) * 2020-10-12 2021-01-15 苏州浪潮智能科技有限公司 Method, system, device and medium for compiling AI chip

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167437A (en) * 2023-04-18 2023-05-26 之江实验室 Chip management system, method, device and storage medium
CN116185371A (en) * 2023-04-24 2023-05-30 北京大学 Hardware device registration method, device, equipment and storage medium
CN116185371B (en) * 2023-04-24 2023-09-19 北京大学 Hardware device registration method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113918507B (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN113918507B (en) Method and device for adapting deep learning framework to AI acceleration chip
US20200202246A1 (en) Distributed computing system, and data transmission method and apparatus in distributed computing system
CN105808328B (en) The methods, devices and systems of task schedule
CN110032369A (en) A kind of code automatic generation method, device and medium
CN111488205B (en) Scheduling method and scheduling system for heterogeneous hardware architecture
CN114741207B (en) GPU resource scheduling method and system based on multi-dimensional combination parallelism
CN104516769B (en) For the method for the switching between verifying logic zone configuration, medium and system
CN107113341A (en) The system of the high-throughput processing of affairs in the Distributed Relation Database Management System divided for data
CN106095563B (en) Flexible physical function and virtual function mapping
US8626799B2 (en) Mapping data structures
CN111338695A (en) Data processing method based on pipeline technology and related product
CN109471725A (en) Resource allocation methods, device and server
CN108513658A (en) A kind of transaction methods and device
CN107133243A (en) A kind of data processing method and server
CN113010286A (en) Parallel task scheduling method and device, computer equipment and storage medium
JP2020194522A (en) Method, apparatus, device, and medium for processing data
CN105335135B (en) Data processing method and central node
CN110515734A (en) The load processing method and device of data processing task
KR20150117522A (en) Graphics state manage apparatus and method
CN108984105B (en) Method and device for distributing replication tasks in network storage device
CN114493980A (en) Kernel function transmission method, device and equipment
CN114691566A (en) AI model operation method, loading method and device and IC chip
GB2601354A (en) Apparatus and method
CN115374024A (en) Memory data sorting method and related equipment
CN115577760B (en) Data processing method, system and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant