CN116107669A - Operator registration method, device and equipment of deep learning framework and storage medium - Google Patents


Info

Publication number
CN116107669A
Authority
CN
China
Prior art keywords
operator
function
kernel function
deep learning
registration
Prior art date
Legal status
Granted
Application number
CN202310400779.5A
Other languages
Chinese (zh)
Other versions
CN116107669B (en)
Inventor
曾炜
陈建平
袁孝宇
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202310400779.5A
Publication of CN116107669A
Application granted
Publication of CN116107669B
Legal status: Active
Anticipated expiration



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F 9/4482 Procedural
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/31 Programming languages or programming paradigms
    • G06F 8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F 9/4488 Object-oriented
    • G06F 9/449 Object-oriented method invocation or resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses an operator registration method, device, equipment and storage medium of a deep learning framework. The method comprises: acquiring information of an operator to be registered and defining an operator kernel function declaration according to that information, wherein the declaration includes a preset shape output function; calling the preset shape output function in the declaration to obtain an output shape value and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function; and constructing a multifunctional registration macro for the operator kernel function and registering the operator kernel function into the deep learning framework according to the multifunctional registration macro. The method designs an operator kernel function registration interface through which operators for different heterogeneous hardware can easily be registered into the deep learning framework, while reducing the amount of code a developer must write to connect an operator to the framework and the degree to which the developer must understand the framework's internal structure.

Description

Operator registration method, device and equipment of deep learning framework and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for operator registration of a deep learning framework.
Background
An operator is a computational unit used to build artificial-intelligence application algorithms; it is encapsulated over the underlying hardware and can be called by an AI application to return a result. The operator interface defines the calling form and functional description of the operator, but different hardware supports operators to different degrees, the difference showing up mainly in computational efficiency and precision. Operators with the same function are also implemented in completely different ways on different hardware (GPU, NPU, XPU, etc.), so a deep learning computing framework (such as TensorFlow, PyTorch or PaddlePaddle) must support operators for different hardware. From a development point of view, an operator with the same function has to be implemented in code separately for each kind of hardware before it can be connected to the deep learning framework. This greatly increases the development workload of AI algorithm developers, who must also learn each framework's operator-writing specification.
Currently, deep learning computing frameworks provide, to varying degrees, mechanisms for custom operators, such as TensorFlow's OpKernel base-class access mode and PyTorch's FuncBase base-class access mode; each deep learning framework defines basic classes and interfaces to support user operator access. If an operator supporting a certain piece of hardware is to be used in a chosen deep learning computing framework, either the hardware vendor must provide an existing operator library supporting that framework, or operator code supporting that framework must be written by hand.
Using an operator in a deep learning computing framework basically requires operator declaration, kernel registration, operator loading and similar steps. Every time a new hardware architecture is added, the operator execution flow must be re-developed for that hardware. The operators used in deep learning network models are numerous, and repeating this process for every new hardware architecture amounts to an enormous engineering effort. In addition, when a developer needs to add an operator and extend it to all the hardware that has already been adapted, kernel functions must be implemented according to the characteristics of that hardware, which places high demands on the developer's hardware knowledge and development ability. All of this severely affects the efficiency with which AI developers can use operators.
Disclosure of Invention
The embodiment of the application provides an operator registration method, device and equipment of a deep learning framework and a storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides an operator registration method of a deep learning framework, including:
acquiring information of an operator to be registered, and defining an operator kernel function declaration according to the information of the operator to be registered, wherein the operator kernel function declaration comprises a preset shape output function;
invoking the shape output function preset in the operator kernel function declaration to obtain an output shape value, and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function;
constructing a multifunctional registration macro of the operator kernel function, and registering the operator kernel function to be registered to a deep learning framework according to the multifunctional registration macro.
In an alternative embodiment, defining an operator kernel function declaration based on the information of the operator to be registered includes:
acquiring names, input tensors, output tensors and parameters of operators to be registered;
functionalizing the tensor shape variable to obtain the shape output function;
and defining the operator kernel function declaration according to the operator name, the input tensor, the output tensor, the parameters and the shape output function, wherein the operator kernel function declaration comprises the operator name, the input tensor, the output tensor, the parameters and the shape output function.
In an alternative embodiment, before defining the operator kernel function according to the shape value, the method further includes:
acquiring auxiliary functions needed in the operator kernel function module;
and constructing a global static function based on the auxiliary function, the global static function being executed automatically by the framework.
In an alternative embodiment, defining the operator kernel function from the shape value includes:
defining a Compute function, and realizing the operator kernel computation according to the Compute function;
and calling the shape value output by the shape output function through a GetOutShape function in the Compute calculation flow.
In an alternative embodiment, constructing a multi-functional registration macro of the operator kernel function includes:
constructing a registration macro, and adding a preset initialization function and a data cleaning function into the registration macro to obtain the multifunctional registration macro;
wherein the multifunctional registration macro is in the form of REG_KERNEL.
In an alternative embodiment, registering the operator kernel function to be registered to the deep learning framework according to the multi-functional registration macro includes:
invoking the multifunctional registration macro;
and executing data initialization and data release operation according to the multifunctional registration macro, and registering the operator kernel function to be registered to the deep learning framework.
In an optional embodiment, after registering the operator kernel function to be registered to the deep learning framework according to the multifunctional registration macro, the method further includes:
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's CMake configuration file; or,
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's GCC build command; or,
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's GXX build command.
In a second aspect, an embodiment of the present application provides an operator registration apparatus of a deep learning framework, including:
an operator declaration module, configured to obtain information of an operator to be registered, define an operator kernel function declaration according to the information of the operator to be registered, where the operator kernel function declaration includes a preset shape output function;
the operator definition module is used for calling the shape output function preset in the operator kernel function declaration to obtain an output shape value, and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function;
the operator registration module is used for constructing a multifunctional registration macro of the operator kernel function and registering the operator kernel function to be registered to the deep learning framework according to the multifunctional registration macro.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory storing program instructions, where the processor is configured to execute, when executing the program instructions, an operator registration method of a deep learning framework provided in the foregoing embodiment.
In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon computer readable instructions that are executed by a processor to implement an operator registration method of a deep learning framework provided by the above embodiments.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the operator registration method of the deep learning framework, a set of standard operator kernel function registration interfaces is provided, operators of different heterogeneous hardware can be easily registered into the deep learning framework through the registration interfaces, and therefore a deep learning user can conveniently use different hardware resources in the deep learning framework.
According to the registration method, all operations of the operator can be completed by only realizing one computer function by the defined operator kernel function, other auxiliary functions are not required to be defined, an output shape value is not required to be obtained through derivation in a computer calculation flow, and the output shape value can be directly obtained in the following kernel definition through shape functionalization, so that the calculation amount is greatly reduced. The kernel function is not required to be realized according to the related hardware characteristics, so that the code development amount of a developer accessing an operator into the deep learning framework is reduced, and the grasping degree of the developer on the self structure of the deep learning framework is reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic illustration of an application environment for an operator registration method for a deep learning framework, according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an application environment for an operator registration method for a deep learning framework, according to an example embodiment;
FIG. 3 is a flow diagram illustrating an operator registration method for a deep learning framework, according to an example embodiment;
FIG. 4 is a schematic diagram illustrating an operator registration method for a deep learning framework, according to an example embodiment;
FIG. 5 is a schematic diagram of an operator registration apparatus of a deep learning framework, according to an example embodiment;
fig. 6 is a schematic diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of systems and methods that are consistent with aspects of the invention as detailed in the accompanying claims.
According to an aspect of the embodiments of the present invention, an operator registration method of a deep learning framework is provided. As an alternative implementation, the method may be, but is not limited to being, applied in the application environment shown in fig. 1. The application environment includes: a terminal device 102 that interacts with a user, a network 104 and a server 106. Human-machine interaction takes place between the user 108 and the terminal device 102, and an operator registration program of the deep learning framework runs on the terminal device 102. The terminal device 102 includes a human-machine interaction screen 1022, a processor 1024 and a memory 1026. The human-machine interaction screen 1022 is used to display the deep learning framework of the artificial intelligence application algorithm; the processor 1024 is configured to obtain the operator to be registered; and the memory 1026 is used to store the operator to be registered.
In addition, the server 106 includes a database 1062 and a processing engine 1064, where the database 1062 is used to store the operator to be registered. The processing engine 1064 is configured to: acquire information of the operator to be registered and define an operator kernel function declaration according to that information, the declaration including a preset shape output function; call the preset shape output function in the declaration to obtain an output shape value and define the operator kernel function according to the shape value, the operator kernel function being a Compute function; and construct a multifunctional registration macro for the operator kernel function and register the operator kernel function into the deep learning framework according to the multifunctional registration macro.
In one or more embodiments, the operator registration method of the deep learning framework described above may be applied in the application environment shown in fig. 2. As shown in fig. 2, a human-machine interaction may be performed between a user 202 and a user device 204. The user device 204 includes a memory 206 and a processor 208. The user equipment 204 in this embodiment may, but is not limited to, refer to performing the operations performed by the terminal equipment 102 described above.
Optionally, the terminal device 102 and the user device 204 include, but are not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, a vehicle-mounted electronic device, a wearable device, and the like, and the network 104 may include, but is not limited to, a wireless network or a wired network. Wherein the wireless network comprises: WIFI and other networks that enable wireless communications. The wired network may include, but is not limited to: wide area network, metropolitan area network, local area network. The server 106 may include, but is not limited to, any hardware device that may perform calculations. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example, and is not limited in any way in the present embodiment.
The main objective of the present application is to standardize how operators are registered into and connected to a deep learning framework, while minimizing the amount of code an AI developer must write to connect hardware operators to the framework and reducing how much of the framework's internal structure the developer must master, so that developers can quickly and conveniently register operators for different hardware into the deep learning framework for use in deep learning models.
The operator registration method of the deep learning framework according to the embodiment of the present application is described in detail below with reference to fig. 3, and as shown in fig. 3, the method mainly includes the following steps:
s301, obtaining information of an operator to be registered, defining an operator kernel function statement according to the information of the operator to be registered, wherein the operator kernel function statement comprises a preset shape output function.
Registration of an operator kernel function mainly involves the following parts: checking whether the kernel function meets the framework's operator requirements; supporting registration of the operator with the framework, including the operator name, hardware type, tensor layout, data type and the operator kernel function itself; and registering the kernel function.
In an alternative embodiment, defining an operator kernel function declaration based on information of an operator to be registered includes:
the name OpName, input tensor input, output tensor output and parameter attr of the operator to be registered are obtained.
To further reduce the amount of subsequent computation, the present application predefines a shape output function, ShapeFunc, in the operator declaration. The shape here is the shape of a tensor, a basic concept attached to operators that assists operator computation. In the prior art, the output shape value is derived through a certain amount of computation each time it is needed, which involves a large amount of calculation. By functionalizing the tensor shape, the shape value of an existing operator can be obtained directly without deriving it again, avoiding the added computation and complexity. The scheme of the present application therefore pre-constructs the shape output function ShapeFunc.
Further, an operator kernel function declaration is defined from the operator name, input tensor, output tensor, parameter, and shape output function, the operator kernel function declaration including the operator name, input tensor, output tensor, parameter, and shape output function.
Specifically, the operator kernel function declaration mainly includes the following parts:
the kernel function name, which defines the name of the operator function currently being registered; the kernel function inputs, outputs and parameters, which define the input tensors, output tensors and parameter information of the operator function, where a tensor may be a float, a tensor type, and so on; and the shape output function, which defines the function used by the operator kernel to obtain the output shape. In one possible implementation, the operator kernel function declaration is as follows:
REGISTER_KERNEL_OP("FuncName")
    .INPUT("in1 float32")
    .OUTPUT("out1 float32")
    .ATTR("param1")
    .ATTR("param2")
    .SHAPEFUNC([](DeviceContext *c){});
In the operator kernel function declaration of the embodiment of the present application, the INPUT tensor, the OUTPUT tensor and the parameter ATTR may each be defined multiple times. Through the SHAPEFUNC shape output function, the output shape value can be obtained directly in the subsequent kernel definition without further derivation. A sketch of what such a shape output function might contain is given below.
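For illustration only, the body of such a shape output function for an element-wise operator might look as follows; the GetInputShape and SetOutputShape helpers on DeviceContext are assumed names used only for this sketch and are not defined by this application:

// Hypothetical sketch of a filled-in SHAPEFUNC for an element-wise operator.
// GetInputShape and SetOutputShape are assumed helper methods illustrating how
// the output shape can be produced without derivation inside the Compute flow.
REGISTER_KERNEL_OP("FuncName")
    .INPUT("in1 float32")
    .OUTPUT("out1 float32")
    .ATTR("param1")
    .ATTR("param2")
    .SHAPEFUNC([](DeviceContext *c) {
        // The output tensor has the same shape as the input tensor.
        TensorShape in_shape = c->GetInputShape("in1");
        c->SetOutputShape("out1", in_shape);
    });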
S302, calling the shape output function preset in the operator kernel function declaration to obtain an output shape value, and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function.
In one embodiment, defining an operator kernel function from shape values includes:
when the operator kernel function is defined, only one computer function is defined, and the computer function is the calculation logic of how the operator obtains input and how output, namely how the operator obtains input and how output are calculated, and the operator kernel calculation is realized according to the computer function.
When the Compute function is defined, a Compute function applicable to the deep learning framework is generated according to the base class defined by that framework, i.e. it inherits from the framework's kernel base class. The embodiment of the present application is therefore compatible with interfaces of existing frameworks, such as the kernel base class and the compute function, so that existing frameworks can be migrated to conveniently and quickly. To migrate to another framework, the base-class interfaces of that framework are called and the Compute function is quickly regenerated.
In the embodiment of the present application, the shape value output by the shape output function can be quickly retrieved through the GetOutShape function inside the Compute calculation flow, avoiding the need to obtain the output shape value by derivation in the Compute flow.
Specifically, the operator kernel function defines the implementation of the operator kernel; the essential part is implementing the kernel's Compute function. The operator kernel is defined as follows:
class FuncOp : public BaseOpKernel {
  void Op_Compute(ExecutionContext *context)
  {
    // Input tensor
    Tensor *tensor1 = const_cast<Tensor*>(&context->input(0));
    CHECK_VALID(tensor1);
    // Obtain the output shape
    TensorShape out_shape = GetOutShape(context, "Y");
    // Output tensor
    Tensor *out = Allocate(context, "out", out_shape.data(), out_shape.size());
    CHECK_VALID(out);
    // Operator computation logic
  }
};
The operator kernel definition method provided by the embodiment of the present application is very simple to use and greatly reduces the amount of code. The kernel definition completes all operator operations by implementing only one Compute function; the GetOutShape function uses the ShapeFunc declared for the operator to quickly obtain the existing output shape value, avoiding having to derive the output shape inside the Compute calculation flow.
Further, the auxiliary functions needed in the operator kernel function module are obtained, a global static function is constructed based on each auxiliary function, and the global static function is executed automatically by the framework.
In the present application, within the operator kernel function definition, all operator operations can be completed by defining only one Compute function, whereas in the prior art other auxiliary functions, such as an initialization function, also have to be defined. This scheme simplifies the kernel function definition module and reduces complexity for the developer.
Specifically, the present application constructs the other auxiliary functions needed in the kernel function definition as global static functions, which the framework executes automatically. Part of the processing is thus hidden and completed automatically by the framework by design, which greatly reduces complexity for the developer. A sketch of this pattern is given below.
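As a purely illustrative sketch of this idea, and not an interface defined by this application (KernelAuxRegistry, RegisterInit and g_funcop_init_registrar are assumed names), an auxiliary initialization function can be wrapped in a global static object so that it is recorded at program start-up and later executed by the framework itself rather than by the operator developer:

#include <functional>
#include <vector>

// Hypothetical registry that collects auxiliary functions; the framework
// would call RunAll() once, so the operator developer never has to define
// or call the auxiliary function explicitly in the kernel module.
class KernelAuxRegistry {
 public:
  static KernelAuxRegistry& Instance() {
    static KernelAuxRegistry inst;
    return inst;
  }
  void RegisterInit(std::function<void()> fn) { inits_.push_back(std::move(fn)); }
  void RunAll() { for (auto& fn : inits_) fn(); }
 private:
  std::vector<std::function<void()>> inits_;
};

// Global static object: its constructor records the auxiliary function,
// making it a "global static function" from the developer's point of view.
struct FuncOpInitRegistrar {
  FuncOpInitRegistrar() {
    KernelAuxRegistry::Instance().RegisterInit([]() {
      // auxiliary initialization for FuncOp, e.g. preparing workspace buffers
    });
  }
};
static FuncOpInitRegistrar g_funcop_init_registrar;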
Therefore, the operator kernel function defined by the scheme of the present application completes all operator operations by implementing only one Compute function; it requires no other auxiliary functions to be defined, no derivation of the output shape value inside the Compute calculation flow, and no kernel implementation tailored to specific hardware characteristics, because shape functionalization allows the output shape value to be obtained directly in the subsequent kernel definition, greatly reducing the amount of computation. This reduces the amount of code a developer must write to connect an operator to the deep learning framework and reduces the degree to which the developer must understand the framework's structure.
S303, constructing a multifunctional registration macro of the operator kernel function, and registering the operator kernel function to be registered to the deep learning framework according to the multifunctional registration macro.
After the operator kernel function has been defined, it is registered into the deep learning framework through the kernel function registration macro. The registration macro provided by the embodiment of the present application is a multifunctional registration macro: a preset initialization function, a data cleaning function and other functional functions are added to the registration macro to obtain the multifunctional registration macro. Exactly which functional functions are added is not specifically limited in the embodiment of the present application. The multifunctional registration macro takes the form REG_KERNEL.
The embodiment of the present application thus provides a multifunctional registration macro that integrates kernel function registration, initialization, data cleaning and similar functions; because the macro is predefined, it can be called directly in subsequent use.
In an alternative embodiment, registering an operator kernel function to be registered to a deep learning framework according to a multi-functional registration macro includes: invoking a multifunctional registration macro; and executing data initialization and data release operation according to the multifunctional registration macro, and registering the operator kernel function to be registered to the deep learning framework.
By calling the predefined multifunctional registration macro, a person skilled in the art can complete kernel function registration, initialization, data release and similar operations at the same time. For example, for data release a ClearKernel step can simply be invoked: after the actual registration work in the macro is finished, a ClearKernel operation is added to perform data cleaning, such as releasing certain variables. This further reduces the amount of code a developer must write to connect an operator to the deep learning framework and reduces complexity for the developer. One possible shape of such a macro is sketched below.
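The following sketch shows one way a multifunctional registration macro could be realized; only the name REG_KERNEL comes from this application, while KernelRegistry, KernelRegistrar and the macro's argument list are assumptions used purely for illustration:

#include <functional>
#include <map>
#include <string>

using KernelFactory = std::function<void*()>;

// Hypothetical registry holding kernel factories by operator name.
class KernelRegistry {
 public:
  static KernelRegistry& Instance() { static KernelRegistry r; return r; }
  void Add(const std::string& name, KernelFactory f) { kernels_[name] = std::move(f); }
 private:
  std::map<std::string, KernelFactory> kernels_;
};

// Hypothetical registrar: registration and initialization happen in the
// constructor, data cleaning/release in the destructor.
class KernelRegistrar {
 public:
  KernelRegistrar(const std::string& name, KernelFactory factory,
                  std::function<void()> init, std::function<void()> clear)
      : clear_(std::move(clear)) {
    KernelRegistry::Instance().Add(name, std::move(factory));  // registration
    if (init) init();                                          // data initialization
  }
  ~KernelRegistrar() { if (clear_) clear_(); }                 // data cleaning/release
 private:
  std::function<void()> clear_;
};

#define REG_KERNEL(name, KernelClass, init_fn, clear_fn)          \
  static KernelRegistrar g_##KernelClass##_registrar(             \
      name, []() -> void* { return new KernelClass(); },          \
      init_fn, clear_fn)

// Example invocation for the FuncOp kernel defined earlier (InitFunc and
// ClearFunc are hypothetical helper functions):
// REG_KERNEL("FuncName", FuncOp, InitFunc, ClearFunc);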
In an alternative embodiment, after registering the operator kernel function to be registered into the deep learning framework according to the multifunctional registration macro, the method further includes: compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's CMake configuration file; or by updating the framework's GCC build command; or by updating the framework's GXX build command. The registered operator kernel function can then be invoked in the deep learning framework.
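Purely as an illustration of this compile step, and assuming the registered kernel lives in a source file named func_op.cc with the framework headers available under a path held in FRAMEWORK_INCLUDE_DIR (all of these names are placeholders, not values defined by this application), the build configuration might be extended roughly as follows:

# Hypothetical CMakeLists.txt fragment; the target name, source file and
# include path are placeholders. A roughly equivalent GCC/GXX command would be:
#   g++ -std=c++17 -fPIC -shared func_op.cc -I${FRAMEWORK_INCLUDE_DIR} -o libfunc_op.so
add_library(func_op SHARED func_op.cc)
target_include_directories(func_op PRIVATE ${FRAMEWORK_INCLUDE_DIR})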
As described above, connecting operators to the deep learning framework is defined in a standardized way, which simplifies the operation of connecting an operator to the framework; the predefined multifunctional registration macro and the predefined shape output function greatly reduce the amount of code a developer must write to connect new operators for different hardware to the deep learning framework, and the standardized definition makes developing and connecting operators more convenient.
Likewise, through the standardized definition of this scheme, the same mode of operation can be adopted for different deep learning frameworks: a corresponding standardized interface is defined for each framework, and the macro determines which framework is in use so that the standardized flow of that framework is followed.
In addition, the method shields the developer from detailed operator-access operations of the existing framework: for example, only one Compute function is defined to realize the kernel function definition, and the other auxiliary functions are constructed as global static functions, which reduces the workload of developers connecting operators to the deep learning framework. It is also compatible with interfaces of existing frameworks, such as the kernel base class and the compute function, so that existing frameworks can be migrated to conveniently and quickly.
The operator registration method provided by the embodiment of the present application effectively reduces the coding complexity of defining operators for the deep learning framework, reduces the workload of developers connecting operators to the framework, lowers the complexity of use for developers, and makes it convenient to extend special-purpose operators in practical application scenarios, yielding a deep learning framework rich in operators. The deep learning framework may be one used in quantum computing, biological computing, artificial-intelligence robots, and so on.
In an exemplary scenario, the deep learning framework of the present application is the deep learning framework in an intelligent sweeping robot, and the hardware operators are those used in the intelligent sweeping robot; the problem addressed is how to quickly integrate these hardware operators into the framework's operator library. With the method of the present application, custom hardware operators can be quickly connected to the framework, yielding a deep learning framework rich in operators, which positively promotes the iterative upgrading of the intelligent sweeping robot.
In order to facilitate understanding of the operator registration method of the deep learning framework provided in the embodiment of the present application, the following description is made with reference to fig. 4. As shown in fig. 4, the method includes the following steps.
The technical scheme of the present application provides a standardized operator kernel function registration interface; by implementing this interface, operator kernel functions for different hardware can be registered into and aggregated in the deep learning framework.
Firstly, an operator function declaration Op is defined, which comprises the operator function name OpName, the input tensor input, the output tensor output, the parameter attr and the shape output function ShapeFunc;
further, the operator kernel function is defined; the operator kernel function only needs to define a Compute function, and the defined Compute function inherits from the base class of the deep learning framework;
further, the operator kernel function is registered: the kernel function is registered into the deep learning framework through the kernel function registration macro definition;
finally, the operator kernel is compiled, i.e. the operator kernel to be registered is compiled by updating the framework's CMake file or GCC/GXX command.
According to the operator registration method of the deep learning framework, a set of standard operator kernel function registration interfaces is provided, and operators of different heterogeneous hardware can be easily registered into the deep learning framework. According to the registration method, all operations of the operator can be completed by only realizing one computer function by the defined operator kernel function, other auxiliary functions are not required to be defined, output shape values are not required to be obtained through deduction calculation in a computer calculation flow, and the output shape values can be directly obtained in the following kernel definition through shape functionalization, so that the calculation amount is greatly reduced. The kernel function is not required to be realized according to the related hardware characteristics, so that the code development amount of a developer accessing an operator into the deep learning framework is reduced, and the grasping degree of the developer on the self structure of the deep learning framework is reduced.
The embodiment of the application also provides an operator registration device of the deep learning framework, which is used for executing the operator registration method of the deep learning framework of the embodiment, as shown in fig. 5, and the device comprises:
the operator declaration module 501 is configured to obtain information of an operator to be registered, define an operator kernel function declaration according to the information of the operator to be registered, where the operator kernel function declaration includes a preset shape output function;
the operator definition module 502 is configured to call the shape output function preset in the operator kernel function declaration to obtain an output shape value, and to define the operator kernel function according to the shape value, the operator kernel function being a Compute function;
the operator registration module 503 is configured to construct a multifunctional registration macro of the operator kernel function, and register the operator kernel function to be registered to the deep learning framework according to the multifunctional registration macro.
It should be noted that, when the operator registration device of the deep learning framework provided in the foregoing embodiment performs the operator registration method of the deep learning framework, only the division of the foregoing functional modules is used as an example, and in practical application, the foregoing functional allocation may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the operator registration device of the deep learning framework provided in the above embodiment belongs to the same concept as the operator registration method embodiment of the deep learning framework, which embodies the detailed implementation process and is not described herein.
The embodiment of the application also provides the electronic equipment corresponding to the operator registration method of the deep learning framework provided by the embodiment, so as to execute the operator registration method of the deep learning framework.
Referring to fig. 6, a schematic diagram of an electronic device according to some embodiments of the present application is shown. As shown in fig. 6, the electronic device includes: a processor 600, a memory 601, a bus 602 and a communication interface 603, the processor 600, the communication interface 603 and the memory 601 being connected by the bus 602; the memory 601 stores a computer program executable on the processor 600, and the processor 600 executes the operator registration method of the deep learning framework provided in any of the foregoing embodiments of the present application when executing the computer program.
The memory 601 may include a high-speed random access memory (RAM: Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 603 (which may be wired or wireless); the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
Bus 602 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be divided into address buses, data buses, control buses, etc. The memory 601 is configured to store a program, and the processor 600 executes the program after receiving an execution instruction, and the operator registration method of the deep learning framework disclosed in any of the foregoing embodiments of the present application may be applied to the processor 600 or implemented by the processor 600.
The processor 600 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the methods described above may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 600. The processor 600 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers or another storage medium well known in the art. The storage medium is located in the memory 601, and the processor 600 reads the information in the memory 601 and performs the steps of the method described above in combination with its hardware.
The electronic device provided by the embodiment of the application and the operator registration method of the deep learning framework provided by the embodiment of the application are the same in conception and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
The present application further provides a computer readable storage medium corresponding to the operator registration method of the deep learning framework provided in the foregoing embodiment, on which a computer program (i.e. a program product) is stored, where the computer program, when executed by a processor, performs the operator registration method of the deep learning framework provided in any of the foregoing embodiments.
It should be noted that examples of the computer readable storage medium may also include, but are not limited to, a phase change memory (PRAM), a Static Random Access Memory (SRAM), a Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, or other optical or magnetic storage medium, which will not be described in detail herein.
The computer readable storage medium provided by the above embodiment of the present application and the operator registration method of the deep learning framework provided by the embodiment of the present application have the same advantages as the method adopted, operated or implemented by the application program stored therein, because of the same inventive concept.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. An operator registration method of a deep learning framework, comprising:
acquiring information of an operator to be registered, and defining an operator kernel function declaration according to the information of the operator to be registered, wherein the operator kernel function declaration comprises a preset shape output function;
invoking the shape output function preset in the operator kernel function declaration to obtain an output shape value, and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function;
constructing a multifunctional registration macro of the operator kernel function, and registering the operator kernel function to be registered to a deep learning framework according to the multifunctional registration macro.
2. The method of claim 1, wherein defining an operator kernel function declaration based on the information of the operator to be registered comprises:
acquiring names, input tensors, output tensors and parameters of operators to be registered;
functionalizing the tensor shape variable to obtain the shape output function;
and defining the operator kernel function declaration according to the operator name, the input tensor, the output tensor, the parameters and the shape output function, wherein the operator kernel function declaration comprises the operator name, the input tensor, the output tensor, the parameters and the shape output function.
3. The method of claim 1, further comprising, prior to defining the operator kernel function from the shape values:
acquiring auxiliary functions needed in the operator kernel function module;
and constructing a global static function based on the auxiliary function, the global static function being executed automatically by the framework.
4. The method of claim 1, wherein defining the operator kernel function from the shape values comprises:
defining a Compute function, and realizing the operator kernel computation according to the Compute function;
and calling the shape value output by the shape output function through a GetOutShape function in the Compute calculation flow.
5. The method of claim 1, wherein constructing the multi-functional registration macro of the operator kernel function comprises:
constructing a registration macro, and adding a preset initialization function and a data cleaning function into the registration macro to obtain the multifunctional registration macro;
wherein the multifunctional registration macro is in the form of REG_KERNEL.
6. The method of claim 1, wherein registering operator kernel functions to be registered to a deep learning framework in accordance with the multi-function registration macro comprises:
invoking the multifunctional registration macro;
and executing data initialization and data release operation according to the multifunctional registration macro, and registering the operator kernel function to be registered to the deep learning framework.
7. The method of claim 1, further comprising, after registering the operator kernel function to be registered to a deep learning framework according to the multi-function registration macro:
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's CMake configuration file; or,
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's GCC build command; or,
compiling the registered operator kernel function into the corresponding operator dynamic library by updating the framework's GXX build command.
8. An operator registration apparatus of a deep learning framework, comprising:
an operator declaration module, configured to obtain information of an operator to be registered, define an operator kernel function declaration according to the information of the operator to be registered, where the operator kernel function declaration includes a preset shape output function;
the operator definition module is used for calling the shape output function preset in the operator kernel function declaration to obtain an output shape value, and defining the operator kernel function according to the shape value, wherein the operator kernel function is a Compute function;
the operator registration module is used for constructing a multifunctional registration macro of the operator kernel function and registering the operator kernel function to be registered to the deep learning framework according to the multifunctional registration macro.
9. An electronic device comprising a processor and a memory storing program instructions, the processor being configured, when executing the program instructions, to perform an operator registration method of a deep learning framework as claimed in any one of claims 1 to 7.
10. A computer readable medium having stored thereon computer readable instructions for execution by a processor to implement an operator registration method of a deep learning framework as claimed in any one of claims 1 to 7.
CN202310400779.5A 2023-04-14 2023-04-14 Operator registration method, device and equipment of deep learning framework and storage medium Active CN116107669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310400779.5A CN116107669B (en) 2023-04-14 2023-04-14 Operator registration method, device and equipment of deep learning framework and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310400779.5A CN116107669B (en) 2023-04-14 2023-04-14 Operator registration method, device and equipment of deep learning framework and storage medium

Publications (2)

Publication Number Publication Date
CN116107669A true CN116107669A (en) 2023-05-12
CN116107669B CN116107669B (en) 2023-08-18

Family

ID=86264183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310400779.5A Active CN116107669B (en) 2023-04-14 2023-04-14 Operator registration method, device and equipment of deep learning framework and storage medium

Country Status (1)

Country Link
CN (1) CN116107669B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220092410A1 (en) * 2020-09-24 2022-03-24 Advanced Micro Devices, Inc. Architected library interface for kernel fusion
CN112270399A (en) * 2020-09-29 2021-01-26 北京百度网讯科技有限公司 Operator registration processing method and device based on deep learning and electronic equipment
US11561826B1 (en) * 2020-11-12 2023-01-24 Xilinx, Inc. Scheduling processing of machine learning tasks on heterogeneous compute circuits
CN112558942A (en) * 2020-12-22 2021-03-26 上海商汤智能科技有限公司 Operator registration method and related product
CN113342346A (en) * 2021-05-18 2021-09-03 北京百度网讯科技有限公司 Operator registration method, device, equipment and storage medium of deep learning framework
US20220374238A1 (en) * 2021-05-18 2022-11-24 Beijing Baidu Netcom Science Technology Co., Ltd. Operator registration method and apparatus for deep learning framework, device and storage medium
CN113703768A (en) * 2021-07-13 2021-11-26 清华大学 Tensor program optimization method and device
CN114911465A (en) * 2022-05-19 2022-08-16 北京百度网讯科技有限公司 Operator generation method, device, equipment and storage medium
CN115469864A (en) * 2022-08-23 2022-12-13 安世亚太科技股份有限公司 Application development device and method based on atomization packaging command

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117389421A (en) * 2023-12-07 2024-01-12 浙江网商银行股份有限公司 Trusted access processing method and device, storage medium and electronic equipment
CN117389421B (en) * 2023-12-07 2024-05-14 浙江网商银行股份有限公司 Trusted access processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN116107669B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
US8286130B2 (en) Methods and systems for using type models to generate an implementation of a type
KR101795844B1 (en) Runtime system
US8966456B2 (en) System and method for providing and using meta-data in a dynamically typed array-based language
US9600411B2 (en) System and method for determining an object&#39;s lifetime in an object oriented environment
US8869100B1 (en) Data objects for model-based design
US8418134B2 (en) Method for efficiently managing property types and constraints in a prototype based dynamic programming language
JP7350923B2 (en) Deep learning framework operator registration method, equipment, device and storage medium
CN116107669B (en) Operator registration method, device and equipment of deep learning framework and storage medium
CN112765023A (en) Test case generation method and device
US9311111B2 (en) Programming environment with support for handle and non-handle user-created classes
JP2022545489A (en) Smart contract client program generation method, system, device, and medium
CN115934346B (en) Operator automatic detection method and device, electronic equipment and medium
CN114089975A (en) Expansion method and device of computing software, nonvolatile storage medium and processor
Di Natale et al. An MDA approach for the generation of communication adapters integrating SW and FW components from Simulink
CN109960709B (en) Database driver processing method, device, equipment and storage medium
CN112256249A (en) Method and equipment for expanding Android system function and computer storage medium
CN116737117A (en) Model development method based on Autosar architecture
CN112256355B (en) Data-driven system function loading method, equipment and storage medium
CN113656001A (en) Platform component development method and device, computer equipment and storage medium
CN116521181B (en) Script data processing method, device, equipment and medium based on game system
CN110045997B (en) Object initialization method, device, equipment and storage medium of basic function module
CN113778564B (en) Method, equipment and storage medium for efficiently executing EVM intelligent contract
US20090328020A1 (en) Interface optimization in a closed system
CN113971019A (en) Data type creating method, device, server and medium
CN110333870B (en) Simulink model variable distribution processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant