CN115509539A - Data calling method, device, equipment and medium - Google Patents


Info

Publication number
CN115509539A
Authority
CN
China
Prior art keywords
tensor
type
code
data
subgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211191109.9A
Other languages
Chinese (zh)
Inventor
王慕雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211191109.9A priority Critical patent/CN115509539A/en
Publication of CN115509539A publication Critical patent/CN115509539A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application discloses a data calling method, apparatus, device and medium in the field of artificial intelligence. The method comprises: abstractly converting a pre-acquired model to obtain a computation graph, and then performing graph partitioning optimization on the computation graph to obtain computation subgraphs; vectorizing each computation subgraph to obtain a tensor representation, determining the type of the tensor representation, and determining the corresponding target code according to that type; and processing the tensor representation with the target code to obtain low-level code, and sending the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file. This technical scheme enables multi-platform deployment of deep learning models, supports additional deep learning processors, and improves data calling efficiency.

Description

Data calling method, device, equipment and medium
Technical Field
The invention relates to the field of artificial intelligence, in particular to a data calling method, a data calling device, data calling equipment and a data calling medium.
Background
With the development and spread of Artificial Intelligence (AI), demand for completing tasks with deep learning networks keeps growing, and running a trained model quickly and stably in a real production environment has become one of the keys to putting AI applications into practice. The aim of AI deployment is to run a trained deep learning network model smoothly on a specific hardware platform while preserving the model's accuracy, low latency and stability. To meet this demand, major hardware vendors provide deployment tools for their own hardware platforms, such as Intel's OpenVINO and NVIDIA's TensorRT, so that deep learning models can conveniently run inference tasks on Intel Central Processing Unit (CPU) platforms and NVIDIA Graphics Processing Unit (GPU) platforms. TensorRT is an existing framework for deploying deep learning applications on NVIDIA GPUs. Based on the model's network definition and the optimization options set by the user, TensorRT performs optimization and builds an inference engine, then saves the inference engine in a serialized format for later deployment. To deploy the deep learning model on a specific platform, the TensorRT library is linked and the serialized file is deserialized into an inference engine on the corresponding hardware platform, which then waits for input data. In addition, TensorRT is currently only partly open source, which hinders in-depth development.
Therefore, how to realize deep learning model multi-platform deployment and support other deep learning processors in the data calling process is a problem to be solved in the field.
Disclosure of Invention
In view of this, the present invention provides a data calling method, apparatus, device and medium, which can implement multi-platform deployment of a deep learning model, support other deep learning processors, and improve data calling efficiency. The specific scheme is as follows:
in a first aspect, the present application discloses a data calling method, including:
abstractly converting a pre-acquired model to obtain a computation graph, and then performing graph partitioning optimization on the computation graph to obtain computation subgraphs;
vectorizing each computation subgraph to obtain a tensor representation, determining the type of the tensor representation, and determining the corresponding target code according to the type of the tensor representation;
and processing the tensor representation with the target code to obtain low-level code, and sending the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file.
Optionally, abstractly converting the pre-acquired model to obtain the computation graph includes:
abstractly converting the pre-acquired model with a preset front-end interpreter implemented as a Python script to obtain the computation graph, wherein the computation graph comprises operators and data flow information;
and saving the computation graph locally in TorchScript format.
Optionally, performing graph partitioning optimization on the computation graph to obtain computation subgraphs includes:
performing constant folding and operator fusion on the computation graph to obtain an optimized graph;
and partitioning the optimized graph to obtain the computation subgraphs.
Optionally, vectorizing the computation subgraph to obtain a tensor representation includes:
performing loop blocking and vectorization on the computation subgraph with AutoTVM from the open-source framework TVM to obtain the tensor representation.
Optionally, determining the type of the tensor representation and determining the corresponding target code according to that type includes:
determining the type of the tensor representation;
and selecting, from a preset code module library, the target code corresponding to the type of the tensor representation.
Optionally, processing the tensor representation with the target code to obtain low-level code, and sending the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, includes:
performing language conversion on the tensor representation with the target code to obtain the low-level code, and determining the compiler and the operator kernel corresponding to the type of the tensor representation;
and sending the low-level code to the compiler for compilation to obtain a binary file, and saving the binary file, in dynamic link library format, to the operator kernel corresponding to the type of the tensor representation.
Optionally, the server calling the data in the binary file includes:
the server determining a processor according to service requirements and establishing a connection to the operator kernel whose type matches that processor, so that the server calls the data in the binary file in that operator kernel over the connection.
In a second aspect, the present application discloses a data calling apparatus, including:
a graph partitioning module, configured to abstractly convert a pre-acquired model to obtain a computation graph and then perform graph partitioning optimization on the computation graph to obtain computation subgraphs;
a target code determining module, configured to vectorize each computation subgraph to obtain a tensor representation, determine the type of the tensor representation, and determine the corresponding target code according to the type of the tensor representation;
and a data calling module, configured to process the tensor representation with the target code to obtain low-level code, and send the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the data calling method.
In a fourth aspect, the present application discloses a computer storage medium for storing a computer program; wherein the computer program when executed by a processor implements the steps of the data call method disclosed in the foregoing.
In the data calling method of the application, a pre-acquired model is abstractly converted into a computation graph, and graph partitioning optimization is then applied to the computation graph to obtain computation subgraphs; each computation subgraph is vectorized to obtain a tensor representation, the type of the tensor representation is determined, and the corresponding target code is determined according to that type; the tensor representation is processed with the target code to obtain low-level code, and the low-level code is sent to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file. Based on compiler principles, the technical scheme develops an end-to-end deployment flow that achieves automatic optimization and multi-platform deployment of deep learning models, which addresses the narrow range of hardware platforms covered by current vendor-specific AI deployment tools. In some complex production environments, for example when the resources of a single type of AI processor are insufficient or that processor cannot fully support the model, the scheme allows the model to be deployed across several kinds of processors, and automatic performance tuning ensures the inference performance on each processor. The AI multi-platform deployment scheme provided by the application is highly extensible: extending the front-end interpreter adds support for more deep learning training frameworks, and extending the platform-specific optimization and code generation adds support for more AI processors.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a data calling method disclosed herein;
FIG. 2 is a flow chart of a data calling method disclosed herein;
FIG. 3 is a diagram of the tensor representation of a matrix multiplication disclosed herein;
FIG. 4 is an example of BANG C code disclosed herein;
FIG. 5 is an example of an equivalent transformation of the tensor representation of a matrix multiplication disclosed herein;
FIG. 6 is a detailed flow chart of a data call method disclosed herein;
FIG. 7 is a schematic diagram of a data call apparatus according to the present disclosure;
FIG. 8 is a block diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the development and spread of Artificial Intelligence (AI), demand for completing tasks with deep learning networks keeps growing, and running a trained model quickly and stably in a real production environment has become one of the keys to putting AI applications into practice. The aim of AI deployment is to run a trained deep learning network model smoothly on a specific hardware platform while preserving the model's accuracy, low latency and stability. To meet this demand, major hardware vendors provide deployment tools for their own hardware platforms, such as Intel's OpenVINO and NVIDIA's TensorRT, so that deep learning models can run inference tasks on Intel CPU (Central Processing Unit) platforms and NVIDIA GPU (Graphics Processing Unit) platforms. TensorRT is an existing framework for deploying deep learning applications on NVIDIA GPUs. Based on the model's network definition and the optimization options set by the user, TensorRT performs optimization and builds an inference engine, then saves the inference engine in a serialized format for later deployment. To deploy the deep learning model on a specific platform, the TensorRT library is linked and the serialized file is deserialized into an inference engine on the corresponding hardware platform, which then waits for input data. In addition, TensorRT is currently only partly open source, which hinders in-depth development. Therefore, how to achieve multi-platform deployment of deep learning models and support other deep learning processors in the data calling process is a problem to be solved in the field.
Based on compiler principles, the technical scheme of the invention develops an end-to-end deployment flow that achieves automatic optimization and multi-platform deployment of deep learning models, which addresses the narrow range of hardware platforms covered by current vendor-specific AI deployment tools. In some complex production environments, for example when the resources of a single type of AI processor are insufficient or that processor cannot fully support the model, the scheme allows the model to be deployed across several kinds of processors, and automatic performance tuning ensures the inference performance on each processor. The AI multi-platform deployment scheme provided by the application is highly extensible: extending the front-end interpreter adds support for more deep learning training frameworks, and extending the platform-specific optimization and code generation adds support for more AI processors.
Referring to fig. 1, an embodiment of the present invention discloses a data calling method, which may specifically include:
Step S11: abstractly converting a pre-acquired model to obtain a computation graph, and then performing graph partitioning optimization on the computation graph to obtain computation subgraphs.
In this embodiment, a preset front-end interpreter implemented as a Python script abstractly converts the pre-acquired model into a computation graph; the computation graph comprises operators and data flow information and is saved locally in TorchScript format. In other words, the front-end interpreter converts models trained under different deep learning frameworks into a self-defined, more general abstract intermediate representation, namely the computation graph, which contains the serialized operators and data flow information. For example, a model trained under the PyTorch framework is saved in TorchScript format, and the graph (Graph) it contains is converted by the front-end interpreter into the self-defined computation graph representation to simplify subsequent processing.
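The patent does not publish the layout of its self-defined computation graph, so the following is only a minimal sketch of what "operators plus data flow information" could look like. The class names `Node` and `ComputationGraph`, and every field in them, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operator in the graph (illustrative layout, not the patent's)."""
    name: str                                    # unique identifier of the node
    op: str                                      # operator type, e.g. "matmul"
    inputs: list = field(default_factory=list)   # producer node names (data flow)


@dataclass
class ComputationGraph:
    nodes: dict = field(default_factory=dict)

    def add(self, name, op, inputs=()):
        # Every input must already exist, which keeps the graph acyclic.
        for src in inputs:
            if src not in self.nodes:
                raise KeyError(f"unknown input {src!r}")
        self.nodes[name] = Node(name, op, list(inputs))
        return name

    def topological_order(self):
        # Insertion order is a valid topological order because add()
        # rejects references to nodes that have not been added yet.
        return list(self.nodes)

    def consumers(self, name):
        # All nodes that read the output of `name`.
        return [n.name for n in self.nodes.values() if name in n.inputs]


graph = ComputationGraph()
graph.add("x", "input")
graph.add("w", "constant")
graph.add("mm", "matmul", ["x", "w"])
graph.add("act", "relu", ["mm"])
```

Storing producer names on each node keeps the data flow explicit, which is what later graph-level passes such as fusion and partitioning need to query.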
In this embodiment, after the pre-acquired model has been abstractly converted into a computation graph, constant folding and operator fusion are performed on the computation graph to obtain an optimized graph, and the optimized graph is partitioned into computation subgraphs. That is, a series of optimizations is applied to the converted computation graph. Common computation graph optimization strategies, such as constant folding and operator fusion, act at the operator level: they apply equivalent transformations to the computation graph to reduce computation and memory overhead and to speed up inference. The serialized computation graph is then partitioned into several computation subgraphs so that the same model can be deployed across multiple platforms.
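The three graph-level steps named above can be sketched on a toy graph. This is an illustration under stated assumptions, not the patent's optimizer: the dictionary graph format, the `placement` map that assigns nodes to devices, and the single matmul-plus-relu fusion rule are all hypothetical.

```python
import operator

# Toy graph format (hypothetical): name -> (op, input names, constant value),
# with entries listed in topological order.
FOLDABLE = {"add": operator.add, "mul": operator.mul}


def fold_constants(graph):
    """Replace an operator whose inputs are all constants by one constant."""
    folded = {}
    for name, (op, inputs, value) in graph.items():
        if op in FOLDABLE and all(folded[i][0] == "const" for i in inputs):
            a, b = (folded[i][2] for i in inputs)
            folded[name] = ("const", [], FOLDABLE[op](a, b))
        else:
            folded[name] = (op, inputs, value)
    return folded


def prune(graph, outputs):
    """Drop nodes that no longer contribute to the requested outputs."""
    live, stack = set(), list(outputs)
    while stack:
        node = stack.pop()
        if node not in live:
            live.add(node)
            stack.extend(graph[node][1])
    return {n: v for n, v in graph.items() if n in live}


def fuse_matmul_relu(graph):
    """Fuse a matmul whose only consumer is a relu into one node."""
    fused = dict(graph)
    for name, (op, inputs, _) in graph.items():
        if op != "relu":
            continue
        src = inputs[0]
        users = [n for n, (_, ins, _) in graph.items() if src in ins]
        if graph[src][0] == "matmul" and users == [name]:
            fused[name] = ("matmul_relu", graph[src][1], None)
            del fused[src]
    return fused


def partition(graph, placement):
    """Group consecutive nodes placed on the same device into subgraphs."""
    subgraphs = []
    for name in graph:
        device = placement[name]
        if subgraphs and subgraphs[-1][0] == device:
            subgraphs[-1][1].append(name)
        else:
            subgraphs.append((device, [name]))
    return subgraphs


graph = {
    "x": ("input", [], None),
    "two": ("const", [], 2.0),
    "three": ("const", [], 3.0),
    "scale": ("mul", ["two", "three"], None),   # foldable: 2.0 * 3.0
    "w": ("const", [], None),                   # weight, value omitted
    "mm": ("matmul", ["x", "w"], None),
    "act": ("relu", ["mm"], None),
    "out": ("mul", ["act", "scale"], None),
}
optimized = fuse_matmul_relu(prune(fold_constants(graph), ["out"]))
placement = {"x": "GPU", "scale": "GPU", "w": "GPU", "act": "GPU", "out": "MLU"}
subgraphs = partition(optimized, placement)
```

Folding runs before fusion here so that the pruning step can discard the constants that fed the folded node; a production optimizer would typically iterate such passes until nothing changes.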
Step S12: vectorizing each computation subgraph to obtain a tensor representation, determining the type of the tensor representation, and determining the corresponding target code according to the type of the tensor representation.
Step S13: processing the tensor representation with the target code to obtain low-level code, and sending the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file.
In this embodiment, a pre-acquired model is abstractly converted into a computation graph, and graph partitioning optimization is then applied to the computation graph to obtain computation subgraphs; each computation subgraph is vectorized to obtain a tensor representation, the type of the tensor representation is determined, and the corresponding target code is determined according to that type; the tensor representation is processed with the target code to obtain low-level code, and the low-level code is sent to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file. Based on compiler principles, the application develops an end-to-end deployment flow that achieves automatic optimization and multi-platform remote deployment of deep learning models, which addresses the narrow range of hardware platforms covered by current vendor-specific AI deployment tools. In some complex production environments, for example when the resources of a single type of AI processor are insufficient or that processor cannot fully support the model, the scheme allows the model to be deployed across several kinds of processors, and automatic performance tuning ensures the inference performance on each processor.
The AI multi-platform deployment scheme provided by the application is highly extensible: extending the front-end interpreter adds support for more deep learning training frameworks, and extending the platform-specific optimization and code generation adds support for more AI processors.
Referring to fig. 2, an embodiment of the present invention discloses a data call method, which may specifically include:
Step S21: abstractly converting a pre-acquired model to obtain a computation graph, and then performing graph partitioning optimization on the computation graph to obtain computation subgraphs.
Step S22: performing loop blocking and vectorization on each computation subgraph with AutoTVM from the open-source framework TVM to obtain a tensor representation, then determining the type of the tensor representation, and selecting, from a preset code module library, the target code corresponding to the type of the tensor representation.
In this embodiment, the smallest unit of a computation graph is the operator. Each operator is further converted into a finer-grained tensor representation, such as the loop body shown in fig. 3. A loop body usually contains a large number of basic operations such as addition, subtraction, multiplication and division, accompanied by frequent data reads and writes, so it accounts for a significant share of the overall inference latency. Common loop optimizations include loop blocking and vectorization. After blocking and vectorization, the loop body in fig. 3 can be equivalently transformed into the form shown in fig. 4, where tvm_bang_load_matrix, tvm_bang_store_matrix and tvm_bang_conv are vectorized representations defined in the platform-specific optimizer. Different underlying hardware has different storage hierarchies, storage capacities at each level, hardware vector instructions and parallelization capabilities, so the block size, degree of parallelism and other parameters that give the best loop performance also differ. Loop optimization therefore depends on the hardware characteristics of the deployment platform, which must be chosen before optimization. In addition, to reduce the cost of manual parameter tuning, the automatic tuning module AutoTVM of the open-source framework TVM (Tensor Virtual Machine) can be added to the process to search automatically for the best parameters, such as loop block size and degree of parallelism. AutoTVM reads the loop body to be optimized, as shown in fig. 3, and selects the best parameters by comparing the performance of different block sizes on the target platform.
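To make loop blocking and block-size search concrete, here is a small pure-Python sketch. It does not use the TVM or AutoTVM API; the exhaustive timing loop in `tune_block_size` is a simplified stand-in for the kind of parameter search that AutoTVM automates, and all function names are illustrative.

```python
import time


def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, block):
    """Same computation with all three loops split into tiles of size `block`."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Work on one tile; min() handles sizes that block does not divide.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def tune_block_size(a, b, candidates):
    """Time each candidate block size and return the fastest one."""
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        matmul_blocked(a, b, block)
        timings[block] = time.perf_counter() - start
    return min(timings, key=timings.get), timings


n = 12
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
best_block, timings = tune_block_size(a, b, [2, 4, 8])
```

The blocked version is an equivalent transformation: for each output element the products are still accumulated in ascending order of `p`, so the result matches the naive loop exactly. Which block size wins depends on the machine, which is why the search has to run on the target platform.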
Step S23: processing the tensor representation with the target code to obtain low-level code, and sending the low-level code to the compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a server can call the data in the binary file.
In this embodiment, the target code performs language conversion on the tensor representation to obtain low-level code, and the compiler and the operator kernel corresponding to the type of the tensor representation are determined; the low-level code is sent to the compiler for compilation to obtain a binary file, and the binary file is saved, in dynamic link library format, to the operator kernel corresponding to the type of the tensor representation.
Specifically, the optimized abstract tensor representation is converted, one to one, into a language supported by the target hardware, for example CUDA C for NVIDIA GPUs and BANG C for Cambricon AI processors; the generated BANG C code is shown in fig. 5. Taking the example code in fig. 3: if the target platform is a Cambricon AI processor and the vectorized zero-initialization optimization defined for Cambricon processors (tvm_bang_mlp_init) is selected, that part of the representation is converted into the zero-initialization instruction of the BANG language (__bang_write_zero). Similarly, the matrix multiplication tensor representation in fig. 4, obtained after loop blocking and vectorization, is converted into data copy (__memcpy) and convolution (__bang_conv) instructions in BANG C. The generated low-level code is compiled by the corresponding Cambricon compiler (CNCC) into an executable binary file, which is saved in dynamic link library format so that it can be ported to remote platforms.
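The selection of target code by type can be pictured as a table lookup. The sketch below is hypothetical: the dictionary layout of `CODE_MODULES` is an assumption, the pairing of tvm_bang_load_matrix and tvm_bang_store_matrix with __memcpy is inferred from the description of fig. 4 rather than stated outright, and operand lists are deliberately elided because the text gives no BANG C signatures.

```python
# Hypothetical "code module library": one entry per target type.
# Operand lists are elided on purpose; real BANG C signatures are not
# given in the text and are not reproduced here.
CODE_MODULES = {
    "MLU": {
        "language": "BANG C",
        "compiler": "CNCC",
        "lowering": {
            "tvm_bang_mlp_init": "__bang_write_zero",
            "tvm_bang_load_matrix": "__memcpy",    # assumed pairing
            "tvm_bang_store_matrix": "__memcpy",   # assumed pairing
            "tvm_bang_conv": "__bang_conv",
        },
    },
    # The text names no CUDA C intrinsics, so the GPU table is left empty.
    "GPU": {"language": "CUDA C", "compiler": "NVCC", "lowering": {}},
}


def select_module(target_type):
    """Pick the code module that matches the type of the tensor representation."""
    if target_type not in CODE_MODULES:
        raise ValueError(f"no code module for target type {target_type!r}")
    return CODE_MODULES[target_type]


def lower(tensor_ops, target_type):
    """Translate vectorized ops one to one into target-language statements."""
    module = select_module(target_type)
    table = module["lowering"]
    statements = []
    for op in tensor_ops:
        if op not in table:
            raise NotImplementedError(f"{op} has no lowering for {target_type}")
        statements.append(f"{table[op]}(/* operands elided */);")
    return module["compiler"], statements


compiler, code = lower(
    ["tvm_bang_mlp_init", "tvm_bang_load_matrix", "tvm_bang_conv",
     "tvm_bang_store_matrix"],
    "MLU",
)
```

Keeping the lowering rules in per-target tables is one way to get the extensibility the description claims: supporting another processor would mean adding one more entry rather than changing the translation loop.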
In this embodiment, the server determines a processor according to service requirements and establishes a connection to the operator kernel whose type matches that processor, so that the server calls the data in the binary file in that operator kernel over the connection.
Specifically, the binary files are deployed on their corresponding target platforms, such as an NVIDIA graphics processor (GPU) and a Cambricon processor (MLU). During model inference, the host schedules the computation of the operator kernels deployed on the different processors, and data transfer between processors of different types also goes through the host.
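Host-side scheduling can be sketched as a dispatcher that runs subgraph kernels in order and relays data between devices. Everything here is illustrative: the `Host` class is hypothetical, and plain Python callables stand in for the entry points that a real deployment would load from the compiled dynamic link libraries.

```python
class Host:
    """Host-side scheduler; kernels are plain callables standing in for
    entry points that a real system would load from a shared library."""

    def __init__(self):
        self.kernels = {}   # (device type, subgraph name) -> callable

    def connect(self, device, subgraph, kernel):
        # Register the operator kernel deployed on a processor type.
        self.kernels[(device, subgraph)] = kernel

    def run(self, schedule, data):
        trace = []
        for device, subgraph in schedule:
            kernel = self.kernels[(device, subgraph)]
            trace.append(f"host->{device}")   # host sends inputs to the device
            data = kernel(data)
            trace.append(f"{device}->host")   # results return through the host
        return data, trace


host = Host()
host.connect("GPU", "subgraph1", lambda xs: [2 * x for x in xs])
host.connect("MLU", "subgraph2", lambda xs: [x + 1 for x in xs])
result, trace = host.run([("GPU", "subgraph1"), ("MLU", "subgraph2")], [1, 2, 3])
```

The trace shows that the GPU output never goes to the MLU directly; every transfer between processor types passes through the host, as the paragraph above describes.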
The overall multi-platform deployment flow of the application is shown in fig. 6. First, a preset front-end interpreter implemented as a Python script abstractly converts the pre-acquired model into a computation graph, which is saved locally in TorchScript format. A general optimizer then performs constant folding and operator fusion on the computation graph to obtain an optimized graph, and the optimized graph is partitioned into computation subgraphs (subgraph 1 and subgraph 2). Next, AutoTVM from the open-source framework TVM (the platform-specific optimizer) applies loop blocking and vectorization to each computation subgraph to obtain tensor representations (tensor representation 1 and tensor representation 2). The type of each tensor representation is determined, and the target code corresponding to that type is selected from a preset code module library. The target code performs language conversion on the tensor representation to obtain low-level code, and the compiler and operator kernel corresponding to the type of the tensor representation are determined. The low-level code is sent to that compiler for compilation to obtain a binary file, which is saved, in dynamic link library format, to the operator kernel corresponding to the type of the tensor representation.
For example, if the type of tensor representation 1 is GPU, the target code corresponding to that type is selected from the preset code module library and used to convert tensor representation 1 into low-level code; the corresponding compiler and operator kernel are NVCC (the NVIDIA CUDA compiler) and the GPU operator kernel, so the low-level code is sent to NVCC for compilation and the resulting binary file is saved to the GPU operator kernel in dynamic link library format. Similarly, if the type of tensor representation 2 is MLU, the target code corresponding to that type is selected from the preset code module library and used to convert tensor representation 2 into low-level code; the corresponding compiler and operator kernel are CNCC and the MLU operator kernel, so the low-level code is sent to CNCC for compilation and the resulting binary file is saved to the MLU operator kernel in dynamic link library format. The server then determines the processors according to service requirements and establishes a connection between each processor and the operator kernel of the same type, so that it can call the data in the binary files in those operator kernels. That is, if the server determines from the service requirements that a GPU and an MLU are needed, it establishes a connection between the GPU and the GPU operator kernel and between the MLU and the MLU operator kernel, and then calls the data in the binary files in both operator kernels over those connections.
In this embodiment, a preset front-end interpreter implemented as a Python script performs abstract conversion on a pre-acquired model to obtain a computation graph, and the computation graph is saved locally in TorchScript format. A general optimizer then performs a constant folding operation and an operator fusion operation on the computation graph to obtain an operated graph, and the operated graph is segmented to obtain computation subgraphs. Each computation subgraph is subjected to a loop blocking (tiling) operation and vectorization processing by using AutoTVM in the open source framework TVM to obtain a tensor expression. The type of the tensor expression is then determined, a target code corresponding to that type is screened out from a preset code module library, and the tensor expression is subjected to language conversion processing by using the target code to obtain a bottom layer code. A compiler and an operator kernel corresponding to the type of the tensor expression are determined, the bottom layer code is sent to the compiler for compilation processing to obtain a binary file, and the binary file is saved, in dynamic link library format, to the operator kernel corresponding to the type of the tensor expression. For example, if the type of tensor expression 1 is GPU, a target code corresponding to the type of tensor expression 1 is screened out from the preset code module library, tensor expression 1 is subjected to language conversion processing by using the target code to obtain a bottom layer code, the compiler and the operator kernel corresponding to the type of tensor expression 1 are determined, the bottom layer code is sent to NVCC for compilation processing to obtain a binary file, and the binary file is saved to the GPU operator kernel in dynamic link library format. Similarly, if the type of tensor expression 2 is MLU, a target code corresponding to the type of tensor expression 2 is screened out from the preset code module library, tensor expression 2 is subjected to language conversion processing by using the target code to obtain a bottom layer code, the compiler and the operator kernel corresponding to the type of tensor expression 2 are determined, the bottom layer code is sent to CNCC for compilation processing to obtain a binary file, and the binary file is saved to the MLU operator kernel in dynamic link library format. The service end then determines processors according to a service requirement and establishes a connection relation between each processor and the operator kernel of the same type, so that the service end calls the data in the binary files in the operator kernels according to the connection relation. That is, the service end determines a GPU and an MLU according to the service requirement, establishes connection relations between the GPU and the GPU operator kernel and between the MLU and the MLU operator kernel, and then calls the data in the binary files in the GPU operator kernel and the MLU operator kernel according to the connection relations.
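The GPU and MLU cases above can be summarized as a lookup from the type of a tensor expression to its compiler and operator kernel. The following is a minimal sketch of such a lookup, not the implementation of the embodiment; the class name `Target`, the function `resolve_target`, the lower-case compiler names and the kernel store labels are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    compiler: str      # compiler named in the embodiment (NVCC or CNCC)
    kernel_store: str  # hypothetical label for the matching operator kernel


# Mapping taken from the embodiment: GPU -> NVCC, MLU -> CNCC.
# The string values themselves are illustrative assumptions.
TARGETS = {
    "GPU": Target(compiler="nvcc", kernel_store="gpu_operator_kernel"),
    "MLU": Target(compiler="cncc", kernel_store="mlu_operator_kernel"),
}


def resolve_target(expr_type: str) -> Target:
    """Return the compiler and operator kernel for a tensor expression type."""
    try:
        return TARGETS[expr_type]
    except KeyError:
        raise ValueError(f"no compiler registered for type {expr_type!r}") from None


print(resolve_target("GPU"))
```

Supporting a further processor type would then amount to adding one entry to the table, which is in line with the extensibility described for the scheme.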
Referring to fig. 7, an embodiment of the present invention discloses a data invoking device, which may specifically include:
the image segmentation module 11 is configured to perform abstract conversion on a pre-acquired model to obtain a computation graph, and then perform image segmentation optimization on the computation graph to obtain each computation subgraph;
the target code determining module 12 is configured to perform vectorization processing on the computation subgraph to obtain a tensor expression, determine the type of the tensor expression, and determine a corresponding target code according to the type of the tensor expression;
and the data calling module 13 is configured to process the tensor expression by using the target code to obtain a bottom layer code, and send the bottom layer code to a compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a service end can call data in the binary file.
In this embodiment, abstract conversion is performed on the pre-acquired model to obtain a computation graph, and image segmentation optimization is then performed on the computation graph to obtain each computation subgraph; vectorization processing is performed on the computation subgraph to obtain a tensor expression, the type of the tensor expression is determined, and a corresponding target code is determined according to that type; the tensor expression is processed by using the target code to obtain a bottom layer code, and the bottom layer code is sent to a compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a service end can call data in the binary file. The technical solution of the invention is based on compiler principles and provides an end-to-end deployment process that realizes automatic optimization and multi-platform deployment of a deep learning model, which addresses the narrow range of hardware platforms supported by current customized AI deployment tools. In complex production environments, for example where a single type of artificial intelligence processor has insufficient resources or cannot fully support a model, the model can be deployed on multiple kinds of processors through this solution, and inference performance on each processor is ensured through automatic performance tuning. The artificial intelligence multi-platform deployment solution provided by the application is highly extensible: extending the front-end interpreter allows various deep learning training frameworks to be supported, and extending the target-platform-specific optimization and code generation allows various artificial intelligence processors to be supported.
In some specific embodiments, the image segmentation module 11 may be specifically configured for:
performing abstract conversion on a pre-acquired model by using a preset front-end interpreter implemented as a Python script to obtain a computation graph, wherein the computation graph comprises operators and data flow information;
and saving the computation graph locally in TorchScript format.
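A computation graph that carries operators and data flow information can be pictured as a set of nodes, each naming its operator and the nodes whose outputs it consumes. The sketch below uses only the Python standard library and assumes hypothetical names (`Node`, `topological_order`); it does not reproduce the front-end interpreter or the TorchScript serialization of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str                                        # unique node name
    op: str                                          # operator, e.g. "conv2d"
    inputs: List[str] = field(default_factory=list)  # producer nodes (data flow)


def topological_order(nodes: List[Node]) -> List[str]:
    """Order node names so that every producer precedes its consumers.

    Assumes the graph is acyclic and every input name refers to a node.
    """
    by_name: Dict[str, Node] = {n.name: n for n in nodes}
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for src in by_name[name].inputs:
            visit(src)
        order.append(name)

    for n in nodes:
        visit(n.name)
    return order


graph = [
    Node("out", "relu", ["conv"]),
    Node("conv", "conv2d", ["x", "w"]),
    Node("x", "input"),
    Node("w", "constant"),
]
print(topological_order(graph))  # ['x', 'w', 'conv', 'out']
```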
In some specific embodiments, the image segmentation module 11 may be specifically configured for:
performing a constant folding operation and an operator fusion operation on the computation graph to obtain an operated graph;
and segmenting the operated graph to obtain each computation subgraph.
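Constant folding replaces any operator whose operands are all constants with the precomputed result, so that the work is not repeated at inference time. The following toy example illustrates the idea on a dictionary-based graph; the graph encoding and the function name `constant_fold` are assumptions made for illustration, and operator fusion and graph segmentation are not shown.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
FOLDABLE = {"add": operator.add, "mul": operator.mul}


def constant_fold(graph):
    """Fold operators whose operands are all constants.

    graph maps a node name to one of:
      ("const", value), ("input",), or (op, lhs_name, rhs_name).
    Returns a new graph; nodes that depend on an input are left unchanged.
    """
    folded = {}

    def resolve(name):
        if name in folded:
            return folded[name]
        node = graph[name]
        if node[0] in FOLDABLE:
            lhs, rhs = resolve(node[1]), resolve(node[2])
            if lhs[0] == "const" and rhs[0] == "const":
                node = ("const", FOLDABLE[node[0]](lhs[1], rhs[1]))
        folded[name] = node
        return node

    for name in graph:
        resolve(name)
    return folded


g = {
    "a": ("const", 2),
    "b": ("const", 3),
    "c": ("mul", "a", "b"),  # depends only on constants, so it is folded
    "x": ("input",),
    "y": ("add", "x", "c"),  # depends on a runtime input, so it is kept
}
print(constant_fold(g)["c"])  # ('const', 6)
```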
In some specific embodiments, the target code determining module 12 may be specifically configured for:
performing a loop blocking (tiling) operation and vectorization processing on the computation subgraph by using AutoTVM in the open source framework TVM to obtain a tensor expression.
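Loop blocking splits each loop of an operator into an outer loop over tiles and an inner loop within a tile, which improves data reuse without changing the result. The sketch below shows the transformation by hand on a matrix multiplication in plain Python; it does not call TVM, the tile size is chosen manually rather than searched as AutoTVM would, and vectorization is not shown.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation with each loop split into tile-sized blocks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # outer loops walk over tiles
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):  # inner loops stay in a tile
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[i + j for j in range(4)] for i in range(3)]
b = [[i * j + 1 for j in range(5)] for i in range(4)]
print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))  # True
```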
In some specific embodiments, the target code determining module 12 may be specifically configured for:
determining the type of the tensor expression;
and screening out, from a preset code module library, a target code corresponding to the type of the tensor expression.
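One way to picture the preset code module library is as a table of source templates keyed by type, from which the entry matching the type of the tensor expression is selected and then filled in. The sketch below is only an assumption about how such a library could look: the template contents, the `HOST_C` entry (a type not named in this document) and the functions `select_target_code` and `lower_to_source` are all hypothetical.

```python
from string import Template

# Hypothetical code module library: one source template per type.
CODE_MODULE_LIBRARY = {
    "GPU": Template(
        'extern "C" __global__ void $name(const float* a, const float* b,\n'
        "                                 float* out, int n) {\n"
        "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "  if (i < n) out[i] = a[i] $op b[i];\n"
        "}\n"
    ),
    # Illustrative extra entry; not a type named in the source document.
    "HOST_C": Template(
        "void $name(const float* a, const float* b, float* out, int n) {\n"
        "  for (int i = 0; i < n; ++i) out[i] = a[i] $op b[i];\n"
        "}\n"
    ),
}


def select_target_code(expr_type: str) -> Template:
    """Screen out the template that matches the tensor expression type."""
    if expr_type not in CODE_MODULE_LIBRARY:
        raise LookupError(f"no target code for type {expr_type!r}")
    return CODE_MODULE_LIBRARY[expr_type]


def lower_to_source(expr_type: str, name: str, op: str) -> str:
    """Fill the selected template to obtain low-level source text."""
    return select_target_code(expr_type).substitute(name=name, op=op)


print(lower_to_source("GPU", "vec_add", "+"))
```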
In some specific embodiments, the data calling module 13 may be specifically configured for:
performing language conversion processing on the tensor expression by using the target code to obtain a bottom layer code, and determining a compiler and an operator kernel corresponding to the type of the tensor expression;
and sending the bottom layer code to the compiler for compilation to obtain a binary file, and saving the binary file, in dynamic link library format, to the operator kernel corresponding to the type of the tensor expression.
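For the GPU case, compiling the bottom layer code into a dynamic link library could look like the command assembled below. This is a sketch under stated assumptions: the function name and the `lib<name>.so` naming are illustrative, the command is only built and not executed, and no CNCC command is shown because this document does not state its options.

```python
from pathlib import Path
from typing import List


def nvcc_shared_library_command(source: str, out_dir: str) -> List[str]:
    """Build an nvcc command line that emits a shared library.

    --shared asks nvcc to produce a shared library, and -Xcompiler -fPIC
    forwards the position-independent-code flag to the host compiler.
    """
    src = Path(source)
    library = Path(out_dir) / f"lib{src.stem}.so"
    return ["nvcc", "--shared", "-Xcompiler", "-fPIC", str(src), "-o", str(library)]


cmd = nvcc_shared_library_command("vec_add.cu", "gpu_operator_kernel")
print(" ".join(cmd))
```

On a machine with the CUDA toolkit installed, the list could be passed to `subprocess.run(cmd, check=True)` to perform the compilation.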
In some specific embodiments, the data calling module 13 may be specifically configured for:
determining, by the service end, a processor according to a service requirement, and establishing a connection relation between the processor and the operator kernel of the same type as the processor, so that the service end calls the data in the binary file in the operator kernel according to the connection relation.
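The connection step pairs each processor chosen by the service end with the operator kernel of the same type. A minimal sketch of that matching rule follows; the function `connect`, the dictionary layout and the library paths are hypothetical and stand in for whatever bookkeeping the service end actually uses.

```python
from typing import Dict, Iterable, List


def connect(required: Iterable[str],
            kernel_stores: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each required processor type to the libraries of the same type.

    kernel_stores maps a type such as "GPU" to the dynamic link libraries
    saved in the operator kernel of that type.
    """
    connections = {}
    for proc in required:
        if proc not in kernel_stores:
            raise LookupError(f"no operator kernel of type {proc!r}")
        connections[proc] = list(kernel_stores[proc])
    return connections


stores = {
    "GPU": ["gpu_operator_kernel/libvec_add.so"],  # hypothetical paths
    "MLU": ["mlu_operator_kernel/libvec_add.so"],
}
print(connect(["GPU", "MLU"], stores))
```

On a real deployment, each connected path could then be loaded with `ctypes.CDLL(path)` so that the compiled operators become callable.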
The technical solution of the invention is based on compiler principles and provides an end-to-end deployment process that realizes automatic optimization and multi-platform deployment of a deep learning model, which addresses the narrow range of hardware platforms supported by current customized AI deployment tools. In complex production environments, for example where a single type of artificial intelligence processor has insufficient resources or cannot fully support a model, the model can be deployed on multiple kinds of processors through this solution, and inference performance on each processor is ensured through automatic performance tuning. The technical key point of the method is that an end-to-end deployment process is provided to achieve automatic optimization and multi-platform remote deployment of a deep learning model. The artificial intelligence multi-platform deployment solution provided by the application is highly extensible: extending the front-end interpreter allows various deep learning training frameworks to be supported, and extending the target-platform-specific optimization and code generation allows various artificial intelligence processors to be supported.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the data calling method executed by the electronic device as disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored thereon include an operating system 221, a computer program 222, data 223, and the like, and the storage may be transient or permanent.
The operating system 221, which may be Windows, Unix, Linux, or the like, is used for managing and controlling each hardware device and the computer program 222 on the electronic device 20, so that the processor 21 can operate on and process the data 223 in the memory 22. In addition to the computer program that performs the data calling method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks. The data 223 may include data received by the data calling device from an external device, data collected by the input/output interface 25, and the like.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Further, an embodiment of the present application further discloses a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is loaded and executed by a processor, the steps of the data calling method disclosed in any of the foregoing embodiments are implemented.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The data calling method, device, equipment and storage medium provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A data calling method, comprising:
carrying out abstract conversion on a pre-acquired model to obtain a calculation graph, and then carrying out image segmentation optimization on the calculation graph to obtain each calculation subgraph;
vectorizing the computed subgraph to obtain tensor expression, determining the type of the tensor expression, and determining a corresponding target code according to the type of the tensor expression;
and processing the tensor expression by using the target code to obtain a bottom code, and sending the bottom code to a compiler corresponding to the type of the computational subgraph to obtain a binary file so that a service end can call data in the binary file.
2. The data calling method according to claim 1, wherein the performing abstract transformation on the pre-obtained model to obtain the computation graph comprises:
carrying out abstract conversion on a pre-acquired model by using a preset front-end interpreter realized by a Python script to obtain a calculation graph; the computational graph comprises operators and data flow information;
saving the computation graph to local in TorchScript format.
3. The data calling method of claim 1, wherein the performing image segmentation optimization on the computation graph to obtain each computation subgraph comprises:
performing a constant folding operation and an operator fusion operation on the computation graph to obtain an operated graph;
and segmenting the operated graph to obtain each computation subgraph.
4. The data calling method of claim 1, wherein the vectorizing the computed subgraph to obtain tensor expression comprises:
performing a loop blocking (tiling) operation and vectorization processing on the computation subgraph by using AutoTVM in the open source framework TVM to obtain the tensor expression.
5. The data calling method according to any one of claims 1 to 4, wherein the determining the type of tensor representation and the determining the corresponding target code according to the type of tensor representation includes:
determining the type of tensor expression;
and screening out a target code corresponding to the type expressed by the tensor from a preset code module library according to the type expressed by the tensor.
6. The data calling method of claim 5, wherein the processing the tensor expression by using the target code to obtain a bottom code, and sending the bottom code to a compiler corresponding to the type of the computational subgraph to obtain a binary file comprises:
performing language conversion processing on the tensor expression by using the target code to obtain the bottom code, and determining a compiler and an operator kernel corresponding to the type of the tensor expression;
and sending the bottom code to the compiler for compilation to obtain the binary file, and saving the binary file, in dynamic link library format, to the operator kernel corresponding to the type of the tensor expression.
7. The data calling method of claim 6, wherein enabling the service end to call the data in the binary file comprises:
determining, by the service end, a processor according to a service requirement, and establishing a connection relation between the processor and the operator kernel of the same type as the processor, so that the service end calls the data in the binary file in the operator kernel according to the connection relation.
8. A data call apparatus, comprising:
the image segmentation module is used for carrying out abstract conversion on a pre-acquired model to obtain a calculation graph and then carrying out image segmentation optimization on the calculation graph to obtain each calculation subgraph;
the target code determining module is used for vectorizing the computed subgraph to obtain tensor expression, determining the type of the tensor expression and determining a corresponding target code according to the type of the tensor expression;
and the data calling module is used for processing the tensor expression by using the target code to obtain a bottom code, and sending the bottom code to a compiler corresponding to the type of the computation subgraph to obtain a binary file, so that a service end can call data in the binary file.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the data call method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements the data call method of any of claims 1 to 7.
CN202211191109.9A 2022-09-28 2022-09-28 Data calling method, device, equipment and medium Pending CN115509539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211191109.9A CN115509539A (en) 2022-09-28 2022-09-28 Data calling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211191109.9A CN115509539A (en) 2022-09-28 2022-09-28 Data calling method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115509539A true CN115509539A (en) 2022-12-23

Family

ID=84505734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211191109.9A Pending CN115509539A (en) 2022-09-28 2022-09-28 Data calling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115509539A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115796284A (en) * 2023-02-08 2023-03-14 苏州浪潮智能科技有限公司 Inference method, inference device, storage medium and equipment based on TVM compiler
CN117971251A (en) * 2024-04-01 2024-05-03 深圳市卓驭科技有限公司 Software deployment method, device, storage medium and product
WO2024139812A1 (en) * 2022-12-28 2024-07-04 华为技术有限公司 Resource type data transmission method and related apparatus

Similar Documents

Publication Publication Date Title
CN115509539A (en) Data calling method, device, equipment and medium
US20220092439A1 (en) Decoupled architecture for artificial intelligence model management
CN111209005A (en) Method and apparatus for compiling program file, and computer-readable storage medium
CN111527501A (en) Chip adaptation determining method and related product
EP4258175A1 (en) Node fusion method for computational graph, and device
CN114691148B (en) Model reasoning acceleration method, device, electronic equipment and storage medium
CN115686527A (en) Compiling method and device based on operator, computer equipment and storage medium
EP3866443A1 (en) Opc ua server, system operating using opc ua, and method of executing opc ua system
US20240062116A1 (en) Model processing method and apparatus
CN115423101A (en) Tensor data calculation reasoning method and device based on compiler and storage medium
CN111144571A (en) Deep learning reasoning operation method and middleware
CN114186678B (en) Hardware adaptation device and method based on deep learning
CN110941655B (en) Data format conversion method and device
CN112465112B (en) nGraph-based GPU (graphics processing Unit) rear-end distributed training method and system
US20220172044A1 (en) Method, electronic device, and computer program product for deploying machine learning model
CN113626035B (en) Neural network compiling method facing RISC-V equipment based on TVM
CN114564156A (en) Model slicing method and device, 3D printing system and electronic equipment
CN115237457A (en) AI application operation method and related product
CN114253595A (en) Code warehouse management method and device, computer equipment and storage medium
CN111459576B (en) Data analysis processing system and model operation method
CN116341633B (en) Model deployment method, device, equipment and storage medium
US20230110520A1 (en) Ui service package generation and registration method and apparatus, and ui service loading method and apparatus
CN110837896B (en) Storage and calling method and device of machine learning model
CN113391816A (en) Python code resource loading method and device
CN105183490A (en) Method and device for migrating offline processing logic to real-time processing frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination