CN116415665A - Conversion method and conversion device of end-side inference model - Google Patents

Conversion method and conversion device of an end-side inference model

Info

Publication number
CN116415665A
CN116415665A (application CN202111672289.8A)
Authority
CN
China
Prior art keywords
model
operator
inference
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111672289.8A
Other languages
Chinese (zh)
Inventor
曾华荣
韩峰
涂威威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority to CN202111672289.8A
Publication of CN116415665A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a conversion method and a conversion device for an end-side inference model. The conversion method includes: acquiring a network code segment of a server-side inference model, wherein the server-side inference model is a model trained to convergence in advance on a server side; determining a network topology and a first operator of the server-side inference model by parsing the network code segment, and judging whether a first preset interface supports the first operator, wherein the first preset interface is used for calling an operator corresponding to the first operator in an end-side operator library; when the end-side operator library does not support the first operator, calling a custom second operator through a second preset interface; and converting the server-side inference model into the end-side inference model based on the network topology and the second operator. The conversion method can realize the conversion from a server-side inference model to an end-side inference model semi-automatically.

Description

Conversion method and conversion device of an end-side inference model
Technical Field
The present disclosure relates generally to the field of computer technology, and more particularly, to a conversion method and a conversion apparatus for an end-side inference model.
Background
End-side devices include mobile devices, Internet-of-Things devices, and the like. Because they sit closer to the user, they can speed up data processing and transmission, reduce latency, keep working under low-bandwidth conditions, reduce the exposure of data to the public network, and protect data privacy. However, end-side devices have limited computing performance and low power budgets; since training a machine learning model requires substantial computing resources, training tasks are generally not executed on end-side devices, which run only inference tasks. At present, the system running on an end-side device is usually an embedded system with a specially customized inference framework. For example, NVIDIA's Jetson development boards support model inference with the TensorRT framework, but a server-side inference model built under a server-side inference framework must first be converted into an end-side inference model under the end-side inference framework.
Disclosure of Invention
The present disclosure provides a conversion method and a conversion device for an end-side inference model, to at least address the problems described above.
According to an aspect of the present disclosure, there is provided a conversion method of an end-side inference model, the conversion method including: acquiring a network code segment of a server-side inference model, wherein the server-side inference model is a model trained to convergence in advance on a server side; determining a network topology and a first operator of the server-side inference model by parsing the network code segment, and judging whether a first preset interface supports the first operator, wherein the first preset interface is used for calling an operator corresponding to the first operator in an end-side operator library; when the end-side operator library does not support the first operator, calling a custom second operator through a second preset interface; and converting the server-side inference model into the end-side inference model based on the network topology and the second operator.
Optionally, the second operator is determined based on a base class for custom operators.
Optionally, the base classes include a data processing base class for formatting data input to the model or data to be output by the model, and a model inference base class for performing inference tasks.
Optionally, the conversion method further includes: acquiring network parameters of the server-side inference model; and acquiring weight data of the second operator based on the network parameters.
Optionally, the conversion method further includes: acquiring test data; and inputting the test data into the server-side inference model and the end-side inference model respectively, so as to perform precision alignment on the server-side inference model and the end-side inference model.
Optionally, inputting the test data into the server-side inference model and the end-side inference model respectively to perform precision alignment includes: performing precision alignment on the server-side inference model and the end-side inference model by comparing the final output data of the server-side inference model with the final output data of the end-side inference model.
Optionally, the conversion method further includes: configuring output hooks at intermediate network layers of the server-side inference model and the end-side inference model respectively, wherein the output hooks are used for acquiring intermediate output data of the intermediate network layers.
Optionally, inputting the test data into the server-side inference model and the end-side inference model respectively to perform precision alignment includes: acquiring intermediate output data of the server-side inference model and of the end-side inference model by using the output hooks; and performing precision alignment on the two models by comparing their intermediate output data and comparing their final output data.
According to another aspect of the present disclosure, there is provided a conversion apparatus of an end-side inference model, the conversion apparatus including: a code acquisition unit configured to acquire a network code segment of a server-side inference model, wherein the server-side inference model is a model trained to convergence in advance on a server side; a network parsing unit configured to determine a network topology and a first operator of the server-side inference model by parsing the network code segment, and to judge whether a first preset interface supports the first operator, wherein the first preset interface is used for calling an operator corresponding to the first operator in an end-side operator library; an operator calling unit configured to call a custom second operator through a second preset interface when the end-side operator library does not support the first operator; and a model conversion unit configured to convert the server-side inference model into the end-side inference model based on the network topology and the second operator.
Optionally, the second operator is determined based on a base class for custom operators.
Optionally, the base classes include a data processing base class for formatting data input to the model or data to be output by the model, and a model inference base class for performing inference tasks.
Optionally, the conversion apparatus further includes a context management unit configured to acquire network parameters of the server-side inference model and to acquire weight data of the second operator based on the network parameters.
Optionally, the conversion apparatus further includes an offline estimation unit configured to acquire test data and to input the test data into the server-side inference model and the end-side inference model respectively, so as to perform precision alignment on the two models.
Optionally, the offline estimation unit is configured to perform precision alignment on the server-side inference model and the end-side inference model by comparing their final output data.
Optionally, the conversion apparatus further includes an intermediate configuration unit configured to configure output hooks at intermediate network layers of the server-side inference model and the end-side inference model respectively, wherein the output hooks are used for acquiring intermediate output data of those layers.
Optionally, the offline estimation unit is configured to acquire intermediate output data of the server-side inference model and of the end-side inference model by using the output hooks, and to perform precision alignment on the two models by comparing their intermediate output data and comparing their final output data.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the conversion method of an end-side inference model as described above.
According to another aspect of the present disclosure, there is provided a system including at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the conversion method of an end-side inference model as described above.
The conversion method and conversion device for an end-side inference model according to embodiments of the disclosure combine the manual and fully automatic conversion schemes: common simple operators are converted automatically while special and composite operators are implemented through customization, so end-to-end model conversion is achieved directly in a semi-automatic manner, reducing the operator-support problems introduced by intermediate links and improving the efficiency and compatibility of model conversion. In addition, according to the conversion method of an end-side inference model in exemplary embodiments of the disclosure, the model inference process can be abstracted into three steps: preprocessing, model inference computation, and post-processing. For a new conversion task, only the corresponding core logic (preprocessing, model inference computation, and post-processing) needs to be implemented on the basis of the corresponding base classes to complete end-to-end offline inference prediction. Moreover, by using output hooks, precision alignment between the server-side inference model under the server-side inference framework and the end-side inference model under the end-side inference framework can be realized automatically, further improving debugging efficiency.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
These and/or other aspects and advantages of the present disclosure will become apparent from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart illustrating a method of converting an end-side inference model in accordance with an exemplary embodiment of the present disclosure;
FIG. 2 is a conversion framework schematic diagram illustrating an end-side inference model in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating a conversion apparatus of an end-side inference model according to an exemplary embodiment of the present disclosure.
Detailed Description
The following description is provided with reference to the accompanying drawings to assist in a comprehensive understanding of embodiments of the invention as defined by the claims and their equivalents. Various specific details are included to aid understanding, but they are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "any combination of the items", and "all of the items". For example, "including at least one of A and B" covers three cases: (1) including A; (2) including B; (3) including A and B. Likewise, "executing at least one of step one and step two" covers three cases: (1) executing step one; (2) executing step two; (3) executing step one and step two.
At present, end-side inference model conversion is mainly performed in two ways: manual conversion and fully automatic conversion. Manual conversion means rebuilding the inference network by hand in the language provided by the end-side inference framework. This process is time-consuming and inefficient; for a complex network model, conversion can only begin after the network structure is fully understood, and it is difficult to accurately align the inference results of the converted model. Fully automatic conversion, such as onnx2trt, reduces the coding effort, but the conversion chain is longer: the model must first be converted from a framework such as PyTorch or TensorFlow into ONNX before it is further converted to TensorRT. On the one hand, this conversion chain introduces extra problems, such as whether ONNX supports a given operator; on the other hand, it requires strictly matched versions between ONNX and TensorRT.
To solve the above technical problems, the present disclosure provides a semi-automatic conversion method and apparatus for an end-side inference model. On the one hand, code replacement is performed at the module-import stage, which avoids the redundant work of writing the network from scratch; on the other hand, complex custom operators are implemented through a plugin interface. The conversion method and conversion device for an end-side inference model according to embodiments of the disclosure can be deployed, for example, on NVIDIA's Jetson platform to realize end-to-end model conversion, reducing the operator-support problems caused by intermediate formats and improving conversion efficiency. In addition, the model inference process can be abstracted into three steps: preprocessing, model inference computation, and post-processing; for a new conversion task, only the corresponding core logic needs to be implemented on the basis of the corresponding base classes to complete end-to-end offline inference prediction. Meanwhile, by using output hooks, precision alignment between the server-side inference model under the server-side inference framework and the end-side inference model under the end-side inference framework can be better realized, further improving debugging efficiency.
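As an example, the module-import replacement can be pictured with the following minimal sketch (the package name edge_nn, the SymbolicTensor tracer, and the recording scheme are assumptions of this illustration, not identifiers from the disclosure):

    # Hypothetical import-stage replacement: the network code keeps its
    # "import ... as nn" line, but a shim package is imported instead of
    # the server-side framework, so the same code records a topology.
    #
    # Server-side network code originally begins with:
    #     import torch.nn as nn
    # After the one-line replacement it begins with:
    #     import edge_nn as nn

    class SymbolicTensor:
        """Carries the topology being traced through the unchanged network code."""
        def __init__(self):
            self.graph = []

    class Conv2d:
        """Stand-in for the server-side Conv2d: records a node, computes nothing."""
        def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
            self.spec = ("Conv2d", in_channels, out_channels, kernel_size, kwargs)

        def __call__(self, x):
            x.graph.append(self.spec)  # x is a SymbolicTensor
            return x

    x = SymbolicTensor()
    x = Conv2d(3, 64, 3)(x)  # x.graph now holds ("Conv2d", 3, 64, 3, {})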
A conversion method and a conversion apparatus of an end-side inference model according to an exemplary embodiment of the present disclosure will be described in detail below with reference to fig. 1 to 3.
Fig. 1 is a flowchart illustrating a conversion method of an end-side inference model according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step S101, a network code segment of a server-side inference model may be acquired. Here, the server-side inference model is a model trained to convergence in advance on the server side. The network code segment is written in the server-side framework language.
Next, in step S102, the network topology and the first operator of the server-side inference model may be determined by parsing the network code segment, and it may be judged whether the first preset interface supports the first operator. Here, the first preset interface may be used to call the operator corresponding to the first operator in the end-side operator library.
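As an example, the support check of step S102 might be sketched as follows (the operator set and the helper split_operators are illustrative assumptions; the disclosure does not give concrete code):

    # Illustrative support check: operators parsed from the network code
    # segment are split into those the end-side operator library can supply
    # through the first preset interface and those needing a custom operator.
    SUPPORTED_OPS = {"Conv2d", "BatchNorm2d", "ReLU", "MaxPool2d", "Linear"}

    def split_operators(parsed_ops):
        supported = [op for op in parsed_ops if op in SUPPORTED_OPS]
        unsupported = [op for op in parsed_ops if op not in SUPPORTED_OPS]
        return supported, unsupported

    # Example: a detection head that uses a custom non-maximum-suppression operator
    supported, unsupported = split_operators(["Conv2d", "ReLU", "CustomNMS"])
    # supported -> ["Conv2d", "ReLU"]; unsupported -> ["CustomNMS"]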
Next, in step S103, when the end-side operator library does not support the first operator, a custom second operator may be called through a second preset interface. Here, the second operator may be determined based on a base class for custom operators. With such a base class, a new model conversion task only requires fine-tuning on the basis of the base class to implement the new class and complete the conversion. Further, the base classes may include a data processing base class, used to format data input to the model or data to be output by the model, and a model inference base class, used to perform inference tasks. According to an exemplary embodiment of the present disclosure, the model inference process may be abstracted into three steps: preprocessing, model inference computation, and post-processing. Preprocessing and post-processing both operate on data: for a specific model, the accepted input format and the produced output format are fixed, and they may differ from the data format provided by the service and the output format the task requires; the original data therefore needs to be preprocessed into a format the model accepts, and the model output needs to be post-processed into the format the task requires. However, the disclosure is not limited thereto; the second operator may also be customized directly according to a special operator or composite operator of the server-side inference model. Through the customization of the base class or of the second operator, the conversion of the end-side inference model according to exemplary embodiments of the present disclosure can be achieved semi-automatically.
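As an example, the two base classes might be sketched in Python as follows (all class and method names here are assumptions for illustration, not code from the disclosure):

    # Illustrative base classes for custom operators: a data processing base
    # class for pre-/post-processing and a model inference base class for
    # the inference computation itself.
    from abc import ABC, abstractmethod
    import numpy as np

    class DataProcessorBase(ABC):
        @abstractmethod
        def preprocess(self, raw: np.ndarray) -> np.ndarray:
            """Convert raw service data into the format the model accepts."""

        @abstractmethod
        def postprocess(self, out: np.ndarray) -> np.ndarray:
            """Convert model output into the format the task requires."""

    class ModelInferenceBase(ABC):
        @abstractmethod
        def infer(self, inputs: np.ndarray) -> np.ndarray:
            """Execute the inference task on preprocessed inputs."""

    # A new conversion task only fine-tunes the base classes:
    class ImageClassifierProcessor(DataProcessorBase):
        def preprocess(self, raw):
            return raw.astype(np.float32) / 255.0  # e.g. scale pixels to [0, 1]

        def postprocess(self, out):
            return int(np.argmax(out, axis=-1)[0])  # e.g. return a class index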
Next, in step S104, the server-side inference model may be converted into the end-side inference model based on the network topology and the second operator. According to an exemplary embodiment of the present disclosure, network parameters of the server-side inference model may also be acquired, and the weight data of the second operator may then be obtained from those network parameters. As an example, the end-side inference model may be formed, in whole or in part, from the obtained network topology, the second operator, and the weight data of the second operator, thereby implementing the conversion of the end-side inference model according to exemplary embodiments of the present disclosure.
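As an example, assuming a PyTorch server-side model whose checkpoint file stores a state dict, the weight hand-off could look like this (the file name, layer name, and helper are illustrative):

    # Illustrative weight extraction: read the pretrained network parameters
    # once, then look up the weight data each converted operator needs.
    import torch

    state_dict = torch.load("pretrained_model.pth", map_location="cpu")

    def weights_for(layer_name):
        """Return (weight, bias) numpy arrays for one converted operator."""
        weight = state_dict[f"{layer_name}.weight"].numpy()
        bias = state_dict.get(f"{layer_name}.bias")
        return weight, None if bias is None else bias.numpy()

    w, b = weights_for("backbone.conv1")  # layer name is illustrative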
According to an exemplary embodiment of the present disclosure, test data may be acquired and then input into the server-side inference model and the end-side inference model respectively, so as to perform precision alignment on the two models. Here, precision alignment may be performed by comparing the final output data of the server-side inference model with the final output data of the end-side inference model.
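A minimal final-output comparison might be (tolerance values are illustrative):

    # Final-output precision check: the converted model passes if its output
    # agrees with the server-side output within tolerance.
    import numpy as np

    def outputs_aligned(server_out, edge_out, rtol=1e-3, atol=1e-5):
        return np.allclose(server_out, edge_out, rtol=rtol, atol=atol)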
According to an exemplary embodiment of the present disclosure, output hooks may be configured at the intermediate network layers of the server-side inference model and the end-side inference model respectively. Here, an output hook may be used to obtain the intermediate output data of a network intermediate layer. On this basis, the output hooks can be used to obtain the intermediate output data of the server-side inference model and of the end-side inference model. Further, the two models can be precision-aligned by comparing their intermediate output data and comparing their final output data.
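On the server side, assuming a PyTorch model, such output hooks can be attached with the standard register_forward_hook mechanism, as in the sketch below; how the end side exposes intermediate tensors depends on the framework (for example, marking them as extra network outputs). Layer names are illustrative.

    # Illustrative output hooks on a PyTorch server-side model.
    import torch

    intermediate_outputs = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # output is assumed to be a single tensor here
            intermediate_outputs[name] = output.detach().cpu().numpy()
        return hook

    def attach_output_hooks(model, layer_names):
        handles = []
        for name, module in model.named_modules():
            if name in layer_names:
                handles.append(module.register_forward_hook(make_hook(name)))
        return handles  # call handle.remove() on each when debugging is done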
The conversion method of an end-side inference model according to exemplary embodiments of the present disclosure combines the manual and fully automatic conversion schemes: common simple operators are converted automatically while special and composite operators are implemented through customization, so end-to-end model conversion is achieved directly in a semi-automatic manner, reducing the operator-support problems introduced by intermediate links and improving the efficiency and compatibility of model conversion. In addition, the model inference process can be abstracted into three steps: preprocessing, model inference computation, and post-processing; for a new conversion task, only the corresponding core logic needs to be implemented on the basis of the corresponding base classes to complete end-to-end offline inference prediction. Moreover, by using output hooks, precision alignment between the server-side inference model under the server-side inference framework and the end-side inference model under the end-side inference framework can be realized automatically, further improving debugging efficiency.
The conversion framework of the end-side inference model according to an exemplary embodiment of the present disclosure is described in detail below with reference to fig. 2.
Fig. 2 is a conversion framework diagram illustrating an end-side inference model according to an exemplary embodiment of the present disclosure.
Referring to fig. 2, the input part of the conversion framework includes three branches: a pre-trained model (i.e., the server-side inference model under the server-side inference framework) trained to convergence on the server side; a network code segment (network snippet) of the pre-trained model written in the server-side framework language; and a test dataset (testing dataset) for model debugging. The output of the conversion framework is the end-side inference model under the end-side inference framework.
As an example, as shown in fig. 2, the provided network code segment may be read by a Network Parser, which parses out the network topology of the model and the first operators used, and the network may be constructed by calling the operator API (Application Programming Interface) of TensorRT, so that the network is parsed and built automatically. In addition, the pre-trained model contains the network parameters of each layer; these parameters can be read from the pre-trained model by a Context Manager and kept in memory, and when the network is parsed and built, the weight data of the first operators can be obtained from the stored network parameters through the Context Manager. During parsing, some special or composite operators may not be supported by the underlying TensorRT operator library; in that case, the plugin interface of TensorRT can be used to implement the base class for custom operators, so that special or composite operators in the network can be implemented quickly. Also during parsing, to make it easier to precision-align the models before and after conversion, output hooks can be configured in a customized way at the intermediate network layers of the model, so that the intermediate-layer output data can be observed and compared during inference. The converted end-side inference model can be verified with an Offline Inferer and a Precision Aligner: the provided test data are read through the Offline Inferer and loaded into the original server-side inference model and the converted end-side inference model respectively, and the Precision Aligner aligns the two models based on the final output data of the models and the intermediate output data of the intermediate layers.
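As a sketch of what the Network Parser's calls into the TensorRT operator API might look like for a single convolution layer (TensorRT 8.x Python bindings; the input shape and the zero-filled weights are placeholders for data the Context Manager would supply in practice):

    # Rebuilding one layer through the TensorRT operator API (sketch).
    import numpy as np
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    inp = network.add_input("input", trt.float32, (1, 3, 224, 224))
    kernel = trt.Weights(np.zeros((64, 3, 3, 3), dtype=np.float32))  # placeholder weights
    bias = trt.Weights(np.zeros((64,), dtype=np.float32))
    conv = network.add_convolution_nd(inp, num_output_maps=64,
                                      kernel_shape=(3, 3), kernel=kernel, bias=bias)
    network.mark_output(conv.get_output(0))

    config = builder.create_builder_config()
    engine_bytes = builder.build_serialized_network(network, config)  # serialized engine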
Fig. 3 is a block diagram illustrating a conversion apparatus of an end-side inference model according to an exemplary embodiment of the present disclosure. The conversion device of the end-side inference model according to the exemplary embodiments of the present disclosure may be implemented in a computing device having sufficient operational capability.
Referring to fig. 3, a conversion apparatus 300 of an end-side inference model according to an exemplary embodiment of the present disclosure may include a code acquisition unit 310, a network parsing unit 320, an operator calling unit 330, and a model conversion unit 340.
The code acquisition unit 310 may acquire a network code segment of the server-side inference model. As described above, the server-side inference model is a model trained to convergence in advance on the server side.
The network parsing unit 320 may determine the network topology and the first operator of the server-side inference model by parsing the network code segment, and judge whether the first preset interface supports the first operator. Here, the first preset interface may be used to call the operator corresponding to the first operator in the end-side operator library.
When the end-side operator library does not support the first operator, the operator calling unit 330 may call the custom second operator through the second preset interface. Here, the second operator may be determined based on a base class for custom operators. With such a base class, a new model conversion task only requires fine-tuning on the basis of the base class to implement the new class and complete the conversion. Further, the base classes may include a data processing base class, used to format data input to the model or data to be output by the model, and a model inference base class, used to perform inference tasks. However, the disclosure is not limited thereto; the second operator may also be customized directly according to a special operator or composite operator of the server-side inference model. Through the customization of the base class or of the second operator, the conversion of the end-side inference model according to exemplary embodiments of the present disclosure can be achieved semi-automatically.
The model conversion unit 340 may convert the server-side inference model into the end-side inference model based on the network topology and the second operator.
According to an exemplary embodiment of the present disclosure, the conversion apparatus 300 may further include a context management unit (not shown). The context management unit may acquire the network parameters of the server-side inference model, and may then obtain the weight data of the second operator based on those network parameters.
According to an exemplary embodiment of the present disclosure, the conversion apparatus 300 may further include an offline estimation unit (not shown). The offline estimation unit may acquire test data and then input the test data into the server-side inference model and the end-side inference model respectively, so as to perform precision alignment on the two models.
According to an exemplary embodiment of the present disclosure, the offline estimation unit may perform precision alignment on the server-side inference model and the end-side inference model by comparing their final output data.
According to an exemplary embodiment of the present disclosure, the conversion apparatus 300 may further include an intermediate configuration unit (not shown). The intermediate configuration unit may configure output hooks at the intermediate network layers of the server-side inference model and the end-side inference model respectively. Here, the output hooks may be used to obtain the intermediate output data of those layers.
On this basis, according to an exemplary embodiment of the present disclosure, the offline estimation unit may further obtain the intermediate output data of the server-side inference model and of the end-side inference model by using the output hooks, and may then perform precision alignment on the two models by comparing their intermediate output data and comparing their final output data.
The conversion method and the conversion device of the end-side inference model according to embodiments of the disclosure combine the manual and fully automatic conversion schemes: common simple operators are converted automatically while special and composite operators are implemented through customization, so end-to-end model conversion is achieved directly in a semi-automatic manner, reducing the operator-support problems introduced by intermediate links and improving the efficiency and compatibility of model conversion. In addition, the model inference process can be abstracted into three steps: preprocessing, model inference computation, and post-processing; for a new conversion task, only the corresponding core logic needs to be implemented on the basis of the corresponding base classes to complete end-to-end offline inference prediction. Moreover, by using output hooks, precision alignment between the server-side inference model under the server-side inference framework and the end-side inference model under the end-side inference framework can be realized automatically, further improving debugging efficiency.
The conversion method and the conversion apparatus of the end-side inference model according to the exemplary embodiments of the present disclosure have been described above with reference to fig. 1 to 3.
The various units in the conversion apparatus of the end-side inference model shown in fig. 3 may be configured as software, hardware, firmware, or any combination thereof that performs certain functions. For example, each unit may correspond to an application-specific integrated circuit, to pure software code, or to a module combining software and hardware. Furthermore, one or more functions implemented by the respective units may also be performed uniformly by components in a physical entity device (e.g., a processor, a client, or a server).
In addition, the conversion method of the end-side inference model described with reference to fig. 1 may be implemented by a program (or instructions) recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present disclosure, a computer-readable storage medium storing instructions may be provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform a conversion method of an end-side inference model according to the present disclosure.
The computer program in the above computer-readable storage medium may run in an environment deployed on computer devices such as clients, hosts, proxy devices, and servers. It should be noted that the computer program may also be used to perform additional steps beyond those described above, or to perform more specific processing within those steps; the content of these additional steps and further processing has been mentioned in the description of the related method with reference to fig. 1 and is therefore not repeated here.
It should be noted that each unit in the conversion apparatus of the end-side inference model according to exemplary embodiments of the present disclosure may rely entirely on the execution of a computer program to implement its function; that is, each unit corresponds to a step in the functional architecture of the computer program, so that the entire system is invoked through a dedicated software package (e.g., a lib library) to implement the corresponding function.
On the other hand, the respective units shown in fig. 3 may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium, such as a storage medium, so that the processor can perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, exemplary embodiments of the present disclosure may also be implemented as a computing device including a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform a method of converting an end-side inference model according to exemplary embodiments of the present disclosure.
In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the above set of instructions.
Here, the computing device is not necessarily a single computing device; it may be any device or aggregate of circuits capable of executing the above instructions (or instruction set), alone or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In a computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some operations described in the conversion method of the end-side inference model according to the exemplary embodiment of the present disclosure may be implemented in software, some operations may be implemented in hardware, and furthermore, the operations may be implemented in a combination of software and hardware.
The processor may execute instructions or code stored in one of the memory components, where the memory component may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integrated with the processor, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, network connection, etc., such that the processor is able to read files stored in the storage component.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via buses and/or networks.
The conversion method of the end-side inference model according to exemplary embodiments of the present disclosure may be described in terms of various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated with non-exact boundaries.
Thus, the method of conversion of the end-side inference model described with reference to fig. 1 may be implemented by a system comprising at least one computing device and at least one storage device storing instructions.
According to an exemplary embodiment of the present disclosure, the at least one computing device is a computing device for performing a method of converting an end-side inference model according to an exemplary embodiment of the present disclosure, in which a set of computer-executable instructions is stored, which when executed by the at least one computing device, performs the method of converting an end-side inference model described with reference to fig. 1.
The foregoing description of exemplary embodiments of the present disclosure is merely illustrative and is not exhaustive; the present disclosure is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. Accordingly, the scope of the present disclosure should be determined by the scope of the claims.

Claims (10)

1. A conversion method of an end-side inference model, wherein the conversion method comprises:
acquiring a network code segment of a server-side inference model, wherein the server-side inference model is a model trained to convergence in advance on a server side;
determining a network topology and a first operator of the server-side inference model by parsing the network code segment, and judging whether a first preset interface supports the first operator, wherein the first preset interface is used for calling an operator corresponding to the first operator in an end-side operator library;
when the end-side operator library does not support the first operator, calling a custom second operator through a second preset interface;
and converting the server-side inference model into the end-side inference model based on the network topology and the second operator.
2. The conversion method of claim 1, wherein the second operator is determined based on a base class of custom operators.
3. The conversion method of claim 2, wherein the base classes include a data processing base class for formatting data input to the model or data to be output by the model and a model inference base class for performing an inference task.
4. The conversion method of claim 1, wherein the conversion method further comprises:
acquiring network parameters of the server-side inference model;
and acquiring weight data of the second operator based on the network parameters.
5. The conversion method of claim 1, wherein the conversion method further comprises:
obtaining test data;
and inputting the test data into the server-side inference model and the end-side inference model respectively, so as to perform precision alignment on the server-side inference model and the end-side inference model.
6. The conversion method of claim 5, wherein inputting the test data into the server-side inference model and the end-side inference model, respectively, to precision align the server-side inference model and the end-side inference model comprises:
performing precision alignment on the server-side inference model and the end-side inference model by comparing the final output data of the server-side inference model with the final output data of the end-side inference model.
7. The conversion method of claim 5, wherein the conversion method further comprises:
and configuring output hooks at intermediate network layers of the server-side inference model and the end-side inference model respectively, wherein the output hooks are used for acquiring intermediate output data of the intermediate network layers.
8. A conversion device of an end-side inference model, wherein the conversion device comprises:
a code acquisition unit configured to acquire a network code segment of a server-side inference model, wherein the server-side inference model is a model trained to convergence in advance on a server side;
a network parsing unit configured to determine a network topology and a first operator of the server-side inference model by parsing the network code segment, and to judge whether a first preset interface supports the first operator, wherein the first preset interface is used for calling an operator corresponding to the first operator in an end-side operator library;
an operator calling unit configured to call a custom second operator through a second preset interface when the end-side operator library does not support the first operator;
and a model conversion unit configured to convert the server-side inference model into the end-side inference model based on the network topology and the second operator.
9. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of converting an end-side inference model as claimed in any one of claims 1 to 7.
10. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform a method of converting an end-side inference model as claimed in any one of claims 1 to 7.
CN202111672289.8A 2021-12-31 2021-12-31 Conversion method and conversion device of an end-side inference model (Pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111672289.8A CN116415665A (en) Conversion method and conversion device of an end-side inference model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111672289.8A CN116415665A (en) Conversion method and conversion device of an end-side inference model

Publications (1)

Publication Number Publication Date
CN116415665A true CN116415665A (en) 2023-07-11

Family

ID=87051844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111672289.8A Pending CN116415665A (en) Conversion method and conversion device of an end-side inference model

Country Status (1)

Country Link
CN (1) CN116415665A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination