CN117093376A

CN117093376A - Intelligent recognition model adaptation method applied to domestic GPU environment

Info

Publication number: CN117093376A
Application number: CN202311352128.XA
Authority: CN
Inventors: 马文胜; 韩丽萍; 李海宁; 何涛; 贺梓然; 戴军
Original assignee: Party Member Education Center Of Organization Department Of Shandong Provincial Committee Of Communist Party Of China
Current assignee: Party Member Education Center Of Organization Department Of Shandong Provincial Committee Of Communist Party Of China
Priority date: 2023-10-19
Filing date: 2023-10-19
Publication date: 2023-11-21

Abstract

The invention relates to the field of artificial intelligence and domestic basic platforms, and discloses an intelligent recognition model adaptation method applied to a domestic GPU environment, which comprises the following steps: S1: detecting the basic environment of the hardware equipment; S2: instruction set business architecture adaptation; S3: deep learning framework adaptation; S4: training, tuning and inference of the intelligent recognition model; S5: improving the performance stability of the intelligent recognition model; S6: verifying the intelligent auditing application. The invention can fully evaluate the suitability and reliability of the domestic hardware platform against actual business application requirements and ensure that the platform can meet project requirements. Taking potential development and optimization requirements into account, it comprehensively examines whether every aspect of the hardware equipment's capability provides adequate support.

Description

Intelligent recognition model adaptation method applied to domestic GPU environment

Technical Field

The invention belongs to the field of artificial intelligence and domestic basic platforms, and particularly relates to an intelligent recognition model adaptation method applied to domestic GPU environments.

Background

With the development of artificial intelligence and big data technology, intelligent recognition and assisted auditing are increasingly widely applied to the production and release of resources on various platforms and websites. Current intelligent recognition technology is mainly implemented on foreign GPU graphics cards such as those from NVIDIA, while domestic hardware products such as chips and AI accelerator cards offer relatively low performance, poor compatibility and a low degree of adaptation. As a result, although domestic GPUs such as Cambricon support mainstream deep learning frameworks, they lack technical means of adaptation and a software ecosystem that interfaces with the mainstream AI frameworks, and problems such as instruction set support remain to be solved.

Disclosure of Invention

The invention aims to adapt artificial intelligence models developed with mainstream deep learning frameworks to a domestic GPU platform, and provides an intelligent recognition model adaptation method applied to a domestic GPU environment. Taking potential development and optimization requirements into account, the method comprehensively examines whether every aspect of the hardware equipment's capability provides adequate support.

The invention provides the following technical scheme: an intelligent recognition model adaptation method applied to a domestic GPU environment comprises the following steps:

S1: detecting the basic environment of the hardware equipment;

S2: instruction set business architecture adaptation;

S3: deep learning framework adaptation;

S4: training, tuning and inference of the intelligent recognition model;

S5: improving the performance stability of the intelligent recognition model;

S6: intelligent recognition application verification.

Step S1, hardware equipment basic environment detection, specifically includes:

S1.1: adapting the hardware firmware and driver, specifically:

S1.1.a: installing the firmware and driver; if the firmware or driver version is found to be too low during installation, downloading and installing a newer version; if driver installation fails, for example because the card drops off, reinstalling the driver;

S1.1.b: confirming with a terminal command that the firmware and driver are correctly installed;

S1.2: adapting the dependency component libraries, specifically:

S1.2.a: obtaining the source code;

S1.2.b: installing a cross-compilation toolchain that supports multiple target architectures;

S1.2.c: configuring the compile options and managing the compilation process through a build system;

S1.2.d: running the build command to compile the dependency library for the target architecture;

S1.2.e: installing the compiled dependency library and confirming through a terminal command that it is correctly installed.
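Steps S1.1.b and S1.2.e both confirm an installation through a terminal command. A minimal Python sketch of such a check follows. It assumes that the vendor tool exits with status 0 when it can see the device; the function name command_succeeds is illustrative and does not come from the method itself.

```python
import shutil
import subprocess


def command_succeeds(argv, timeout=30):
    """Return True when the command is found and exits with status 0."""
    if shutil.which(argv[0]) is None:
        return False  # tool is not on PATH, so it is treated as not installed
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not be started, or hung past the timeout
    return result.returncode == 0
```

For the Cambricon card used in the embodiment, the call would be command_succeeds(["cnmon"]); for a dependency library, any command that exercises the installed library can be passed instead.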

Step S2, instruction set business architecture adaptation, specifically includes:

S2.1: instruction set business architecture compatibility test, specifically:

installing the analysis and processing toolkits required by the data of the business scenario; starting the business service and testing it, and checking through a command whether the related dependencies were installed successfully; if so, the compatibility test passes and the method proceeds to step S2.2;

otherwise, performing the adaptation steps for the related toolkit: 1) obtaining the source code; 2) configuring the compile options; 3) generating the compiled library for the target architecture; 4) installing and testing, then proceeding to step S2.2;

S2.2: instruction set business QPS performance test, specifically:

deploying the business module on both the original platform and the target platform using the same business logic code and algorithm model; testing the response speed and throughput of the hardware platform with the algorithms and data of the business; and judging the result of the instruction set business QPS performance test according to the business requirements and the test results.
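The dependency check in step S2.1 can be sketched in Python as follows, under the assumption that the business toolkits are Python packages identified by their top-level import names. The function names missing_packages and next_step are illustrative only.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def next_step(required):
    """Decide between continuing to S2.2 and adapting the missing toolkits."""
    missing = missing_packages(required)
    if not missing:
        return "S2.2", []
    return "adapt", missing  # these packages need a source build for the target
```

A package that is found still has to be exercised by starting the business service, since a successful lookup does not prove that its native extensions run on the target architecture.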

Step S3, deep learning framework adaptation, specifically includes:

S3.1: selecting a deep learning framework supported by the domestic acceleration platform;

S3.2: compiling, building and installing the mainstream deep learning framework from source code;

S3.3: running the official example demo code of the deep learning framework to verify that the installation is valid.

Step S4, training, tuning and inference of the intelligent recognition model, specifically includes:

S4.1: installing the dependency environment required for training and inference of the intelligent model;

S4.2: for the business scenario, preparing a data set, dividing it into a training set and a test set, and generating classification labels;

S4.3: implementing the algorithm model on both the original platform and the target platform, keeping the model structure and parameters consistent;

S4.4: reading in the training data to start training, and saving the model file after training is completed;

S4.5: loading the trained intelligent model file, converting the model into a format supported by the domestic platform, encapsulating a model inference interface, modifying the preprocessing and post-processing code of the original platform, and performing model inference and prediction through the encapsulated interface.
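The encapsulated inference interface of step S4.5 can be pictured as a thin wrapper that chains preprocessing, the platform-specific model call and post-processing. The Python sketch below is one possible shape under stated assumptions: the class InferencePipeline and the stand-in functions scale_pixels, fake_backend and top_label are hypothetical, and fake_backend merely takes the place of the converted offline model.

```python
class InferencePipeline:
    """Chains preprocess -> model backend -> postprocess behind one call."""

    def __init__(self, preprocess, run_model, postprocess):
        self.preprocess = preprocess
        self.run_model = run_model      # platform-specific part, swapped per target
        self.postprocess = postprocess

    def predict(self, sample):
        return self.postprocess(self.run_model(self.preprocess(sample)))


def scale_pixels(pixels):
    """Example preprocessing: map 0..255 pixel values to 0..1."""
    return [p / 255.0 for p in pixels]


def fake_backend(inputs):
    """Stand-in for the converted model; returns two class scores."""
    total = sum(inputs)
    return [total, 1.0 - total]


def top_label(scores, labels=("bright", "dark")):
    """Example post-processing: pick the label with the highest score."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


pipeline = InferencePipeline(scale_pixels, fake_backend, top_label)
```

Because only run_model differs between the original and target platforms, the same preprocessing and post-processing code can be reused when the two platforms are compared.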

Step S5, improving the performance stability of the intelligent recognition model, specifically includes:

S5.1: evaluating the performance of the intelligent model, specifically:

S5.1a: for the business scenario, constructing the test data set required for intelligent model evaluation and uploading it to the different platforms. On each platform, the same test data, algorithm model and evaluation criteria are used to run model recognition tests on the business data to be recognized, and the recognition results are counted. The evaluation criteria comprise four indexes of the intelligent model: precision, recall, F1 and mAP. Precision and recall reflect the accuracy and the completeness of the recognition model's predictions, and F1 combines the two into a single index; mAP reflects the mean average precision of the recognition model in multi-category prediction scenarios. The performance of the recognition model on the different platforms is measured against these criteria;

S5.1b: running inference on the same picture 10 times and observing the results, in order to identify and resolve unstable model output and poor inference results on the test set;

S5.2: improving the model output performance and the inference results, specifically:

S5.2a: checking the quality of the training data to ensure that the data are accurate and sufficient. Data quality can be improved by means such as data cleaning and data augmentation;

S5.2b: adjusting the complexity of the model to avoid overfitting. Model complexity can be controlled by adding regularization terms, reducing the number of model parameters, and the like;

S5.2c: using techniques such as cross-validation to evaluate the performance of the model and avoid overfitting. The data set may be divided into several training sets and validation sets, with the validation sets used to evaluate the performance of the model;

S5.2d: tuning the parameters of the model to optimize its performance. The optimal hyperparameter combination can be searched for by means such as grid search and random search;

S5.2e: increasing the volume of training data to improve the generalization ability of the model. The volume of training data can be increased by means such as data augmentation and data synthesis;

S5.2f: using techniques such as transfer learning to enhance the generalization ability of the model. A pre-trained model can be used as the base model and adapted to the new task by fine-tuning and similar means;

S5.2g: performing model tuning on the test set to improve the generalization ability of the model. The performance of the model can be evaluated using the validation set, and model tuning is then performed on the test set;

S5.3: performance improvement verification: integrating and normalizing the tensors in the inference queue by using a bidirectional data binding method;

S5.4: repeating steps S5.2 and S5.3 and performing performance improvement verification.
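Three of the four indexes named in step S5.1a follow directly from the counts of true positives, false positives and false negatives. The Python sketch below computes them for one class treated as positive; mAP is omitted because it additionally requires ranked confidence scores. The function name precision_recall_f1 is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Running the function on the predictions collected from each platform, with the same test set, gives directly comparable figures for the original and target platforms.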

Step S6, intelligent recognition application verification, specifically includes:

S6.1: source code security monitoring, specifically:

first, for the AI computing accelerator card of the domestic GPU platform, selecting a supported deep learning framework version, and then, in combination with the target business scenario, performing security risk detection on the source code of the open-source framework to prevent security problems caused by vulnerabilities;

S6.2: intelligent model development, specifically:

S6.2.a: wrapping the interfaces of the mainstream deep learning framework to provide a unified development interface;

S6.2.b: data preprocessing, including filtering, cleaning, augmentation and the like;

S6.2.c: constructing a suitable deep neural network model in combination with the business data and requirements;

S6.2.d: initializing model training, and saving the model after training and validation are completed;

S6.3: intelligent model deployment, specifically:

S6.3.a: model migration: converting the trained and validated model into the format of the domestic hardware platform environment and generating an offline model;

S6.3.b: model optimization: according to the characteristics of the deployment environment, performing operations such as model pruning, quantization and distillation to reduce the size of the model and improve its performance on the specific hardware;

S6.3.c: deployment environment preparation: installing the necessary software libraries, configuring the hardware equipment, setting up the network connection and the like;

S6.3.d: model deployment: deploying the optimized model into the target environment and testing it;

S6.3.e: model monitoring and updating: continuously monitoring the performance and functions of the model during deployment and operation, and updating and optimizing the model as needed;

S6.3.f: inference application development: developing the intelligent recognition application according to the auditing business requirements and data flow, calling the offline model to automatically audit actual samples in the business, returning the recognition result to the business processing flow, and displaying it in the application interface.
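Of the optimization operations listed in step S6.3.b, quantization is the simplest to illustrate. The Python sketch below shows symmetric per-tensor int8 quantization of a list of weights. It is a generic illustration and not the procedure of any particular vendor toolchain; the function names quantize_int8 and dequantize are illustrative.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight, which is why recognition accuracy is checked again after optimization.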

The invention has the following beneficial effects:

The invention can fully evaluate the suitability and reliability of the domestic hardware platform against actual application requirements and ensure that the platform can meet project requirements. Taking potential development and optimization requirements into account, it comprehensively examines whether every aspect of the hardware equipment's capability provides adequate support.

Drawings

FIG. 1 is a schematic diagram of the invention;

FIG. 2 is a flow chart of instruction set business architecture adaptation;

FIG. 3 is a flow chart of algorithm model adaptation.

Detailed Description

The following description of embodiments of the invention is given in conjunction with the accompanying drawings so that those skilled in the art may better understand the invention. It should be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.

Examples

FIG. 1 is a schematic diagram of an intelligent recognition model adaptation method applied to a domestic GPU environment, and the method specifically comprises the following steps:

S1: detecting the basic environment of the hardware equipment;

S2: instruction set business architecture adaptation;

S3: deep learning framework adaptation;

S4: training, tuning and inference of the intelligent recognition model;

S5: improving the performance stability of the intelligent recognition model;

S6: intelligent recognition application verification.

In this example, the test environment is a domestic GPU, the Cambricon intelligent accelerator card, model MLU370-X8; the non-domestic graphics card used for comparison is an NVIDIA GPU, model NVIDIA 3080Ti; and the deep learning framework is Baidu's PaddlePaddle framework.

Step S1, hardware equipment basic environment detection:

S1.1: adapting the hardware firmware and driver:

first, installing the firmware and driver on the GPU hardware; if the firmware or driver version is found to be too low during installation, downloading and installing a newer version; if driver installation fails, for example because the card drops off, reinstalling the driver;

after installation is completed, confirming with a terminal command (cnmon) that the firmware and driver are correctly installed;

S1.2: adapting the dependency component libraries:

Step 1, obtaining the source code: the source code of the dependency library is found on the project's official website or in its GitHub repository.

Step 2, installing the cross-compilation tools: for the current project, a cross-compilation toolchain that supports multiple target architectures, such as GCC (the GNU Compiler Collection), is installed.

Step 3, configuring the compile options: for the current project, the compilation process is managed through a build system such as autoconf or cmake. Configuring the build tools for the project's target architecture involves setting environment variables, and possibly other flags and options, so that they point to the cross compiler.

Step 4, running the build command (such as make) to compile the dependency library for the target architecture, then installing and testing it. The installation and test procedures differ from project to project; the general procedure is to install the compiled library on the target system and then run the officially provided demo program to confirm that the installation is valid.
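When cmake is the build system, pointing the build at a cross compiler as described in Step 3 amounts to passing a few standard CMake variables. The Python sketch below only assembles the command line. The helper name cross_cmake_args is hypothetical, and the prefix aarch64-linux-gnu- is an assumed example of a GNU cross toolchain rather than a value taken from the embodiment.

```python
def cross_cmake_args(source_dir, toolchain_prefix, processor, extra_flags=()):
    """Build a cmake command line that targets a cross compiler."""
    args = [
        "cmake",
        source_dir,
        "-DCMAKE_SYSTEM_NAME=Linux",                    # target operating system
        f"-DCMAKE_SYSTEM_PROCESSOR={processor}",        # target CPU architecture
        f"-DCMAKE_C_COMPILER={toolchain_prefix}gcc",    # cross C compiler
        f"-DCMAKE_CXX_COMPILER={toolchain_prefix}g++",  # cross C++ compiler
    ]
    args.extend(extra_flags)
    return args
```

The returned list can be handed to subprocess.run inside the build directory, followed by the build command of Step 4.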

Step S2: as shown in FIG. 2, the instruction set business architecture adaptation specifically includes:

S2.1: instruction set business architecture compatibility test:

S2.2: instruction set business QPS performance test:

deploying the business module on both the original platform and the target platform using the same business logic code and algorithm model; wrapping the algorithms and data of the business as an interface service with FastAPI, and testing the response speed and throughput of the hardware platform; and judging the result of the instruction set business QPS performance test according to the business requirements and the test results.
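The QPS test of step S2.2 comes down to sending the same requests to each platform and recording throughput and latency. The Python sketch below is written under stated assumptions: handler stands in for a call to the deployed interface service (for example an HTTP request to the FastAPI endpoint), and the function name measure_qps and the choice of a thread pool are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(handler, requests, workers=4):
    """Send every request through handler; report throughput and latency."""
    requests = list(requests)
    if not requests:
        raise ValueError("at least one request is needed")

    def timed_call(request):
        start = time.perf_counter()
        handler(request)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed_call, requests))
    wall = time.perf_counter() - wall_start

    count = len(latencies)
    return {
        "requests": count,
        "qps": count / wall,                          # completed requests per second
        "mean_latency_s": sum(latencies) / count,
        "p95_latency_s": latencies[min(count - 1, int(0.95 * count))],
    }
```

Running the function once against the original platform and once against the target platform, with identical requests and worker counts, yields the two sets of figures that are then judged against the business requirements.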

Step S3, deep learning framework adaptation:

S3.1: mainstream framework adaptation:

Step 1, compiling, building and installing the mainstream deep learning framework from source code; this example uses the Baidu PaddlePaddle (Paddle) framework, and the compilation and installation steps are:

1) Preparing the related dependencies:

mm_v0.1_aarch64-kylin10.tar;

cntoolkit-3.1.4-1.ky10.aarch64.rpm;

cnnl-static-1.14.2-1.ky10.aarch64.rpm;

cnnl-1.14.2-1.ky10.aarch64.rpm;

cncl-1.5.2-1.ky10.aarch64.rpm;

2) Entering the container and compiling, with the following commands:

gh repo clone Cambricon/mlu-ops

cd mlu-ops/bangc-ops

./build.sh

copying the generated header files to the corresponding location under the neuware directory;

3) Compiling Paddle:

the Paddle repository corresponding to CTR2.5 is the PaddlePaddle 2.4 version repository;

3.1) Installing the rpm packages prepared in step 1) to update the underlying libraries, with the following version arguments:

ARG CNTOOLKIT_VERSION=3.1.4-1

ARG CNNL_VERSION=1.14.2-1

ARG CNCL_VERSION=1.5.2-1

ARG MLUOPS_VERSION=0.4.1-1

3.2) Entering the working directory:

cd Paddle

3.3) Creating the build directory:

mkdir build && cd build

3.4) Running cmake:

cmake .. -DPY_VERSION=3.7 -DPYTHON_EXECUTABLE=`which python3` -DWITH_ARM=ON -DWITH_TESTING=OFF -DON_INFER=ON -DWITH_XBYAK=OFF -DCMAKE_CXX_FLAGS="-Wno-error -w" -DWITH_MLU=ON

Step 2, running the official example demo code of the deep learning framework to verify that the installation is valid, with the following verification commands:

cd Paddle

pip install build/python/dist/paddlepaddle_mlu-0.0.0-cp37-cp37m-arm

python

import paddle

paddle.utils.run_check()

Step S4: as shown in FIG. 3, training, tuning and inference of the intelligent recognition model are implemented as follows:

S4.1: installing the dependency environment required for training and inference of the intelligent model, including: a Docker image containing the Cambricon GPU driver and dependency libraries, the Cambricon MLU driver, Paddle MLU, and YOLOX for the subsequent Cambricon adaptation;

S4.2: for the business scenario, preparing a data set; because the sample classes in the data set are imbalanced, dividing it into a training set and a test set by stratified splitting so that the class proportions in the two sets are similar, and generating classification labels;

S4.3: implementing the algorithm model on both the original platform and the target platform, specifying the same loss function, optimizer and evaluation indexes, and keeping the model structure and parameters consistent;

S4.4: reading in the training data, starting training, and saving the model file after training is completed;

S4.5: loading the trained intelligent model file, converting the model from the Paddle framework format into ONNX format, converting the ONNX model into the MagicMind format supported by the domestic platform, encapsulating a model inference interface, modifying the preprocessing and post-processing code of the original platform, and performing model inference and prediction through the encapsulated interface.
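The stratified split used in step S4.2 for imbalanced classes can be sketched in plain Python as follows. The function name stratified_split, the default test fraction and the fixed seed are illustrative assumptions; in practice an equivalent routine from a machine learning library may be used instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps about the same share per set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0  # a single sample stays in the training set
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Splitting class by class, rather than sampling from the whole data set at once, is what keeps a rare class from vanishing from the test set.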

Step S5, improving the performance stability of the intelligent recognition model:

S5.1a: for the business scenario, constructing the test data set required for intelligent model evaluation and uploading it to the different platforms. On each platform, the same test data, algorithm model and evaluation criteria are used to run model recognition tests on the business data to be audited, and the recognition results are counted. The evaluation criteria comprise four indexes of the intelligent model: precision, recall, F1 and mAP. Precision and recall reflect the accuracy and the completeness of the recognition model's predictions, and F1 combines the two into a single index; mAP reflects the mean average precision of the recognition model in multi-category prediction scenarios. The performance of the recognition model on the different platforms is measured against these criteria;

S5.2: performance improvement verification: integrating and normalizing the tensors in the inference queue by using a bidirectional data binding method;

S5.3: repeating step S5.1b and performing performance improvement verification.
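The repeated-inference check of step S5.1b, which S5.3 repeats, can be sketched in Python as follows. The sketch assumes that the model output is a flat list of numbers; the function name inference_is_stable and the tolerance value are illustrative.

```python
def inference_is_stable(infer, sample, runs=10, tolerance=1e-6):
    """Run infer on the same sample several times and compare the outputs."""
    reference = infer(sample)
    worst = 0.0
    for _ in range(runs - 1):
        output = infer(sample)
        if len(output) != len(reference):
            return False, float("inf")  # output shape changed between runs
        deviation = max(
            (abs(a - b) for a, b in zip(output, reference)), default=0.0
        )
        worst = max(worst, deviation)
    return worst <= tolerance, worst
```

The function returns both the verdict and the largest deviation observed, so that the deviation can be compared before and after the improvement measures are applied.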

Step S6, intelligent recognition application verification:

S6.1: source code security monitoring:

S6.2: intelligent model development:

Step 1, wrapping the interfaces of the mainstream deep learning framework to provide a unified development interface;

Step 2, preprocessing the data, including filtering, cleaning, augmentation and the like;

Step 3, constructing a deep neural network model in combination with the intelligent recognition business data and requirements;

Step 4, starting model training, and saving the model weight file after training and validation are completed;

Step 5, deploying the trained intelligent recognition model, specifically:

1) Model migration: converting the trained and validated model into the format of the domestic hardware platform environment and generating an offline model;

2) Model optimization: according to the characteristics of the deployment environment, performing operations such as model pruning, quantization and distillation to reduce the size of the model and improve its performance on the specific hardware;

Step 6, preparing the deployment environment, including installing the necessary software libraries, configuring the hardware equipment, setting up the network connection and the like;

Step 7, deploying the model: deploying the optimized model into the target environment and testing it;

Step 8, monitoring and updating the model: continuously monitoring the performance and functions of the model during deployment and operation, and updating and optimizing the model as needed;

Step 9, developing the inference application: developing the intelligent recognition application according to the intelligent recognition business requirements and data flow, calling the offline model to automatically recognize actual samples in the business, returning the recognition result to the business processing flow, and displaying it in the application interface.

While the foregoing describes illustrative embodiments of the present invention to help those skilled in the art understand it, it should be understood that the present invention is not limited to the scope of these embodiments, and that all changes falling within the spirit and scope of the present invention as defined by the appended claims are protected.

Claims

1. The intelligent recognition model adaptation method applied to the domestic GPU environment is characterized by comprising the following steps of:

S1: detecting the basic environment of the hardware equipment;

S2: instruction set business architecture adaptation;

S3: deep learning framework adaptation;

S4: training, tuning and inference of the intelligent auditing model;

S5: improving the performance stability of the intelligent auditing model;

S6: verifying the intelligent auditing application.

2. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S1, hardware equipment basic environment detection, specifically includes:

S1.1.a: installing the firmware and driver; if the firmware or driver version is found to be too low during installation, downloading and installing a newer version; if driver installation fails because the card drops off, reinstalling the driver;

S1.2.a: obtaining the source code;

3. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S2, instruction set business architecture adaptation, specifically includes:

otherwise, performing the adaptation steps for the related toolkit;

4. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S3, deep learning framework adaptation, specifically includes:

S3.1: mainstream framework adaptation, specifically:

S3.1.a: compiling, building and installing the mainstream deep learning framework from source code;

S3.1.b: running the official example demo code of the deep learning framework to verify that the installation is valid.

5. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S4, training, tuning and inference of the intelligent recognition model, specifically includes:

6. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S5, improving the performance stability of the intelligent recognition model, specifically includes:

S5.1a: for the business scenario, constructing the test data set required for intelligent model evaluation and uploading it to the different platforms; on each platform, using the same test data, algorithm model and evaluation criteria to run model recognition tests on the business data to be recognized, and counting the recognition results; the evaluation criteria comprise four indexes of the intelligent model: precision, recall, F1 and mAP; precision and recall reflect the accuracy and the completeness of the recognition model's predictions, and F1 combines the two into a single index; mAP reflects the mean average precision of the recognition model in multi-category prediction scenarios, and the performance of the recognition model on the different platforms is measured against these criteria;

S5.3: repeating step S5.1 and performing performance improvement verification.

7. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 1, wherein step S6, intelligent recognition application verification, specifically includes:

S6.2: intelligent model development;

S6.3: intelligent model deployment.

8. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 6, wherein the intelligent model development comprises the following steps:

S6.2.b: data preprocessing, including data filtering, data cleaning and data augmentation;

S6.2.d: initializing model training, and saving the model after training and validation are completed.

9. The intelligent recognition model adaptation method applied to a domestic GPU environment according to claim 6, wherein the intelligent model deployment comprises the following steps:

S4.3.a: model migration: converting the trained and validated model into the format of the domestic hardware platform environment and generating an offline model;

S4.3.b: model optimization: according to the characteristics of the deployment environment, performing model pruning, quantization and distillation operations to reduce the size of the model and improve its performance on the specific hardware;

S4.3.c: deployment environment preparation: installing the necessary software libraries, configuring the hardware equipment and setting up the network connection;

S4.3.d: model deployment: deploying the optimized model into the target environment and testing it;

S4.3.e: model monitoring and updating: continuously monitoring the performance and functions of the model during deployment and operation, and updating and optimizing the model as needed;

S4.3.f: inference application development: developing the intelligent auditing application according to the intelligent recognition business requirements and data flow, calling the offline model to automatically audit actual samples in the business, returning the auditing result to the business processing flow, and displaying it in the application interface.