CN111783974A - Model construction and image processing method and device, hardware platform and storage medium - Google Patents

Model construction and image processing method and device, hardware platform and storage medium


Publication number
CN111783974A
CN111783974A (application CN202010809593.1A)
Authority
CN
China
Prior art keywords
model
neural network
network model
layer
image processing
Prior art date
Legal status
Pending
Application number
CN202010809593.1A
Other languages
Chinese (zh)
Inventor
张晓雨
丁涛
李辰
李玮
廖强
Current Assignee
Chengdu Jiahua Chain Cloud Technology Co ltd
Original Assignee
Chengdu Jiahua Chain Cloud Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Jiahua Chain Cloud Technology Co ltd filed Critical Chengdu Jiahua Chain Cloud Technology Co ltd
Priority claimed from CN202010809593.1A
Publication of CN111783974A
Legal status: Pending

Classifications

    • G06N 3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/045 — Combinations of networks
    • G06N 3/048 — Activation functions
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The application relates to the technical field of deep learning, and provides a model construction method, an image processing method, corresponding devices, a hardware platform and a storage medium. The model construction method comprises the following steps: training a neural network model for image processing, wherein the neural network model comprises at least one depth separable convolution module consisting of a layer-by-layer (depthwise) convolution layer, a point-by-point (pointwise) convolution layer, a batch normalization layer and an activation layer connected in sequence; and quantizing the trained neural network model to obtain a quantized neural network model. First, quantizing the model parameters effectively reduces the data volume of the parameters, making the model suitable for deployment on NPU devices. Second, unlike the prior art, the depth separable convolution module in the method places no batch normalization layer or activation layer between the layer-by-layer convolution layer and the point-by-point convolution layer, so that the values of the model parameters are distributed within a reasonable range and can be quantized with high precision.

Description

Model construction and image processing method and device, hardware platform and storage medium
Technical Field
The invention relates to the technical field of deep learning, in particular to a model construction and image processing method, a model construction and image processing device, a hardware platform and a storage medium.
Background
The image processing technology is widely applied to various fields, including face recognition and intelligent video analysis in the security field, traffic scene recognition in the traffic field, content-based image retrieval and automatic album classification in the internet field, image recognition in the medical field and the like. In recent years, neural network models are widely adopted in the industry to execute image processing tasks in specific fields, and achieve good effects.
Because neural network models involve a large amount of computation, a Graphics Processing Unit (GPU) is generally used in the prior art to accelerate the operations. However, GPUs are bulky and power-hungry, and therefore cannot be installed on edge devices. To address the practical problems faced in edge computing, the Neural Processing Unit (NPU), characterized by low power consumption and high performance, has been developed. However, existing neural networks are still designed for the GPU environment; when migrated directly to the NPU environment, they perform poorly.
Disclosure of Invention
An object of the embodiments of the present application is to provide a model building method and apparatus, an image processing method and apparatus, a hardware platform, and a storage medium, so as to improve the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
in a first aspect, an embodiment of the present application provides a model building method, including: training a neural network model for image processing, wherein the neural network model comprises at least one depth separable convolution module, and the depth separable convolution module comprises a layer-by-layer convolution layer, a point-by-point convolution layer, a batch normalization layer and an activation layer which are sequentially connected; and quantizing the trained neural network model to obtain the quantized neural network model.
In the method, firstly, the model parameters are quantized, so that the data volume of the parameters is effectively reduced, and the model is suitable for being deployed in NPU equipment.
Secondly, the depth separable convolution module in the method differs from that in the prior art, where a batch normalization layer and an activation layer are placed between the layer-by-layer convolution layer and the point-by-point convolution layer. The inventors found that with those two layers present, some parameters of large value appear in the model, making the distribution range of the model parameters too wide and reducing the precision of parameter quantization. In the scheme of the application, after the two layers are deleted, the values of the model parameters are distributed within a reasonable range, so that the model parameters can be quantized with high precision. This reduces the influence of the quantization operation on the performance of the model, so that a model deployed on an NPU device can achieve a good effect when executing an image processing task. Moreover, because the number of model layers is reduced, the inference speed of the model when executing an image processing task is increased.
In one implementation manner of the first aspect, after the training of the neural network model for image processing and before the quantifying of the trained neural network model, the method further includes: and performing parameter clipping on the trained neural network model.
In this implementation, parameter clipping of the neural network model maintains the sparsity of the network parameters, reduces the network scale, and increases the inference speed, making the quantized neural network model suitable for deployment on NPU devices.
In one implementation manner of the first aspect, after the training of the neural network model for image processing and before the quantifying of the trained neural network model, the method further includes: replacing a first activation function adopted by the activation layer in the trained neural network model with a second activation function supported by a target platform, wherein the target platform is a hardware platform loaded with a neural network processor.
For example, during model training, ReLU6 may be adopted as the activation function to make the range of activation values more concentrated (not exceeding 6), which helps accelerate model convergence; but after training is completed, ReLU6 can be replaced by ReLU, because ReLU6 is not supported on some existing NPU platforms.
In one implementation of the first aspect, after the training of the neural network model for image processing and before the obtaining of the quantized neural network model, the method further comprises: and fusing adjacent convolution layers and batch normalization layers in the trained neural network model.
In this implementation, fusing some of the convolution layers reduces accesses to device memory while the model is in use, thereby improving device performance.
In one implementation form of the first aspect, the training a neural network model for image processing includes: training a neural network model for image processing under a first deep learning framework to obtain a first model file storing the trained neural network model; the quantifying the trained neural network model to obtain the quantified neural network model includes: converting the first model file into a platform file under a target platform, and quantizing model parameters stored in the first model file in the conversion process; the first deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform loaded with a neural network processor; or converting the first model file into a second model file under a second deep learning framework, converting the second model file into a platform file under a target platform, and quantizing the model parameters stored in the second model file in the conversion process; the second deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform carrying a neural network processor.
Some NPUs only support model files produced under certain deep learning frameworks, and those frameworks are not necessarily the ones used to train the model, so format conversion of the model file may be involved. In addition, a target platform with an NPU is likely not to use the model file directly but rather a customized platform file; the model file therefore needs to be converted into the platform file before the model can be deployed on the target platform.
In a second aspect, an embodiment of the present application provides an image processing method, including: acquiring a target image to be processed; and processing the target image by using the neural network model obtained by the model construction method provided by the first aspect or any one of the possible implementation manners of the first aspect, and outputting a processing result.
Since the neural network model obtained by the model construction method provided by the first aspect or any one of the possible implementation manners of the first aspect is subjected to parameter quantization and has higher quantization precision, the model is used for executing an image processing task with higher efficiency and better performance. The above-described image processing method may be, but is not limited to being, performed by the NPU.
In a third aspect, an embodiment of the present application provides a model building apparatus, including: the system comprises a training unit, a data processing unit and a data processing unit, wherein the training unit is used for training a neural network model for image processing, the neural network model comprises at least one depth separable convolution module, and the depth separable convolution module comprises a layer-by-layer convolution layer, a point-by-point convolution layer, a batch normalization layer and an activation layer which are sequentially connected; and the quantization unit is used for quantizing the trained neural network model to obtain the quantized neural network model.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus, including: the image acquisition unit is used for acquiring a target image to be processed; an image processing unit, configured to process the target image by using the neural network model obtained by the model construction method provided in the first aspect or any one of the possible implementation manners of the first aspect, and output a processing result.
In a fifth aspect, an embodiment of the present application provides a hardware platform, including: the device comprises a central processing unit, a neural network processor and a memory; wherein the neural network processor is configured to: reading a target image to be processed and a platform file provided by the last possible implementation manner of the first aspect from the memory under the control of the central processing unit, processing the target image by using a neural network model stored in the platform file, and writing an obtained processing result into the memory.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where a neural network model obtained by using the model construction method provided in the first aspect or any one of the possible implementation manners of the first aspect is stored on the computer-readable storage medium.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 shows a possible flow of a model construction method provided by an embodiment of the present application;
FIG. 2(A) to FIG. 2(C) show the structures of the convolution kernels of a normal convolution, a layer-by-layer convolution, and a point-by-point convolution;
FIG. 3 illustrates the structure of a depth separable convolution module in an embodiment of the present application;
FIG. 4 illustrates the principle of the network convergence process in an embodiment of the present application;
FIG. 5 shows a possible flow of an image processing method provided by an embodiment of the present application;
FIG. 6 illustrates one possible structure of a hardware platform provided by an embodiment of the present application;
FIG. 7 illustrates one possible overall business process;
FIG. 8 illustrates a specific flow of the model build phase in the overall business process;
FIG. 9 illustrates a detailed flow of a business development phase in an overall business process;
FIG. 10 is a diagram illustrating a possible structure of a model building apparatus according to an embodiment of the present disclosure;
fig. 11 shows a possible structure of an image processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "first," "second," and the like are used solely to distinguish one entity or action from another and do not necessarily indicate or imply any actual relationship or order between such entities or actions.
Fig. 1 shows a possible flow of a model building method provided in an embodiment of the present application. The method may be performed by, but is not limited to, an electronic device, such as a PC, a server, a cluster of servers, a virtual machine, and the like. Referring to fig. 1, the method includes:
step S110: a neural network model for image processing is trained.
The image processing in step S110 includes, but is not limited to, image classification, object detection, object recognition, image segmentation, and other tasks.
The neural network model in step S110 includes at least one depth separable convolution module, and other structures are not limited. The advantages of the depth separable convolution are briefly introduced below:
the convolution kernel of the normal convolution is shown in fig. 2(a), where DK is the convolution kernel size, M is the number of input channels, N is the number of output channels, and the size of the input feature map is assumed to be DF.
The number of multiplications required for the convolution operation with the standard convolution kernel is:
DK·DK·M·N·DF·DF
the number of parameters (referred to as weight parameters) to be stored is:
DK·DK·M·N
Depthwise Separable Convolution includes two convolution operations performed in succession, namely layer-by-layer convolution (Depthwise Convolution) and point-by-point convolution (Pointwise Convolution).
The convolution kernel of the layer-by-layer convolution is shown in FIG. 2(B), where DK is the convolution kernel size, M is the number of input channels, and the input feature map is assumed to be of size DF×DF.
The number of multiplications required for the convolution operation of the layer-by-layer convolution kernel is:
DK·DK·M·DF·DF
The number of parameters to be stored is:
DK·DK·M
The convolution kernel of the point-by-point convolution is shown in FIG. 2(C), where M is the number of input channels, N is the number of output channels, and the input feature map is assumed to be of size DF×DF.
The number of multiplications required for the convolution operation of the point-by-point convolution kernel is:
M·N·DF·DF
The number of parameters to be stored is:
M·N
Thus, replacing the normal convolution with a depth separable convolution reduces the computation required for the convolution operation by the ratio:
(DK·DK·M·DF·DF + M·N·DF·DF) / (DK·DK·M·N·DF·DF) = 1/N + 1/(DK·DK)
and reduces the storage required for the parameters by the ratio:
(DK·DK·M + M·N) / (DK·DK·M·N) = 1/N + 1/(DK·DK)
for example, typically when the convolution kernel is 3x3, the computational and memory amount of the depth separable convolution drops to 1/9 to 1/8 of the normal convolution.
In view of the above advantages of depth separable convolution, a model based on it may be employed in some image processing tasks; for example, the MobileNetV2 network, which contains depth separable convolution structures, may be employed in an image classification task.
However, in existing models, depth separable convolution is generally implemented in the manner shown on the left side of FIG. 3: each depth separable convolution module includes a layer-by-layer convolution layer and a point-by-point convolution layer, and a Batch Normalization (BN) layer and an activation layer (using activation functions such as ReLU, ReLU6, ELU, or SELU) are added after each of the two convolution layers. The batch normalization layer mainly accelerates training and prevents overfitting, while the activation layer introduces nonlinearity into the model.
However, the inventor has found, through long-term research, that the arrangement of the batch normalization layer and the activation layer between the layer-by-layer convolutional layer and the point-by-point convolutional layer may cause some parameters with larger values to appear in the trained model, thereby causing the distribution range of the model parameters to be too large (i.e., some parameters have small values and some parameters have large values), and further causing the precision of parameter quantization to be reduced (regarding quantization, see step S120).
Thus, in the solution of the present application, the depth separable convolution module adopts the structure on the right side of FIG. 3: the layer-by-layer convolution layer, the point-by-point convolution layer, the batch normalization layer and the activation layer are connected in sequence; that is, the batch normalization layer and activation layer that followed the layer-by-layer convolution layer are deleted. After these two layers are deleted, the values of the parameters in the trained model are distributed within a reasonable range, so the model parameters can be quantized with high precision. Moreover, since the layer-by-layer convolution layer and the point-by-point convolution layer together act as a substitute for an ordinary convolution layer, placing a batch normalization layer and an activation layer between them serves little purpose; it suffices to place them after the point-by-point convolution layer. The inventors conducted a number of experiments on this point; the results of one of them are described below:
the experiment used the MobilenetV2 network as the original neural network model, and modified the deep separable convolution module in the MobilenetV2 network (in the manner mentioned above) to obtain a new neural network model. The two neural network models are used for executing a mask wearing condition detection task (judging whether a pedestrian wears a mask or not), and experimental data are obtained as follows:
input image size: 128x128
Test sample: 6674 pieces
noBN:94%----BN:94%
noBN-0.5:94%----BN-0.5:94%
noBN-0.25:93%-----BN-0.25:93%
Wherein, noBN represents a new neural network model, BN represents an original neural network model, and 0.5 and 0.25 are parameters of the mobilenetV2 network. The experimental data can prove that the performance of the neural network model is not reduced after the batch normalization layer and the activation layer after the layer-by-layer convolution layer are deleted.
Step S120: and quantizing the trained neural network model to obtain a quantized neural network model.
The number of quantization bits for the model is not limited and may be, for example, 8, 16, or 32 bits, depending on actual requirements. The quantization operation effectively reduces the data volume of the parameters, so the storage space required for the model is smaller and, at the same time, the computation required for model inference is smaller, making the quantized neural network model well suited for deployment on NPU devices. The term NPU device here generally refers to devices that employ an NPU, such as some edge devices, or the hardware platform 300 shown in FIG. 6. In addition, the special design of the depth separable convolution module in step S110 helps improve the quantization precision of the model, i.e., it offsets to a certain extent the influence of quantization on model performance, so that a neural network model deployed on an NPU device can obtain a good result when executing an image processing task. Moreover, compared with the prior art, the method simplifies the structure of the depth separable convolution module, which increases the inference speed of the trained neural network model when executing an image processing task.
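As an illustration of why the parameter distribution matters for step S120, the sketch below implements a simple symmetric 8-bit post-training quantization scheme (an assumption for illustration; the patent does not fix a particular scheme, and the weight values are made up). The quantization step is set by the largest absolute weight, so a few outlier parameters would coarsen the grid for every parameter:

```python
# Illustrative sketch: symmetric 8-bit quantization of a weight list,
# plus the round-trip (dequantization) error it introduces.

def quantize_int8(weights):
    """Map floats to int8 codes using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.02, -0.51, 0.33, 1.27, -0.75]      # hypothetical parameters
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9              # error bounded by half a step
```

If the list contained one outlier (say 100.0), `scale` would grow by nearly two orders of magnitude and every small weight would be rounded far more coarsely; this is the precision loss the modified module structure is designed to avoid.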
It should be noted that the method does not limit the use of the neural network model, that is, the quantized model is not necessarily deployed in the NPU device, and may be deployed in other environments, and similar beneficial effects can be obtained.
In some implementations, after step S110 is completed (i.e., after the model is trained) and before step S120 is executed, the trained neural network model may further undergo parameter clipping. Parameter clipping refers to performing a pruning operation on parameters in the model (such as weight parameters), deleting some convolution kernel channels with small contributions so as to maintain the sparsity of the weights. A clipped neural network model is markedly smaller and its inference speed markedly higher, making it suitable for deployment on NPU devices. It can be understood that after the model is clipped, the model parameters may be further fine-tuned to avoid a noticeable drop in model performance caused by the clipping.
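The clipping idea can be sketched as follows. Note the selection criterion (smallest L1 norm) is an assumption for illustration; the patent only states that low-contribution channels are deleted, and the weight values are toy data:

```python
# Hedged sketch of parameter clipping: drop the convolution-kernel channels
# whose weights contribute least, here measured by L1 norm (an assumption).

def prune_channels(channel_weights, keep_ratio):
    """channel_weights: list of per-channel weight lists; keep the
    keep_ratio fraction of channels with the largest L1 norms."""
    norms = [sum(abs(w) for w in ch) for ch in channel_weights]
    n_keep = max(1, int(len(channel_weights) * keep_ratio))
    order = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    kept = sorted(order[:n_keep])          # preserve original channel order
    return [channel_weights[i] for i in kept]

channels = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.0, 0.03]]  # toy weights
pruned = prune_channels(channels, keep_ratio=0.5)
# pruned keeps the two highest-norm channels; the near-zero ones are deleted
```

After a step like this, the surviving weights would normally be fine-tuned, as the text notes, to recover any accuracy lost to clipping.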
In some implementations, the activation layer of the neural network model employs the first activation function when step S110 is executed, and the first activation function may be replaced with the second activation function after the execution of step S110 is completed (after the model is trained), and before step S120 is executed. The second activation function is an activation function supported by the target platform, and the target platform is a hardware platform loaded with an NPU, and as for a possible structure of the hardware platform, reference may be made to the following description about fig. 6.
For example, suppose the NPU adopted by the target platform is the Ascend 310 from Huawei, an efficient, flexible, programmable AI processor. In its typical configuration, its performance reaches 16 TOPS at eight-bit integer precision (INT8) and 8 TFLOPS at 16-bit floating point (FP16), while its power consumption is only 8 W. ReLU6 may be used as the activation function during model training to make the range of activation values more concentrated (maximum value not exceeding 6), which helps speed up model convergence; but after model training is completed, ReLU6 needs to be replaced with ReLU, because ReLU6 is not supported by the Ascend 310 as an activation function in the neural network model. The key point is that the training phase of the model may use a different activation function than the inference phase.
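The two activation functions involved in this swap can be written out directly; the sketch below (not from the patent) shows that they agree wherever activations stay below 6, which is exactly the concentrated range the training-time ReLU6 encourages:

```python
# ReLU vs. ReLU6: the deployment-time replacement is exact for any
# activation value below the clip point of 6.

def relu(x):
    return max(0.0, x)

def relu6(x):
    return min(max(0.0, x), 6.0)

for x in [-3.0, 0.0, 2.5, 5.9]:
    assert relu(x) == relu6(x)     # identical below the clip point
assert relu(7.0) != relu6(7.0)     # they differ only above 6
```

This is why the replacement is safe only when training has kept activations within the concentrated range; values above 6 would behave differently after the swap.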
In some implementations, after step S110 is completed (i.e., after the model is trained) and before step S120 is executed, adjacent convolution layers and batch normalization layers in the neural network model may also be fused; as shown in FIG. 4, the left side shows the layers before fusion (denoted convolution layer 1) and the right side the layer after fusion (denoted convolution layer 2). After the layers are fused, accesses to device memory during model inference are reduced, improving device performance. Note that the convolution layer to be fused here may be an ordinary convolution layer in the model, or the point-by-point convolution layer in a depth separable module (since no batch normalization layer follows the layer-by-layer convolution layer).
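The algebra behind the fusion in FIG. 4 can be sketched for a single output channel. Because batch normalization at inference time is an affine transform with fixed statistics, it folds into the preceding convolution's weights and bias; the values below are hypothetical:

```python
# Sketch of conv-BN fusion for one output channel: fold the batch-norm
# statistics into the convolution weights and bias, so inference performs
# a single affine operation instead of two.
import math

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return fused (w', b') such that
    gamma*((x.w + b) - mean)/sqrt(var+eps) + beta == x.w' + b'."""
    s = gamma / math.sqrt(var + eps)
    w_fused = [wi * s for wi in w]
    b_fused = (b - mean) * s + beta
    return w_fused, b_fused

w, b = [0.2, -0.4, 0.1], 0.05                   # toy conv weights/bias
gamma, beta, mean, var = 1.5, -0.3, 0.1, 0.04   # toy BN statistics
wf, bf = fuse_conv_bn(w, b, gamma, beta, mean, var)

x = [1.0, 2.0, 3.0]
conv = sum(xi * wi for xi, wi in zip(x, w)) + b
bn_out = gamma * (conv - mean) / math.sqrt(var + 1e-5) + beta
fused_out = sum(xi * wi for xi, wi in zip(x, wf)) + bf
assert abs(bn_out - fused_out) < 1e-9           # fusion is exact
```

The fused layer reads and writes each feature map once rather than twice, which is the memory-access saving the text describes.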
Fig. 5 shows a possible flow of an image processing method provided by an embodiment of the present application. The method may be performed by, but is not limited to, an NPU. Referring to fig. 5, the method includes:
step S210: and acquiring a target image to be processed.
The target image refers to an image required by a certain image processing task, and the source of the target image is not limited, for example, the target image can be acquired in real time, downloaded from a network, synthesized by an algorithm, and the like.
Step S220: the neural network model obtained by the model construction method provided by the embodiment of the application (including the method in fig. 1 and possible implementation manners) is used for processing the target image, and a processing result is output.
The neural network model in step S220 is the quantized neural network model; the target image is input to the model, and the model outputs the processing result. For example, if the model is used to perform an image classification task, the output may be the probabilities that the model predicts the target image belongs to each class. Because the neural network model obtained by the model construction method provided by the embodiments of the application achieves high-precision parameter quantization (and may additionally have undergone clipping and convolution layer fusion), using the model to execute the image processing task is efficient, performs well, and saves device memory.
A possible structure of the hardware platform 300 provided in the embodiments of the present application is described next. Referring to FIG. 6, the hardware platform 300 includes: a Central Processing Unit (CPU) 310, a memory 320, and a neural network processor 330, which are interconnected and communicate with each other via a communication bus 340 and/or another type of connection mechanism (not shown).
The memory 320 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. The central processor 310, the neural network processor 330, and possibly other components may access the memory 320, read and/or write data in it, and read computer program instructions from it to implement the corresponding functions, as described in more detail below.
The central processor 310 includes one or more processors (only one is shown), which may be an integrated circuit chip with data processing capability. It is mainly used for operations other than model inference, and may also control other components in the hardware platform 300 (such as the neural network processor 330).
The neural network processor 330 includes one or more processors (only one is shown), which may be an integrated circuit chip with data processing capability, and is mainly used for performing model inference operations. For example, the neural network processor 330 may be an Ascend 310 processor.
It will be appreciated that the configuration shown in FIG. 6 is merely illustrative, and that hardware platform 300 may include more or fewer components than shown in FIG. 6, or have a different configuration than shown in FIG. 6. In addition, the hardware platform 300 may be a stand-alone product or may be a part of a device (e.g., an edge device).
Fig. 7 shows the overall business flow for completing an image processing task; for simplicity, the task of detecting whether a pedestrian is wearing a mask is taken as an example. Referring to fig. 7, the overall business flow includes two stages:
Model building stage 40: a model is constructed and deployed to a target platform, such as a hardware platform 300 that employs the Ascend 310 as its NPU (such a hardware platform 300 is used as the example in the following description).
Business development stage 50: acquire the image of the pedestrian to be detected, complete detection on the hardware platform 300, and output the result.
Further, with continued reference to fig. 8, the model construction phase 40 may be divided into 3 sub-phases, respectively:
Model training 410: train the neural network model.
The model structure adopts the structure mentioned in step S110, and may, for example, be modified from a MobileNetV2 network (by reconstructing the structure of its depth separable convolution modules). The training data set and training parameters are determined according to actual requirements, for example:
Training data set:

Input picture size: 128x128; mean (RGB three channels): 0.406, 0.456, 0.485; variance (RGB three channels): 0.225, 0.225, 0.225; category labels: 0 - background (no pedestrian in the picture), 1 - no mask, 2 - mask, 3 - unknown.

Training parameters:

batch_size = 64 // batch size

epoch = 50 // number of iteration rounds

lr = 0.01 × 0.8 // learning rate

optimizer: SGD + Momentum (momentum = 0.9, weight_decay = 5e-4)
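The hyperparameters above can be gathered into a small sketch. The decay interval of the learning-rate schedule is an assumption: the original only gives "0.01 × 0.8", read here as a base rate of 0.01 decayed by a factor of 0.8 every 10 epochs.

```python
# Hyperparameters as listed; the learning-rate schedule is an assumed
# reading of "0.01 x 0.8": base rate 0.01, decayed by 0.8 every 10 epochs.
config = {
    "batch_size": 64,
    "epochs": 50,
    "optimizer": "SGD + Momentum",
    "momentum": 0.9,
    "weight_decay": 5e-4,
}

def lr_at(epoch, base=0.01, decay=0.8, step=10):
    # step-wise exponential decay (assumed interval)
    return base * decay ** (epoch // step)

print(lr_at(0))   # 0.01
print(lr_at(49))  # smallest rate reached during the 50-epoch run
```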
In some implementations, model training can be accomplished using an existing deep learning framework, which may be called the first deep learning framework; such frameworks include PyTorch, Caffe, TensorFlow, and the like. After training is completed, a model file storing the trained model (including the model's parameters, structure, etc.) is obtained, referred to as the first model file. After the first model file is obtained, parameter clipping may be performed on the model, as described above.
Model conversion 420: a conversion from the first model file to the second model file is effected.
The hardware platform 300 on which the neural network model is to be deployed may only support model files obtained under a second deep learning framework, and the second deep learning framework is not necessarily the same as the first, so format conversion of the model file may be involved. For example, suppose the hardware platform 300 only supports model files obtained under the Caffe or TensorFlow framework; if the first deep learning framework is PyTorch and the second is Caffe, the pth file needs to be converted into prototxt and caffemodel files under the Caffe framework. Of course, if the first deep learning framework is also Caffe, no conversion is necessary, so the model conversion 420 sub-stage is optional.
When the model file is converted, the activation function may also be replaced, as described earlier. Note, however, that the activation function in the model file may be modified even without converting the format of the model file; the two are independent steps.
Model deployment 430: a conversion is effected from the model file (which may be the first model file or the second model file, depending on whether a model conversion is performed) to a platform file under the target platform.
The target platform usually cannot use the model file directly; instead it uses a platform file with a custom format, so the model can be deployed on the target platform only after the model file is converted into a platform file. For example, the hardware platform 300 must use an om file, so the conversion tool provided with the Ascend 310 can be used to convert the second model file under the Caffe framework into an om file. Quantization of the model parameters may be completed during conversion to the platform file, and convolutional layer fusion may be performed before quantization, as described above.
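The parameter quantization performed during platform-file conversion can be illustrated with a minimal affine int8 scheme. The scale/zero-point formulation below is an assumption for illustration; the patent does not fix the exact quantization scheme used by the toolchain.

```python
# Minimal sketch of affine int8 quantization of a list of weights.
# The scale/zero-point scheme is assumed; it is not the patent's method.
def quantize_int8(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0      # guard against all-equal weights
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

w = [-0.5, 0.0, 0.25, 0.5]
q, s, z = quantize_int8(w)
w2 = dequantize(q, s, z)
print(max(abs(a - b) for a, b in zip(w, w2)) < s)  # True: error under one step
```

The round trip loses at most one quantization step per weight, which is why a well-chosen scale keeps the quantized model's accuracy close to the original.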
Model deployment simply transfers the platform file to the memory 320 of the hardware platform 300 for storage, so that the NPU can load the neural network model from the memory 320 when it needs to perform inference. The manner of transferring the data is not limited; depending on the configuration of the hardware platform 300, it may be, for example, transfer via a data line or via a network.
With continued reference to fig. 9, the business development phase 50 may be divided into 5 sub-phases, respectively:
data acquisition 510: the method includes acquiring an image of a target to be detected from a data source, for example, shooting pedestrians passing through an intersection by using a camera, wherein the scene of the intersection is the data source. The target image is finally transferred and stored into the memory 320 of the hardware platform 300.
Data preprocessing 520: the central processor 310 of the hardware platform 300 reads the target image from the memory 320 and performs a preprocessing, for example, one or more of size transformation, normalization, and noise reduction, and the preprocessed data may be written into the memory 320. Of course, if the target image inherently satisfies the processing requirements, the data preprocessing may not be performed.
Model inference 530: under the control of the central processor 310, the neural network processor 330 reads the (possibly preprocessed) target image and the platform file (stored during model deployment 430) from the memory 320, processes the target image using the neural network model stored in the platform file, and writes the resulting processing result into the memory 320. The processing result is the probability, predicted by the model, that the target image belongs to each class.
As explained above, because the neural network model is obtained using the model construction method provided in the embodiments of the present application, inference with it is efficient, occupies little memory, and is highly accurate.
Results post-processing 540: the central processor 310 reads the processing result from the memory 320 and performs appropriate post-processing according to the business requirements, for example selecting the category with the largest probability as the category the neural network model finally predicts; assume the final category label so determined is 1.
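For the mask-detection example, this post-processing amounts to an argmax over the per-class probabilities followed by a lookup in the label table given earlier (0 - background, 1 - no mask, 2 - mask, 3 - unknown); `postprocess` is an illustrative helper:

```python
# Post-processing sketch: pick the highest-probability class and map it
# to the label table from the training-data description.
LABELS = {0: "background", 1: "no mask", 2: "mask", 3: "unknown"}

def postprocess(probs):
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, LABELS[idx]

print(postprocess([0.05, 0.80, 0.10, 0.05]))  # (1, 'no mask')
```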
Final result output 550: under the control of the central processing unit 310, the final result is output to the user; for example, the final category label 1 obtained in the previous step is converted into the text "not wearing a mask" and then presented to the user.
It is to be understood that, although the image classification task is taken as an example in fig. 7 to 9, the business flow is similar for other image processing tasks.
Fig. 10 shows a functional block diagram of a model building apparatus 600 provided in an embodiment of the present application. Referring to fig. 10, the model building apparatus 600 includes:
a training unit 610, configured to train a neural network model for image processing, where the neural network model includes at least one depth separable convolution module, and the depth separable convolution module includes a layer-by-layer convolution layer, a point-by-point convolution layer, a batch normalization layer, and an activation layer, which are sequentially connected;
a quantizing unit 620, configured to quantize the trained neural network model, so as to obtain the quantized neural network model.
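As a side note on why the depth separable convolution module handled by the training unit 610 is attractive on an NPU, the parameter count of a layer-by-layer (depthwise) plus point-by-point (1x1) convolution is far below that of a standard convolution; a quick sketch:

```python
# Parameter count of a standard KxK convolution vs. a depth separable
# one (layer-by-layer + point-by-point), for c_in -> c_out channels.
def standard_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out  # depthwise + pointwise

print(standard_params(3, 32, 64))   # 18432
print(separable_params(3, 32, 64))  # 2336
```

The roughly 8x reduction in this example is what makes such modules suitable for deployment on memory-constrained hardware.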
In one implementation of the model building apparatus 600, the apparatus further comprises: a model clipping unit, configured to perform parameter clipping on the trained neural network model after the training unit 610 trains the neural network model for image processing and before the quantization unit 620 quantizes the trained neural network model.
In one implementation of the model building apparatus 600, the apparatus further comprises: an activation function replacing unit, configured to replace, after the training unit 610 trains the neural network model for image processing and before the quantizing unit 620 quantizes the trained neural network model, a first activation function adopted by the activation layer in the trained neural network model with a second activation function supported by a target platform, where the target platform is a hardware platform loaded with a neural network processor.
In one implementation of the model building apparatus 600, the apparatus further comprises: a network fusion unit, configured to fuse adjacent convolution layers and batch normalization layers in the trained neural network model after the training unit 610 trains the neural network model for image processing and before the quantization unit 620 obtains the quantized neural network model.
In one implementation of model building apparatus 600, training unit 610 trains a neural network model for image processing, including: training a neural network model for image processing under a first deep learning framework to obtain a first model file storing the trained neural network model;
the quantization unit 620 quantizes the trained neural network model to obtain the quantized neural network model, including: converting the first model file into a platform file under a target platform, and quantizing model parameters stored in the first model file in the conversion process; the first deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform loaded with a neural network processor; or converting the first model file into a second model file under a second deep learning framework, converting the second model file into a platform file under a target platform, and quantizing the model parameters stored in the second model file in the conversion process; the second deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform carrying a neural network processor.
The implementation principle and technical effects of the model building apparatus 600 provided in the embodiments of the present application have been introduced in the foregoing method embodiments; for brevity, for parts of the apparatus embodiment not mentioned here, refer to the corresponding content in the method embodiments.
Fig. 11 shows a functional block diagram of an image processing apparatus 700 provided in an embodiment of the present application. Referring to fig. 11, an image processing apparatus 700 includes:
an image acquisition unit 710 for acquiring a target image to be processed;
the image processing unit 720 is configured to process the target image by using the neural network model obtained by the model construction method provided in the embodiment of the present application, and output a processing result.
The implementation principle and technical effects of the image processing apparatus 700 provided in the embodiments of the present application have been introduced in the foregoing method embodiments; for brevity, for parts of the apparatus embodiment not mentioned here, refer to the corresponding content in the method embodiments.
The embodiment of the application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores the neural network model obtained by using the model construction method provided by the embodiment of the application. For example, the computer readable storage medium may be implemented as the memory 320 in the hardware platform 300 of FIG. 6, but may also be a separate memory.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method of model construction, comprising:
training a neural network model for image processing, wherein the neural network model comprises at least one depth separable convolution module, and the depth separable convolution module comprises a layer-by-layer convolution layer, a point-by-point convolution layer, a batch normalization layer and an activation layer which are sequentially connected;
and quantizing the trained neural network model to obtain the quantized neural network model.
2. The model building method of claim 1, wherein after the training of the neural network model for image processing and before the quantifying of the trained neural network model, the method further comprises:
and performing parameter clipping on the trained neural network model.
3. The model building method of claim 1, wherein after the training of the neural network model for image processing and before the quantifying of the trained neural network model, the method further comprises:
replacing a first activation function adopted by the activation layer in the trained neural network model with a second activation function supported by a target platform, wherein the target platform is a hardware platform loaded with a neural network processor.
4. The model building method of claim 1, wherein after the training of the neural network model for image processing and before the obtaining of the quantified neural network model, the method further comprises:
and fusing adjacent convolution layers and batch normalization layers in the trained neural network model.
5. The model building method according to any one of claims 1-4, wherein the training of the neural network model for image processing comprises:
training a neural network model for image processing under a first deep learning framework to obtain a first model file storing the trained neural network model;
the quantifying the trained neural network model to obtain the quantified neural network model includes:
converting the first model file into a platform file under a target platform, and quantizing model parameters stored in the first model file in the conversion process; the first deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform loaded with a neural network processor;
alternatively,
converting the first model file into a second model file under a second deep learning framework, converting the second model file into a platform file under a target platform, and quantizing model parameters stored in the second model file in the conversion process; the second deep learning frame is a frame supported by the target platform, and the target platform is a hardware platform carrying a neural network processor.
6. An image processing method, comprising:
acquiring a target image to be processed;
processing the target image by using the neural network model obtained by the model construction method according to any one of claims 1 to 5, and outputting a processing result.
7. A model building apparatus, comprising:
a training unit, configured to train a neural network model for image processing, wherein the neural network model comprises at least one depth separable convolution module, and the depth separable convolution module comprises a layer-by-layer convolution layer, a point-by-point convolution layer, a batch normalization layer and an activation layer which are sequentially connected;
and the quantization unit is used for quantizing the trained neural network model to obtain the quantized neural network model.
8. An image processing apparatus characterized by comprising:
the image acquisition unit is used for acquiring a target image to be processed;
an image processing unit, configured to process the target image by using the neural network model obtained by the model construction method according to any one of claims 1 to 5, and output a processing result.
9. A hardware platform, comprising: the device comprises a central processing unit, a neural network processor and a memory;
wherein the neural network processor is configured to: reading a target image to be processed and the platform file according to claim 5 from the memory under the control of the central processing unit, processing the target image by using the neural network model stored in the platform file, and writing the obtained processing result into the memory.
10. A computer-readable storage medium having stored thereon a neural network model obtained by the model construction method according to any one of claims 1 to 5.
CN202010809593.1A 2020-08-12 2020-08-12 Model construction and image processing method and device, hardware platform and storage medium Pending CN111783974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809593.1A CN111783974A (en) 2020-08-12 2020-08-12 Model construction and image processing method and device, hardware platform and storage medium


Publications (1)

Publication Number Publication Date
CN111783974A true CN111783974A (en) 2020-10-16

Family

ID=72762573

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329929A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN112801266A (en) * 2020-12-24 2021-05-14 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium
CN112906523A (en) * 2021-02-04 2021-06-04 上海航天控制技术研究所 Hardware accelerated deep learning target machine type identification method
CN112947935A (en) * 2021-02-26 2021-06-11 上海商汤智能科技有限公司 Operation method and device, electronic device and storage medium
CN113159276A (en) * 2021-03-09 2021-07-23 北京大学 Model optimization deployment method, system, equipment and storage medium
CN113570030A (en) * 2021-01-18 2021-10-29 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113658210A (en) * 2021-09-02 2021-11-16 西安中科西光航天科技有限公司 Front-end real-time target tracking method based on Jetson NX platform
CN113780551A (en) * 2021-09-03 2021-12-10 北京市商汤科技开发有限公司 Model quantization method, device, equipment, storage medium and computer program product
WO2022111617A1 (en) * 2020-11-30 2022-06-02 华为技术有限公司 Model training method and apparatus
CN116720563A (en) * 2022-09-19 2023-09-08 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment
CN113570030B (en) * 2021-01-18 2024-05-10 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN110110852A (en) * 2019-05-15 2019-08-09 电科瑞达(成都)科技有限公司 A kind of method that deep learning network is transplanted to FPAG platform
CN110427990A (en) * 2019-07-22 2019-11-08 浙江理工大学 A kind of art pattern classification method based on convolutional neural networks
CN110532859A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Remote Sensing Target detection method based on depth evolution beta pruning convolution net
CN110569730A (en) * 2019-08-06 2019-12-13 福建农林大学 Road surface crack automatic identification method based on U-net neural network model
CN110599533A (en) * 2019-09-20 2019-12-20 湖南大学 Rapid monocular depth estimation method suitable for embedded platform
WO2020056718A1 (en) * 2018-09-21 2020-03-26 华为技术有限公司 Quantization method and apparatus for neural network model in device
US20200097818A1 (en) * 2018-09-26 2020-03-26 Xinlin LI Method and system for training binary quantized weight and activation function for deep neural networks
CN111008924A (en) * 2019-12-02 2020-04-14 西安交通大学深圳研究院 Image processing method and device, electronic equipment and storage medium
CN111209013A (en) * 2020-01-15 2020-05-29 深圳市守行智能科技有限公司 Efficient deep learning rear-end model deployment framework


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDREW G. HOWARD等: "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", 《ARXIV》 *
BENOIT JACOB等: "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", 《ARXIV》 *
TAO SHENG等: "A Quantization-Friendly Separable Convolution for MobileNets", 《ARXIV》 *
李亚辉等: "基于轻量级深度网络的目标识别方法", 《计算机应用研究》 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination