CN115272706A - Image processing method and device, computer equipment and storage medium


Info

Publication number
CN115272706A
Authority
CN
China
Prior art keywords
quantization
feature map
image
image processing
convolution
Legal status
Pending
Application number
CN202210903614.5A
Other languages
Chinese (zh)
Inventor
康洋
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210903614.5A
Publication of CN115272706A

Classifications

    • G06V 10/48: Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships


Abstract

The embodiments of this application disclose an image processing method and apparatus, a computer device, and a storage medium, which can be applied to scenarios such as computer vision, cloud technology, intelligent transportation, and assisted driving. The method comprises the following steps: acquiring an image to be processed; calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is trained based on the quantization feature map and quantization parameters of a sample image, and the quantization feature map and quantization parameters are obtained by quantizing a second feature map of the sample image; and calling the image processing model to perform convolution processing on the first feature map to obtain feature information of key points in the image to be processed. With the embodiments of this application, the image to be processed is handled by an image processing model trained on the quantization feature map and quantization parameters of a sample image, so the video memory occupied by feature maps can be reduced, thereby reducing the cost of the hardware required for image processing.

Description

Image processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
For traditional tasks such as classification, detection, and segmentation, a convolutional neural network of a given structure generally achieves better task results (e.g., precision) the more layers and channels it has. However, the deeper and wider the network, the more video memory its training consumes. Because the video memory of the graphics card used for training is limited, many networks cannot be trained with a larger batch size (batch_size, the number of samples fed per training step). For example, in the segmentation task, not only is the network deep and wide, but the input resolution of the image is also very high, so a single graphics card can often train with a batch_size of only a few samples. Training with a small batch_size tends to make the batch normalization (BatchNorm, or BN) layer unstable, degrading model accuracy. Synchronized batch normalization (SyncBatchNorm) was later proposed, but on the one hand it uses more graphics cards and thus occupies more resources during training; on the other hand, in neural architecture search tasks, many sub-network structures often need to be trained at once, and the supernet training of a large network consumes a large amount of video memory.
In the training process of a convolutional neural network, video memory occupation mainly comes from two parts: a small part is the model parameters, and the much larger part is the feature maps generated by each layer of the network. Feature maps occupy video memory because the network must cache them so that the activations do not need to be recomputed during the backward pass. Currently, training a convolutional neural network with the half-precision floating point (float16, FP16) technique of mixed precision (AMP) can only reduce the training memory of convolution feature maps to 1/2. Therefore, how to further reduce the video memory occupied by convolution feature maps, and thereby the cost of image processing, has become an urgent problem to solve.
Disclosure of Invention
The embodiments of this application provide an image processing method and apparatus, a computer device, and a storage medium. The image to be processed is handled by an image processing model trained based on the quantization feature map and quantization parameters of a sample image, which can reduce the video memory occupied by feature maps and thereby reduce the cost of the hardware required for image processing.
In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:
acquiring an image to be processed;
calling an image processing model to perform feature extraction on an image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is obtained based on a quantization feature map and quantization parameter training of a sample image, and the quantization feature map and the quantization parameter are obtained by performing quantization processing on a second feature map of the sample image;
and calling the image processing model to perform convolution processing on the first feature map to obtain feature information of key points in the image to be processed.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit for acquiring an image to be processed;
the processing unit is used for calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, the image processing model is obtained based on a quantization feature map and quantization parameter training of a sample image, and the quantization feature map and the quantization parameter are obtained by performing quantization processing on a second feature map of the sample image;
and the processing unit is also used for calling the image processing model to carry out convolution processing on the first feature map so as to obtain the feature information of the key points in the image to be processed.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor, a communication interface, and a memory, where the processor, the communication interface, and the memory are connected to each other, where the memory stores a computer program, and the processor is configured to call the computer program to execute an image processing method provided in an embodiment of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the image processing method provided by the present application.
In a fifth aspect, embodiments of the present application provide a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method provided by the embodiment of the application.
In the embodiments of this application, the computer device acquires an image to be processed; calls an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, where the image processing model is trained based on the quantization feature map and quantization parameters of a sample image, and the quantization feature map and quantization parameters are obtained by quantizing a second feature map of the sample image; and calls the image processing model to perform convolution processing on the first feature map to obtain feature information of key points in the image to be processed. With the embodiments of this application, the image to be processed is handled by an image processing model trained on the quantization feature map and quantization parameters of a sample image, so the video memory occupied by feature maps can be reduced, thereby reducing the cost of the hardware required for image processing.
Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a process diagram of an image processing scheme provided by an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a training process of an image processing model according to an embodiment of the present disclosure;
FIG. 4 is a process diagram illustrating the forward operation of convolution according to an embodiment of the present application;
FIG. 5 is a process diagram illustrating a backward operation of convolution according to an embodiment of the present application;
fig. 6 is a schematic diagram of a virtual face key point location provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a direct regression scheme and a heat map regression scheme for predicting key points of a virtual face according to an embodiment of the present application;
fig. 8 is a schematic diagram of an image processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
In order to facilitate understanding of the embodiments of the present application, some concepts related to the embodiments of the present application will be first explained, and the explanation of the concepts includes, but is not limited to, the following.
1. Feature map
In a convolutional neural network, the data in each convolutional layer exists in three dimensions and can be regarded as a stack of two-dimensional maps, each of which is called a feature map. If the input layer receives a grayscale image, there is only one input feature map; if it receives a color image, there are generally 3 input feature maps (red, green, and blue). Between layers there are a plurality of convolution kernels, and each feature map of the next layer is generated by convolving the feature maps of the previous layer with a convolution kernel.
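As a minimal illustrative sketch (PyTorch is assumed here; the layer sizes are arbitrary, not taken from the application), a convolution layer turns the 3 feature maps of a color image into a stack of 16 output feature maps:

```python
import torch
import torch.nn as nn

# A color image enters the network as 3 feature maps (red, green, blue);
# the convolution layer produces one output feature map per kernel.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 224, 224)   # a batch with one 224x224 RGB image
feature_maps = conv(image)

print(feature_maps.shape)             # torch.Size([1, 16, 224, 224])
```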
In order to facilitate understanding of the embodiments of the present application, the image processing method of the present application is described below.
In order to reduce the video memory occupied by feature maps and thus the cost of the hardware required for image processing, an embodiment of this application provides an image processing scheme. Referring to fig. 1, fig. 1 is a schematic process diagram of the image processing scheme provided in an embodiment of this application; the general implementation process of the scheme is described below with reference to fig. 1. As shown in fig. 1, the solid lines connect the steps of training the model and the dashed lines connect the steps of using the model. First, the computer device 101 acquires the sample image 102 and corresponding annotation information, which includes reference feature information of each key point in the sample image 102. Next, the convolutional neural network 1031 is called to perform feature extraction on the sample image to obtain a second feature map of the sample image, and the second feature map is quantized (1032) based on a preset number of bits to obtain the quantization feature map and quantization parameters of the sample image. The convolutional neural network is then trained based on the quantization feature map, the quantization parameters, and the annotation information to obtain the image processing model 103. After that, the computer device 101 acquires the image 104 to be processed and calls the image processing model 103 to perform feature extraction on it, obtaining a first feature map of the image to be processed. Finally, the image processing model 103 is called to perform convolution processing on the first feature map to obtain the feature information 105 of the key points in the image to be processed.
Practice shows that the image processing scheme provided by the embodiments of this application has the following beneficial effects: (1) the image to be processed is handled by an image processing model trained based on the quantization feature map and quantization parameters of a sample image, which can reduce the video memory occupied by feature maps and thereby the cost of the hardware required for image processing; (2) the scheme can be widely applied to many different scenarios, for example to projects or products that process images through a neural network, such as video/image editing applications (apps), short-video apps, or video calls on a computer device, and also to projects or products in other fields, such as speech and text, that need to train a convolutional neural network.
It should be noted that: in a specific implementation, the above scheme can be executed by a computer device, which may be a terminal or a server. The terminals mentioned here may include but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, smart watches, smart home appliances, intelligent vehicle-mounted terminals, and the like; various clients (APPs) can run on the terminal, such as a video playing client, a social client, a browser client, an information stream client, an education client, and so on. The server mentioned here may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), big data, and artificial intelligence platforms. Moreover, the computer device mentioned in the embodiments of this application may be located outside or inside the blockchain network, which is not limited here; a blockchain network is a network formed by a peer-to-peer (P2P) network and a blockchain, and a blockchain is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms; it is essentially a decentralized database, a chain of data blocks (or blocks) linked by cryptographic methods.
The image processing method provided by the embodiments of this application can be implemented based on Artificial Intelligence (AI) technology. Artificial intelligence is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the capabilities of perception, reasoning, and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The image processing method provided by the embodiments of this application mainly relates to Computer Vision (CV) technology within AI. Computer vision is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to recognize, track, and measure targets, and further performs image processing so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, among others. It should be noted that this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like. In addition, the image processing method provided by the embodiments of this application also involves deep learning within AI.
Based on the above scheme, an embodiment of the present application provides an image processing method, please refer to fig. 2, and fig. 2 is a schematic flowchart of the image processing method provided in the embodiment of the present application. The method may be performed by a computer device, as shown in fig. 2, the image processing method may include the steps of:
s201, acquiring an image to be processed.
Optionally, the computer device may obtain the image to be processed from a local database, may also obtain the captured image from the capturing apparatus as the image to be processed, may also obtain the image to be processed from a public data set, and the like, which is not limited herein.
S202, calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is trained based on the quantization feature map and quantization parameters of a sample image, and the quantization feature map and quantization parameters are obtained by quantizing a second feature map of the sample image.
In an optional implementation manner, the computer device may further obtain the sample image and corresponding annotation information, where the annotation information includes reference feature information of each key point in the sample image; call a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image; quantize the second feature map based on a preset number of bits to obtain the quantization feature map and quantization parameters of the sample image; and train the convolutional neural network based on the quantization feature map, the quantization parameters, and the annotation information to obtain the image processing model.
Alternatively, the data included in the second feature map of the sample image may be 32-bit floating point type data; the preset number of bits may be 8 bits.
And S203, calling an image processing model to carry out convolution processing on the first feature map to obtain feature information of key points in the image to be processed.
Optionally, the feature information of the key points in the image to be processed includes an identifier, position coordinates, and the like of each key point. For example, assuming the image to be processed is a virtual face image, the feature information of the key points in the virtual face image may be the identifiers and position coordinates of each of a plurality of points corresponding to the eye sockets of the virtual face, of each of a plurality of points corresponding to the mouth, of each of a plurality of points corresponding to the nose, and so on.
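Purely for illustration (the identifiers and coordinates below are made up, not taken from the application), the feature information of key points can be pictured as a mapping from key point identifiers to position coordinates:

```python
# Hypothetical key point feature information for a virtual face image:
# each key point has an identifier and a position coordinate (x, y).
keypoints = {
    1: (112.5, 84.0),    # e.g., a point on an eye socket
    2: (118.2, 83.1),
    37: (96.0, 140.7),   # e.g., a point on the mouth contour
}
```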
In the embodiments of this application, the computer device acquires an image to be processed; calls an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, where the image processing model is trained based on the quantization feature map and quantization parameters of a sample image, and the quantization feature map and quantization parameters are obtained by quantizing a second feature map of the sample image; and calls the image processing model to perform convolution processing on the first feature map to obtain feature information of key points in the image to be processed. With the embodiments of this application, the image to be processed is handled by an image processing model trained on the quantization feature map and quantization parameters of a sample image, so the video memory occupied by feature maps can be reduced, thereby reducing the cost of the hardware required for image processing.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a training process of an image processing model according to an embodiment of the present application, where the process may be executed by a computer device. The image processing model used in the image processing method shown in fig. 2 can be obtained by training the method shown in fig. 3. As shown in fig. 3, the training process of the image processing model may include the following steps:
s301, obtaining a sample image and corresponding labeling information, wherein the labeling information comprises reference characteristic information of each key point in the sample image.
Alternatively, the number of sample images may be one or more.
Alternatively, the computer device may obtain the sample image from a local database, may also obtain the captured image from the capturing device as the sample image, may also obtain the sample image from a published data set, and the like, which is not limited herein.
And S302, calling a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image.
Alternatively, the computer device may input the sample image to an input layer of the convolutional neural network, and after passing through the input layer and the convolutional layer, a second feature map of the sample image may be obtained.
And S303, carrying out quantization processing on the second feature map based on a preset bit number to obtain a quantization feature map and a quantization parameter of the sample image.
In an optional implementation manner, when the computer device performs quantization processing on the second feature map based on a preset number of bits to obtain the quantization feature map and quantization parameters of the sample image, the computer device may perform the following steps: acquiring a quantization range and a quantization value range of the second feature map, where the quantization range is determined based on the preset number of bits, and the quantization value range is determined based on a quantization function and the values included in the second feature map; determining the quantization parameters of the second feature map based on the quantization range and the quantization value range; determining the quantization feature map of the sample image based on the quantization range, the second feature map, and the quantization parameters, and storing the quantization feature map and the quantization parameters; the data included in the second feature map is floating point data, and the data included in the quantization feature map is integer data.
In this embodiment, the computer device may determine the quantization range of the second feature map based on a preset number of bits according to the following formula (1).
qmin = -2^(bits-1);  qmax = 2^(bits-1) - 1    (1)
In the above formula (1), bits represents the number of bits; qmin represents the minimum value of the quantization range of the second feature map; qmax denotes the maximum value of the quantization range of the second feature map.
For example, assuming that the preset number of bits for the second feature map is 8 bits, the minimum value of the quantization range of the second feature map is qmin = -2^(8-1) = -128, and the maximum value of the quantization range of the second feature map is qmax = 2^(8-1) - 1 = 127.
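A minimal sketch of formula (1) in Python:

```python
def quantization_range(bits: int):
    """Formula (1): signed quantization range for a given bit width."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    return qmin, qmax

print(quantization_range(8))   # (-128, 127)
```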
In this embodiment, when the computer device determines the quantization value range of the second feature map based on the quantization function Quantize and the values included in the second feature map, the computer device may dynamically count the quantization value range of the second feature map using a moving-average method. Optionally, the quantization value range of the second feature map may be denoted [min_val, max_val]. Here, dynamic counting means that, in each forward pass during training, the maximum and minimum values of the second feature map in the current iteration are calculated. Optionally, in the iteration following the current one, the quantization value range of the second feature map may be calculated according to the following formula (2).
xmax_t1 = (1 - m) * xmax_t0 + m * xmax;  xmin_t1 = (1 - m) * xmin_t0 + m * xmin    (2)
In the above equation (2), xmax_t1 represents the quantized maximum value of the second feature map at the iteration following the current one; xmax represents the quantized maximum value of the second feature map at the current iteration; xmax_t0 represents the quantized maximum value of the second feature map at the iteration preceding the current one; xmin_t1 represents the quantized minimum value of the second feature map at the iteration following the current one; xmin represents the quantized minimum value of the second feature map at the current iteration; xmin_t0 represents the quantized minimum value of the second feature map at the iteration preceding the current one; m denotes the moving-average coefficient. That is, the quantization value range of the second feature map in the current iteration is [xmin, xmax].
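A sketch of the moving-average update following the reconstruction of formula (2) above (PyTorch tensors and the coefficient value m = 0.1 are assumptions, not taken from the application):

```python
import torch

def update_value_range(xmin_t0, xmax_t0, x, m=0.1):
    """Moving-average update of the quantization value range, formula (2)."""
    xmax = x.max().item()    # quantized maximum at the current iteration
    xmin = x.min().item()    # quantized minimum at the current iteration
    xmax_t1 = (1.0 - m) * xmax_t0 + m * xmax
    xmin_t1 = (1.0 - m) * xmin_t0 + m * xmin
    return xmin_t1, xmax_t1
```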
In this embodiment, the quantization parameters may include a first parameter and a second parameter, and when determining the quantization parameters of the second feature map based on the quantization range and the quantization value range, the computer device may perform the following steps: determining the first parameter based on the quantization range and the quantization value range of the second feature map; and determining the second parameter based on the first parameter, the minimum value of the quantization range of the second feature map, and the minimum value of the quantization value range of the second feature map.
Alternatively, the first parameter may be denoted scale. The computer device may determine the first parameter scale according to equation (3) below.
scale = (max_val - min_val) / (qmax - qmin)    (3)
In the above equation (3), min_val represents the minimum value of the quantization value range of the second feature map; max_val represents the maximum value of the quantization value range of the second feature map; the physical meanings of qmax and qmin are as described for formula (1) and are not repeated here.
Alternatively, the second parameter may be denoted zp. The computer device may determine the second parameter zp according to equation (4) below.
zp = qmin - round(min_val / scale)    (4)
In the above equation (4), qmin represents the minimum value of the quantization range of the second feature map; min_val represents the minimum value of the quantization value range of the second feature map; scale represents the first parameter, calculated as in formula (3); round denotes the round-to-nearest-integer function.
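Formulas (3) and (4) as a small sketch:

```python
def compute_quant_params(min_val, max_val, qmin, qmax):
    """Formulas (3) and (4): affine quantization parameters."""
    scale = (max_val - min_val) / (qmax - qmin)   # first parameter
    zp = qmin - round(min_val / scale)            # second parameter
    return scale, zp
```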
In this embodiment, the computer device determines the quantization feature map of the sample image based on the quantization range, the second feature map, and the quantization parameter, and may be determined according to the following formula (5).
X_Q = clamp(round(X_F / scale) + zp, qmin, qmax)    (5)
In the above formula (5), X_Q represents the quantization feature map of the sample image; X_F represents the second feature map of the sample image; scale denotes the first parameter; zp denotes the second parameter; round denotes the round-to-nearest-integer function; clamp denotes the range-limiting function that restricts X_Q to the quantization range [qmin, qmax]; the physical meanings of qmax and qmin are as described for formula (1) and are not repeated here.
Since the computer device needs to dynamically count the quantization value range of the second feature map in this embodiment, the quantization processing method of this embodiment may also be referred to as dynamic quantization.
Optionally, the data included in the second feature map may be 32-bit floating point data, and the data included in the quantization feature map of the sample image obtained through quantization may be 8-bit integer data. That is, the preset number of bits is 8 bits, and the computer device performs 8-bit quantization on the second feature map of the sample image.
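A minimal sketch of formula (5), assuming PyTorch tensors and 8-bit quantization:

```python
import torch

def quantize(x_f, scale, zp, qmin=-128, qmax=127):
    """Formula (5): quantize a float32 feature map to int8 storage."""
    x_q = torch.clamp(torch.round(x_f / scale) + zp, qmin, qmax)
    return x_q.to(torch.int8)
```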
In another optional embodiment, when the computer device performs quantization processing on the second feature map based on a preset number of bits to obtain the quantization feature map and quantization parameters of the sample image, the computer device may perform the following steps: determining the quantization range based on the preset number of bits; determining the quantization parameters of the second feature map based on the quantization range and the values of the target points in the second feature map; and determining the quantization feature map of the sample image based on the quantization range, the second feature map, and the quantization parameters. It can be seen that, compared with the foregoing dynamic quantization, in this embodiment the computer device does not need the quantization value range determined from the quantization function and the values included in the second feature map, and only needs to convert the computation into fixed-point (shift) operations; therefore, the quantization processing method of this embodiment may also be called static quantization. Because of the fixed-point operations, this embodiment improves quantization efficiency, but it also loses precision faster than dynamic quantization. That is, the precision of dynamic quantization is higher than that of static quantization.
Alternatively, after obtaining the quantization feature map and the quantization parameter of the sample image, the computer device may store the quantization feature map and the quantization parameter of the sample image in a buffer for calculating a convolution gradient of the convolution weight in a backward operation.
S304, training the convolutional neural network based on the quantization feature map, the quantization parameters, and the annotation information to obtain an image processing model.
In an optional implementation manner, when the computer device trains the convolutional neural network based on the quantization feature map, the quantization parameters, and the annotation information to obtain the image processing model, the following steps may be performed: performing a forward operation on the corresponding target convolution layer in the convolutional neural network based on the second feature map and the convolution weight to obtain a third feature map; obtaining the quantization feature map and quantization parameters from the cache, and performing a backward operation on the target convolution layer based on the quantization feature map, the quantization parameters, and the convolution gradient of the third feature map to obtain the convolution gradient of the convolution weight; performing convolution processing on the convolution layers after the target convolution layer in the convolutional neural network based on the convolution gradient of the convolution weight and the second feature map to obtain predicted feature information of each key point in the sample image; and adjusting the network parameters of the convolutional neural network based on the annotation information and the predicted feature information of each key point in the sample image to obtain the image processing model.
Optionally, after step S303, the computer device may perform the step in this embodiment of carrying out a forward operation on the corresponding target convolution layer in the convolutional neural network based on the second feature map and the convolution weight to obtain the third feature map. That is, the computer device may also perform quantization processing on the second feature map of the sample image before performing the convolution operation on it. In the embodiments of this application, this process may be collectively referred to as the forward operation of convolution.
Referring to fig. 4, fig. 4 is a schematic diagram of the forward operation of convolution according to an embodiment of this application. As shown in fig. 4, X_F represents the second feature map of the sample image; Quantize represents the quantization function; scale denotes the first parameter; zp denotes the second parameter; X_Q represents the quantization feature map of the sample image; W denotes the convolution weight; Conv denotes the convolution operation; Y denotes the third feature map.
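A sketch of this forward pass (assumptions: PyTorch, no bias term, and the `quantize` sketch above; the convolution itself runs on the float X_F, while only the 8-bit X_Q and the quantization parameters are kept for the backward pass):

```python
import torch.nn.functional as F

def conv_forward(x_f, weight, scale, zp, cache):
    y = F.conv2d(x_f, weight)                # Y = Conv(X_F, W)
    cache["x_q"] = quantize(x_f, scale, zp)  # keep only the 8-bit X_Q ...
    cache["scale"], cache["zp"] = scale, zp  # ... and the quantization params
    return y
```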
In this embodiment, when the computer device performs the backward operation on the target convolution layer based on the quantization feature map, the quantization parameters, and the convolution gradient of the third feature map to obtain the convolution gradient of the convolution weight, the computer device may perform the following steps: performing inverse quantization processing on the quantization feature map based on the quantization parameters to obtain a fourth feature map; and performing the backward operation on the target convolution layer based on the convolution gradients of the fourth feature map and the third feature map to obtain the convolution gradient of the convolution weight.
Optionally, the fourth feature map is the feature map obtained by superimposing the quantization loss on the second feature map; that is, the fourth feature map is the second feature map carrying quantization loss. As noted above, the data included in the second feature map is floating point data, so the data included in the fourth feature map is also floating point data.
Optionally, the computer device performs inverse quantization processing on the quantized feature map based on the quantization parameter to obtain a fourth feature map, which can be calculated according to an inverse quantization function Dequantize. Wherein the expression of Dequantize is as shown in the following formula (6).
X_F1 = (X_Q - zp) * scale    (6)
In the above formula (6), X_F1 represents the fourth feature map; X_Q represents the quantization feature map of the sample image; scale denotes the first parameter; zp represents the second parameter.
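Formula (6) as a sketch (PyTorch assumed), the counterpart of the `quantize` sketch above:

```python
import torch

def dequantize(x_q, scale, zp):
    """Formula (6): recover a float feature map (with quantization loss)."""
    return (x_q.to(torch.float32) - zp) * scale
```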
In this embodiment, when the computer device performs the backward operation on the target convolution layer based on the convolution gradients of the fourth feature map and the third feature map to obtain the convolution gradient of the convolution weight, the fourth feature map is obtained from the quantization feature map X_Q of the sample image; since X_Q carries a quantization loss, there is also a loss in the resulting convolution gradient.
Optionally, this embodiment may also be referred to as the backward operation of convolution. Referring to fig. 5, fig. 5 is a schematic diagram of the backward operation of convolution according to an embodiment of this application. As shown in fig. 5, X_Q represents the quantization feature map of the sample image; scale denotes the first parameter; zp denotes the second parameter; Dequantize represents the inverse quantization function; X_F1 represents the fourth feature map, i.e., the dequantized feature map of the sample image; YGrad represents the convolution gradient of the third feature map; Wback denotes the backward operation, also called backpropagation, in the convolutional neural network; WGrad denotes the convolution gradient of the convolution weight. YGrad is obtained by taking the partial derivative of the loss function with respect to the third feature map. Optionally, the loss function may be an L2 loss function, a mean square error (MSE) loss function, or the like, which is not limited here.
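Putting the forward and backward passes together, a sketch as a custom autograd function (assumptions: PyTorch, no bias, default stride and padding; `quantize` and `dequantize` are the sketches above; this illustrates the idea and is not the application's own implementation):

```python
import torch

class QuantizedConv2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x_f, weight, scale, zp):
        y = torch.nn.functional.conv2d(x_f, weight)
        # Cache only the 8-bit X_Q plus the quantization parameters.
        ctx.save_for_backward(quantize(x_f, scale, zp), weight)
        ctx.scale, ctx.zp = scale, zp
        return y

    @staticmethod
    def backward(ctx, y_grad):                      # y_grad plays the role of YGrad
        x_q, weight = ctx.saved_tensors
        x_f1 = dequantize(x_q, ctx.scale, ctx.zp)   # fourth feature map X_F1
        # WGrad: convolution gradient of the weight, from X_F1 and YGrad.
        w_grad = torch.nn.grad.conv2d_weight(x_f1, weight.shape, y_grad)
        # Gradient passed on to the previous layer.
        x_grad = torch.nn.grad.conv2d_input(x_f1.shape, weight, y_grad)
        return x_grad, w_grad, None, None
```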
In this embodiment, when the computer device adjusts the network parameters of the convolutional neural network based on the annotation information and the predicted feature information of each key point in the sample image to obtain the image processing model, the network parameters of the convolutional neural network may be adjusted based on the loss between the reference feature information of each key point included in the annotation information and the predicted feature information of each key point in the sample image, so as to obtain the image processing model.
In the embodiments of this application, the computer device acquires a sample image and corresponding annotation information, where the annotation information includes reference feature information of each key point in the sample image; calls a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image; quantizes the second feature map based on a preset number of bits to obtain the quantization feature map and quantization parameters of the sample image; and trains the convolutional neural network based on the quantization feature map, the quantization parameters, and the annotation information to obtain the image processing model. Therefore, with the embodiments of this application, quantizing the feature map based on a preset number of bits can reduce the video memory occupied by feature maps, thereby reducing the cost of the hardware required for image processing.
Referring to fig. 6, fig. 6 is a schematic diagram of virtual face key point locations provided in an embodiment of this application. Key point location prediction for a virtual face means inputting a virtual face image into a neural network and outputting key point locations with consistent semantics. As shown in fig. 6, there are 228 key points on the virtual face. Each key point has corresponding annotation information, such as identification information and coordinate information; identification information is, for example, 1, 2, 3, …, 228.
Optionally, the computer device may predict the key points of the virtual face in two ways: direct regression and heat map regression. The direct regression scheme directly predicts the coordinates of key points through a convolutional neural network, while the heat map scheme generally regresses a Gaussian heat map of the key points; its structure comprises a plurality of hourglass networks, it needs to output feature maps at a higher resolution, and it outputs one Gaussian heat map per key point, so the video memory occupation is positively correlated with the number of points. The heat map scheme therefore places far higher demands on the graphics card than direct regression. Referring to fig. 7, fig. 7 is a schematic diagram of predicting virtual face key points with a direct regression scheme and a heat map regression scheme according to an embodiment of this application. As shown in fig. 7, the networks in the dashed box belong to direct regression, and the networks in the solid box belong to heat map regression. The direct regression scheme tends to have lower accuracy than the heat map regression scheme, but better performance. Optionally, when predicting the virtual face key points, the heat map regression scheme is generally adopted to obtain high-precision virtual face key points.
As can be seen from the foregoing, the convolution gradient of the convolution weight carries a loss, and the training of the image processing model is based on that convolution gradient. Therefore, to verify whether the image processing model loses precision, an experiment was performed based on a virtual face key point regression network whose computation is only 12 million (M) floating-point operations (FLOPs); this network has a structure similar to MobileNetV2, a lightweight convolutional neural network. The experimental results are shown in Table 1 below.
Table 1. Experimental results

Scheme | Conv_Float32-NME | Conv_Int8-NME
mv2_12M FLOPs | 0.04257 | 0.04275
In Table 1, Conv_Float32 means performing the vector convolution operation on 32-bit floating point numbers (FP32), implemented with PyTorch's native convolution nn.Conv2d (a two-dimensional convolution function); Conv_Int8 means performing the vector convolution operation on 8-bit integer data, using the nn.Conv2d-equivalent convolution implemented in this application. The Normalized Mean Error (NME) is the evaluation index, i.e., the normalized mean error between the predicted key point coordinates and the manually annotated key point coordinates; the smaller the value, the better. As can be seen from Table 1, with the quantization scheme provided in this application, the accuracy of the 12M-FLOPs network is nearly lossless; it is therefore believed that accuracy can also be kept lossless for networks with a higher amount of computation. Experiments show that, in the convolution forward pass, this application quantizes the feature map X_F to an 8-bit X_Q; since X_Q can be represented in int8, the forward pass stores only the 8-bit X_Q and the quantization parameters. Compared with FP32, the video memory occupied by the convolution feature map can thus be reduced to 1/4; in the backward pass, X_Q is dequantized to floating point numbers using the quantization parameters for the gradient calculation, maintaining nearly the same precision as floating-point training.
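The 1/4 figure follows from the storage sizes alone; as a back-of-the-envelope check (the feature map shape below is a made-up example):

```python
# float32 costs 4 bytes per value; int8 costs 1 byte per value.
n, c, h, w = 32, 64, 128, 128            # hypothetical feature map shape
fp32_bytes = n * c * h * w * 4
int8_bytes = n * c * h * w * 1           # plus a few bytes for scale and zp
print(fp32_bytes / int8_bytes)           # 4.0 -> cached maps shrink to 1/4
```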
In addition, for the heat map regression scheme for face key points, this approach can greatly reduce the video memory occupation so that training can be completed.
It should be noted that, when the embodiment of the present application is applied to a specific product or technology, the image to be processed, the sample image, the virtual face image, and the like related to the embodiment of the present application are obtained after obtaining the permission of the user or agreeing to the user; and the collection, use and processing of the images to be processed, the sample images, the virtual face images, etc. are required to comply with relevant laws and regulations and standards in relevant countries and regions.
Based on the description of the related embodiments of the image processing method, the embodiment of the present application also provides an image processing apparatus, which may be a computer program (including program code) running in a computer device. The image processing apparatus may execute the image processing method shown in fig. 2; referring to fig. 8, fig. 8 is a schematic diagram of an image processing apparatus according to an embodiment of the present disclosure, where the image processing apparatus may include the following units:
an acquisition unit 801 configured to acquire an image to be processed;
the processing unit 802 is configured to invoke an image processing model to perform feature extraction on an image to be processed to obtain a first feature map of the image to be processed, where the image processing model is obtained by training a quantization feature map and a quantization parameter of a sample image, and the quantization feature map and the quantization parameter are obtained by performing quantization processing on a second feature map of the sample image;
the processing unit 802 is further configured to invoke an image processing model to perform convolution processing on the first feature map, so as to obtain feature information of a key point in the image to be processed.
In an alternative embodiment, the image processing apparatus further comprises: a training unit 803.
In an alternative embodiment, the training unit 803 is configured to:
acquiring a sample image and corresponding annotation information, wherein the annotation information comprises reference characteristic information of each key point in the sample image;
calling a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image;
quantizing the second feature map based on a preset number of bits to obtain a quantized feature map and quantized parameters of the sample image;
and training the convolutional neural network based on the quantization feature map, the quantization parameters, and the annotation information to obtain an image processing model.
In an optional implementation manner, when the training unit 803 is configured to perform quantization processing on the second feature map based on a preset number of bits to obtain a quantized feature map and a quantization parameter of the sample image, the training unit is specifically configured to:
acquiring a quantization range and a quantization value range of the second feature map, wherein the quantization range is determined based on a preset number of bits, and the quantization value range is determined based on a quantization function and the values included in the second feature map;
determining a quantization parameter of the second feature map based on the quantization range and the quantization value range;
determining a quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter, and storing the quantization feature map and the quantization parameter;
the data included in the second characteristic diagram is floating-point data, and the data included in the quantized characteristic diagram is integer data.
In an alternative embodiment, the quantization parameter includes a first parameter and a second parameter, and the training unit 803, when being configured to determine the quantization parameter of the second feature map based on the quantization range and the quantization value range, is specifically configured to:
determining a first parameter based on the quantization range and the quantization value range of the second feature map;
the second parameter is determined based on the first parameter, the minimum value of the quantization range of the second feature map, and the minimum value of the quantization range of the second feature map.
In an optional implementation manner, the training unit 803, when being configured to train the convolutional neural network based on the quantization feature map, the quantization parameter, and the labeling information to obtain the image processing model, is specifically configured to:
forward operation is carried out on a corresponding target convolution layer in the convolution neural network on the basis of the second feature map and the convolution weight, and a third feature map is obtained;
obtaining the quantization feature map and the quantization parameter from the cache, and carrying out a backward operation on the target convolution layer based on the quantization feature map, the quantization parameter, and the convolution gradient of the third feature map to obtain a convolution gradient of the convolution weight;
performing convolution processing on the convolution layers after the target convolution layer in the convolutional neural network based on the convolution gradient of the convolution weight and the second feature map to obtain predicted feature information of each key point in the sample image;
and adjusting network parameters of the convolutional neural network based on the labeling information and the prediction characteristic information of each key point in the sample image to obtain an image processing model.
In an alternative embodiment, the training unit 803, when configured to perform a backward operation on the target convolutional layer based on the quantized feature map, the quantization parameter, and the convolutional gradient of the third feature map, to obtain a convolutional gradient of the convolutional weight, is specifically configured to:
carrying out inverse quantization processing on the quantization feature map based on the quantization parameter to obtain a fourth feature map;
and performing backward operation on the target convolution layer based on the convolution gradients of the fourth feature map and the third feature map to obtain a convolution gradient of the convolution weight.
In an alternative embodiment, the training unit 803, when configured to perform quantization processing on the second feature map based on a preset number of bits to obtain a quantized feature map of the sample image and a quantization parameter, is configured to:
determining a quantization range based on a preset bit number;
determining a quantization parameter of the second feature map based on the quantization range and the value of the target point in the second feature map;
and determining the quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter.
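In this variant the quantization range follows directly from the preset bit number; a hedged helper (the signed/unsigned distinction is an assumption, not stated in the embodiment):

```python
def quantization_range(num_bits, signed=False):
    # e.g. quantization_range(8) -> (0, 255); quantization_range(8, True) -> (-128, 127)
    if signed:
        return -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return 0, 2 ** num_bits - 1
```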
According to an embodiment of the present application, the steps involved in the method shown in fig. 2 and the training process shown in fig. 3 may be performed by various units in the image processing apparatus shown in fig. 8. For example, step S201 shown in fig. 2 may be performed by the acquisition unit 801 shown in fig. 8, and step S202 and step S203 may be performed by the processing unit 802 shown in fig. 8. As another example, steps S301-S304 shown in fig. 3 may each be performed by the training unit 803 shown in fig. 8, and so on.
According to another embodiment of the present application, the units in the image processing apparatus shown in fig. 8 may be respectively or entirely combined into one or several other units to form the image processing apparatus, or some unit(s) may be further split into multiple units with smaller functions to form the image processing apparatus; this achieves the same operations without affecting the technical effects of the embodiments of the present application. The units are divided based on logical functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the image processing apparatus may also include other units; in practical applications, these functions may be implemented with the assistance of other units, or through the cooperation of a plurality of units.
According to another embodiment of the present application, the image processing apparatus shown in fig. 8 may be constructed, and the image processing method of the embodiment of the present application may be implemented, by running a computer program (including program codes) capable of executing the steps involved in the corresponding method shown in fig. 2 on a general-purpose computer device, such as a computer including a Central Processing Unit (CPU), a Random Access Memory (RAM), a Read-Only Memory (ROM) and other storage elements. The computer program may be embodied on, for example, a computer-readable storage medium, and loaded into and executed by the computer device described above via the computer-readable storage medium.
It can be understood that, for specific implementation of each unit in the image processing apparatus and beneficial effects that can be achieved by the image processing apparatus provided in the embodiment of the present application, reference may be made to the description of the foregoing embodiment of the image processing method, and details are not described here again.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application further provides a computer device. Referring to fig. 9, the computer device at least comprises a processor 901, a memory 902 and a communication interface 903. The processor 901, the memory 902 and the communication interface 903 may be connected by a bus 904 or in other manners; in the embodiment of the present application, connection by the bus 904 is taken as an example.
The processor 901 (or referred to as a Central Processing Unit (CPU)) is the computing core and control core of the computer device, and can parse various instructions in the computer device and process various data of the computer device. For example, the CPU can parse a power-on/power-off instruction sent to the computer device by a user and control the computer device to perform the power-on/power-off operation. As another example, the CPU may transmit various types of interactive data between the internal structures of the computer device, and so on. The communication interface 903 may optionally include a standard wired interface and a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.), and is controlled by the processor 901 to transmit and receive data. The memory 902 is a storage device in the computer device for storing computer programs and data. It will be appreciated that the memory 902 may comprise both the internal memory of the computer device and the expansion memory supported by the computer device. The memory 902 provides storage space that stores the operating system of the computer device, which may include, but is not limited to: a Windows system, a Linux system, an Android system, an iOS system, etc., which are not limited in this application. In an alternative implementation, the processor 901 of the embodiment of the present application may perform the following operations by executing the computer program stored in the memory 902:
acquiring an image to be processed;
calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is obtained by training based on a quantization feature map and quantization parameters of a sample image, and the quantization feature map and the quantization parameters are obtained by performing quantization processing on a second feature map of the sample image;
and calling the image processing model to carry out convolution processing on the first feature map to obtain feature information of key points in the image to be processed.
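As a non-normative sketch of this two-call inference flow (the backbone/head split and all names are assumptions for illustration; the embodiment only requires feature extraction followed by convolution processing):

```python
import torch

def extract_keypoint_features(model, image):
    model.eval()
    with torch.no_grad():
        # Feature extraction yields the first feature map.
        first_feature_map = model.backbone(image)
        # Convolution processing yields feature information of the key points.
        keypoint_features = model.head(first_feature_map)
    return keypoint_features
```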
In an alternative embodiment, the processor 901 is further configured to:
acquiring a sample image and corresponding annotation information, wherein the annotation information comprises reference characteristic information of each key point in the sample image;
calling a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image;
quantizing the second feature map based on a preset number of bits to obtain a quantization feature map and quantization parameters of the sample image;
and training the convolutional neural network based on the quantization characteristic diagram, the quantization parameters and the labeling information to obtain an image processing model.
In an optional implementation manner, when the processor 901 is configured to perform quantization processing on the second feature map based on a preset number of bits to obtain a quantized feature map and a quantization parameter of the sample image, the processor is specifically configured to:
acquiring a quantization range and a quantization value range of the second feature map, wherein the quantization range is determined based on a preset bit number, and the quantization value range is determined based on a quantization function and the numerical values included in the second feature map;
determining a quantization parameter of the second feature map based on the quantization range and the quantization value range;
determining a quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter, and storing the quantization feature map and the quantization parameter;
the data included in the second feature map is floating-point data, and the data included in the quantization feature map is integer data.
In an optional implementation manner, the quantization parameter includes a first parameter and a second parameter, and the processor 901 is specifically configured to, when configured to determine the quantization parameter of the second feature map based on the quantization range and the quantization value range:
determining a first parameter based on the quantization range and the quantization value range of the second feature map;
the second parameter is determined based on the first parameter, the minimum value of the quantization range, and the minimum value of the quantization value range of the second feature map.
In an alternative embodiment, when configured to train the convolutional neural network based on the quantization feature map, the quantization parameter and the labeling information to obtain the image processing model, the processor 901 is specifically configured to: perform a forward operation on a corresponding target convolution layer in the convolutional neural network based on the second feature map and the convolution weight to obtain a third feature map;
obtaining the quantization feature map and the quantization parameter from the cache, and performing a backward operation on the target convolution layer based on the quantization feature map, the quantization parameter and the convolution gradient of the third feature map to obtain a convolution gradient of the convolution weight;
performing convolution processing on the convolution layers behind the target convolution layer in the convolutional neural network based on the convolution gradient of the convolution weight and the second feature map to obtain prediction feature information of each key point in the sample image;
and adjusting network parameters of the convolutional neural network based on the labeling information and the prediction feature information of each key point in the sample image to obtain the image processing model.
In an optional implementation, the processor 901, when configured to perform a backward operation on the target convolutional layer based on the quantized feature map, the quantization parameter, and the convolutional gradient of the third feature map, to obtain a convolutional gradient of the convolutional weight, is specifically configured to:
performing inverse quantization processing on the quantization feature map based on the quantization parameter to obtain a fourth feature map;
and performing a backward operation on the target convolution layer based on the fourth feature map and the convolution gradient of the third feature map to obtain the convolution gradient of the convolution weight.
In an optional implementation manner, when the processor 901 is configured to perform quantization processing on the second feature map based on a preset number of bits to obtain a quantized feature map and a quantization parameter of the sample image, the processor is specifically configured to:
determining a quantization range based on a preset bit number;
determining a quantization parameter of the second feature map based on the quantization range and the value of the target point in the second feature map;
and determining the quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter.
In a specific implementation, the processor 901, the memory 902, and the communication interface 903 described in this embodiment may execute an implementation manner of the computer device described in the image processing method provided in this embodiment, and may also execute an implementation manner described in the image processing apparatus provided in this embodiment, which is not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the image processing method according to any one of the above-mentioned possible implementation manners. For specific implementation, reference may be made to the foregoing description, which is not repeated herein.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the image processing method of any one of the possible implementations. For specific implementation, reference may be made to the foregoing description, which is not repeated herein.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art will recognize that the embodiments described in this specification are preferred embodiments and that acts or modules referred to are not necessarily required for this application.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, read-Only memories (ROMs), random Access Memories (RAMs), magnetic or optical disks, and the like.
The above disclosure is only a few examples of the present application and certainly should not be taken as limiting the scope of the present application; therefore, equivalent variations made according to the claims of the present application still fall within the scope of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed;
calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is obtained by training based on a quantization feature map and quantization parameters of a sample image, and the quantization feature map and the quantization parameters are obtained by performing quantization processing on a second feature map of the sample image;
and calling the image processing model to carry out convolution processing on the first feature map to obtain feature information of key points in the image to be processed.
2. The method of claim 1, further comprising:
acquiring a sample image and corresponding annotation information, wherein the annotation information comprises reference characteristic information of each key point in the sample image;
calling a convolutional neural network to perform feature extraction on the sample image to obtain a second feature map of the sample image;
quantizing the second feature map based on a preset number of bits to obtain a quantization feature map and quantization parameters of the sample image;
and training the convolutional neural network based on the quantization feature map, the quantization parameter and the labeling information to obtain the image processing model.
3. The method according to claim 2, wherein the quantizing the second feature map based on a preset number of bits to obtain a quantized feature map and a quantization parameter of the sample image includes:
acquiring a quantization range and a quantization value range of the second feature map, wherein the quantization range is determined based on a preset bit number, and the quantization value range is determined based on a quantization function and a numerical value included in the second feature map;
determining a quantization parameter of the second feature map based on the quantization range and the quantization value range;
determining a quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter, and storing the quantization feature map and the quantization parameter;
the data included in the second feature map is floating-point data, and the data included in the quantization feature map is integer data.
4. The method of claim 3, wherein the quantization parameter comprises a first parameter and a second parameter, and wherein determining the quantization parameter for the second feature map based on the quantization range and the quantization range comprises:
determining the first parameter based on a quantization range and a quantization value range of the second feature map;
determining the second parameter based on the first parameter, a minimum value of the quantization range, and a minimum value of the quantization value range of the second feature map.
5. The method according to claim 2 or 3, wherein the training the convolutional neural network based on the quantization feature map, the quantization parameter and the labeling information to obtain the image processing model comprises:
performing forward operation on a corresponding target convolutional layer in the convolutional neural network based on the second feature map and the convolutional weight to obtain a third feature map;
obtaining the quantization feature map and the quantization parameter from a cache, and performing a backward operation on the target convolution layer based on the quantization feature map, the quantization parameter and the convolution gradient of the third feature map to obtain a convolution gradient of the convolution weight;
performing convolution processing on convolution layers behind the target convolution layer in the convolutional neural network based on the convolution gradient of the convolution weight and the second feature map to obtain prediction feature information of each key point in the sample image;
and adjusting network parameters of the convolutional neural network based on the labeling information and the prediction characteristic information of each key point in the sample image to obtain the image processing model.
6. The method of claim 5, wherein performing a backward operation on the target convolution layer based on the quantized feature map, the quantization parameter, and a convolution gradient of the third feature map to obtain a convolution gradient of the convolution weight comprises:
carrying out inverse quantization processing on the quantization feature map based on the quantization parameter to obtain a fourth feature map;
and carrying out a backward operation on the target convolution layer based on the fourth feature map and the convolution gradient of the third feature map to obtain the convolution gradient of the convolution weight.
7. The method according to claim 2, wherein the quantizing the second feature map based on a preset number of bits to obtain a quantized feature map and a quantization parameter of the sample image comprises:
determining a quantization range based on a preset number of bits;
determining a quantization parameter of the second feature map based on the quantization range and the value of the target point in the second feature map;
and determining the quantization feature map of the sample image based on the quantization range, the second feature map and the quantization parameter.
8. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition unit is used for acquiring an image to be processed;
the processing unit is used for calling an image processing model to perform feature extraction on the image to be processed to obtain a first feature map of the image to be processed, wherein the image processing model is obtained by training based on a quantization feature map and quantization parameters of a sample image, and the quantization feature map and the quantization parameters are obtained by performing quantization processing on a second feature map of the sample image;
the processing unit is further configured to call the image processing model to perform convolution processing on the first feature map, so as to obtain feature information of the key points in the image to be processed.
9. A computer device comprising a memory, a communication interface, and a processor, wherein the memory, the communication interface, and the processor are interconnected; the memory stores a computer program, and the processor calls the computer program stored in the memory to implement the image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the image processing method of any one of claims 1 to 7.
11. A computer program product, characterized in that it comprises a computer program or computer instructions which, when executed by a processor, implement the image processing method according to any one of claims 1 to 7.
CN202210903614.5A 2022-07-28 2022-07-28 Image processing method and device, computer equipment and storage medium Pending CN115272706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210903614.5A CN115272706A (en) 2022-07-28 2022-07-28 Image processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115272706A 2022-11-01

Family

ID=83771768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210903614.5A Pending CN115272706A (en) 2022-07-28 2022-07-28 Image processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115272706A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126558A (en) * 2018-10-31 2020-05-08 北京嘉楠捷思信息技术有限公司 Convolution neural network calculation acceleration method, device, equipment and medium
CN112085187A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110503600A (en) * 2019-08-26 2019-11-26 厦门美图之家科技有限公司 Feature point detecting method, device, electronic equipment and readable storage medium storing program for executing
WO2022057837A1 (en) * 2020-09-16 2022-03-24 广州虎牙科技有限公司 Image processing method and apparatus, portrait super-resolution reconstruction method and apparatus, and portrait super-resolution reconstruction model training method and apparatus, electronic device, and storage medium
CN112508125A (en) * 2020-12-22 2021-03-16 无锡江南计算技术研究所 Efficient full-integer quantization method of image detection model
CN113705317A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and related equipment
CN113537462A (en) * 2021-06-30 2021-10-22 华为技术有限公司 Data processing method, neural network quantization method and related device
CN113850374A (en) * 2021-10-14 2021-12-28 安谋科技(中国)有限公司 Neural network model quantization method, electronic device, and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENG PENGCHENG et al.: "8-bit deep neural network quantization based on mean square error", Computer Engineering and Design, vol. 43, no. 05, 31 May 2020 (2020-05-31) *
YANG PEILONG: "Research on neural network compression algorithm based on combined ternary quantization", China Master's Theses Full-text Database, Information Science and Technology, no. 06, 15 June 2022 (2022-06-15) *

Similar Documents

Publication Publication Date Title
CN111950638B (en) Image classification method and device based on model distillation and electronic equipment
CN111054080B (en) Method, device and equipment for intelligently detecting perspective plug-in and storage medium thereof
KR20210074360A (en) Image processing method, device and apparatus, and storage medium
WO2022017163A1 (en) Image processing method and apparatus, and device and storage medium
EP4137991A1 (en) Pedestrian re-identification method and device
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111369430B (en) Mobile terminal portrait intelligent background replacement method based on mobile deep learning engine
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN113689372A (en) Image processing method, apparatus, storage medium, and program product
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN114972010A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
EP4170547A1 (en) Method for extracting data features, and related apparatus
CN117456079A (en) Scene rendering method, device, equipment, storage medium and program product
CN111898544A (en) Character and image matching method, device and equipment and computer storage medium
CN115965736A (en) Image processing method, device, equipment and storage medium
CN115272706A (en) Image processing method and device, computer equipment and storage medium
CN115984977A (en) Living body detection method and system
CN116958033A (en) Abnormality detection method, model training method, device, equipment and medium
CN113591838B (en) Target detection method, device, electronic equipment and storage medium
CN115731442A (en) Image processing method, image processing device, computer equipment and storage medium
CN114329024A (en) Icon searching method and system
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN116310615A (en) Image processing method, device, equipment and medium
CN114299105A (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40075680)