CN115131196A - Image processing method, system, storage medium and terminal equipment - Google Patents


Info

Publication number
CN115131196A
Authority
CN
China
Prior art keywords
image
dimensional
face
module
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210348927.9A
Other languages
Chinese (zh)
Inventor
朱飞达 (Zhu Feida)
朱俊伟 (Zhu Junwei)
邰颖 (Tai Ying)
汪铖杰 (Wang Chengjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210348927.9A
Publication of CN115131196A
Legal status: Pending


Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N3/084: Learning methods using backpropagation, e.g. gradient descent
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T2207/30201: Indexing scheme for image analysis or enhancement; subject of image: human face


Abstract

The embodiment of the invention discloses an image processing method, an image processing system, a storage medium and a terminal device, applied to the field of artificial intelligence. The image processing system can acquire a three-dimensional image of a target face image according to the three-dimensional coefficients of the target face image and call a pre-trained image processing model, which acquires a processed image of the target face image according to the merged data of the target face image and the three-dimensional image, where the resolution of the processed image is higher than that of the target face image. Because the corresponding three-dimensional image is combined when a target face image of low definition (i.e. low resolution) is processed, the quality of the processed image is improved.

Description

Image processing method, system, storage medium and terminal equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an image processing method, an image processing system, a storage medium and terminal equipment.
Background
During shooting or network transmission, an image may become unclear and of low quality due to factors such as inaccurate focusing, excessive noise, or an overly high compression ratio, so the image needs to be processed to improve its definition.
In existing image processing, an artificial-intelligence-based machine learning model can process the image to obtain a high-definition result. However, for some images, such as face images, the complexity of the objects in the image means the processed result is often unsatisfactory.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing system, a storage medium and a terminal device, which improve the processing effect on a face image.
An embodiment of the present invention provides an image processing method, including:
acquiring a target face image and a three-dimensional coefficient of a face image contained in the target face image;
acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image;
calling a pre-trained image processing model;
and the image processing model acquires a processed image of the target face image according to the combined data of the target face image and the three-dimensional image, and outputs the processed image, wherein the resolution of the processed image is higher than that of the target face image.
Another aspect of an embodiment of the present invention provides an image processing system, including:
the coefficient acquisition unit is used for acquiring a target face image and a three-dimensional coefficient of the target face image;
the three-dimensional image unit is used for acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image;
the model calling unit is used for calling a pre-trained image processing model;
and the processing unit is used for acquiring a processed image of the target face image by the image processing model according to the combined data of the target face image and the three-dimensional image and outputting the processed image, wherein the resolution of the processed image is higher than that of the target face image.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a plurality of computer programs, the computer programs being adapted to be loaded by a processor and to perform an image processing method according to an aspect of an embodiment of the present invention.
In another aspect, an embodiment of the present invention further provides a terminal device, including a processor and a memory;
the memory is used for storing a plurality of computer programs, and the computer programs are used for being loaded by the processor and executing the image processing method according to the aspect of the embodiment of the invention; the processor is configured to implement each of the plurality of computer programs.
As can be seen, in the method of this embodiment, the image processing system may obtain the three-dimensional image of the target face image according to the three-dimensional coefficients of the target face image and call the pre-trained image processing model, and the image processing model obtains the processed image of the target face image according to the merged data of the target face image and its three-dimensional image, where the resolution of the processed image is higher than that of the target face image. Therefore, when a target face image of low definition (i.e. low resolution) is processed, the corresponding three-dimensional image is combined; since the data of the three-dimensional image describes the face in more detail, the quality of the resulting processed image is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of image processing according to an embodiment of the present invention;
FIG. 3a is a block diagram illustrating the structure of an image processing model invoked in an embodiment of the present invention;
FIG. 3b is a block diagram of another image processing model invoked in an embodiment of the present invention;
FIG. 4 is a flow diagram of a method of training an image processing model in one embodiment of the invention;
FIG. 5 is a diagram illustrating an initial model of image processing determined during training of an image processing model in an embodiment of the present invention;
FIG. 6 is a diagram of a second decoding module in an embodiment of the present invention;
FIG. 7 is a diagram illustrating an image processing method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a distributed system to which an image processing method is applied in another embodiment of the present invention;
FIG. 9 is a block diagram illustrating an exemplary block structure according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of a logic structure of an image processing system according to an embodiment of the present invention;
fig. 11 is a schematic logical structure diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides an image processing method, which is mainly used for processing any face image (namely a target face image) with low definition and low quality, and as shown in fig. 1, the image processing system of the embodiment can process the target face image according to the following method:
acquiring a target face image and a three-dimensional coefficient of the face image contained in the target face image; acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image; calling a pre-trained image processing model; and the image processing model acquires a processed image of the target face image according to the combined data of the target face image and the three-dimensional image, and outputs the processed image, wherein the resolution of the processed image is higher than that of the target face image.
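As an orientation, the following is a minimal sketch of this flow, assuming PyTorch; coeff_model, renderer and processing_model are hypothetical stand-ins for the coefficient prediction model, the three-dimensional rendering step and the pre-trained image processing model described below.

```python
import torch

def enhance_face(target_image: torch.Tensor,
                 coeff_model, renderer, processing_model) -> torch.Tensor:
    """target_image: (1, 3, H, W) low-resolution face image."""
    coeff = coeff_model(target_image)            # three-dimensional coefficients
    image_3d = renderer(coeff)                   # rendered three-dimensional image
    merged = torch.cat([target_image, image_3d], dim=1)   # 6-channel merged data
    return processing_model(merged)              # processed image, higher resolution
```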
In practical applications, the image processing system can be mainly applied to the following application terminals: mobile phones, computers, intelligent voice interaction equipment, intelligent household appliances, vehicle-mounted terminals, aircrafts and the like.
The pre-trained image processing model here is an Artificial Intelligence (AI) based machine learning model. AI is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or realize human learning behaviour in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
Therefore, when a target face image of low definition (i.e. low resolution) is processed, the corresponding three-dimensional image is combined; since the data of the three-dimensional image describes the face in more detail, the quality of the resulting processed image is improved.
An embodiment of the present invention provides an image processing method, which is a method executed by an image processing system, and a flowchart is shown in fig. 2, where the method includes:
step 101, obtaining a target face image and a three-dimensional coefficient of the face image contained in the target face image.
It is to be understood that, in one case, the image processing system may provide an interactive interface with the user, so that the user may specify a certain face image as the target face image through the interactive interface, and initiate the image processing flow of the embodiment. Or, in another case, when some specific event occurs, the image processing system may be triggered to take a certain face image as the target face image, and the image processing flow of this embodiment may be initiated.
Generally, the target face image is a two-dimensional image with low definition and low quality, and the three-dimensional coefficient of the acquired face image refers to information required for performing three-dimensional reconstruction on the face image included in the two-dimensional target face image to acquire a corresponding three-dimensional image, such as a shape and a posture of a face included in the target face image.
Specifically, the image processing system may obtain the three-dimensional coefficients of any face image through a preset coefficient prediction model. The coefficient prediction model is an artificial-intelligence-based machine learning model that can be obtained through training; the running logic of the trained coefficient prediction model is stored in the system in advance, and when the flow of this embodiment is initiated, the preset coefficient prediction model can be called directly to obtain the three-dimensional coefficients of the target face image.
The coefficient prediction model may adopt a network of any structure, such as ResNet-50, a 50-layer Residual Neural Network (ResNet).
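As a sketch of such a coefficient prediction model, the classification head of a standard ResNet-50 can be replaced with a regression layer, assuming torchvision is available; the coefficient dimension of 257 is an assumption borrowed from common 3DMM setups, not a value given in this document.

```python
import torch.nn as nn
from torchvision.models import resnet50

class CoeffPredictor(nn.Module):
    def __init__(self, num_coeffs: int = 257):    # 257 is an assumed 3DMM split
        super().__init__()
        self.backbone = resnet50(weights=None)    # structure only; trained separately
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_coeffs)

    def forward(self, x):                         # x: (N, 3, H, W) face image
        return self.backbone(x)
```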
And 102, acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image.
Specifically, the image processing system may first reconstruct three-dimensional (3D) mesh information of the target face image according to the three-dimensional coefficients of the target face image by a three-dimensional reconstruction method, where the three-dimensional mesh information may include the shapes (S) and textures (T) of the plurality of mesh faces constituting the three-dimensional face, and the three-dimensional face is the one corresponding to the face contained in the target face image; the three-dimensional mesh information is then projected onto a two-dimensional plane by rendering to obtain the three-dimensional image of the target face image.
In this embodiment, the image processing system may call a three-dimensional morphable face model (3DMM), and the three-dimensional mesh information may be reconstructed by the 3DMM according to the three-dimensional coefficients obtained in the above steps.
It should be noted that, when the three-dimensional mesh information is reconstructed according to the three-dimensional coefficients, it may cover only part of the face, for example the face region excluding the hair and ears; reconstructing three-dimensional mesh information for more regions of the face gives the recovered three-dimensional image a better effect.
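A minimal sketch of the 3DMM-style reconstruction step follows, assuming precomputed mean shape/texture and PCA bases; all tensor names and the coefficient split are illustrative assumptions, and the renderer that projects the mesh to the two-dimensional plane is left abstract.

```python
import torch

def reconstruct_mesh(coeff: torch.Tensor,
                     mean_shape, id_basis, exp_basis,
                     mean_tex, tex_basis):
    """Rebuild vertex shapes S and textures T from predicted coefficients.
    The 80/64/80 coefficient split below is an assumption, not from this document."""
    alpha, beta = coeff[:80], coeff[80:144]       # identity / expression
    delta = coeff[144:224]                        # texture
    S = mean_shape + id_basis @ alpha + exp_basis @ beta
    T = mean_tex + tex_basis @ delta
    return S, T                                   # then: I_3d = render(S, T)
```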
And 103, calling a pre-trained image processing model.
And 104, acquiring a processed image of the target face image by the image processing model according to the combined data of the target face image and the three-dimensional image, and outputting the processed image, wherein the resolution of the processed image is higher than that of the target face image.
The merged data of the target face image and the three-dimensional image is the data obtained by concatenating (connecting in series, channel by channel) the data of the target face image and the data of the three-dimensional image.
Specifically, the image processing system may call an image processing model of the following structure, as shown in fig. 3a and 3b, the image processing model may include: a mapping module 11, a plurality of encoding modules 10, a plurality of decoding modules (including a first decoding module 12 and a plurality of second decoding modules 13), and an output module 14, wherein: the output of one encoding module 10 is connected to one input of one decoding module, while the output of the mapping module 11 is connected to the other input of the plurality of decoding modules, respectively; a plurality of encoding modules 10 are connected in series, a plurality of decoding modules are also connected in series, and the output of the last decoding module of the plurality of decoding modules is connected to an output module 14. Wherein the number of encoding modules 10 and decoding modules is equal.
The relationship between the encoding module 10 and the mapping module can be as shown in fig. 3a, there is no direct connection between the mapping module 11 and the encoding module 10, and the inputs of the two modules are the same and are both the target face image and the three-dimensional image thereof. The relationship between the encoding modules 10 and the mapping modules 11 may also be as shown in fig. 3b, with the output of the last encoding module 10 of the plurality of encoding modules 10 being connected to the mapping module 11.
Thus, when the processed image is obtained, the mapping module 11 may be used to determine a hidden variable in combination with the data of the target face image and the three-dimensional image, where the hidden variable is used to adjust convolution weights of convolution calculations related to each decoding module; extracting spatial features of one resolution of combined data of the target face image and the three-dimensional image through the coding module 10, namely, one coding module 10 corresponds to one resolution, and the resolutions corresponding to different coding modules 10 are different; a second decoding module 13 of the plurality of decoding modules performs convolution calculation according to the implicit variable, the spatial feature of one resolution extracted by one encoding module and the feature information of one resolution acquired by a previous decoding module of the second decoding module 13 to obtain the feature information of the other resolution; further, the output module 14 obtains a processed image of the target face image according to the feature information obtained by the last second decoding module 13.
Specifically, when the second decoding module 13 obtains feature information of another resolution, the second decoding module 13 may first adjust a convolution weight involved in the convolution calculation according to the hidden variable to obtain an adjusted convolution weight, and then perform convolution calculation on feature information of one resolution obtained by a previous decoding module according to the adjusted convolution weight to obtain a feature after convolution; and finally, obtaining feature information of another resolution ratio according to the spatial feature of one resolution ratio extracted by one coding module and the feature after convolution.
In this process, the first decoding module 12 of the plurality of decoding modules performs convolution calculation according to the implicit variable and the spatial feature of one resolution obtained by one encoding module to obtain feature information of another resolution. Specifically, the first decoding module 12 may adjust a convolution weight involved in the convolution calculation according to the hidden variable to obtain an adjusted convolution weight, and then perform convolution calculation on the preset initial feature information according to the adjusted convolution weight to obtain a feature after convolution; and finally, obtaining feature information of another resolution ratio according to the spatial feature of one resolution ratio extracted by one coding module and the feature after convolution. In the process, the first coding module 10 of the plurality of coding modules 10 directly extracts the spatial features according to the data of the target face image and the three-dimensional image to obtain the spatial features of a resolution; the other encoding modules 10 (the encoding modules 10 other than the first encoding module) of the plurality of encoding modules 10 obtain spatial features of one resolution from the spatial features of another resolution obtained by the previous encoding module 10.
In addition, in the process of acquiring the processed image, when the mapping module 11 determines the hidden variable, in the case shown in fig. 3a, the mapping module 11 directly determines the hidden variable according to the merged data of the target face image and the three-dimensional image; for the situation shown in fig. 3b, the merged data of the target face image and the three-dimensional image is subjected to feature extraction by the plurality of coding modules 10 to obtain a spatial feature with a certain resolution, and the mapping module 11 determines the hidden variable according to the spatial feature with the certain resolution output by the last coding module 10 of the plurality of coding modules 10.
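The wiring just described can be condensed into the following sketch, assuming PyTorch. Channel counts, the number of levels, and the simplified decoder (which scales activations from w rather than modulating the convolution weights themselves; a weight-modulation sketch appears in the training section) are illustrative choices, not the patent's exact design; the mapping follows the Fig. 3b variant, taking the last encoder output.

```python
import torch
import torch.nn as nn

class SimpleDecoder(nn.Module):
    """Second-decoding-module stand-in: fuse a skip feature with the previous
    decoder feature, conditioned on the hidden variable w."""
    def __init__(self, c_prev, c_skip, w_dim):
        super().__init__()
        self.affine = nn.Linear(w_dim, c_prev)        # w -> per-channel scale
        self.conv = nn.Conv2d(c_prev, c_skip, 3, 1, 1)
        self.fuse = nn.Conv2d(2 * c_skip, c_skip, 1)  # concatenation + fusion
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, skip, prev, w):
        if prev is None:                              # first decoding module:
            prev = torch.zeros_like(skip)             # preset initial feature
        else:
            prev = self.up(prev)
        s = self.affine(w).unsqueeze(-1).unsqueeze(-1)
        h = self.conv(prev * s)                       # simplified modulation
        return self.fuse(torch.cat([skip, h], dim=1))

class ImageProcessingModel(nn.Module):
    def __init__(self, levels=(64, 128, 256), w_dim=512):
        super().__init__()
        chans = (6,) + levels                         # 6-channel merged input
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ci, co, 3, 2, 1), nn.LeakyReLU(0.2))
            for ci, co in zip(chans[:-1], chans[1:]))
        self.mapping = nn.Sequential(                 # Fig. 3b: fed by last encoder
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(chans[-1], w_dim))
        rev = chans[:0:-1]                            # (256, 128, 64)
        prevs = (rev[0],) + rev[:-1]                  # (256, 256, 128)
        self.decoders = nn.ModuleList(
            SimpleDecoder(cp, cs, w_dim) for cp, cs in zip(prevs, rev))
        self.output = nn.Sequential(                  # upsample past input size
            nn.Upsample(scale_factor=4), nn.Conv2d(chans[1], 3, 3, 1, 1))

    def forward(self, merged):                        # merged: (N, 6, H, W)
        feats, x = [], merged
        for enc in self.encoders:
            x = enc(x)
            feats.append(x)                           # one feature per resolution
        w = self.mapping(feats[-1])                   # hidden variable
        y = None
        for skip, dec in zip(reversed(feats), self.decoders):
            y = dec(skip, y, w)
        return self.output(y)                         # higher-resolution image
```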
As can be seen, in the method of this embodiment, the image processing system may obtain the three-dimensional image of the target face image according to the three-dimensional coefficients of the target face image and call the pre-trained image processing model, and the image processing model obtains the processed image of the target face image according to the merged data of the target face image and its three-dimensional image, where the resolution of the processed image is higher than that of the target face image. Therefore, when a target face image of low definition (i.e. low resolution) is processed, the corresponding three-dimensional image is combined; since the data of the three-dimensional image describes the face in more detail, the quality of the resulting processed image is improved.
In a specific embodiment, the image processing model called in step 103 may be pre-trained according to the following steps, and a flowchart is shown in fig. 4, and includes:
step 201, determining an image processing initial model, where the image processing initial model includes: a processing sub-module and a discrimination sub-module.
It is understood that, when determining the image processing initial model, the image processing system determines the multilayer structure included in the model and the initial values of the parameters in each layer. The parameters of the image processing initial model are the fixed parameters used in the calculation of each layer, which do not need to be assigned externally at run time, such as the parameter scale, the number of network layers, and the weight values.
Specifically, as shown in fig. 5, the image processing system may determine that the image processing initial model includes: the processing sub-module 20 is configured to obtain a processed image of the first face sample image according to the first face sample image and the three-dimensional face image corresponding to the first face sample image, and the determining sub-module 21 is configured to determine whether the processed image obtained by the processing sub-module 20 is true. In this embodiment, the determining sub-module 21 further needs to determine whether the second face sample image is true.
Specifically, the processing sub-module 20 may include the following structure: a mapping module 210, a plurality of encoding modules 220, a plurality of decoding modules (including a first decoding module 230 and second decoding modules 240), and an output module 250, wherein:
the mapping module 210 is configured to combine the data of the first face sample image and the three-dimensional face image thereof, that is, determine a hidden variable according to the combined data of the first face sample image and the three-dimensional face image thereof, where the hidden variable is used to adjust convolution weights of convolution calculations related to each decoding module; the encoding module 220 is configured to extract spatial features of a resolution of merged data of the first face sample image and the three-dimensional face image thereof, where the first encoding module 220 directly extracts spatial features of a resolution of the first face sample image and the three-dimensional face image thereof, and the other encoding modules 220 may obtain spatial features of another resolution according to the spatial features of a resolution obtained by the previous encoding module 220; the second decoding module 240 of the plurality of decoding modules is configured to perform convolution calculation according to the hidden variable, the spatial feature of one resolution extracted by one encoding module 220, and the feature information of one resolution acquired by a previous decoding module of the second decoding module 240 to obtain the feature information of another resolution, and the first decoding module 230 of the plurality of decoding modules performs convolution calculation according to the hidden variable and the spatial feature of one resolution obtained by one encoding module to obtain the feature information of another resolution; the output module 250 obtains a processed image of the first face sample image according to the feature information obtained by the last second decoding module 240.
It should be noted that the processing sub-module 20 and the discrimination sub-module 21 defined here form a generative adversarial network (GAN), whose use enables unsupervised machine learning. An adversarial network mainly comprises a generating network (generator) and a discriminating network (discriminator); in this embodiment, the generating network is the processing sub-module 20 and the discriminating network is the discrimination sub-module 21. Practice has proven that an image processing model obtained through such adversarial training improves the quality of the high-resolution image obtained for any low-resolution face image.
Step 202, determining a training sample, where the training sample includes a plurality of sample groups, and each sample group includes a first face sample image with a low resolution, a second face sample image with a high resolution corresponding to the first face sample image, and a three-dimensional face image.
Specifically, when the first face sample image is acquired, resolution-reduction processing may be performed on the second face sample image to obtain the first face sample image, where the resolution-reduction processing may include, but is not limited to, any of the following: adding blur, down-sampling, adding noise, and Joint Photographic Experts Group (JPEG) compression. Wherein:
The blurring step randomly adds Gaussian blur, motion blur and the like; the standard deviation of the Gaussian blur is selected randomly within a certain range, and the motion blur uses a number of (e.g. 38) custom blur kernels. The down-sampling step reduces the image resolution, with the sampling mode chosen randomly from bilinear, bicubic, area and similar modes. The noise step adds Gaussian noise, Poisson noise and the like, with the noise intensity selected randomly within a certain range. The JPEG compression step simulates the loss of image quality when an image is stored, with the compression ratio selected randomly between 5% and 50%.
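A sketch of the resolution-reduction steps above, assuming OpenCV and NumPy; the blur standard deviation and noise strength ranges are illustrative choices, the 5%-50% range follows the text, and mapping the compression ratio to the JPEG quality parameter is an assumption.

```python
import random
import cv2
import numpy as np

def degrade(hq: np.ndarray, scale: int = 4) -> np.ndarray:
    """Turn a high-resolution face image (H, W, 3, uint8) into a degraded one."""
    img = hq.astype(np.float32)
    # 1) random Gaussian blur (std chosen from an assumed range)
    sigma = random.uniform(0.5, 3.0)
    img = cv2.GaussianBlur(img, (0, 0), sigma)
    # 2) down-sampling with a randomly chosen interpolation mode
    interp = random.choice([cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA])
    h, w = img.shape[:2]
    img = cv2.resize(img, (w // scale, h // scale), interpolation=interp)
    # 3) additive Gaussian noise with random strength
    img = img + np.random.normal(0, random.uniform(1, 10), img.shape)
    img = np.clip(img, 0, 255).astype(np.uint8)
    # 4) JPEG compression with quality drawn from the 5%-50% range
    quality = random.randint(5, 50)
    ok, buf = cv2.imencode('.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```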
In step 203, the processing sub-module 20 obtains a processed image of the first face sample image according to the merged data of the first face sample image and the three-dimensional face image corresponding to the first face sample image, and the determination sub-module 21 determines whether the processed image obtained by the processing sub-module is true.
Specifically, the encoding module 220 of the processing submodule 20 in the image processing initial model extracts the spatial features of the merged data of the first face sample image and the three-dimensional face image thereof; meanwhile, the mapping module 210 obtains a hidden variable according to the merged data of the first face sample image and the three-dimensional face image thereof to adjust the convolution weight in the decoding module; the decoding module performs convolution calculation on the spatial features extracted by the encoding module 220 based on the convolution weight adjusted by the hidden variable, so as to obtain features after convolution, and finally the output module 250 outputs the processed image of the first face sample image according to the features after convolution.
The determining submodule 21 determines the processed image obtained by the processing submodule 20, and in this embodiment, the determining submodule 21 needs to determine whether the processed image obtained by the processing submodule 20 is true, and also needs to determine whether the second face sample image is true.
And 204, adjusting the image processing initial model according to the result obtained by the judging submodule 21 and the second face sample image in the training sample, wherein the processing submodule 20 in the adjusted image processing initial model is the pre-trained image processing model. The image processing model is mainly used for acquiring a corresponding high-resolution image according to any low-resolution image and the combined data of the three-dimensional image.
Specifically, the image processing system calculates a loss function related to the initial model of image processing according to the result obtained by the determining submodule 21 in the step 203 and the corresponding second face sample image, where the loss function is used to instruct the processing submodule 20 to obtain a processed image of the first face sample image and an error between an actual high-resolution image of the first face sample image (i.e. the second face sample image), such as a cross entropy loss function, which may be obtained by using the result obtained by the determining submodule 21 and the second face sample image; and then adjusting parameter values of parameters in the image processing initial model according to the loss function.
In the embodiment of the present invention, when computing the loss function, the image processing system may calculate the following loss sub-functions (but is not limited to them): a characteristic loss sub-function, a countermeasure (adversarial) loss sub-function and a reconstruction loss sub-function, and take their calculated values, for example their sum, as the loss function related to the image processing initial model. Wherein:
The countermeasure loss sub-function L_GAN represents the expectation over the result D(G(input)) of the discrimination sub-module 21 judging whether the processed image produced by the processing sub-module 20 is true, and the result D(GT) of the discrimination sub-module 21 judging whether the actual high-resolution image of the first face sample image (i.e. the second face sample image) is true. It can be expressed by the following formula (1), where D denotes the discrimination sub-module 21, G denotes the processing sub-module 20, GT denotes the high-resolution second face sample image in the training sample, and input denotes the low-resolution first face sample image in the training sample together with the corresponding three-dimensional face image:

L_GAN = E[log D(GT)] + E[log(1 - D(G(input)))]    (1)
The reconstruction loss sub-function L_rec represents the difference between the processed image G(input) of the first face sample image obtained by the processing sub-module 20 and the actual high-resolution image of the first face sample image (i.e. the second face sample image GT). It may include the difference between the processed image G(input) and the second face sample image GT, plus the difference between the feature information LPIPS(G(input)) of the processed image and the feature information LPIPS(GT) of the second face sample image, as in formula (2):

L_rec = |G(input) - GT|_1 + |LPIPS(G(input)) - LPIPS(GT)|_1    (2)
The characteristic loss sub-function L_ID represents the difference between the feature information of the processed image G(input) of the first face sample image obtained by the processing sub-module and the feature information of the actual high-resolution image of the first face sample image (i.e. the second face sample image GT). It may be defined as the distance between the ArcFace features of the processed image G(input) and of the second face sample image GT, as in formula (3):

L_ID = 1 - F_cos(F_ArcFace(G(input)), F_ArcFace(GT))    (3)
Therefore, the loss function L calculated by the image processing system for the image processing initial model can be expressed by the following formula (4):

L = L_GAN + L_rec + L_ID    (4)
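A sketch of formulas (1) to (4) in PyTorch follows; lpips_net and arcface_net are hypothetical feature extractors standing in for LPIPS and F_ArcFace, and the adversarial term uses binary cross-entropy on logits, one common realization of the log-expectation form of formula (1) rather than the patent's exact computation.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    # formula (1): judge the second face sample image true, the processed image false
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def generator_loss(d_fake, g_out, gt, lpips_net, arcface_net):
    # adversarial term: the processing sub-module tries to be judged true
    l_gan = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    # formula (2): pixel L1 plus LPIPS feature L1
    l_rec = (g_out - gt).abs().mean() + \
            (lpips_net(g_out) - lpips_net(gt)).abs().mean()
    # formula (3): 1 - cosine similarity of ArcFace identity features
    l_id = 1 - F.cosine_similarity(arcface_net(g_out), arcface_net(gt), dim=1).mean()
    return l_gan + l_rec + l_id               # formula (4)
```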
The training process of the image processing model aims to reduce the value of the loss function as far as possible: through mathematical optimization means such as backpropagation and gradient descent, the parameter values of the parameters in the image processing initial model determined in step 201 are continuously optimized to minimize the calculated value of the loss function. Specifically, when the calculated loss function has a large value, for example one larger than a preset value, parameter values need to be changed, for example by reducing the weight of a neuron connection, so that the loss function computed with the adjusted parameter values becomes smaller.
It should be noted that, in the implementation process of actually adjusting the image processing initial model according to the calculated loss function, in one case, the parameter values of the parameters in the processing sub-module 20 and the determining sub-module 21 may be adjusted simultaneously according to the loss function obtained by the above formula 4. In another case, the parameter values of the parameters in the processing sub-module 20 may be fixed, and the parameter values of the parameters in the discrimination sub-module 21 may be adjusted according to the countermeasure loss sub-function obtained by the above formula 1; then, the parameter values of the parameters in the adjusted discrimination sub-module 21 are fixed, and the parameter values of the parameters in the processing sub-module 20 are adjusted according to the loss functions obtained by the above formulas 1 to 4.
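A sketch of the second (alternating) scheme, assuming PyTorch; G is the processing sub-module, D the discrimination sub-module, and discriminator_loss / generator_loss are the assumed helpers from the previous sketch. The optimizers and data are passed in, so this is an illustrative step, not the patent's exact procedure.

```python
import torch

def train_step(G, D, opt_g, opt_d, lq, img3d, gt, lpips_net, arcface_net):
    merged = torch.cat([lq, img3d], dim=1)        # 6-channel input
    # 1) fix the processing sub-module, adjust the discrimination sub-module
    with torch.no_grad():
        fake = G(merged)
    loss_d = discriminator_loss(D(gt), D(fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) fix the discrimination sub-module, adjust the processing sub-module
    fake = G(merged)
    loss_g = generator_loss(D(fake), fake, gt, lpips_net, arcface_net)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```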
When adjusting the parameter values of the parameters in the discrimination sub-module 21, the goal is to optimize the countermeasure loss sub-function of formula (1), i.e. the discrimination sub-module 21 should judge the processed image of the first face sample image produced by the processing sub-module 20 as false and judge the second face sample image as true.
It should be noted that the above steps 203 to 204 describe a single adjustment of the parameter values in the image processing initial model according to the results obtained by the processing sub-module 20 and the discrimination sub-module 21; in practical applications, steps 203 to 204 are executed in a loop until the adjustment of the parameter values meets a certain stop condition.
Therefore, after executing steps 201 to 204 of the above embodiment, the image processing system needs to determine whether the current adjustment of the parameter values meets the preset stop condition. If so, the process ends, and the processing sub-module in the image processing initial model adjusted in step 204 is used as the pre-trained image processing model; if not, steps 203 to 204 are executed again on the image processing initial model with the adjusted parameter values. The preset stop condition includes, but is not limited to, any one of the following: the difference between the currently adjusted parameter values and the last adjusted parameter values is smaller than a threshold, i.e. the adjusted parameter values have converged; or the number of adjustments of the parameter values equals a preset count.
It can be seen that, in pre-training the image processing model of this embodiment, the adversarial network formed by the processing sub-module 20 and the discrimination sub-module 21 is adopted, so that no manual labeling of the sample images in the training sample is required; unsupervised training of the image processing model is realized and the time spent on manual labeling is saved.
The image processing method of the present invention is described below as a specific application example, and the method of the present embodiment may include the following two parts:
(I) Pre-training image processing model
Specifically, the method adopted by the image processing system in training the image processing model is similar to the training method shown in fig. 4, except that in the present embodiment:
(1) In this embodiment, when determining the structure of the image processing initial model, the processing sub-module 20 may be determined to include 7 decoding modules (one first decoding module 230 and 6 second decoding modules 240) and 7 encoding modules 220, and the resolutions corresponding to the encoding modules 220 and the decoding modules may both increase from 4×4 up to 512×512. The structure of the second decoding module 240 is shown in fig. 6, where:

Let F_i^e denote the spatial feature of one resolution that the encoding module 220 connected to the second decoding module 240 extracts from the merged data of the first face sample image and its three-dimensional face image, and let F_{i-1}^d denote the feature information of one resolution acquired by the previous decoding module of the second decoding module 240, where F_i^e and F_{i-1}^d correspond to the same resolution. Then:

After the hidden variable w from the mapping module 210 is input into the second decoding module 240, it is processed by A 241, Mod 242 and Demod 243 and then applied to the convolution layer (Conv) 244 to adjust that layer's convolution weights, yielding the adjusted convolution weights. The feature information F_{i-1}^d of one resolution passes through upsampling (Upsample) 245 and the convolution layer 244, where the convolution is computed with the adjusted convolution weights, giving the convolved feature Conv_w(Up(F_{i-1}^d)).

When obtaining the hidden variable w, the mapping module 210 mainly uses the merged data of the low-resolution first face sample image I_lq and the corresponding three-dimensional face image I_3d, as expressed by formula (5):

w = MLP(I_lq, I_3d)    (5)

The spatial feature F_i^e of one resolution passes through another convolution layer 246 and is input into a concatenation layer C 247. The concatenation layer C 247 concatenates the convolved feature obtained from the spatial feature F_i^e with the convolved feature obtained from the feature information F_{i-1}^d, producing the feature information F_i^d of another resolution, as expressed by formula (6):

F_i^d = C(Conv(F_i^e), Conv_w(Up(F_{i-1}^d)))    (6)
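The A/Mod/Demod/Conv steps above correspond to StyleGAN2-style weight modulation. The following is a sketch of one common realization, assuming PyTorch: the convolution weights are scaled per sample from w, normalized (demodulation), and applied via a grouped convolution. This is an illustrative implementation, not the patent's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    def __init__(self, c_in, c_out, w_dim, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in, k, k))
        self.affine = nn.Linear(w_dim, c_in)       # "A": w -> per-channel scale
        self.pad = k // 2

    def forward(self, x, w):
        b, c, h, width = x.shape
        s = self.affine(w).view(b, 1, c, 1, 1)     # Mod: scale input channels
        wt = self.weight.unsqueeze(0) * s          # (b, c_out, c_in, k, k)
        demod = torch.rsqrt((wt ** 2).sum(dim=(2, 3, 4), keepdim=True) + 1e-8)
        wt = wt * demod                            # Demod: normalize per output channel
        x = x.reshape(1, b * c, h, width)          # grouped-conv trick: one group per sample
        wt = wt.reshape(b * wt.shape[1], c, *wt.shape[3:])
        out = F.conv2d(x, wt, padding=self.pad, groups=b)
        return out.view(b, -1, h, width)
```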
(2) In this embodiment, the learning rate ratio of the encoding modules 220 and the output module 250 in the processing sub-module 20 and the discrimination sub-module 21 may be 100:10:1.
It should be noted that, after the image processing model is trained, the operation logic of the image processing model may be preset in the system.
(II) as shown in FIG. 7, the image processing system can process any low-resolution target human face image:
in step 301, a user may designate a face image with a lower resolution (i.e., lower definition) as a target face image through a user interface provided by the image processing system, and trigger processing of the target face image.
Step 302, the image processing system may use a pre-trained 3D coefficient prediction network, such as ResNet-50, to predict the three-dimensional coefficients (coeff) of the face image contained in the target face image I_lq, which can be expressed by the following formula (7):

coeff = F_res50(I_lq)    (7)
Step 303, the image processing system may reconstruct three-dimensional mesh information (3D mesh) from the obtained three-dimensional coefficients by calling a three-dimensional morphable face model (3DMM); the three-dimensional mesh information may include the shapes (S) and textures (T) of the plurality of mesh faces forming the three-dimensional face (corresponding to the face image in the target face image). The three-dimensional mesh information may then be projected onto a two-dimensional plane by a renderer to obtain the three-dimensional image I_3d of the target face image, whose data may include information such as face shape and shading. Acquiring the three-dimensional image can be expressed by the following formulas (8) and (9):

S, T = F_3dmm(coeff)    (8)
I_3d = F_render(S, T)    (9)
Step 304, the image processing system calls the preset image processing model and merges the data of the target face image and its three-dimensional image to obtain the merged data, i.e. concatenated 6-channel data, which serves as the input of the image processing model.
Step 305, the image processing model outputs the processed image of the target face image according to the merged data of the target face image and the three-dimensional image thereof.
Specifically, a coding module in the image processing model extracts spatial features of merged data of a target face image and a three-dimensional image thereof, a mapping module acquires hidden variables according to the merged data of the target face image and the three-dimensional image thereof to adjust convolution weights in a decoding module, the decoding module performs convolution calculation on the spatial features extracted by the coding module based on the convolution weights adjusted by the hidden variables, so as to obtain features after convolution, and finally an output module outputs a processed image of the target face image according to the features after convolution.
Therefore, the image processing model trained in advance in the embodiment can be combined with the two-dimensional target face image and the three-dimensional image thereof, the resolution of the obtained processed image is high, and the effect is improved.
The image processing method of the present invention is described below with another specific application example. The image processing system in this embodiment of the present invention is mainly a distributed system 100, which may include a client 300 and a plurality of nodes 200 (computing devices of any form in the access network, such as servers and user terminals) connected through network communication.
Taking a blockchain system as an example of the distributed system, fig. 8 is an optional structural schematic diagram of the distributed system 100 applied to the blockchain system provided in the embodiment of the present invention. The system is formed by a plurality of nodes 200 (computing devices of any form in the access network, such as servers and user terminals) and clients 300; a Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, a middle layer, an operating system layer and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 8, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) The application, which is deployed in a blockchain to implement specific services according to actual business requirements. It records the data involved in realizing those functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
For example, the service implemented by the application comprises code for image processing functions, which mainly comprise:
acquiring a target face image and a three-dimensional coefficient of the face image contained in the target face image; acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image; calling a pre-trained image processing model; and the image processing model acquires a processed image of the target face image according to the combined data of the target face image and the three-dimensional image, and outputs the processed image, wherein the resolution of the processed image is higher than that of the target face image.
3) The blockchain, which comprises a series of blocks connected to one another in the chronological order of their generation; blocks cannot be removed once added to the blockchain, and the blocks record the record data submitted by nodes in the blockchain system.
Referring to fig. 9, an optional schematic diagram of a block structure provided in the embodiment of the present invention: each block includes a hash value of the transaction records stored in that block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by these hash values to form a blockchain. A block may also include information such as a timestamp from when the block was generated. A blockchain is essentially a decentralized database, a string of data blocks associated using cryptographic methods, and each data block contains related information for verifying the validity (anti-counterfeiting) of its information and for generating the next block.
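As an illustration of this hash linkage, a minimal sketch using Python's standard library; the field names are illustrative assumptions, not the patent's.

```python
import hashlib, json, time

def make_block(records: list, prev_hash: str) -> dict:
    block = {"timestamp": time.time(), "records": records,
             "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()   # hash of this block
    return block

genesis = make_block(["image-processing record"], prev_hash="0" * 64)
second = make_block(["next record"], prev_hash=genesis["hash"])
assert second["prev_hash"] == genesis["hash"]             # blocks chained by hash
```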
An embodiment of the present invention further provides an image processing system, a schematic structural diagram of which is shown in fig. 10, and the image processing system may specifically include:
the coefficient acquiring unit 30 is configured to acquire a target face image and acquire a three-dimensional coefficient of the target face image.
A three-dimensional image unit 31, configured to obtain a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image obtained by the coefficient obtaining unit 30.
The three-dimensional image unit 31 is specifically configured to reconstruct three-dimensional mesh information of the target face image according to the three-dimensional coefficients of the target face image, where the three-dimensional mesh information comprises the shapes and textures of the plurality of mesh faces forming a three-dimensional face, and the three-dimensional face is the one corresponding to the face contained in the target face image; and to project the three-dimensional mesh information onto a two-dimensional plane by rendering to obtain the three-dimensional image of the target face image.
And the model calling unit 32 is used for calling the pre-trained image processing model.
The model calling unit 32 is specifically configured to call an image processing model having the following structure: the image processing model includes: mapping module, a plurality of coding module, a plurality of decoding module and output module, wherein: the output of one encoding module is connected to one input of one decoding module, and the outputs of the mapping modules are respectively connected to the other inputs of the plurality of decoding modules; the plurality of encoding modules are connected in series, the plurality of decoding modules are connected in series, and the output of the last decoding module in the plurality of decoding modules is connected to the output module.
And a processing unit 33, configured to obtain a processed image of the target face image according to the merged data of the target face image and the three-dimensional image obtained by the three-dimensional image unit 31 by using the image processing model called by the model calling unit 32, and output the processed image, where a resolution of the processed image is higher than a resolution of the target face image.
A processing unit 33, configured to determine, by the mapping module, a hidden variable in combination with data of the target face image and the three-dimensional image, where the hidden variable is used to adjust a convolution weight of convolution calculation related to the decoding module; extracting spatial features of one resolution ratio of the merged data of the target face image and the three-dimensional image through the coding module; a second decoding module in the plurality of decoding modules performs convolution calculation according to the implicit variable, the spatial feature of one resolution ratio extracted by one coding module and the feature information of one resolution ratio acquired by a previous decoding module of the second decoding module to acquire the feature information of another resolution ratio; and the output module acquires the processed image of the target face image according to the feature information obtained by the last second decoding module.
In the processing unit 33, when the second decoding module performs convolution calculation according to the hidden variable, the spatial feature of one resolution extracted by one encoding module, and the feature information of one resolution acquired by the previous decoding module of the second decoding module to obtain the feature information of another resolution, it is specifically used for: adjusting, by the second decoding module, the convolution weights involved in the convolution calculation according to the hidden variable to obtain adjusted convolution weights; performing convolution calculation on the feature information of one resolution according to the adjusted convolution weights to obtain the convolved feature; and obtaining the feature information of the other resolution according to the spatial feature of the one resolution and the convolved feature. In the process in which the image processing model acquires the processed image of the target face image according to the merged data of the target face image and the three-dimensional image, the first decoding module of the plurality of decoding modules performs convolution calculation according to the hidden variable and the spatial feature obtained by one encoding module to obtain feature information of one resolution.
Further, the image processing system in this embodiment may further include:
a training unit 34, configured to: determine an image processing initial model, where the image processing initial model includes a processing sub-module and a discrimination sub-module; determine a training sample, where the training sample includes a plurality of sample groups, each sample group including a low-resolution first face sample image, a high-resolution second face sample image corresponding to the first face sample image, and a three-dimensional face image; have the processing sub-module obtain a processed image of the first face sample image according to the merged data of the first face sample image and the corresponding three-dimensional face image, and have the discrimination sub-module judge whether the processed image obtained by the processing sub-module is real; and adjust the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample, where the processing sub-module in the adjusted image processing initial model is the pre-trained image processing model called by the model calling unit 32.
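For illustration, one adversarial training step consistent with this description might look as follows; the loss shapes, optimizer handling and discriminator architecture are assumptions, and `processor` / `discriminator` are placeholder names for the processing and discrimination sub-modules:

```python
import torch
import torch.nn.functional as F

def training_step(processor, discriminator, opt_g, opt_d, lr_face, face_3d, hr_face):
    """One adversarial update of the image processing initial model (a sketch)."""
    merged = torch.cat([lr_face, face_3d], dim=1)   # merged data of the sample group

    # Update the discrimination sub-module: real second sample vs. processed image.
    with torch.no_grad():
        fake = processor(merged)
    d_loss = (F.softplus(discriminator(fake)).mean()
              + F.softplus(-discriminator(hr_face)).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Update the processing sub-module against the discriminator's judgment.
    fake = processor(merged)
    g_loss = F.softplus(-discriminator(fake)).mean() + F.l1_loss(fake, hr_face)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```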
The first face sample image is obtained by performing resolution reduction processing on the second face sample image, where the resolution reduction processing includes any one of the following: adding blur, downsampling, adding noise, and JPEG (Joint Photographic Experts Group) compression.
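A hedged sketch of such a resolution reduction step, picking one of the four degradations at random; the kernel radii, scale factors, noise levels and JPEG qualities are illustrative values, not taken from the disclosure:

```python
import io
import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(hr: Image.Image) -> Image.Image:
    """Apply one of the four resolution reduction modes to a high-resolution sample."""
    mode = random.choice(["blur", "downsample", "noise", "jpeg"])
    if mode == "blur":
        return hr.filter(ImageFilter.GaussianBlur(radius=random.uniform(1, 3)))
    if mode == "downsample":
        w, h = hr.size
        s = random.choice([2, 4])
        return hr.resize((w // s, h // s), Image.BICUBIC).resize((w, h), Image.BICUBIC)
    if mode == "noise":
        arr = np.asarray(hr).astype(np.float32)
        arr += np.random.normal(0, 10, arr.shape)          # additive Gaussian noise
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    buf = io.BytesIO()                                     # JPEG compression artifacts
    hr.save(buf, format="JPEG", quality=random.randint(30, 70))
    return Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
```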
When adjusting the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample, the training unit 34 is specifically configured to calculate a loss function related to the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image, and to adjust the parameter values of the parameters in the image processing initial model according to the loss function. When calculating the loss function, the training unit 34 is specifically configured to calculate a feature loss sub-function, an adversarial loss sub-function and a reconstruction loss sub-function, where: the feature loss sub-function represents the difference between the feature information of the processed image of the first face sample image obtained by the processing sub-module and the feature information of the second face sample image; the adversarial loss sub-function represents the expectation over the discrimination sub-module's judgment of whether the processed image of the first face sample image obtained by the processing sub-module is real and its judgment of whether the second face sample image is real; and the reconstruction loss sub-function represents the difference between the processed image of the first face sample image obtained by the processing sub-module and the second face sample image. The computed values of the feature loss sub-function, the adversarial loss sub-function and the reconstruction loss sub-function are combined to form the loss function.
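For illustration, the three sub-functions might be combined as a weighted sum, as in the following sketch. The feature extractor (VGG-style features are a common choice) and the weights are assumptions, and only the generator-side adversarial term is shown here; the discriminator-side expectation appears in the training step sketched earlier:

```python
import torch.nn.functional as F

def total_loss(fake, real, d_fake_logits, feat_extractor,
               w_feat=1.0, w_adv=0.1, w_rec=1.0):
    feat_loss = F.l1_loss(feat_extractor(fake), feat_extractor(real))  # feature loss
    adv_loss = F.softplus(-d_fake_logits).mean()                       # adversarial loss
    rec_loss = F.l1_loss(fake, real)                                   # reconstruction loss
    return w_feat * feat_loss + w_adv * adv_loss + w_rec * rec_loss
```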
Further, the training unit 34 is configured to stop adjusting the parameter values when the number of adjustments of the parameter values reaches a preset number, or when the difference between the currently adjusted parameter values and the previously adjusted parameter values is smaller than a threshold.
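A minimal sketch of this stopping rule, assuming the parameters are torch tensors snapshotted before and after an update; the step budget and threshold are illustrative:

```python
def should_stop(step, params_now, params_prev, max_steps=100_000, tol=1e-6):
    """Stop after a preset number of updates, or once the largest per-parameter
    change between successive updates falls below a threshold."""
    if step >= max_steps:
        return True
    delta = max((p - q).abs().max().item() for p, q in zip(params_now, params_prev))
    return delta < tol
```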
As can be seen, in the image processing system of this embodiment, the three-dimensional image unit 31 obtains a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image, the model calling unit 32 calls the pre-trained image processing model, and in the processing unit 33 the image processing model obtains a processed image of the target face image according to the merged data of the target face image and its three-dimensional image, where the resolution of the processed image is higher than that of the target face image. Therefore, when a low-definition (i.e., low-resolution) target face image is processed, the corresponding three-dimensional image is incorporated; because the data of the three-dimensional image describes the face in greater detail, the quality of the processed image is effectively improved.
An embodiment of the present invention further provides a terminal device, whose schematic structural diagram is shown in fig. 11. The terminal device may differ considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 40 (e.g., one or more processors), a memory 41, and one or more storage media 42 (e.g., one or more mass storage devices) storing an application 421 or data 422. The memory 41 and the storage medium 42 may be transient or persistent storage. The program stored in the storage medium 42 may include one or more modules (not shown), and each module may include a series of instruction operations for the terminal device. Still further, the central processing unit 40 may be arranged to communicate with the storage medium 42 and to execute, on the terminal device, the series of instruction operations in the storage medium 42.
Specifically, the applications 421 stored in the storage medium 42 include an image processing application, which may include the coefficient acquisition unit 30, the three-dimensional image unit 31, the model calling unit 32, the processing unit 33 and the training unit 34 of the image processing system described above; details are not repeated here. Further, the central processing unit 40 may be configured to communicate with the storage medium 42 and execute, on the terminal device, the series of operations corresponding to the image processing application stored in the storage medium 42.
The terminal device may also include one or more power supplies 43, one or more wired or wireless network interfaces 44, one or more input/output interfaces 45, and/or one or more operating systems 423, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the image processing system described in the above method embodiment may be based on the structure of the terminal device shown in fig. 11.
Further, another aspect of the embodiments of the present invention provides a computer-readable storage medium that stores a plurality of computer programs adapted to be loaded by a processor to perform the image processing method performed by the image processing system.
In another aspect, an embodiment of the present invention further provides a terminal device, including a processor and a memory;
the memory is configured to store a plurality of computer programs that are loaded by the processor to perform the image processing method performed by the image processing system; the processor is configured to implement each of the plurality of computer programs.
Further, according to an aspect of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the image processing method provided in the various optional implementations described above.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The image processing method, system, storage medium and terminal device provided in the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are intended only to help readers understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, vary the specific implementations and the scope of application; in summary, the content of this specification should not be construed as limiting the present invention.

Claims (17)

1. An image processing method, comprising:
acquiring a target face image and a three-dimensional coefficient of the face contained in the target face image;
acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image;
calling a pre-trained image processing model;
and the image processing model acquires a processed image of the target face image according to the combined data of the target face image and the three-dimensional image, and outputs the processed image, wherein the resolution of the processed image is higher than that of the target face image.
2. The method according to claim 1, wherein the obtaining a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image specifically comprises:
reconstructing three-dimensional mesh information of the target face image according to the three-dimensional coefficient of the target face image, the three-dimensional mesh information comprising the shapes and textures of a plurality of surfaces forming a three-dimensional face, the three-dimensional face corresponding to the face contained in the target face image;
and projecting the three-dimensional mesh information onto a two-dimensional plane by rendering, to obtain the three-dimensional image of the target face image.
3. The method according to claim 1, wherein the calling a pre-trained image processing model specifically comprises:
calling an image processing model having the following structure:
the image processing model comprises a mapping module, a plurality of encoding modules, a plurality of decoding modules and an output module, wherein:
the output of each encoding module is connected to one input of a corresponding decoding module, and the output of the mapping module is connected to another input of each of the plurality of decoding modules; the plurality of encoding modules are connected in series, the plurality of decoding modules are connected in series, and the output of the last decoding module among the plurality of decoding modules is connected to the output module.
4. The method according to claim 3, wherein the image processing model obtains the processed image of the target face image according to the merged data of the target face image and the three-dimensional image, specifically comprising:
determining, by the mapping module, a hidden variable from the data of the target face image and the three-dimensional image, the hidden variable being used to adjust convolution weights of convolution calculations involved in the decoding modules;
extracting, by each encoding module, a spatial feature at one resolution from the merged data of the target face image and the three-dimensional image;
performing, by a second decoding module among the plurality of decoding modules, a convolution calculation according to the hidden variable, the spatial feature at one resolution extracted by one encoding module, and the feature information at one resolution obtained by the decoding module preceding the second decoding module, to obtain feature information at another resolution;
and obtaining, by the output module, the processed image of the target face image according to the feature information obtained by the last second decoding module.
5. The method according to claim 4, wherein the second decoding module performs the convolution calculation according to the hidden variable, the spatial feature at one resolution extracted by one encoding module, and the feature information at one resolution obtained by the preceding decoding module to obtain the feature information at another resolution, specifically comprising:
the second decoding module adjusts the convolution weights involved in the convolution calculation according to the hidden variable to obtain adjusted convolution weights, and performs the convolution calculation on the feature information at the one resolution according to the adjusted convolution weights to obtain convolved features;
and the feature information at the other resolution is obtained according to the spatial feature at the one resolution and the convolved features.
6. The method according to claim 4, wherein, in the process in which the image processing model obtains the processed image of the target face image according to the merged data of the target face image and the three-dimensional image, a first decoding module among the plurality of decoding modules performs a convolution calculation according to the hidden variable and the spatial feature at one resolution obtained by one encoding module, to obtain feature information at one resolution.
7. The method according to any one of claims 1 to 6, wherein the pre-trained image processing model is obtained by:
determining an image processing initial model, the image processing initial model comprising a processing sub-module and a discrimination sub-module;
determining a training sample, wherein the training sample comprises a plurality of sample groups, and each sample group comprises a low-resolution first face sample image, a high-resolution second face sample image corresponding to the first face sample image, and a three-dimensional face image;
obtaining, by the processing sub-module, a processed image of the first face sample image according to the merged data of the first face sample image and the corresponding three-dimensional face image, and judging, by the discrimination sub-module, whether the processed image obtained by the processing sub-module is real;
and adjusting the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample, wherein the processing sub-module in the adjusted image processing initial model is the pre-trained image processing model.
8. The method according to claim 7, wherein the first face sample image is obtained by performing resolution reduction processing on the second face sample image, the resolution reduction processing comprising any one of the following: adding blur, downsampling, adding noise, and JPEG (Joint Photographic Experts Group) compression.
9. The method according to claim 7, wherein the adjusting the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample comprises:
calculating a loss function related to the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample, and adjusting the parameter values of the parameters in the image processing initial model according to the loss function.
10. The method according to claim 9, wherein the calculating a loss function related to the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample comprises:
calculating a feature loss sub-function, an adversarial loss sub-function and a reconstruction loss sub-function, wherein:
the feature loss sub-function represents the difference between the feature information of the processed image of the first face sample image obtained by the processing sub-module and the feature information of the second face sample image; the adversarial loss sub-function represents the expectation over the discrimination sub-module's judgment of whether the processed image of the first face sample image obtained by the processing sub-module is real and its judgment of whether the second face sample image is real; and the reconstruction loss sub-function represents the difference between the processed image of the first face sample image obtained by the processing sub-module and the second face sample image;
and combining the computed values of the feature loss sub-function, the adversarial loss sub-function and the reconstruction loss sub-function as the loss function.
11. The method according to claim 9, wherein the adjustment of the parameter values is stopped when the number of adjustments of the parameter values reaches a preset number, or when the difference between the currently adjusted parameter values and the previously adjusted parameter values is smaller than a threshold.
12. An image processing system, comprising:
the coefficient acquisition unit is used for acquiring a target face image and a three-dimensional coefficient of the target face image;
the three-dimensional image unit is used for acquiring a three-dimensional image of the target face image according to the three-dimensional coefficient of the target face image;
the model calling unit is used for calling a pre-trained image processing model;
and the processing unit is used for acquiring a processed image of the target face image by the image processing model according to the combined data of the target face image and the three-dimensional image and outputting the processed image, wherein the resolution of the processed image is higher than that of the target face image.
13. An image processing method, comprising:
determining an image processing initial model, the image processing initial model comprising a processing sub-module and a discrimination sub-module;
determining a training sample, wherein the training sample comprises a plurality of sample groups, and each sample group comprises a low-resolution first face sample image, a high-resolution second face sample image corresponding to the first face sample image, and a three-dimensional face image;
obtaining, by the processing sub-module, a processed image of the first face sample image according to the merged data of the first face sample image and the corresponding three-dimensional face image, and judging, by the discrimination sub-module, whether the processed image obtained by the processing sub-module is real;
and adjusting the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample, wherein the processing sub-module in the adjusted image processing initial model is an image processing model configured to obtain a corresponding high-resolution image according to the merged data of any low-resolution image and its three-dimensional image.
14. The method according to claim 13, wherein the determining an image processing initial model comprises:
determining a processing sub-module having the following structure in the image processing initial model:
the processing sub-module comprises a mapping module, a plurality of encoding modules, a plurality of decoding modules and an output module, wherein:
the output of each encoding module is connected to one input of a corresponding decoding module, and the output of the mapping module is connected to another input of each of the plurality of decoding modules; the plurality of encoding modules are connected in series, the plurality of decoding modules are connected in series, and the output of the last decoding module among the plurality of decoding modules is connected to the output module.
15. The method according to claim 13 or 14, wherein the adjusting the image processing initial model according to the result obtained by the discrimination sub-module and the second face sample image in the training sample comprises:
calculating a feature loss sub-function, an adversarial loss sub-function and a reconstruction loss sub-function, wherein:
the feature loss sub-function represents the difference between the feature information of the processed image of the first face sample image obtained by the processing sub-module and the feature information of the second face sample image; the adversarial loss sub-function represents the expectation over the discrimination sub-module's judgment of whether the processed image of the first face sample image obtained by the processing sub-module is real and its judgment of whether the second face sample image is real; and the reconstruction loss sub-function represents the difference between the processed image of the first face sample image obtained by the processing sub-module and the second face sample image;
combining the computed values of the feature loss sub-function, the adversarial loss sub-function and the reconstruction loss sub-function as a loss function;
and adjusting the parameter values of the parameters in the image processing initial model according to the loss function.
16. A computer-readable storage medium, characterized in that it stores a plurality of computer programs adapted to be loaded by a processor and to perform the image processing method according to any one of claims 1 to 11, or to perform the image processing method according to any one of claims 13 to 15.
17. A terminal device comprising a processor and a memory;
the memory is configured to store a plurality of computer programs that are loaded by the processor to perform the image processing method of any one of claims 1 to 11, or the image processing method of any one of claims 13 to 15; the processor is configured to implement each of the plurality of computer programs.
CN202210348927.9A 2022-04-01 2022-04-01 Image processing method, system, storage medium and terminal equipment Pending CN115131196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210348927.9A CN115131196A (en) 2022-04-01 2022-04-01 Image processing method, system, storage medium and terminal equipment


Publications (1)

Publication Number Publication Date
CN115131196A true CN115131196A (en) 2022-09-30

Family

ID=83376802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210348927.9A Pending CN115131196A (en) 2022-04-01 2022-04-01 Image processing method, system, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN115131196A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362972A (en) * 2023-05-22 2023-06-30 飞狐信息技术(天津)有限公司 Image processing method, device, electronic equipment and storage medium
CN116362972B (en) * 2023-05-22 2023-08-08 飞狐信息技术(天津)有限公司 Image processing method, device, electronic equipment and storage medium
CN116912639A (en) * 2023-09-13 2023-10-20 腾讯科技(深圳)有限公司 Training method and device of image generation model, storage medium and electronic equipment
CN116912639B (en) * 2023-09-13 2024-02-09 腾讯科技(深圳)有限公司 Training method and device of image generation model, storage medium and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination