CN116309018A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN116309018A
Authority
CN
China
Prior art keywords
image
network model
target
processed
hair
Prior art date
Legal status
Pending
Application number
CN202310155787.8A
Other languages
Chinese (zh)
Inventor
邓世豪
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202310155787.8A
Publication of CN116309018A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses an image processing method and device, belonging to the technical field of artificial intelligence. The image processing method comprises the following steps: acquiring an image to be processed; inputting a target image region into a target residual network model to obtain a residual output of the target residual network model, wherein the target image region comprises the image region of the image to be processed in which hair and forehead are shown; and superimposing the residual output onto the image to be processed to obtain a target image of the image to be processed after hair augmentation.

Description

Image processing method and device
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image processing method and device.
Background
With increasing age and various sources of social stress, more and more people gradually lose hair and are troubled by it. For some, the reduction in hair volume has a significant impact. Currently, a StyleGAN (style-based generative adversarial network) model can be used to generate a hair-augmentation effect image, so that even if the user's hair volume is low, an effect image showing a full head of hair can be generated.
FIG. 1 shows the process of generating a hair-augmentation effect image with the StyleGAN model. The role of the encoder 11 is to encode the input picture into a matrix w+, a series of numbers representing the user's information. The StyleGAN 12 then decodes the w+ information and adjusts the hair information to realize hair augmentation.
However, hair augmentation with the StyleGAN model places high performance requirements on the hardware and incurs a high power consumption overhead.
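By way of illustration only, the baseline of FIG. 1 can be sketched as follows. This is a hedged sketch, not the method claimed below: `encoder` and `stylegan` stand for pretrained models that this document does not specify, and the 18×512 shape of w+ is an assumed StyleGAN convention, not stated in the text.

```python
# Hedged sketch of the FIG. 1 baseline (not the claimed method).
# `encoder` and `stylegan` are placeholders for pretrained models.
import torch

def baseline_augment(image: torch.Tensor, encoder, stylegan,
                     hair_direction: torch.Tensor, strength: float = 1.0):
    w_plus = encoder(image)                      # encoder 11: image -> w+ codes (assumed 18 x 512)
    w_plus = w_plus + strength * hair_direction  # adjust the hair information in w+
    return stylegan(w_plus)                      # StyleGAN 12: decode the edited w+
```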
Disclosure of Invention
The embodiments of the application aim to provide an image processing method and device, which can solve the problems in the related art of high hardware performance requirements and high power consumption overhead when realizing a hair-augmentation effect.
In a first aspect, an embodiment of the present application provides a method for image processing, including:
acquiring an image to be processed;
inputting a target image region into a target residual network model to obtain a residual output of the target residual network model; wherein the target image region comprises the image region of the image to be processed in which hair and forehead are shown;
and superimposing the residual output onto the image to be processed to obtain a target image of the image to be processed after hair augmentation.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
a first acquisition module, configured to acquire an image to be processed;
a residual module, configured to input a target image region into a target residual network model to obtain a residual output of the target residual network model, wherein the target image region comprises the image region of the image to be processed in which hair and forehead are shown;
and a processing module, configured to superimpose the residual output onto the image to be processed to obtain a target image of the image to be processed after hair augmentation.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement a method as in the first aspect.
In the embodiments of the application, for the image to be processed, the target image region in which hair and forehead are shown is input into the target residual network model, yielding a residual output that represents the difference in the image before and after hair augmentation. By superimposing this residual output onto the image to be processed, the target image after hair augmentation is obtained. Because the target residual network model is built on a residual network, which places low requirements on hardware performance and incurs low power consumption overhead, the method and device can reduce the performance requirements on hardware and the power consumption overhead of the hardware while still realizing the hair-augmentation effect.
Drawings
FIG. 1 is a schematic diagram of the current process of generating a hair-augmentation effect image using the StyleGAN model;
FIG. 2 is a flowchart of the steps of an image processing method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of the change in an image to be processed before and after hair augmentation in an embodiment of the application;
FIG. 4 is a schematic diagram of training a style-based generative adversarial network model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of obtaining a pair of pre- and post-augmentation contrast images according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a practical application of the image processing method provided in an embodiment of the present application;
FIG. 7 is a block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 8 is a first schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application;
FIG. 9 is a second schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application are within the scope of the protection of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type and do not limit the number of objects, for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The image processing method provided by the embodiments of the application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
As shown in FIG. 2, an image processing method according to an embodiment of the present application may include the following steps:
step 201: and acquiring an image to be processed.
In this step, the image to be processed is an image to be added with the hair-increasing effect, i.e. a pre-hair-increasing image. The method can be used for shooting the obtained images for electronic equipment such as mobile phones, cameras and the like. Here, the image to be processed may be acquired locally, for example, the method provided in the application is applied to a mobile phone, and the image to be processed may be a local image stored in the mobile phone. Of course, the image to be processed transmitted from other electronic devices can also be received. For example, the method provided by the application is applied to a server, and the image to be processed can be an image uploaded to the server by a mobile phone user.
It is understood that the image content of the image to be processed includes at least hair and forehead. Specifically, the image to be processed includes an image on which an hairline is displayed. It is to be noted that the size of the image to be processed is not limited here. That is, the size of the image to be processed may be any size. For example, the size of the image to be processed may be 1536×768, 4096×4096, but is not limited thereto.
Step 202: and inputting the target image area into a target residual error network model to obtain residual error output of the target residual error network model.
It should be noted that the target image area includes: the image area of hair and forehead is displayed in the image to be processed. Specifically, the target image area is the whole image area or part of the image area of the image to be processed. In the case where the target image area is a partial image area of the image to be processed, the partial image area is an image area in which hair and forehead are displayed.
The target residual network model is used for calculating the difference before and after the person increases. Specifically, the target residual network model is a model built based on a residual network. The target residual error network model can calculate the difference before and after the person increases, and output the difference as residual error. Wherein, the person hair extension refers to the increase of the hair or the hair quantity of the person. The hair volume after the person increases is obviously higher than the hair volume before the person increases.
Step 203: and overlapping the residual output to the image to be processed to obtain a target image after the expansion of the image to be processed.
It should be noted that the residual output is the difference between before and after the person's increase. Here, the image to be processed is taken as the image before the person is amplified, so that the image after the person is amplified, namely the target image after the image to be processed is amplified, can be obtained by overlapping the residual output with the image before the person is amplified. It will be appreciated that the location of the augmentation may be the location of the hairline of the image to be processed. So that the target image appears to be a person with a hairline moving toward the forehead portion, as compared to the image to be processed. As shown in fig. 3, the target image 33 can be obtained after the image 31 to be processed is superimposed with the residual output 32. It is apparent that the hairline in the target image 33 is closer to the face than the image 31 to be processed, so that the exposed forehead area is smaller. Of course, the hair increasing position can also be a position with sparse hair in the image to be processed, so that compared with the image to be processed, the target image is seen as a part with sparse hair of the person to become dense.
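As a minimal sketch of step 203, assuming the image and the residual are float tensors of identical shape with values in [0, 1] (an assumption; the text does not fix the numeric range):

```python
import torch

def superimpose_residual(image: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
    # The residual encodes the per-pixel difference before/after hair
    # augmentation, so the target image is a clamped element-wise sum.
    return (image + residual).clamp(0.0, 1.0)  # assumes values in [0, 1]
```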
In the embodiments of the application, for the image to be processed, the target image region in which hair and forehead are shown is input into the target residual network model, yielding a residual output that represents the difference in the image before and after hair augmentation. By superimposing this residual output onto the image to be processed, the target image after hair augmentation is obtained. Because the target residual network model is built on a residual network, which places low requirements on hardware performance and incurs low power consumption overhead, the method and device can reduce the performance requirements on hardware and the power consumption overhead of the hardware while still realizing the hair-augmentation effect.
Optionally, before the target image region is input into the target residual network model to obtain the residual output of the target residual network model, the method further comprises:
acquiring a plurality of pairs of pre- and post-augmentation contrast images; wherein each pair of contrast images includes a pre-augmentation image and a post-augmentation image;
and training an initial residual network model, taking the pre-augmentation image as the input and the post-augmentation image as the ground truth, to obtain the target residual network model.
Note that the initial residual network model is an untrained model built on a residual network. The residual network may take an encoder-decoder form; for example, it may be built as a UNet. When training the initial residual network model, the learning rate and the loss function can be set empirically; for example, the learning rate may be set to 0.001, and the loss function may combine the LPIPS (Learned Perceptual Image Patch Similarity) loss function with an L1 loss function, but the settings are not limited thereto. In each pair of pre- and post-augmentation contrast images, both images show the same person: one before hair augmentation and one after, where the post-augmentation image is obtained by adding hair on the basis of the pre-augmentation image. To acquire the pairs, a number of person images may be collected first, and for each person image a post-augmentation image may be drawn, or generated with an existing application capable of a hair-augmentation effect; each person image and its post-augmentation counterpart then serve as a pair of pre- and post-augmentation contrast images, but the acquisition is not limited thereto.
The pairs of pre- and post-augmentation contrast images are used to train the initial residual network model, yielding the trained model, namely the target residual network model. The number of pairs is not limited here. Specifically, each pair can serve as one training sample. The initial residual network model is trained in a supervised manner on the pairs until the model converges or the number of training iterations reaches a threshold. Note that for this supervised learning of the residual network, in each training step the pre-augmentation image of a pair is taken as the INPUT and the post-augmentation image as the ground truth (GT). Through continued training, the model's residual output comes to represent the difference before and after a person's hair augmentation increasingly accurately.
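A minimal supervised-training sketch under the settings above (learning rate 0.001, LPIPS + L1 loss). `ResidualUNet` and `paired_loader` are hypothetical stand-ins for the residual network and the paired data; the `lpips` package is assumed to be available and to take images scaled to [-1, 1]:

```python
import torch
import torch.nn.functional as F
import lpips  # assumed third-party LPIPS implementation

model = ResidualUNet()               # hypothetical UNet-style residual network
perceptual = lpips.LPIPS(net='vgg')  # LPIPS perceptual loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for pre_img, post_img in paired_loader:       # hypothetical (input, GT) pairs
    residual = model(pre_img)                 # the network predicts only the difference
    pred = (pre_img + residual).clamp(-1, 1)  # reconstruct the augmented image
    loss = perceptual(pred, post_img).mean() + F.l1_loss(pred, post_img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```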
In the embodiments of the application, multiple pairs of pre- and post-augmentation contrast images serve as training samples, and training the initial residual network model on these samples yields a target residual network model for computing the difference between a person before and after hair augmentation.
Optionally, acquiring the plurality of pairs of pre- and post-augmentation contrast images includes:
acquiring a plurality of three-dimensional matrices conforming to a Gaussian distribution;
and generating, for each three-dimensional matrix, a pair of pre- and post-augmentation contrast images based on a pre-trained style-based generative adversarial network (GAN) model.
Note that the trained style-based GAN model takes three-dimensional matrices conforming to a Gaussian distribution as its input and outputs images containing hair and forehead. That is, inputting any Gaussian-distributed three-dimensional matrix into the trained style-based GAN model yields an image containing hair and forehead, so a pair of images that both contain hair and forehead can be obtained from the model outputs and used as a pair of pre- and post-augmentation contrast images. Preferably, each three-dimensional matrix acquired here is randomly generated.
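A one-line sampling sketch under these assumptions, where `g` is the pre-trained style-based GAN model and the 18×512 shape of w+ is an assumed StyleGAN convention, not stated in the text:

```python
import torch

w_plus = torch.randn(1, 18, 512)  # random three-dimensional matrix ~ Gaussian (assumed shape)
pre_image = g(w_plus)             # model output: an image showing hair and forehead
# The post-augmentation partner of this image is produced by the guided
# inversion described below (steps 501-506).
```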
In the embodiments of the application, generating the contrast images from the three-dimensional matrices and the pre-trained style-based GAN model makes it possible to produce pre- and post-augmentation contrast images quickly while simplifying their production process.
Optionally, before a pair of pre- and post-augmentation contrast images is generated for each three-dimensional matrix based on the pre-trained style-based GAN model, the method further includes:
acquiring a training sample set; wherein the training sample set comprises a plurality of random three-dimensional matrices conforming to a Gaussian distribution and training images containing forehead and hair;
and training a target model based on the training sample set to obtain the pre-trained style-based GAN model;
wherein the target model comprises an initial style-based GAN model and a discriminator model, and the output of the initial style-based GAN model together with a training image serves as the input of the discriminator model.
Note that the training sample set is used to train the target model. Since the trained style-based GAN model must generate images containing hair and forehead, training images containing forehead and hair are required. The size of the training images is not limited here, but every training image must have the same size. Specifically, when acquiring training images, high-definition portrait data shot with a single-lens reflex (SLR) camera can be collected first; face detection and face rotation correction and alignment are then performed on the portrait data. Finally, the hair region of each rotation-aligned face is cropped, uniformly, to a fixed 2:1 height-to-width ratio (other sizes are also possible; 1536×768 is taken as the example here), and the cropped image is the training image.
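A hedged preprocessing sketch of these steps; `detect_face`, `align_upright`, `hair_crop_box`, and `resize` are hypothetical helpers whose roles any real face-detection / image library could fill:

```python
import numpy as np

def make_training_image(photo: np.ndarray) -> np.ndarray:
    box, roll = detect_face(photo)        # face detection (hypothetical helper)
    upright = align_upright(photo, roll)  # rotation correction and alignment (hypothetical)
    y0, y1, x0, x1 = hair_crop_box(box)   # hair region above the face (hypothetical geometry)
    crop = upright[y0:y1, x0:x1]
    # Uniform 2:1 height-to-width output, 1536 x 768 as the example size.
    return resize(crop, (1536, 768))      # hypothetical resize helper
```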
Since the target model comprises the initial style-based GAN model and a discriminator model, with the output of the initial style-based GAN model and a training image as the discriminator's inputs, training the target model amounts to training the initial style-based GAN model within it, and the initial style-based GAN model after training is the trained style-based GAN model. The initial style-based GAN model acts as the generator and can generate an image from a random Gaussian-distributed three-dimensional matrix. It can therefore also be understood as the synthesis-network part of a style-based GAN composed of a mapping network and a synthesis network. It can be understood that in each training iteration of the target model, a random three-dimensional matrix is input into the initial style-based GAN model, and then the model's output together with a training image is input into the discriminator model to obtain the discriminator's output. After the model loss is computed, the model parameters of the initial style-based GAN model and of the discriminator model are updated by back-propagation. Training stops once the initial style-based GAN model has fully converged.
Specifically, as shown in FIG. 4, a generator 41 and a discriminator model 42 may first be initialized.
Step 401 is performed first: a three-dimensional matrix w+ conforming to a Gaussian distribution is randomly generated and input to the generator 41, yielding an RGB image 43 output by the generator 41, where RGB denotes the red, green, and blue channels. A training image 44 is taken as a real hair picture with its label set to True, and the RGB image 43 is taken as a fake hair picture with its label set to False. The loss is then computed with loss function formula (1), and the model parameters of the discriminator model are updated by back-propagation;
loss = log(exp(D(G(W+))) + 1) + log(exp(-D(x)) + 1)    (1)
where G(W+) denotes the RGB image 43, x denotes the training image 44, and D denotes the discriminator model 42.
Step 402 is then performed: a new three-dimensional matrix w+ conforming to the Gaussian distribution is randomly generated and input into the generator 41 to obtain a new RGB image. With the just-updated discriminator model 42 and the new RGB image, the loss is computed with loss function formula (2), and the weights of the generator 41 are updated by back-propagation;
loss = -log(exp(D(G(W+))) + 1)    (2)
by repeating these two steps in the order of step 401 and step 402, training of the generator 41 can be achieved until the generator 41 is fully converged. At this time, a mapping relationship of w+ to 1536×768 images will be obtained. Any random three-dimensional matrix input to the generator 41 that corresponds to a gaussian distribution will result in a corresponding 1536 x 768 image that includes hair and forehead.
In the embodiments of the application, training of the style-based GAN model can be achieved using the random three-dimensional matrices and the training images containing forehead and hair, yielding a network model that generates images containing forehead and hair from a three-dimensional matrix.
Optionally, generating, for each three-dimensional matrix, a pair of pre- and post-augmentation contrast images based on the pre-trained style-based GAN model includes:
inputting each three-dimensional matrix into the style-based GAN model to obtain the pre-augmentation image output by the style-based GAN model;
adjusting the input of the style-based GAN model in reverse, based on the output of the style-based GAN model and a guide image, until the number of reverse adjustments reaches a target number; wherein the guide image is a post-augmentation contrast image drawn on the basis of the pre-augmentation image;
and, after the last reverse adjustment, taking the output of the style-based GAN model as the post-augmentation image.
Note that a pair of pre- and post-augmentation contrast images may be generated for each three-dimensional matrix; the embodiment of the present application is described using a single three-dimensional matrix as the example. Since inputting any Gaussian-distributed three-dimensional matrix into the style-based GAN model yields a corresponding image containing hair and forehead, the first image obtained is used as the pre-augmentation image of the pair. A post-augmentation contrast image is then drawn on the basis of the pre-augmentation image and used as the guide image while the three-dimensional matrix is adjusted in reverse. Once the number of reverse adjustments reaches the target number, the image output by the style-based GAN model is quite close to the guide image, so the output at that point is used as the post-augmentation image of the pair.
As shown in FIG. 5, obtaining the pre-augmentation image 51 and the post-augmentation image 52 of a pair of pre- and post-augmentation contrast images specifically includes:
Step 501: a three-dimensional matrix w+ conforming to the Gaussian distribution is input to the style-based GAN model 53 to obtain the pre-augmentation image 51, which is stored as the pre-augmentation image of the pair.
Step 502: the guide image 54 is produced by painting hair (painting only the hair color is sufficient) over the augmentation area of the pre-augmentation image 51 with a brush tool.
Step 503: the difference between the pre-augmentation image 51 and the guide image 54 is computed with the L1 loss function, giving the effect difference loss = |pre-augmentation image 51 - guide image 54|.
Step 504: the learning rate is set to 0.01 and the optimizer to the Adam (adaptive moment estimation) optimizer; the loss is back-propagated through the style-based GAN model 53 to the three-dimensional matrix w+, whose entries are updated by the back-propagated gradient to obtain a new three-dimensional matrix w+.
Step 505: the new three-dimensional matrix w+ is input to the style-based GAN model 53, yielding a new output image.
Step 506: the difference between the new output image and the guide image 54 is computed with the L1 loss function, giving the effect difference loss = |new output image - guide image 54|.
Steps 504 to 506 are repeated until the target number of repetitions is reached. Over these repetitions, the style-based GAN model 53, guided by the guide image 54, gradually adds hair to the pre-augmentation image 51 and fills in the hair texture. The target number may be set arbitrarily, for example to 100, but is not limited thereto. The image finally output by the style-based GAN model 53 is used as the post-augmentation image of the pair of pre- and post-augmentation contrast images.
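A sketch of the guided inversion in steps 503 to 506, under the stated settings (Adam optimizer, learning rate 0.01, L1 loss, 100 repetitions as the example target number). `g` stands for the trained style-based GAN model 53 and `guide` for the painted guide image; the w+ shape is again an assumption:

```python
import torch

w_plus = torch.randn(1, 18, 512, requires_grad=True)  # initial Gaussian w+ (step 501)
optimizer = torch.optim.Adam([w_plus], lr=0.01)

for _ in range(100):                   # target number of repetitions (example value)
    out = g(w_plus)                    # current model output
    loss = (out - guide).abs().mean()  # L1 effect difference against the guide image
    optimizer.zero_grad()
    loss.backward()                    # back-propagate through the GAN to w+
    optimizer.step()                   # gradient update yields a new w+ (step 504)

post_image = g(w_plus).detach()        # final output = post-augmentation image
```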
In the embodiments of the application, after the pre-augmentation image is generated, the drawn post-augmentation contrast image is taken as the guide image. By repeatedly adjusting the input of the style-based GAN model in reverse, the post-augmentation image is obtained, and it has the same size as the pre-augmentation image.
Optionally, the input of the target residual network model is an image of a first size. When the size of the image to be processed is smaller or larger than the first size, inputting the target image region into the target residual network model to obtain the residual output of the target residual network model includes:
performing an affine transformation on the target image region to generate a first intermediate image of the first size;
and inputting the first intermediate image into the target residual network model to obtain the residual output of the target residual network model.
Superimposing the residual output onto the image to be processed to obtain the target image after hair augmentation then includes:
performing an inverse affine transformation on the residual output to generate a second intermediate image of a second size, wherein the second size is the size of the image to be processed;
and superimposing the second intermediate image onto the image to be processed to obtain the target image.
Note that the target residual network model is pre-trained, so the size of its input image is fixed once training ends, while the size of the image to be processed does not necessarily meet that requirement. The size of the image to be processed can therefore be detected in advance. If it equals the first size, i.e., the size requirement of the target residual network model on the input image is met, the image can be used directly. Otherwise, an affine transformation is needed, and an inverse affine transformation is applied to the residual output so that its size matches the image to be processed, avoiding the various problems caused by mismatched sizes. For example, suppose the target residual network model is trained on 1024-resolution images. If the original image (the image to be processed) has 4096 resolution, it is down-sampled to 1024 resolution by the affine transformation and fed to the target residual network model to generate a 1024-resolution residual output, which is then upscaled to 4096 resolution and superimposed on the original image.
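A size-adaptation sketch in which bilinear resizing stands in for the affine transform and its inverse (an assumption; the text only requires mapping the region to the model's fixed input size and back):

```python
import torch
import torch.nn.functional as F

def residual_at_original_size(region: torch.Tensor, model,
                              model_hw=(1536, 768)) -> torch.Tensor:
    h, w = region.shape[-2:]
    small = F.interpolate(region, size=model_hw, mode='bilinear',
                          align_corners=False)  # "affine" down-mapping to the model size
    res = model(small)                          # residual at the model's fixed size
    return F.interpolate(res, size=(h, w), mode='bilinear',
                         align_corners=False)   # inverse mapping back to the region size
```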
In the embodiments of the application, applying an affine transformation to the image to be processed and an inverse affine transformation to the residual output makes it possible to handle images of various sizes without affecting the sharpness of the target image.
Optionally, performing the affine transformation on the target image region to generate the first intermediate image of the first size includes:
cropping the target image region out of the image to be processed to obtain a cropped image;
and performing the affine transformation on the cropped image to generate the first intermediate image of the first size.
Note that since the method provided in the present application applies the hair-augmentation effect to the image to be processed, the processing may be performed only on the target image region of the image to be processed. The target image region is therefore cropped out, yielding a cropped image.
In the embodiments of the application, cropping the target image region out of the image to be processed yields the cropped image; obtaining the residual output from the cropped image then avoids interference from the information in image regions other than the target image region.
FIG. 6 is a schematic diagram of a practical application of the image processing method according to an embodiment of the present application. Taking resolution as the image size, a post-augmentation target image 62 is generated for an image 61 to be processed of size 4096×4096, assuming the input size of the trained target residual network model 63 is 1536×768. The process includes:
Step 601: a to-be-processed image 61 of a size 4096×4096 is acquired.
Step 602: for the image to be processed 61, a human face frame is detected and the image areas of the hair and forehead are cut out to obtain a cut image 64.
Step 603: the cropped image 64 is scaled to 1536 x 768 to obtain the original intermediate image 65.
Step 604: the original intermediate image 65 of 1536×768 is input into the target residual network model 63 to obtain the residual output 66 of 1536×768. Wherein the residual output 66 may be understood as the residual corresponding to the original intermediate image 65.
Step 605: the residual output 66 of 1536 x 768 is scaled to the intermediate residual output 67 of 4096 x 4096. Wherein the intermediate residual output 67 may be understood as corresponding cropped image 64.
Step 606: based on the intermediate residual output 67, a residual, i.e. a target residual 68, corresponding to the image 61 to be processed is generated. Superimposing the target residual 68 to the cropped area of the image to be processed 61 results in a final effect map, i.e. the target image 62.
By designing the residual network, the method and device improve the sharpness of the final picture while keeping performance and power consumption under control, overcoming the drawbacks of the traditional style-based GAN route: reduced sharpness, excessive performance and power consumption cost, and unsupported operators.
Note that the execution subject of the image processing method provided in the embodiments of the present application may be an image processing apparatus, or a control module of the image processing apparatus for executing the image processing method. In the embodiments of the present application, an image processing apparatus executing the image processing method is taken as the example in describing the image processing apparatus provided herein.
As shown in FIG. 7, an embodiment of the present application further provides an image processing apparatus, the apparatus including:
a first acquisition module 71, configured to acquire an image to be processed;
a residual module 72, configured to input the target image region into a target residual network model to obtain a residual output of the target residual network model; wherein the target image region comprises the image region of the image to be processed in which hair and forehead are shown;
and a processing module 73, configured to superimpose the residual output onto the image to be processed to obtain a target image of the image to be processed after hair augmentation.
Optionally, the apparatus further comprises:
a second acquisition module, configured to acquire a plurality of pairs of pre- and post-augmentation contrast images; wherein each pair of the contrast images comprises a pre-augmentation image and a post-augmentation image;
and a first training module, configured to train an initial residual network model, taking the pre-augmentation image as the input and the post-augmentation image as the ground truth, to obtain the target residual network model.
Optionally, the second acquisition module includes:
an acquisition unit, configured to acquire a plurality of three-dimensional matrices conforming to a Gaussian distribution;
and a generating unit, configured to generate, for each three-dimensional matrix, a pair of pre- and post-augmentation contrast images based on the pre-trained style-based GAN model.
Optionally, the apparatus further comprises:
a third acquisition module, configured to acquire a training sample set; wherein the training sample set comprises a plurality of random three-dimensional matrices conforming to a Gaussian distribution and training images containing forehead and hair;
and a second training module, configured to train a target model based on the training sample set to obtain the pre-trained style-based GAN model;
wherein the target model comprises an initial style-based GAN model and a discriminator model, and the output of the initial style-based GAN model together with a training image serves as the input of the discriminator model.
Optionally, the generating unit is specifically configured to:
input each of the three-dimensional matrices into the style-based GAN model to obtain the pre-augmentation image output by the style-based GAN model;
adjust the input of the style-based GAN model in reverse, based on the output of the style-based GAN model and a guide image, until the number of reverse adjustments reaches a target number; wherein the guide image is a post-augmentation contrast image drawn on the basis of the pre-augmentation image;
and, after the last reverse adjustment, take the output of the style-based GAN model as the post-augmentation image.
Optionally, the input of the target residual network model is an image of a first size. In the case that the size of the image to be processed is smaller or larger than the first size, the residual module 72 includes:
an affine transformation unit, configured to perform an affine transformation on the target image region to generate a first intermediate image of the first size;
and a residual unit, configured to input the first intermediate image into the target residual network model to obtain the residual output of the target residual network model.
The processing module 73 includes:
an inverse affine transformation unit, configured to perform an inverse affine transformation on the residual output to generate a second intermediate image of a second size, wherein the second size is the size of the image to be processed;
and a processing unit, configured to superimpose the second intermediate image onto the image to be processed to obtain the target image.
Optionally, the affine transformation unit is specifically configured to:
crop the target image region out of the image to be processed to obtain a cropped image;
and perform the affine transformation on the cropped image to generate the first intermediate image of the first size.
In the embodiments of the application, for the image to be processed, a target image region in which hair and forehead are shown is input into a target residual network model. Because the target residual network model computes the difference between a person before and after hair augmentation, a residual output representing that difference for the image to be processed is obtained. By superimposing the residual output onto the image to be processed, the target image after hair augmentation is obtained. Because the target residual network model is built on a residual network, which places low requirements on hardware performance and incurs low power consumption overhead, the method and device can reduce the performance requirements on hardware and the power consumption overhead of the hardware while still realizing the hair-augmentation effect.
The image processing apparatus in the embodiments of the present application may be an electronic device, or may be a component in an electronic device, for example an integrated circuit or a chip. The electronic device may be a terminal, or a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted electronic device, mobile internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA), etc., and may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine, etc.; the embodiments of the present application do not specifically limit this.
The image processing device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The image processing device provided in the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 2 to fig. 6, and achieve the same technical effects, so that repetition is avoided, and no further description is given here.
Optionally, as shown in FIG. 8, an embodiment of the present application further provides an electronic device 800, including a processor 801 and a memory 802, where the memory 802 stores a program or instructions executable on the processor 801. When executed by the processor 801, the program or instructions implement each step of the above image processing method embodiment and achieve the same technical effects; to avoid repetition, details are not repeated here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
FIG. 9 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 900 includes, but is not limited to: radio frequency unit 901, network module 902, audio output unit 903, input unit 904, sensor 905, display unit 906, user input unit 907, interface unit 908, memory 909, and processor 910.
Those skilled in the art will appreciate that the electronic device 900 may also include a power source (e.g., a battery) for powering the various components, which may be logically connected to the processor 910 through a power management system, so that functions such as charge management, discharge management, and power consumption management are performed through the power management system. The electronic device structure shown in FIG. 9 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which is not described in detail here.
A processor 910, configured to acquire an image to be processed.
The processor 910 is further configured to input the target image area to a target residual error network model, to obtain a residual error output of the target residual error network model; wherein the target image area includes: the image area of hair and forehead is displayed in the image to be processed.
The processor 910 is further configured to superimpose the residual output on the image to be processed, so as to obtain a target image after the image to be processed is amplified.
In the embodiment of the application, aiming at the image to be processed, the target image area with the hair and forehead displayed is input into the target residual error network model, so that residual error output representing the difference before and after the expansion of the image to be processed can be obtained. And further, by superposing the residual output on the image to be processed, the target image after the expansion of the image to be processed can be obtained. Because the target residual network model utilizes a residual network with lower requirements on hardware performance and lower power consumption expenditure. Therefore, the method and the device can reduce the performance requirement on hardware and reduce the power consumption overhead of the hardware on the basis of realizing the hair increasing effect.
It should be appreciated that in embodiments of the present application, the input unit 904 may include a graphics processor (Graphics Processing Unit, GPU) 9041 and a microphone 9042, with the graphics processor 9041 processing image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 907 includes at least one of a touch panel 9071 and other input devices 9072. Touch panel 9071, also referred to as a touch screen. The touch panel 9071 may include two parts, a touch detection device and a touch controller. Other input devices 9072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 909 may be used to store software programs as well as various data. The memory 909 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system and the application programs or instructions required for at least one function (such as a sound playing function or an image playing function), and the like. Further, the memory 909 may include volatile memory or nonvolatile memory, or the memory 909 may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct rambus RAM (DRRAM). The memory 909 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 910 may include one or more processing units; optionally, the processor 910 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 910.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, where the program or the instruction implements each process of the method embodiment of image processing described above when executed by a processor, and the process can achieve the same technical effect, so that repetition is avoided and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, the processor is configured to run a program or an instruction, implement each process of the above image processing method embodiment, and achieve the same technical effect, so that repetition is avoided, and no further description is provided here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
The embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the method embodiments of image processing described above, and achieve the same technical effects, and are not described herein in detail for avoiding repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; it may also include performing the functions in a substantially simultaneous manner or in the reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative rather than restrictive. Enlightened by the present application, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (10)

1. A method of image processing, the method comprising:
acquiring an image to be processed;
inputting a target image region into a target residual network model to obtain a residual output of the target residual network model; wherein the target image region comprises the image region of the image to be processed in which hair and forehead are shown;
and superimposing the residual output onto the image to be processed to obtain a target image of the image to be processed after hair augmentation.
2. The method of claim 1, wherein, prior to said inputting the target image region into the target residual network model to obtain the residual output of the target residual network model, the method further comprises:
acquiring a plurality of pairs of pre- and post-augmentation contrast images; wherein each pair of the contrast images comprises a pre-augmentation image and a post-augmentation image;
and training an initial residual network model, taking the pre-augmentation image as the input and the post-augmentation image as the ground truth, to obtain the target residual network model.
3. The method of claim 2, wherein the acquiring the plurality of pairs of pre- and post-augmentation contrast images comprises:
acquiring a plurality of three-dimensional matrices conforming to a Gaussian distribution;
and generating, for each three-dimensional matrix, a pair of pre- and post-augmentation contrast images based on a pre-trained style-based generative adversarial network model.
4. The method according to claim 3, wherein the generating, for each of the three-dimensional matrices, a pair of before-and-after hair augmentation comparison images based on the pre-trained style generative adversarial network model comprises:
inputting each three-dimensional matrix into the style generative adversarial network model to obtain the pre-augmentation image output by the style generative adversarial network model;
reversely adjusting the input of the style generative adversarial network model, based on the output of the style generative adversarial network model and a guide image, until the number of reverse adjustments reaches a target number; wherein the guide image is a post-augmentation comparison image painted on the basis of the pre-augmentation image;
and, after the last reverse adjustment, determining the output of the style generative adversarial network model to be the post-augmentation image.
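The "reverse adjustment" of claim 4 reads as latent optimization against the painted guide image. A minimal sketch, assuming a differentiable pretrained generator callable as `stylegan(z)` (a hypothetical interface) and a fixed target number of gradient steps:

```python
import torch
import torch.nn.functional as F

def generate_pair(stylegan, z, guide_image, target_steps=200, lr=0.01):
    """Returns a (pre-augmentation, post-augmentation) comparison image pair."""
    # A forward pass on the raw latent gives the pre-augmentation image.
    pre_image = stylegan(z).detach()
    z_adj = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z_adj], lr=lr)
    for _ in range(target_steps):  # stop at the target number of reverse adjustments
        opt.zero_grad()
        out = stylegan(z_adj)
        # Compare the generator output with the painted guide image and
        # propagate the adjustment back to the generator input.
        loss = F.l1_loss(out, guide_image)
        loss.backward()
        opt.step()
    # The output after the last reverse adjustment is the post-augmentation image.
    post_image = stylegan(z_adj).detach()
    return pre_image, post_image
```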
5. The method according to claim 1, wherein the input of the target residual network model is an image of a first size, and wherein, in the case that the size of the image to be processed is smaller or larger than the first size, the inputting the target image area into the target residual network model to obtain the residual output of the target residual network model comprises:
performing affine transformation on the target image area to generate a first intermediate image of the first size;
and inputting the first intermediate image into the target residual network model to obtain the residual output of the target residual network model;
and the superposing the residual output onto the image to be processed to obtain the target image in which the hair in the image to be processed is augmented comprises:
performing inverse affine transformation on the residual output to generate a second intermediate image of a second size, wherein the second size is the size of the image to be processed;
and superposing the second intermediate image onto the image to be processed to obtain the target image.
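A sketch of the size-adaptation path of claim 5 using OpenCV warps. Here `M` (the 2x3 affine matrix mapping the hair-and-forehead region to the model's input frame) and the `residual_net` callable are assumed; how the transform is actually derived is not specified by the claim.

```python
import cv2
import numpy as np

def residual_with_affine(image: np.ndarray, M: np.ndarray, first_size, residual_net):
    """image: (H, W, 3) float array in [0, 1]; first_size: (first_h, first_w)."""
    h, w = image.shape[:2]
    first_h, first_w = first_size
    # Affine transformation yields a first intermediate image of the first size.
    first_intermediate = cv2.warpAffine(image, M, (first_w, first_h))
    residual = residual_net(first_intermediate)
    # The inverse affine transformation maps the residual output back to a
    # second intermediate image of the size of the image to be processed.
    M_inv = cv2.invertAffineTransform(M)
    second_intermediate = cv2.warpAffine(residual, M_inv, (w, h))
    # Superpose the second intermediate image onto the image to be processed.
    return np.clip(image + second_intermediate, 0.0, 1.0)
```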
6. An apparatus for image processing, the apparatus comprising:
a first acquisition module, configured to acquire an image to be processed;
a residual module, configured to input a target image area into a target residual network model to obtain a residual output of the target residual network model; wherein the target image area comprises: an image area of the image to be processed in which hair and forehead are displayed;
and a processing module, configured to superpose the residual output onto the image to be processed to obtain a target image in which the hair in the image to be processed is augmented.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a second acquisition module, configured to acquire a plurality of pairs of comparison images from before and after hair augmentation; wherein each pair of comparison images comprises a pre-augmentation image and a post-augmentation image;
and a first training module, configured to train an initial residual network model, with the pre-augmentation image as input and the post-augmentation image as ground truth, to obtain the target residual network model.
8. The apparatus of claim 7, wherein the second acquisition module comprises:
an acquisition unit, configured to acquire a plurality of three-dimensional matrices conforming to a Gaussian distribution;
and a generation unit, configured to generate, for each three-dimensional matrix, a pair of before-and-after hair augmentation comparison images based on a pre-trained style generative adversarial network model.
9. The apparatus according to claim 8, wherein the generation unit is specifically configured to:
input each three-dimensional matrix into the style generative adversarial network model to obtain the pre-augmentation image output by the style generative adversarial network model;
reversely adjust the input of the style generative adversarial network model, based on the output of the style generative adversarial network model and a guide image, until the number of reverse adjustments reaches a target number; wherein the guide image is a post-augmentation comparison image painted on the basis of the pre-augmentation image;
and, after the last reverse adjustment, determine the output of the style generative adversarial network model to be the post-augmentation image.
10. The apparatus of claim 6, wherein the input of the target residual network model is an image of a first size, and wherein, in the case that the size of the image to be processed is smaller or larger than the first size, the residual module comprises:
an affine transformation unit, configured to perform affine transformation on the target image area to generate a first intermediate image of the first size;
and a residual unit, configured to input the first intermediate image into the target residual network model to obtain the residual output of the target residual network model;
and the processing module comprises:
an inverse affine transformation unit, configured to perform inverse affine transformation on the residual output to generate a second intermediate image of a second size, wherein the second size is the size of the image to be processed;
and a processing unit, configured to superpose the second intermediate image onto the image to be processed to obtain the target image.
CN202310155787.8A 2023-02-21 2023-02-21 Image processing method and device Pending CN116309018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310155787.8A CN116309018A (en) 2023-02-21 2023-02-21 Image processing method and device

Publications (1)

Publication Number Publication Date
CN116309018A 2023-06-23

Family

ID=86835295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310155787.8A Pending CN116309018A (en) 2023-02-21 2023-02-21 Image processing method and device

Country Status (1)

Country Link
CN (1) CN116309018A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination