CN111402113B

CN111402113B - Image processing method, image processing device, electronic equipment and computer readable medium

Info

Publication number: CN111402113B
Application number: CN202010157936.0A
Authority: CN
Inventors: 李华夏
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2021-10-15
Anticipated expiration: 2040-03-09
Also published as: CN111402113A

Abstract

The method uses different first attributes to be interpolated to represent different ages. An image to be processed is down-sampled through an age-varying special effect network to obtain a down-sampling result, and the down-sampling result is then up-sampled and interpolated according to the first attribute to be interpolated corresponding to a target age. This applies an age-varying special effect to the image to be processed and yields an age-varying special effect image; that is, the image is automatically converted between different ages, providing users with a new special effect experience.

Description

Image processing method, image processing device, electronic equipment and computer readable medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable medium.

Background

With the rapid development of computer technology and communication technology, the use of intelligent terminals is widely popularized, and more application programs are developed to facilitate and enrich the work and life of people.

Currently, many applications are dedicated to providing more personalized visual special effects with better visual perception for intelligent terminal users, such as filter effects, sticker effects, deformation effects, and the like. The diversity, interactivity and sociality of the visual special effect experience are fully developed.

Users often want to build various online personas using visual special effects, according to their interests or other motivations. For example, a user may present a younger or older appearance in photos or videos posted online, thereby disguising their real appearance.

However, in the prior art, a special effect of interconversion between a young image and an old image has not been achieved.

Disclosure of Invention

In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:

in a first aspect, the present disclosure provides an image processing method, including:

acquiring an image to be processed and a pre-trained age-varying special effect network, wherein the age-varying special effect network corresponds to at least three ages;

determining a target age to be transformed, and acquiring a first attribute to be interpolated corresponding to the target age, wherein the first attribute to be interpolated corresponding to each age is determined according to training of an age-varying special effect network;

and performing down-sampling on the image to be processed through an age-varying special effect network to obtain a down-sampling result of the image to be processed, and performing up-sampling interpolation on the down-sampling result according to a first attribute to be interpolated corresponding to the target age to obtain an age-varying special effect image.

In a second aspect, the present disclosure provides an image processing apparatus comprising:

the acquisition module is used for acquiring an image to be processed and a pre-trained age-varying special effect network, wherein the age-varying special effect network corresponds to at least three ages;

the determining module is used for determining the target age to be transformed and acquiring a first attribute to be interpolated corresponding to the target age, wherein the first attribute to be interpolated corresponding to each age is determined according to training of the age-varying special effect network;

and the special effect processing module is used for carrying out down-sampling on the image to be processed through the age-variable special effect network to obtain a down-sampling result of the image to be processed, and carrying out up-sampling interpolation on the down-sampling result according to a first attribute to be interpolated corresponding to the target age to obtain the age-variable special effect image.

In a third aspect, the present disclosure provides an electronic device comprising:

a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement a method as set forth in the first aspect of the disclosure.

In a fourth aspect, the present disclosure provides a computer readable medium for storing a computer instruction, program, code set or instruction set which, when run on a computer, causes the computer to perform the method as set forth in the first aspect of the disclosure.

According to the image processing method, the image processing apparatus, the electronic device and the computer-readable medium provided by the present disclosure, different first attributes to be interpolated are used to represent different ages. The image to be processed is down-sampled through the age-varying special effect network to obtain a down-sampling result, and the down-sampling result is up-sampled and interpolated according to the first attribute to be interpolated corresponding to the target age. This applies the age-varying special effect to the image to be processed and yields the age-varying special effect image; that is, the image is automatically converted between different ages, providing users with a new special effect experience.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.

Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of a generative adversarial network provided by an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a training model provided in an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are used only to distinguish features, data, elements, devices, modules or units from one another. They do not require that these be different features, data, elements, devices, modules or units, nor do they limit the sequence or interdependence of the functions performed by them.

It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

An embodiment of the present disclosure provides an image processing method, as shown in fig. 1, the method including:

step S110: acquiring an image to be processed and a pre-trained age-varying special effect network, wherein the age-varying special effect network corresponds to at least three ages;

step S120: determining a target age to be transformed, and acquiring a first attribute to be interpolated corresponding to the target age, wherein the first attribute to be interpolated corresponding to each age is determined according to training of an age-varying special effect network;

step S130: and performing down-sampling on the image to be processed through an age-varying special effect network to obtain a down-sampling result of the image to be processed, and performing up-sampling interpolation on the down-sampling result according to a first attribute to be interpolated corresponding to the target age to obtain an age-varying special effect image.

According to the image processing method provided by the embodiment of the disclosure, different first attributes to be interpolated are used to represent different ages. The image to be processed is down-sampled through the age-varying special effect network to obtain a down-sampling result, and the down-sampling result is up-sampled and interpolated according to the first attribute to be interpolated corresponding to the target age. This applies the age-varying special effect to the image to be processed and yields the age-varying special effect image; that is, the image is automatically converted between different ages, providing users with a new special effect experience.
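The flow of steps S110 to S130 can be sketched as follows. This is only a structural illustration, not the disclosed implementation: `encoder`, `decoder` and `attribute_for_age` are hypothetical callables standing in for the down-sampling half of the trained network, its up-sampling half, and the trained lookup from age to first attribute to be interpolated.

```python
from typing import Any, Callable


def apply_age_effect(
    image: Any,
    target_age: int,
    encoder: Callable[[Any], Any],
    decoder: Callable[[Any, float], Any],
    attribute_for_age: Callable[[int], float],
) -> Any:
    """Hypothetical wrapper around steps S120 and S130."""
    # S120: look up the first attribute to be interpolated for the target age.
    attribute = attribute_for_age(target_age)
    # S130, part 1: down-sample the image to be processed into a feature result.
    features = encoder(image)
    # S130, part 2: up-sample while interpolating the age attribute.
    return decoder(features, attribute)
```

The image and the network (S110) are assumed to be acquired by the caller; only the target age changes between calls when the same image is rendered at several ages.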

In the embodiment of the disclosure, the age-varying special effect network is obtained by training through the following steps:

step S210: acquiring a pre-constructed generative adversarial network, wherein the generative adversarial network comprises a generation network, a first discrimination network, a second discrimination network and a first regression classification network; down-sampling each sample image containing age information through the generation network to obtain corresponding image features, and performing up-sampling and age interpolation processing on the image features of each sample image to obtain a generated image corresponding to the age information of each sample image; discriminating the age authenticity of the image features of each sample image through the first discrimination network to obtain a corresponding first judgment result; discriminating the authenticity of each sample image and the corresponding generated image through the second discrimination network to obtain a corresponding second judgment result; and obtaining, through the first regression classification network, an age regression result corresponding to the age information of each sample image from the generated image corresponding to that sample image;

step S220: performing adversarial training on the generative adversarial network based on the first judgment result, the second judgment result, the age regression result and the generated image corresponding to each sample image, and determining the trained generation network as the age-varying special effect network.

In the embodiment of the present disclosure, the sample image may be a human face image, but is not limited thereto, and according to different application scenarios, in other embodiments, the sample image may also include other features capable of distinguishing ages, such as color, posture, and the like, or the sample image may also be an animal image and the like.

The generative adversarial network may be constructed based on any of the various types of generative adversarial network (GAN). The main structure of a GAN includes a generator G and a discriminator D.

For the embodiment of the disclosure, as shown in Fig. 2, the generation network is defined as a generator G. The network processes images by down-sampling followed by up-sampling, which can better represent latent features in a sample image. Specifically, in each training iteration, in the down-sampling stage, the generation network extracts image features from the sample image; in practical applications, the image features may be expressed as feature vectors. In the up-sampling stage, as it generates an image from the image features, the generation network performs age interpolation processing according to the age information of the sample image, so that it learns to generate an image of the age corresponding to that age information. For example, if a sample image whose age information is 20 years old is input to the generation network during training, the generation network performs interpolation processing related to the age of 20 in the up-sampling stage and outputs a generated image related to the age of 20. That is, the age of the generated image output by the generation network should be the same as or similar to the age information of the corresponding input sample image. After training with sample images of different ages, the generation network learns to transform images into various ages.

For the embodiment of the present disclosure, as shown in Fig. 2, two discriminators D are also defined: a first discrimination network and a second discrimination network. In each training iteration, the first discrimination network discriminates the age authenticity of the image features, and its expected result depends on the age information of the sample image. For example, if a sample image whose age information is 20 years old is input to the generation network during training, the first discrimination network should discriminate the image features as true for the age of 20 and as false for any other age. The second discrimination network discriminates the authenticity of the sample image and the generated image, that is, whether the sample image is true (Real) or false (Fake), and whether the generated image is true or false.

For the embodiment of the present disclosure, as shown in Fig. 2, a regression classifier, i.e., the first regression classification network, is further defined to assist training. In each training iteration, the first regression classification network regresses the age of the generated image to obtain an age regression result. Because age is a continuous value, a regression classifier is used to regress the age of the generated image. Since the age of the generated image output by the generation network should be the same as or similar to the age information of the corresponding input sample image, the age regression result of the generated image should likewise be the same as or similar to the age of the corresponding input sample image. For example, if a sample image with age information of 20 years old is input to the generation network during training, the first regression classification network should produce an age regression result for the generated image that is equal or close to 20.

In the embodiment of the present disclosure, the adversarial training in step S220 may specifically proceed as follows:

The network parameters of the generation network, the first discrimination network, the second discrimination network and the first regression classification network are initialized.

Adversarial training is performed based on a set of m sample images {a₁, a₂, …, a_m}, the m image features {z₁, z₂, …, z_m} obtained from the generation network, and the corresponding m generated images.

The first discrimination network is trained to distinguish the real age of the image features as accurately as possible, while the generation network is trained to separate the image features from the age information as far as possible, that is, to make the first discrimination network misjudge the age of the image features as often as possible. The second discrimination network is trained to distinguish real samples (sample images) from generated samples (generated images) as accurately as possible, while the generation network is trained to blur the difference between generated samples and real samples as far as possible, that is, to make the second discrimination network misjudge as often as possible. The first regression classification network is trained to classify the generated image by age as accurately as possible. In other words, during adversarial training the generation network improves its generation capability, the first and second discrimination networks improve their discrimination capability, and the first regression classification network improves its regression classification capability.

After multiple update iterations, the final ideal situation is that the first discrimination network cannot discriminate the real age of the image features, and the second discrimination network cannot discriminate whether the sample is a generated sample or a real sample.

Because the generation capability of the generation network reaches an ideal state through adversarial training, the trained generation network is determined as the age-varying special effect network and can realize a good age-varying special effect on images.

According to the image processing method provided by the embodiment of the disclosure, a pre-constructed generative adversarial network is used to train the age-varying special effect network, and adversarial training is performed jointly at the image feature level and the generated image level of the sample images. This effectively improves the ability of the generation network to generate an image of the target age. When the trained generation network is used as the age-varying special effect network to process the image to be processed, the resulting age-varying special effect image realizes automatic conversion of the image between different ages, providing users with a new special effect experience.

The embodiment of the present disclosure provides a feasible implementation of the age-varying processing in the up-sampling stage of the generation network, which specifically includes the following steps:

step SA: determining a preset second attribute to be interpolated corresponding to the age information of each sample image according to the age information of each sample image;

step SB: and according to the second attribute to be interpolated of each sample image, performing up-sampling interpolation on the image characteristics of each sample image to obtain a generated image corresponding to the age information of each sample image.

In the embodiment of the present disclosure, for step SA, specifically, the second attribute to be interpolated is determined according to the age of the sample image represented by the age information.

Wherein, a person skilled in the art can set the second attribute to be interpolated corresponding to different ages according to actual conditions, specifically:

In one possible embodiment, one interpolation attribute is set for each age; as an example, the interpolation attribute for age 1 is set to 1, the interpolation attribute for age 2 is set to 2, and so on. In this embodiment, the generation network performs, in the up-sampling stage, interpolation processing for the age corresponding to the age information. For example, if a sample image with age information of 20 years old is input to the generation network during training, the generation network performs interpolation processing for the age of 20 in the up-sampling stage and outputs a generated image of age 20.

Specifically, in this embodiment, a mapping relationship between each age and the corresponding interpolation attribute may be preset, and then step SA may include the steps of: and determining the interpolation attribute corresponding to the age of each sample image as the second attribute to be interpolated of each sample image according to the mapping relation between each age and the corresponding interpolation attribute.
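A minimal sketch of this per-age lookup is shown below. The mapping values follow the example in the text (age 1 maps to 1, age 2 maps to 2, and so on); the upper bound of 100 and the names `AGE_TO_ATTRIBUTE` and `second_attribute_for_age` are assumptions made for illustration.

```python
# Hypothetical preset mapping: age in years -> interpolation attribute.
# Values follow the example in the text; the range 1..100 is an assumption.
AGE_TO_ATTRIBUTE = {age: age for age in range(1, 101)}


def second_attribute_for_age(age: int) -> int:
    """Step SA, per-age embodiment: look up the preset interpolation attribute."""
    if age not in AGE_TO_ATTRIBUTE:
        raise ValueError(f"no interpolation attribute preset for age {age}")
    return AGE_TO_ATTRIBUTE[age]
```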

In another possible embodiment, one interpolation attribute is set for each age group; for example, the interpolation attribute for ages 1 to 10 is set to 0, the interpolation attribute for ages 11 to 20 is set to 1, and so on. In this embodiment, the generation network performs, in the up-sampling stage, interpolation processing for the age group corresponding to the age information. For example, if a sample image with age information of 20 years old is input to the generation network during training, the generation network performs interpolation processing for the 11 to 20 age group in the up-sampling stage and outputs a generated image in the 11 to 20 age group.

Because the generation network automatically learns the association between the interpolation attributes and different ages or age groups during adversarial training, once training is complete the different interpolation attributes can be used to represent different ages or age groups.

Specifically, in this embodiment, an age segmentation model may be preset to associate age segments to which each age is assigned, and a person skilled in the art may set the segmentation interval and range of the age segmentation model according to circumstances. The mapping relationship between each age group and the corresponding interpolation attribute may also be preset, and then step SA may include the steps of:

determining a target age group corresponding to the age information of each sample image according to a preset age segmentation model;

and determining the interpolation attribute corresponding to the target age group corresponding to each sample image as the second attribute to be interpolated of each sample image according to the preset mapping relationship between the age groups and the interpolation attributes.
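The two sub-steps above can be sketched as follows, assuming the ten-year groups used in the example (1 to 10 maps to 0, 11 to 20 maps to 1). The fixed group width and all function names are hypothetical; the disclosure leaves the segmentation interval and range to the implementer.

```python
from typing import Tuple

GROUP_WIDTH = 10  # assumed segmentation interval: 1-10, 11-20, ...


def age_group(age: int) -> Tuple[int, int]:
    """Hypothetical age segmentation model: return (low, high) of the age group."""
    if age < 1:
        raise ValueError("age must be at least 1")
    low = ((age - 1) // GROUP_WIDTH) * GROUP_WIDTH + 1
    return (low, low + GROUP_WIDTH - 1)


def attribute_for_group(group: Tuple[int, int]) -> int:
    """Preset mapping from age group to interpolation attribute (1-10 -> 0, 11-20 -> 1)."""
    return (group[0] - 1) // GROUP_WIDTH


def second_attribute_for_age_group(age: int) -> int:
    """Step SA, age-group embodiment: segment the age, then look up the attribute."""
    return attribute_for_group(age_group(age))
```

With these assumptions, ages 11 and 20 both fall in the group (11, 20) and share the interpolation attribute 1.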

In the embodiment of the disclosure, the generation network learns to perform interpolation processing per age group, so that an age-varying special effect network covering a more complete range of changes can be trained with fewer sample images, improving the coverage of the age-varying special effect.

In the embodiment of the present disclosure, for step SB, the second attribute to be interpolated may be inserted in the upsampling process in the form of a channel.

Specifically, during upsampling, a layer of channel is additionally added to the original number of channel layers of the image feature for filling the second attribute to be interpolated.

For example, assume that the interpolation attribute for age 20 is 1 and that the image features consist of 8 × 8 feature maps with 4 channels (i.e., size 4 × 8 × 8). If a sample image of age 20 is input to the generation network during training, then after the second attribute to be interpolated of the sample image is determined to be 1, one channel filled with the value 1 is inserted during up-sampling, finally yielding 8 × 8 feature maps with 5 channels (i.e., size 5 × 8 × 8).
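The channel insertion in this example can be illustrated with plain nested lists. This is a sketch of the shape change only (4 × 8 × 8 to 5 × 8 × 8); the helper name `append_attribute_channel` is hypothetical, and a real network would perform the same concatenation on tensors inside the up-sampling layers.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def append_attribute_channel(
    features: List[FeatureMap], attribute: float
) -> List[FeatureMap]:
    """Return a copy of C x H x W features plus one channel filled with `attribute`."""
    height = len(features[0])
    width = len(features[0][0])
    attribute_channel = [[float(attribute)] * width for _ in range(height)]
    copied = [[row[:] for row in channel] for channel in features]
    return copied + [attribute_channel]


features = [[[0.0] * 8 for _ in range(8)] for _ in range(4)]  # 4 x 8 x 8
augmented = append_attribute_channel(features, 1)             # 5 x 8 x 8
```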

As will be understood by those skilled in the art, the age information of the sample image is hidden in the original channels (i.e., in the 4-channel 8 × 8 feature maps in the above example). Through adversarial training of the generation network against the first discrimination network, the age information in the original channels can be removed in the down-sampling stage to obtain image features free of age information. The interpolation attribute that the generation network has learned to associate with age is then inserted in the up-sampling stage, so that the age of the generated image is determined by the interpolation attribute.

In the embodiment of the disclosure, corresponding loss functions are provided for the adversarial training process, so as to better optimize the generative adversarial network during training.

Specifically, step S220 includes the steps of:

step S221: determining real age loss and false age loss corresponding to corresponding image features according to a first judgment result corresponding to each sample image;

Since the first discrimination network needs to discriminate the ages of the m image features as true (an age is true when it matches the age information of the sample image, in which case the true probability is 1), but in the actual training process the probability that the first discrimination network discriminates the age of each image feature as true may not be 1, an adversarial loss can be determined from the discrimination of the true and false probabilities of the ages of the image features. In the embodiment of the present disclosure this is defined as the real age loss corresponding to the image features and, for convenience of description, is hereinafter abbreviated as L3_loss1.

Since the generation network needs to separate the image features from the age information as far as possible, it tries to make the first discrimination network misjudge the age of the image features, which is equivalent to having the ages of the m image features discriminated as false. An adversarial loss can then be determined from the misjudgment of the true and false probabilities of the ages of the image features caused by the generation network. In the embodiment of the present disclosure this is defined as the false age loss corresponding to the image features and, for convenience of description, is hereinafter abbreviated as L3_loss2.

Step S222: determining the true sample loss corresponding to the corresponding sample image, the false sample true loss corresponding to the generated image and the false sample false loss corresponding to the generated image according to the second judgment result corresponding to each sample image;

In the embodiment of the present disclosure, the second discrimination network needs to discriminate all m sample images as real samples (i.e., with a true probability of 1), but in the actual training process the probability that the second discrimination network discriminates each sample image as true may not be 1. An adversarial loss can then be determined from the discrimination of the true and false probabilities of the sample images. This is defined as the true sample loss corresponding to the sample image and, for convenience of description, is hereinafter abbreviated as L2_loss1.

Since the second discrimination network needs to discriminate all m generated images as false samples (i.e., generated samples, with a true probability of 0), but in the actual training process the probability that the second discrimination network discriminates each generated image as true may not be 0, an adversarial loss can be determined from the discrimination of the true and false probabilities of the generated images. In the embodiment of the present disclosure this is defined as the false sample true loss corresponding to the generated image and, for convenience of description, is hereinafter abbreviated as L2_loss2.

Since the generation network needs to blur the difference between generated samples (generated images) and real samples (sample images) as far as possible, it tries to make the second discrimination network misjudge, so that all m generated images are discriminated as real samples. An adversarial loss can then be determined from the misjudgment of the true and false probabilities of the generated images caused by the generation network. In the embodiment of the present disclosure this is defined as the false sample false loss corresponding to the generated image and, for convenience of description, is hereinafter abbreviated as L2_loss3.

In practical applications, all three losses can be calculated based on the least squares loss function, but are not limited thereto.
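As an illustration of the least squares option, the three losses of the second discrimination network can be sketched as mean squared distances to the target labels named above (1 for real, 0 for generated). Averaging over the batch, the score range, and the function names are assumptions made for this sketch, not details given in the disclosure.

```python
from typing import Sequence


def least_squares_loss(scores: Sequence[float], target: float) -> float:
    """Mean squared distance between discriminator scores and a target label."""
    return sum((s - target) ** 2 for s in scores) / len(scores)


def l2_loss1(real_scores: Sequence[float]) -> float:
    """True sample loss: sample images should be scored 1 by the discriminator."""
    return least_squares_loss(real_scores, 1.0)


def l2_loss2(fake_scores: Sequence[float]) -> float:
    """False sample true loss: generated images should be scored 0 by the discriminator."""
    return least_squares_loss(fake_scores, 0.0)


def l2_loss3(fake_scores: Sequence[float]) -> float:
    """False sample false loss: the generator wants generated images scored 1."""
    return least_squares_loss(fake_scores, 1.0)
```

The second and third functions take the same scores but opposite targets, which is what makes the training adversarial: lowering one raises the other.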

Step S223: determining corresponding age regression loss according to the age regression result corresponding to each sample image;

Since the first regression classification network needs to regress the age of the generated image as accurately as possible, a regression loss arises. In the embodiment of the present disclosure this is defined as the age regression loss of the generated image and, for convenience of description, is hereinafter abbreviated as L4_loss.

In the embodiment of the disclosure, for each sample image, based on a preset absolute loss function, a corresponding age regression loss is determined according to the corresponding age information and the age regression result. Specifically, for each sample image, the age value gap of the corresponding age information and the age regression result can be determined; the corresponding age regression loss is determined based on a preset absolute loss function and the age value gap.

Specifically, when the age value gap is less than or equal to a threshold, the age regression loss may be calculated using the squared-loss branch of the absolute loss function; when the age value gap is greater than the threshold, the age regression loss is calculated using the absolute-loss branch of the absolute loss function.

The threshold and the exponent used in the absolute loss function can be set by those skilled in the art according to the actual situation, and the embodiments of the present disclosure are not limited in this respect. As an example, assuming the age value gap is x and the threshold is 1, the age regression loss is calculated with the squared loss branch when x is less than or equal to 1 and with the absolute loss branch when x is greater than 1.

In the embodiment of the disclosure, calculating the age regression loss with the absolute loss function makes training more stable in its initial stage and makes convergence easier in its later stage, so that higher training precision is achieved.
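The piecewise behaviour described above can be sketched as follows. The coefficients are an assumption: the sketch uses the widely used smooth L1 (Huber-style) form, in which the factor 0.5 and the offset are chosen so that the two branches meet at the threshold. The function name and default threshold are likewise illustrative, and the exact formula of the embodiment may differ.

```python
def age_regression_loss(true_age: float, predicted_age: float, threshold: float = 1.0) -> float:
    """Piecewise loss: quadratic for small age gaps, linear for large ones.

    Assumed smooth L1 form; the coefficients are illustrative only.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    gap = abs(true_age - predicted_age)  # the age value gap x
    if gap <= threshold:
        # Squared loss branch: small gradients near the target age.
        return 0.5 * gap * gap / threshold
    # Absolute loss branch: gradient magnitude stays bounded for large gaps.
    return gap - 0.5 * threshold
```

With a threshold of 1, a gap of 0.5 years gives 0.125 and a gap of 10 years gives 9.5, so a badly wrong prediction is penalised linearly rather than quadratically.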

Step S224: determining an image loss between each sample image and the corresponding generated image;

As is clear to a person skilled in the art, since the generated image is obtained by down-sampling and up-sampling the corresponding sample image, each sample image and its corresponding generated image have the same pixel size; for example, the sample image a₁ and its corresponding generated image have the same pixel size. However, in the actual training process, the content of each sample image differs from that of the corresponding generated image. The pixels at the same positions in the sample image and the corresponding generated image can therefore be compared one by one to determine the difference value of each pixel, and the image loss between the sample image and the generated image is determined from these difference values.

In one possible implementation, the difference values of each pixel are summed to obtain the image loss between the sample image and the generated image.

Hereinafter, for convenience of description, the image loss between the sample image and the generated image is simply referred to as L1_ loss.
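A minimal sketch of the summed per-pixel difference follows. It assumes single-channel images stored as nested lists and takes the absolute value of each difference before summing; both choices, like the function name, are assumptions made for illustration.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]  # rows of pixel values (assumed layout)


def image_loss(sample: Image, generated: Image) -> float:
    """Sum of absolute per-pixel differences between two equally sized images."""
    same_shape = len(sample) == len(generated) and all(
        len(row_s) == len(row_g) for row_s, row_g in zip(sample, generated)
    )
    if not same_shape:
        raise ValueError("sample and generated image must have the same pixel size")
    return sum(
        abs(p - q)
        for row_s, row_g in zip(sample, generated)
        for p, q in zip(row_s, row_g)
    )
```

The shape check mirrors the observation that a sample image and its generated image have the same pixel size, so pixels can be compared position by position.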

Step S225: optimizing the countermeasure generation network according to the real age loss, the false age loss, the real sample loss, the false sample real loss, the false sample false loss, the age regression loss and the image loss corresponding to each sample image.

In a possible implementation manner, the L1_loss, L2_loss1, L2_loss2, L2_loss3, L3_loss1, L3_loss2 and L4_loss of each training iteration are fused, for example by weighted fusion, addition, averaging or another fusion method, to obtain the corresponding total loss. In this step, the countermeasure generation network is optimized according to the total loss of each training iteration, so that the training effect improves step by step.
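The weighted fusion mentioned above can be sketched as a weighted sum over named loss terms. The function name, the default weight of 1.0 and the example values are hypothetical; the disclosure does not specify any weights.

```python
from typing import Dict, Optional


def total_loss(losses: Dict[str, float], weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the individual losses; a missing weight defaults to 1.0."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


# Hypothetical loss values from one training iteration.
iteration_losses = {
    "L1_loss": 0.5,
    "L2_loss1": 0.25,
    "L2_loss2": 0.25,
    "L2_loss3": 0.5,
    "L3_loss1": 0.125,
    "L3_loss2": 0.125,
    "L4_loss": 0.25,
}

plain_sum = total_loss(iteration_losses)                    # simple addition
weighted = total_loss(iteration_losses, {"L1_loss": 2.0})   # emphasise the image loss
```

Plain addition is the special case in which every weight is 1.0; averaging corresponds to dividing that sum by the number of terms.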

In addition, the training process provided by the embodiment of the present disclosure may further include the steps of: acquiring a pre-trained second regression classification network; and labeling the age information on the image data through a second regression classification network to obtain each sample image containing the age information.

The second regression classification network is trained on a public data set and serves as the evaluation standard for determining the age information of an image. Because age is a continuous value, a regression classifier is used to regress the age information of the image. In the embodiment of the present disclosure, the specific form of the age information is not limited; for example, the age information may be labeled directly with the number corresponding to the age, or other types of labeling information may be used.

The second regression classification network is used to label age information to image data from various sources, so as to obtain a large number of sample images containing age information for processing in steps S210 to S220.

Based on the foregoing embodiments of the present disclosure, an embodiment of the present disclosure further provides a training model, where the training model includes:

a generation network, used for down-sampling each sample image containing age information to obtain corresponding image features, and for performing up-sampling and age interpolation processing on the image features of each sample image to obtain a generated image corresponding to the age information of each sample image;

a first discrimination network, used for judging the age authenticity of the image features of each sample image to obtain a corresponding first discrimination result;

a second discrimination network, used for judging the authenticity of each sample image and of the corresponding generated image, respectively, to obtain a corresponding second discrimination result;

a first regression classification network, used for obtaining an age regression result corresponding to the age information of each sample image according to the generated image corresponding to each sample image;

as shown in fig. 3, the generation network is connected to the first discrimination network, the second discrimination network and the first regression classification network, respectively, so that adversarial training is performed on the countermeasure generation network based on the first discrimination result, the second discrimination result, the age regression result and the generated image corresponding to each sample image, and the trained generation network is obtained.

Further, the training model may further include a second regression classification network, where the second regression classification network is connected to the generation network and is used to label the image data with age information to obtain each sample image containing the age information.

The implementation principle and technical effect of the training model provided in the embodiments of the present disclosure are the same as those of the countermeasure generation network in the foregoing method embodiments. For brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content in the foregoing method embodiments, and details are not repeated here.

Based on the above embodiments of the present disclosure, the processing instruction for the age-changing special effect may be issued through an operation of the user on the terminal device. The terminal devices include, but are not limited to, mobile terminals, smart terminals, and the like, such as mobile phones, smart phones, tablet computers, notebook computers, personal digital assistants, portable multimedia players, navigation devices, and the like. It will be understood by those skilled in the art that, except for elements used specifically for mobile purposes, the configuration according to the embodiments of the present disclosure can also be applied to fixed terminals such as digital televisions, desktop computers, and the like.

In the embodiment of the present disclosure, the execution subject of the method may be the terminal device or an application installed on the terminal device. Specifically, after receiving a processing instruction of the age-varying special effect, in step S110, an image to be processed corresponding to the processing instruction is obtained, and an age-varying special effect network trained by using the training step provided in any of the above embodiments of the present disclosure is obtained.

Further, in step S120, a target age to be transformed is determined, and a first attribute to be interpolated corresponding to the target age is obtained. How the attribute to be interpolated corresponding to each age is set is described in the training phase and is not repeated here. Specifically, when the processing instruction includes age information of the target age, the target age and the corresponding first attribute to be interpolated may be determined according to the processing instruction. For example, when the processing instruction indicates a transformation to 50 years old, it may be determined that the target age is 50 years old, and the corresponding first attribute to be interpolated is the interpolation attribute corresponding to 50 years old, or the interpolation attribute corresponding to the age range of 41-50 years old. In this case, the age information of the image to be processed does not need to be considered; that is, whatever the age shown in the image to be processed, the result takes on the age specified by the processing instruction. When the processing instruction does not include age information of the target age, a default target age and the corresponding first attribute to be interpolated can be determined directly; for example, the default target age is preset to 71-80 years old.

In practical applications, there may be one or more target ages, for example a gradual change from 10 to 80 years old. In this case, the corresponding first attributes to be interpolated may be determined in sequence according to the multiple pieces of age information and processed in sequence.
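The selection of target ages described above can be sketched as follows. The sketch assumes ten-year age ranges, inferred only from the examples "41-50" and "71-80", and represents each first attribute to be interpolated by its age range; the function names and this representation are hypothetical.

```python
from typing import List, Optional, Tuple

AgeRange = Tuple[int, int]
DEFAULT_RANGE: AgeRange = (71, 80)  # default target age range given as an example


def age_range(age: int, width: int = 10) -> AgeRange:
    """Map an age to the range containing it, e.g. 50 -> (41, 50)."""
    if age < 1:
        raise ValueError("age must be at least 1")
    low = ((age - 1) // width) * width + 1
    return (low, low + width - 1)


def resolve_target_ranges(requested_ages: Optional[List[int]]) -> List[AgeRange]:
    """Return the target ranges in request order, or the default if none is given."""
    if not requested_ages:
        return [DEFAULT_RANGE]
    ranges: List[AgeRange] = []
    for age in requested_ages:
        current = age_range(age)
        if not ranges or ranges[-1] != current:  # skip consecutive repeats
            ranges.append(current)
    return ranges
```

For a gradual change, `resolve_target_ranges([10, 45, 50, 80])` yields the ranges 1-10, 41-50 and 71-80 in that order, each of which would then be processed in sequence.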

Further, in step S130, the age-varying special effect processing is performed on the image to be processed through the acquired age-varying special effect network. Specifically, the image to be processed is down-sampled through the age-varying special effect network to obtain a down-sampling result of the image to be processed; as can be seen from the above, the down-sampling result is an image feature decoupled from the age information. The down-sampling result is then up-sampled and interpolated according to the first attribute to be interpolated corresponding to the target age, so that the age-varying special effect image is obtained. For the manner of up-sampling and interpolation, reference may be made to the description of the generation network, which is not repeated here.

Further, after obtaining the age-varying special effect image, the method may further include the steps of: and displaying the age-changing special effect image on a display screen.

Alternatively, the execution subject of the method may be a server. After receiving a processing instruction for the age-varying special effect sent by a terminal device, the server proceeds in a manner similar to the terminal device: it receives the image to be processed corresponding to the processing instruction and obtains an age-varying special effect network trained by the training steps provided in any of the embodiments of the present disclosure; determines the target age to be transformed according to the processing instruction or the image to be processed, and obtains the first attribute to be interpolated corresponding to the target age; down-samples the image to be processed through the age-varying special effect network to obtain a down-sampling result of the image to be processed; and then up-samples and interpolates the down-sampling result according to the first attribute to be interpolated corresponding to the target age to obtain the age-varying special effect image. The server then needs to send the age-varying special effect image to the terminal device for display.

It should be noted that the first attribute to be interpolated and the second attribute to be interpolated only represent the distinction between the age interpolation attributes in the training phase and the application phase, and are not to be construed as the definition of the interpolation attribute. In practical application, the expression forms of the first attribute to be interpolated and the second attribute to be interpolated are determined according to the setting of the generation network.

In the embodiment of the present disclosure, the type of the image to be processed should be the same as or similar to the type of the sample image, and reference may be specifically made to the description of the sample image, which is not described herein again.

In practical applications, the number of the images to be processed may be one or more. When the number of the images to be processed is multiple, the images to be processed may also be videos to be processed. The image processing method can be adopted to process each frame of image in the video to be processed so as to obtain the age-variable special effect video.
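The frame-by-frame processing described above can be sketched as follows. The two callables stand in for the down-sampling and up-sampling/interpolation stages of the trained age-varying special effect network; their signatures and the toy stand-ins at the end are assumptions made so that the sketch runs, not the network itself.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Features = TypeVar("Features")
Attribute = TypeVar("Attribute")


def apply_age_effect_to_video(
    frames: Iterable[Frame],
    attribute: Attribute,
    downsample: Callable[[Frame], Features],
    upsample_interpolate: Callable[[Features, Attribute], Frame],
) -> List[Frame]:
    """Apply the two-stage age effect to every frame, preserving frame order."""
    processed: List[Frame] = []
    for frame in frames:
        features = downsample(frame)  # stage 1: down-sampling result
        processed.append(upsample_interpolate(features, attribute))  # stage 2
    return processed


# Toy stand-ins so the sketch runs; they are NOT the trained network.
def toy_downsample(frame: List[int]) -> List[int]:
    return frame[::2]  # keep every second value


def toy_upsample_interpolate(features: List[int], attribute: int) -> List[int]:
    # Repeat each value twice and shift it by the attribute.
    return [value + attribute for value in features for _ in range(2)]


video = [[1, 2, 3, 4], [5, 6, 7, 8]]
result = apply_age_effect_to_video(video, 10, toy_downsample, toy_upsample_interpolate)
```

Because the same attribute is passed for every frame, all frames are transformed toward the same target age; a gradual effect would instead vary the attribute over time.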

It can be understood by those skilled in the art that the obtained age-varying special effect image and the image to be processed show the same human face at different ages. In practical application, the special effect image obtained by the image processing method provided by the embodiment of the disclosure has higher clarity and sharpness than the original image (the image to be processed).

The embodiment of the present disclosure also provides an image processing apparatus, as shown in fig. 4, the image processing apparatus 40 may include: an acquisition module 410, a determination module 420, and a special effects processing module 430, wherein,

the obtaining module 410 is configured to obtain an image to be processed and a pre-trained age-varying special effect network, where the age-varying special effect network corresponds to at least three ages;

the determining module 420 is configured to determine a target age to be transformed, and obtain a first attribute to be interpolated corresponding to the target age, where the first attribute to be interpolated corresponding to each age is determined according to training of an age-varying special effect network;

the special effect processing module 430 is configured to perform downsampling on the image to be processed through the age-varying special effect network to obtain a downsampling result of the image to be processed, and perform upsampling and interpolation on the downsampling result according to a first attribute to be interpolated corresponding to the target age to obtain the age-varying special effect image.

In an alternative implementation, the variable-age special effect network is trained by the following steps:

acquiring a pre-constructed countermeasure generating network, wherein the countermeasure generating network comprises a generating network, a first judging network, a second judging network and a first regression classification network;

down-sampling each sample image containing age information through the generation network to obtain corresponding image features, and performing up-sampling and age interpolation processing on the image features of each sample image to obtain a generated image corresponding to the age information of each sample image;

judging the age authenticity of the image features of each sample image through a first judging network to obtain a corresponding first judging result;

judging the authenticity of each sample image and the corresponding generated image through a second judgment network to obtain a corresponding second judgment result;

obtaining an age regression result corresponding to the age information of each sample image according to the generated image corresponding to each sample image through a first regression classification network;

and performing adversarial training on the countermeasure generation network based on the first judgment result, the second judgment result, the age regression result and the generated image corresponding to each sample image, and determining the trained generation network as the age-variable special effect network.

In an optional implementation manner, performing upsampling and age interpolation processing on image features of each sample image to obtain a generated image corresponding to age information of each sample image, includes:

determining a preset second attribute to be interpolated corresponding to the age information of each sample image according to the age information of each sample image;

and according to the second attribute to be interpolated of each sample image, performing up-sampling interpolation on the image characteristics of each sample image to obtain a generated image corresponding to the age information of each sample image.

In an optional implementation manner, determining a preset second attribute to be interpolated corresponding to the age information of each sample image according to the age information of each sample image includes:

In an optional implementation manner, performing adversarial training on the countermeasure generation network based on the first discrimination result, the second discrimination result, the age regression result and the generated image corresponding to each sample image includes:

determining real age loss and false age loss corresponding to corresponding image features according to a first judgment result corresponding to each sample image;

determining the true sample loss corresponding to the corresponding sample image, the false sample true loss corresponding to the generated image and the false sample false loss corresponding to the generated image according to the second judgment result corresponding to each sample image;

determining corresponding age regression loss according to the age regression result corresponding to each sample image;

determining an image loss between each sample image and the corresponding generated image;

and optimizing the countermeasure generation network according to the real age loss, the false age loss, the real sample loss, the false sample real loss, the false sample false loss, the age regression loss and the image loss corresponding to each sample image.

In an alternative implementation, determining the corresponding age regression loss according to the corresponding age regression result of each sample image includes:

and determining the corresponding age regression loss according to the corresponding age information and the age regression result based on a preset absolute loss function for each sample image.

In an alternative implementation, determining the corresponding age regression loss according to the corresponding age information and the age regression result based on a preset absolute loss function includes:

determining age value gaps of corresponding age information and age regression results;

when the age value gap is less than or equal to a threshold, calculating the age regression loss using the squared loss branch of the absolute loss function;

when the age value gap is greater than the threshold, calculating the age regression loss using the absolute loss branch of the absolute loss function.

In an optional implementation manner, the method further includes:

acquiring a pre-trained second regression classification network;

and labeling the age information on the image data through a second regression classification network to obtain each sample image containing the age information.

The image processing apparatus provided in the embodiment of the present disclosure may be specific hardware on the device, or software or firmware installed on the device, and its implementation principle and technical effect are the same as those of the foregoing method embodiment. For brevity, for anything not mentioned in the apparatus embodiment, reference may be made to the corresponding content in the foregoing method embodiment, and details are not repeated here.

For training of the variable-age special-effect network, the embodiment of the present disclosure further provides a training device, where the training device may include: a network acquisition module and a network training module, wherein,

the network acquisition module is used for acquiring a pre-constructed countermeasure generation network, and the countermeasure generation network comprises a generation network, a first judgment network, a second judgment network and a first regression classification network;

the generation network is used for down-sampling each sample image containing the age information to obtain corresponding image characteristics, and up-sampling and age interpolation processing are carried out on the image characteristics of each sample image to obtain a generated image corresponding to the age information of each sample image;

the first judging network is used for judging the age authenticity of the image features of each sample image to obtain a corresponding first judging result;

the second judging network is used for judging the authenticity of each sample image and the corresponding generated image to obtain a corresponding second judging result;

the first regression classification network is used for obtaining an age regression result corresponding to the age information of each sample image according to the generated image corresponding to each sample image;

the network training module is used for performing adversarial training on the countermeasure generation network based on the first judgment result, the second judgment result, the age regression result and the generated image corresponding to each sample image, and determining the trained generation network as the age-variable special effect network.

In an optional implementation manner, when the generation network is configured to perform upsampling and age interpolation processing on image features of each sample image to obtain a generated image corresponding to age information of each sample image, the generation network is specifically configured to:

In an optional implementation manner, when the generation network is configured to determine, according to the age information of each sample image, a preset second attribute to be interpolated corresponding to the age information of each sample image, the generation network is specifically configured to:

In an optional implementation manner, the network training module, when configured to perform adversarial training on the countermeasure generation network based on the first judgment result, the second judgment result, the age regression result, and the generated image corresponding to each sample image, is specifically configured to:

In an alternative implementation, the network training module, when configured to determine the corresponding age regression loss according to the age regression result corresponding to each sample image, is specifically configured to:

In an optional implementation manner, when the network training module is configured to determine the corresponding age regression loss according to the corresponding age information and the age regression result based on a preset absolute loss function, the network training module is specifically configured to:

In an optional implementation manner, the network obtaining module is further configured to:

acquiring a pre-trained second regression classification network;

The training apparatus provided in the embodiments of the present disclosure may be specific hardware on the device, or software or firmware installed on the device, etc., and the implementation principle and the generated technical effect are the same as those of the foregoing method embodiments.

Referring now to FIG. 5, a schematic diagram of an electronic device 50 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

The electronic device includes: a memory and a processor, wherein the processor may be referred to as the processing device 501 hereinafter, and the memory may include at least one of a Read Only Memory (ROM)502, a Random Access Memory (RAM)503 and a storage device 508 hereinafter, which are specifically shown as follows:

As shown in fig. 5, the electronic device 50 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 50. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 50 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 50 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the image processing method shown in any of the above embodiments of the present disclosure.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules or units described in the embodiments of the present disclosure may be implemented by software or hardware. Wherein the designation of a module or unit does not in some cases constitute a limitation of the unit itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Example 1 provides, according to one or more embodiments of the present disclosure, an image processing method including:

In an optional implementation, the method further includes:

acquiring a pre-trained second regression classification network;

Example 2 provides, according to one or more embodiments of the present disclosure, an image processing apparatus, the apparatus including:

In an optional implementation manner, the method further includes:

acquiring a pre-trained second regression classification network;

Example 3 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:

a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions that is loaded and executed by the processor to implement a method as shown in example 1 or any of the alternative implementations of example 1 of the present disclosure.

Example 4 provides a computer readable medium for storing a computer instruction, program, code set or instruction set which, when run on a computer, causes the computer to perform a method as shown in example 1 or any one of the alternative implementations of example 1 of the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. An image processing method, comprising:

acquiring an image to be processed and a pre-trained age-varying special effect network, wherein the age-varying special effect network corresponds to at least three ages;

determining a target age to be transformed, and acquiring a first attribute to be interpolated corresponding to the target age, wherein the first attribute to be interpolated corresponding to each age is determined according to a preset mapping relation between the age group and the interpolation attribute;

down-sampling the image to be processed through the age-varying special effect network to obtain a down-sampling result of the image to be processed; adding, according to the first attribute to be interpolated corresponding to the target age, one additional channel to the original channels of the image features corresponding to the down-sampling result of the image to be processed, and filling the additional channel with the first attribute to be interpolated; and performing up-sampling interpolation on the down-sampling result to obtain an age-varying special effect image,

wherein the age-varying special effect network is obtained by training through the following steps:

down-sampling each sample image containing age information through the generation network to obtain corresponding image features, and performing up-sampling and age interpolation processing on the image features of each sample image to obtain a generated image corresponding to the age information of each sample image;

discriminating the age authenticity of the image features of each sample image through the first discrimination network to obtain a corresponding first discrimination result;

discriminating the authenticity of each sample image and the corresponding generated image through the second discrimination network to obtain a corresponding second discrimination result;

obtaining an age regression result corresponding to the age information of each sample image according to the generated image corresponding to each sample image through the first regression classification network;

performing countermeasure training on the countermeasure generation network based on the first discrimination result, the second discrimination result, the age regression result and the generated image corresponding to each sample image, and determining the trained generation network as the age-varying special effect network.
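
As a minimal, non-limiting sketch of the channel-filling step recited above, the following Python fragment appends one extra channel to a down-sampling result and fills it with the attribute to be interpolated. The function name `append_attribute_channel`, the nested-list representation of the feature map, and the value 0.75 are assumptions made for illustration only; the disclosure does not prescribe a particular data layout or attribute value, and the subsequent up-sampling by the network is not shown.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C channels, each of size H x W


def append_attribute_channel(features: FeatureMap, attribute: float) -> FeatureMap:
    """Return a copy of `features` with one extra channel filled with `attribute`.

    Hypothetical helper: the name and data layout are illustrative assumptions.
    """
    if not features or not features[0] or not features[0][0]:
        raise ValueError("features must contain at least one non-empty channel")
    height, width = len(features[0]), len(features[0][0])
    # The added channel has the same spatial size as the original channels and
    # carries the attribute value at every position.
    attribute_channel = [[float(attribute)] * width for _ in range(height)]
    copied = [[row[:] for row in channel] for channel in features]
    return copied + [attribute_channel]


# Toy down-sampling result: 2 channels of size 2 x 3 (made-up values).
down_sampled = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]],
]
augmented = append_attribute_channel(down_sampled, attribute=0.75)
print(len(down_sampled), "->", len(augmented), "channels")
```

In this sketch the original channels are copied unchanged, so the channel count grows from C to C + 1 and the attribute is available at every spatial position during up-sampling.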

2. The image processing method according to claim 1, wherein the up-sampling and age interpolation processing of the image features of each sample image to obtain a generated image corresponding to the age information of each sample image comprises:

performing, according to the second attribute to be interpolated of each sample image, up-sampling interpolation on the image features of each sample image to obtain a generated image corresponding to the age information of each sample image.

3. The image processing method according to claim 2, wherein the determining, according to the age information of each sample image, a preset second attribute to be interpolated corresponding to the age information of each sample image comprises:

determining, according to a preset mapping relation between age groups and interpolation attributes, the interpolation attribute corresponding to the target age group of each sample image as the second attribute to be interpolated of that sample image.
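
A preset mapping relation between age groups and interpolation attributes may, purely by way of example, be held in a small lookup table. In the sketch below, the three age groups, their boundaries, the attribute values, and the function name `attribute_for_age` are all hypothetical; the claims only require that such a preset mapping exists.

```python
from typing import List, Tuple

# Hypothetical mapping: (inclusive lower bound, exclusive upper bound, attribute).
# Neither the group boundaries nor the attribute values come from the disclosure.
AGE_GROUP_TO_ATTRIBUTE: List[Tuple[int, int, float]] = [
    (0, 18, 0.0),
    (18, 40, 0.5),
    (40, 120, 1.0),
]


def attribute_for_age(age: int) -> float:
    """Look up the interpolation attribute of the age group that contains `age`."""
    for lower, upper, attribute in AGE_GROUP_TO_ATTRIBUTE:
        if lower <= age < upper:
            return attribute
    raise ValueError(f"age {age} is outside every configured age group")


print(attribute_for_age(10), attribute_for_age(25), attribute_for_age(65))
```

A table of this kind keeps the mapping separate from the network, so the same lookup can serve both the second attribute to be interpolated used in training and the first attribute to be interpolated used at inference.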

4. The image processing method according to claim 1, wherein the performing countermeasure training on the countermeasure generation network based on the first discrimination result, the second discrimination result, the age regression result, and the generated image corresponding to each sample image includes:

determining a real age loss and a false age loss for the corresponding image features according to the first discrimination result corresponding to each sample image;

determining, according to the second discrimination result corresponding to each sample image, a real sample loss corresponding to the sample image, a false sample real loss corresponding to the generated image, and a false sample false loss corresponding to the generated image;

determining a corresponding age regression loss according to the age regression result corresponding to each sample image;

optimizing the countermeasure generation network according to the real age loss, the false age loss, the real sample loss, the false sample real loss, the false sample false loss, the age regression loss and the image loss corresponding to each sample image.
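
One simple way to picture how the seven loss terms might be brought together is a weighted sum, sketched below. This is an assumption made for illustration: the claim does not state how the terms are combined, which terms drive the generation network and which drive the discrimination networks, or what weights are used. The key names, the default weight of 1.0, and the function name `combined_loss` are likewise hypothetical.

```python
from typing import Dict, Optional

# Hypothetical keys for the seven loss terms named in the claim.
LOSS_TERMS = (
    "real_age",
    "false_age",
    "real_sample",
    "false_sample_real",
    "false_sample_false",
    "age_regression",
    "image",
)


def combined_loss(losses: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the loss terms; unspecified weights default to 1.0."""
    weights = weights or {}
    missing = [name for name in LOSS_TERMS if name not in losses]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    return sum(weights.get(name, 1.0) * losses[name] for name in LOSS_TERMS)


# Made-up loss values for a single training step.
example_losses = {
    "real_age": 0.5,
    "false_age": 0.25,
    "real_sample": 0.125,
    "false_sample_real": 1.0,
    "false_sample_false": 2.0,
    "age_regression": 0.5,
    "image": 0.25,
}
print(combined_loss(example_losses))
print(combined_loss(example_losses, weights={"age_regression": 2.0}))
```

In adversarial training the generator-side and discriminator-side terms are commonly optimized in alternating steps rather than as a single scalar; the sketch only shows the aggregation of named terms, not that schedule.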

5. The image processing method according to claim 4, wherein the determining a corresponding age regression loss according to the age regression result corresponding to each sample image comprises:

for each sample image, determining the corresponding age regression loss according to the corresponding age information and the age regression result, based on a preset absolute loss function.

6. The image processing method according to claim 5, wherein determining the corresponding age regression loss according to the corresponding age information and the age regression result based on a preset absolute loss function comprises:

determining an age value difference between the corresponding age information and the age regression result;

calculating the age regression loss using the squared loss form of the preset absolute loss function when the age value difference is less than or equal to a threshold;

calculating the age regression loss using the absolute loss form of the preset absolute loss function when the age value difference is greater than the threshold.
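
The piecewise rule above has the same shape as the well-known Huber (smooth L1) loss, which is quadratic for small errors and linear for large ones. The sketch below uses the standard Huber scaling so that the two branches meet continuously at the threshold; this scaling, the default threshold of 1.0, and the function name `age_regression_loss` are assumptions, since the claim specifies only which branch applies on each side of the threshold.

```python
def age_regression_loss(true_age: float, predicted_age: float,
                        threshold: float = 1.0) -> float:
    """Piecewise loss: squared form for small gaps, absolute form for large gaps.

    Uses Huber-style constants (an assumption) so that both branches give the
    same value when the gap equals the threshold.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    gap = abs(predicted_age - true_age)  # age value difference
    if gap <= threshold:
        return 0.5 * gap * gap                    # squared loss branch
    return threshold * (gap - 0.5 * threshold)    # absolute loss branch


print(age_regression_loss(30.0, 30.5))  # small gap, squared branch
print(age_regression_loss(30.0, 33.0))  # large gap, absolute branch
```

The quadratic branch gives small gradients near the correct age, while the linear branch limits the influence of samples whose regressed age is far from the labeled age.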

7. The image processing method according to claim 1, further comprising:

acquiring a pre-trained second regression classification network;

labeling image data with age information through the second regression classification network to obtain the sample images containing the age information.

8. An image processing apparatus, comprising:

an acquisition module, configured to acquire an image to be processed and a pre-trained age-varying special effect network, wherein the age-varying special effect network corresponds to at least three ages;

a determining module, configured to determine a target age to be transformed and acquire a first attribute to be interpolated corresponding to the target age, wherein the first attribute to be interpolated corresponding to each age is determined according to a preset mapping relation between the age group and the interpolation attribute;

a special effect processing module, configured to down-sample the image to be processed through the age-varying special effect network to obtain a down-sampling result of the image to be processed, add, according to the first attribute to be interpolated corresponding to the target age, one additional channel to the original channels of the image features corresponding to the down-sampling result of the image to be processed, fill the additional channel with the first attribute to be interpolated, and perform up-sampling interpolation on the down-sampling result to obtain the age-varying special effect image,

9. An electronic device, comprising:

a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of any of claims 1-7.

10. A computer readable medium for storing a computer instruction, a program, a set of codes, or a set of instructions, which when run on a computer, causes the computer to perform the method of any one of claims 1-7.