CN114331906A - Image enhancement method and device, storage medium and electronic equipment - Google Patents

Image enhancement method and device, storage medium and electronic equipment

Info

Publication number
CN114331906A
CN114331906A (application CN202111669721.8A)
Authority
CN
China
Prior art keywords: image, enhanced, information, target, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111669721.8A
Other languages
Chinese (zh)
Inventor
唐斯伟
郑程耀
吴文岩
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Datianmian White Sugar Technology Co ltd
Original Assignee
Beijing Datianmian White Sugar Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Datianmian White Sugar Technology Co ltd
Priority to CN202111669721.8A
Publication of CN114331906A
Priority to PCT/CN2022/134845 (published as WO2023124697A1)
Legal status: Pending

Classifications

    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06T 5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T 7/11: Image analysis; segmentation; region-based segmentation
    • G06T 7/13: Image analysis; segmentation; edge detection
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T 2207/20081: Indexing scheme for image analysis or enhancement; training; learning
    • G06T 2207/20084: Indexing scheme for image analysis or enhancement; artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

Embodiments of the present disclosure provide an image enhancement method and device, a storage medium, and an electronic device. The method may include the following steps: performing feature extraction on a target image to be enhanced to obtain appearance information of the target image, where the target image contains a first object and the appearance information represents surface visual features of the target image; acquiring structure information of a second object, where the first object and the second object are target objects of the same kind and the structure information represents contour features of the second object; and generating an enhanced image based on the appearance information and the structure information, the enhanced image containing a target object that has both the appearance information and the structure information. The embodiments of the present disclosure not only give the generated image higher quality but also reduce the cost of sample collection.

Description

Image enhancement method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to artificial intelligence technology, and in particular to an image enhancement method and device, a storage medium, and an electronic device.
Background
Image enhancement is widely used in a variety of scenarios. For example, when training a neural network, more numerous and more diverse sample images can be obtained by enhancing existing sample images. As another example, face-image applications such as makeup transfer and face driving can also be realized through image enhancement.
In the related art, image enhancement is performed with traditional image processing methods such as stretching and interpolation, but the quality of the resulting enhanced images is not high, usually only a limited set of enhancement conditions can be handled, and the variety of enhanced images is small. Alternatively, if image enhancement is performed with a neural network, training that network requires enough sample images; for example, a video of a single user (a single ID) over a certain period is often needed to obtain multiple face images of that user, which is costly and inconvenient for the user.
Disclosure of Invention
The embodiment of the disclosure at least provides an image enhancement method and device, a storage medium and an electronic device.
In a first aspect, an image enhancement method is provided, the method comprising:
performing feature extraction on a target image to be enhanced to obtain appearance information of the target image, wherein the target image comprises a first object, and the appearance information represents surface visual features of the target image;
acquiring structure information of a second object, wherein the first object and the second object are target objects of the same kind, and the structure information represents contour features of the second object;
generating an enhanced image based on the appearance information and the structure information, the enhanced image containing a target object that has both the appearance information and the structure information.
In some examples, the method is performed by an image enhancement apparatus in which an image enhancement network is deployed, the image enhancement network comprising an appearance extractor and a generator. The feature extraction on the target image to be enhanced to obtain the appearance information of the target image comprises: performing feature extraction on the target image to be enhanced through the appearance extractor in the image enhancement network to obtain the appearance information of the target image. The generating of an enhanced image based on the appearance information and the structure information comprises: generating, by the generator in the image enhancement network, the enhanced image based on the appearance information and the structure information.
In some examples, the obtaining structural information of the second object includes: acquiring an initial image, wherein the initial image comprises the second object; performing key point detection on the initial image to obtain key points of a second object in the initial image; and obtaining the structural information of the second object according to the key points of the second object.
In some examples, the second object is included in an auxiliary image, and the method further comprises: acquiring an initial image, wherein the initial image comprises the target object; performing keypoint detection on the initial image to obtain keypoints of the target object in the initial image; and cropping the initial image according to the keypoints of the target object to obtain the target image or the auxiliary image comprising the target object.
In some examples, the method further comprises: after generating the enhanced image based on the appearance information and the structure information, replacing the corresponding image portion in the initial image with the enhanced image.
In some examples, the first object and the second object are the same target object, or different target objects of the same class, the target object being one of the facial features of a human face (for example, an eye, an eyebrow, a nose, or a mouth).
In a second aspect, a method for training an image enhancement network is provided, the method comprising:
acquiring a sample image to be enhanced and structure information of a second object, wherein the sample image comprises a first object, the first object and the second object are the same kind of target object with different structure information, and the structure information represents contour features of the second object;
performing feature extraction on the sample image to be enhanced through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features of the sample image;
performing image generation processing on the appearance information and the structure information through the image enhancement network, and outputting a sample enhanced image, wherein the sample enhanced image comprises the appearance information and the target object with the structure information;
and adjusting the network parameters of the image enhancement network according to the sample enhanced image.
In some examples, the second object is included in an auxiliary image; the adjusting the network parameters of the image enhancement network according to the sample enhanced image comprises: adjusting network parameters of the appearance extractor and generator according to a difference between the sample enhanced image and the auxiliary image.
In some examples, the second object is included in an auxiliary image and the image enhancement network further comprises a discriminator; the adjusting of the network parameters of the image enhancement network according to the sample enhanced image comprises: inputting the sample enhanced image into the discriminator to obtain a discrimination value output by the discriminator; obtaining a first loss from the difference between the discrimination value and the ground-truth discrimination value, and a second loss from the difference between the sample enhanced image and the auxiliary image; and adjusting a network parameter of at least one of the appearance extractor, the generator, and the discriminator based on the first loss and the second loss.
In a third aspect, an image enhancement apparatus is provided, the apparatus comprising:
the appearance extraction module is used for extracting the characteristics of a target image to be enhanced to obtain the appearance information of the target image, wherein the target image comprises a first object; the appearance information represents surface visual features in a target image;
the structure acquisition module is used for acquiring the structure information of a second object, wherein the first object and the second object are target objects of the same kind; the structural information represents contour features of the second object;
an image generation module to generate an enhanced image based on the appearance information and the structure information, the enhanced image including the appearance information and a target object having the structure information.
In a fourth aspect, there is provided an apparatus for training an image enhancement network, the apparatus comprising:
the information acquisition module is used for acquiring a sample image to be enhanced and the structure information of a second object, wherein the first object and the second object are the same target object with different structure information; the sample image comprises a first object; the structural information represents contour features of the second object;
the system comprises a characteristic extraction module, a target image acquisition module and a characteristic extraction module, wherein the characteristic extraction module is used for extracting characteristics of a sample image to be enhanced through an image enhancement network to obtain appearance information of the sample image, and the appearance information represents surface visual characteristics in the target image;
an image output module, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, where the sample enhanced image includes the appearance information and the target object having the structure information;
and the parameter adjusting module is used for adjusting the network parameters of the image enhancement network according to the sample enhanced image.
In a fifth aspect, an electronic device is provided, comprising a memory and a processor, the memory storing computer-readable instructions and the processor invoking the instructions to implement the method of any embodiment of the present disclosure.
In a sixth aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, performs the method of any of the embodiments of the present disclosure.
The image enhancement method and device, storage medium, and electronic device provided by the embodiments of the present disclosure can enhance a sample image according to many types of structure information. Because the structure information can be varied without limit, richer sample enhanced images can be obtained, making the sample set more diverse. When the generated sample enhanced images are used in tasks such as model training, the rich and diverse samples can improve the robustness and generalization of the trained model. Compared with previous sample collection methods, this approach reduces the cost of collecting samples and makes collection more convenient. In addition, because the sample enhanced image is generated by an image enhancement network, the generated image quality can be higher than with conventional processing such as interpolation and stretching.
Drawings
To more clearly illustrate the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in one or more embodiments of the present disclosure; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 illustrates a flowchart of a training method of an image enhancement network according to at least one embodiment of the present disclosure;
FIG. 2 illustrates a conceptual framework diagram of image enhancement provided by at least one embodiment of the present disclosure;
fig. 3A illustrates a schematic diagram of structural information of a first object provided by at least one embodiment of the present disclosure;
fig. 3B illustrates a schematic diagram of structural information of a second object provided by at least one embodiment of the present disclosure;
fig. 4 illustrates another eye structure information diagram provided by at least one embodiment of the present disclosure;
fig. 5 illustrates a network training schematic diagram provided by at least one embodiment of the present disclosure;
fig. 6 illustrates a flow chart of an image enhancement method according to at least one embodiment of the present disclosure;
fig. 7 is a schematic structural diagram illustrating an image enhancement apparatus according to at least one embodiment of the present disclosure;
fig. 8 illustrates a schematic structural diagram of a training apparatus of an image enhancement network according to at least one embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, these technical solutions are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that can be derived by one of ordinary skill in the art from one or more embodiments of the disclosure without inventive effort fall within the scope of the disclosure.
Embodiments of the present disclosure provide an image enhancement method that can generate an enhanced image through a trained neural network. The neural network may be referred to as an image enhancement network, and the enhanced image may be an image obtained by performing enhancement processing on an initial image. The enhancement processing may be, for example, a deformation of the image. Taking enhancement of a face image as an example, the enhancement may include, but is not limited to, a change in face angle, a change in facial expression, a change in face orientation, a change in the size of the facial features, and so on. For example, assuming the initial image is a face image in which the mouth is closed, the mouth may be transformed into an open, smiling mouth to obtain an enhanced image.
In the following embodiments, the training process of the image enhancement network will be described first, and then how to generate the enhanced image through the trained image enhancement network will be described.
Fig. 1 illustrates a flowchart of a training method of an image enhancement network according to at least one embodiment of the present disclosure, and as shown in fig. 1, the method may include the following processes:
in step 100, a sample image to be enhanced and structure information of a second object are acquired.
The training method of the embodiment may be performed by a training apparatus of the image enhancement network, for example, the training apparatus may be disposed on an electronic device (e.g., may be a server), and the training apparatus may include the image enhancement network to be trained.
In this step, the training apparatus of the image enhancement network may obtain a sample image to be enhanced; an image to be enhanced in the training phase may be referred to as a sample image. The sample image includes a first object. For example, the sample image may be an image including an eye, in which case the first object may be the eye in the sample image. As another example, the sample image may be an image including a tree, in which case the first object may be the tree.
The training apparatus may further acquire structure information of a second object, where the second object is the same kind of target object as the first object but has different structure information.
The structure information can be understood as representing contour features of the second object, such as the size and structure of the object. For example, taking the facial features as an example, the acquired structure information may be the contour feature of the mouth, the contour feature of the nose, and so on; it may also be feature information such as the height of the nose. The contour feature may be recorded in forms including, but not limited to: a contour line, or a plurality of keypoints distributed along the contour line, for which the position coordinates or keypoint identifiers may be recorded.
For example, if the target object is an eye, the structure information may be the contour feature of the eye. The structure information of the first object illustrated in Fig. 3A and that of the second object illustrated in Fig. 3B show two eyes with different structure information; the two objects may be the eyes of the same person, one narrowed and the other wide open, so the structure information of the two eyes differs.
Similarly, a first object and a second object with different structure information may arise as follows: if the target object is a mouth, the first object may be a closed mouth and the second object an open mouth. Even though both are the mouth of the same person, the two mouths have different structure information because of their different states; for example, the positions of the contour keypoints recorded in the contour feature of the closed mouth differ from those recorded in the contour feature of the open mouth.
In step 102, feature extraction is performed on a sample image to be enhanced through an image enhancement network, so as to obtain appearance information of the sample image.
In one example, the appearance information may be obtained by feature extraction through an appearance extractor in the image enhancement network. As shown in Fig. 2, after acquiring the sample image, the training apparatus may input the sample image into the image enhancement network. The image enhancement network may include an appearance extractor 21, which performs feature extraction on the sample image to obtain its appearance information. This embodiment does not limit the network structure of the appearance extractor; for example, the appearance extractor may include modules such as convolutional layers, residual modules, activation layers, and pooling layers.
The appearance information represents surface visual features of the target image, including but not limited to texture, color, and illumination information. Taking a face image as the sample image, the appearance information obtained after feature extraction by the appearance extractor may include the illumination brightness of the face region, the texture of the face, the color of the face, and so on.
For example, the appearance information of the sample image output by the appearance extractor 21 may be represented as a one-dimensional tensor, which may be a 64 × 1 tensor.
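To make the data flow concrete, below is a minimal PyTorch sketch of an appearance extractor of this kind, assuming the module composition described above (convolutional layers, residual modules, activation and pooling layers) and the 64-dimensional output vector; the class names, layer sizes, and channel counts are illustrative assumptions, not the patent's actual implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block, since the description mentions residual modules."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class AppearanceExtractor(nn.Module):
    """Hypothetical appearance extractor: convolutions, residual blocks and
    pooling, ending in a 64-dimensional appearance vector (the "64 x 1 tensor")."""
    def __init__(self, appearance_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(32),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            ResidualBlock(64),
            nn.AdaptiveAvgPool2d(1),  # global pooling: one vector per image
        )
        self.fc = nn.Linear(64, appearance_dim)

    def forward(self, image):
        # image: (N, 3, H, W) -> appearance vector: (N, appearance_dim)
        return self.fc(self.features(image).flatten(1))
```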
In addition, the appearance extractor 21 may extract either all or part of the appearance information contained in the sample image, as determined by actual business requirements. Take an eye picture as an example: besides the eye, it also includes part of the face region around the eye and an eyebrow region. The appearance extractor 21 may extract the brightness, color, and texture of all these regions, or only the appearance information of the eyebrow region, or only that of the face region around the eye. Extracting the appearance information of at least part of the region in the sample image can be achieved by appropriately designing and training the appearance extractor 21.
Optionally, for appearance information such as brightness, color, and texture, extraction of only part of the appearance information may likewise be achieved by designing the function of the appearance extractor 21, for example extracting only the texture and color in the sample image but not the brightness.
Likewise, the structure information of the second object acquired in step 100 may cover all or part of the second object's structure, according to actual business requirements. For example, taking the second object as an eye: if all structure information of the eye is to be acquired, it may include the outer-contour keypoints of the eye, the contour keypoints of the eyeball, the center point of the eyeball, and so on; if only partial structure information is to be acquired, it may include only the outer-contour keypoints, without the eyeball contour keypoints or the eyeball center point.
As described above, the structure information acquired in step 100 and the appearance information acquired in step 102 of this embodiment may each be all or only part of the available information; these pieces of information then participate in the image generation processing.
In step 104, image generation processing is performed on the appearance information and the structure information through the image enhancement network, and a sample enhanced image is output; the sample enhanced image contains a target object that has both the appearance information and the structure information.
In this embodiment, the image enhancement network may generate the sample enhanced image from the obtained appearance information and structure information. For example, as shown in Fig. 2, the image generation processing may be performed by the generator 22, which outputs the sample enhanced image. The sample enhanced image may carry both the appearance information and the structure information; the structure information is carried by a target object in the sample enhanced image, which corresponds to the aforementioned first object or second object.
In one example, referring to Fig. 2, the sample image is an image containing an eye, and the structure information is a structure diagram of the eye in another state. Compared with the sample image, the sample enhanced image output after processing by the image enhancement network replaces the structure information of the first object with that of the second object, while other information in the sample image is left unchanged; for example, the facial texture around the eye, the face color, the eyebrows, the position of the eyeball within the eye, and the color of the eyeball may all remain the same as in the sample image.
In step 106, network parameters of the image enhancement network are adjusted according to the sample enhanced image.
In this embodiment, the image serving as the label for the sample enhanced image may be the auxiliary image in which the second object is located. The auxiliary image may have the same size as the sample image, and the two images may cover the same kind of region.
For example, the sample image in Fig. 2 includes one eye and one eyebrow; the auxiliary image corresponding to it may also include one eye and one eyebrow, that is, the same region as the sample image, and the two images may have the same size. The difference is that the structure information of the eyes differs: for example, the eye in the sample image is wide open while the eye in the auxiliary image is narrowed.
After the sample enhanced image is obtained, the network parameters of the image enhancement network may be adjusted according to it. For example, the L1-norm loss (L1 loss) between the sample enhanced image and the auxiliary image may be computed from the difference between them, and the network parameters of the appearance extractor and the generator may be adjusted according to the L1 loss.
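As a sketch of this training step, assuming hypothetical extractor and generator modules with the interfaces used below (the optimizer setup, which should cover both modules' parameters, and the call signature generator(appearance, structure_map) are assumptions):

```python
import torch.nn.functional as F

def train_step(extractor, generator, optimizer,
               sample_image, structure_map, auxiliary_image):
    # Extract appearance from the sample image and generate the enhanced image.
    appearance = extractor(sample_image)
    enhanced = generator(appearance, structure_map)
    # L1 loss between the sample enhanced image and the auxiliary image (the label).
    loss = F.l1_loss(enhanced, auxiliary_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # updates extractor and generator parameters
    return loss.item()
```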
Through this method, the training method of the image enhancement network of this embodiment can enhance a sample image according to many types of structure information. Because the structure information can be varied without limit, richer sample enhanced images can be obtained and the sample set becomes more diverse. When the generated sample enhanced images are used in tasks such as model training, the rich and diverse samples can improve the robustness and generalization of the trained model. Compared with previous sample collection methods, this approach reduces the cost of collecting samples and makes collection simpler and more convenient. In addition, because the sample enhanced image is generated by an image enhancement network, the generated image quality can be higher than with conventional processing such as interpolation and stretching.
Further, the first object included in the sample image may be chosen according to the requirements of the actual application. For example, if the application requires an image containing an eye, the first object in the sample enhanced image is the eye; if it requires an image containing a mouth, the first object is the mouth. Other facial features, such as the eyebrows or nose, can also be enhanced. Accordingly, a sample enhanced image is generated using a sample image containing the organ to be enhanced together with the corresponding structure information of that organ.
The sample enhanced image illustrated in Fig. 2 is an image containing eyes. In practice, the initially obtained image may sometimes cover a relatively large area including the whole face; the initial image may then be preprocessed before the image enhancement process illustrated in Fig. 2 is performed.
Referring to Fig. 4, suppose an image, which may be called the "initial image", is obtained that includes not only a person's face but also regions such as the hands, neck, and clothing. The face keypoints in the initial image (for example, 106 keypoints) may be detected through a pre-trained keypoint detection network. The initial image is then cropped according to the detected keypoints to obtain an image of the face, removing the regions other than the face, such as the hands and neck. The size of the cropped face image may be 1024 × 1024. Fig. 4 illustrates a face image obtained by cropping an initial image, together with some of its face keypoints, for example keypoint 41 and keypoint 42.
Further, if one of the organ regions in the face image is to be enhanced and deformed through the image enhancement network shown in Fig. 2, for example the mouth, the face image shown in Fig. 4 may be further cropped according to the face keypoints to obtain an image containing the mouth, as sketched below. As shown in Fig. 4, the mouth in the mouth image is the mouth of the face image. The mouth image can serve as a sample image in the training stage of the image enhancement network, or alternatively as an auxiliary image.
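A minimal sketch of such keypoint-based cropping, assuming NumPy images in (H, W, C) layout; the margin heuristic and the square crop are illustrative choices, since the patent does not specify the cropping rule:

```python
import numpy as np

def crop_around_keypoints(image, keypoints, margin=0.25):
    """Crop a square region around (x, y) keypoints, with a relative margin.

    image is an (H, W, C) array; returns the crop and its (top, left,
    bottom, right) box so the enhanced result can later be pasted back.
    """
    pts = np.asarray(keypoints, dtype=np.float32)
    (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
    side = max(x1 - x0, y1 - y0) * (1.0 + 2.0 * margin)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    top, left = int(cy - side / 2), int(cx - side / 2)
    bottom, right = int(cy + side / 2), int(cx + side / 2)
    h, w = image.shape[:2]
    top, left = max(0, top), max(0, left)          # clamp to image bounds
    bottom, right = min(h, bottom), min(w, right)
    return image[top:bottom, left:right], (top, left, bottom, right)
```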
Still further, the structure information of the mouth can be obtained from the mouth keypoints. As shown in Fig. 4, the structure information may be a structure map (heatmap) corresponding to the mouth, which may contain the keypoints of the mouth. The structure map may be input into the image enhancement network to help generate the corresponding sample enhanced image from the sample image.
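One common way to build such a structure map is to render each keypoint as a small Gaussian peak on an otherwise empty single-channel image; the sketch below illustrates this, where the Gaussian rendering and the sigma value are assumptions, as the patent does not specify how the heatmap is formed:

```python
import numpy as np

def keypoints_to_heatmap(keypoints, height, width, sigma=2.0):
    """Render (x, y) keypoints as Gaussian peaks on a single-channel heatmap."""
    heatmap = np.zeros((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for (x, y) in keypoints:
        peak = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, peak)  # overlapping peaks keep the max
    return heatmap

# Example: three mouth-contour keypoints rendered on a 64 x 64 structure map.
mouth_keypoints = [(20, 40), (32, 44), (44, 40)]
structure_map = keypoints_to_heatmap(mouth_keypoints, 64, 64)
```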
In addition, taking enhancement of face images as an example, the following data may be prepared as training data for the image enhancement network:
1) A small number of face images of the same ID: for example, 15 face images of the same person. The same ID refers to the same person; for example, multiple face images of person A belong to the same ID, and the ID identifies person A.
2) Face images of a large number of other IDs, where each ID has a certain number of face images with different expressions and at different angles. For example, there may be 1.5 million other IDs, corresponding to face images of other people (persons B, C, and so on).
As described above, the prepared training data may include face images of multiple IDs, each ID covering multiple expressions and different angles, where different expressions and angles correspond to different structure information.
When the image enhancement network is trained using the above training data, the sample image and the auxiliary image may be two face images belonging to the same ID, randomly drawn from the training data. For example, two face images of person A may be drawn: both show person A's face, but in one image the eyes are narrowed while in the other they are wide open, so the structure information of the eyes differs between the two images while the appearance information is the same. The auxiliary image serves as the label for the current enhancement, and the network parameters of the image enhancement network are adjusted according to the difference between the auxiliary image and the sample enhanced image output by the network.
In one example, each face image in the training data may undergo the preprocessing illustrated in Fig. 4: the face keypoints in each image are detected, and the face image and an image containing one of the facial features are then obtained by cropping according to those keypoints. For example, if an organ image containing an eye is required, each image in the training data may be cropped to obtain an eye image. Two eye images belonging to the same person are then used as the auxiliary image and the sample image respectively, and an enhanced eye image is obtained through the image enhancement network shown in Fig. 2; that is, in the enhanced eye image, the structure information of the eye in the sample image is replaced with that of the eye in the auxiliary image.
Fig. 5 illustrates another network training scheme provided by at least one embodiment of the present disclosure. In addition to adjusting network parameters according to the difference between the sample enhanced image and the auxiliary image, as described above, the image enhancement network may be trained in the manner shown in Fig. 5.
As shown in Fig. 5, the sample enhanced image and a corresponding label (for example, the auxiliary image) may be input into the discriminator 23 to obtain the discrimination value output by the discriminator 23. The discrimination value may be, for example, a value between 0 and 1 indicating the probability that the sample enhanced image is real. A first loss is obtained from the difference between the discrimination value and the ground-truth discrimination value, and a second loss from the difference between the sample enhanced image and the auxiliary image. The network parameters of at least one of the appearance extractor, the generator, and the discriminator are then adjusted based on the first loss and the second loss.
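A minimal sketch of one adversarial training step combining the first (adversarial) loss and the second (reconstruction) loss, assuming a discriminator with a sigmoid output in [0, 1] as described above; the loss weighting and the update order are assumptions:

```python
import torch
import torch.nn.functional as F

def adversarial_train_step(extractor, generator, discriminator,
                           opt_g, opt_d, sample_image, structure_map,
                           auxiliary_image, l1_weight=10.0):
    # --- Discriminator update: real auxiliary images vs. generated ones. ---
    with torch.no_grad():
        fake = generator(extractor(sample_image), structure_map)
    d_real, d_fake = discriminator(auxiliary_image), discriminator(fake)
    loss_d = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # --- Extractor + generator update: first (adversarial) + second (L1) loss. ---
    fake = generator(extractor(sample_image), structure_map)
    d_fake = discriminator(fake)
    first_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    second_loss = F.l1_loss(fake, auxiliary_image)
    loss_g = first_loss + l1_weight * second_loss
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```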
In addition, the generator and the discriminator may adopt a conventional generative adversarial network (GAN) structure, which this embodiment does not limit. For example, the network structure may include convolutional layers, residual modules, pooling layers, linear layers, activation layers, and so on.
When the image enhancement network is trained in this generative-adversarial manner to produce sample enhanced images, training drives the discriminator's output toward the true value as far as possible, which improves the realism of the generated enhanced images and helps produce enhanced images of higher quality.
The image enhancement network trained as described above can then be used to generate enhanced images. Fig. 6 illustrates a flowchart of an image enhancement method provided by at least one embodiment of the present disclosure. As shown in Fig. 6, the method may be performed by an image enhancement apparatus and may include the following steps:
in step 600, feature extraction is performed on a target image to be enhanced to obtain appearance information of the target image, where the target image includes a first object.
In one example, the target image may be an image including an eye, such as the sample image shown in Fig. 2, which contains a person's eye. This embodiment refers to the eye in the target image as the first object; the purpose here may be to enhance the target image by deforming the eye in it.
The target image can be subjected to feature extraction through an appearance extractor in the trained image enhancement network, and appearance information of the target image is obtained.
Furthermore, if the initial image is an image including a complete face, the initial image may be preprocessed to obtain a target image including the eyes: the face keypoints in the initial image are detected through a keypoint detection network, and the initial image is cropped according to those keypoints to obtain the target image including the eyes.
In step 602, structure information of a second object is obtained according to key points of the second object in an auxiliary image, where the first object and the second object are both target objects of the same kind.
In this step, the second object in the auxiliary image is the same kind of object as the first object; for example, both are eyes or both are mouths. Objects of this same kind may be referred to as the target object. The eyes in the auxiliary image and the target image are different: the eye in the target image is called the first object, and the eye in the auxiliary image is called the second object.
The first object and the second object of this embodiment may be the same target object, for example both eyes of person A, in different states (e.g., one wide open and one narrowed). Alternatively, the first object and the second object may belong to different target objects, for example the first object being an eye of person A and the second object an eye of person B.
This embodiment may obtain the structure information of the second object according to the keypoints of the second object in the auxiliary image. The image enhancement network may include a network module for extracting keypoints; after the auxiliary image is input into the image enhancement network, that module extracts the keypoints, and the structure information of the second object is then obtained from them. Alternatively, the image enhancement network may not include such a module, in which case the structure information of the second object is obtained by a processing module outside the image enhancement network and then input into the network.
In step 604, an enhanced image is generated based on the appearance information and the structure information; in the enhanced image, the structure information of the first object in the target image has been replaced with the structure information of the second object.
For example, a generator in the image enhancement network may perform image generation processing on the acquired appearance information and structure information and finally generate the enhanced image. The enhanced image contains the appearance information of the target image and the structure information of the second object in the auxiliary image; compared with the target image, the structure information of the first object has been replaced with that of the second object.
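Putting the inference steps together, a minimal sketch might look as follows, reusing the hypothetical modules and the keypoints_to_heatmap helper from the earlier sketches; the call signatures and the structure-map size are assumptions:

```python
import torch

def enhance(extractor, generator, target_image, second_object_keypoints, map_size=64):
    # Appearance information comes from the target image containing the first object.
    appearance = extractor(target_image)
    # Structure information comes from the second object's keypoints in the
    # auxiliary image, rendered with keypoints_to_heatmap from the earlier sketch.
    heatmap = keypoints_to_heatmap(second_object_keypoints, map_size, map_size)
    structure = torch.from_numpy(heatmap)[None, None]  # shape (1, 1, H, W)
    # The generator combines appearance and structure into the enhanced image.
    return generator(appearance, structure)
```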
In one example, depending on the actual application, if the enhanced image containing the eyes output through the image enhancement network is to be used for subsequent network training, no further processing of the enhanced image is needed. In another example, although enhancement is performed on individual organ regions through the image enhancement network shown in Fig. 2, the desired final output is an image of the whole face. For instance, the initial image may be a face image of person A, and the goal is an enhanced image that alters the structure information of person A's eyes. The structure information of person B's eyes can then be acquired and combined, through the image enhancement network, with the eye image cropped from person A's face image; in the resulting enhanced image, the structure information of person A's eyes is replaced with that of person B's eyes. Since the enhanced image output by the network contains only person A's eyes, it can be pasted back into the original face image, that is, the corresponding part of person A's face image is replaced with the enhanced image, yielding an updated face image of person A, which may also be called person A's enhanced face image.
In yet another example, if an enhanced face image is desired in which multiple organs such as the eyes and mouth are changed, the following process can be used: crop an eye image (containing the eyes) and a mouth image (containing the mouth) from the initial face image according to the face keypoints; enhance the eye image and the mouth image separately through the image enhancement network to obtain the corresponding enhanced images, namely an eye enhanced image and a mouth enhanced image; and finally paste the eye enhanced image and the mouth enhanced image back into the initial image, replacing the corresponding parts of the initial face image.
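A minimal sketch of this paste-back step, assuming the crop box from the earlier cropping sketch is available; seam blending is omitted, and the patent does not specify how the boundary is handled:

```python
def paste_back(initial_image, enhanced_crop, box):
    """Replace the cropped region of the initial face image with the enhanced crop.

    box is the (top, left, bottom, right) rectangle returned by the earlier
    cropping sketch; the crop must match the box size. Seam blending is omitted.
    """
    top, left, bottom, right = box
    result = initial_image.copy()
    result[top:bottom, left:right] = enhanced_crop
    return result
```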
The process of generating an enhanced image in Fig. 6 may be applied to network training scenarios. For example, if a neural network is to be trained and the training samples are insufficient, enhanced images can be generated by the method of Fig. 6 to obtain richer sample images. As described above, the image enhancement network provided by the embodiments of the present disclosure can generate an enhanced image in combination with arbitrary structure information; used in this way, it can produce rich face enhanced images covering various angles and expressions. Such rich and diverse enhanced images help improve the generalization and robustness of the trained neural network model. Moreover, because the enhanced images are generated by a trained image enhancement network that was itself trained in an adversarial manner, the generated enhanced images are of higher quality, more realistic, and clearer.
In scenarios where data collection is difficult, for example where only a small amount of data with the same ID can be collected, that small amount of data can be enriched through the image enhancement network of the embodiments of the present disclosure, reducing the difficulty of data collection.
In addition, the flow of generating the enhanced image in Fig. 6 may also be applied to other scenarios, for example face-image enhancement applications such as makeup transfer and face driving.
For example, if the eyes of a face in the initial image are to be transformed, the image containing the eyes is enhanced by the above method, and the enhanced eye image replaces the eyes in the initial image.
As another example, in a makeup-transfer scenario, the eye makeup of person A may be transferred to the eyes of person B. Appearance information related to person A's eye makeup is extracted by the appearance extractor in the image enhancement network and then combined with the structure information of person B's eyes to generate an enhanced image in which person B's eye structure is unchanged but the eyes now wear person A's makeup.
As another example, in a face-driving scenario, suppose person B's facial expression is to drive the same expression on person A's face, specifically the motion of the mouth. An enhanced image can then be generated by combining the appearance information of person A's face image with the structure information of person B's mouth, so that the enhanced image shows person A's face in which only the mouth expression has been changed to person B's.
In order to implement the image enhancement method according to any of the embodiments of the present disclosure, an embodiment of the present disclosure further provides an image enhancement device. As shown in fig. 7, the image enhancement apparatus may include: appearance extraction module 71, structure acquisition module 72, and image generation module 73.
The appearance extraction module 71 is configured to perform feature extraction on a target image to be enhanced to obtain appearance information of the target image, where the target image includes a first object; the appearance information represents surface visual features in the target image.
A structure obtaining module 72, configured to obtain structure information of a second object, where the first object and the second object are target objects of the same kind; the structural information represents a contour feature of the second object.
An image generating module 73 configured to generate an enhanced image based on the appearance information and the structure information, the enhanced image including the appearance information and a target object having the structure information.
In an example, the appearance extraction module 71, when configured to perform feature extraction on a target image to be enhanced to obtain appearance information of the target image, includes: and performing feature extraction on a target image to be enhanced through an appearance extractor in the image enhancement network to obtain appearance information of the target image.
The image generation module 73, when configured to generate an enhanced image based on the appearance information and the structure information, includes: generating, by a generator in the image enhancement network, an enhanced image based on the appearance information and structure information.
In one example, the structure obtaining module 72, when configured to obtain the structure information of the second object, includes: acquiring an initial image, wherein the initial image comprises the second object; performing key point detection on the initial image to obtain key points of a second object in the initial image; and obtaining the structural information of the second object according to the key points of the second object.
In one example, the apparatus further comprises: and a preprocessing module. The preprocessing module is used for acquiring an initial image, and the initial image comprises the target object; performing key point detection on the initial image to obtain key points of a target object in the initial image; cutting the initial image according to the key point of the target object to obtain the target image or the auxiliary image comprising the target object; wherein the second object is included in the auxiliary image.
In order to implement the method for training the image enhancement network according to any of the embodiments of the present disclosure, an embodiment of the present disclosure further provides a device for training the image enhancement network. As shown in fig. 8, the training device of the image enhancement network may include: an information acquisition module 81, a feature extraction module 82, an image output module 83, and a parameter adjustment module 84.
An information obtaining module 81, configured to obtain a sample image to be enhanced and structure information of a second object, where the first object and the second object are the same target object with different structure information; the sample image comprises a first object; the structural information represents a contour feature of the second object.
The feature extraction module 82 is configured to perform feature extraction on the sample image to be enhanced through an image enhancement network to obtain appearance information of the sample image, where the appearance information represents a surface visual feature in the target image.
An image output module 83, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, where the sample enhanced image includes the appearance information and the target object having the structure information.
And a parameter adjusting module 84, configured to adjust a network parameter of the image enhancement network according to the sample enhanced image.
In one example, the parameter adjusting module 84, when configured to adjust the network parameters of the image enhancement network according to the sample enhanced image, includes: adjusting network parameters of the appearance extractor and generator according to a difference between the sample enhanced image and an auxiliary image; wherein the second object is included in the auxiliary image.
In one example, the parameter adjusting module 84, when configured to adjust the network parameters of the image enhancement network according to the sample enhanced image, includes: inputting the sample enhanced image into the discriminator to obtain a discrimination value output by the discriminator; obtaining a first loss according to the difference between the discrimination value and the discrimination true value, and obtaining a second loss according to the difference between the sample enhanced image and the auxiliary image; adjusting network parameters of at least one of the appearance extractor, the generator and the discriminator according to the first loss and the second loss; wherein the second object is included in the auxiliary image.
One skilled in the art will appreciate that one or more embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program may be stored, and when the program is executed by a processor, the computer program implements the image enhancement method or the training method of the image enhancement network described in any embodiment of the present disclosure.
An embodiment of the present disclosure further provides an electronic device, which includes: the image enhancement network comprises a memory and a processor, wherein the memory is used for storing computer readable instructions, and the processor is used for calling the computer instructions to realize the image enhancement method or the training method of the image enhancement network in any embodiment of the disclosure.
Wherein, the "and/or" described in the embodiments of the present disclosure means having at least one of the two, for example, "multiple and/or B" includes three schemes: poly, B, and "poly and B".
The embodiments in the disclosure are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing has described specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
Embodiments of the subject matter and functional operations described in this disclosure may be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this disclosure and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Further, the computer may be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular embodiments of the disclosure. Certain features that are described in this disclosure in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description covers only preferred embodiments of the present disclosure and is not intended to limit the present disclosure; the scope of protection is defined by the appended claims.

Claims (13)

1. A method of image enhancement, the method comprising:
performing feature extraction on a target image to be enhanced to obtain appearance information of the target image, wherein the target image comprises a first object; the appearance information represents surface visual features in the target image;
acquiring structure information of a second object, wherein the first object and the second object are target objects of the same kind; the structure information represents contour features of the second object;
generating an enhanced image based on the appearance information and the structure information, the enhanced image including the appearance information and a target object having the structure information.
2. The method of claim 1, wherein the method is performed by an image enhancement device having an image enhancement network deployed therein, the image enhancement network comprising: an appearance extractor and a generator;
the feature extraction of the target image to be enhanced to obtain the appearance information of the target image comprises the following steps: performing feature extraction on a target image to be enhanced through an appearance extractor in the image enhancement network to obtain appearance information of the target image;
the generating an enhanced image based on the appearance information and the structure information includes: generating, by a generator in the image enhancement network, an enhanced image based on the appearance information and structure information.
3. The method of claim 1, wherein the obtaining structural information of the second object comprises:
acquiring an initial image, wherein the initial image comprises the second object;
performing key point detection on the initial image to obtain key points of a second object in the initial image;
and obtaining structure information of the second object according to the key points of the second object.
4. The method of claim 1, wherein the second object is included in an auxiliary image; the method further comprises the following steps:
acquiring an initial image, wherein the initial image comprises the target object;
performing key point detection on the initial image to obtain key points of a target object in the initial image;
and cutting the initial image according to the key points of the target object to obtain the target image or the auxiliary image comprising the target object.
5. The method of claim 4, further comprising: after the generating of the enhanced image based on the appearance information and the structure information, replacing the corresponding image portion in the initial image with the enhanced image.
6. The method of claim 1, wherein the first object and the second object are the same target object or different target objects of the same class, and the target object is one of the facial features (five sense organs) of a human face.
7. A method for training an image enhancement network, the method comprising:
acquiring a sample image to be enhanced and structure information of a second object, wherein the sample image comprises a first object; the first object and the second object are the same target object with different structure information; the structure information represents contour features of the second object;
performing feature extraction on the sample image to be enhanced through an image enhancement network to obtain appearance information of the sample image, wherein the appearance information represents surface visual features in the sample image;
performing image generation processing on the appearance information and the structure information through the image enhancement network, and outputting a sample enhanced image, wherein the sample enhanced image comprises the appearance information and the target object with the structure information;
and adjusting the network parameters of the image enhancement network according to the sample enhanced image.
8. The training method according to claim 7, wherein the second object is included in an auxiliary image; the adjusting the network parameters of the image enhancement network according to the sample enhanced image comprises:
adjusting network parameters of an appearance extractor and a generator according to a difference between the sample enhanced image and the auxiliary image; wherein the image enhancement network comprises the appearance extractor and the generator.
9. The training method according to claim 7, wherein the second object is included in an auxiliary image; the image enhancement network includes: an appearance extractor and generator;
the adjusting the network parameters of the image enhancement network according to the sample enhanced image comprises:
inputting the sample enhanced image into a discriminator to obtain a discrimination value output by the discriminator;
obtaining a first loss according to the difference between the discrimination value and the discrimination true value, and obtaining a second loss according to the difference between the sample enhanced image and the auxiliary image;
adjusting a network parameter of at least one of the appearance extractor, generator, and discriminator based on the first loss and the second loss.
10. An image enhancement apparatus, characterized in that the apparatus comprises:
the appearance extraction module is used for performing feature extraction on a target image to be enhanced to obtain appearance information of the target image, wherein the target image comprises a first object; the appearance information represents surface visual features in the target image;
the structure acquisition module is used for acquiring structure information of a second object, wherein the first object and the second object are target objects of the same kind; the structure information represents contour features of the second object;
an image generation module to generate an enhanced image based on the appearance information and the structure information, the enhanced image including the appearance information and a target object having the structure information.
11. An apparatus for training an image enhancement network, the apparatus comprising:
the information acquisition module is used for acquiring a sample image to be enhanced and the structure information of a second object, wherein the first object and the second object are the same target object with different structure information; the sample image comprises a first object; the structural information represents contour features of the second object;
the system comprises a characteristic extraction module, a target image acquisition module and a characteristic extraction module, wherein the characteristic extraction module is used for extracting characteristics of a sample image to be enhanced through an image enhancement network to obtain appearance information of the sample image, and the appearance information represents surface visual characteristics in the target image;
an image output module, configured to perform image generation processing on the appearance information and the structure information through the image enhancement network, and output a sample enhanced image, where the sample enhanced image includes the appearance information and the target object having the structure information;
and the parameter adjusting module is used for adjusting the network parameters of the image enhancement network according to the sample enhanced image.
12. An electronic device, comprising: a memory for storing computer-readable instructions, and a processor for invoking the computer-readable instructions to implement the method of any one of claims 1 to 6, or the method of any one of claims 7 to 9.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 6, or the method of any one of claims 7 to 9.
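For concreteness, the inference flow of claims 1 and 3 to 5 above can be sketched as follows, again in a PyTorch-style form. detect_keypoints and structure_from_keypoints stand in for an unspecified key point detector and contour-map construction, the crop uses a simple bounding box around the key points, and channel-first image tensors are assumed; none of these details are fixed by the claims.

import torch

def bounding_box(keypoints, margin=8):
    # keypoints: (N, 2) tensor of (x, y) pixel coordinates of the target object.
    x0 = max(int(keypoints[:, 0].min().item()) - margin, 0)
    y0 = max(int(keypoints[:, 1].min().item()) - margin, 0)
    x1 = int(keypoints[:, 0].max().item()) + margin
    y1 = int(keypoints[:, 1].max().item()) + margin
    return x0, y0, x1, y1

def enhance(initial_image, appearance_extractor, generator,
            detect_keypoints, structure_from_keypoints):
    # Key point detection on the initial image (claims 3 and 4).
    keypoints = detect_keypoints(initial_image)
    # Cut the initial image according to the key points to obtain the
    # target image comprising the target object (claim 4).
    x0, y0, x1, y1 = bounding_box(keypoints)
    target_image = initial_image[..., y0:y1, x0:x1]
    # Structure information: contour features derived from the key points.
    structure_info = structure_from_keypoints(keypoints)
    # Appearance information: surface visual features of the target image.
    appearance_info = appearance_extractor(target_image)
    # Generate the enhanced image from the appearance and structure information
    # (assumed here to have the same spatial size as the crop).
    enhanced = generator(appearance_info, structure_info)
    # Replace the corresponding portion of the initial image with the
    # enhanced image (claim 5).
    output = initial_image.clone()
    output[..., y0:y1, x0:x1] = enhanced
    return output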
CN202111669721.8A 2021-12-31 2021-12-31 Image enhancement method and device, storage medium and electronic equipment Pending CN114331906A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111669721.8A CN114331906A (en) 2021-12-31 2021-12-31 Image enhancement method and device, storage medium and electronic equipment
PCT/CN2022/134845 WO2023124697A1 (en) 2021-12-31 2022-11-29 Image enhancement method, apparatus, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111669721.8A CN114331906A (en) 2021-12-31 2021-12-31 Image enhancement method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114331906A true CN114331906A (en) 2022-04-12

Family

ID=81019990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111669721.8A Pending CN114331906A (en) 2021-12-31 2021-12-31 Image enhancement method and device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN114331906A (en)
WO (1) WO2023124697A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124697A1 (en) * 2021-12-31 2023-07-06 上海商汤智能科技有限公司 Image enhancement method, apparatus, storage medium, and electronic device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838076A (en) * 2020-06-24 2021-12-24 深圳市中兴微电子技术有限公司 Method and device for labeling object contour in target image and storage medium
CN111881926A (en) * 2020-08-24 2020-11-03 Oppo广东移动通信有限公司 Image generation method, image generation model training method, image generation device, image generation equipment and image generation medium
CN113327212B (en) * 2021-08-03 2021-11-12 北京奇艺世纪科技有限公司 Face driving method, face driving model training device, electronic equipment and storage medium
CN114331906A (en) * 2021-12-31 2022-04-12 北京大甜绵白糖科技有限公司 Image enhancement method and device, storage medium and electronic equipment


Also Published As

Publication number Publication date
WO2023124697A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
Rössler et al. Faceforensics: A large-scale video dataset for forgery detection in human faces
Monroy et al. Salnet360: Saliency maps for omni-directional images with cnn
Anwar et al. Image colorization: A survey and dataset
Dolhansky et al. Eye in-painting with exemplar generative adversarial networks
CN111553267B (en) Image processing method, image processing model training method and device
Vazquez et al. Virtual and real world adaptation for pedestrian detection
Kliper-Gross et al. Motion interchange patterns for action recognition in unconstrained videos
CN112037320B (en) Image processing method, device, equipment and computer readable storage medium
CN111401216A (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
CN106778852A (en) A kind of picture material recognition methods for correcting erroneous judgement
CN113850168A (en) Fusion method, device and equipment of face pictures and storage medium
WO2019196795A1 (en) Video editing method, device and electronic device
CN111638784B (en) Facial expression interaction method, interaction device and computer storage medium
CN113723385B (en) Video processing method and device and neural network training method and device
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
Kim et al. Exposing fake faces through deep neural networks combining content and trace feature extractors
CN115171199B (en) Image processing method, image processing device, computer equipment and storage medium
CN113505768A (en) Model training method, face recognition method, electronic device and storage medium
CN113486700A (en) Facial expression analysis method based on attention mechanism in teaching scene
Tolosana et al. An introduction to digital face manipulation
CN113658324A (en) Image processing method and related equipment, migration network training method and related equipment
CN109934112A (en) A kind of face alignment method and camera
WO2023124697A1 (en) Image enhancement method, apparatus, storage medium, and electronic device
CN111402118A (en) Image replacement method and device, computer equipment and storage medium
Jiang et al. DeepFakes detection: the DeeperForensics dataset and challenge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40062800; country of ref document: HK)