CN111311483A - Image editing and training method and device, electronic equipment and storage medium


Info

Publication number
CN111311483A
Authority
CN
China
Prior art keywords
training, result, image, model, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010074293.3A
Other languages
Chinese (zh)
Inventor
祝加鹏 (Zhu Jiapeng)
沈宇军 (Shen Yujun)
周博磊 (Zhou Bolei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010074293.3A
Publication of CN111311483A
Legal status: Pending (Current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • G06T 9/002: Image coding using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to an image editing and training method, apparatus, electronic device, and storage medium, the image editing method comprising: acquiring a first image; obtaining first coded data of the first image according to the first image; transforming the first coded data to obtain second coded data of the first image; and generating a second image according to the second coded data. The training method comprises the following steps: acquiring training data; training an initial coding model according to the training data to obtain the coding model; and training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.

Description

Image editing and training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image editing and training method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of image processing technology, image generation models become more and more important in the research of computer vision. The image generation model has important applications in many fields, such as data expansion of images, image editing and the like.
In the case where the image generation model is applied to image editing, the image generation model may edit the input image, thereby changing one or more image features of the image. In order to make the edited image more natural, how to change the targeted image features during editing while preserving, as far as possible, the remaining real-image characteristics that do not need to be changed has become an urgent problem to be solved.
Disclosure of Invention
The present disclosure proposes an image editing technical solution.
According to an aspect of the present disclosure, there is provided an image editing method including:
acquiring a first image; obtaining first coded data of the first image according to the first image; transforming the first coded data to obtain second coded data of the first image; and generating a second image according to the second coded data.
In one possible implementation, the generating a second image from the second encoded data includes: according to the editing type of the first image, combining the second coded data to obtain data to be edited of the first image; and taking the data to be edited as an input of an image generation model so as to generate the second image through the image generation model.
In a possible implementation manner, the transforming the first encoded data to obtain second encoded data of the first image includes: acquiring a transformation relation between the first coded data and the second coded data; and transforming the first coded data according to the transformation relation to obtain second coded data of the first image.
According to an aspect of the present disclosure, there is provided a training method, obtaining first encoding data of a first image according to the first image, including: taking the first image as an input of a coding model to obtain the first coded data through the coding model; before obtaining first encoded data for the first image from the first image, the method further comprises: acquiring training data; training an initial coding model according to the training data to obtain the coding model; and training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.
In a possible implementation manner, the training an initial coding model according to the training data to obtain the coding model includes: sequentially passing the training data through an initial coding model and the image generation model to obtain a first reconstruction result corresponding to each training data; respectively taking each first reconstruction result as the input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model; and training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model.
In a possible implementation manner, the training the initial coding model according to the training data, the first reconstruction result, and the first identification result to obtain the coding model includes: acquiring a first error loss of the initial coding model according to the training data and the first reconstruction result; acquiring a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result; processing the first discrimination result to obtain a first processing result, so as to obtain a third error loss of the initial coding model according to the first processing result; and training the initial coding model according to one or more of the first error loss, the second error loss and the third error loss to obtain the coding model.
In a possible implementation manner, the training an initial coding model according to the training data to obtain the coding model further includes: respectively taking each training data as the input of the identification model, so as to obtain a second identification result corresponding to each training data through the identification model; and training the identification model according to the first identification result and the second identification result.
In one possible implementation, the training the identification model according to the first identification result and the second identification result includes: processing the first identification result to obtain a second processing result so as to obtain a fourth error loss of the identification model according to the second processing result; processing the second identification result to obtain a third processing result, so as to obtain a fifth error loss of the identification model according to the third processing result; processing the gradient value of the second identification result to obtain a fourth processing result, so as to obtain a sixth error loss of the identification model according to the fourth processing result; training the discriminative model based on one or more of the fourth error loss, fifth error loss, and the sixth error loss.
In a possible implementation manner, the training, according to the training data and the coding model, an initial transformation relationship between the first encoded data and the second encoded data to obtain the transformation relationship includes: taking the training data as the input of the coding model, so as to obtain a first coding result corresponding to each training data through the coding model; respectively taking each first coding result as the input of the image generation model, so as to obtain a second reconstruction result corresponding to each training data through the image generation model; respectively taking each second reconstruction result as the input of the identification model, so as to obtain a third identification result corresponding to each training data through the identification model; respectively taking each second reconstruction result as the input of the coding model, so as to obtain a second coding result corresponding to each training data through the coding model; and training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
In a possible implementation manner, the training the initial transformation relationship according to the training data, the first encoding result, the second reconstruction result, the third identification result, and the second encoding result to obtain the transformation relationship includes: acquiring a seventh error loss of the initial transformation relation according to the training data and the second reconstruction result; acquiring an eighth error loss of the initial transformation relation according to the characteristics of the training data and the characteristics of the second reconstruction result; obtaining a ninth error loss of the initial transformation relation according to the third discrimination result; acquiring a tenth error loss of the initial transformation relation according to the first coding result and the second coding result; and training the initial transformation relation according to one or more of the seventh error loss, the eighth error loss, the ninth error loss and the tenth error loss to obtain the transformation relation.
According to an aspect of the present disclosure, there is provided an image editing apparatus including:
the image acquisition module is used for acquiring a first image; the encoding module is used for obtaining first encoded data of the first image according to the first image; the transformation module is used for transforming the first coded data to obtain second coded data of the first image; and the image generation module is used for generating a second image according to the second coded data.
In one possible implementation, the generating module is configured to: according to the editing type of the first image, combining the second coded data to obtain data to be edited of the first image; and taking the data to be edited as an input of an image generation model so as to generate the second image through the image generation model.
In one possible implementation, the transformation module is configured to: acquiring a transformation relation between the first coded data and the second coded data; and transforming the first coded data according to the transformation relation to obtain second coded data of the first image.
According to an aspect of the disclosure, there is provided a training apparatus, the encoding module is configured to: taking the first image as an input of a coding model to obtain the first coded data through the coding model; the training apparatus includes:
the training data acquisition module is used for acquiring training data; the coding model training module is used for training an initial coding model according to the training data to obtain the coding model; and the transformation relation training module is used for training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.
In one possible implementation, the coding model training module is configured to: sequentially passing the training data through an initial coding model and the image generation model to obtain a first reconstruction result corresponding to each training data; respectively taking each first reconstruction result as the input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model; and training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model.
In one possible implementation, the coding model training module is further configured to: acquiring a first error loss of the initial coding model according to the training data and the first reconstruction result; acquiring a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result; processing the first discrimination result to obtain a first processing result, so as to obtain a third error loss of the initial coding model according to the first processing result; and training the initial coding model according to one or more of the first error loss, the second error loss and the third error loss to obtain the coding model.
In one possible implementation, the coding model training module is further configured to: respectively taking each training data as the input of the identification model, so as to obtain a second identification result corresponding to each training data through the identification model; and training the identification model according to the first identification result and the second identification result.
In one possible implementation, the coding model training module is further configured to: processing the first identification result to obtain a second processing result so as to obtain a fourth error loss of the identification model according to the second processing result; processing the second identification result to obtain a third processing result, so as to obtain a fifth error loss of the identification model according to the third processing result; processing the gradient value of the second identification result to obtain a fourth processing result, so as to obtain a sixth error loss of the identification model according to the fourth processing result; training the discriminative model based on one or more of the fourth error loss, fifth error loss, and the sixth error loss.
In one possible implementation, the transformation relation training module is configured to: taking the training data as the input of the coding model, so as to obtain a first coding result corresponding to each training data through the coding model; respectively taking each first coding result as the input of the image generation model, so as to obtain a second reconstruction result corresponding to each training data through the image generation model; respectively taking each second reconstruction result as the input of the identification model, so as to obtain a third identification result corresponding to each training data through the identification model; respectively taking each second reconstruction result as the input of the coding model, so as to obtain a second coding result corresponding to each training data through the coding model; and training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
In one possible implementation, the transformation relation training module is further configured to: acquiring a seventh error loss of the initial transformation relation according to the training data and the second reconstruction result; acquiring an eighth error loss of the initial transformation relation according to the characteristics of the training data and the characteristics of the second reconstruction result; obtaining a ninth error loss of the initial transformation relation according to the third discrimination result; acquiring a tenth error loss of the initial transformation relation according to the first coding result and the second coding result; and training the initial transformation relation according to one or more of the seventh error loss, the eighth error loss, the ninth error loss and the tenth error loss to obtain the transformation relation.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: the above-described image editing method is executed.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: the above training method is performed.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described image editing method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above training method.
In the disclosed embodiment, the first image is acquired, the first encoded data is obtained from the first image, the first encoded data is transformed to obtain the second encoded data, and the second image is generated from the second encoded data. Through this process, the first encoded data of the first image can be acquired before the image is edited and then transformed into the second encoded data, so that, under the constraint of the transformation, the second encoded data used for generating the second image lies as much as possible in the region space where data sampled for the first image is located, namely the target region space. The generated second image therefore retains more characteristics of the first image, and the authenticity of the second image is effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of an image editing method according to an embodiment of the present disclosure.
Fig. 2 shows a flow diagram of a training method according to an embodiment of the present disclosure.
FIG. 3 shows a schematic diagram of a training process of a coding model according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram illustrating a process of training a coding model in the related art according to an embodiment of the disclosure.
Fig. 5 is a schematic diagram illustrating a principle of training a transformation relation between first encoded data and second encoded data in the related art.
FIG. 6 illustrates a schematic diagram of training a transformation relationship between first encoded data and second encoded data according to an embodiment of the disclosure.
Fig. 7 is a schematic diagram illustrating an effect of an image editing method according to an embodiment of the present disclosure after editing the image.
Fig. 8 shows a block diagram of an image editing apparatus according to an embodiment of the present disclosure.
FIG. 9 shows a block diagram of a training apparatus according to an embodiment of the present disclosure.
FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
FIG. 11 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of an image editing method according to an embodiment of the present disclosure, which may be applied to a terminal device, a server or other processing device, and the like. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the image editing method may be implemented by a processor calling computer readable instructions stored in a memory. In one example, the processor may be a general-purpose processor such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or an Artificial Intelligence processor such as an Artificial Intelligence (AI) chip.
As shown in fig. 1, the image editing method may include:
in step S11, a first image is acquired.
In step S12, first encoded data of the first image is obtained based on the first image.
In step S13, the first encoded data is transformed to obtain second encoded data of the first image.
In step S14, a second image is generated based on the second encoded data.
In the above-described embodiment, the first image may be an image to be edited. In a possible implementation manner, the image may include a face image or another image containing a target object, which is not limited in the embodiments of the present disclosure. The type of editing performed on the first image is likewise not limited in the embodiments of the present disclosure and may be any change to one or more features of the target object in the image. In one possible implementation, one or more features may be added to the first image, such as adding glasses to a face image, changing the expression of the face image to make it smile or cry, or modifying the face image to look older, look younger, grow a beard, and so on. In a possible implementation, the scene of the first image may also be changed, such as increasing the illumination of the first image or adjusting the layout of the first image. Specifically, the number and type of the features edited in the first image may be flexibly selected according to the actual situation and are not limited in the embodiments of the present disclosure.
Since the implementation manner of the first image is not limited, the manner of acquiring the first image in step S11 may also be flexibly selected according to the actual situation, and is not limited herein.
After the first image is acquired, the first encoded data of the first image may be obtained from the first image through step S12. In one possible implementation, step S12 may include: taking the first image as an input of a coding model so as to obtain the first encoded data through the coding model. The implementation of the coding model can be flexibly selected according to the actual situation; any neural network model capable of acquiring the latent code of the first image may serve as an implementation form of the coding model, and it is not limited to the embodiments disclosed later. By means of the coding model, the first encoded data of the first image can be obtained; this first encoded data can be used to reconstruct the first image through an image generation model, or, after certain edited attribute data is added, to generate an edited second image through the image generation model.
In a possible implementation manner, the space in which the first encoded data obtained by encoding the first image through the coding model is located may differ from the space in which the data sampled for the first image is located (which may be referred to as the target region space, or domain); that is, the first encoded data may fall outside the target region space (out-of-domain). In such a case, the difference between the first image and a second image reconstructed or generated from the first encoded data through the image generation model may be relatively large, which may cause distortion of the generated second image.
In the embodiment of the present disclosure, in order to reduce the possibility of the above situation, the first encoded data may be transformed by step S13, so as to obtain the second encoded data of the first image, so that the transformed second encoded data may be located in the target region space as much as possible. The specific transformation manner is not limited in the embodiment of the present disclosure, and may be flexibly obtained according to an actual situation, and in a possible implementation manner, the step S13 may include:
step S131, a transformation relation between the first encoded data and the second encoded data is obtained.
Step S132, according to the transformation relation, the first coding data is transformed to obtain second coding data of the first image.
As can be seen from the above disclosed embodiments, in a possible implementation manner, in the case of transforming the first encoded data, the transformation may be implemented according to a transformation relation between the first encoded data and the second encoded data after obtaining the transformation relation. The implementation manner of the transformation relationship between the first encoded data and the second encoded data is not limited in the embodiments of the present disclosure, and may be flexibly selected according to the actual situation.
By obtaining the transformation relation between the first encoded data and the second encoded data and transforming the first encoded data accordingly to obtain the second encoded data, the transformation process can effectively be abstracted into a concrete correspondence. This makes it convenient to model the image editing process and then to edit images in batches, improving the practicality of the image editing process.
After the second encoded data located in the target region space is obtained, a second image may be generated from the second encoded data through step S14. In one possible implementation, step S14 may include: generating the second image from the second encoded data through the image generation model. That is, in one possible implementation, the second encoded data may be used together with an image generation model to generate a second image, thereby completing the editing of the first image. In the embodiment of the present disclosure, the implementation of the image generation model used to generate the second image is not limited; any neural network model that can generate an image from the encoded data of an image may serve as an implementation form of the image generation model. In one possible implementation, the image generation model may be the Generator in a Generative Adversarial Network (GAN). A generative adversarial network model may include a generator and a Discriminator, wherein the generator may take encoded data obtained by random sampling as input to generate an image, and the discriminator may take the image generated by the generator as input to discriminate whether the image is a real image or a generated image. Since the generator can generate an image using the encoded data of an image as input, in one example, the generator in the GAN may be used as an implementation form of the image generation model.
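As a non-authoritative illustration of the generator/discriminator roles described above, a minimal PyTorch sketch is given below; the fully connected architectures, dimensions, and variable names are assumptions made for this example only and do not reflect the concrete models of the disclosed embodiments.

    # Sketch only: G maps encoded data (latent codes) to images, D maps images
    # to a realness score. The architectures here are toy placeholders.
    import torch
    import torch.nn as nn

    latent_dim, img_pixels = 512, 3 * 64 * 64

    generator = nn.Sequential(           # G: latent code -> image
        nn.Linear(latent_dim, 1024), nn.ReLU(),
        nn.Linear(1024, img_pixels), nn.Tanh(),
    )
    discriminator = nn.Sequential(       # D: image -> realness score
        nn.Linear(img_pixels, 1024), nn.LeakyReLU(0.2),
        nn.Linear(1024, 1),
    )

    z = torch.randn(8, latent_dim)       # randomly sampled encoded data
    fake_images = generator(z)           # images generated from the codes
    scores = discriminator(fake_images)  # real-vs-generated discrimination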
Because the second encoded data is located in the target region space, the obtained second image can retain as many characteristics of the first image as possible, so that the generated second image has a higher degree of reality. How the second image is generated in step S14 can be flexibly selected according to the actual situation and is not limited to the following disclosed embodiments.
In one possible implementation, step S14 may include:
step S141, combining the second encoded data according to the editing type of the first image to obtain the data to be edited of the first image.
In step S142, the data to be edited is used as an input of the image generation model to generate a second image through the image generation model.
In the above-described disclosed embodiment, reference may be made to the foregoing embodiment for the editing type of the first image, that is, the editing type of the first image is not limited in the embodiment of the present disclosure, and may be flexibly selected according to the actual situation. Since the editing type is not limited, the process of obtaining the data to be edited in step S141 can be flexibly changed according to the editing type.
In one possible implementation, the process of step S141 may be represented by a formula, and in one example, the formula may be:
z_edit = z + αn    (1)
where z_edit is the data to be edited and z is the encoded data of the first image; in the embodiment of the present disclosure, this encoded data may be the second encoded data. n corresponds to the editing type, such as adding a smile or adding glasses, and α is a parameter controlling the magnitude of the edit, which can be flexibly adjusted according to the actual situation.
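For illustration, formula (1) translates directly into code. The following sketch assumes the editing direction n is already known for the desired editing type, which is an assumption made for this example:

    # Sketch of formula (1): z_edit = z + alpha * n
    import torch

    z = torch.randn(1, 512)    # second encoded data of the first image (assumed shape)
    n = torch.randn(1, 512)    # direction for the editing type, e.g. "add smile" (assumed known)
    alpha = 0.8                # controls the magnitude of the edit

    z_edit = z + alpha * n     # data to be edited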
As can be seen from the above disclosed embodiments, in one possible implementation, the data to be edited can be obtained by adding the edit content on the basis of the second encoded data. In a possible implementation manner, when the first image is reconstructed using the image generation model, the edit-type term n may be regarded as 0; that is, no edit is applied to the image. In this case, the data to be edited generated in step S141 is simply the encoded data of the image, and the second image obtained by passing this encoded data through the image generation model is a reconstruction of the first image, whose content is the same as that of the first image. Comparing the second image with the first image then reveals the degree of reality of the obtained second image.
After the data to be edited is obtained, the second image may be generated through step S142, and in a possible implementation manner, the implementation process of step S142 may also be represented in the form of a formula, which may be represented as:
x_edit = G(z + αn)    (2)
where x_edit is the second image generated in step S142, and G(·) denotes the image generation model through which the data to be edited is passed.
As can be seen from the above disclosed embodiments, in one possible implementation, the data to be edited may be entered into the image generation model as input data, and then the output of the image generation model may be used as the generated second image.
In the embodiment of the present disclosure, the data to be edited of the first image is obtained by combining the second encoded data according to the editing type of the first image, and the data to be edited is then used as the input of the image generation model so as to generate the second image through the image generation model.
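Putting steps S11 to S14 together, one possible shape of the overall editing pipeline is sketched below. Here E, T, and G are placeholder callables standing in for the coding model, the transformation relation, and the image generation model; all names are assumptions for illustration:

    # Hypothetical sketch of the full editing pipeline (steps S11 to S14).
    def edit_image(x1, E, T, G, n, alpha=1.0):
        z1 = E(x1)               # step S12: first encoded data via the coding model
        z2 = T(z1)               # step S13: second encoded data via the transformation relation
        z_edit = z2 + alpha * n  # step S141: data to be edited, formula (1)
        return G(z_edit)         # step S142: second image via the image generation model, formula (2)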
In the embodiment of the present disclosure, the first image is acquired, the first encoded data is obtained from the first image, the first encoded data is transformed to obtain the second encoded data, and the second image is generated from the second encoded data. Through this process, the first encoded data of the first image can be acquired before the image is edited and then transformed into the second encoded data, so that, under the constraint of the transformation, the second encoded data used for generating the second image lies as much as possible in the region space where data sampled for the first image is located, namely the target region space. The generated second image therefore retains more characteristics of the first image, and the authenticity of the second image is effectively improved.
As can be seen from the above-described embodiments, when the second image is generated through steps S11 to S14, the second encoded data input to the image generation model can be obtained through the coding model and the transformation of the encoded data in sequence. Meanwhile, in order to improve the reality of the second image generated by the image generation model, the generated second encoded data needs to be located in the target region space as much as possible. Therefore, in a possible implementation manner, the relevant components of the image editing method may be trained with training data, so that the generated second encoded data is located in the target region space as much as possible, thereby improving the degree of reality of the results obtained by the image editing method.
Therefore, the embodiment of the present disclosure also provides a training method based on the above image editing method, and fig. 2 shows a flowchart of the training method according to an embodiment of the present disclosure, which may be implemented on the same device as the image editing method, or may be applied to a device different from the image editing method, such as a terminal device, a server, or other processing devices, or implemented by a processor calling a computer readable instruction stored in a memory.
The training method may be performed before step S11 provided in the above disclosed embodiment, and in a possible implementation, as shown in fig. 2, the training method may include:
in step S21, training data is acquired.
And step S22, training the initial coding model according to the training data to obtain the coding model.
And step S23, training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.
The training data in the above-described embodiments may be any training image that can be input to the image generation model, and is not limited in the embodiments of the present disclosure, and the number and the acquisition mode of the training data may also be flexibly selected according to the actual situation, and is not limited in the embodiments of the present disclosure. The initial coding model and the initial transformation relation may also be flexibly selected according to the actual situation, which is not limited in the embodiments of the present disclosure.
According to the training method, under the training condition, the coding model and the transformation relation between the first encoded data and the second encoded data can be trained in sequence according to the training data. Although the second image may be generated by the image generation model during the image editing process, in one possible implementation manner of the embodiment of the present disclosure the image generation model is not trained, and an existing image generation model may be selected directly for implementation. In a possible implementation manner, the image generation model may also be trained as required; how the image generation model is trained can be flexibly selected according to the actual situation, which is not limited in the embodiment of the present disclosure.
By obtaining the training data, training the initial coding model according to the training data to obtain the coding model, and then training the transformation relation between the first encoded data and the second encoded data according to the trained coding model and the training data, both the coding model and the transformation relation can be refined through this process. As a result, the second encoded data obtained through the trained coding model and transformation relation is located in the target region space as much as possible, improving the degree of reality of the second image generated from the second encoded data.
Because the first coded data of the first image can be obtained through the coding model, if the first coded data can be located in the target region space as much as possible, the second coded data obtained through transformation according to the first coded data can be located in the target region space, and therefore the finally generated second image has higher reality. Therefore, in one possible implementation, a coding model that can improve the likelihood of the first coded data in the target region space can be trained through step S22.
The specific implementation manner of step S22 can be flexibly selected according to practical situations, and is not limited to the following disclosed embodiments. In one possible implementation, step S22 may include:
step S221, the training data are sequentially processed through the initial coding model and the image generation model, and a first reconstruction result corresponding to each training data is obtained.
Step S222, each first reconstruction result is used as an input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model.
Step S223, training the initial coding model according to the training data, the first reconstruction result, and the first identification result, to obtain a coding model.
In the above-described disclosed embodiment, the identification model may be a neural network model for determining whether an input image is a real image, and its implementation is not limited. In one possible implementation, the identification model may be the discriminator included in the GAN network.
Based on the above process, FIG. 3 shows a schematic diagram of the training process of a coding model according to an embodiment of the present disclosure. In the embodiment of the present disclosure, training data x_real may be used as input data and fed into the initial coding model, which processes the training data to obtain output encoded data z_enc. The output encoded data z_enc is then input into the image generation model to obtain a first reconstruction result x_rec corresponding to the training data, and the first reconstruction result is input into the identification model to obtain a first identification result corresponding to the training data. The training data, the first reconstruction result x_rec, and the first identification result may then be used to train the coding model.
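The forward pass of FIG. 3 can be sketched as follows; this is a non-authoritative illustration in which E, G, and D are placeholders for the initial coding model, the image generation model, and the identification model:

    # Hypothetical sketch of the FIG. 3 forward pass for coding-model training.
    def encoder_forward(x_real, E, G, D):
        z_enc = E(x_real)   # encoded data output by the initial coding model
        x_rec = G(z_enc)    # first reconstruction result from the image generation model
        d_rec = D(x_rec)    # first identification result from the identification model
        return z_enc, x_rec, d_rec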
Due to the fast inference speed of the coding model, the related art also includes a process of training the coding model within the GAN framework. FIG. 4 is a schematic diagram illustrating a related-art process of training a coding model. As can be seen from the figure, in the related art, the coding model may be trained by sampling to obtain a set of sampled encoded data z_sam for the training data, inputting the sampled encoded data into the image generation model to obtain a synthesized image x_sym = G(z_sam) corresponding to the sampled encoded data, then taking the synthesized image x_sym as the input of the initial coding model to obtain an output result E(G(z_sam)), and finally training the initial coding model based on the output result and the sampled encoded data z_sam. In one example, the loss function for the training process shown in FIG. 4 may take the form:
min_{θ_E} L_E = ||z_sam - E(G(z_sam))||_2    (3)
In the above disclosed embodiment, L_E represents the loss function of the coding model, ||·||_2 represents the error distance of the loss, and θ_E represents the parameters of the coding model. As can be seen from equation (3), although the image generation model may be used to bridge the space in which the image is located and the hidden space in which the encoded data corresponding to the image is located, with the training process shown in FIG. 4 combined with the loss function of equation (3), it is difficult to ensure that the space of the encoded data output by the coding model is consistent with the space of the input data of the image generation model; that is, the first encoded data output by the coding model obtained through such training may not lie in the target region space.
By comparing the training processes of the coding models in FIG. 3 and FIG. 4, it can be seen that, in the process of training the coding model by the training method provided in the embodiment of the present disclosure, the encoded data output by the coding model is input into the image generation model to reconstruct the training image, so that the coding model is supervised through the image generation model; this effectively ensures that the encoded data z_enc output by the coding model lies in the same space as the encoded data input to the image generation model. In addition, in the training method provided in the embodiment of the present disclosure, the coding model is trained with real training data; compared with the related art, which trains on sampled encoded data z_sam obtained by sampling, the coding model obtained after training with the technical scheme provided by the embodiment of the present application is better suited to the editing of real images. Furthermore, the training method provided in the embodiment of the present disclosure also trains the coding model using the output of the identification model; with this training mode, as much information about the training data as possible can be obtained from the two components of the GAN network, namely the generator and the discriminator, so that the trained coding model can achieve a higher degree of reality when applied to image reconstruction or editing.
How the coding model is trained with the training data, the first reconstruction result, and the first identification result in step S223 can likewise be selected according to the actual situation, and is not limited to the following disclosed embodiments. In one possible implementation, step S223 may include:
step S2231, obtaining a first error loss of the initial coding model according to the training data and the first reconstruction result.
Step S2232, obtaining a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result.
Step S2233, processing the first identification result to obtain a first processing result, so as to obtain a third error loss of the initial coding model according to the first processing result.
Step S2234, training the initial coding model according to one or more of the first error loss, the second error loss, and the third error loss to obtain a coding model.
In the above-described disclosed embodiment, the "first", "second", and "third" of the first error loss, the second error loss, and the third error loss are only used to distinguish the error losses calculated by using different contents, and other numbers of the error losses appearing later are also the same, and are not described again.
It can be seen from the above disclosure that, when the coding model is trained according to the training data, the first reconstruction result, and the first identification result, the error losses (i.e., loss function terms) used to reversely adjust the parameters of the coding model may cover three aspects: the error loss between the training data and the first reconstruction result output by the image generation model, the error loss between the features of the training data and the features of the first reconstruction result, and the error loss obtained by processing the identification result. In practical applications, all three error losses may be included when computing the error loss of the coding model, or the error losses of one or some aspects may be included, flexibly chosen according to the actual situation.
In one example, all three losses mentioned above may be used in the training of the coding model at the same time, and the loss function for training the coding model may be expressed as:
min_{θ_E} L_E = ||x_real - x_rec||_2 + λ_1 ||VGG(x_real) - VGG(x_rec)||_2 - λ_2 E_{x_rec~P_rec}[D(x_rec)]    (4)
where x_real is the training data, x_rec is the first reconstruction result, VGG(x_real) denotes the features extracted from the training data, VGG(x_rec) denotes the features extracted from the first reconstruction result, D(x_rec) is the first identification result, P_rec represents the probability distribution of the first reconstruction result, and E_{x_rec~P_rec}[D(x_rec)] is the statistical result obtained from the probability distribution of the first identification result. λ_1 and λ_2 are hyperparameters used to balance the loss weights between the different error losses, whose values can be flexibly determined according to the actual situation; in one example, λ_1 and λ_2 may be 0.00005 and 0.1, respectively, during training.
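A minimal sketch of loss (4), assuming a feature extractor vgg and the placeholder modules from the earlier sketch; the sign convention, in which the adversarial term is maximized and therefore subtracted, follows equation (4):

    # Hypothetical sketch of the coding-model loss in equation (4).
    def encoder_loss(x_real, x_rec, d_rec, vgg, lam1=0.00005, lam2=0.1):
        rec = (x_real - x_rec).flatten(1).norm(p=2, dim=1).mean()            # first error loss
        per = (vgg(x_real) - vgg(x_rec)).flatten(1).norm(p=2, dim=1).mean()  # second error loss
        adv = -d_rec.mean()                                                  # third error loss
        return rec + lam1 * per + lam2 * adv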
As can be seen from a comparison of equation (4) with equation (3), the coding-model training implemented in steps S2231 to S2234 does not train the coding model directly against labeled training targets; instead, the encoded data obtained by passing the training data through the coding model is fed back into the GAN model to obtain supervision at the image level, so that the trained coding model carries more information and is more accurate.
Further, it can be seen from the above disclosure that, in the process of training the coding model, the identification result output by the identification model may need to be utilized, and at this time, the accuracy of the identification model may also affect the accuracy of the coding model, and the accuracy of the coding model may be improved by training the identification model. Therefore, in one possible implementation, step S22 may further include:
step S224, each training data is used as an input of the identification model, so as to obtain a second identification result corresponding to each training data through the identification model.
And step S225, training the identification model according to the first identification result and the second identification result.
In the above-described disclosed embodiment, the "first" and "second" in the second authentication result and the first authentication result are only used to distinguish different authentication results obtained by using different data as input through the authentication model, and the same is true for the numbers of other pairs of authentication results appearing later, and are not described again.
The second identification result corresponding to each training data is obtained through the identification model by taking each training data as input, and then the identification model is trained according to the first identification result and the second identification result.
Specifically, how to train the identification model through step S225 according to the first identification result and the second identification result may be selected according to practical situations, and is not limited to the following disclosed embodiments. In one possible implementation, step S225 may include:
in step S2251, the first identification result is processed to obtain a second processing result, so as to obtain a fourth error loss of the identification model according to the second processing result.
In step S2252, the second identification result is processed to obtain a third processing result, so as to obtain a fifth error loss of the identification model according to the third processing result.
Step S2253, the gradient values of the second identification result are processed to obtain a fourth processing result, so as to obtain a sixth error loss of the identification model according to the fourth processing result.
In step S2254, the discriminant model is trained based on one or more of the fourth error loss, the fifth error loss, and the sixth error loss.
In the above-described embodiments, the processing performed on an identification result, or on the gradient values of an identification result, may be to compute statistics of the probability distribution of the corresponding identification result or gradient values in space; in a possible implementation manner, the obtained processing result may be a statistical result. It can be seen from the above disclosure that, when the identification model is trained according to the first identification result and the second identification result, the error losses (i.e., loss function terms) used to reversely adjust the parameters of the identification model may cover three aspects: the statistical error loss of the first identification result, the statistical error loss of the second identification result, and the error loss obtained from statistics of the gradient values of the second identification result.
In one example, all three losses mentioned above can be used in the training of the identification model at the same time, and the loss function for training the identification model can be expressed as:
min_{θ_D} L_D = E_{x_rec~P_rec}[D(x_rec)] - E_{x_real~P_real}[D(x_real)] + (γ/2) E_{x_real~P_real}[||∇_x D(x_real)||_2^2]    (5)
where L_D represents the loss function of the identification model, θ_D represents the parameters of the identification model, D(x_real) is the second identification result, and P_real represents the probability distribution of the training data. E_{x_real~P_real}[D(x_real)] is the statistical result obtained from the probability distribution of the second identification result, and (γ/2) E_{x_real~P_real}[||∇_x D(x_real)||_2^2] is the gradient regularization of the second identification result, where γ is a regularization hyperparameter whose value may be flexibly determined according to the actual situation; in one possible implementation, the value of γ may be 10.
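Loss (5) can be sketched in the same style; the gradient-regularization term is computed here with autograd on the real inputs, assuming an R1-style penalty:

    # Hypothetical sketch of the identification-model loss in equation (5).
    import torch

    def discriminator_loss(x_real, x_rec, D, gamma=10.0):
        x_real = x_real.detach().requires_grad_(True)  # gradients w.r.t. real inputs
        d_real = D(x_real)                             # second identification result
        d_fake = D(x_rec.detach())                     # first identification result
        grad = torch.autograd.grad(d_real.sum(), x_real, create_graph=True)[0]
        r1 = grad.flatten(1).pow(2).sum(dim=1).mean()  # gradient regularization term
        return d_fake.mean() - d_real.mean() + 0.5 * gamma * r1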
The fourth, fifth, and sixth error losses of the identification model are obtained by respectively processing the first identification result, the second identification result, and the gradient values of the second identification result, and the identification model is trained according to one or more of these error losses.
It can be seen from the above disclosure that training the coding model requires the identification result of the identification model, while training the identification model requires the output of the coding model, so the training of the coding model and the training of the identification model can be regarded as one overall training process. In a possible implementation manner, in this overall training process, the training order between the coding model and the identification model may be as follows: the coding model is trained first, the identification model is then further trained using the trained coding model, the coding model is trained again using the trained identification model, and so on, with these steps repeated many times until a preset condition is reached, at which point the training of the coding model and the identification model terminates. The preset condition may be set according to the actual situation. In a possible implementation manner, the preset condition may be that the output error of the coding model or the identification model is not greater than a certain threshold; in another possible implementation manner, the preset condition may be that the number of training iterations of the coding model and the identification model reaches a preset threshold. When high accuracy is required of the coding model and the identification model, the threshold on the number of training iterations may be set larger; when the accuracy requirements can be appropriately relaxed in order to increase training and editing efficiency, the threshold on the number of training iterations may be set smaller. Through the joint training of the coding model and the identification model, the trained coding model can ensure that the first encoded data it outputs has the property of being located in the target region space, thereby improving the authenticity of the whole image editing process.
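The alternating schedule described above might be organized as in the sketch below, reusing the hypothetical helpers from the earlier sketches; E and D are the coding and identification models, G is a fixed pretrained generator, and vgg, opt_E, opt_D, loader, and max_steps are assumed to be set up elsewhere:

    # Hypothetical sketch of alternately training the coding and identification models.
    for step, x_real in enumerate(loader):
        z_enc, x_rec, d_rec = encoder_forward(x_real, E, G, D)

        opt_E.zero_grad()
        encoder_loss(x_real, x_rec, d_rec, vgg).backward()  # update the coding model
        opt_E.step()

        opt_D.zero_grad()
        discriminator_loss(x_real, x_rec, D).backward()     # then the identification model
        opt_D.step()

        if step + 1 >= max_steps:  # preset condition: iteration threshold reached
            break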
Likewise, the implementation manner of step S23 can also be flexibly selected according to practical situations, and is not limited to the following disclosed embodiments. In one possible implementation, step S23 may include:
step S231, using the training data as an input of the coding model, so as to obtain a first coding result corresponding to each training data through the coding model.
Step S232, respectively taking each first encoding result as an input of an image generation model, so as to obtain a second reconstruction result corresponding to each training data through the image generation model.
Step S233, each second reconstruction result is used as an input of the identification model, so as to obtain a third identification result corresponding to each training data through the identification model.
And step S234, taking each second reconstruction result as the input of the coding model respectively, so as to obtain a second coding result corresponding to each training data through the coding model.
And step S235, training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
The process of generating an image from its encoded data through the image generation model and the process of obtaining the encoded data of an image through the coding model are two mutually inverse mappings. In practical applications, the image generation model can be regarded as mapping the encoded data at the distribution level of the images, while the coding model reconstructs the data of an individual image at the instance level; a single coding model of limited capacity can therefore hardly provide a complete inverse mapping of the image generation model, and directly feeding the encoded data output by the coding model into the image generation model makes it difficult to reconstruct the original image. Consequently, even though the coding model trained through the above disclosed embodiments can constrain the first encoded data so that it follows the distribution of the target region space, the embodiments of the present disclosure further constrain the encoded data through the transformation, so that the reconstructed image generated from the transformed second encoded data has higher reality.
For the above reasons, in the embodiments of the present disclosure, when the transformation relation between the first encoded data and the second encoded data is trained, the training data may additionally be passed through the trained coding model to obtain the first encoding result, and the transformation relation is trained by using this first encoding result, so that the trained transformation relation is more accurate.

Accordingly, the second reconstruction result in the embodiments of the present disclosure is the reconstruction result obtained by passing a training image through the trained coding model and then the image generation model, whereas the first reconstruction result is the reconstruction result obtained by passing a training image through an untrained coding model (such as the initial coding model) and then the image generation model; "second" differs from "first" only in distinguishing which coding model the image passes through.
Fig. 5 shows a schematic diagram of training a transformation relation between the first encoded data and the second encoded data in one possible implementation manner. As can be seen from the diagram, in the related art shown in fig. 5, the first encoded data may be optimized through a gradient descent algorithm, so that the optimized encoded data lies in the target region space as much as possible.
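A minimal sketch of this related-art optimization, assuming a differentiable generator and a plain reconstruction objective (all names and hyper-parameters are illustrative):

```python
import torch

def naive_inversion(generator, x, z_init, steps=100, lr=0.01):
    """Optimize the encoded data by gradient descent on reconstruction
    error alone; note that nothing keeps z inside the target region space."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (x - generator(z)).pow(2).mean()
        loss.backward()
        opt.step()
    return z.detach()
```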
In the above implementation, however, the encoded data is optimized freely along noisy gradients during gradient descent, which makes the situation in fig. 5 easy to occur, i.e. the optimized encoded data may end up outside the target region space. To reduce the probability of this situation as much as possible, fig. 6 shows a schematic diagram of the transformation relation between the first encoded data and the second encoded data in another possible implementation manner. As can be seen from the diagram, in the embodiment of the present disclosure, the coding model is added to the transformation relation as a regularization constraint on the optimization of the encoded data; that is, a second encoding result is obtained from the second reconstruction result through the coding model, and the transformation relation is trained by using this second encoding result. As shown in fig. 6, with the above disclosed embodiment, the coding model can constrain the transformed second encoded data to align with the distribution of the target region space. Furthermore, on top of the reconstruction loss and the perceptual loss, the constraint of the identification model can be introduced into the training of the transformation relation, i.e. a third identification result is obtained from the second reconstruction result through the identification model, and the transformation relation is trained by using this third identification result, thereby improving the quality of the images reconstructed from the second encoded data obtained through the transformation relation.
As for how the transformation relation is trained through step S235 according to the training data, the first encoding result, the second reconstruction result, the third identification result and the second encoding result, the specific training process may likewise be selected according to the actual situation and is not limited to the following disclosed embodiments. In one possible implementation, step S235 may include:
Step S2351, acquiring a seventh error loss of the initial transformation relation according to the training data and the second reconstruction result.

Step S2352, acquiring an eighth error loss of the initial transformation relation according to the characteristics of the training data and the characteristics of the second reconstruction result.

Step S2353, acquiring a ninth error loss of the initial transformation relation according to the third identification result.

Step S2354, acquiring a tenth error loss of the initial transformation relation according to the first encoding result and the second encoding result.

Step S2355, training the initial transformation relation according to one or more of the seventh error loss, the eighth error loss, the ninth error loss and the tenth error loss to obtain the transformation relation.
It can be seen from the above disclosure that, when the transformation relation is trained according to the training data, the first encoding result, the second reconstruction result, the third identification result and the second encoding result, the error losses (i.e. loss functions) used for back-propagating parameter adjustments of the transformation relation may cover four aspects: the error loss between the training data and the second reconstruction result output by the image generation model, the error loss between the features of the training data and the features of the second reconstruction result, the error loss obtained from the third identification result, and the error loss between the first encoding result and the second encoding result. In practical applications, when obtaining the error loss of the transformation relation, all four of these error losses may be counted, or one or some of them may be counted flexibly according to the actual situation. In one possible implementation, the error loss between the training data and the second reconstruction result and the error loss between the features of the training data and the features of the second reconstruction result may together serve as the error loss of the transformation relation. In one possible implementation, in order to introduce the constraint of the coding model into the training of the transformation relation, the error loss between the first encoding result and the second encoding result may be added on top of those two error losses. In one possible implementation, in order to further improve the accuracy of the training result, the error loss obtained from the third identification result may also be introduced, i.e. all four error losses are counted into the error loss of the transformation relation at the same time.
In one example, all of the above-mentioned four losses can be used in the training of the transformation relationship at the same time, and the loss function for training the transformation relationship at this time can be expressed as:
$$ z^{inv} = \arg\min_{z} \; \lVert x - G(z) \rVert_2 + \lambda_3 \lVert VGG(x) - VGG(G(z)) \rVert_2 - \lambda_4 D(G(z)) + \lambda_5 \lVert z - E(G(z)) \rVert_2 $$
where z^inv denotes the second encoded data, x the training data, z the first encoding result, G(z) the second reconstruction result, VGG(x) the features extracted from the training data, VGG(G(z)) the features extracted from the second reconstruction result, D(G(z)) the third identification result, and E(G(z)) the second encoding result. λ3, λ4 and λ5 are hyper-parameters used to balance the weights of the different error losses, and their values can be flexibly determined according to the actual situation. In one possible implementation, because the first encoded data obtained through the coding model serves as the input of the transformation relation, the hyper-parameters λ3, λ4 and λ5 can be adjusted in correspondence with the hyper-parameters λ1 and λ2 used when training the coding model. In one possible implementation, λ3 may be the same as the hyper-parameter λ1 in the above disclosed embodiments, while λ4 may be 1/2 of the hyper-parameter λ2, indicating that the loss associated with the image generation model is weighted less here than in the training of the coding model. It should be noted that an overly large value of λ4 may cause the image generation model to produce a larger reconstruction deviation; in one possible implementation, λ5 may be adjusted to balance reconstruction accuracy against the operability of the encoded data. In one example, λ3, λ4 and λ5 in the training process may be 0.00005, 0.01 and 2, respectively.
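Putting the four terms together, the sketch below shows one way this objective could be evaluated; the squared-L2 norms, the minus sign on the discriminator term (so that minimizing the objective raises the realism score) and all names are assumptions for illustration, with the example weights taken from the text above.

```python
def in_domain_objective(x, z, generator, encoder, discriminator, vgg,
                        lam3=0.00005, lam4=0.01, lam5=2.0):
    """One evaluation of the transformation-relation objective;
    vgg is any fixed feature extractor standing in for VGG features."""
    x_rec = generator(z)                              # second reconstruction result
    rec_loss = (x - x_rec).pow(2).mean()              # seventh error loss
    perc_loss = (vgg(x) - vgg(x_rec)).pow(2).mean()   # eighth error loss
    adv_loss = -discriminator(x_rec).mean()           # ninth error loss
    reg_loss = (z - encoder(x_rec)).pow(2).mean()     # tenth error loss
    return rec_loss + lam3 * perc_loss + lam4 * adv_loss + lam5 * reg_loss
```

In use, this objective would replace the plain reconstruction loss in the gradient-descent sketch shown earlier, so that the coding model and the identification model constrain the optimized encoded data throughout the optimization.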
Through the process, the coding model can be used as the regularization constraint, and the second coded data and the target region space are effectively aligned, so that the possibility that the second coded data exceeds the target region space is further reduced, and the authenticity of the image editing result based on the transformation relation is improved.
By the image editing method and the training method provided in the above disclosed embodiments, the effect of image editing can be effectively improved. When applied to a GAN model, these methods can help the GAN model be used in multiple fields; for example, the second encoded data obtained by the methods of the above disclosed embodiments can be used for data augmentation. Fig. 7 shows a schematic diagram of the effect of an image edited by an image editing method according to an embodiment of the present disclosure. As can be seen from the figure, the image editing method provided in the embodiment of the present disclosure can not only reconstruct an image but also implement various editing operations on it, such as changing age, adding glasses or adding a label, and the editing results are more real and natural.
Fig. 8 shows a block diagram of an image editing apparatus according to an embodiment of the present disclosure. The image editing device can be a terminal device, a server or other processing devices. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the image editing apparatus may be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 8, the image editing apparatus 30 may include:
an image acquisition module 31 for acquiring a first image;
the encoding module 32 is configured to obtain first encoded data of the first image according to the first image;
a transformation module 33, configured to transform the first encoded data to obtain second encoded data of the first image;
and an image generating module 34, configured to generate a second image according to the second encoded data.
In one possible implementation, the generation module is configured to: according to the editing type of the first image, combining the second coded data to obtain data to be edited of the first image; and taking the data to be edited as the input of the image generation model to generate a second image through the image generation model.
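For illustration, one common way such a combination could work is a linear shift of the second encoded data along a direction vector associated with the editing type; this linear form is an assumption for this sketch, not a detail stated in the disclosure.

```python
def edit_image(generator, z_inv, edit_direction, strength=1.0):
    """Combine the second encoded data with an editing type to obtain the
    data to be edited, then generate the second image from it.
    The linear shift is a hypothetical choice for illustration only."""
    z_edit = z_inv + strength * edit_direction  # data to be edited
    return generator(z_edit)                    # second image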
In one possible implementation, the transformation module is configured to: acquiring a transformation relation between first coded data and second coded data; and transforming the first coded data according to the transformation relation to obtain second coded data of the first image.
FIG. 9 shows a block diagram of a training apparatus according to an embodiment of the present disclosure. The training device may be a terminal device, a server or other processing device, etc. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the training apparatus may be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 9, the training device 40 may include:
a training data acquisition module 41, configured to acquire training data;
the coding model training module 42 is configured to train the initial coding model according to the training data to obtain a coding model;
and a transformation relation training module 43, configured to train an initial transformation relation between the first encoded data and the second encoded data according to the training data and the encoding model, so as to obtain a transformation relation.
In one possible implementation, the coding model training module is configured to: training data are sequentially subjected to an initial coding model and an image generation model, and a first reconstruction result corresponding to each training data is obtained; respectively taking each first reconstruction result as the input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model; and training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model.
In one possible implementation, the coding model training module is further configured to: acquiring a first error loss of the initial coding model according to the training data and the first reconstruction result; acquiring a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result; processing the first identification result to obtain a first processing result so as to obtain a third error loss of the initial coding model according to the first processing result; and training the initial coding model according to one or more of the first error loss, the second error loss and the third error loss to obtain the coding model.
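A sketch of how these three encoder-side losses might be combined when training the coding model is shown below; the softplus processing of the first identification result and the weights lam1 and lam2 are illustrative assumptions (lam2 is chosen so that λ4 = λ2/2 holds with the example values given earlier).

```python
import torch.nn.functional as F

def coding_model_loss(encoder, generator, discriminator, vgg, x,
                      lam1=0.00005, lam2=0.02):
    """First, second and third error losses for training the coding model."""
    x_rec = generator(encoder(x))                     # first reconstruction result
    loss1 = (x - x_rec).pow(2).mean()                 # first error loss
    loss2 = (vgg(x) - vgg(x_rec)).pow(2).mean()       # second error loss
    loss3 = F.softplus(-discriminator(x_rec)).mean()  # third error loss
    return loss1 + lam1 * loss2 + lam2 * loss3
```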
In one possible implementation, the coding model training module is further configured to: respectively taking each training data as the input of an identification model to obtain a second identification result corresponding to each training data through the identification model; and training the identification model according to the first identification result and the second identification result.
In one possible implementation, the coding model training module is further configured to: processing the first identification result to obtain a second processing result so as to obtain a fourth error loss of the identification model according to the second processing result; processing the second identification result to obtain a third processing result so as to obtain a fifth error loss of the identification model according to the third processing result; processing the gradient value of the second identification result to obtain a fourth processing result so as to obtain a sixth error loss of the identification model according to the fourth processing result; training the identification model according to one or more of the fourth error loss, the fifth error loss and the sixth error loss.
In one possible implementation, the transformation relation training module is configured to: taking the training data as the input of a coding model to obtain a first coding result corresponding to each training data through the coding model; respectively taking each first coding result as the input of an image generation model, and obtaining a second reconstruction result corresponding to each training data through the image generation model; respectively taking each second reconstruction result as the input of an identification model, so as to obtain a third identification result corresponding to each training data through the identification model; respectively taking each second reconstruction result as the input of a coding model, so as to obtain a second coding result corresponding to each training data through the coding model; and training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
In one possible implementation, the transformation relation training module is further configured to: acquiring a seventh error loss of the initial transformation relation according to the training data and the second reconstruction result; acquiring an eighth error loss of the initial transformation relation according to the characteristics of the training data and the characteristics of the second reconstruction result; acquiring a ninth error loss of the initial transformation relation according to the third identification result; acquiring a tenth error loss of the initial transformation relation according to the first coding result and the second coding result; and training the initial transformation relation according to one or more of the seventh error loss, the eighth error loss, the ninth error loss and the tenth error loss to obtain the transformation relation.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured as the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 10 is a block diagram of an electronic device 800 according to an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or a similar terminal.
Referring to fig. 10, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 11 is a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 11, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. An image editing method, comprising:
acquiring a first image;
obtaining first coded data of the first image according to the first image;
transforming the first coded data to obtain second coded data of the first image;
and generating a second image according to the second coded data.
2. The method of claim 1, wherein generating a second image from the second encoded data comprises:
according to the editing type of the first image, combining the second coded data to obtain data to be edited of the first image;
and taking the data to be edited as an input of an image generation model so as to generate the second image through the image generation model.
3. The method of claim 1 or 2, wherein transforming the first encoded data to obtain second encoded data for the first image comprises:
acquiring a transformation relation between the first coded data and the second coded data;
and transforming the first coded data according to the transformation relation to obtain second coded data of the first image.
4. A training method based on the image editing method according to claim 1, wherein the obtaining of the first encoded data of the first image from the first image comprises:
taking the first image as an input of a coding model to obtain the first coded data through the coding model;
before obtaining first encoded data for the first image from the first image, the method further comprises:
acquiring training data;
training an initial coding model according to the training data to obtain the coding model;
and training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.
5. The method of claim 4, wherein the training an initial coding model according to the training data to obtain the coding model comprises:
sequentially passing the training data through an initial coding model and the image generation model to obtain a first reconstruction result corresponding to each training data;
respectively taking each first reconstruction result as the input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model;
and training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model.
6. The method of claim 5, wherein the training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model comprises:
acquiring a first error loss of the initial coding model according to the training data and the first reconstruction result;
acquiring a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result;
processing the first identification result to obtain a first processing result, so as to obtain a third error loss of the initial coding model according to the first processing result;
and training the initial coding model according to one or more of the first error loss, the second error loss and the third error loss to obtain the coding model.
7. The method according to claim 5 or 6, wherein the training an initial coding model according to the training data to obtain the coding model further comprises:
respectively taking each training data as the input of the identification model, so as to obtain a second identification result corresponding to each training data through the identification model;
and training the identification model according to the first identification result and the second identification result.
8. The method of claim 7, wherein training the authentication model based on the first authentication result and the second authentication result comprises:
processing the first identification result to obtain a second processing result so as to obtain a fourth error loss of the identification model according to the second processing result;
processing the second identification result to obtain a third processing result, so as to obtain a fifth error loss of the identification model according to the third processing result;
processing the gradient value of the second identification result to obtain a fourth processing result, so as to obtain a sixth error loss of the identification model according to the fourth processing result;
training the identification model according to one or more of the fourth error loss, the fifth error loss and the sixth error loss.
9. The method according to any one of claims 4 to 8, wherein training an initial transformation relationship between the first encoded data and the second encoded data according to the training data and the coding model to obtain the transformation relationship comprises:
taking the training data as the input of the coding model, so as to obtain a first coding result corresponding to each training data through the coding model;
respectively taking each first coding result as the input of the image generation model, so as to obtain a second reconstruction result corresponding to each training data through the image generation model;
respectively taking each second reconstruction result as the input of the identification model, so as to obtain a third identification result corresponding to each training data through the identification model;
respectively taking each second reconstruction result as the input of the coding model, so as to obtain a second coding result corresponding to each training data through the coding model;
and training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
10. The method of claim 9, wherein the training the initial transformation relationship according to the training data, the first encoding result, the second reconstruction result, the third identification result, and the second encoding result to obtain the transformation relationship comprises:
acquiring a seventh error loss of the initial transformation relation according to the training data and the second reconstruction result;
acquiring an eighth error loss of the initial transformation relation according to the characteristics of the training data and the characteristics of the second reconstruction result;
obtaining a ninth error loss of the initial transformation relation according to the third identification result;
acquiring a tenth error loss of the initial transformation relation according to the first coding result and the second coding result;
and training the initial transformation relation according to one or more of the seventh error loss, the eighth error loss, the ninth error loss and the tenth error loss to obtain the transformation relation.
11. An image editing apparatus characterized by comprising:
the image acquisition module is used for acquiring a first image;
the encoding module is used for obtaining first encoded data of the first image according to the first image;
the transformation module is used for transforming the first coded data to obtain second coded data of the first image;
and the image generation module is used for generating a second image according to the second coded data.
12. The apparatus of claim 11, wherein the generating module is configured to:
according to the editing type of the first image, combining the second coded data to obtain data to be edited of the first image;
and taking the data to be edited as an input of an image generation model so as to generate the second image through the image generation model.
13. The apparatus of claim 11 or 12, wherein the transformation module is configured to:
acquiring a transformation relation between the first coded data and the second coded data;
and transforming the first coded data according to the transformation relation to obtain second coded data of the first image.
14. A training apparatus based on the image editing apparatus as claimed in claim 11, wherein the encoding module is configured to:
taking the first image as an input of a coding model to obtain the first coded data through the coding model;
the training apparatus includes:
the training data acquisition module is used for acquiring training data;
the coding model training module is used for training an initial coding model according to the training data to obtain the coding model;
and the transformation relation training module is used for training the initial transformation relation between the first coded data and the second coded data according to the training data and the coding model to obtain the transformation relation.
15. The apparatus of claim 14, wherein the coding model training module is configured to:
sequentially passing the training data through an initial coding model and the image generation model to obtain a first reconstruction result corresponding to each training data;
respectively taking each first reconstruction result as the input of an identification model, so as to obtain a first identification result corresponding to each training data through the identification model;
and training the initial coding model according to the training data, the first reconstruction result and the first identification result to obtain the coding model.
16. The apparatus of claim 15, wherein the coding model training module is further configured to:
acquiring a first error loss of the initial coding model according to the training data and the first reconstruction result;
acquiring a second error loss of the initial coding model according to the characteristics of the training data and the characteristics of the first reconstruction result;
processing the first identification result to obtain a first processing result, so as to obtain a third error loss of the initial coding model according to the first processing result;
and training the initial coding model according to one or more of the first error loss, the second error loss and the third error loss to obtain the coding model.
17. The apparatus of claim 15 or 16, wherein the coding model training module is further configured to:
respectively taking each training data as the input of the identification model, so as to obtain a second identification result corresponding to each training data through the identification model;
and training the identification model according to the first identification result and the second identification result.
18. The apparatus of any one of claims 14 to 17, wherein the transformation relation training module is configured to:
taking the training data as the input of the coding model, so as to obtain a first coding result corresponding to each training data through the coding model;
respectively taking each first coding result as the input of the image generation model, so as to obtain a second reconstruction result corresponding to each training data through the image generation model;
respectively taking each second reconstruction result as the input of the identification model, so as to obtain a third identification result corresponding to each training data through the identification model;
respectively taking each second reconstruction result as the input of the coding model, so as to obtain a second coding result corresponding to each training data through the coding model;
and training the initial transformation relation according to the training data, the first coding result, the second reconstruction result, the third identification result and the second coding result to obtain the transformation relation.
19. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 10.
20. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 10.
CN202010074293.3A 2020-01-22 2020-01-22 Image editing and training method and device, electronic equipment and storage medium Pending CN111311483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074293.3A CN111311483A (en) 2020-01-22 2020-01-22 Image editing and training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010074293.3A CN111311483A (en) 2020-01-22 2020-01-22 Image editing and training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111311483A true CN111311483A (en) 2020-06-19

Family

ID=71158199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074293.3A Pending CN111311483A (en) 2020-01-22 2020-01-22 Image editing and training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111311483A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188882A1 (en) * 2017-12-20 2019-06-20 Samsung Electronics Co., Ltd. Method and apparatus for processing image interaction
CN110139109A (en) * 2018-02-08 2019-08-16 北京三星通信技术研究有限公司 The coding method of image and corresponding terminal
CN109377535A (en) * 2018-10-24 2019-02-22 电子科技大学 Facial attribute automatic edition system, method, storage medium and terminal
CN109801344A (en) * 2019-01-03 2019-05-24 深圳壹账通智能科技有限公司 A kind of image processing method and device, storage medium, electronic equipment
CN109920016A (en) * 2019-03-18 2019-06-21 北京市商汤科技开发有限公司 Image generating method and device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207515A1 (en) * 2022-04-29 2023-11-02 北京字跳网络技术有限公司 Image generation method and device, and storage medium and program product


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200619