CN115311152A - Image processing method, image processing apparatus, electronic device, and storage medium


Info

Publication number
CN115311152A
Authority
CN
China
Prior art keywords
image
processed
feature map
feature extraction
sampling
Prior art date
Legal status
Pending
Application number
CN202210161789.3A
Other languages
Chinese (zh)
Inventor
朱飞达
朱俊伟
邰颖
汪铖杰
李季檩
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210161789.3A
Publication of CN115311152A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an image processing method and apparatus, an electronic device, and a storage medium, relating to the technical fields of artificial intelligence, cloud technology, and audio/video. The method includes: acquiring an image to be processed of a target object; and invoking a trained image processing model to perform the following processing on the image to be processed to obtain a target image corresponding to it, the image quality of the target image being higher than that of the image to be processed: performing down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed; determining a style feature map corresponding to the image to be processed according to the image feature map; and performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image. Because a style feature map derived from the image feature map of the image to be processed is fused into the up-sampling process, the quality of the resulting target image is improved while its original style is preserved.

Description

Image processing method, image processing apparatus, electronic device, and storage medium
Technical Field
The present application relates to the technical fields of artificial intelligence, cloud technology, and audio/video, and in particular to an image processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of science and technology and the growth of people's daily needs, images and videos (which are essentially sequences of images) appear in all kinds of everyday scenarios. For example, more and more young people like to record and share their lives through photos or videos. However, during shooting or network transmission, an image may become unclear and of low quality for various reasons, such as inaccurate focusing, excessive noise, or a high compression ratio; for images of people (such as a user's face image) in particular, this can degrade the user's experience. How to improve image quality through image processing technology has therefore become one of the important topics studied by researchers.
At present, although the related art provides many different methods for improving image quality, their processing efficiency is not ideal and needs to be improved.
Disclosure of Invention
Embodiments of the present application provide an image processing method and apparatus, an electronic device, and a storage medium, which can address the problem in the prior art that processing efficiency is not ideal when improving image quality. The technical solution is as follows:
according to an aspect of an embodiment of the present application, there is provided an image processing method including:
acquiring an image to be processed of a target object;
invoking a trained image processing model to perform the following processing on the image to be processed to obtain a target image corresponding to the image to be processed, wherein the image quality of the target image is higher than that of the image to be processed:
performing down-sampling feature extraction on an image to be processed to obtain an image feature map corresponding to the image to be processed;
determining a style characteristic diagram corresponding to the image to be processed according to the image characteristic diagram;
and performing up-sampling feature extraction on the image feature map based on the style feature map to obtain a target image.
Optionally, performing down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed includes:
performing down-sampling feature extraction on the image to be processed to obtain an image feature map of at least one scale corresponding to the image to be processed;
determining a style feature map corresponding to the image to be processed according to the image feature map includes:
for each scale, determining a style feature map corresponding to the scale according to the image feature map corresponding to that scale;
performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image includes:
performing at least one up-sampling feature extraction on a target feature map through at least one up-sampling feature extraction module to obtain the target image, wherein during each up-sampling feature extraction, the style feature map of the corresponding scale is fused into the feature map of that scale obtained by the extraction; the target feature map is the image feature map of the smallest scale among the image feature maps of the at least one scale, and each up-sampling feature extraction module corresponds to the style feature map of one scale.
Optionally, the image feature map of at least one scale includes image feature maps of at least two scales, and the at least one up-sampling feature extraction module includes at least two cascaded up-sampling feature extraction modules;
performing at least one up-sampling feature extraction on the target feature map through the at least one up-sampling feature extraction module to obtain the target image includes:
for each up-sampling feature extraction module, using the style feature map corresponding to that module as an adjustment weight to adjust the network parameters of the module, and performing up-sampling on the input feature map of the module through the adjusted network parameters to obtain the output feature map of the module, with the final output feature map taken as the target image;
wherein the input of the first up-sampling feature extraction module is the target feature map, and the input of each subsequent module is the output feature map of the preceding up-sampling feature extraction module.
Optionally, the image processing model includes an image coding network and an image generation network, the image coding network includes a feature extraction module and a feature mapping module, and invoking the trained image processing model to process the image to be processed to obtain the target image corresponding to the image to be processed includes:
inputting the image to be processed into the feature extraction module, and performing down-sampling feature extraction on it through the feature extraction module to obtain an image feature map of at least one scale corresponding to the image to be processed;
inputting the image feature map of each scale into the feature mapping module to obtain the style feature map corresponding to each scale;
performing at least one up-sampling feature extraction on the target feature map through the image generation network based on the style feature maps of the respective scales and the target feature map, to obtain the target image corresponding to the image to be processed.
Optionally, the image processing model is obtained by training in the following manner:
acquiring first training data and an initial neural network model, wherein the first training data comprises a plurality of image pairs, each image pair comprising a first sample image and a second sample image with the same image content, the image quality of the second sample image in each pair being higher than that of the first sample image, and the initial neural network model comprising an initial image processing model and an initial discriminator;
training the initial neural network model based on the first training data until the value of the total loss function of the model meets a training end condition, obtaining a trained neural network model, and taking the trained image processing model therein as the image processing model;
wherein the input of the initial image processing model is each first sample image and its output is the prediction target image corresponding to each first sample image, and the input of the initial discriminator is the prediction target image and the second sample image corresponding to each first sample image and its output is the discrimination result for that prediction target image and second sample image;
the total loss function comprises a first loss function and a second loss function, the value of the first loss function characterizing the difference between the prediction target image and the second sample image corresponding to each first sample image, and the value of the second loss function characterizing the accuracy of the discrimination results for the prediction target images and the second sample images.
Optionally, the image generation network and the initial discriminator of the initial image processing model are obtained by pre-training with second training data, and when the initial neural network model is trained based on the first training data, the learning rate of the image coding network of the initial image processing model is greater than the learning rate of the image generation network and the initial discriminator of the initial image processing model.
Optionally, the obtaining of the first training data includes:
acquiring each second sample image, and performing image degradation processing on each second sample image to obtain each processed image;
and taking the processed image corresponding to each second sample image as the first sample image corresponding to the second sample image, and obtaining first training data based on each second sample image and the corresponding processed image.
Optionally, the image degradation processing includes at least one of blurring, resolution reduction, noise addition, or image format compression; the first training data includes images obtained through at least two of these image degradation processing modes.
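By way of illustration, the sketch below assembles the four degradation modes named above into a single routine for synthesizing first sample images from high-quality second sample images. It is a minimal sketch assuming OpenCV and NumPy; the parameter ranges (kernel sizes, scale factors, noise levels, JPEG quality) are illustrative assumptions, not values specified by the present application.

    import cv2
    import numpy as np

    def degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        # Synthesize a low-quality first sample image by randomly applying
        # blurring, resolution reduction, noise addition, and JPEG compression.
        h, w = img.shape[:2]
        if rng.random() < 0.7:  # blurring
            k = int(rng.choice([3, 5, 7]))
            img = cv2.GaussianBlur(img, (k, k), 0)
        if rng.random() < 0.7:  # resolution reduction (downscale, then restore size)
            s = rng.uniform(0.25, 0.5)
            img = cv2.resize(img, (max(1, int(w * s)), max(1, int(h * s))))
            img = cv2.resize(img, (w, h))
        if rng.random() < 0.7:  # noise addition
            noise = rng.normal(0.0, rng.uniform(1.0, 10.0), img.shape)
            img = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
        if rng.random() < 0.7:  # image format compression (JPEG round-trip)
            q = int(rng.integers(30, 80))
            _, buf = cv2.imencode(".jpg", img, [int(cv2.IMWRITE_JPEG_QUALITY), q])
            img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        return img

Applying such a routine to each second sample image directly yields the image pairs described above, with the training set as a whole covering at least two degradation modes.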
According to another aspect of embodiments of the present application, there is provided an image processing apparatus including:
the image acquisition module is used for acquiring an image to be processed of the target object;
the image processing module is used for invoking the trained image processing model to: perform down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed; determine a style feature map corresponding to the image to be processed according to the image feature map; and perform up-sampling feature extraction on the image feature map based on the style feature map to obtain a target image, wherein the image quality of the target image is higher than that of the image to be processed.
Optionally, when performing down-sampling feature extraction on the image to be processed to obtain the image feature map corresponding to the image to be processed, the image processing module is specifically configured to:
performing down-sampling feature extraction on an image to be processed to obtain an image feature map of at least one scale corresponding to the image to be processed;
when determining the style feature map corresponding to the image to be processed according to the image feature map, the image processing module is specifically configured to:
for each scale, determining a style characteristic diagram corresponding to the scale according to the image characteristic diagram corresponding to the scale;
when performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image, the image processing module is specifically configured to:
perform at least one up-sampling feature extraction on a target feature map through at least one up-sampling feature extraction module to obtain the target image, wherein during each up-sampling feature extraction, the style feature map of the corresponding scale is fused into the feature map of that scale obtained by the extraction; the target feature map is the image feature map of the smallest scale among the image feature maps of the at least one scale, and each up-sampling feature extraction module corresponds to the style feature map of one scale.
Optionally, the image feature map of at least one scale includes image feature maps of at least two scales, and the at least one up-sampling feature extraction module includes at least two cascaded up-sampling feature extraction modules;
when performing at least one up-sampling feature extraction on the target feature map through the at least one up-sampling feature extraction module to obtain the target image, the image processing module is specifically configured to:
for each up-sampling feature extraction module, use the style feature map corresponding to that module as an adjustment weight to adjust the network parameters of the module, and perform up-sampling on the input feature map of the module through the adjusted network parameters to obtain the output feature map of the module, the final output feature map being taken as the target image;
wherein the input of the first up-sampling feature extraction module is the target feature map, and the input of each subsequent module is the output feature map of the preceding up-sampling feature extraction module.
Optionally, the image processing model includes an image coding network and an image generation network, the image coding network includes a feature extraction module and a feature mapping module, and, when invoking the trained image processing model to process the image to be processed to obtain the target image corresponding to the image to be processed, the image processing module is specifically configured to:
input the image to be processed into the feature extraction module, and perform down-sampling feature extraction on it through the feature extraction module to obtain an image feature map of at least one scale corresponding to the image to be processed;
input the image feature map of each scale into the feature mapping module to obtain the style feature map corresponding to each scale;
perform at least one up-sampling feature extraction on the target feature map through the image generation network based on the style feature maps of the respective scales and the target feature map, to obtain the target image corresponding to the image to be processed.
Optionally, the image processing model is obtained by a model training module through training in the following manner:
acquiring first training data and an initial neural network model, wherein the first training data comprises a plurality of image pairs, each image pair comprising a first sample image and a second sample image with the same image content, the image quality of the second sample image in each pair being higher than that of the first sample image, and the initial neural network model comprising an initial image processing model and an initial discriminator;
training the initial neural network model based on the first training data until the value of the total loss function of the model meets a training end condition, obtaining a trained neural network model, and taking the trained image processing model therein as the image processing model;
wherein the input of the initial image processing model is each first sample image and its output is the prediction target image corresponding to each first sample image, and the input of the initial discriminator is the prediction target image and the second sample image corresponding to each first sample image and its output is the discrimination result for that prediction target image and second sample image;
the total loss function comprises a first loss function and a second loss function, the value of the first loss function characterizing the difference between the prediction target image and the second sample image corresponding to each first sample image, and the value of the second loss function characterizing the accuracy of the discrimination results for the prediction target images and the second sample images.
Optionally, the image generation network and the initial discriminator of the initial image processing model are obtained by pre-training with second training data, and when the initial neural network model is trained based on the first training data, the learning rate of the image coding network of the initial image processing model is greater than the learning rates of the image generation network and the initial discriminator of the initial image processing model.
Optionally, when the model training module acquires the first training data, the model training module is specifically configured to:
acquiring each second sample image, and performing image degradation processing on each second sample image to obtain each processed image;
and taking the processed image corresponding to each second sample image as the first sample image corresponding to the second sample image, and obtaining first training data based on each second sample image and the corresponding processed image.
Optionally, the image degradation processing includes at least one of blurring, resolution reduction, noise addition, or image format compression; the first training data includes images obtained through at least two of these image degradation processing modes.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored on the memory, the processor executing the computer program to implement the steps of any of the image processing methods described above.
According to still another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the steps of any of the image processing methods described above.
According to yet another aspect of the embodiments of the present application, there is provided a computer program product which, when executed by a processor, implements the steps of any of the image processing methods described above.
The beneficial effects of the technical solutions provided by the embodiments of the present application are as follows:
In the embodiments of the present application, when an image to be processed is processed by the image processing model, a style feature map corresponding to the image to be processed is derived from its image feature map, and that style feature map is fused into the up-sampling process that produces the high-quality target image. As a result, the target image conforms more closely to the original style of the image to be processed: its quality is improved while its style characteristics remain realistic, which better meets the needs of practical applications.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of the system architecture of an image processing system to which the image processing method provided in an embodiment of the present application is applicable;
Fig. 2a is a schematic diagram of an image to be processed according to an embodiment of the present application;
Fig. 2b is a schematic diagram of a target image according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image processing method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a GAN network according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a StyleGAN v2 network according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an image processing model according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include being wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; for example, "A and/or B" or "at least one of A or B" can be implemented as "A", as "B", or as "A and B".
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Optionally, the image processing method provided in the embodiments of the present application may be implemented based on Artificial Intelligence (AI) technology; for example, after the image to be processed is acquired, it may be processed by the trained image processing model to obtain a higher-quality target image. Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The method provided by the embodiments of the present application may specifically be implemented based on Computer Vision (CV) technology within artificial intelligence. Computer vision is the science of how to make machines "see": using cameras and computers in place of human eyes to recognize, track, and measure targets, and further processing the captured images into forms better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
The image processing model referred to in the embodiments of the present application may be obtained through machine learning training. Machine Learning (ML) studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Optionally, the data processing in the embodiments of the present application may be implemented based on cloud technology; for example, cloud computing may be used for the data computation involved in model training. Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is called the "cloud"; to users, resources in the cloud appear infinitely expandable and can be obtained at any time, used on demand, expanded at any time, and paid for according to use.
The technical solutions of the embodiments of the present application and the technical effects they produce are described below through several exemplary embodiments. It should be noted that the following embodiments may reference, draw on, or be combined with one another, and descriptions of the same terms, similar features, similar implementation steps, and the like are not repeated across embodiments.
It should be noted that the image to be processed in the embodiments of the present application may be any image that needs processing; that is, the embodiments do not limit the specific form of the target object in the image to be processed. Optionally, the image to be processed may be an object image, a person image, an image of a part of a person such as a face image, or a landscape image.
In an optional embodiment of the present application, where data related to users is involved (such as a user's face image), user permission or consent must be obtained when the above embodiments are applied to specific products or technologies, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The image processing method provided by the present application may be executed by any electronic device, including but not limited to a user terminal or a server. Optionally, the method may be implemented as a standalone application (e.g., image processing software) or as a functional module/plug-in of an existing application. When a user needs to edit an image, the user terminal may restore the image to be processed into a target image by executing the computer program corresponding to the functional module/plug-in; alternatively, after obtaining the image to be processed selected by the user through the application's client, the user terminal may send it to the application's server, which restores the image by executing the corresponding computer program and returns the restored target image to the user terminal.
The embodiments of the present application do not limit what kind of terminal device the user terminal is; it may include, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, a wearable electronic device, an AR/VR device, and the like. The server may be a cloud server or a physical server, and may be a single server or a server cluster.
The following describes an image processing method provided by the present application with reference to the drawings and optional embodiments in the embodiments of the present application.
Fig. 1 is a schematic diagram of the system architecture of an image processing system to which the image processing method provided in the embodiments of the present application is applicable. As shown in Fig. 1, the system may include a terminal device 10 (i.e., a user terminal) and a server 20, which exchange data through a network; the trained image processing model may be deployed in the server 20. After acquiring an image to be processed, the terminal device 10 sends it to the server 20; the image to be processed is an image of low quality, such as the blurred image shown in Fig. 2a. After receiving the image, the server 20 may invoke the trained image processing model to process it and obtain the corresponding target image. The image quality of the resulting target image (shown in Fig. 2b) is improved compared with the image to be processed, and the target image conforms to the original style of the image to be processed and retains realistic style characteristics. The server 20 then returns the target image to the terminal device 10, which presents it to the user.
Based on the image processing system shown in Fig. 1, an optional embodiment of the image processing method provided by the present application is described below, taking a server as the execution subject. As shown in Fig. 3, an embodiment of the present application provides an image processing method, which may include:
step S201, an image to be processed of the target object is acquired.
The target object may refer to any object such as a person, an animal, a plant, and a building, and the type of the target object is not limited in the embodiments of the present application. The image to be processed may be any image whose quality needs to be improved, may be an image acquired in real time by an image acquisition device, and may also be a frame image extracted from a video, which is not limited in the embodiment of the present application. Optionally, the image to be processed may be a face image, and for convenience of description, in some example descriptions below, the face image is taken as an example for description.
Step S202, invoking the trained image processing model to perform the processing of steps S301 to S303 on the image to be processed to obtain a target image corresponding to the image to be processed, wherein the image quality of the target image is higher than that of the image to be processed.
The image quality may be characterized by one or more image indexes, such as the resolution or the sharpness of the image. Optionally, the image quality of the target image being higher than that of the image to be processed may mean that at least one image index of the target image is better than that of the image to be processed; for example, the sharpness of the target image may be higher than that of the image to be processed.
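By way of illustration, a common proxy for the sharpness index mentioned above is the variance of an image's Laplacian. The snippet below is a minimal sketch assuming OpenCV; the file names are hypothetical, and this particular metric is an illustrative choice rather than one prescribed by the present application.

    import cv2

    def sharpness(path: str) -> float:
        # Variance of the Laplacian: higher values indicate a sharper image.
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if gray is None:
            raise FileNotFoundError(path)
        return float(cv2.Laplacian(gray, cv2.CV_64F).var())

    # A target image is expected to score higher than the image to be processed.
    print(sharpness("target.png") > sharpness("to_be_processed.png"))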
Next, steps S301 to S303 for processing the image to be processed by the trained image processing model will be described.
Step S301, performing down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed.
Step S302, determining a style characteristic diagram corresponding to an image to be processed according to the image characteristic diagram;
Optionally, down-sampling feature extraction is performed on the acquired image to be processed to obtain an image feature map corresponding to it, and the style feature map corresponding to the image to be processed is determined based on that image feature map. For example, when images of a person are taken with different shooting parameters or in different shooting environments, some of the resulting images may be sharper and others blurrier; their image feature maps will differ, and different style feature maps can accordingly be extracted from them to reflect, for instance, the degree of blur. The image to be processed may be a face image, in which case its style may be understood as information affecting visual characteristics such as skin tone. Correspondingly, the style feature map may be understood as a feature vector of these style-influencing factors derived from the image feature map of the image to be processed, i.e., a mathematical representation of those factors.
Step S303, performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image.
Optionally, after the image feature map corresponding to the image to be processed is obtained, up-sampling feature extraction may be performed on it. To make the final target image conform more closely to the original style of the image to be processed, so that it retains realistic style characteristics, the up-sampling feature extraction is performed on the image feature map according to the determined style feature map, yielding the target image. The size of the target image may be the same as or different from that of the image to be processed; the embodiments of the present application are not limited in this respect.
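To make the three steps concrete, the skeleton below sketches the inference flow in PyTorch. The attribute names encoder, mapper, and generator are hypothetical placeholders, since the present application does not prescribe a specific framework or naming.

    import torch

    @torch.no_grad()
    def process(model, image: torch.Tensor) -> torch.Tensor:
        # Step S301: down-sampling feature extraction on the image to be processed.
        feature_maps = model.encoder(image)               # one image feature map per scale
        # Step S302: derive a style feature map per scale from the image feature maps.
        style_maps = [model.mapper(f) for f in feature_maps]
        # Step S303: up-sampling feature extraction fusing the style maps yields the target image.
        return model.generator(feature_maps[-1], style_maps)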
In the embodiments of the present application, when an image to be processed is processed by the image processing model, a style feature map corresponding to the image to be processed is derived from its image feature map, and that style feature map is fused into the up-sampling process that produces the high-quality target image. As a result, the target image conforms more closely to the original style of the image to be processed: its quality is improved while its style characteristics remain realistic, which better meets the needs of practical applications.
An embodiment of the present application provides a possible implementation, in which performing down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed includes:
performing down-sampling feature extraction on an image to be processed to obtain an image feature map of at least one scale corresponding to the image to be processed;
determining a style feature map corresponding to the image to be processed according to the image feature map includes:
for each scale, determining a style feature map corresponding to the scale according to the image feature map corresponding to that scale;
performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image includes:
performing at least one up-sampling feature extraction on a target feature map through at least one up-sampling feature extraction module to obtain the target image, wherein during each up-sampling feature extraction, the style feature map of the corresponding scale is fused into the feature map of that scale obtained by the extraction; the target feature map is the image feature map of the smallest scale among the image feature maps of the at least one scale, and each up-sampling feature extraction module corresponds to the style feature map of one scale.
Optionally, when down-sampling feature extraction is performed on the image to be processed, at least one down-sampling feature extraction may be performed, yielding an image feature map of at least one scale corresponding to the image to be processed; whenever the image feature map of one scale is obtained, the style feature map of that scale may be determined from it. For example, when the image feature map corresponding to scale A is obtained, the style feature map corresponding to scale A may be determined from that image feature map.
Further, the image feature map of the smallest scale among the image feature maps obtained during down-sampling feature extraction may be used as the target feature map, and at least one up-sampling feature extraction module performs at least one up-sampling feature extraction on it. Each up-sampling feature extraction module corresponds to the style feature map of one scale, where the corresponding scale may refer either to the scale of the module's input feature map or to the scale of its output feature map; the embodiments of the present application are not limited in this respect. For example, suppose an up-sampling feature extraction module maps a 4 × 4 image feature map (4 × 4 being the size of the feature map) to an 8 × 8 image feature map; the scale corresponding to this module may then be either 4 × 4 or 8 × 8. It should be noted that the meaning of the corresponding scale should be the same for all modules; for example, if the scale corresponding to one module refers to the scale of its input feature map, the scales corresponding to all modules should likewise refer to their input feature maps.
Further, when at least one up-sampling feature extraction is performed on the target feature map by the at least one up-sampling feature extraction module, during each up-sampling feature extraction the style feature map of the corresponding scale can be fused into the feature map of that scale obtained by the extraction. For example, if the scale corresponding to an up-sampling feature extraction module is A, the style feature map of scale A may be fused into the image feature map produced by that module.
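A minimal sketch of such a multi-scale encoder is given below, assuming PyTorch: each stride-2 convolution halves the scale, and a small mapping head turns the image feature map of each scale into that scale's style feature map. The channel widths, depth, and style dimensionality are illustrative assumptions.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, ch: int = 64, levels: int = 3, style_dim: int = 512):
            super().__init__()
            self.downs = nn.ModuleList()
            c_in = 3
            for _ in range(levels):
                self.downs.append(nn.Sequential(
                    nn.Conv2d(c_in, ch, 3, stride=2, padding=1), nn.LeakyReLU(0.2)))
                c_in = ch
            # one mapping head per scale: image feature map -> style feature map (vector)
            self.mappers = nn.ModuleList(
                [nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                               nn.Linear(ch, style_dim)) for _ in range(levels)])

        def forward(self, x):
            feats, styles = [], []
            for down, mapper in zip(self.downs, self.mappers):
                x = down(x)
                feats.append(x)           # image feature map at this scale
                styles.append(mapper(x))  # style feature map for this scale
            return feats, styles          # feats[-1] is the smallest-scale target feature map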
An embodiment of the present application provides a possible implementation, in which the image feature map of at least one scale includes image feature maps of at least two scales, and the at least one up-sampling feature extraction module includes at least two cascaded up-sampling feature extraction modules;
performing at least one up-sampling feature extraction on the target feature map through the at least one up-sampling feature extraction module to obtain the target image includes:
for each up-sampling feature extraction module, using the style feature map corresponding to that module as an adjustment weight to adjust the network parameters of the module, and performing up-sampling on the input feature map of the module through the adjusted network parameters to obtain the output feature map of the module, with the final output feature map taken as the target image;
wherein the input of the first up-sampling feature extraction module is the target feature map, and the input of each subsequent module is the output feature map of the preceding up-sampling feature extraction module.
Optionally, in the embodiments of the present application, down-sampling feature extraction may be performed on the image to be processed at least twice, yielding image feature maps of at least two scales, and at least two cascaded up-sampling feature extraction modules may be used for up-sampling feature extraction. When the target image is obtained through these modules, the target feature map is input to the first up-sampling feature extraction module; for this module, the style feature map corresponding to it is used as an adjustment weight to adjust its network parameters, and the input target feature map is up-sampled through the adjusted parameters to obtain the module's output feature map. This output feature map is input to the second up-sampling feature extraction module, which, on the same principle, uses its corresponding style feature map as an adjustment weight to adjust its network parameters and up-samples its input through the adjusted parameters to obtain its output feature map. This continues until the output feature map of the last up-sampling feature extraction module is obtained, and that output feature map is taken as the target image.
In an example, suppose down-sampling feature extraction is performed twice on the image to be processed, down-sampling a 16 × 16 image feature map to an 8 × 8 image feature map and then to a 4 × 4 image feature map, and that two cascaded up-sampling feature extraction modules are used, each corresponding to the scale (size) of its input feature map. The style feature map of the 8 × 8 scale can be determined from the 8 × 8 image feature map, and the style feature map of the 4 × 4 scale from the 4 × 4 image feature map. Correspondingly, the 4 × 4 image feature map is input to the first up-sampling feature extraction module as the target feature map; using the 4 × 4 style feature map as the adjustment weight, the module's network parameters are adjusted, and the input 4 × 4 feature map is up-sampled through the adjusted parameters to obtain an 8 × 8 image feature map. The 8 × 8 feature map is then input to the second up-sampling feature extraction module, which, with its parameters adjusted using the 8 × 8 style feature map, up-samples it to obtain a 16 × 16 image feature map, which is taken as the target image.
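Since Fig. 5 references a StyleGAN v2 network, one plausible reading of "using the style feature map as an adjustment weight" is StyleGAN v2-style weight modulation. The block below is a simplified sketch of that idea (it averages the style over the batch instead of using per-sample grouped convolution); the exact modulation scheme is an assumption, not the patent's prescribed implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModulatedUpBlock(nn.Module):
        # Doubles the spatial size of the input feature map; the style vector
        # scales (modulates) the convolution weights before they are applied.
        def __init__(self, c_in: int, c_out: int, style_dim: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(c_out, c_in, 3, 3) * 0.02)
            self.affine = nn.Linear(style_dim, c_in)  # style -> per-channel scales

        def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            s = self.affine(style).mean(dim=0)     # batch-averaged simplification
            w = self.weight * s.view(1, -1, 1, 1)  # adjust the network parameters
            w = w / w.flatten(1).norm(dim=1).clamp_min(1e-8).view(-1, 1, 1, 1)  # demodulate
            return F.leaky_relu(F.conv2d(x, w, padding=1), 0.2)

Cascading two such blocks reproduces the 4 × 4 to 8 × 8 to 16 × 16 example above, with each block receiving the style feature map of its input scale.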
As an optional embodiment, after down-sampling feature extraction is performed at least twice on the image to be processed to obtain image feature maps of at least two scales, the target feature map may be used as the initial feature map to be adjusted. The style feature map corresponding to the scale of the feature map to be adjusted is used as an adjustment weight to adjust that feature map, yielding an adjusted feature map, and up-sampling feature extraction is then performed on the adjusted feature map to obtain a new feature map. This new feature map is taken as the next feature map to be adjusted, and the process repeats: the feature map is adjusted with the style feature map of its scale used as the adjustment weight, and up-sampling feature extraction is performed on the adjusted feature map. The process continues until the number of up-sampling feature extractions equals the number of style feature maps, and the feature map obtained by the last up-sampling feature extraction is taken as the target image. In other words, rather than adjusting the network parameters, this embodiment uses the style feature map of each scale as an adjustment weight applied directly to the feature map of the corresponding scale obtained during the up-sampling process.
In an example, suppose down-sampling feature extraction is performed three times on the image to be processed, yielding feature maps of three scales: a 16 × 16 image feature map (16 × 16 being the size of the feature map), an 8 × 8 image feature map, and a 4 × 4 image feature map. The style feature maps corresponding to 16 × 16, 8 × 8, and 4 × 4 can be determined from the feature maps of the respective scales. Correspondingly, the 4 × 4 image feature map is used as the target feature map and adjusted with the 4 × 4 style feature map as the adjustment weight, and up-sampling feature extraction on the adjusted feature map yields an 8 × 8 image feature map. The 8 × 8 feature map is adjusted with the 8 × 8 style feature map as the adjustment weight and up-sampled to yield a 16 × 16 image feature map; the 16 × 16 feature map is adjusted with the 16 × 16 style feature map as the adjustment weight and up-sampled to yield a 32 × 32 image feature map, which is the target image.
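Unlike the weight-modulation sketch above, this alternative embodiment adjusts the feature map itself. One simple realization is a channel-wise scale predicted from the style feature map, as sketched below; the choice of channel-wise scaling is an assumption made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureAdjustUpBlock(nn.Module):
        # Adjusts the incoming feature map with the style feature map of its
        # scale, then performs one up-sampling feature extraction (scale x2).
        def __init__(self, c_in: int, c_out: int, style_dim: int):
            super().__init__()
            self.scale = nn.Linear(style_dim, c_in)  # style -> channel-wise adjustment weights
            self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)

        def forward(self, x: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
            w = self.scale(style).unsqueeze(-1).unsqueeze(-1)  # (B, c_in, 1, 1)
            x = x * w                                          # adjust the feature map
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            return F.leaky_relu(self.conv(x), 0.2)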
An embodiment of the present application provides a possible implementation, in which the image processing model includes an image coding network and an image generation network, the image coding network includes a feature extraction module and a feature mapping module, and invoking the trained image processing model to process the image to be processed to obtain the target image corresponding to the image to be processed includes:
inputting the image to be processed into the feature extraction module, and performing down-sampling feature extraction on it through the feature extraction module to obtain an image feature map of at least one scale corresponding to the image to be processed;
inputting the image feature map of each scale into the feature mapping module to obtain the style feature map corresponding to each scale;
performing at least one up-sampling feature extraction on the target feature map through the image generation network based on the style feature maps of the respective scales and the target feature map, to obtain the target image corresponding to the image to be processed.
Optionally, the method provided by the embodiments of the present application may be implemented by an image processing model comprising an image coding network and an image generation network. The image coding network may include a feature extraction module and a feature mapping module, and the image generation network includes at least one up-sampling feature extraction module for performing at least one up-sampling feature extraction on an input feature map. Correspondingly, when the target image corresponding to the image to be processed is obtained based on the image processing model, the image to be processed is input to the feature extraction module for at least one down-sampling feature extraction, yielding an image feature map of at least one scale; the image feature map of each scale is then input to the feature mapping module to obtain the style feature map of that scale. The style feature maps of the respective scales and the target feature map are input to the image generation network, whose up-sampling feature extraction modules perform at least one up-sampling feature extraction on the target feature map; during the up-sampling feature extraction at each scale, the style feature map of the corresponding scale is used to adjust the network weights (such as convolution weights) used for that extraction, so that the target image is obtained after the up-sampling feature extractions. Correspondingly, for different style feature maps, the image generation network enhances the quality of the input image to different degrees.
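Wiring the Encoder and ModulatedUpBlock sketches above together gives an end-to-end picture of the model. This is again an illustrative assembly under the earlier assumptions (it requires those two classes to be in scope), not the patent's reference implementation.

    import torch
    import torch.nn as nn

    class ImageProcessingModel(nn.Module):
        # Image coding network (encoder) plus image generation network
        # (cascaded up-sampling feature extraction modules).
        def __init__(self, levels: int = 3, ch: int = 64, style_dim: int = 512):
            super().__init__()
            self.encoder = Encoder(ch, levels, style_dim)
            self.blocks = nn.ModuleList(
                [ModulatedUpBlock(ch, ch, style_dim) for _ in range(levels)])
            self.to_rgb = nn.Conv2d(ch, 3, 1)

        def forward(self, img: torch.Tensor) -> torch.Tensor:
            feats, styles = self.encoder(img)
            x = feats[-1]                      # smallest-scale target feature map
            for block, style in zip(self.blocks, reversed(styles)):
                x = block(x, style)            # fuse the style feature map of each scale
            return torch.tanh(self.to_rgb(x))  # target image

    model = ImageProcessingModel()
    restored = model(torch.rand(1, 3, 32, 32))  # e.g., a 32 x 32 image to be processed

With three down-sampling levels this reproduces the 4 × 4 to 8 × 8 to 16 × 16 to 32 × 32 progression of the example above.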
An embodiment of the present application provides a possible implementation, in which the image processing model is obtained by training in the following manner:
acquiring first training data and an initial neural network model, wherein the first training data comprises a plurality of image pairs, each image pair comprising a first sample image and a second sample image with the same image content, the image quality of the second sample image in each pair being higher than that of the first sample image, and the initial neural network model comprising an initial image processing model and an initial discriminator;
training the initial neural network model based on the first training data until the value of the total loss function of the model meets a training end condition, obtaining a trained neural network model, and taking the trained image processing model therein as the image processing model;
wherein the input of the initial image processing model is each first sample image and its output is the prediction target image corresponding to each first sample image, and the input of the initial discriminator is the prediction target image and the second sample image corresponding to each first sample image and its output is the discrimination result for that prediction target image and second sample image;
the total loss function comprises a first loss function and a second loss function, the value of the first loss function characterizing the difference between the prediction target image and the second sample image corresponding to each first sample image, and the value of the second loss function characterizing the accuracy of the discrimination results for the prediction target images and the second sample images.
Optionally, the image processing model may be obtained by training an initial neural network model based on the obtained first training data. The first training data comprises a plurality of image pairs, each image pair comprises a first sample image and a second sample image corresponding to the first sample image, the image contents of the first sample image and the second sample image are the same, the image quality of the second sample image is higher than that of the first sample image, and the initial neural network model comprises an initial image processing model and an initial discriminator.
Further, when the initial neural network model is trained, each first sample image may be input to the initial image processing model to obtain the prediction target image corresponding to each first sample image. A first loss function may then be determined based on the prediction target image and the second sample image corresponding to each first sample image; the value of the first loss function represents the difference between the prediction target image and the corresponding second sample image, and a smaller value of the first loss function indicates a smaller difference between the second sample image and the corresponding prediction target image. Then, the prediction target image and the second sample image corresponding to each first sample image may be respectively input to the initial discriminator to obtain the discrimination results of each prediction target image and each second sample image, and a second loss function may be determined based on these discrimination results; the value of the second loss function represents the accuracy of the discrimination results, and a smaller value of the second loss function indicates more accurate discrimination between the prediction target images and the second sample images. The total loss function corresponding to the initial neural network model comprises the first loss function and the second loss function, and whether the value of the total loss function meets the training end condition can be judged based on the values of the two loss functions. If not, the network parameters of the initial image processing model can be adjusted to obtain an adjusted image processing model, and the adjusted image processing model is then trained further based on the first training data until the value of the corresponding total loss function meets the training end condition, at which point the discriminator judges real images as true while the generated target images can fool the discriminator. The training end condition may be set according to actual needs, which is not limited in the embodiment of the present application; for example, the condition may be that the value of the total loss function is smaller than a set threshold. In the embodiment of the present application, the value of the first loss function is the generation loss and the value of the second loss function can be understood as the discrimination loss, and the training goal of the initial neural network model is to make both losses as small as possible; that is, training of the model is constrained by loss functions of two different dimensions, which can better improve the performance of the image processing model.
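The alternating optimization described above can be sketched as follows, assuming a callable `model` (the image processing model) and `discriminator`; the use of binary cross-entropy for the adversarial term and the equal weighting of the two losses are assumptions of this sketch rather than the patent's exact procedure.

```python
import torch
import torch.nn.functional as F

def train_step(model, discriminator, opt_g, opt_d, lq, hq):
    # lq: first sample image (low quality); hq: second sample image (high quality)
    # --- discriminator update ---
    with torch.no_grad():
        pred = model(lq)                       # prediction target image
    d_real = discriminator(hq)
    d_fake = discriminator(pred)
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- image processing model (generator) update ---
    pred = model(lq)
    l1 = F.l1_loss(pred, hq)                   # first loss: difference to second sample image
    d_fake = discriminator(pred)
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    loss_g = l1 + adv                          # total loss: first + second loss terms
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```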
Optionally, the total loss function in the embodiment of the present application may be expressed as:
L = L_GAN + L_LPIPS
wherein L represents the total loss function, L_LPIPS represents the first loss function, and L_GAN represents the second loss function. Which specific loss function is used as the first loss function is not limited in the present application; optionally, the first loss function may be an L1 loss.
Optionally, for one image pair, L_LPIPS can be expressed as:
L_LPIPS = |G(input) - GT|_1
wherein G(input) represents the prediction target image corresponding to the first sample image, GT represents the second sample image corresponding to the first sample image, and |G(input) - GT|_1 represents the L1 loss between G(input) and GT, i.e., the loss corresponding to an image pair is the sum of the absolute differences in pixel values over all pixels at corresponding positions in the two images.
Optionally, L_GAN can be expressed as:
L_GAN = E[log D(GT)] + E[log(1 - D(G(input)))]
wherein D(GT) represents the discrimination result of the second sample image, D(G(input)) represents the discrimination result of the prediction target image, E[log D(GT)] represents the expectation of log D(GT) over the second sample images, and E[log(1 - D(G(input)))] represents the expectation of log(1 - D(G(input))) over the prediction target images.
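The two loss terms above can be transcribed directly as in the hedged sketch below, where `d_real` and `d_fake` are assumed to be discriminator outputs in (0, 1) and `eps` guards the logarithm; note that the discriminator is trained to maximize the second term while the generator is trained to minimize it.

```python
import torch

def first_loss(pred, gt):
    # L_LPIPS = |G(input) - GT|_1 : mean absolute pixel difference
    return (pred - gt).abs().mean()

def second_loss(d_real, d_fake, eps=1e-8):
    # L_GAN = E[log D(GT)] + E[log(1 - D(G(input)))]
    # d_real = D(GT), d_fake = D(G(input)), both assumed to lie in (0, 1)
    return torch.log(d_real + eps).mean() + torch.log(1.0 - d_fake + eps).mean()

def total_loss(pred, gt, d_real, d_fake):
    # L = L_GAN + L_LPIPS
    return second_loss(d_real, d_fake) + first_loss(pred, gt)
```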
The embodiment of the application provides a possible implementation manner, the image generation network and the initial discriminator of the initial image processing model are obtained by adopting second training data for pre-training, and when the initial neural network model is trained based on the first training data, the learning rate of the image coding network of the initial image processing model is greater than that of the image generation network and the initial discriminator of the initial image processing model.
Optionally, in practical application, the structure of each part of the image processing model is not limited in the embodiment of the present application and may be selected according to the requirements of the practical application. Any generation network that can generate high-definition portraits and can be inserted into the portrait restoration framework in an encoding-like manner may be used as the image generation network of the initial image processing model. For example, as shown in fig. 4, a GAN (Generative Adversarial Network) can generate highly realistic, high-definition, high-quality images: it maps a simple distribution (e.g., Gaussian noise) to a complex distribution close to the real image distribution. The generation network of a GAN can be inserted into the initial image processing model as its image generation network, and the discriminator of the GAN can serve as the initial discriminator. In recent years, the quality of images generated by GANs has kept improving; for example, PGAN (Progressive GAN), BigGAN, StyleGAN, and StyleGAN v2 (StyleGAN version 2) can generate images with strong realism, and the generation network of any of PGAN, BigGAN, StyleGAN, and StyleGAN v2 can be used as the image generation network of the initial image processing model.
In an example, assume that the generation network of the StyleGAN v2 network is used as the image generation network of the initial image processing model, and the training data for training the StyleGAN v2 network is the second training data. When training the StyleGAN v2 network, a schematic diagram of the network structure to be trained may be as shown in fig. 5; the network structure may include the mapping network and the generation network of the StyleGAN v2 network, and the discriminator to be pre-trained. In the StyleGAN v2 network, random noise z is mapped to a hidden variable w (a vector) through the mapping network; a 4 × 4 feature map is generated as the initial input of the generation network; the resolution doubles each time the feature map passes through an up-sampling module and, after 8 up-sampling modules in total, is finally raised to 1024 × 1024; and in each up-sampling module, the hidden variable w modulates the input feature map by AdaIN (adaptive instance normalization), thereby influencing the appearance of the finally generated image. The process of generating the final image from the noise z can be expressed as:
w=Mapping(z)
Image=Generator(w)
here, mapping (z) indicates that random noise z is mapped to a hidden variable w, and Image indicates that an Image is generated.
After training based on the second training data, the image generation network and the discriminator that satisfy the training end condition may be used as the image generation network and the initial discriminator of the initial neural network model, and the initial neural network model may then be trained based on the first training data. Since the image generation network and the initial discriminator have been pre-trained, the image coding network of the initial image processing model, the image generation network of the initial image processing model, and the initial discriminator may be trained with different learning rates in order to improve training efficiency. For example, the learning rate of the image coding network of the initial image processing model may be set to be greater than the learning rates of the image generation network of the initial image processing model and of the initial discriminator (for example, the learning rate of the image coding network may be set to 100 times that of the image generation network and the initial discriminator).
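One way to realize such differentiated learning rates is through optimizer parameter groups, as in the hedged sketch below; the attribute names and concrete rates are assumptions, with the 100-fold gap following the example ratio above.

```python
import torch

# `model` (with encoder / generator sub-modules) and `discriminator` are the
# objects assumed in the training sketch earlier; the attribute names are
# illustrative, not the patent's actual code.
opt_g = torch.optim.Adam([
    {"params": model.encoder.parameters(),   "lr": 1e-3},  # image coding network
    {"params": model.generator.parameters(), "lr": 1e-5},  # pre-trained generation network
])
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-5)  # pre-trained discriminator
```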
The embodiment of the present application provides a possible implementation manner, and acquiring first training data includes:
acquiring each second sample image, and performing image degradation processing on each second sample image to obtain each processed image;
and taking the processed image corresponding to each second sample image as the first sample image corresponding to the second sample image, and obtaining first training data based on each second sample image and the corresponding processed image.
Further, each second sample image may be acquired and subjected to image degradation processing to obtain the corresponding processed image; the processed image corresponding to each second sample image may then be used as the first sample image corresponding to that second sample image, so as to obtain the first training data. It can be understood that the first sample images in the first training data include images processed by at least two image degradation processing manners; such images may be obtained by applying different image degradation processing manners to one second sample image, or by applying different image degradation processing manners to different second sample images.
The embodiment of the application provides a possible implementation manner, in which the image degradation processing comprises at least one of adding blur processing, reducing image resolution processing, adding noise processing, or image format compression processing; the first training data comprises images obtained by processing in at least two image degradation processing modes.
Alternatively, image degradation processing refers to processing that reduces image quality, and may specifically include at least one of adding blur, reducing image resolution, adding noise, or image format compression. Adding blur includes randomly adding Gaussian blur, motion blur, and the like, where the standard deviation of the Gaussian blur is randomly selected within a certain range and the motion blur includes 38 kinds of custom blur kernels. Reducing image resolution means lowering the image resolution, which can be achieved by down-sampling; which one or more down-sampling modes are adopted is not limited in the present application — for example, the image resolution may be reduced by randomly selecting among bilinear down-sampling, bicubic down-sampling, area (regional) down-sampling, and the like. Adding noise includes Gaussian noise, Poisson noise, and the like, with the noise intensity randomly selected within a certain range. Image format compression may refer to JPEG (Joint Photographic Experts Group, an image file format) compression, which simulates the quality loss incurred when a picture is stored, and the compression ratio can be randomly selected between 5% and 50%.
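The degradation pipeline can be sketched with OpenCV as follows; the per-step probabilities, the fixed 7 × 7 Gaussian kernel, and the noise range are assumptions of the sketch — only the bilinear/bicubic/area down-sampling choices and the 5%–50% JPEG compression range follow the text.

```python
import random
import cv2
import numpy as np

def degrade(hq):                      # hq: uint8 BGR image of shape (H, W, 3)
    img = hq.copy()
    if random.random() < 0.7:         # add blur (Gaussian; motion blur omitted here)
        sigma = random.uniform(0.5, 3.0)
        img = cv2.GaussianBlur(img, (7, 7), sigma)
    if random.random() < 0.7:         # reduce resolution, then restore the size
        scale = random.uniform(0.25, 0.5)
        interp = random.choice([cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA])
        small = cv2.resize(img, None, fx=scale, fy=scale, interpolation=interp)
        img = cv2.resize(small, (hq.shape[1], hq.shape[0]), interpolation=interp)
    if random.random() < 0.5:         # add Gaussian noise of random intensity
        noise = np.random.normal(0, random.uniform(1, 10), img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if random.random() < 0.7:         # JPEG compression, quality 5%-50%
        q = random.randint(5, 50)
        _, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, q])
        img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return img                        # first sample image; hq is the second sample image
```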
Optionally, the method provided by the embodiment of the present application may be applied to application scenarios such as old photo restoration and image sharpening, and for better understanding, the following detailed description is made in conjunction with specific application scenarios. In this example, the image to be processed is a face image to be restored (i.e., a low-definition image with a size of 32 × 32), and the restored face image (i.e., a high-definition image with a size of 32 × 32) can be obtained based on the image processing model. As shown in fig. 6, the image processing model includes an image coding network and an image generation network, the image coding network includes a feature extraction module and a feature mapping module, the feature extraction module is composed of a plurality of cascaded down-sampling modules (three are illustrated in this example), the feature mapping module refers to a fully connected layer, and the image generation network is composed of a plurality of cascaded up-sampling modules (three are illustrated in this example).
Further, the low-definition image can be input into the feature extraction module, where the three cascaded down-sampling modules down-sample it three times to obtain image feature maps of three scales, namely a 16 × 16 image feature map, an 8 × 8 image feature map, and a 4 × 4 image feature map; the 16 × 16, 8 × 8, and 4 × 4 image feature maps are then each input into the fully connected layer to obtain the style feature map of the corresponding scale. Further, the 4 × 4 image feature map (i.e., the target feature map) is input into the first up-sampling feature extraction module of the image generation network; the style feature map of the 4 × 4 scale is used as the adjustment weight to adjust the network parameters of this module, and the input 4 × 4 feature map is up-sampled through the adjusted network parameters to obtain an 8 × 8 feature map. The 8 × 8 feature map is then input into the second up-sampling feature extraction module, whose network parameters are adjusted using the style feature map of the 8 × 8 scale as the adjustment weight, and up-sampling through the adjusted parameters yields a 16 × 16 feature map. Finally, the 16 × 16 feature map is input into the third up-sampling feature extraction module, whose network parameters are adjusted using the style feature map of the 16 × 16 scale as the adjustment weight, and up-sampling through the adjusted parameters produces the 32 × 32 repaired face image (i.e., the high-definition restored image). A shape trace of this pipeline is sketched below.
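Here is the shape bookkeeping of the walkthrough, with pooling and interpolation standing in for the learned down-sampling and modulated up-sampling modules (a hedged sketch, not the patent's networks):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)                # low-definition face image
feats = []
for _ in range(3):                           # three down-sampling feature extractions
    x = F.avg_pool2d(x, 2)                   # stand-in for a learned down-sampling module
    feats.append(x)
print([tuple(f.shape[-2:]) for f in feats])  # [(16, 16), (8, 8), (4, 4)]

y = feats[-1]                                # 4 x 4 target feature map
for _ in range(3):                           # three style-modulated up-samplings
    y = F.interpolate(y, scale_factor=2)     # stand-in for a modulated up-sampling module
print(tuple(y.shape[-2:]))                   # (32, 32): resolution of the repaired image
```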
An embodiment of the present application provides an image processing apparatus. As shown in fig. 7, the image processing apparatus 70 may include an image acquisition module 701 and an image processing module 702, wherein:
an image obtaining module 701, configured to obtain an image to be processed of a target object;
the image processing module 702 is configured to perform downsampling feature extraction on the image to be processed by calling the trained image processing model to obtain an image feature map corresponding to the image to be processed, and determine a style feature map corresponding to the image to be processed according to the image feature map; performing up-sampling feature extraction on the image feature map based on the style feature map to obtain a target image; wherein the image quality of the target image is higher than that of the image to be processed.
Optionally, when the image processing module performs downsampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed, the image processing module is specifically configured to:
performing down-sampling feature extraction on an image to be processed to obtain an image feature map of at least one scale corresponding to the image to be processed;
when the image processing module determines the style characteristic diagram corresponding to the image to be processed according to the image characteristic diagram, the image processing module is specifically configured to:
for each scale, determining a style characteristic diagram corresponding to the scale according to the image characteristic diagram corresponding to the scale;
the image processing module performs up-sampling feature extraction on the image feature map based on the style feature map, and is specifically configured to:
and performing at least one-time up-sampling feature extraction on the target feature map through at least one up-sampling feature extraction module to obtain a target image, wherein in the process of each up-sampling feature extraction, the style feature map of a corresponding scale is fused into the feature map of the corresponding scale obtained by each up-sampling feature extraction, the target feature map is the image feature map of the smallest scale in the image feature maps of at least one scale, and each up-sampling feature extraction module corresponds to the style feature map of one scale.
Optionally, the image feature map of at least one scale includes image feature maps of at least two scales, and the at least one upsampling feature extraction module includes at least two upsampling feature extraction modules connected in series;
the image processing module is used for performing at least one up-sampling feature extraction on the target feature map through at least one up-sampling feature extraction module, and when the target image is obtained, the image processing module is specifically used for:
for each up-sampling feature extraction module, taking the style feature map corresponding to the up-sampling feature extraction module as an adjustment weight, adjusting the network parameters of the up-sampling feature extraction module, performing up-sampling processing on the input feature map corresponding to the up-sampling feature extraction module through the adjusted network parameters to obtain an output feature map of the up-sampling feature extraction module, and taking the last output feature map as a target image;
the input of the first up-sampling feature extraction module is the target feature map, and the input of each module other than the first is the output feature map of the preceding up-sampling feature extraction module.
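For the step of using the style feature map as an adjustment weight, the sketch below rescales the convolution weights with a style vector before up-sampling, in the spirit of StyleGAN v2's weight modulation and demodulation; the class name, the demodulation step, and the grouped-convolution batching trick are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedUpBlock(nn.Module):
    def __init__(self, in_ch, out_ch, style_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)
        self.to_scale = nn.Linear(style_dim, in_ch)

    def forward(self, x, style):
        b, c, _, _ = x.shape
        s = self.to_scale(style).view(b, 1, c, 1, 1)        # per-input-channel scales
        weight = self.weight.unsqueeze(0) * s               # adjust conv weights: (b, out, in, 3, 3)
        # demodulate so activations keep roughly unit variance (StyleGAN v2 trick)
        d = torch.rsqrt((weight ** 2).sum(dim=(2, 3, 4), keepdim=True) + 1e-8)
        weight = (weight * d).reshape(-1, c, 3, 3)          # (b*out, in, 3, 3)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        out = F.conv2d(x.reshape(1, b * c, x.shape[2], x.shape[3]),
                       weight, padding=1, groups=b)         # batched modulated conv
        return out.view(b, -1, out.shape[2], out.shape[3])

block = ModulatedUpBlock(64, 64, style_dim=512)
y = block(torch.randn(2, 64, 4, 4), torch.randn(2, 512))    # -> shape (2, 64, 8, 8)
```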
Optionally, the image processing model includes an image coding network and an image generating network, the image coding network includes a feature extraction module and a feature mapping module, and the image processing module is specifically configured to, when calling the trained image processing model to perform the following processing on the image to be processed to obtain a target image corresponding to the image to be processed:
inputting an image to be processed into a feature extraction module, and performing down-sampling feature extraction on the image to be processed through the feature extraction module to obtain an image feature map of at least one scale corresponding to the image to be processed;
respectively inputting the image feature maps of all scales into a feature mapping module to obtain style feature maps corresponding to all scales;
and performing at least one-time up-sampling feature extraction on the target feature map through an image generation network based on the style feature map and the target feature map corresponding to each scale to obtain a target image corresponding to the image to be processed.
Optionally, the image processing model is obtained by training the model training module in the following manner:
acquiring first training data and an initial neural network model, wherein the training data comprises a plurality of image pairs, each image pair comprises a first sample image and a second sample image corresponding to the same image content, the image quality of the second sample image in each image pair is higher than that of the first sample image, and the initial neural network model comprises an initial image processing model and an initial discriminator;
training the initial neural network model based on first training data until the value of a total loss function corresponding to the model meets a training end condition to obtain a trained neural network model, and taking the trained image processing model as an image processing model;
the input of the initial image processing model is each first sample image, the output is a prediction target image corresponding to each first sample image, the input of the initial discriminator is a prediction target image and a second sample image corresponding to each first sample image, and the output is a discrimination result of the prediction target image and the second sample image corresponding to each first sample image;
the total loss function comprises a first loss function and a second loss function, the value of the first loss function represents the difference between the prediction target image and the second sample image corresponding to each first sample image, and the value of the second loss function represents the accuracy degree of the discrimination result of the prediction target image and the second sample image corresponding to each first sample image.
Optionally, the image generation network and the initial discriminator of the initial image processing model are obtained by pre-training with second training data, and when the initial neural network model is trained based on the first training data, the learning rate of the image coding network of the initial image processing model is greater than the learning rate of the image generation network and the initial discriminator of the initial image processing model.
Optionally, when the model training module acquires the first training data, the model training module is specifically configured to:
acquiring each second sample image, and performing image degradation processing on each second sample image to obtain each processed image;
and taking the processed image corresponding to each second sample image as the first sample image corresponding to the second sample image, and obtaining first training data based on each second sample image and the corresponding processed image.
Optionally, the image degradation processing includes at least one of adding blur processing, reducing image resolution processing, adding noise processing, or image format compression processing; the first training data comprises images obtained by processing in at least two image degradation processing modes.
The apparatus of the embodiment of the present application may execute the method provided by the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus of the embodiments of the present application correspond to the steps in the method of the embodiments of the present application, and for the detailed functional description of the modules of the apparatus, reference may be specifically made to the description in the corresponding method shown in the foregoing, and details are not repeated here.
An embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the image processing method.
In an alternative embodiment, an electronic device is provided, as shown in fig. 8, the electronic device 4000 shown in fig. 8 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. It should be noted that the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but that does not indicate only one bus or one type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be Read by a computer, without limitation.
The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 4001 to execute. The processor 4001 is used to execute computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program may implement the steps and corresponding contents of the foregoing method embodiments.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than described or illustrated herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as needed, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. In a scenario where execution times are different, an execution sequence of the sub-steps or the phases may be flexibly configured according to requirements, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the protection scope of the embodiments of the present application without departing from the technical idea of the present application.

Claims (12)

1. An image processing method, comprising:
acquiring an image to be processed of a target object;
the method comprises the following steps of calling a trained image processing model to perform the following processing on an image to be processed to obtain a target image corresponding to the image to be processed, wherein the image quality of the target image is higher than that of the image to be processed:
performing down-sampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed;
determining a style characteristic diagram corresponding to the image to be processed according to the image characteristic diagram;
and performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image.
2. The method according to claim 1, wherein the performing downsampling feature extraction on the image to be processed to obtain an image feature map corresponding to the image to be processed comprises:
performing down-sampling feature extraction on the image to be processed to obtain an image feature map of at least one scale corresponding to the image to be processed;
the determining the style characteristic diagram corresponding to the image to be processed according to the image characteristic diagram comprises the following steps:
for each scale, determining a style characteristic diagram corresponding to the scale according to the image characteristic diagram corresponding to the scale;
the performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image includes:
the method comprises the steps of performing at least one up-sampling feature extraction on a target feature map through at least one up-sampling feature extraction module to obtain the target image, wherein in the process of each up-sampling feature extraction, a style feature map of a corresponding scale is fused into a feature map of a corresponding scale obtained by each up-sampling feature extraction, the target feature map is an image feature map of the smallest scale in the image feature maps of at least one scale, and each up-sampling feature extraction module corresponds to a style feature map of one scale.
3. The method of claim 2, wherein the image feature map of at least one scale comprises image feature maps of at least two scales, and the at least one upsampling feature extraction module comprises a cascade of at least two upsampling feature extraction modules;
the at least one upsampling feature extraction module performs at least one upsampling feature extraction on the target feature map to obtain the target image, and the method comprises the following steps:
for each up-sampling feature extraction module, taking the style feature map corresponding to the up-sampling feature extraction module as an adjustment weight, adjusting the network parameters of the up-sampling feature extraction module, performing up-sampling processing on the input feature map corresponding to the up-sampling feature extraction module through the adjusted network parameters to obtain an output feature map of the up-sampling feature extraction module, and taking the last output feature map as a target image;
wherein the input of the first up-sampling feature extraction module is the target feature map, and the input of each module other than the first is the output feature map of the preceding up-sampling feature extraction module.
4. The method according to any one of claims 1 to 3, wherein the image processing model comprises an image coding network and an image generation network, the image coding network comprises a feature extraction module and a feature mapping module, and the obtaining of the target image corresponding to the image to be processed by calling the trained image processing model to perform the following processing on the image to be processed comprises:
inputting the image to be processed into the feature extraction module, and performing down-sampling feature extraction on the image to be processed through the feature extraction module to obtain an image feature map of at least one scale corresponding to the image to be processed;
respectively inputting the image feature maps of all scales into the feature mapping module to obtain style feature maps corresponding to all scales;
and performing at least one-time up-sampling feature extraction on the target feature map through the image generation network based on the style feature map corresponding to each scale and the target feature map to obtain a target image corresponding to the image to be processed.
5. The method of claim 4, wherein the image processing model is trained by:
acquiring first training data and an initial neural network model, wherein the training data comprises a plurality of image pairs, each image pair comprises a first sample image and a second sample image corresponding to the same image content, the image quality of the second sample image in each image pair is higher than that of the first sample image, and the initial neural network model comprises an initial image processing model and an initial discriminator;
training the initial neural network model based on the first training data until the value of a total loss function corresponding to the model meets a training end condition to obtain a trained neural network model, and taking the trained image processing model as the image processing model;
the input of the initial image processing model is each first sample image, the output is a prediction target image corresponding to each first sample image, the input of the initial discriminator is a prediction target image and a second sample image corresponding to each first sample image, and the output is a discrimination result of the prediction target image and the second sample image corresponding to each first sample image;
the total loss function comprises a first loss function and a second loss function, the value of the first loss function represents the difference between the prediction target image and the second sample image corresponding to each first sample image, and the value of the second loss function represents the accuracy of the discrimination result of the prediction target image and the second sample image corresponding to each first sample image.
6. The method of claim 5, wherein the image generation network of the initial image processing model and the initial discriminator are obtained by pre-training with second training data, and when the initial neural network model is trained based on the first training data, the learning rate of the image coding network of the initial image processing model is greater than the learning rates of the image generation network of the initial image processing model and the initial discriminator.
7. The method of claim 4, wherein the obtaining first training data comprises:
acquiring each second sample image, and performing image degradation processing on each second sample image to obtain each processed image;
and taking the processed image corresponding to each second sample image as the first sample image corresponding to the second sample image, and obtaining the first training data based on each second sample image and the corresponding processed image.
8. The method of claim 7, wherein the image degradation processing comprises at least one of a blur addition process, an image resolution reduction process, a noise addition process, or an image format compression process; the first training data comprises images obtained by processing in at least two image degradation processing modes.
9. An image processing apparatus characterized by comprising:
the image acquisition module is used for acquiring an image to be processed of the target object;
the image processing module is used for performing downsampling feature extraction on the image to be processed by calling a trained image processing model to obtain an image feature map corresponding to the image to be processed, and determining a style feature map corresponding to the image to be processed according to the image feature map; performing up-sampling feature extraction on the image feature map based on the style feature map to obtain the target image; wherein the image quality of the target image is higher than that of the image to be processed.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program, wherein the computer program when executed by a processor performs the steps of the method of any one of claims 1 to 8.
CN202210161789.3A 2022-02-22 2022-02-22 Image processing method, image processing apparatus, electronic device, and storage medium Pending CN115311152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210161789.3A CN115311152A (en) 2022-02-22 2022-02-22 Image processing method, image processing apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN115311152A true CN115311152A (en) 2022-11-08

Family

ID=83855402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210161789.3A Pending CN115311152A (en) 2022-02-22 2022-02-22 Image processing method, image processing apparatus, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN115311152A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024125158A1 (en) * 2022-12-13 2024-06-20 腾讯科技(深圳)有限公司 Image processing method and apparatus, electronic device, storage medium, and program product



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination