CN117994173A - Repair network training method, image processing method, device and electronic equipment


Info

Publication number
CN117994173A
Authority
CN
China
Prior art keywords
image
repair
network
sample
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410405497.9A
Other languages
Chinese (zh)
Other versions
CN117994173B (en)
Inventor
贺珂珂 (He Keke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410405497.9A
Publication of CN117994173A
Application granted
Publication of CN117994173B
Legal status: Active (current)
Anticipated expiration

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a repair network training method, an image processing method, an apparatus, an electronic device, a storage medium, and a program product. The method comprises the following steps: performing matting processing on a first image sample comprising a target component to obtain a first matting sample from which the target component has been scratched out; training a first reconstruction network on a reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network; performing repair processing on a second image sample comprising the target component through a first repair network to obtain a first repair image; performing reconstruction processing on a second matting sample through the trained first reconstruction network to obtain a first reconstructed image; determining a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image; and updating the first repair network based on the first repair loss to obtain a trained first repair network. Through the method and the apparatus, the efficiency and accuracy with which the first repair network repairs images can be improved.

Description

Repair network training method, image processing method, device and electronic equipment
Technical Field
The present application relates to artificial intelligence technology, and in particular to a repair network training method, an image processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial Intelligence (AI) is a comprehensive technology of computer science that studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions. Artificial intelligence is a comprehensive discipline covering a wide range of fields, such as natural language processing and machine learning/deep learning. As the technology develops, artificial intelligence will be applied in ever more fields and take on ever greater value.
Component restoration tasks for a subject's face have many application scenarios. Taking teeth as the example component, dental restoration can be used in tooth-beautification tasks as well as in post-processing for portrait generation tasks. In a tooth-beautification scenario, when teeth are deformed, blackened, or cracked and therefore unattractive, dental restoration can correct a user's teeth so that they appear neat and uniform. In portrait generation tasks, problems such as multi-layer cracks and partial blurring of teeth arise easily, and plugging in a dental restoration algorithm can improve the portrait generation effect. In the related art, the tooth region is located by image segmentation, but image segmentation is prone to instability in video scenarios, so accurate restoration cannot be achieved.
Disclosure of Invention
The embodiment of the application provides a repair network training method, an image processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can improve the efficiency and accuracy with which a first repair network repairs images.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides a repair network training method, which comprises the following steps:
Performing matting processing on a first image sample comprising a target component to obtain a first matting sample from which the target component has been scratched out;
Training a first reconstruction network on a reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network;
Performing reconstruction processing on a second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and performing repair processing on a second image sample comprising the target component through a first repair network to obtain a first repair image, wherein the second matting sample is obtained by performing matting processing on the second image sample;
Determining a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and updating the first repair network based on the first repair loss to obtain a trained first repair network;
The trained first repair network is used for performing repair processing on an original image comprising the target component to obtain a target image after repairing the target component.
The embodiment of the application provides a repair network training device, which comprises:
The matting processing module is configured to perform matting processing on a first image sample comprising a target component to obtain a first matting sample from which the target component has been scratched out;
The first training module is configured to train a first reconstruction network on a reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network;
The image restoration module is configured to perform reconstruction processing on a second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and to perform repair processing on a second image sample comprising the target component through a first repair network to obtain a first repair image, wherein the second matting sample is obtained by performing matting processing on the second image sample;
The second training module is configured to determine a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and update the first repair network based on the first repair loss to obtain a trained first repair network, where the trained first repair network is configured to repair an original image including the target component to obtain a target image after repairing the target component.
In the above scheme, the first training module is further configured to perform blurring processing on the first image sample to obtain a first blurred sample; perform stitching processing on the first matting sample and the first blurred sample to obtain a first stitched sample; and train the first reconstruction network on the reconstruction task for the target component based on the first stitched sample to obtain the trained first reconstruction network.
In the above scheme, the first training module is further configured to perform reconstruction processing on the first stitched sample through the initialized first reconstruction network to obtain a second reconstructed image corresponding to the first stitched sample; determine a first reconstruction loss based on the first image sample and the second reconstructed image; and update the initialized first reconstruction network based on the first reconstruction loss to obtain the trained first reconstruction network.
In the above scheme, the image restoration module is further configured to perform blurring processing on the second image sample to obtain a second blurred sample; perform stitching processing on the second matting sample and the second blurred sample to obtain a second stitched sample; and perform encoding processing on the second stitched sample through the first encoding network to obtain a first encoding result, and perform decoding processing on the first encoding result through the first decoding network to obtain the first reconstructed image.
In the above scheme, the image restoration module is further configured to perform repair processing on the second image sample through the first repair network to obtain a predicted repair area mask corresponding to the target component in the second image sample and a second repair image corresponding to the second image sample; and synthesize the second repair image and the second image sample based on the predicted repair area mask in the second image sample to obtain the first repair image.
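As an illustration of the synthesis step just described, the following sketch realizes it as an alpha blend controlled by the predicted repair area mask, assuming a PyTorch (N, C, H, W) tensor layout; the blend form and all names are illustrative assumptions rather than the application's mandated implementation:

```python
import torch

def composite(repair_image: torch.Tensor,
              image_sample: torch.Tensor,
              repair_mask: torch.Tensor) -> torch.Tensor:
    """Blend repaired pixels into the original sample using the predicted mask.

    repair_image, image_sample: (N, 3, H, W); repair_mask: (N, 1, H, W),
    where 1 marks pixels belonging to the repair area.
    """
    return repair_mask * repair_image + (1.0 - repair_mask) * image_sample
```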
In the above scheme, the second training module is further configured to obtain a label repair area mask, and determine a repair area mask loss based on the predicted repair area mask and the label repair area mask corresponding to the target component in the second image sample; perform at least one of the following loss determination processes: determining an overall reconstruction loss based on the first reconstructed image and the first repair image; determining an image feature level loss based on the first reconstructed image and the first repair image; determining a local reconstruction loss based on the first reconstructed image, the first repair image, and the label repair area mask; determining an object identity loss based on the first reconstructed image and the first repair image; determining a generation loss based on the first repair image; determine a first auxiliary loss based on at least one of the overall reconstruction loss, the image feature level loss, the local reconstruction loss, the object identity loss, and the generation loss; and perform fusion processing on the first auxiliary loss and the repair area mask loss to obtain the first repair loss.
In the above solution, the second training module is further configured to obtain a first pixel value of each pixel position in the first reconstructed image and a second pixel value of each pixel position in the second image sample; perform the following processing for each pixel position: determining a first absolute value between the first pixel value of the pixel position and the second pixel value of the pixel position; normalize the first absolute value of each pixel position to obtain a third pixel value of each pixel position; and generate the label repair area mask based on the third pixel value at each pixel position.
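A minimal sketch of this label-mask construction, assuming the per-pixel absolute differences are averaged over color channels and normalized by their per-image maximum (both illustrative assumptions, since the application does not fix the normalization):

```python
import torch

def label_repair_mask(recon: torch.Tensor, sample: torch.Tensor) -> torch.Tensor:
    """Derive the label repair area mask from |first reconstructed image - second image sample|."""
    diff = (recon - sample).abs().mean(dim=1, keepdim=True)    # first absolute value per pixel
    peak = diff.amax(dim=(2, 3), keepdim=True).clamp_min(1e-8)
    return diff / peak                                         # normalized values in [0, 1]
```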
In the above solution, the second training module is further configured to obtain a first marker value of each pixel position in the predicted repair area mask and a second marker value of each pixel position in the label repair area mask; perform the following processing for each pixel position: determining a second absolute value between the first marker value of the pixel position and the second marker value of the pixel position; and perform fusion processing on the second absolute values of the pixel positions to obtain the repair area mask loss.
In the above solution, the second training module is further configured to obtain a fifth pixel value of each pixel position in the first reconstructed image and a sixth pixel value of each pixel position in the first repair image; perform the following processing for each pixel position: determining a third absolute value between the fifth pixel value of the pixel position and the sixth pixel value of the pixel position; and perform fusion processing on the third absolute values of the pixel positions to obtain the overall reconstruction loss.
In the above scheme, the second training module is further configured to perform multi-level feature extraction processing on the first reconstructed image to obtain a first feature corresponding to each level; perform multi-level feature extraction processing on the first repair image to obtain a second feature corresponding to each level; perform the following processing for each level: determining the feature distance between the first feature of the level and the second feature of the level; and perform fusion processing on the feature distances of the multiple levels to obtain the image feature level loss.
In the above scheme, the second training module is further configured to acquire a first local image corresponding to the target component in the first reconstructed image based on the label repair area mask; acquire a second local image corresponding to the target component in the first repair image based on the label repair area mask; obtain a seventh pixel value of each pixel position in the first local image and an eighth pixel value of each pixel position in the second local image; perform the following processing for each pixel position: determining a fourth absolute value between the seventh pixel value of the pixel position and the eighth pixel value of the pixel position; and perform fusion processing on the fourth absolute values of the pixel positions to obtain the local reconstruction loss.
In the above scheme, the second training module is further configured to invoke an object identity recognition network to perform identity feature extraction processing on the first reconstructed image to obtain a first identity feature of the first reconstructed image; invoke the object identity recognition network to perform identity feature extraction processing on the first repair image to obtain a second identity feature of the first repair image; and determine the identity feature similarity between the first identity feature and the second identity feature, and obtain an object identity loss negatively related to the identity feature similarity.
In the above scheme, the second training module is further configured to acquire an original image including the target component; perform repair processing on the original image through the trained first repair network to obtain a predicted repair area mask corresponding to the target component in the original image and a third repair image corresponding to the original image; and synthesize the third repair image and the original image based on the predicted repair area mask corresponding to the target component in the original image to obtain a target image in which the target component has been repaired.
The embodiment of the application provides an image processing method, which comprises the following steps:
Acquiring an original image including a target component;
Performing repair processing on the original image through a first repair network to obtain a target image in which the target component has been repaired;
the first repair network is trained by the repair network training method.
An embodiment of the present application provides an image processing apparatus including:
An image acquisition module for acquiring an original image including a target component;
The image restoration module is configured to perform repair processing on the original image through a first repair network to obtain a target image in which the target component has been repaired, wherein the first repair network is trained through the repair network training method described above.
In the above scheme, the image restoration module is further configured to perform restoration processing on the original image through the first restoration network, so as to obtain a predicted restoration area mask corresponding to the target component in the original image and a third restoration image corresponding to the original image; and synthesizing the third repair image and the original image based on the predicted repair area mask corresponding to the target component in the original image to obtain a target image after repairing the target component.
An embodiment of the present application provides an electronic device, including:
a memory for storing computer-executable instructions;
and a processor for implementing, when executing the computer-executable instructions stored in the memory, the repair network training method provided by the embodiment of the application or the image processing method provided by the embodiment of the application.
The embodiment of the application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the repair network training method provided by the embodiment of the application or the image processing method provided by the embodiment of the application.
The embodiment of the application provides a computer program product comprising computer-executable instructions which, when executed by a processor, implement the repair network training method provided by the embodiment of the application or the image processing method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
Performing matting processing on a first image sample comprising a target component yields a first matting sample from which the target component has been scratched out; training a first reconstruction network on the reconstruction task for the target component based on the first matting sample yields a trained first reconstruction network, which amounts to training, on samples from which the target component has been scratched out, a network capable of reconstructing the target component. Performing repair processing on a second image sample comprising the target component through a first repair network yields a first repair image; this belongs to the forward propagation of the image sample through the first repair network to be trained. Performing reconstruction processing on a second matting sample through the trained first reconstruction network yields a first reconstructed image, the second matting sample being obtained by performing matting processing on the second image sample. Because the first reconstruction network was trained on matting samples, the first reconstructed image it generates can be treated as a ground-truth label for the repair result; a first repair loss can therefore be determined based on the first repair image and the first reconstructed image, and the first repair network can be updated based on the first repair loss to obtain the trained first repair network. During application, an original image comprising the target component is fed directly into the trained first repair network to obtain the target image in which the target component has been repaired; the matting step is omitted, and a portrait image with high-definition, neat teeth is repaired by the deep network alone, so the overall repair takes little time, and because no matting step is needed, stability can be maintained in complex video scenarios.
Drawings
FIG. 1 is a schematic diagram of a repair network training system 100 according to an embodiment of the present application;
FIG. 2A is a schematic structural diagram of an electronic device 500-1 according to an embodiment of the present application;
FIG. 2B is a schematic structural diagram of an electronic device 500-2 according to an embodiment of the present application;
FIG. 3A is a first flowchart of a repair network training method according to an embodiment of the present application;
FIG. 3B is a second flowchart of a repair network training method according to an embodiment of the present application;
FIG. 3C is a third flowchart of a repair network training method according to an embodiment of the present application;
FIG. 3D is a fourth flowchart of a repair network training method according to an embodiment of the present application;
FIG. 3E is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an original image and a matting image provided by an embodiment of the present application;
FIG. 5 is a schematic illustration of dental restoration results provided by an embodiment of the present application;
FIG. 6 is a diagram illustrating an implementation of a repair network training method according to an embodiment of the present application;
Fig. 7 is a schematic diagram of feature visualization of a repair network training method according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments", which describes a subset of all possible embodiments; it is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with one another where there is no conflict.
In the following description, the terms "first", "second", "third", and the like are merely used to distinguish similar objects and do not imply a specific ordering of the objects. It is to be understood that "first", "second", and "third" may be interchanged in a specific order or sequence, where permitted, so that the embodiments of the application described herein can be practiced in orders other than those illustrated or described herein.
In the embodiments of the present application, the term "module" or "unit" refers to a computer program, or a part of a computer program, that has a predetermined function and works together with other relevant parts to achieve a predetermined goal; it may be implemented wholly or partly in software, in hardware (such as a processing circuit or memory), or in a combination of the two. Likewise, one processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of that module or unit.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the embodiments of the application is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
In the embodiments of the present application, when the examples are applied, the collection and processing of relevant data should strictly comply with the requirements of relevant national laws and regulations: the informed consent or independent consent of the personal information subject should be obtained, and subsequent data use and processing should take place within the scope authorized by laws and regulations and by the personal information subject.
Before describing the embodiments of the present application in further detail, the terms involved in the embodiments of the present application are explained; the following explanations apply to these terms.
1) Generative Adversarial Network (GAN): a method of unsupervised learning that learns by making two neural networks play a game against each other. A generative adversarial network consists of a generation network and a discrimination network. The generation network takes random samples from a latent space as input, and its output must imitate the real samples in the training set as closely as possible. The input of the discrimination network is either a real sample or the output of the generation network, and its purpose is to distinguish the output of the generation network from real samples as well as possible.
2) Tooth restoration: correcting cracked, blackened, or untidy teeth so that they appear neat.
3) Segmentation network: used to acquire the positions of the respective components in a subject's face. For example, in embodiments of the present application, it may be used to extract the mouth region of a subject's face.
Component restoration tasks for a subject's face have many application scenarios. Taking teeth as the example component, dental restoration can be used in tooth-beautification tasks as well as in post-processing for portrait generation tasks. In a tooth-beautification scenario, when teeth are deformed, blackened, or cracked and therefore unattractive, dental restoration can correct a user's teeth so that they appear neat and uniform. In portrait generation tasks, problems such as multi-layer cracks and partial blurring of teeth arise easily, and plugging in a dental restoration algorithm can improve the portrait generation effect.
Dental restoration algorithms in the related art include matting-based forms, which take a portrait with good teeth as input, locate the mouth region via facial key points, and paste the good teeth onto the mouth of the image to be repaired. There are also algorithms that perform dental restoration directly with a deep network; they take a facial image with the teeth scratched out as input and output the repaired result. However, this approach relies on a segmentation network to locate the tooth region ahead of inference, which on the one hand introduces extra time consumption; on the other hand, it is difficult for a segmentation network to guarantee stable tooth-region segmentation in complex videos, which can make the dental restoration unstable.
The embodiment of the application provides a repair network training method, an image processing device, electronic equipment, a computer readable storage medium and a computer program product, which can improve the efficiency and accuracy of repairing images of a first repair network. The image processing method provided by the embodiment of the application is realized based on a computer vision technology in an artificial intelligence technology.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, pre-training model technology, operation/interaction systems, mechatronics, and the like. A pre-training model, also called a large model or a foundation model, can, after fine-tuning, be widely applied to downstream tasks in all major directions of artificial intelligence. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) is the science of studying how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to recognize and measure targets, and further performs graphics processing so that the computer produces images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Large-model technology has brought important changes to the development of computer vision: pre-trained models in the vision field such as Swin Transformer, ViT, V-MoE, and MAE can be quickly and widely applied to specific downstream tasks through fine-tuning. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The following describes exemplary applications of the electronic device provided by the embodiments of the present application. The electronic device provided by the embodiments of the present application may be implemented as various types of user terminals such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), a smart device (for example, a smart phone, a smart speaker, a smart watch, a smart television, a smart home appliance, or a smart voice interaction device), a vehicle-mounted terminal, or an aircraft, and may also be implemented as a server. In the following, an exemplary application in which the device is implemented as a server is described.
Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of a repair network training system 100 according to an embodiment of the present application. To support a repair network training application, a terminal 400 is connected to a server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two.
The terminal 400 is used to acquire an original image including a target component for image restoration and transmit the original image to the server 200 through the network 300.
The server 200 is configured to take the original image transmitted by the terminal 400 as an image sample and perform matting processing on a first image sample including the target component to obtain a first matting sample from which the target component has been scratched out; train a first reconstruction network on the reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network; perform reconstruction processing on a second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and perform repair processing on a second image sample including the target component through a first repair network to obtain a first repair image, where the second matting sample is obtained by performing matting processing on the second image sample; determine a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image; and update the first repair network based on the first repair loss to obtain the trained first repair network. The server 200 then acquires an original image including the target component transmitted by the terminal 400, performs repair processing on the original image through the trained first repair network to obtain a target image in which the target component has been repaired, and transmits the repaired target image to the terminal 400.
The repair network training method provided by the embodiment of the application can be applied to video production scenarios and live-broadcast scenarios. In a video production scenario, images of the video to be produced are obtained and used as image samples: matting processing is performed on a first image sample including the target component to obtain a first matting sample from which the target component has been scratched out; a first reconstruction network is trained on the reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network; a second matting sample is reconstructed through the trained first reconstruction network to obtain a first reconstructed image, and a second image sample including the target component is repaired through a first repair network to obtain a first repair image, where the second matting sample is obtained by performing matting processing on the second image sample; a first repair loss corresponding to the first repair network is determined based on the first repair image and the first reconstructed image, and the first repair network is updated based on the first repair loss to obtain the trained first repair network. All images included in the video to be produced are then repaired based on the trained first repair network, yielding the repaired video.
In a live-broadcast scenario, images included in historical live videos are first acquired and used as image samples: matting processing is performed on a first image sample including the target component to obtain a first matting sample from which the target component has been scratched out; a first reconstruction network is trained on the reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network; a second matting sample is reconstructed through the trained first reconstruction network to obtain a first reconstructed image, and a second image sample including the target component is repaired through a first repair network to obtain a first repair image, where the second matting sample is obtained by performing matting processing on the second image sample; a first repair loss is determined based on the first repair image and the first reconstructed image, and the first repair network is updated based on the first repair loss to obtain the trained first repair network. Each image produced during the live broadcast is then repaired based on the trained first repair network, and the repaired images are displayed to the audience as the live picture.
In some embodiments, the server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart television, a vehicle-mounted terminal, or the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
In some embodiments, the terminal 400 may implement the repair network training method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or a software module in an operating system; a native application (APP), i.e., a program that must be installed in the operating system to run, such as a video APP; an applet, i.e., a program that only needs to be downloaded into a browser environment to run; or an applet that can be embedded into any APP. In general, the computer program may be any form of application, module, or plug-in.
Referring to FIG. 2A, FIG. 2A is a schematic structural diagram of an electronic device 500-1 according to an embodiment of the present application. The electronic device 500-1 shown in FIG. 2A includes: at least one processor 510-1, a memory 550-1, at least one network interface 520-1, and a user interface 530-1. The various components in the electronic device 500-1 are coupled together by a bus system 540-1. It will be appreciated that the bus system 540-1 is used to enable connected communication between these components. In addition to the data bus, the bus system 540-1 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 540-1 in FIG. 2A.
The processor 510-1 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 530-1 includes one or more output devices 531-1, including one or more speakers and/or one or more visual displays, that enable presentation of media content. The user interface 530-1 also includes one or more input devices 532-1 that include user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550-1 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 550-1 optionally includes one or more storage devices physically remote from processor 510-1.
The memory 550-1 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 550-1 described in the embodiments of the present application is intended to include any suitable type of memory.
In some embodiments, memory 550-1 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551-1 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
A network communication module 552-1, used to reach other electronic devices via one or more (wired or wireless) network interfaces 520-1; exemplary network interfaces 520-1 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
A presentation module 553-1, for enabling the presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 531-1 (e.g., a display screen, speakers) associated with the user interface 530-1;
The input processing module 554-1 is configured to detect one or more user inputs or interactions from one of the one or more input devices 532-1 and translate the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 2A shows a repair network training apparatus 555-1 stored in a memory 550-1, which may be software in the form of a program, a plug-in, or the like, including the following software modules: the matting processing module 5551, the first training module 5552, the image restoration module 5553, and the second training module 5554 are logical, and thus may be arbitrarily combined or further split according to the implemented functions. The functions of the respective modules will be described hereinafter.
Referring to fig. 2B, fig. 2B is a schematic structural diagram of an electronic device 500-2 according to an embodiment of the present application, and the electronic device 500-2 shown in fig. 2B includes: at least one processor 510-2, a memory 550-2, at least one network interface 520-2, and a user interface 530-2. The various components in electronic device 500-2 are coupled together by bus system 540-2. It is appreciated that bus system 540-2 is used to facilitate connected communications between these components. The bus system 540-2 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 540-2 in fig. 2B.
It should be noted that the processor 510-2, the user interface 530-2, the memory 550-2, the operating system 551-2, the network communication module 552-2, the presentation module 553-2, and the input processing module 554-2 included in fig. 2B all have the same functions and structures as the corresponding components in fig. 2A.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 2B shows the image processing apparatus 555-2 stored in the memory 550-2, which may be software in the form of a program, a plug-in, or the like, including the following software modules: image acquisition module 5555 and image restoration module 5556, which are logical, and thus may be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be described hereinafter.
In other embodiments, the apparatus provided by the embodiments of the present application may be implemented in hardware. By way of example, the apparatus provided by the embodiments of the present application may be a processor in the form of a hardware decoding processor that is programmed to perform the repair network training method provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
In the following, the repair network training method provided by the embodiment of the present application is described. As noted above, the electronic device implementing the repair network training method of the embodiment of the present application may be a terminal, a server, or a combination of the two; the execution subject of the respective steps is therefore not repeated below.
Referring to fig. 3A, fig. 3A is a first flow chart of a repair network training method according to an embodiment of the present application, and the following description will be made with reference to steps 101 to 104 shown in fig. 3A.
In step 101, a first image sample including a target component is subjected to matting processing, and a first matting sample from which the target component is scratched is obtained.
As an example, the target component may be a component to be repaired in the image. For example, in a face image, the target component may be the teeth, eyelashes, eyes, or the like; in a building image, it may be the roof, a window, or the like. The specific target component may be set according to the actual situation and is not limited here.
As an example, the first image sample may be an image sample stored in the database in advance, or may be an image sample uploaded by the terminal, and the type of the image sample may be a face image, an animal image, a building image, or the like. The specific image sample type can be set according to the actual situation.
As an example, matting the first image sample may comprise inputting the first image sample into a pre-trained segmentation network, obtaining from the segmentation network the region of the first image sample corresponding to the target component, and removing the target component from the first image sample based on that region. For example, a mouth mask mouth_mask is obtained from the segmentation network, where a position with value 1 in mouth_mask belongs to the mouth region and a position with value 0 does not; the matting image img_no_mouth with the teeth scratched out is then obtained as the pixel-wise product img_no_mouth = first image sample × (1 - mouth_mask).
As an example, the segmentation network is a deep learning model that can be used to segment an input image into different regions or objects. By learning image features and semantic information, the segmentation network can accurately identify the different objects, backgrounds, and boundaries in an image, achieving precise segmentation and recognition. The segmentation network may adopt a convolutional neural network (CNN) or fully convolutional network (FCN) structure and is trained by a back-propagation algorithm to improve segmentation accuracy and efficiency.
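A minimal sketch of the matting step above, assuming PyTorch tensors and a segmentation network that returns a single-channel mouth mask with values in {0, 1}; the helper name is hypothetical:

```python
import torch

def matting(image: torch.Tensor, mouth_mask: torch.Tensor) -> torch.Tensor:
    """Scratch the target component (here, the mouth/teeth) out of the image.

    image: (N, 3, H, W) first image sample; mouth_mask: (N, 1, H, W).
    Implements img_no_mouth = image * (1 - mouth_mask), applied pixel-wise.
    """
    return image * (1.0 - mouth_mask)
```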
As an example, a matting image in the embodiment of the present application is described below with reference to FIG. 4; FIG. 4 is a schematic diagram of an original image and a matting image provided by an embodiment of the present application.
In FIG. 4, the tooth 403 is taken as an example of the target component; the left side shows the original image 401 without matting processing, and the right side shows the matting image 402 obtained after matting processing. As shown in FIG. 4, the original image includes the region corresponding to the tooth 403, while the matting image 402 does not.
In step 102, the first reconstruction network is trained on the reconstruction task for the target component based on the first matting sample, and the trained first reconstruction network is obtained.
As an example, after the first matting sample is obtained, it may be used to train the first reconstruction network on the reconstruction task for the target component; the process of training the first reconstruction network based on the first matting sample is described below.
In some embodiments, training the first reconstruction network on the reconstruction task for the target component based on the first matting sample in step 102 to obtain the trained first reconstruction network may be implemented through steps 1021 to 1023 shown in FIG. 3B.
In step 1021, blurring processing is performed on the first image sample, so as to obtain a first blurred sample.
As an example, blurring processing may first be performed on the first image sample to obtain the first blurred sample corresponding to it. The principle of blurring an image is to change the image's details by manipulating its pixel values. Common image blurring methods include Gaussian blur, mean blur, and motion blur. In the embodiment of the present application, the first image sample is subjected to average pooling to obtain the blurred image img_blur. The specific blurring method may be selected according to the actual situation and is not limited here.
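A sketch of the average-pooling blur, assuming the pooled map is resized back to the input resolution so that img_blur matches the original size; the kernel size and the resize step are illustrative assumptions:

```python
import torch.nn.functional as F

def average_pool_blur(image, kernel_size: int = 8):
    """Blur an (N, 3, H, W) image by average pooling, then restore its resolution."""
    pooled = F.avg_pool2d(image, kernel_size)                   # (N, 3, H/k, W/k)
    return F.interpolate(pooled, size=image.shape[-2:],
                         mode="bilinear", align_corners=False)  # img_blur: (N, 3, H, W)
```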
In step 1022, stitching processing is performed on the first matting sample and the first blurred sample to obtain a first stitched sample.
As an example, the stitching processing of the first matting sample and the first blurred sample may concatenate the two samples at each pixel position, yielding a stitched sample of higher channel dimension.
As an example, if the dimension of the first matting sample is H×W×3 (3 channels) and the dimension of the first blurred sample is H×W×3 (3 channels), then the dimension of the first stitched sample obtained by stitching the first matting sample and the first blurred sample may be H×W×6 (6 channels).
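Under this H×W×3 layout, the stitching step amounts to a channel-wise concatenation; a sketch (the (N, C, H, W) tensor layout is an assumption):

```python
import torch

def stitch(matting_sample: torch.Tensor, blurred_sample: torch.Tensor) -> torch.Tensor:
    """Concatenate two 3-channel samples into one 6-channel stitched sample."""
    assert matting_sample.shape == blurred_sample.shape          # both (N, 3, H, W)
    return torch.cat([matting_sample, blurred_sample], dim=1)    # (N, 6, H, W)
```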
In step 1023, the first reconstruction network is trained on the reconstruction task for the target component based on the first stitched sample, and the trained first reconstruction network is obtained.
The process of training the first reconstruction network for the reconstruction task of the target component based on the first stitched samples is described below.
In some embodiments, training the first reconstruction network on the reconstruction task for the target component based on the first stitched sample in step 1023 to obtain the trained first reconstruction network may be implemented through the following technical scheme: performing reconstruction processing on the first stitched sample through the initialized first reconstruction network to obtain a second reconstructed image corresponding to the first stitched sample; determining a first reconstruction loss based on the first image sample and the second reconstructed image; and updating the initialized first reconstruction network based on the first reconstruction loss to obtain the trained first reconstruction network. The embodiment of the application can improve the accuracy of image repair by the first reconstruction network obtained through training.
As an example, the stitched sample may be reconstructed by the untrained first reconstruction network to obtain the second reconstructed image; the first reconstruction loss is then determined based on the label image and the second reconstructed image, and the parameters of the first reconstruction network are updated by back-propagating the gradient of the first reconstruction loss.
As an example, the first reconstruction loss may include one or more of the following and may, for example, be obtained by fusing: the overall reconstruction loss between the second reconstructed image and the first image sample, the feature-level loss between the second reconstructed image and the first image sample, the local reconstruction loss between the second reconstructed image and the first image sample at the target component, the object identity loss between the second reconstructed image and the first image sample, and the generation loss of the second reconstructed image. The specific composition of the first reconstruction loss may be set according to the actual situation and is not limited here.
As an example, by reconstructing the stitched samples and training the first reconstruction network with the first reconstruction loss, the accuracy of the images repaired by the first reconstruction network can be improved, which in turn improves the accuracy of the images repaired by the subsequent first repair network.
Introducing the overall reconstruction loss between the second reconstructed image and the first image sample: obtain the pixel value of each pixel position in the first image sample and in the second reconstructed image; for each pixel position, determine the absolute value of the difference between the pixel values of that position in the two images; and fuse the absolute values over all pixel positions to obtain the overall reconstruction loss between the second reconstructed image and the first image sample. The embodiment of the application can thereby constrain the two images to be as similar as possible at the whole-image pixel level, improving the reconstruction capability of the model.
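This is an L1-style pixel loss; a sketch, assuming "fusion" of the absolute values means averaging over all pixel positions:

```python
import torch

def overall_reconstruction_loss(recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean absolute pixel difference between the reconstruction and the label image."""
    return (recon - target).abs().mean()
```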
Introducing the feature-level loss between the second reconstructed image and the first image sample: perform multi-level feature extraction on the first image sample to obtain a feature for each level; perform multi-level feature extraction on the second reconstructed image to obtain a feature for each level; for each level, determine the feature distance between the two images' features at that level; and fuse the feature distances across the levels to obtain the feature-level loss between the second reconstructed image and the first image sample. The embodiment of the application can thereby constrain the model's reconstruction capability in depth, from the perspective of features at different levels.
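This corresponds to a perceptual-style loss. The sketch below assumes a frozen, pre-trained VGG16 as the multi-level feature extractor and L1 feature distances; both choices, and the chosen layer indices, are illustrative assumptions not fixed by the application:

```python
import torch
import torchvision

class FeatureLevelLoss(torch.nn.Module):
    """Feature-level loss over several layers of a frozen VGG16 (an assumed extractor)."""
    def __init__(self, level_indices=(3, 8, 15, 22)):   # layer picks are an assumption
        super().__init__()
        features = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        self.features = features.requires_grad_(False)
        self.level_indices = set(level_indices)

    def forward(self, recon, target):
        loss, x, y = 0.0, recon, target
        for i, layer in enumerate(self.features):
            x, y = layer(x), layer(y)
            if i in self.level_indices:
                loss = loss + (x - y).abs().mean()      # fuse distances across levels
            if i == max(self.level_indices):
                break
        return loss
```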
Introducing the local reconstruction loss between the second reconstructed image and the first image sample at the target component, which may be implemented as follows: based on the region of the target component, acquire the local image corresponding to the target component in the first image sample; based on the region of the target component, acquire the local image corresponding to the target component in the second reconstructed image; obtain the pixel values of the two local images at each pixel position; for each pixel position, determine the absolute value of the difference between the two local images' pixel values at that position; and fuse the absolute values over all pixel positions to obtain the local reconstruction loss between the second reconstructed image and the first image sample at the target component. The embodiment of the application can thereby constrain the reconstruction effect in the local area where the target component is located.
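A sketch of this masked local L1 loss, assuming the component region arrives as a binary mask and that "fusion" averages over the masked area:

```python
import torch

def local_reconstruction_loss(recon: torch.Tensor, target: torch.Tensor,
                              region_mask: torch.Tensor) -> torch.Tensor:
    """L1 loss restricted to the target-component region given by region_mask (N, 1, H, W)."""
    diff = (recon - target).abs() * region_mask
    return diff.sum() / region_mask.sum().clamp_min(1.0)
```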
Introducing the object identity loss between the second reconstructed image and the first image sample: invoke an object identity recognition network to extract identity features from the first image sample, obtaining the identity feature of the first image sample; invoke the object identity recognition network to extract identity features from the second reconstructed image, obtaining the identity feature of the second reconstructed image; determine the identity feature similarity between the two images' identity features; and obtain an object identity loss negatively related to that similarity. The embodiment of the application can thereby ensure that identity features are not lost during reconstruction.
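A sketch of the object identity loss, assuming a pre-trained face-embedding network id_net (a hypothetical name) and cosine similarity as the identity feature similarity, with the loss taken as one minus the similarity so that it is negatively related:

```python
import torch
import torch.nn.functional as F

def object_identity_loss(id_net: torch.nn.Module,
                         recon: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity between identity embeddings of the two images."""
    with torch.no_grad():
        target_id = id_net(target)      # identity feature of the label image
    recon_id = id_net(recon)            # identity feature of the reconstruction
    return (1.0 - F.cosine_similarity(recon_id, target_id, dim=1)).mean()
```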
The generation loss of the second reconstructed image is introduced as follows: a discrimination network is called to perform real-or-fake prediction on the second reconstructed image, obtaining a first prediction probability that the second reconstructed image belongs to the first image sample; a generation loss negatively related to the first prediction probability is then obtained. The embodiment of the application can thereby make the generated second reconstructed image close to a real image, that is, an image that cannot be recognized as model-generated.
In step 103, the second matting sample is reconstructed through the trained first reconstruction network to obtain a first reconstructed image, and a second image sample including the target component is repaired through the first repair network to obtain a first repair image, where the second matting sample is obtained by matting the second image sample.
As an example, the first repair network to be trained performs repair processing on the second image sample to obtain a first repair image, and the trained first reconstruction network performs reconstruction processing on the second matting sample corresponding to the second image sample to obtain a first reconstructed image.
The process of obtaining a first reconstructed image based on a first reconstruction network is described below.
In some embodiments, the trained first reconstruction network includes a first encoding network and a first decoding network, and the reconstruction of the second matting sample by the trained first reconstruction network in step 103 to obtain the first reconstructed image may be implemented by the following technical scheme: blurring the second image sample to obtain a second blurred sample; stitching the second matting sample and the second blurred sample to obtain a second stitched sample; and encoding the second stitched sample through the first encoding network to obtain a first encoding result, and decoding the first encoding result through the first decoding network to obtain the first reconstructed image.
As an example, the manner of blurring the second image sample may refer to the blurring process described in step 1021 above.
As an example, the manner of performing the stitching on the second matted sample and the second blurred sample may refer to the procedure of the stitching described in step 1022.
By way of example, reconstructing the stitched sample through the first reconstruction network improves the accuracy of the resulting first reconstructed image, which in turn improves the accuracy of the first repair network that is subsequently trained against it.
As an example, the process of encoding and decoding the stitched image is described below. Each layer of the encoding network performs a convolution calculation that halves the spatial resolution of its input and gradually increases the number of channels. The stitched image x is a 6-channel input (two images stitched together, each with 3 channels); the encoding network progressively halves its height and width while widening the channels until the encoding result in the hidden space is obtained. The decoding process can be the inverse of encoding: each decoding stage doubles the resolution of its input, and the decoding network progressively decodes the encoding result, stage by stage, until the first reconstructed image is finally obtained.
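To make the halving and doubling structure concrete, a minimal PyTorch sketch follows. The channel widths, stage count, activations, and tanh output are illustrative assumptions, not the exact architecture of the embodiment:

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Encoder-decoder sketch: each encoder stage halves the spatial
    resolution and widens the channels; each decoder stage doubles the
    resolution back until a 3-channel image is produced."""
    def __init__(self, in_ch: int = 6, base: int = 32, stages: int = 4):
        super().__init__()
        enc, ch = [], in_ch
        for i in range(stages):
            out = base * (2 ** i)
            enc.append(nn.Sequential(
                nn.Conv2d(ch, out, 4, stride=2, padding=1),  # halves H and W
                nn.LeakyReLU(0.2, inplace=True)))
            ch = out
        self.encoder = nn.Sequential(*enc)
        dec = []
        for i in reversed(range(stages)):
            last = i == 0
            out = 3 if last else base * (2 ** (i - 1))
            layers = [nn.Upsample(scale_factor=2, mode="nearest"),  # doubles H and W
                      nn.Conv2d(ch, out, 3, padding=1)]
            if not last:
                layers.append(nn.LeakyReLU(0.2, inplace=True))
            dec.append(nn.Sequential(*layers))
            ch = out
        self.decoder = nn.Sequential(*dec)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(x)                  # encoding result in the hidden space
        return torch.tanh(self.decoder(hidden))   # first reconstructed image
```

With the defaults above, a stitched input of shape (N, 6, 512, 512) is encoded down to (N, 256, 32, 32) and decoded back to a (N, 3, 512, 512) image.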
The procedure for performing the repair processing on the second image sample by the first repair network is described below.
In some embodiments, the repairing of the second image sample including the target component by the first repairing network in step 103 to obtain the first repairing image may be implemented by the following technical scheme: and repairing the second image sample through the first repairing network to obtain a predicted repairing area mask of the corresponding target component in the second image sample and a second repairing image of the corresponding second image sample. And synthesizing the second repair image and the second image sample based on the predicted repair area mask in the second image sample to obtain a first repair image.
As an example, the first repair network may identify a predicted repair area mask for the corresponding target component in the second image sample and obtain a second repair image corresponding to the second image sample. And then, replacing the predicted repair area mask in the second image sample by using the second repair image to obtain a first repair image.
As an example, the predicted repair area mask may be composed of a set of binary masks, where each mask corresponds to a pixel or set of pixels in the image, the predicted repair area mask being used to determine whether the pixel belongs to a predicted repair area of the target part. The first repair network can obtain the predicted repair area mask of the corresponding target component in the second image sample, because repair area mask loss is used in the training process of the first repair network, and the determination process of repair area mask loss is described in detail later.
As an example, since the second repair image is a generated image, there is an unavoidable difference between the image except for the region corresponding to the target component and the original second image sample, and the purpose of repairing the second image sample is to repair only the region corresponding to the target component, so in order to ensure the accuracy of the generated first repair image, replacement processing may be performed on the predicted repair region mask in the second image sample with the second repair image, to obtain the first repair image.
As an example, the image of the region corresponding to the predicted repair region mask in the second image sample may be replaced with the region corresponding to the predicted repair region mask in the second repair image, to obtain the first repair image.
As an example, the second image sample includes a region a and a region B, where the region B is a region corresponding to the predicted repair region mask, and the second repair image includes a region C and a region D, where the region D is a region corresponding to the predicted repair mask, and then the region B may be replaced with the region D, so as to obtain a first repair image, that is, the first repair image is an image including the region a and the region D.
As an example, after obtaining the predicted repair area mask and the second repair image, the first repair image may be determined by the following formula.
I_rep1 = M_pred × I_rep2 + (1 - M_pred) × I_sample (1)

In formula (1), I_rep1 is the first repair image, I_rep2 is the second repair image, M_pred is the predicted repair area mask, and I_sample is the second image sample; × denotes element-wise multiplication.
According to the embodiment of the application, the region corresponding to the predicted repair region mask in the original image is replaced by the region corresponding to the predicted repair region mask in the repair image, so that the region which does not need to be repaired in the original image can be reserved, and the accuracy of the obtained repair image is further improved.
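A minimal sketch of this mask-based synthesis, assuming PyTorch tensors and a soft mask with values in [0, 1], might look as follows:

```python
import torch

def composite(repair_img: torch.Tensor,
              sample_img: torch.Tensor,
              pred_mask: torch.Tensor) -> torch.Tensor:
    """Formula (1): keep the original pixels outside the predicted repair
    area and take the repaired pixels inside it.
    pred_mask: (N, 1, H, W), values in [0, 1], broadcast over channels."""
    return pred_mask * repair_img + (1.0 - pred_mask) * sample_img
```

Because the mask is applied element-wise, the regions that do not need repair pass through untouched, which is exactly the property the paragraph above relies on.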
In step 104, determining a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and updating the first repair network based on the first repair loss to obtain a trained first repair network; the first trained repairing network is used for repairing an original image comprising the target component to obtain a target image after repairing the target component.
As an example, a process of determining the first repair loss is described below.
In some embodiments, determining the first repair loss for the corresponding first repair network based on the first repair image and the first reconstructed image in step 104 may be implemented by steps 1041 through 1048 as shown in fig. 3C.
In step 1041, a tag repair area mask is acquired and repair area mask loss is determined based on the predicted repair area mask and the tag repair area mask for the corresponding target component in the second image sample.
The manner in which the tag repair area mask is obtained is described below.
In some embodiments, the acquiring the tag repair area mask in step 1041 may be implemented by the following technical scheme: acquiring a first pixel value of each pixel position in a first reconstructed image and a second pixel value of each pixel position in a second image sample; performing the following processing for each pixel position, determining a first absolute value between a first pixel value of the pixel position and a second pixel value of the pixel position; normalizing the first absolute value of each pixel position to obtain a third pixel value of each pixel position; a label repair area mask is generated based on the third pixel value at each pixel location.
As an example, since the first reconstructed image is the image obtained after the trained first reconstruction network repairs the second matting sample of the second image sample, the repaired area is the area corresponding to the target component (the label repair area mask), while the area outside it remains identical to the second image sample. Therefore, the pixel values of the first reconstructed image and the second image sample can be differenced at each position to obtain a third pixel value; the pixel positions whose third pixel value is not 0 are marked as 1, and the positions whose third pixel value is 0 are marked as 0, thereby obtaining the label repair area mask.
As an example, the first pixel values corresponding to the first reconstructed image are represented by a matrix as:
[1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3]
The second pixel values corresponding to the second image samples are represented by a matrix as:
[1,1,1,1,1,1,1,1
2,2,1,2,2,2,2,2
3,2,2,2,3,3,3,3]
Then, determining a first absolute value between the first pixel value and the second pixel value as follows:
[0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,0
0,1,1,1,0,0,0,0]
and then carrying out normalization processing on the first absolute value to obtain a third pixel value as follows:
[0,0,0,0,0,0,0,0
0,0,1/255,0,0,0,0,0
0,1/255,1/255,1/255,0,0,0,0]
Marking the pixel position with the third pixel value not being 0 as 1, and marking the position with the third pixel value being 0 as 0, thereby obtaining the label repair area mask:
[0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,0
0,1,1,1,0,0,0,0]。
By determining the mode of the label repair area mask according to the embodiment of the application, the accuracy of the obtained label repair area mask can be improved, and the accuracy of the subsequent first repair network repair image can be further improved.
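Under the assumption that pixel values lie in [0, 255], the label mask construction just described can be sketched as:

```python
import torch

def label_repair_mask(recon_img: torch.Tensor,
                      sample_img: torch.Tensor) -> torch.Tensor:
    """Difference the first reconstructed image and the second image
    sample, normalize, then binarize: 1 where any channel differs."""
    third = (recon_img - sample_img).abs() / 255.0       # third pixel values
    mask = (third > 0).any(dim=1, keepdim=True).float()  # 1 if not 0, else 0
    return mask                                          # (N, 1, H, W)
```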
The process of determining the repair area mask loss is described below.
In some embodiments, determining the repair area mask loss based on the predicted repair area mask and the tag repair area mask in step 1041 may be achieved by: acquiring a first mark value of each pixel position in the prediction repair area mask and a second mark value of each pixel position in the label repair area mask; performing the following processing for each pixel position, determining a second absolute value between the first marker value of the pixel position and the second marker value of the pixel position; and carrying out fusion processing on the second absolute values of the pixel positions to obtain the mask loss of the repair area.
As an example, the process of determining repair area mask loss may refer to a repair area mask loss function corresponding to the following formula:
mask_loss = abs(m_pred - m_label) (2)

In formula (2), mask_loss is the repair area mask loss function, abs(·) is the absolute value calculation formula, m_pred is the first marker value (of the predicted repair area mask), and m_label is the second marker value (of the label repair area mask); the absolute values over all pixel positions are fused to obtain the loss.

abs(a, b) = |a - b| (3)

In formula (3), abs(a, b) is the absolute value calculation formula, which denotes the absolute value of a - b.
As an example, as can be seen by combining formulas (2) and (3), the repair area mask loss characterizes the loss between the label repair area mask corresponding to the first reconstructed image and the predicted repair area mask corresponding to the first repair image.
The repair area mask loss in the embodiment of the application can enable the first repair network to accurately identify the area needing repair, thereby improving the accuracy of the repair image of the first repair network.
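As a sketch, fusing the per-pixel absolute differences by averaging (one plausible choice of fusion) gives:

```python
import torch

def repair_area_mask_loss(pred_mask: torch.Tensor,
                          label_mask: torch.Tensor) -> torch.Tensor:
    """Formulas (2) and (3): absolute difference between the first and
    second marker values at each pixel, fused over all positions."""
    return (pred_mask - label_mask).abs().mean()
```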
In order to further improve the accuracy of the image restoration by the first restoration network, the following first auxiliary loss can be determined.
At least one of the loss determination processing in steps 1042 to 1046 is performed:
in step 1042, an overall reconstruction loss is determined based on the first reconstructed image and the first repair image.
As an example, a fifth pixel value of each pixel position in the first reconstructed image and a sixth pixel value of each pixel position in the first repair image are obtained; the following processing is performed for each pixel position: determining a third absolute value between the fifth pixel value and the sixth pixel value of the pixel position; and the third absolute values of the pixel positions are fused to obtain the overall reconstruction loss.
As an example, a fifth pixel value for each position in the first reconstructed image and a sixth pixel value for each position in the first repair image are determined, and the overall reconstruction loss is determined based on the fifth and sixth pixel values. See in particular the following formula:

L1_loss = abs(result - gt) (4)

In formula (4), L1_loss is the overall reconstruction loss, abs(·) is the absolute value calculation formula introduced in formula (3), result is the sixth pixel value (of the first repair image), and gt is the fifth pixel value (of the first reconstructed image); the absolute values over all pixel positions are fused.
In step 1043, an image feature level loss is determined based on the first reconstructed image and the first repair image.
As an example, performing feature extraction processing of multiple levels on the first reconstructed image to obtain a first feature corresponding to each level; performing feature extraction processing of multiple levels on the first repair image to obtain second features corresponding to each level; the following processing is performed for each hierarchy: determining feature distances between first features of the hierarchy and second features of the hierarchy; and carrying out fusion processing on the characteristic distances of the multiple layers to obtain the image characteristic level loss.
As an example, first, multi-level feature extraction is performed on the first repair image through a deep convolutional network to obtain features of four levels, namely result_fea1, result_fea2, result_fea3 and result_fea4; multi-level feature extraction is likewise performed on the first reconstructed image to obtain gt_fea1, gt_fea2, gt_fea3 and gt_fea4. The image feature level loss is then determined based on these features, specifically with reference to the following formula:

LPIPS_loss = abs(result_fea1 - gt_fea1) + abs(result_fea2 - gt_fea2) + abs(result_fea3 - gt_fea3) + abs(result_fea4 - gt_fea4) (5)

In formula (5), LPIPS_loss is the image feature level loss; result_fea1 to result_fea4 are the features of the first to fourth levels of the first repair image, and gt_fea1 to gt_fea4 are the features of the first to fourth levels of the first reconstructed image.
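A sketch of this feature-level loss using the feature stack of a pretrained AlexNet from torchvision is shown below; the tap indices and the L1 fusion are assumptions made for illustration:

```python
import torch
import torchvision.models as models

alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features.eval()
for p in alexnet.parameters():
    p.requires_grad_(False)

TAPS = (1, 4, 7, 11)  # ReLU outputs used as the four feature levels (assumed)

def multilevel_features(img: torch.Tensor) -> list:
    feats, x = [], img
    for i, layer in enumerate(alexnet):
        x = layer(x)
        if i in TAPS:
            feats.append(x)
    return feats

def feature_level_loss(repair_img: torch.Tensor,
                       recon_img: torch.Tensor) -> torch.Tensor:
    """Formula (5): fuse the per-level feature distances of the two images."""
    loss = repair_img.new_zeros(())
    for rf, gf in zip(multilevel_features(repair_img),
                      multilevel_features(recon_img)):
        loss = loss + (rf - gf).abs().mean()
    return loss
```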
In step 1044, a local reconstruction loss is determined based on the first reconstructed image, the first repair image, and the tag repair region mask.
As an example, a first partial image of a corresponding target component in a first reconstructed image is acquired based on a tag repair area mask; acquiring a second partial image of the corresponding target component in the first repair image based on the label repair area mask; acquiring a seventh pixel value of each pixel position in the first partial image and an eighth pixel value of each pixel position in the second partial image; performing the following processing for each pixel position, determining a fourth absolute value between the seventh pixel value of the pixel position and the eighth pixel value of the pixel position; and carrying out fusion processing on fourth absolute values of the pixel positions to obtain local reconstruction loss.
As an example, taking the target component as teeth for illustration: first, the eighth pixel value of each position of the second partial image corresponding to the first repair image and the seventh pixel value of each position of the first partial image corresponding to the first reconstructed image are determined, specifically with reference to the following formulas:

result_mouth = result × mouth_mask (6)

gt_mouth = gt × mouth_mask (7)

where result_mouth is the pixels of the mouth region in the first repair image (the eighth pixel value of each pixel position in the second partial image of the corresponding target component in the first repair image), result is the pixels of the first repair image (the sixth pixel value of each pixel position in the first repair image), mouth_mask is the mouth region mask, gt_mouth is the pixels of the mouth region in the first reconstructed image (the seventh pixel value of each pixel position in the first partial image of the corresponding target component in the first reconstructed image), and gt is the pixels of the first reconstructed image (the fifth pixel value of each pixel position in the first reconstructed image).

The local reconstruction loss of the tooth region, teeth_l1_loss, can be seen in the following formula:

teeth_l1_loss = abs(result_mouth - gt_mouth) (8)

In formula (8), teeth_l1_loss is the local reconstruction loss of the tooth region, abs(·) is the absolute value calculation formula introduced in formula (3), result_mouth is the pixels of the mouth region in the first repair image, and gt_mouth is the pixels of the mouth region in the first reconstructed image.
In step 1045, a loss of identity of the object is determined based on the first reconstructed image and the first repair image.
As an example, calling an object identity recognition network to extract the identity of the first reconstructed image to obtain a first identity of the first reconstructed image; invoking an object identity recognition network to extract the identity characteristics of the first repair image to obtain a second identity characteristic of the first repair image; and determining the identity feature similarity between the first identity feature and the second identity feature, and acquiring the identity loss of the object negatively related to the identity feature similarity.
As an example, an existing object identity recognition network is used to extract the first identity feature of the first reconstructed image, and the same network is used to extract the second identity feature of the first repair image, so as to calculate the object identity loss, which constrains the first reconstructed image and the first repair image to be as close as possible in identity. See in particular the following formula:

ID_loss = 1 - cosine_similarity(gt_id_features, result_id_features) (9)

In formula (9), ID_loss is the object identity loss, cosine_similarity is the cosine similarity calculation formula, gt_id_features is the first identity feature (of the first reconstructed image), and result_id_features is the second identity feature (of the first repair image).
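A sketch of formula (9), where id_net stands in for whatever pretrained identity recognition network is available:

```python
import torch
import torch.nn.functional as F

def object_identity_loss(id_net, repair_img: torch.Tensor,
                         recon_img: torch.Tensor) -> torch.Tensor:
    """Formula (9): 1 minus the cosine similarity of the two identity features."""
    gt_id = id_net(recon_img)        # first identity feature, e.g. (N, 1024)
    result_id = id_net(repair_img)   # second identity feature
    return 1.0 - F.cosine_similarity(gt_id, result_id, dim=-1).mean()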
In step 1046, a generation loss is determined based on the first repair image.
As an example, the embodiment of the present application may also apply an adversarial network (adversarial network) to the first repair image. An adversarial network is a model made up of two mutually opposing neural networks: one network, called the generator, is responsible for generating new samples or data; the other network, called the discriminator, is responsible for determining whether the data generated by the generator is real or fake. Therefore, the first auxiliary loss may further include a generation loss and a discrimination loss; the specific generation loss may refer to the following formula:
G_loss = log(1 - D(result)) (10)

In formula (10), G_loss is the generation loss, and D(result) is the output of the discriminator network D for the first repair image.
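A sketch of the generator-side loss of formula (10); the clamp guards against log(0) and is an implementation assumption:

```python
import torch

def generation_loss(discriminator, repair_img: torch.Tensor) -> torch.Tensor:
    """Formula (10): minimized when D scores the first repair image as real."""
    d_out = discriminator(repair_img).clamp(1e-6, 1.0 - 1e-6)
    return torch.log(1.0 - d_out).mean()
```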
In step 1047, a first auxiliary penalty is determined based on at least one of the global reconstruction penalty, the image feature level penalty, the local reconstruction penalty, the subject identity penalty, and the generation penalty.
As an example, at least one of the overall reconstruction loss, the image feature level loss, the local reconstruction loss, the object identity loss, and the generation loss may be selected as the first auxiliary loss. The specific first auxiliary loss may be selected according to practical situations, and is not specifically limited herein.
In step 1048, the first auxiliary loss and the repair area mask loss are fused, to obtain a first repair loss.
As an example, after determining the first auxiliary loss, the first auxiliary loss and the repair area mask loss may be fused, so as to obtain the first repair loss.
As an example, if the first auxiliary loss includes a global reconstruction loss, an image feature level loss, a local reconstruction loss, an object identity loss, and a generation loss, the first repair loss may be determined with reference to the following formula:
first_repair_loss = w1 × L1_loss + w2 × LPIPS_loss + w3 × ID_loss + w4 × G_loss + w5 × teeth_l1_loss + w6 × mask_loss (11)

In formula (11), first_repair_loss is the first repair loss, L1_loss is the overall reconstruction loss, LPIPS_loss is the image feature level loss, ID_loss is the object identity loss, G_loss is the generation loss, teeth_l1_loss is the local reconstruction loss, and mask_loss is the repair area mask loss; w1 to w6 are the weights corresponding to each loss (the number before each loss in the formula).
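A sketch of this fusion; the numeric weights below are placeholders, since the embodiment fixes its own weights:

```python
import torch

# Placeholder weights; the embodiment specifies its own numeric values.
W = dict(l1=1.0, lpips=1.0, ident=1.0, gen=1.0, local=1.0, mask=1.0)

def first_repair_loss(l1_loss, lpips_loss, id_loss,
                      g_loss, teeth_l1_loss, mask_loss) -> torch.Tensor:
    """Formula (11): weighted fusion of the first auxiliary losses and
    the repair area mask loss."""
    return (W["l1"] * l1_loss + W["lpips"] * lpips_loss
            + W["ident"] * id_loss + W["gen"] * g_loss
            + W["local"] * teeth_l1_loss + W["mask"] * mask_loss)
```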
In some embodiments, after step 104 is performed, steps 105 through 107 as shown in fig. 3D may also be performed.
In step 105, an original image including a target part is acquired.
As an example, after training to obtain the first repair network, an original image including the target component may be acquired to repair the target component in the original image later through the first repair network.
In step 106, the original image is repaired by the trained first repair network, so as to obtain a predicted repair area mask corresponding to the target component in the original image and a third repair image corresponding to the original image.
As an example, a predicted repair area mask corresponding to a target component in an original image and a third repair image corresponding to the original image are determined through a first repair network after training, where the predicted repair area mask is a mask corresponding to the target component in the original image output by the first repair network, that is, in the predicted repair area mask, a value of an area corresponding to the target component is 1, and values of other areas are 0.
In step 107, based on the predicted repair area mask of the corresponding target component in the original image, the third repair image and the original image are synthesized, so as to obtain a target image after repairing the target component.
As an example, after obtaining the predicted repair area mask and the third repair image, the area image corresponding to the target component in the third repair image may be obtained by multiplying the predicted repair mask and the third repair image, and then the area image corresponding to the target component in the original image may be replaced with the area image corresponding to the target component in the third repair image, so as to obtain the target image. See in particular the following formula.
target = pred_mask × result3 + (1 - pred_mask) × original (12)

In formula (12), target is the target image, result3 is the third repair image, pred_mask is the predicted repair area mask, and original is the original image.
According to the embodiment of the application, the first restoration network can finish restoration of the image without carrying out matting processing, so that the image restoration efficiency is improved, and the resource consumption of image restoration is reduced.
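The resulting inference path is a single forward pass. A sketch, assuming the trained repair network returns both the repaired image and the predicted mask:

```python
import torch

@torch.no_grad()
def repair(repair_net, original_img: torch.Tensor) -> torch.Tensor:
    """Formula (12): single forward pass, no matting or segmentation.
    repair_net is assumed to return (third repair image, predicted mask)."""
    repaired, pred_mask = repair_net(original_img)
    return pred_mask * repaired + (1.0 - pred_mask) * original_img
```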
Referring to fig. 3E, fig. 3E is a flowchart illustrating an image processing method according to an embodiment of the present application. The following will explain in connection with steps 201 to 202 shown in fig. 3E.
In step 201, an original image including a target component is acquired.
As an example, the target component is the same as the target component referred to in steps 101 to 104, and the description thereof will not be repeated, and the original image may be an image uploaded to the server by the terminal, or an image read by the server from the database, and the source of the original image is not specifically limited herein.
In step 202, repairing the original image through a first repairing network to obtain a target image after repairing the target component; the first repair network is trained by the repair network training method.
As an example, the first repair network trained in the steps 101 to 104 may perform repair processing on the obtained original image, so as to obtain a reconstructed target image.
In some embodiments, the repairing of the original image by the first repairing network in step 202 may be implemented by the following technical scheme: and carrying out restoration processing on the original image through a first restoration network to obtain a predicted restoration area mask corresponding to the target component in the original image and a third restoration image corresponding to the original image, and carrying out synthesis processing on the third restoration image and the original image based on the predicted restoration area mask corresponding to the target component in the original image to obtain the target image after restoration of the target component.
As an example, the first repair network may determine a predicted repair area mask corresponding to the target component and a third repair image during repair processing of the original image. The procedure of the first repair network for performing the repair process on the original image may be referred to as steps 105 to 106, and will not be repeated here.
As an example, the manner in which the third repair image and the original image are subjected to the synthesis processing based on the predicted repair area mask may be referred to in step 107 described above, and the description thereof will not be repeated.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The repair network training method provided by the embodiment of the application can be applied to video production scenes and live broadcast scenes. The method for training the restoration network provided by the embodiment of the application is described below by taking a restoration target part as a tooth as an example.
The repair network training method provided by the embodiment of the application may be a tooth repair algorithm based on two-stage training that keeps video stable. Specifically, in the first training stage, a segmentation network is used to matte the teeth out of the face image, the matted face image is used as input, and the target is to reconstruct a face image with high-definition teeth. In the second training stage, the face image is used directly as input without a segmentation network, and the output of the first-stage network (the first reconstruction network) supervises the repair of a face image with high-definition teeth. In the application stage, tooth repair is achieved directly with the second-stage network (the first repair network) without depending on a segmentation network. Therefore, the first repair network can directly use a deep network to repair a portrait image into one with high-definition, neat teeth; it is low in time consumption and can remain stable in complex video scenes.
A schematic of the first restoration network for restoring teeth is described below.
Referring to fig. 5, fig. 5 is a schematic view of the dental restoration results provided by the embodiment of the present application.
In fig. 5, 501 on the left side is an image of a tooth having a defect in the original image, and 502 on the right side is an image of a tooth after the restoration is completed through the first restoration network.
The following describes an implementation process of the repair network training method provided by the embodiment of the present application with reference to fig. 6, and fig. 6 is an implementation process of the repair network training method provided by the embodiment of the present application.
As can be seen from region 601 in fig. 6, first, the matting sample 603 corresponding to the first image sample and the blurred image 604 corresponding to the original image are input into the first reconstruction network to be trained, and the matting sample 603 and the blurred image 604 are encoded to obtain an encoding result; the encoding result is then decoded to obtain a decoding result, and finally the output repaired image is obtained. The loss between the repaired image and the label image is calculated, and the parameters of the first reconstruction network are updated by back-propagating the gradient, thereby completing the training of the first reconstruction network.
Referring to a region 602 in fig. 6, it can be seen that the second image sample is input into the first repair network, the first repair image 605 output by the second image sample is obtained through encoding and decoding processing of the first repair network, meanwhile, the first reconstructed image 606 obtained after the first reconstructed network repairs the second image sample is obtained, and then the parameters of the first repair network are updated by determining the loss between the first repair image 605 and the first reconstructed image 606, so as to complete the training of the first repair network.
The preparation process before training the first repair net is described in detail below.
In step 1, because the human face often occupies only a small part of the input image, face detection is performed in advance to obtain the face area.
And 2, carrying out face registration in the face area to obtain key points of the face, wherein key points of eyes and corners of a mouth of a person are emphasized.
And step 3, obtaining a cut face image according to the face key points.
At the same time, this embodiment requires 3 additional models that have already been trained to assist the learning of the dental restoration network. The object identity recognition network is used to extract the identity features of the human face; the dimension of the identity features is generally 1024, and the extracted identity features are used to constrain the identity of the generated face to be as close as possible to that of the face in the original image. The pre-trained AlexNet network is used to extract features of the image at different layers to calculate the image feature level loss (LPIPS loss).
Referring to fig. 7, fig. 7 is a schematic diagram of feature visualization of a repair network training method according to an embodiment of the present application.
In fig. 7, shallow layers can represent low-level features such as lines and colors, and deep layers can represent high-level features such as components. The overall proximity of 2 images can be measured by comparing the features extracted from them by AlexNet. The segmentation network is used to extract the positions of the various components of the face, and in the embodiment of the application it is used to extract the mouth region of the face.
The process of training the first repair net is described in detail below.
Step 1, preparing dental restoration data: obtaining high-definition, front-facing face data (the first image sample, taken as the label sample), denoted gt.
Step 2, extracting a mouth region of the first image sample gt, specifically, sending the first image sample gt into a segmentation network, extracting a mouth region, and obtaining a mouth mask mouth_mask, wherein a position with a value of 1 in the mouth mask mouth_mask belongs to the mouth region, and a position with a value of 0 does not belong to the mouth region.
Step 3, obtaining the picture img_no_mouth with the teeth matted out: img_no_mouth = gt × (1 - mouth_mask).
Step 4, obtaining the blurred first image sample gt: average pooling is performed on the first image sample to obtain the blurred image img_blur, where the average pooling kernel size (kernel_size) is 33.
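A sketch of this blurring step; applying the pooling at stride 1 with padding so the resolution is unchanged is an assumption about how it is used:

```python
import torch
import torch.nn.functional as F

def blur(img: torch.Tensor, kernel_size: int = 33) -> torch.Tensor:
    """Average-pooling blur at unchanged resolution (stride 1, padded)."""
    return F.avg_pool2d(img, kernel_size, stride=1, padding=kernel_size // 2)
```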
Step 5, acquiring the stitched image x, specifically x = cat(img_no_mouth, img_blur): the matting sample and the blurred image are stitched together as the input of the dental restoration network.
Step 6, the first reconstruction network can be divided into an encoding network and a decoding network. The encoding network is composed of encoding modules; each encoding module performs a convolution calculation, halving the input resolution and gradually increasing the channels. The stitched image x input to the dental restoration network provided by the embodiment of the application is a 6-channel input (two images stitched together, each with 3 channels); the encoding modules progressively halve its height and width while widening the channels, finally yielding the internal features (inner_features) of the encoding result in the hidden space.
Step 7, the obtained encoding result inner_features is input into the decoder, which produces the decoding result. The decoding network is composed of decoding modules, each of which contains at least one up-sampling layer that doubles the input resolution; the decoding network progressively decodes the encoding result, stage by stage, until the target image result of the dental restoration is finally obtained.
Step 8, calculating the loss included in the first reconstruction network, namely, firstly, the overall reconstruction loss of the first reconstruction network, wherein the following formula can be seen specifically:
L1_loss = abs(result - gt) (13)

In formula (13), L1_loss is the overall reconstruction loss, abs(·) is the absolute value calculation formula, result is the pixel values of the target image of the dental restoration, and gt is the pixel values of the label sample.
Step 9, calculating the image feature level loss included in the first reconstruction network, wherein the following formula can be seen for the concrete:
LPIPS_loss = abs(result_fea1 - gt_fea1) + abs(result_fea2 - gt_fea2) + abs(result_fea3 - gt_fea3) + abs(result_fea4 - gt_fea4) (14)

In formula (14), LPIPS_loss is the image feature level loss; result_fea1 to result_fea4 are the features of the first to fourth levels of the target image of the dental restoration, and gt_fea1 to gt_fea4 are the features of the first to fourth levels of the label sample.
Step 10, calculating the local reconstruction loss included in the first reconstruction network, wherein the following formula can be seen:
teeth_l1_loss = abs(result_mouth - gt_mouth) (15)

In formula (15), teeth_l1_loss is the local reconstruction loss, abs(·) is the absolute value calculation formula, result_mouth is the pixels of the mouth region in the target image of the dental restoration (the pixel value of each pixel position in the partial image of the corresponding tooth in the target image), and gt_mouth is the pixels of the mouth region in the label sample (the pixel value of each pixel position in the partial image of the corresponding tooth in the label sample).
Step 11, extracting the identity characteristics gt_id_features of the label sample by using the existing object identification network, extracting the identity characteristics result_id_features of the dental restoration target image by using the existing object identification network, and calculating the identity loss ID_loss, wherein the specific formula can be seen as follows:
ID_loss = 1 - cosine_similarity(gt_id_features, result_id_features) (16)
In equation (16), id_loss is identity loss, cosine_similarity is cosine similarity, gt_id_features is identity of the label sample, and result_id_features is identity of the target image of dental restoration.
Step 12, calculating the adversarial loss D_loss of the target image: the embodiment of the application provides a discriminator network D for judging whether the target image of the dental restoration is a real image, calculates the adversarial loss D_loss, and optimizes the parameters of the discriminator network D based on D_loss, as shown in the following formula:

D_loss = -log D(gt) - log(1 - D(result)) (17)

In formula (17), D_loss is the adversarial loss, D(gt) is the output of the discriminator network D for the label sample gt, and D(result) is the output of the discriminator network D for the target image result of the dental restoration.
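A sketch of the discriminator update of formula (17); the detach and clamp are implementation assumptions:

```python
import torch

def discriminator_loss(discriminator, gt: torch.Tensor,
                       result: torch.Tensor) -> torch.Tensor:
    """Formula (17): D is pushed to score the label sample as real and
    the restored image as fake; result is detached so only D updates."""
    eps = 1e-6
    d_real = discriminator(gt).clamp(eps, 1.0 - eps)
    d_fake = discriminator(result.detach()).clamp(eps, 1.0 - eps)
    return (-torch.log(d_real) - torch.log(1.0 - d_fake)).mean()
```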
Step 13, calculating the generation loss G_loss of the target image, wherein the generation loss G_loss can be specifically shown in the following formula:
G_loss = log(1 - D(result)) (18)

In formula (18), G_loss is the generation loss, and D(result) is the output of the discriminator network D for the target image result.
Step 14, finally adding all the losses to obtain the overall loss of the dental restoration network, as shown in the following formula:

total_loss = L1_loss + LPIPS_loss + ID_loss + G_loss + teeth_l1_loss (19)

In formula (19), total_loss is the overall loss of the dental restoration network, L1_loss is the overall reconstruction loss, LPIPS_loss is the image feature level loss, ID_loss is the identity loss, G_loss is the generation loss, and teeth_l1_loss is the reconstruction loss of the tooth region.
After the first reconstruction network is obtained by training in the above manner, the procedure for training the first repair network is described below.
Step 15, first, the second matting sample corresponding to the second image sample and the blurred sample corresponding to the second image sample are input into the trained first reconstruction network, and the first reconstructed image gt = test_net_stage1(lq) output by the first reconstruction network is obtained.
Step 16, input the second image sample input_img into the first repair network.
Step 17, calculating the mask gt_mask of the tooth area and normalizing it, specifically referring to the following formula:

gt_mask = abs(gt - input) / 255 (20)

In formula (20), gt_mask is the mask of the tooth area, gt is the first reconstructed image, and input is the second image sample. Division by 255 normalizes the value of gt_mask to between 0 and 1, after which the non-zero values in the normalized mask can be marked as 1, resulting in a mask consisting of 0 and 1.
Step 18, the second image sample is encoded by the encoder included in the first repair network to obtain an encoding result; the encoding result is input into the decoder to obtain the tooth restoration result output by the decoder. The tooth restoration result (the second repair image) is denoted fake, and the predicted repair area mask pred_mask of the tooth area is also output; that is, the tooth restoration result fake and the predicted repair area mask are finally obtained.
Step 19, the area image corresponding to the teeth in the tooth restoration result is obtained by multiplying the predicted repair area mask with the tooth restoration result, and the area image corresponding to the teeth in the second image sample is replaced with it, to obtain the final tooth restoration result, as shown in the following formula.

result_final = pred_mask × fake + (1 - pred_mask) × input (21)

In formula (21), result_final is the final tooth restoration result, fake is the tooth restoration result (the second repair image), pred_mask is the predicted repair area mask, and input is the second image sample.
Step 20, determining the loss functions included in the first repair network, which may include the overall reconstruction loss, the image feature level loss, the local reconstruction loss, the object identity loss, and the generation loss; the determination of these losses may refer to formulas (3) to (10) above and the corresponding descriptions, and will not be repeated here.
Step 21, determining the repair area mask loss included in the first repair network, specifically referring to formula (2) above and the corresponding description, which will not be repeated here.
Step 22, finally adding all the losses to obtain the overall loss of the dental restoration network, wherein the specific formula can be seen as follows:
total_loss = L1_loss + LPIPS_loss + ID_loss + G_loss + teeth_l1_loss + mask_loss (22)

In formula (22), L1_loss is the overall reconstruction loss, LPIPS_loss is the image feature level loss, ID_loss is the identity loss, G_loss is the generation loss, teeth_l1_loss is the reconstruction loss of the tooth region, and mask_loss is the repair area mask loss.
Step 23, the parameters of the first repair network are updated by back-propagating the gradient of the calculated loss value, completing the training of the first repair network.
In the practical application of the first repair network provided by the embodiment of the application, any face image can be fed into the first repair network (test_net_stage2) without a segmentation network, to obtain the repair result fake and the mask pred_mask corresponding to the target component.
Then, the area image corresponding to the teeth in the repair result is obtained by multiplying the predicted repair area mask with the repair result, and the area image corresponding to the teeth in the input image is replaced with it, to obtain the final tooth restoration result, as shown in the following formula.

result_final = pred_mask × fake + (1 - pred_mask) × input (23)

In formula (23), result_final is the final tooth restoration result, fake is the repair result, pred_mask is the predicted repair area mask, and input is the input face image.
It will be appreciated that in the embodiments of the present application, related data such as user images are involved, when the embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
In the implementation of the related data capturing technical scheme, when the embodiment of the application is applied to specific products or technologies, the related data collection, use and processing processes should comply with the national legal and legal requirements, accord with legal, legal and necessary principles, do not relate to acquiring data types forbidden or limited by the legal and legal regulations, and do not prevent the normal operation of a target website.
Continuing with the description below of an exemplary architecture of the repair network training device 555 implemented as a software module provided in accordance with an embodiment of the present application, in some embodiments, as shown in fig. 2A, the software modules stored in the repair network training device 555 of the memory 550 may include:
A matting processing module 5551, configured to perform matting processing on a first image sample including a target component, so as to obtain a first matting sample for matting out the target component;
The first training module 5552 is configured to perform training on a reconstruction task of the target component on the first reconstruction network based on the first matting sample, so as to obtain a trained first reconstruction network;
The image restoration module 5553 is configured to perform reconstruction processing on a second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and perform restoration processing on a second image sample including the target component through the first repair network to obtain a first restored image, where the second matting sample is obtained by performing matting processing on the second image sample;
The second training module 5554 is configured to determine a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and update the first repair network based on the first repair loss to obtain a trained first repair network, where the trained first repair network is configured to perform repair processing on an original image including the target component to obtain a target image after repairing the target component.
In some embodiments, the first training module 5552 is further configured to blur the first image sample to obtain a first blurred sample; stitch the first matting sample and the first blurred sample to obtain a first stitched sample; and train the first reconstruction network on the reconstruction task for the target component based on the first stitched sample to obtain the trained first reconstruction network.
In some embodiments, the first training module 5552 is further configured to reconstruct the first stitched sample through the initialized first reconstruction network to obtain a second reconstructed image corresponding to the first stitched sample; determining a first reconstruction loss based on the first image sample and the second reconstruction image; updating the initialized first reconstruction network based on the first reconstruction loss to obtain the trained first reconstruction network.
In some embodiments, the image restoration module 5553 is further configured to blur the second image sample to obtain a second blurred sample; stitch the second matting sample and the second blurred sample to obtain a second stitched sample; and encode the second stitched sample through the first encoding network to obtain a first encoding result, and decode the first encoding result through the first decoding network to obtain the first reconstructed image.
In the above aspect, the image restoration module 5553 is further configured to perform restoration processing on the second image sample through the first restoration network, so as to obtain a predicted restoration area mask corresponding to the target component in the second image sample and a second restoration image corresponding to the second image sample; and synthesizing the second repair image and the second image sample based on the predicted repair area mask in the second image sample to obtain the first repair image.
In the above aspect, the second training module 5554 is further configured to obtain a label repair area mask, and determine a repair area mask loss based on the predicted repair area mask and the label repair area mask corresponding to the target component in the second image sample; at least one of the following loss determination processes is performed: determining an overall reconstruction loss based on the first reconstructed image and the first repair image; determining an image feature level loss based on the first reconstructed image and the first repair image; determining a local reconstruction loss based on the first reconstructed image, the first repair image, and the label repair area mask; determining a loss of identity of the object based on the first reconstructed image and the first repair image; determining a generation loss based on the first repair image; determining a first auxiliary loss based on at least one of the global reconstruction loss, the image feature level loss, the local reconstruction loss, the object identity loss, and the generation loss; and carrying out fusion processing on the first auxiliary loss and the repair area mask loss to obtain the first repair loss.
In the above aspect, the second training module 5554 is further configured to obtain a first pixel value of each pixel location in the first reconstructed image and a second pixel value of each pixel location in the second image sample; performing the following processing for each of the pixel positions, determining a first absolute value between a first pixel value of the pixel position and a second pixel value of the pixel position; normalizing the first absolute value of each pixel position to obtain a third pixel value of each pixel position; the label repair area mask is generated based on the third pixel value at each of the pixel locations.
In the above aspect, the second training module 5554 is further configured to obtain a first flag value of each pixel location in the predicted repair area mask and a second flag value of each pixel location in the label repair area mask; performing the following processing for each of the pixel positions, determining a second absolute value between a first marker value of the pixel position and a second marker value of the pixel position; and carrying out fusion processing on the second absolute values of the pixel positions to obtain the mask loss of the repair area.
In the above aspect, the second training module 5554 is further configured to obtain a fifth pixel value of each pixel location in the first reconstructed image and a sixth pixel value of each pixel location in the first repair image; performing the following processing for each of the pixel positions, determining a third absolute value between a fifth pixel value of the pixel position and a sixth pixel value of the pixel position; and carrying out fusion processing on the third absolute values of the pixel positions to obtain the integral reconstruction loss.
In the above solution, the second training module 5554 is further configured to perform feature extraction processing on the first reconstructed image at multiple levels, so as to obtain a first feature corresponding to each level; performing feature extraction processing of multiple levels on the first repair image to obtain second features corresponding to each level; the following is performed for each of the levels: determining feature distances between a first feature of the hierarchy and a second feature of the hierarchy; and carrying out fusion processing on the characteristic distances of the multiple layers to obtain the image characteristic level loss.
In the foregoing aspect, the second training module 5554 is further configured to acquire a first local image corresponding to the target component in the first reconstructed image based on the label repair area mask; acquiring a second partial image corresponding to the target component in the first repair image based on the label repair area mask; acquiring a seventh pixel value of each pixel position in the first partial image and an eighth pixel value of each pixel position in the second partial image; performing the following processing for each of the pixel positions, determining a fourth absolute value between a seventh pixel value of the pixel position and an eighth pixel value of the pixel position; and carrying out fusion processing on the fourth absolute values of the pixel positions to obtain the local reconstruction loss.
In the above solution, the second training module 5554 is further configured to invoke an object identification network to perform an identification feature extraction process on the first reconstructed image, so as to obtain a first identification feature of the first reconstructed image; invoking an object identity recognition network to extract the identity characteristics of the first repair image to obtain a second identity characteristic of the first repair image; and determining the identity feature similarity between the first identity feature and the second identity feature, and acquiring the identity loss of the object inversely related to the identity feature similarity.
In the above aspect, the second training module 5554 is further configured to acquire an original image including the target component; repairing the original image through the trained first repairing network to obtain a predicted repairing area mask corresponding to the target component in the original image and a third repairing image corresponding to the original image; and synthesizing the third repair image and the original image based on the predicted repair area mask corresponding to the target component in the original image to obtain a target image after repairing the target component.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer executable instructions from the computer readable storage medium, and the processor executes the computer executable instructions, so that the electronic device executes the repair network training method according to the embodiment of the present application or executes the image processing method according to the embodiment of the present application.
An embodiment of the present application provides a computer readable storage medium storing computer executable instructions, where the computer executable instructions are stored, which when executed by a processor, cause the processor to perform the repair network training method provided by the embodiment of the present application or perform the image processing method described above in the embodiment of the present application, for example, the repair network training method as shown in fig. 3A.
In some embodiments, the computer readable storage medium may be RAM, ROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; but may be a variety of devices including one or any combination of the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system, may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (Hyper Text Markup Language, HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the following beneficial effects can be achieved by the embodiments of the present application:
A first image sample including a target component is subjected to matting processing to obtain a first matting sample in which the target component is matted out, and a first reconstruction network is trained on the reconstruction task for the target component based on the first matting sample, yielding a trained first reconstruction network, that is, a network that can reconstruct the target component from matted samples. A second image sample including the target component is then repaired through the first repair network to obtain a first repair image, while the second matting sample, obtained by performing matting processing on the second image sample, is reconstructed through the trained first reconstruction network to obtain a first reconstructed image. Because the first reconstruction network was trained on matted samples, the first reconstructed image can serve as a supervision label for the repair task, whereas the first repair image is obtained by inputting the second image sample into the first repair network directly, without any matting step. A first repair loss corresponding to the first repair network is determined based on the first repair image and the first reconstructed image, and the first repair network is updated based on this loss to obtain the trained first repair network. The trained first repair network can therefore repair an original image including the target component in a single pass to obtain a target image with the target component repaired, omitting the matting step: for example, a portrait image can be repaired by the deep network into one with high-definition and neat teeth. Overall repair time is thus low, and because no matting step is needed, stability can be maintained in complex video scenes.
The foregoing describes merely exemplary embodiments of the present application and is not intended to limit the protection scope of the present application. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present application falls within the protection scope of the present application.

Claims (16)

1. A method of training a repair network, the method comprising:
carrying out matting processing on a first image sample comprising a target component to obtain a first matting sample in which the target component is matted out;
Training a first reconstruction network on a reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network;
Reconstructing a second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and repairing a second image sample comprising the target component through a first repair network to obtain a first repair image, wherein the second matting sample is obtained by performing matting processing on the second image sample;
Determining a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and updating the first repair network based on the first repair loss to obtain a trained first repair network;
The trained first repair network is used for performing repair processing on an original image comprising the target component to obtain a target image after repairing the target component.
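By way of illustration only (not part of the claim language), one training iteration of claim 1 might be sketched as follows, assuming an L1-style first repair loss and omitting the stitching with a blurred sample described in claim 4; all names are illustrative:

    import torch
    import torch.nn.functional as F

    def repair_training_step(repair_net, trained_recon_net, optimizer,
                             second_image_sample, second_matting_sample):
        # The trained first reconstruction network produces the first
        # reconstructed image from the matted sample; it is frozen here.
        with torch.no_grad():
            first_reconstructed = trained_recon_net(second_matting_sample)
        # The first repair network repairs the unmatted image sample directly.
        first_repaired = repair_net(second_image_sample)
        # First repair loss; plain L1 is an assumption, and claim 6 below
        # enumerates richer candidate loss terms.
        loss = F.l1_loss(first_repaired, first_reconstructed)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()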
2. The method of claim 1, wherein the training the first reconstruction network on the reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network comprises:
Performing blur processing on the first image sample to obtain a first blurred sample;
performing stitching processing on the first matting sample and the first blurred sample to obtain a first stitched sample;
And training the first reconstruction network on the reconstruction task for the target component based on the first stitched sample to obtain a trained first reconstruction network.
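By way of illustration only, the sample construction of claim 2 could look as follows, assuming a Gaussian blur for the blur processing and channel-wise concatenation for the stitching processing; both readings are assumptions:

    import torch
    import torchvision.transforms.functional as TF

    # first_image_sample and first_matting_sample: (N, C, H, W) tensors.
    first_blurred_sample = TF.gaussian_blur(first_image_sample, kernel_size=21)
    # Stitching read as channel-wise concatenation (an assumption).
    first_stitched_sample = torch.cat(
        [first_matting_sample, first_blurred_sample], dim=1)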
3. The method of claim 2, wherein the training the first reconstruction network on the reconstruction task for the target component based on the first stitched sample to obtain a trained first reconstruction network comprises:
Performing reconstruction processing on the first stitched sample through an initialized first reconstruction network to obtain a second reconstructed image corresponding to the first stitched sample;
determining a first reconstruction loss based on the first image sample and the second reconstructed image;
updating the initialized first reconstruction network based on the first reconstruction loss to obtain the trained first reconstruction network.
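By way of illustration only, the update step of claim 3 might be sketched as follows, assuming an L1 first reconstruction loss (the claim leaves the loss form open):

    import torch.nn.functional as F

    def reconstruction_training_step(recon_net, optimizer,
                                     first_stitched_sample, first_image_sample):
        # Second reconstructed image from the initialized first reconstruction network.
        second_reconstructed = recon_net(first_stitched_sample)
        # First reconstruction loss against the original (unmatted) image sample.
        loss = F.l1_loss(second_reconstructed, first_image_sample)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()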
4. The method of claim 1, wherein the trained first reconstruction network comprises a first encoding network and a first decoding network, and the reconstructing the second matting sample through the trained first reconstruction network to obtain a first reconstructed image comprises:
Performing blur processing on the second image sample to obtain a second blurred sample;
Performing stitching processing on the second matting sample and the second blurred sample to obtain a second stitched sample;
and performing encoding processing on the second stitched sample through the first encoding network to obtain a first encoding result, and performing decoding processing on the first encoding result through the first decoding network to obtain the first reconstructed image.
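The encoding-decoding pass of claim 4, as a two-line illustrative sketch (first_encoding_net and first_decoding_net stand in for the two sub-networks):

    # Encode the second stitched sample, then decode to the first reconstructed image.
    first_encoding_result = first_encoding_net(second_stitched_sample)
    first_reconstructed_image = first_decoding_net(first_encoding_result)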
5. The method of claim 1, wherein the repairing the second image sample including the target component through the first repair network to obtain a first repair image comprises:
Repairing the second image sample through the first repair network to obtain a predicted repair area mask corresponding to the target component in the second image sample and a second repair image corresponding to the second image sample;
And synthesizing the second repair image and the second image sample based on the predicted repair area mask corresponding to the target component in the second image sample to obtain the first repair image.
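By way of illustration only, claim 5 can be read as a two-output repair network followed by the synthesis step sketched earlier in the device description:

    # The first repair network is read as returning a (mask, image) pair.
    predicted_repair_mask, second_repair_image = repair_net(second_image_sample)
    # Reuse the synthesize helper sketched above.
    first_repair_image = synthesize(second_image_sample, second_repair_image,
                                    predicted_repair_mask)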
6. The method of claim 5, wherein the determining a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image comprises:
Acquiring a label repair area mask, and determining a repair area mask loss based on the predicted repair area mask corresponding to the target component in the second image sample and the label repair area mask;
performing at least one of the following loss determination processes:
determining an overall reconstruction loss based on the first reconstructed image and the first repair image;
determining an image feature level loss based on the first reconstructed image and the first repair image;
determining a local reconstruction loss based on the first reconstructed image, the first repair image, and the label repair area mask;
determining an object identity loss based on the first reconstructed image and the first repair image;
determining a generation loss based on the first repair image;
determining a first auxiliary loss based on at least one of the overall reconstruction loss, the image feature level loss, the local reconstruction loss, the object identity loss, and the generation loss;
and performing fusion processing on the first auxiliary loss and the repair area mask loss to obtain the first repair loss.
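By way of illustration only, the fusion processing of claim 6 might be a weighted sum (the claim leaves the fusion operator open; the weights and helper name are assumptions):

    def first_repair_loss(auxiliary_losses, mask_loss, weights=None):
        # Fuse the first auxiliary loss with the repair area mask loss;
        # a weighted sum is one plausible fusion.
        weights = weights if weights is not None else [1.0] * len(auxiliary_losses)
        first_auxiliary = sum(w * l for w, l in zip(weights, auxiliary_losses))
        return first_auxiliary + mask_loss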
7. The method of claim 6, wherein the acquiring a tag repair area mask comprises:
acquiring a first pixel value of each pixel position in the first reconstructed image and a second pixel value of each pixel position in the second image sample;
Performing the following processing for each of the pixel positions: determining a first absolute value between the first pixel value of the pixel position and the second pixel value of the pixel position;
normalizing the first absolute value of each pixel position to obtain a third pixel value of each pixel position;
and generating the label repair area mask based on the third pixel value of each of the pixel positions.
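By way of illustration only, the label-mask construction of claim 7 might be sketched as follows (the channel-mean reduction and min-max normalization are assumptions; the claim only requires per-pixel absolute differences followed by normalization):

    def label_repair_area_mask(first_reconstructed, second_image_sample, eps=1e-8):
        # First absolute value between corresponding first and second pixel values.
        diff = (first_reconstructed - second_image_sample).abs()
        # Collapse the channel dimension so the mask has one value per
        # pixel position (channel-mean is an assumption).
        diff = diff.mean(dim=1, keepdim=True)
        # Normalize to [0, 1] to obtain the third pixel values.
        return (diff - diff.min()) / (diff.max() - diff.min() + eps)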
8. The method of claim 6, wherein the determining a repair area mask loss based on the predicted repair area mask corresponding to the target component in the second image sample and the label repair area mask comprises:
Acquiring a first marker value of each pixel position in the predicted repair area mask and a second marker value of each pixel position in the label repair area mask;
performing the following processing for each of the pixel positions: determining a second absolute value between the first marker value of the pixel position and the second marker value of the pixel position;
And performing fusion processing on the second absolute values of the pixel positions to obtain the repair area mask loss.
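By way of illustration only, the repair area mask loss of claim 8 with mean pooling as the fusion processing (the fusion operator is an assumption):

    def repair_area_mask_loss(predicted_mask, label_mask):
        # Second absolute value between the marker values at each pixel
        # position, fused here by averaging over all positions.
        return (predicted_mask - label_mask).abs().mean()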
9. The method of claim 1, wherein the method further comprises:
Acquiring an original image including a target component;
repairing the original image through the trained first repair network to obtain a predicted repair area mask corresponding to the target component in the original image and a third repair image corresponding to the original image;
and synthesizing the third repair image and the original image based on the predicted repair area mask corresponding to the target component in the original image to obtain a target image after repairing the target component.
10. An image processing method, the method comprising:
Acquiring an original image including a target component;
repairing the original image through a first repair network to obtain a target image after repairing the target component;
wherein the first repair network is trained by the repair network training method of any one of claims 1-9.
11. The method of claim 10, wherein the repairing the original image through the first repair network to obtain the target image after repairing the target component comprises:
Repairing the original image through the first repair network to obtain a predicted repair area mask corresponding to the target component in the original image and a third repair image corresponding to the original image;
and synthesizing the third repair image and the original image based on the predicted repair area mask corresponding to the target component in the original image to obtain a target image after repairing the target component.
12. A repair network training apparatus, the apparatus comprising:
The image matting processing module is used for performing matting processing on a first image sample comprising a target component to obtain a first matting sample in which the target component is matted out;
The first training module is used for training the first reconstruction network on the reconstruction task for the target component based on the first matting sample to obtain a trained first reconstruction network;
The image repair module is used for performing reconstruction processing on the second matting sample through the trained first reconstruction network to obtain a first reconstructed image, and performing repair processing on a second image sample comprising the target component through the first repair network to obtain a first repair image, wherein the second matting sample is obtained by performing matting processing on the second image sample;
The second training module is configured to determine a first repair loss corresponding to the first repair network based on the first repair image and the first reconstructed image, and update the first repair network based on the first repair loss to obtain a trained first repair network, where the trained first repair network is configured to repair an original image including the target component to obtain a target image after repairing the target component.
13. An image processing apparatus, characterized in that the apparatus comprises:
An image acquisition module for acquiring an original image including a target component;
The image repair module is used for performing repair processing on the original image through a first repair network to obtain a target image after repairing the target component, wherein the first repair network is trained by the repair network training method of any one of claims 1-9.
14. An electronic device, the electronic device comprising:
a memory for storing computer executable instructions;
A processor for implementing the repair network training method of any one of claims 1 to 9 or the image processing method of any one of claims 10 to 11 when executing computer executable instructions stored in the memory.
15. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the repair network training method of any one of claims 1 to 9 or the image processing method of any one of claims 10 to 11.
16. A computer program product comprising computer executable instructions which, when executed by a processor, implement the repair network training method of any one of claims 1 to 9 or the image processing method of any one of claims 10 to 11.
CN202410405497.9A 2024-04-07 2024-04-07 Repair network training method, image processing method, device and electronic equipment Active CN117994173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410405497.9A CN117994173B (en) 2024-04-07 2024-04-07 Repair network training method, image processing method, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN117994173A true CN117994173A (en) 2024-05-07
CN117994173B CN117994173B (en) 2024-06-11

Family

ID=90890929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410405497.9A Active CN117994173B (en) 2024-04-07 2024-04-07 Repair network training method, image processing method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117994173B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269245A (en) * 2018-01-26 2018-07-10 Shenzhen Weiteshi Technology Co., Ltd. Eye image restoration method based on a novel generative adversarial network
CN113962893A (en) * 2021-10-27 2022-01-21 Shanxi University Face image restoration method based on a multi-scale local self-attention generative adversarial network
WO2023077742A1 (en) * 2021-11-04 2023-05-11 New Oriental Education & Technology Group Co., Ltd. Video processing method and apparatus, and neural network training method and apparatus
CN117557689A (en) * 2024-01-11 2024-02-13 Tencent Technology (Shenzhen) Co., Ltd. Image processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117994173B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN111080628B (en) Image tampering detection method, apparatus, computer device and storage medium
Castillo Camacho et al. A comprehensive review of deep-learning-based methods for image forensics
CN113901894A (en) Video generation method, device, server and storage medium
CN115249306B (en) Image segmentation model training method, image processing device and storage medium
CN115565238B (en) Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product
CN114529574B (en) Image matting method and device based on image segmentation, computer equipment and medium
CN112132106A (en) Image augmentation processing method, device and equipment based on artificial intelligence and storage medium
CN117557689B (en) Image processing method, device, electronic equipment and storage medium
CN116958323A (en) Image generation method, device, electronic equipment, storage medium and program product
CN116310045A (en) Three-dimensional face texture creation method, device and equipment
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
Surabhi et al. Advancing Faux Image Detection: A Hybrid Approach Combining Deep Learning and Data Mining Techniques
CN117392293A (en) Image processing method, device, electronic equipment and storage medium
CN113538254A (en) Image restoration method and device, electronic equipment and computer readable storage medium
CN117994173B (en) Repair network training method, image processing method, device and electronic equipment
Tous Pictonaut: movie cartoonization using 3D human pose estimation and GANs
CN116721008A (en) User-defined expression synthesis method and system
CN116958306A (en) Image synthesis method and device, storage medium and electronic equipment
CN113554655B (en) Optical remote sensing image segmentation method and device based on multi-feature enhancement
CN113496225A (en) Image processing method, image processing device, computer equipment and storage medium
US20240169701A1 (en) Affordance-based reposing of an object in a scene
Ruan Anime Characters Generation with Generative Adversarial Networks
CN118071867B (en) Method and device for converting text data into image data
Du [Retracted] An Improved Method Research on Graphics and Image Processing System
CN115424119B (en) Image generation training method and device capable of explaining GAN based on semantic fractal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant