CN114049280A - Image erasing and repairing method and device, equipment, medium and product thereof


Info

Publication number: CN114049280A
Application number: CN202111415434.4A
Authority: CN (China)
Prior art keywords: image, target, picture, restored, mask data
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 黄家冕
Current Assignee: Guangzhou Huaduo Network Technology Co Ltd
Original Assignee: Guangzhou Huaduo Network Technology Co Ltd
Application filed by Guangzhou Huaduo Network Technology Co Ltd
Priority to: CN202111415434.4A
Publication of: CN114049280A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G06T 7/13 Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image erasing and repairing method, together with a corresponding device, equipment, medium and product. The method comprises the following steps: acquiring a picture to be repaired and a target image label indicating a target area image in the picture to be repaired; performing target detection on the picture to be repaired with a target detection model trained to convergence, so as to obtain a bounding box that matches the target image label and indicates the position of the target area image, and cropping the target area image from the picture to be repaired according to the bounding box; extracting mask data of a target contour region in the target area image with an image segmentation model trained to convergence, and expanding the mask data into mask data for the whole picture to be repaired; and performing image repair on the picture to be repaired and its mask data with an image repair model trained to convergence to obtain a repaired image. End-to-end, efficient erasing and repairing of the target contour region is thereby realized.

Description

Image erasing and repairing method and device, equipment, medium and product thereof
Technical Field
The present application relates to the field of image repair technologies, and in particular to an image erasing and repairing method and a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.
Background
With the development of science and technology, demand for e-commerce grows by the day, and more and more users sell goods on e-commerce platforms. As is well known, advertising information published on e-commerce platforms must comply with the relevant regulations. Under its risk-control system, an e-commerce platform requires that, absent trademark authorization, pictures carrying trademark information must not appear in the advertising information published by a user; the same applies to other to-be-processed information that violates the risk-control system. The to-be-processed information may be sensitive information, for example a trademark graphic or a preset specific graphic, and the trademark may be registered or unregistered. Pictures carrying such to-be-processed information therefore need erasing and repairing of the target contour region.
In application scenarios such as e-commerce platforms, the current practice for erasing and repairing the target contour region is mainly for the user to smear over or mosaic the target area image by hand so as to pass the relevant risk-control system. Concretely, the target contour carrying the to-be-processed information is first selected and then manually smeared, mosaiced, or replaced by image-editing techniques with another image area free of to-be-processed information. The whole process is manual, requires the operator to possess some image-processing skill, and handles only one picture at a time; it is extremely inefficient and very costly for users of an e-commerce platform. Moreover, although manual processing can remove the to-be-processed information, the repair effect is unsatisfactory: the resulting inconsistency of image content easily leaves customers with a poor impression of the goods and thus hurts sales.
Nowadays goods on e-commerce platforms iterate rapidly, and so does the corresponding advertising information; e-commerce users are therefore required to update their advertising information quickly, within a short time, in order to compete for sales share in the e-commerce market.
Therefore, in view of the two main problems above, namely low efficiency and poor effect, how to efficiently erase the target contour region from a picture to be repaired that contains to-be-processed information, and to repair it so that the erasing and repairing effect is better, has become a technical problem to be solved in the art.
Disclosure of Invention
A primary object of the present application is to solve at least one of the above problems and to provide an image erasing and repairing method together with a corresponding apparatus, computer device, computer-readable storage medium, and computer program product.
To meet the various purposes of the present application, the following technical solutions are adopted:
An image erasing and repairing method adapted to one of the purposes of the present application includes the following operations:
acquiring a picture to be repaired and a target image label indicating a target area image in the picture to be repaired;
performing target detection on the picture to be repaired with a target detection model trained to convergence, so as to obtain a bounding box that matches the target image label and indicates the position of the target area image, and cropping the target area image from the picture to be repaired according to the bounding box;
extracting mask data of a target contour region in the target area image with an image segmentation model trained to convergence, and expanding the mask data into mask data for the picture to be repaired;
performing image repair on the picture to be repaired and its mask data with an image repair model trained to convergence, to obtain a repaired image.
In a further embodiment, acquiring the picture to be repaired and the target image label indicating the target area image in the picture to be repaired includes the following steps:
in response to an advertisement publishing request triggered by a user, acquiring the picture to be repaired from the advertisement publishing information correspondingly submitted by the user;
in response to an image repair request triggered by the user, acquiring the target image label indicating the target area image in the picture to be repaired.
in a deepened embodiment, a target detection model trained to be convergent is adopted to perform target detection on the picture to be restored to obtain a bounding box which matches the target image label and indicates the position of the target area image, and the target area image is intercepted from the picture to be restored according to the bounding box, which includes the following steps:
extracting a plurality of feature maps with different scales aiming at the picture to be restored by adopting a backbone network block in the target detection model;
performing strong semantic information and strong positioning information fusion on the feature maps with different scales by adopting a neck block in the target detection model to obtain a plurality of prediction feature maps with different scales;
predicting corresponding bounding boxes and label data thereof aiming at the prediction feature maps with different scales by adopting prediction blocks in the target detection model;
and determining the boundary box matched with the target image label according to the predicted label data, and intercepting the target area image indicated by the boundary box from the picture to be repaired.
In an embodiment, training of the target detection model includes the following steps:
determining relevant pictures carrying to-be-processed information, together with their bounding-box annotation data and label annotation data, as training samples of the training set of the target detection model;
performing image data enhancement on the pictures of the training samples in the training set;
determining optimal anchor-box parameters for each training round from part of the training samples, and, on that basis, determining the bounding boxes and labels of the training-sample pictures in each round with the target detection model;
computing a loss value from the bounding box and label of each training sample against its bounding-box annotation data and label annotation data, and gradient-updating the target detection model according to the loss value;
repeating the two updating steps above until the target detection model is trained to convergence.
In a further embodiment, extracting the mask data of the target contour region in the target area image with an image segmentation model trained to convergence, and expanding the mask data into mask data for the picture to be repaired, includes the following steps:
encoding the target area image at multiple levels, correspondingly generating intermediate feature information at corresponding scales, the intermediate feature information being a feature representation of the to-be-processed information in the target area image;
decoding correspondingly at multiple levels along a decoding path: generating the first image feature information from the intermediate feature information of the smallest scale, and decoding image feature information of each higher scale with reference to the image feature information of the previous level and the intermediate feature information produced by the encoder of the same level, the image feature information representing, in mask form, the contour features of the to-be-processed information in the target area image;
fusing all the image feature information to generate the mask data of the target contour region in the target area image, and expanding that mask data into mask data for the picture to be repaired.
In an embodiment, training of the image segmentation model includes the following steps:
determining relevant pictures carrying to-be-processed information and acquiring the mask data of their target contour regions as the sample data of the training set of the image segmentation model;
iteratively training the image segmentation model on the data samples of the training set, computing the model loss and back-propagating its gradient;
repeating the updating operation until the model is trained to convergence.
In a further embodiment, performing image repair on the picture to be repaired and its mask data with an image repair model trained to convergence, to obtain a repaired image, includes the following steps:
feeding the picture to be repaired and its mask data into the image repair model, which is built on a partial-convolution U-Net architecture, for local feature extraction, to obtain an intermediate feature map;
decoding the intermediate feature map at multiple levels to obtain an intermediate repaired image, and fusing the picture to be repaired with the intermediate repaired image according to the mask data of the picture to be repaired, to generate the repaired image.
In an embodiment, training of the image repair model includes the following steps (a minimal training-loop sketch is given after this list):
determining relevant pictures that need no repair, paired with mask data obtained by randomly erasing parts of those pictures, as the data samples of the training set of the image repair model;
constructing a generative adversarial network for training, in which the image repair model serves as the generator and a discriminant model trained to convergence serves as the discriminator, and initializing the generator parameters;
freezing the discriminator, computing a preset loss function of the generative adversarial network on data samples of the training set, and updating the generator parameters by back-propagation;
repeating the updating step above until the generative adversarial network converges; the generator is then the image repair model trained to convergence.
An image erasing and repairing apparatus adapted to one of the objects of the present application includes: a data acquisition module for acquiring a picture to be repaired and a target image label indicating a target area image in the picture to be repaired; a target detection module for performing target detection on the picture to be repaired with a target detection model trained to convergence, so as to obtain a bounding box that matches the target image label and indicates the position of the target area image, and cropping the target area image from the picture to be repaired according to the bounding box; an image segmentation module for extracting mask data of a target contour region in the target area image with an image segmentation model trained to convergence and expanding the mask data into mask data for the picture to be repaired; and an image repair module for performing image repair on the picture to be repaired and its mask data with an image repair model trained to convergence, to obtain a repaired image.
In a further embodiment, the data acquisition module includes: a first response submodule for, in response to an advertisement publishing request triggered by a user, acquiring the picture to be repaired from the advertisement publishing information correspondingly submitted by the user; and a second response submodule for, in response to an image repair request triggered by the user, acquiring the target image label indicating the target area image in the picture to be repaired.
In a further embodiment, the target detection module includes: a backbone network submodule for extracting feature maps of different scales from the picture to be repaired; a neck submodule for fusing strong semantic information and strong localization information across the feature maps of different scales to obtain several prediction feature maps of different scales; a prediction submodule for predicting corresponding bounding boxes and their label data from the prediction feature maps of different scales; and a cropping submodule for determining the bounding box that matches the target image label according to the predicted label data and cropping the target area image indicated by that bounding box from the picture to be repaired.
In an embodiment, a first training submodule includes: a first training-set subunit for determining relevant pictures carrying to-be-processed information, together with their bounding-box annotation data and label annotation data, as training samples of the training set of the target detection model; an enhancement subunit for performing image data enhancement on the pictures of the training samples in the training set; a first updating subunit for determining optimal anchor-box parameters for each training round from part of the training samples and, on that basis, determining the bounding boxes and labels of the training-sample pictures in each round with the target detection model; a second updating subunit for computing a loss value from the bounding box and label of each training sample against its bounding-box annotation data and label annotation data, and gradient-updating the target detection model according to the loss value; and a first repeating subunit for repeating the first updating subunit and the second updating subunit until the target detection model is trained to convergence.
In a further embodiment, the image segmentation module includes: an encoding submodule for encoding the target area image at multiple levels and correspondingly generating intermediate feature information at corresponding scales, the intermediate feature information being a feature representation of the to-be-processed information in the target area image; a decoding submodule for decoding correspondingly at multiple levels along a decoding path, generating the first image feature information from the intermediate feature information of the smallest scale, and decoding image feature information of each higher scale with reference to the image feature information of the previous level and the intermediate feature information produced by the encoder of the same level, the image feature information representing, in mask form, the contour features of the to-be-processed information in the target area image; and a first fusion submodule for fusing all the image feature information, generating the mask data of the target contour region in the target area image, and expanding that mask data into mask data for the picture to be repaired.
In an embodiment, a second training submodule includes: a second training-set subunit for determining relevant pictures carrying to-be-processed information and acquiring the mask data of their target contour regions as the sample data of the training set of the image segmentation model; a third updating subunit for iteratively training the image segmentation model on the data samples of the training set, computing the model loss and back-propagating its gradient; and a second repeating subunit for repeating the third updating subunit until the model is trained to convergence.
In a further embodiment, the image repair module includes: a feature extraction submodule for feeding the picture to be repaired and its mask data into the image repair model, built on a partial-convolution U-Net architecture, for local feature extraction to obtain an intermediate feature map; and a repair submodule for decoding the intermediate feature map at multiple levels to obtain an intermediate repaired image, and fusing the picture to be repaired with the intermediate repaired image according to the mask data of the picture to be repaired, to generate the repaired image.
In an embodiment, a third training submodule includes: a third training-set subunit for determining relevant pictures that need no repair, paired with mask data obtained by randomly erasing parts of those pictures, as the data samples of the training set of the image repair model; an initialization subunit for constructing a generative adversarial network for training, in which the image repair model serves as the generator and a discriminant model trained to convergence serves as the discriminator, and initializing the generator parameters; a fourth updating subunit for freezing the discriminator, computing a preset loss function of the generative adversarial network on data samples of the training set, and updating the generator parameters by back-propagation; and a third repeating subunit for repeating the fourth updating subunit until the generative adversarial network converges, the generator then being the image repair model trained to convergence.
The computer device comprises a central processing unit and a memory, the central processing unit being used to invoke and run a computer program stored in the memory so as to execute the steps of the image erasing and repairing method.
A computer-readable storage medium stores, in the form of computer-readable instructions, a computer program implemented according to the image erasing and repairing method; when invoked by a computer, the program performs the steps included in the method.
A computer program product, provided to adapt to another object of the present application, comprises computer programs/instructions which, when executed by a processor, implement the steps of the method described in any of the embodiments of the present application.
Compared with the prior art, the application has the following advantages:
the method comprises the steps of obtaining a picture to be restored and a target image label indicating a target area image in the picture to be restored; performing target detection on the picture to be restored by adopting a target detection model trained to be convergent so as to obtain a boundary frame which is matched with the target image label and indicates the position of the target area image, and intercepting the target area image from the picture to be restored according to the boundary frame; then, extracting mask data of a target contour region in the target region image by adopting an image segmentation model trained to be convergent, and expanding the mask data to obtain the mask data of the picture to be repaired; and performing image restoration according to the picture to be restored and the mask data thereof by adopting an image restoration model trained to be convergent to obtain a restored image, and realizing end-to-end and efficient target contour region elimination and restoration work.
This application mainly solves two main problems that meet in this application field, inefficiency and repairing effect are not good promptly. Therefore, the method and the device have the advantages that the target area image is positioned from the picture to be restored by adopting various network models trained to be convergent, the target outline area is further segmented from the target area image name, and then the target outline area is subjected to image elimination and image restoration in a targeted manner, the process is completely automatic, fine image processing is realized from coarse to fine, the time and the labor cost of an e-commerce user can be greatly reduced, meanwhile, the e-commerce user can quickly iterate the advertisement information, and more sales shares are obtained; the method comprises the steps that a target area image is directly positioned through a target detection model according to an input target label, mask data of a target outline area are obtained according to an image segmentation model, and finally the target outline area is repaired through an image repairing model; and the repair work only repairs the target contour region, so that the repair difficulty can be greatly reduced, and meanwhile, the repair effect can be effectively improved, so that the repaired target contour region is better fused with other regions, and the customer appeal and competitiveness of the commodity are increased.
In summary, the method and the device have the advantages of being high in efficiency and better in repairing effect when the target outline region of the picture to be repaired is eliminated and repaired, capable of being highly trusted, and suitable for eliminating and repairing the target outline region of the picture to be repaired containing the information to be processed in application scenes such as e-commerce platforms.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic block diagram of the image erasing and repairing method of the present application;
FIG. 2 is a network architecture diagram of the neural network models employed in the image erasing and repairing method of the present application;
FIG. 3 is a schematic flow chart of an exemplary embodiment of the image erasing and repairing method of the present application;
FIG. 4 is a schematic structural block diagram of the U²-Net-based image segmentation model of the present application;
FIG. 5 is a schematic flow chart of acquiring the picture to be repaired and the target image label in an embodiment of the present application;
FIG. 6 is a schematic flow chart of performing target detection on the picture to be repaired and cropping the target area image indicated by the target image label in an embodiment of the present application;
FIG. 7 is a schematic flow chart of training the target detection model in an embodiment of the present application;
FIG. 8 is a schematic flow chart of obtaining the mask data of the target contour region and expanding it into the mask data of the picture to be repaired in an embodiment of the present application;
FIG. 9 is a schematic flow chart of training the image segmentation model in an embodiment of the present application;
FIG. 10 is a schematic flow chart of performing image repair and generating the repaired image in an embodiment of the present application;
FIG. 11 is a schematic flow chart of training the image repair model in an embodiment of the present application;
FIG. 12 is a network architecture diagram of the generative adversarial model in an embodiment of the present application;
FIG. 13 is a functional block diagram of the image erasing and repairing apparatus of the present application;
FIG. 14 is a schematic structural diagram of a computer device used in the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As will be appreciated by those skilled in the art, "client," "terminal," and "terminal device" as used herein include both devices that are wireless signal receivers, which are devices having only wireless signal receivers without transmit capability, and devices that are receive and transmit hardware, which have receive and transmit hardware capable of two-way communication over a two-way communication link. Such a device may include: cellular or other communication devices such as personal computers, tablets, etc. having single or multi-line displays or cellular or other communication devices without multi-line displays; PCS (Personal Communications Service), which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; a conventional laptop and/or palmtop computer or other device having and/or including a radio frequency receiver. As used herein, a "client," "terminal device" can be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location(s) on earth and/or in space. The "client", "terminal Device" used herein may also be a communication terminal, a web terminal, a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a Mobile phone with music/video playing function, and may also be a smart tv, a set-top box, and the like.
The hardware referred to by the names "server", "client", "service node", etc. is essentially an electronic device with the performance of a personal computer: a hardware device having the necessary components disclosed by the von Neumann principle, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory, and the central processing unit calls the program from the external memory into the internal memory to run it, executes the instructions in the program, and interacts with the input and output devices, thereby completing a specific function.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.
One or more technical features of the present application, unless expressly specified otherwise, may be deployed to a server for implementation by a client remotely invoking an online service interface provided by a capture server for access, or may be deployed directly and run on the client for access.
Unless specified in clear text, the neural network model referred to or possibly referred to in the application can be deployed in a remote server and used for remote call at a client, and can also be deployed in a client with qualified equipment capability for direct call.
Various data referred to in the present application may be stored in a server remotely or in a local terminal device unless specified in the clear text, as long as the data is suitable for being called by the technical solution of the present application.
The person skilled in the art will know this: although the various methods of the present application are described based on the same concept so as to be common to each other, they may be independently performed unless otherwise specified. In the same way, for each embodiment disclosed in the present application, it is proposed based on the same inventive concept, and therefore, concepts of the same expression and concepts of which expressions are different but are appropriately changed only for convenience should be equally understood.
The embodiments to be disclosed herein can be flexibly constructed by cross-linking related technical features of the embodiments unless the mutual exclusion relationship between the related technical features is stated in the clear text, as long as the combination does not depart from the inventive spirit of the present application and can meet the needs of the prior art or solve the deficiencies of the prior art. Those skilled in the art will appreciate variations therefrom.
The image erasing and repairing method of the present application can be programmed into a computer program product and deployed to run in a client or a server, so that, once the computer program product is running, the method can be executed by accessing its open interface and interacting with its process through a graphical user interface.
Fig. 1 is a schematic diagram of the image erasing and repairing method according to an embodiment of the present application. As the figure shows, the principle of the method is to perform target detection on the picture to be repaired according to the target image label so as to determine the target area image, then erase the target contour region within the target area image, perform image repair on that basis, and fuse the result with the picture to be repaired to obtain an output repaired picture with a better effect.
Fig. 2 is a network architecture diagram of the image erasing and repairing method according to an embodiment of the present application. In this architecture, the target area image indicated by the target image label is first obtained from the picture to be repaired by the target detection model trained to convergence; the mask data of the target contour region in the target area image are then obtained by the image segmentation model trained to convergence; image repair is next performed on the picture to be repaired and the mask data to obtain an intermediate repaired image; finally, the intermediate repaired image, the picture to be repaired and the mask data are fused to obtain the final repaired image as output.
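As a reading aid only, the architecture of Fig. 2 can be summarized in the following minimal Python sketch; the three model objects and their call signatures (`detector`, `segmenter`, `inpainter`) are placeholders assumed for illustration, not interfaces defined by the patent:

```python
import numpy as np

def erase_and_repair(picture: np.ndarray, target_label: str,
                     detector, segmenter, inpainter) -> np.ndarray:
    """End-to-end pipeline of Fig. 2: detect -> segment -> inpaint -> fuse."""
    # 1) Target detection: bounding box whose label matches the target label.
    x1, y1, x2, y2 = detector(picture, target_label)
    region = picture[y1:y2, x1:x2]

    # 2) Image segmentation: contour mask of the region (1 = pixel to erase).
    region_mask = segmenter(region)

    # 3) Expand the region mask to the full picture size.
    mask = np.zeros(picture.shape[:2], dtype=np.float32)
    mask[y1:y2, x1:x2] = region_mask

    # 4) Image repair on the masked picture, then fuse with the original:
    #    keep original pixels outside the mask, repaired pixels inside it.
    repaired = inpainter(picture, mask)
    fused = picture * (1 - mask[..., None]) + repaired * mask[..., None]
    return fused.astype(picture.dtype)
```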
Referring to fig. 3, and with reference to the principles and architecture disclosed in fig. 1 and fig. 2, an exemplary embodiment of the image erasing and repairing method of the present application can be further understood. Specifically, in this embodiment the image erasing and repairing method includes the following steps:
Step S1100, acquiring a picture to be repaired and a target image label indicating a target area image in the picture to be repaired;
In an exemplary application scenario given for auxiliary explanation, the picture to be repaired may be a goods picture, an advertisement picture, or the like on an e-commerce platform. Such pictures normally contain only information suitable for publication; when a picture contains information unsuitable for publication it generally carries a certain sensitivity, for example it infringes another party's trademark rights, in which case erasing and repairing are needed, and the information to be erased and repaired is the to-be-processed information of the present application. More specifically, the to-be-processed information refers to information that violates the platform's relevant regulations, such as an unauthorized trademark graphic or another predetermined type of specific graphic. The implementation of the present application removes the to-be-processed information from the picture to be repaired while repairing the image, so that the repaired picture no longer carries the to-be-processed information and remains coordinated with the main content of the original picture.
In the e-commerce application scenario, one way to acquire a picture to be repaired is to receive input from a user of the platform, in particular the input of a merchant-instance user configuring advertisement information, and to take the goods pictures, advertisement pictures and the like in that advertisement information which contain to-be-processed information such as trademarks as the pictures to be repaired.
The target area image is the image corresponding to the target area of the picture to be repaired that contains to-be-processed information such as a trademark; the target image label is the label to which the to-be-processed information contained in the target area image belongs, and can be set flexibly by a person skilled in the art, for example a brand name or a brand ID. The form of expression of the target image label can likewise be set flexibly, as long as it is convenient to invoke. In the subsequent flow, the image erasing and repairing method of the present application matches the corresponding target area image according to the target image label, so as to perform image erasing and image repair.
Step S1200, performing target detection on the picture to be repaired with a target detection model trained to convergence, so as to obtain a bounding box that matches the target image label and indicates the position of the target area image, and cropping the target area image from the picture to be repaired according to the bounding box;
The target detection model is trained in advance to convergence so that it acquires the target detection capability required by the task of the present application. It takes the picture to be repaired as input and determines one or more bounding boxes and their corresponding labels for the picture; the target image label is then matched against the labels of the bounding boxes to determine the bounding box whose label matches. That bounding box indicates the position, within the picture to be repaired, of the target area image indicated by the target image label, and is produced as output.
The target detection model is implemented as a preferred neural network model; in the embodiment of the present application it is YOLOv5 trained to convergence. Alternatively, the neural network model may be chosen from various mature target detection models known in the art, including but not limited to the YOLO series, the R-CNN series, SSD, DETR and the like.
In this embodiment, the YOLOv5 model includes three components: a Backbone component, a Neck component and a Prediction component. The Backbone component mainly extracts depth features from the input picture and outputs three feature layers of different granularities; the Neck component applies upsampling, fusion, feature extraction and other mixed operations to the three feature layers to obtain three prediction feature maps of different granularities; the Prediction component predicts from the three prediction feature maps and generates bounding boxes and their labels.
From the bounding boxes and labels output by the target detection model, the bounding box matching the target image label is determined; the target area image indicated by that bounding box is then cropped from the picture to be repaired.
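A minimal sketch of this matching-and-cropping step follows; the detection result format (a list of `(label, score, box)` tuples) is an assumption made for illustration:

```python
from typing import List, Optional, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def crop_target_region(picture: np.ndarray,
                       detections: List[Tuple[str, float, Box]],
                       target_label: str) -> Optional[np.ndarray]:
    """Pick the highest-scoring detection whose label matches, then crop it."""
    matches = [(score, box) for label, score, box in detections
               if label == target_label]
    if not matches:
        return None                      # no region to erase for this label
    _, (x1, y1, x2, y2) = max(matches)   # keep the most confident match
    return picture[y1:y2, x1:x2].copy()
```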
Step S1300, extracting mask data of the target contour region in the target area image with an image segmentation model trained to convergence, and expanding the mask data into mask data for the picture to be repaired;
the image segmentation model is trained to a convergent state in advance in a similar manner, and has the capability of performing image segmentation according to the input image to obtain mask data corresponding to contour features in the image, so that the image segmentation model takes the target area image as input and takes the mask data of the target area image as output.
In an embodiment of the present application, the image segmentation model is trained to convergeU2net. Of course, the image segmentation model of the present application is suitable for use with U-based image segmentation models2Variations of net, not limited to U2net, and can be popularized and compatible with other models with equivalent graph splitting capability.
The image segmentation model is constructed with an encoding path and a decoding path, wherein a multi-level decoder is arranged on the encoding path, a multi-level decoder is correspondingly arranged on the decoding path, and the level number of the encoder and the decoder can be flexibly determined by a person skilled in the art according to prior knowledge such as experiments, experience and the like. For example, in the example of fig. 4, six-level encoders and five-level decoders are provided (where En _6 may be actually regarded as one decoder, so that it may also be understood as six decoders), which directly perform 1 × 1 convolution kernel transformation, Sigmoid activation function and upsampling on intermediate feature information obtained by each level of decoders and the last level of encoder En _6 to form image feature information that is consistent with the image specification of the target region, that is, obtain six image feature information, then fuse all the image feature information in a cascade fusion manner, and generate mask data of the final target contour region after 1 × 1 convolution kernel transformation and Sigmoid activation function. The outline features of the content (for example, trademark) of the target area image are captured by the image feature information of each object in different scales, so that the mask data obtained here is essentially the mask form of the outline of the content (for example, trademark).
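The side-output fusion described above can be sketched as follows in PyTorch; the number of side outputs (six) follows fig. 4, while the channel counts and the use of bilinear upsampling are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideOutputFusion(nn.Module):
    """Fuse multi-scale decoder features into one contour mask (U²-Net style)."""
    def __init__(self, side_channels):
        super().__init__()
        # One 1x1 conv per side output, mapping features to a 1-channel map.
        self.side_convs = nn.ModuleList(
            nn.Conv2d(c, 1, kernel_size=1) for c in side_channels)
        # Final 1x1 conv over the concatenated side maps.
        self.fuse = nn.Conv2d(len(side_channels), 1, kernel_size=1)

    def forward(self, features, out_size):
        side_maps = []
        for conv, feat in zip(self.side_convs, features):
            m = torch.sigmoid(conv(feat))                 # 1x1 conv + Sigmoid
            m = F.interpolate(m, size=out_size,
                              mode="bilinear", align_corners=False)  # upsample
            side_maps.append(m)
        # Concatenate the per-scale masks, then 1x1 conv + Sigmoid to fuse.
        return torch.sigmoid(self.fuse(torch.cat(side_maps, dim=1)))
```

For example, `SideOutputFusion([64, 64, 128, 256, 512, 512])` (channel counts assumed) applied to the six decoder/En_6 feature maps yields the final mask at the resolution of the target area image.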
Because the mask data match only the size of the target area image, and so that they correspond in size to the picture to be repaired, the mask data of the target contour region in the target area image are expanded to the same size as the picture to be repaired: during the expansion the mask data of the target contour region are kept unchanged, and each pixel of the extension part obtained by the expansion is given mask-shielding treatment. The expanded mask data, now of the same size as the picture to be repaired, can be used to shield the target contour region carrying the to-be-processed information in the picture to be repaired.
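The expansion reduces to placing the region mask inside a blank full-size canvas at the bounding-box position; a minimal NumPy sketch, assuming the convention that 1 marks the target contour region to be erased:

```python
import numpy as np

def expand_mask(region_mask: np.ndarray, box, picture_hw) -> np.ndarray:
    """Embed the target-region mask into a full-size mask for the picture.

    region_mask: (h, w) mask of the cropped target area image.
    box:         (x1, y1, x2, y2) bounding box of that region in the picture.
    picture_hw:  (H, W) size of the picture to be repaired.
    """
    x1, y1, x2, y2 = box
    full = np.zeros(picture_hw, dtype=region_mask.dtype)  # extension: no target
    full[y1:y2, x1:x2] = region_mask   # target contour region kept unchanged
    return full
```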
Step S1400, performing image repair on the picture to be repaired and its mask data with the image repair model trained to convergence, to obtain a repaired image.
The image repair model is trained in advance to convergence so that it acquires the capability of repairing the picture to be repaired according to the expanded mask data of the same scale as that picture. It therefore takes the picture to be repaired and its mask data as input, removes the image of the target contour region according to the mask data, repairs the image in the target contour region, and finally outputs the repaired image.
In the embodiment of the present application, the image repair model is a partial-convolution-based U-Net model trained to convergence. In this embodiment, the U-Net architecture proposed by David Dubray and Jochen Laubrock in 2019 is recommended as the main feature-extraction network, with the convolution operations in it replaced by partial convolutions. The formulas of the partial convolution are as follows:
$$x' = \begin{cases} W^{T}(X \odot M)\dfrac{\operatorname{sum}(\mathbf{1})}{\operatorname{sum}(M)} + b, & \operatorname{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$m' = \begin{cases} 1, & \operatorname{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}$$

where $W$ denotes the weights of the convolution kernel, $b$ its bias, $X$ the input feature-map values in the current sliding window, $M$ the corresponding input mask values, $\odot$ the element-wise product, $\mathbf{1}$ an all-ones tensor of the same shape as $M$, $x'$ the feature value output after the convolution, and $m'$ the mask value output after the convolution.
Partial convolution is a special convolution operation applied to image repair: its inputs are an upper-layer feature map and its mask data, and its outputs are a more deeply represented image feature map and its mask data. In the embodiment of the present application, partial convolution layers perform deep feature extraction, according to the input feature map and its mask data, on the feature-map region that the mask data leave visible, so as to obtain a deep representation of the non-target-region image; the corresponding intermediate image feature map and its mask data are obtained for the subsequent steps. In a specific embodiment the partial convolution layer may be multi-layer; two layers are adopted in this embodiment, i.e. partial convolution is applied twice in succession to the picture to be repaired and its mask data.
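The formulas above translate directly into a layer like the following PyTorch sketch, a common way of implementing partial convolution with a fixed all-ones kernel for the mask path; the patent does not prescribe an implementation, so this is illustrative only (the mask is kept single-channel for simplicity):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Module):
    """Partial convolution: convolve only valid pixels, then renormalize."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        # All-ones kernel counting valid mask pixels in each sliding window.
        self.register_buffer("ones",
                             torch.ones(1, 1, kernel_size, kernel_size))
        self.window = float(kernel_size * kernel_size)   # sum(1)
        self.stride, self.padding = stride, padding

    def forward(self, x, mask):
        # x: (N, C, H, W) features; mask: (N, 1, H, W), 1 = valid, 0 = hole.
        with torch.no_grad():
            valid = F.conv2d(mask, self.ones,
                             stride=self.stride, padding=self.padding)
        out = self.conv(x * mask)                        # W^T (X ⊙ M) + b
        bias = self.conv.bias.view(1, -1, 1, 1)
        scale = self.window / valid.clamp(min=1.0)       # sum(1) / sum(M)
        out = torch.where(valid > 0, (out - bias) * scale + bias,
                          torch.zeros_like(out))
        new_mask = (valid > 0).float()                   # m' = 1 iff sum(M) > 0
        return out, new_mask
```

Two such layers applied in succession to the picture to be repaired and its mask reproduce the two-layer configuration described above.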
In the present application, a picture to be repaired and a target image label indicating a target area image therein are acquired; target detection is performed on the picture to be repaired with a target detection model trained to convergence, so as to obtain a bounding box that matches the target image label and indicates the position of the target area image, and the target area image is cropped from the picture to be repaired according to the bounding box; mask data of the target contour region in the target area image are then extracted with an image segmentation model trained to convergence, and expanded into mask data for the picture to be repaired; finally, image repair is performed on the picture to be repaired and its mask data with an image repair model trained to convergence, yielding a repaired image. End-to-end, efficient erasing and repairing of the target contour region is thereby realized.
The present application mainly solves the two main problems encountered in this field, namely low efficiency and poor repair effect. To this end, several network models trained to convergence are employed: the target area image is localized in the picture to be repaired, the target contour region is further segmented out of the target area image, and erasing and repair are then applied to the target contour region in a targeted manner. The process is fully automatic and refines the image processing from coarse to fine, which can greatly reduce the time and labor cost of e-commerce users while allowing them to iterate their advertising information quickly and win more sales share. The target area image is localized directly by the target detection model according to the input target label, the mask data of the target contour region are obtained by the image segmentation model, and the target contour region is finally repaired by the image repair model. Because the repair is confined to the target contour region, the difficulty of repair is greatly reduced and the repair effect is effectively improved, so that the repaired target contour region blends better with the other regions, increasing the appeal and competitiveness of the goods.
In summary, the present application erases and repairs the target contour region of a picture to be repaired with higher efficiency and a better repair effect, is highly trustworthy, and is suitable for erasing and repairing target contour regions in pictures containing to-be-processed information in application scenarios such as e-commerce platforms.
Referring to fig. 5, in a further embodiment, step S1100 of acquiring a picture to be repaired and a target image label indicating a target area image in the picture to be repaired includes the following steps:
Step S1110, in response to an advertisement publishing request triggered by a user, acquiring the picture to be repaired from the advertisement publishing information correspondingly submitted by the user;
The present application responds to an advertisement publishing request triggered by a user and receives the advertisement publishing information correspondingly submitted by that user, including goods pictures, advertisement pictures and the like. When such a picture carries to-be-processed information such as a trademark, risk-control measures will prevent it from being published on the relevant e-commerce platform, so the to-be-processed information in the picture must be removed. The picture carrying to-be-processed information such as a trademark is taken as the picture to be repaired for the subsequent erasing and repairing of the target contour region, the target contour region being the pixel area in which the to-be-processed information is located.
Step S1120, in response to an image repair request triggered by the user, acquiring the target image label indicating the target area image in the picture to be repaired;
The target area image is the image corresponding to the target area of the picture to be repaired that contains the to-be-processed information, and it contains the target contour region that refers directly to that information. The to-be-processed information may be sensitive information, for example a trademark graphic or a preset specific graphic, and the trademark may be registered or unregistered. The target image label is the label to which the to-be-processed information contained in the target area image belongs, such as a trademark name or a trademark ID. In the subsequent flow, the image erasing and repairing method of the present application frames the corresponding target area image according to the target image label, so as to perform the subsequent extraction of the target-contour-region mask data and the erasing and repairing work.
In this embodiment, the technical solution of the present application is combined in depth with a concrete advertisement publishing scenario, improving the information-security capability of that scenario: the to-be-processed information contained in an advertisement picture (or goods picture) can be erased rapidly, while image repair preserves the integrity of the picture, thereby keeping the advertisement publishing service running automatically and improving the operating efficiency of the e-commerce business.
Referring to fig. 6, in a further embodiment, step S1200 of performing target detection on the picture to be repaired with a target detection model trained to convergence, obtaining a bounding box that matches the target image label and indicates the position of the target area image, and cropping the target area image from the picture to be repaired according to the bounding box, includes the following steps:
Step S1210, extracting several feature maps of different scales from the picture to be repaired with the backbone network block of the target detection model;
the Backbone network block is a Backbone component of the YOLOv5 model, and comprises following sub-components, namely a Focus layer, a first CBL layer, a first CSP layer, a second CBL layer, a second CSP layer, a third CBL layer, a third CSP layer, a fourth convolution layer, an SPP layer, a fourth CSP layer and a fifth CBL layer in sequence, and feature extraction of different scales is carried out on a picture to be repaired.
And the Focus layer performs slicing operation on the input picture to obtain a feature map with the size being half of the size of the input picture and the number of channels being four times of the number of channels of the input picture.
And the output characteristic diagram of the second CSP layer, the output characteristic diagram of the third CSP layer and the output characteristic diagram of the fifth CBL layer are transmitted to a next component for further operation.
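The Focus slicing just described (half the spatial size, four times the channels) is a space-to-depth rearrangement; a short PyTorch sketch of that step, given purely for illustration (in YOLOv5 the slice is followed by a convolution, which is omitted here):

```python
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    """Slice an (N, C, H, W) picture into an (N, 4C, H/2, W/2) feature map."""
    # Take every second pixel in four phase-shifted patterns and stack them
    # on the channel axis: spatial size halves, channel count quadruples.
    return torch.cat([x[..., ::2, ::2],    # even rows, even cols
                      x[..., 1::2, ::2],   # odd rows,  even cols
                      x[..., ::2, 1::2],   # even rows, odd cols
                      x[..., 1::2, 1::2]   # odd rows,  odd cols
                      ], dim=1)
```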
Step S1220, fusing strong semantic information and strong localization information across the feature maps of different scales with the neck block of the target detection model, to obtain several prediction feature maps of different scales;
The neck block is the Neck component, which uses a combination of FPN and PAN. As is known to those skilled in the art, during feature extraction in a neural network the shallower feature maps contain stronger position information while the deeper feature maps contain stronger semantic information. The FPN is a feature pyramid network that propagates strong semantic information top-down; the PAN is a path aggregation network that propagates strong position information bottom-up. The FPN-PAN combination aggregates information across the three feature maps of different scales output by the previous component, strengthening both the semantic and the positional expression on feature maps of each scale, and outputs prediction feature maps of sizes 76×76, 38×38 and 19×19 for the next component to predict on.
Step S1230, predicting the corresponding bounding box and its label data for the prediction feature maps of different scales by using the prediction block in the target detection model;
The prediction block is the Prediction component. It identifies and classifies the three feature maps of different scales output by the previous component through 1 × 1 convolution layers to obtain accurate detections, performs non-maximum suppression and IOU-value judgment on the detected bounding boxes, and finally outputs the most likely bounding boxes together with their label data.
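As a sketch of the IOU-value judgment and non-maximum suppression mentioned above (the function names and the 0.45 threshold are illustrative assumptions, not values from the application):

import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-7)

def nms(boxes, scores, iou_thr=0.45):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        overlaps = np.array([iou(boxes[i], boxes[j]) for j in rest])
        order = rest[overlaps < iou_thr]
    return keep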
Step S1240: determining the bounding box matching the target image label according to the predicted label data, and cropping the target area image indicated by that bounding box from the picture to be repaired.
According to the target image label, the bounding box carrying the same label is located in the output of the previous component; this is the bounding box indicating the target area image. According to this bounding box, the target area image at the indicated position is cropped from the picture to be repaired and used as the input of the next operation.
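A minimal sketch of this matching-and-cropping step, assuming the detector returns parallel lists of boxes and labels (the data layout is an assumption for illustration):

def crop_target_region(picture, boxes, labels, target_label):
    """Return the sub-image of the first detected box whose label matches
    the target image label, or None when nothing matches."""
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        if label == target_label:
            return picture[int(y1):int(y2), int(x1):int(x2)]
    return None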
This embodiment fuses the strong semantic information and the strong position information of feature maps at different levels during feature extraction, which improves the detection accuracy of the target detection model on small targets. Since the application scenario of this embodiment is essentially a small-target detection problem, the target detection model solves it well.
Referring to fig. 7, in an embodiment, the training process of the target detection model includes the following steps:
Step S1201: determining related pictures carrying information to be processed, together with their bounding box annotation data and label annotation data, as training samples of the training set of the target detection model;
In the present application, representative advertisement pictures, commodity pictures and the like carrying the information to be processed are selected, and the regions of information to be processed in these pictures are framed and labeled, for example by manual annotation, to obtain the corresponding bounding box annotation data and label annotation data. A sufficient number of training samples obtained through repeated screening and annotation forms the training set of the target detection model.
Step S1202, image data enhancement processing is carried out on pictures in the training samples in the training set;
To promote rapid convergence of the model and to generalize the data characteristics, image data enhancement, namely Mosaic data enhancement, is first performed on the pictures of the training samples in the training set.
Mosaic data enhancement was proposed in YOLOv4: four pictures are stitched together by random scaling, random cropping and random arrangement. This enriches the training set, in particular with small targets, and thereby increases small-target detection precision. Since the regions of information to be processed fall within the range of small targets, this data enhancement mode effectively improves target detection precision in the application scenario of this embodiment.
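A simplified Python sketch of Mosaic stitching, assuming OpenCV is available; the real implementation also remaps each picture's bounding box annotations into the stitched canvas, which is omitted here:

import random
import numpy as np
import cv2

def mosaic4(images, s=640):
    """Stitch four images around a random centre into one 2s x 2s canvas."""
    canvas = np.full((2 * s, 2 * s, 3), 114, dtype=np.uint8)  # grey padding
    cx = random.randint(s // 2, 3 * s // 2)  # random mosaic centre
    cy = random.randint(s // 2, 3 * s // 2)
    quadrants = [(0, 0, cx, cy), (cx, 0, 2 * s, cy),
                 (0, cy, cx, 2 * s), (cx, cy, 2 * s, 2 * s)]
    for img, (x1, y1, x2, y2) in zip(images, quadrants):
        # random scaling and cropping are collapsed to a fit-resize for brevity
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas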
Step S1203: determining the optimal anchor box parameters of each training round from part of the training samples, and on this basis using the target detection model to determine the picture bounding boxes and labels of the training samples in each round;
As is known to those skilled in the art, in the YOLO series of algorithms the length and width of the anchor boxes are generally initialized per training set, and the subsequent bounding box prediction is performed on that basis to improve detection accuracy and recall. The initial anchor box parameters are usually either set manually by a person skilled in the art according to the actual service scenario or obtained with an independent calculation program. In YOLOv5 this initialization process is embedded in the training network: in each training round, part of the training samples of the training set is drawn, and the optimal anchor box parameters for those samples are computed adaptively as the initialization of that round.
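A minimal sketch of such an anchor-initialisation computation using plain k-means clustering; YOLOv5's built-in routine additionally refines the clusters with a genetic algorithm and a fitness measure, which is omitted here:

import numpy as np
from sklearn.cluster import KMeans

def init_anchors(box_wh, k=9, seed=0):
    """Cluster the (width, height) pairs of the ground-truth boxes drawn
    for this round into k anchor shapes, returned sorted by area."""
    centers = KMeans(n_clusters=k, n_init=10,
                     random_state=seed).fit(np.asarray(box_wh)).cluster_centers_
    return centers[np.argsort(centers.prod(axis=1))]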
After the initialization is completed, training of the target detection model can begin on the corresponding training samples: the pictures of the training samples are taken as input, and multiple bounding boxes and their labels are obtained as output.
Step S1204, calculating a loss value according to the boundary box and the label of each training sample and the boundary box labeled data and the label data of the training sample, and performing gradient update on a target detection model according to the loss value;
Under the setting of the previous step, step S1200 is carried out for the part of the samples drawn in each round, yielding the detected prediction bounding boxes and their prediction labels; the model loss is then calculated between the prediction bounding boxes and labels on the one hand and the real bounding boxes and labels on the other.
As is known to those skilled in the art, in training a target detection model the loss function consists of a classification loss function and a regression loss function, and the regression loss is usually computed as IOU_loss. The IOU is the intersection-over-union of the predicted bounding box and the corresponding real bounding box, and IOU_loss = 1 - IOU. Iterative training on IOU_loss raises two problems. First, when the predicted bounding box does not intersect the real bounding box, the loss function is not differentiable, i.e. no parameter optimization is possible. Second, when two predicted bounding boxes differ in position but have the same size and the same IOU value, IOU_loss cannot distinguish their exact states. Therefore, the regression loss function adopted in the YOLOv5 model of this embodiment is GIOU_loss: it first computes the minimum circumscribed rectangle of the predicted bounding box and the real bounding box, then the union of the two boxes and the difference set of that union within the minimum circumscribed rectangle, from which the GIOU_loss is obtained as follows:
GIOU_loss = 1 - GIOU = 1 - (IOU - |difference set| / |C|)
where IOU is the intersection-over-union of the predicted bounding box and the real bounding box, |difference set| is the area of the minimum circumscribed rectangle not covered by the union of the two bounding boxes, and |C| is the area of the minimum circumscribed rectangle.
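A direct PyTorch transcription of this loss, as a sketch; the (x1, y1, x2, y2) box layout is an assumption:

import torch

def giou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) tensors of boxes as (x1, y1, x2, y2)."""
    iw = (torch.min(pred[:, 2], target[:, 2])
          - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    ih = (torch.min(pred[:, 3], target[:, 3])
          - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = iw * ih
    union = ((pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
             + (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
             - inter)
    iou = inter / (union + eps)
    # minimum circumscribed rectangle C of the two boxes
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c_area = cw * ch + eps
    giou = iou - (c_area - union) / c_area  # subtract |difference set| / |C|
    return (1.0 - giou).mean()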
Inverse gradient propagation is carried out according to the calculated loss value, and the parameters of the target detection model are updated.
Step S1205: repeating the above two updating steps until the target detection model is trained to convergence.
The above steps S1203 and S1204 are repeated, iteratively updating the parameters of the target detection model, until a preset termination condition is satisfied and the target detection model trained to convergence is obtained; the preset termination condition may be set by a person skilled in the art according to the actual service background.
In this embodiment of the application, the data enhancement method used in training the target detection model increases the number of small-target samples in the training set, thereby improving small-target detection precision; and the boundary loss function used has scale invariance, so the training of the model converges quickly.
Referring to fig. 8, in a further embodiment, step S1300, in which mask data of the target contour area in the target area image is extracted by an image segmentation model trained to convergence and expanded into mask data of the picture to be repaired, includes the following steps:
Step S1310: performing multi-level encoding on the target area image to correspondingly generate intermediate feature information at corresponding scales, the intermediate feature information being a feature representation of the information to be processed in the target area image;
In the present embodiment, referring to the figure, the six encoders (En_1 to En_6) along the encoding path operate on a specification original image corresponding to the target area image: to meet the input specification of U²-Net, the target area image is first cut to the specified size, and the resulting specification original image is then encoded stage by stage. The first-stage encoder at the top extracts the intermediate feature information of the first scale from the specification original image and passes it to the next-stage encoder, which extracts the intermediate feature information of the second scale, and so on; after the six-stage encoding path has processed the specification original image stage by stage, six pieces of intermediate feature information, each corresponding to its own spatial resolution, are obtained.
It can be understood that the intermediate feature information at each scale is a representation of the specification image after deep semantic understanding at the corresponding scale, i.e. information extracted from the contour features of the information to be processed in the target area image. This ability of the U²-Net model is known to those skilled in the art: as long as a sufficient amount of training samples is used to train the model to convergence, its encoding path acquires the deep semantic understanding needed to capture the information to be processed in the target area picture.
Step S1320: performing multi-stage decoding correspondingly on the decoding path, generating the first image feature information from the intermediate feature information of the minimum scale, and decoding the image feature information of each higher scale with reference to the image feature information of the previous stage together with the intermediate feature information generated by the same-stage encoding, wherein the image feature information represents, in mask form, the contour features of the information to be processed in the target area image;
On the decoding path, referring to the figure, each of the five decoders (De_1 to De_5) in the right branch takes as input, starting from the bottom layer, the concatenation of the upsampled feature map from the previous stage and the feature map from the symmetric encoder stage, and outputs the decoded feature information. The outputs of the last-stage encoder (En_6) and of each decoder stage are passed through a 1 × 1 convolution, a Sigmoid activation function and an upsampling operation, yielding six pieces of image feature information with the same specification as the original image. This image feature information can represent the contour of the information to be processed in the target area image, expressing the contour features in mask form; it is essentially mask image data (Mask1 to Mask6).
Thus, as can be understood from the structure and principle of the image segmentation model disclosed above, after the target area image of the present application undergoes the stage-by-stage encoding and decoding of the image segmentation model, multiple pieces of image feature information (mask image data Mask1 to Mask6) are obtained.
Step S1330: fusing all the image feature information to generate the mask data of the target contour area in the target area image, and expanding the mask data into mask data of the picture to be repaired.
Following the U²-Net model principle, the pieces of image feature information are fused in cascade, and after a 1 × 1 convolution and a Sigmoid activation function the mask data of the final target contour area (Mask7) is generated. The mask image data is essentially a binary image: when the target area image contains no information to be processed, the binary image has the value 1 everywhere; when the target area image contains a target contour area, i.e. information to be processed, the binary image has the value 0 at the foreground positions of the target contour area and the value 1 at all other positions. The mask data can therefore be used to shield the image of the region of information to be processed.
The mask data of the target contour area is expanded into the picture to be repaired as follows: all positions of a mask the size of the picture to be repaired are set to 1, and the positions covered by the target area image are set to the mask data of the target contour area, which gives the mask data of the picture to be repaired. In this mask, the target contour region has the value 0 and all other regions have the value 1, i.e. the mask shields exactly the target contour region.
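A minimal NumPy sketch of this expansion, where the bounding box obtained in step S1200 gives the placement (1 denotes a kept pixel, 0 a shielded one):

import numpy as np

def expand_mask(picture_hw, region_mask, box):
    """Embed the target-area mask (0 on the target contour, 1 elsewhere)
    into an all-ones mask the size of the picture to be repaired."""
    h, w = picture_hw
    x1, y1, x2, y2 = box
    full_mask = np.ones((h, w), dtype=np.float32)
    full_mask[y1:y2, x1:x2] = region_mask  # region_mask must match the box size
    return full_mask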
In this embodiment, image segmentation is performed on the target area image rather than on the whole picture to be repaired in order to extract the target contour area belonging to the information to be processed; the mask data corresponding to the target area image is thereby determined and the target contour area is masked. On this basis, the mask data is expanded to the same specification as the picture to be repaired, so that the part of the picture belonging to the target contour area can be masked as well. This fine positioning prepares the subsequent erasing and repairing of the picture, allowing both to be carried out more accurately.
Referring to fig. 9, in an embodiment, the training of the image segmentation model includes the following steps:
step S1301, determining a relevant picture with information to be processed and obtaining mask data of a target contour region of the picture as training set sample data of the image segmentation model;
In this method, representative advertisement pictures, commodity pictures and the like carrying the information to be processed are selected, and the target contour areas carrying that information are annotated, for example manually, to obtain the label mask data of the target contour areas. A sufficient number of data samples obtained through repeated screening and annotation forms the training set of the image segmentation model.
Step S1302, calling the data samples in the training set to carry out iterative training on the image segmentation model, calculating the model loss and carrying out inverse gradient propagation;
Part of the samples in the training set is drawn as the samples of the current round, forward inference with the image segmentation model produces the generated mask data Mask1 to Mask7, and the loss between the generated mask data and the label mask data is calculated with the following formula:
L = Σ_{m=1}^{6} w_side^(m) · l_side^(m) + w_fuse · l_fuse

where each individual term is a binary cross-entropy loss over the mask pixels:

l = -Σ_{(r,c)} [ P_G(r,c) · log P_S(r,c) + (1 - P_G(r,c)) · log(1 - P_S(r,c)) ]

wherein l_side^(m) indicates the loss values corresponding to Mask1 to Mask6, l_fuse is the loss value corresponding to Mask7, P_G(r,c) and P_S(r,c) respectively represent the label mask data and the generated mask data, and w_side^(m) and w_fuse are the weight values for each loss, which can be set by those skilled in the art according to the actual service application scenario.
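A compact PyTorch sketch of this deep-supervision loss; uniform weights are an assumption, and the masks are taken to be the Sigmoid outputs Mask1 to Mask7:

import torch.nn.functional as F

def segmentation_loss(masks, label, weights=None):
    """masks: list of the 7 generated masks (side outputs Mask1..Mask6 plus
    the fused Mask7), each the same shape as the label mask, values in (0, 1)."""
    weights = weights or [1.0] * len(masks)
    return sum(w * F.binary_cross_entropy(m, label)
               for w, m in zip(weights, masks))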
After the loss is calculated, inverse gradient propagation is carried out and the parameters of the image segmentation model are updated.
Step S1303: repeating the above updating operation until the model is trained to convergence.
The previous step S1302 is repeated, iteratively updating the parameters of the image segmentation model, until a preset termination condition is met, whereupon training stops; the preset termination condition can be set by a person skilled in the art according to the actual service background.
This embodiment is lightweight and efficient because the image segmentation model, iteratively trained to convergence, acquires the mask data corresponding to the target contour area from the target area image alone.
Referring to fig. 10, in a deepened embodiment, step S1400, in which the image restoration model trained to convergence performs image restoration according to the picture to be repaired and its mask data to obtain the restored image, includes the following steps:
Step S1410: feeding the picture to be repaired and its mask data into the image restoration model constructed on a partial-convolution U-Net architecture for local feature extraction, obtaining intermediate feature maps;
In this embodiment of the present application, considering that the skip-connection structure inside the U-Net network architecture can retain and capture more spatial detail information, the U-Net network architecture is used as the main network framework of the image restoration model, with its convolution layers replaced by partial convolution layers as described above.
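A minimal sketch of a partial convolution layer under simplifying assumptions (single-channel mask, no bias term); the published formulation additionally handles the bias separately:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Conv2d):
    """Convolution restricted to valid (mask == 1) pixels, renormalised by
    window coverage; the hole region of the mask shrinks after each layer."""
    def __init__(self, *args, **kwargs):
        kwargs["bias"] = False  # simplification for this sketch
        super().__init__(*args, **kwargs)

    def forward(self, x, mask):
        # mask: (B, 1, H, W) with 1 = valid pixel, 0 = hole
        with torch.no_grad():
            window = torch.ones(1, 1, *self.kernel_size, device=x.device)
            coverage = F.conv2d(mask, window, stride=self.stride,
                                padding=self.padding)
        out = super().forward(x * mask)  # hole pixels contribute nothing
        out = out * window.sum() / coverage.clamp(min=1.0)
        new_mask = (coverage > 0).float()  # valid once the window sees any data
        return out * new_mask, new_mask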
First, the picture to be repaired and its mask data obtained in the previous step are fed into the U-Net network architecture, a multi-level encoding operation is performed, and multi-scale intermediate feature maps are extracted. The encoding branch comprises a first partial convolution layer, a first pooling layer, a second partial convolution layer, a second pooling layer, a third partial convolution layer, a third pooling layer, a fourth partial convolution layer, a fourth pooling layer and a fifth partial convolution layer.
The deeper the layer, the larger the receptive field of the feature map and the more the learned features are biased toward semantic information; the shallower the layer, the smaller the receptive field and the more the learned features are biased toward texture detail. By extracting encoded features at different scales, multiple intermediate feature maps focusing on different information are therefore obtained.
A deep network concerns the essence of the features, so deep and shallow features each have their own significance. Moreover, the larger feature maps obtained by deconvolution lack edge detail information: each downsampling step inevitably loses some edge features while extracting features, and the lost features cannot be recovered from upsampling alone, so they are recovered by splicing in the skip-connected features.
After this step, multiple intermediate feature maps corresponding to the picture to be repaired are obtained, which can be decoded further.
Step S1420, performing multi-level decoding on the intermediate feature map to obtain an intermediate repair image, and fusing the to-be-repaired image and the intermediate repair image according to mask data of the to-be-repaired image to generate a repair image.
A multi-stage decoding operation is performed on the intermediate feature maps obtained in the previous step to obtain the repair feature map. The decoding branch comprises a first upsampling layer, a sixth partial convolution layer, a second upsampling layer, a seventh partial convolution layer, a third upsampling layer, an eighth partial convolution layer, a fourth upsampling layer and a ninth partial convolution layer. The input to each partial convolution layer fuses the intermediate feature map output by the same-level encoding with the upsampled feature map output by the preceding upsampling layer.
The larger feature maps obtained by upsampling lack edge detail information because the pooling operations of the encoding stage inevitably lose some edge features; these lost edge features can be spliced back in through the skip connections, realizing the fusion of deep semantic information and shallow edge information, and yielding an intermediate restored image in which the erased region is restored and fuses seamlessly with the non-erased regions.
An image scaling operation is applied to the intermediate restored image and the mask data of the picture to be repaired, giving an original-size intermediate restored image and original-size mask data of the same specification as the originally input picture to be repaired; these are then fused to obtain the integrated restored image that serves as the final output of the image erasing and restoration network. The fusion operation is as follows:
Repair_img=Mid_Repair_img*(1-Mask)+Original_img*Mask
where Mid_Repair_img is the original-size intermediate restored image, Mask is the original-size mask data, Original_img is the original-size picture to be repaired, and Repair_img is the finally generated restored image.
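A direct transcription of this fusion operation as a NumPy sketch:

import numpy as np

def fuse_repair(mid_repair_img, original_img, mask):
    """mask == 1 keeps the original pixel; mask == 0 (the target contour
    region) takes the generated pixel; all inputs are at original size."""
    mask = mask.astype(np.float32)
    if mask.ndim == 2:
        mask = mask[..., None]  # broadcast the mask over the colour channels
    return mid_repair_img * (1.0 - mask) + original_img * mask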
In this embodiment of the application, the encoding, decoding and skip-connection operations inside the model fuse deep semantic information with shallow edge information, so the pixel values of the target contour region in the picture to be repaired can be restored from both the semantic information and the detail information, and the erased target contour region connects seamlessly with the other regions after repair.
Referring to fig. 11, in an embodiment, the training of the image inpainting model includes the following steps:
step S1401, determining relevant pictures which do not need to be repaired and mask data obtained after random erasing of the pictures to form a data sample of a training set of the image repairing model;
In this method, pictures containing no information to be processed, such as commodity pictures and advertisement pictures, are screened for constructing the training set; since these pictures are obtained through efficient manual screening, they are well suited to training the relevant models. The data samples of the training set are produced by processing the screened pictures: a random target contour region is erased from each picture, mask data of the erased target contour region is generated from the randomly erased position, and the screened picture together with the mask data of its erased target contour region forms a data sample of the training set.
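A minimal sketch of this sample generation; the erase-size range is an illustrative choice, not a value from the application:

import random
import numpy as np

def make_sample(picture):
    """Return the screened picture together with mask data of a randomly
    erased region (1 = kept pixel, 0 = erased target contour region)."""
    h, w = picture.shape[:2]
    eh = random.randint(h // 8, h // 4)  # illustrative erase-size range
    ew = random.randint(w // 8, w // 4)
    y = random.randint(0, h - eh)
    x = random.randint(0, w - ew)
    mask = np.ones((h, w), dtype=np.float32)
    mask[y:y + eh, x:x + ew] = 0.0
    return picture, mask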
Step S1402, constructing a generation countermeasure network for training, wherein the image restoration model is used as a generator, the discriminant model trained to be convergent is used as a discriminant, and generator parameters are initialized;
The image restoration model can be obtained by training a generative adversarial network (GAN) to convergence. A GAN consists of a generator and a discriminator: the generator generates new data samples from its input so as to imitate real samples, while the discriminator acts as a binary classifier distinguishing real samples from generated ones.
Please refer to fig. 12, which is a schematic diagram of the network architecture in the training phase of the generative confrontation model.
In an embodiment of the present application, the image restoration model serves as the generator, producing a restored image from an input screened picture and the mask data of its erased target contour region; a discriminant model serves as the discriminator, judging the similarity between the restored image and the screened picture.
The discriminant model can be chosen from various mature discriminant models in the prior art, including but not limited to the VGG series, Inception series, ResNet series and EfficientNet series. Any of them can serve as the discriminant model of this application as long as it is trained to convergence with sufficient corresponding training samples. In the present application, a pre-trained VGG16 model is employed.
After the image restoration model is selected as the generator and the discriminant model as the discriminator, the generator parameters are initialized. Model parameter initialization refers to assigning initial values to the weights and biases of each node in the generator before model training.
Step S1403, freezing the discriminator, calling a data sample of a training set to calculate a preset loss function in the generated countermeasure network, and updating the generator parameters in a reverse direction;
Samples in the training set, namely pictures and the mask data of their erased target contour regions, are drawn, and restored images are generated by the generator model.
The restored image generated by the generator is passed through the discriminator for feature extraction to obtain a generated feature map, and the original picture is likewise passed through the discriminator to obtain a real feature map; the similarity between the two is computed, and the related losses are calculated to update the generator parameters in reverse. The related losses include the total variation loss TV_loss, the style loss Style_loss, the perceptual loss and the L1 regularization L1_loss. The total variation loss promotes the spatial smoothness of the generated restored image and suppresses data noise amplified during image generation; the style loss matches the restored image to the style of a real sample; the perceptual loss keeps the restored image consistent with the content texture of a real sample; and the L1 regularization tends to make the generator sparse. This loss combination is one implementation case of this embodiment, and its actual use can be configured by the relevant technical personnel according to the actual business requirements.
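A sketch of this loss combination; the weight values are illustrative, not taken from the application:

import torch
import torch.nn.functional as F

def tv_loss(img):
    """Total variation: penalise differences between neighbouring pixels."""
    return ((img[..., :, 1:] - img[..., :, :-1]).abs().mean()
            + (img[..., 1:, :] - img[..., :-1, :]).abs().mean())

def gram(feat):
    """Gram matrix of a (B, C, H, W) feature map, used by the style loss."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def generator_loss(fake_img, real_img, fake_feats, real_feats):
    """fake_feats / real_feats: lists of discriminator feature maps (e.g. from
    several VGG16 stages) for the generated and the real picture."""
    perceptual = sum(F.l1_loss(f, r) for f, r in zip(fake_feats, real_feats))
    style = sum(F.l1_loss(gram(f), gram(r))
                for f, r in zip(fake_feats, real_feats))
    l1 = F.l1_loss(fake_img, real_img)
    return l1 + 0.05 * perceptual + 120.0 * style + 0.1 * tv_loss(fake_img)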
Step S1404: repeating the above updating steps until the generative adversarial network converges; the generator is then the image restoration model trained to convergence.
The above updating step is repeated, iteratively updating the parameters of the generative adversarial network, which in essence means updating the parameters of the generator, until a preset termination condition is satisfied; the preset termination condition may be set by a relevant person skilled in the art according to the actual business context.
In this embodiment, the training set data is generated from pictures that need no repair combined with random erasing, and the image restoration model is trained in the manner of a generative adversarial network. On the one hand, the input data of the image restoration model can thus be generated more conveniently; on the other hand, the image restoration model can be trained on the loss calculated between the real feature map and the generated feature map, producing restored images closer to the original real samples.
Referring to fig. 13, an image erasure repairing apparatus adapted to one of the objectives of the present application includes: the data acquisition module 1100 is used for acquiring a picture to be restored and a target image label indicating a target area image in the picture to be restored; a target detection module 1200, configured to perform target detection on the to-be-repaired picture by using a target detection model trained to be convergent to obtain a bounding box that matches the target image label and indicates the position of the target area image, and intercept the target area image from the to-be-repaired picture according to the bounding box; the image segmentation module 1300 is configured to extract mask data of a target contour region in the target region image by using an image segmentation model trained to be convergent, and expand the mask data to obtain mask data of the to-be-repaired image; and the image restoration module 1400 performs image restoration according to the image to be restored and the mask data thereof by using the image restoration model trained to be convergent to obtain a restored image.
In a further embodiment, the data acquisition module 1100 includes: the first response submodule is used for responding to an advertisement publishing request triggered by a user and acquiring a picture to be repaired in advertisement publishing information correspondingly submitted by the user; the second response submodule is used for responding to an image restoration request triggered by a user and acquiring a target image label indicating a target area image in the picture to be repaired.
in a further embodiment, the object detection module 1200 includes: the backbone network submodule is used for extracting feature maps with different scales aiming at the picture to be repaired; the neck submodule is used for carrying out strong semantic information and strong positioning information fusion on the feature maps with different scales to obtain a plurality of prediction feature maps with different scales; the prediction sub-module is used for predicting corresponding bounding boxes and label data thereof aiming at the prediction feature maps with different scales; and the intercepting submodule is used for determining the boundary box matched with the target image label according to the predicted label data and intercepting the target area image indicated by the boundary box from the picture to be repaired.
In an embodiment, the first training submodule includes: the first training set subunit is used for determining a related picture with information to be processed, and boundary frame label data and label data thereof as a training sample in a training set of the target detection model; the enhancement unit is used for carrying out image data enhancement processing on pictures in the training samples in the training set; the first updating subunit is used for determining the optimal anchor frame parameter of each training according to part of the training samples, and determining the boundary frame and the label of the picture of the training sample during each training by adopting a target detection model on the basis; the second updating subunit is used for calculating a loss value according to the boundary box and the label of each training sample, and the boundary box labeling data and the label data of the training sample, and performing gradient updating on the target detection model according to the loss value; and the first repeating subunit is used for repeating the first updating subunit and the second updating subunit until the target detection model is trained to be converged.
In a further embodiment, the image segmentation module 1300 includes: the coding submodule is used for carrying out multi-level coding on the target area image and correspondingly generating intermediate characteristic information of corresponding scales, and the intermediate characteristic information is characteristic representation of information to be processed in the target area image; the decoding submodule is used for correspondingly performing multi-stage decoding on a decoding path, generating first image characteristic information by using the intermediate characteristic information of the minimum scale, correspondingly decoding image characteristic information of a higher scale by taking the intermediate characteristic information generated by the previous-stage image characteristic information and the same-stage coding thereof as reference, wherein the image characteristic information is used for representing the contour characteristic of information to be processed in the target area image in a mask mode; and the first fusion submodule is used for fusing all the image characteristic information, generating mask data of a target contour region in the target region image, and expanding the mask data to enable the mask data to become mask data of the picture to be repaired.
In an embodiment, the second training submodule includes: the second training set subunit is used for determining a related picture with information to be processed and acquiring mask data of a target contour region of the related picture as training set sample data of the image segmentation model; the third updating subunit is used for calling the data samples in the training set to carry out iterative training on the image segmentation model, calculating the model loss and carrying out inverse gradient propagation; and the second repeating subunit is used for repeating the third updating subunit until the model is trained to be converged.
In a further embodiment, the image inpainting module 1400 includes: the feature extraction submodule is used for feeding the picture to be repaired and the mask data thereof into the image repair model constructed based on the partial convolution U-Net architecture for local feature extraction to obtain an intermediate feature map; and the restoration submodule is used for carrying out multi-stage decoding on the intermediate characteristic graph to obtain an intermediate restoration image, and fusing the picture to be restored and the intermediate restoration image according to the mask data of the picture to be restored to generate a restoration image.
In an embodiment, the third training submodule includes: the third training set subunit is used for determining relevant pictures which do not need to be repaired and mask data obtained after the pictures are randomly erased to form a data sample of the training set of the image repairing model; the initialization subunit is used for constructing and generating a countermeasure network for training, wherein the image restoration model is used as a generator, the discriminant model trained to be convergent is used as a discriminant, and generator parameters are initialized; the fourth updating subunit is used for freezing the discriminator, calling a data sample of a training set to calculate a preset loss function in the generated countermeasure network, and updating the generator parameters in the reverse direction; and the third repeating subunit is used for repeating the fourth updating subunit until the generation of the confrontation network is converged, and the generator is the image restoration model trained to be converged.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. As shown in fig. 14, the internal structure of the computer device is schematically illustrated. The computer device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store control information sequences, and the computer readable instructions can enable the processor to realize an image elimination and restoration method when being executed by the processor. The processor of the computer device is used for providing calculation and control capability and supporting the operation of the whole computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, may cause the processor to perform the image erasure repair method of the present application. The network interface of the computer device is used for connecting and communicating with the terminal. Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute specific functions of each module and its sub-module in fig. 13, and the memory stores program codes and various data required for executing the modules or sub-modules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores program codes and data necessary for executing all modules/sub-modules in the image erasing and restoring apparatus of the present application, and the server can call the program codes and data of the server to execute the functions of all sub-modules.
The present application also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the image erasure repairing method of any of the embodiments of the present application.
The present application also provides a computer program product comprising computer programs/instructions which, when executed by one or more processors, implement the steps of the method as described in any of the embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments of the present application can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when the computer program is executed, the processes of the embodiments of the methods can be included. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
In summary, by adopting multiple network models trained to convergence, the present application fully automates the complete work of locating the target area image, erasing the target contour area and restoring it. This efficient image erasing and restoration technology can greatly reduce the time and labor cost of e-commerce users and lets them iterate advertisement information quickly to win more sales share; at the same time, the restoration quality of commodity pictures is effectively improved, and the repaired target contour area blends better with the other areas, increasing the appeal and competitiveness of the commodity to customers.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application. It should be noted that a person skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An image erasure restoration method, comprising the steps of:
acquiring a picture to be restored and a target image label indicating a target area image in the picture to be restored;
performing target detection on the picture to be restored by adopting a target detection model trained to be convergent so as to obtain a boundary frame which is matched with the target image label and indicates the position of the target area image, and intercepting the target area image from the picture to be restored according to the boundary frame;
extracting mask data of a target contour region in the target region image by adopting an image segmentation model trained to be convergent, and expanding the mask data to obtain the mask data of the picture to be repaired;
and adopting an image restoration model trained to be convergent to carry out image restoration according to the image to be restored and the mask data thereof, and obtaining a restored image.
2. The image erasure restoration method according to claim 1, wherein obtaining the picture to be restored and a target image label indicating a target area image in the picture to be restored comprises the following steps:
responding to an advertisement publishing request triggered by a user, and acquiring a picture to be repaired in advertisement publishing information correspondingly submitted by the user;
and responding to an image restoration request triggered by a user, and acquiring a target image label indicating a target area image in the picture to be restored.
3. The image erasure restoration method according to claim 1, wherein performing target detection on the picture to be restored by using a target detection model trained to be convergent to obtain a bounding box which matches the target image label and indicates the position of the target area image, and intercepting the target area image from the picture to be restored according to the bounding box, comprises the following steps:
extracting a plurality of feature maps with different scales aiming at the picture to be restored by adopting a backbone network block in the target detection model;
performing strong semantic information and strong positioning information fusion on the feature maps with different scales by adopting a neck block in the target detection model to obtain a plurality of prediction feature maps with different scales;
predicting corresponding bounding boxes and label data thereof aiming at the prediction feature maps with different scales by adopting prediction blocks in the target detection model;
and determining the boundary box matched with the target image label according to the predicted label data, and intercepting the target area image indicated by the boundary box from the picture to be repaired.
4. The image erasure restoration method according to claim 3, wherein the training process of the target detection model comprises the following steps:
determining a related picture with information to be processed and boundary frame label data and label data thereof as a training sample in a training set of the target detection model;
performing image data enhancement processing on pictures in training samples in the training set;
determining the optimal anchor frame parameter of each training according to part of training samples, and determining the boundary frame and the label of the picture of the training sample in each training on the basis of the optimal anchor frame parameter by adopting a target detection model;
calculating a loss value according to the boundary box and the label of each training sample and the boundary box labeled data and the label data of the training sample, and performing gradient updating on the target detection model according to the loss value;
and repeating the two updating steps until the target detection model is trained to be converged.
5. The image erasure restoration method according to claim 1, wherein extracting mask data of a target contour region in the target area image by using an image segmentation model trained to be convergent, and expanding the mask data to become mask data of the picture to be restored, comprises the following steps:
performing multi-level coding on the target area image, and correspondingly generating intermediate characteristic information of a corresponding scale, wherein the intermediate characteristic information is characteristic representation of information to be processed in the target area image;
correspondingly performing multi-stage decoding on a decoding path, generating first image characteristic information by using the intermediate characteristic information of the minimum scale, and correspondingly decoding image characteristic information of a higher scale by taking the intermediate characteristic information generated by the previous-stage image characteristic information and the same-stage code thereof as reference, wherein the image characteristic information is used for representing the contour characteristic of information to be processed in the target area image in a mask mode;
and fusing all the image characteristic information to generate mask data of a target contour region in the target region image, and expanding the mask data to obtain the mask data of the picture to be repaired.
6. The image erasure restoration method according to claim 1, wherein performing image restoration according to the picture to be restored and the mask data thereof by using an image restoration model trained to be convergent to obtain a restored image comprises the following steps:
feeding the picture to be repaired and the mask data thereof into the image repairing model constructed based on the partial convolution U-Net architecture for local feature extraction to obtain an intermediate feature map;
and performing multi-stage decoding on the intermediate characteristic graph to obtain an intermediate repair image, and fusing the picture to be repaired and the intermediate repair image according to the mask data of the picture to be repaired to generate a repair image.
7. The image erasure restoration method according to claim 1, wherein the training process of the image restoration model comprises the following steps:
determining relevant pictures which do not need to be repaired and forming data samples of a training set of the image repairing model by mask data obtained after random erasing of the pictures;
constructing a generation countermeasure network for training, wherein the image restoration model is used as a generator, a discriminant model trained to be convergent is used as a discriminant, and generator parameters are initialized;
freezing the discriminator, calling a data sample of a training set to calculate a preset loss function in the generated countermeasure network, and reversely updating the generator parameters;
and repeating the updating steps until the generation of the confrontation network is converged, wherein the generator is the image restoration model trained to be converged.
8. A computer device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke execution of a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implemented according to the method of any one of claims 1 to 7, which, when invoked by a computer, performs the steps comprised by the corresponding method.
10. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method as claimed in any one of claims 1 to 7.
CN202111415434.4A 2021-11-25 2021-11-25 Image erasing and repairing method and device, equipment, medium and product thereof Pending CN114049280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111415434.4A CN114049280A (en) 2021-11-25 2021-11-25 Image erasing and repairing method and device, equipment, medium and product thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111415434.4A CN114049280A (en) 2021-11-25 2021-11-25 Image erasing and repairing method and device, equipment, medium and product thereof

Publications (1)

Publication Number Publication Date
CN114049280A true CN114049280A (en) 2022-02-15

Family

ID=80211022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111415434.4A Pending CN114049280A (en) 2021-11-25 2021-11-25 Image erasing and repairing method and device, equipment, medium and product thereof

Country Status (1)

Country Link
CN (1) CN114049280A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023202570A1 (en) * 2022-04-21 2023-10-26 维沃移动通信有限公司 Image processing method and processing apparatus, electronic device and readable storage medium
CN114549369A (en) * 2022-04-24 2022-05-27 腾讯科技(深圳)有限公司 Data restoration method and device, computer and readable storage medium
WO2023207778A1 (en) * 2022-04-24 2023-11-02 腾讯科技(深圳)有限公司 Data recovery method and device, computer, and storage medium
CN114758136A (en) * 2022-06-13 2022-07-15 深圳比特微电子科技有限公司 Target removal model establishing method and device and readable storage medium
CN115272310A (en) * 2022-09-26 2022-11-01 江苏智云天工科技有限公司 Method and device for detecting defects of workpiece
CN116977624A (en) * 2023-07-06 2023-10-31 无锡学院 Target identification method, system, electronic equipment and medium based on YOLOv7 model
CN117392015A (en) * 2023-10-16 2024-01-12 深圳牛学长科技有限公司 Image restoration method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114049280A (en) Image erasing and repairing method and device, equipment, medium and product thereof
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
US10339421B2 (en) RGB-D scene labeling with multimodal recurrent neural networks
CN110197182A (en) Remote sensing image semantic segmentation method based on contextual information and attention mechanism
Yang et al. Single image haze removal via region detection network
CN111832570A (en) Image semantic segmentation model training method and system
Singh et al. Single image dehazing for a variety of haze scenarios using back projected pyramid network
US20230281763A1 (en) Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
CN112771578B (en) Image generation using subdivision scaling and depth scaling
KR20200087808A (en) Method and apparatus for partitioning instances, electronic devices, programs and media
CN112184585B (en) Image completion method and system based on semantic edge fusion
CN112906794A (en) Target detection method, device, storage medium and terminal
CN116645592B (en) Crack detection method based on image processing and storage medium
CN111985374A (en) Face positioning method and device, electronic equipment and storage medium
CN112070040A (en) Text line detection method for video subtitles
CN113378897A (en) Neural network-based remote sensing image classification method, computing device and storage medium
CN114332586A (en) Small target detection method and device, equipment, medium and product thereof
CN111614974B (en) Video image restoration method and system
CN114549369A (en) Data restoration method and device, computer and readable storage medium
CN114387289B (en) Semantic segmentation method and device for three-dimensional point cloud of power transmission and distribution overhead line
CN116012395A (en) Multi-scale fusion smoke segmentation method based on depth separable convolution
CN115099854A (en) Method for creating advertisement file, device, equipment, medium and product thereof
CN114283281A (en) Target detection method and device, equipment, medium and product thereof
CN112801911A (en) Method and device for removing Chinese character noise in natural image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination