CN117253233B - Character erasing method, device and equipment

Character erasing method, device and equipment

Info

Publication number
CN117253233B
CN117253233B (application CN202311142364.9A)
Authority
CN
China
Prior art keywords
character
characters
target
image
images
Prior art date
Legal status
Active
Application number
CN202311142364.9A
Other languages
Chinese (zh)
Other versions
CN117253233A (en)
Inventor
高红超
Current Assignee
Guangdong OPT Machine Vision Co Ltd
Original Assignee
Guangdong OPT Machine Vision Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong OPT Machine Vision Co Ltd filed Critical Guangdong OPT Machine Vision Co Ltd
Priority to CN202311142364.9A priority Critical patent/CN117253233B/en
Publication of CN117253233A publication Critical patent/CN117253233A/en
Application granted granted Critical
Publication of CN117253233B publication Critical patent/CN117253233B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/147 Determination of region of interest
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a character erasing method, a device and equipment. In the embodiment of the application, a target workpiece image of characters to be erased can be acquired; the target workpiece image is input into a character positioning model, which outputs description information of the character area of the target workpiece image, wherein the character positioning model is obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas, and the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image; a character mask area of the target workpiece image is determined based on the description information of the character area of the target workpiece image; and a character erasing operation is performed on the character mask area of the target workpiece image through a diffusion model.

Description

Character erasing method, device and equipment
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method, an apparatus, and a device for erasing characters.
Background
In order to train a usable character recognition model, the training sample data needs to be augmented. One step of this data augmentation is to remove the characters from the images in the training samples that contain them, so that more characters of the same style can be synthesized onto the character-free images and a larger data set can be generated for training the character recognition model. Moreover, during the training of the character recognition model, private company information appearing on the workpiece surface, such as serial numbers, materials and dates, may also be leaked in the form of videos or images. To avoid this problem, erasing the characters from the videos or images of these workpieces is also a necessary operation.
However, existing character erasing algorithms based on deep generative adversarial networks (GANs) have a poor erasing effect and low generality, and cannot meet the industrial requirements of character erasure.
Disclosure of Invention
The application provides a character positioning model training method and a character erasing method, device and equipment, which are used to solve the problems of the poor erasing effect and low generality of existing character erasing algorithms based on deep generative GAN networks.
The embodiment of the application provides a character erasing method, which comprises the following steps: acquiring a target workpiece image of characters to be erased; inputting the target workpiece image into a character positioning model, which outputs description information of the character area of the target workpiece image, wherein the character positioning model is obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas, and the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image; determining a character mask area of the target workpiece image based on the description information of the character area of the target workpiece image; and performing a character erasing operation on the character mask area of the target workpiece image through a diffusion model.
The embodiment of the application also provides a character erasing device, which comprises: an image acquisition module for acquiring a target workpiece image of characters to be erased; a character positioning module for inputting the target workpiece image into a character positioning model, which outputs description information of the character area of the target workpiece image, wherein the character positioning model is obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas, and the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image; an area determining module for determining a character mask area of the target workpiece image based on the description information of the character area of the target workpiece image; and a character erasing module for performing a character erasing operation on the character mask area of the target workpiece image through the diffusion model.
The embodiment of the application also provides an electronic device, which comprises: a memory and a processor; the memory is used for storing a computer program; the processor is coupled to the memory and is used for executing the computer program to implement the steps of the character erasing method.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the character erasing method.
According to the character erasing method provided by the embodiment of the application, a target workpiece image of characters to be erased can be acquired, and the character areas in the target workpiece image are located by a character positioning model obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas, so as to determine the description information of the character areas in the target workpiece image; the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image. A character mask area of the target workpiece image is then determined based on the description information of the character areas, and finally a character erasing operation is performed on the character mask area of the target workpiece image through a diffusion model used for erasing characters, so that the characters in the target workpiece image can be accurately erased. The character positioning model only needs to detect the position of the character areas and does not need to recognize the character category; combined with the generality of the diffusion model used for erasing characters, character erasure can be accurately realized for various workpiece images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a flowchart of a character erasing method according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of a training method of a character positioning model according to an exemplary embodiment of the present application;
Fig. 3 is a schematic diagram of character annotation information in an image containing characters according to an exemplary embodiment of the present application;
Fig. 4 is a schematic diagram of a character erasing method according to an exemplary embodiment of the present application applied to a practical scene;
Fig. 5 is a schematic diagram of an image of characters to be erased in the character erasing method according to an exemplary embodiment of the present application;
Fig. 6 is a schematic diagram of a character mask area corresponding to an image of characters to be erased in the character erasing method according to an exemplary embodiment of the present application;
Fig. 7 is a schematic diagram of an image after erasing characters in the character erasing method according to an exemplary embodiment of the present application;
Fig. 8 is a schematic diagram of a character erasing device according to an exemplary embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to solve the problems of the poor erasing effect and low generality of existing character erasing algorithms based on deep generative GAN networks, some embodiments of the application provide a character erasing method.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a character erasing method according to an exemplary embodiment of the present application. As shown in fig. 1, the method includes:
step 110, a target workpiece image of the character to be erased is acquired.
There may be a plurality of target workpiece images. A target workpiece image may involve private information of a company or related technology, or it may be used to expand sample data: the characters in the target workpiece image are removed so that it can serve as a base image, different combinations of characters are added onto that image, and the one or more newly generated images containing characters are used as training sample data for a deep learning model in the industrial scene where the target workpiece image was captured.
Step 120, inputting the target workpiece image into the character positioning model, which outputs the description information of the character area of the target workpiece image, wherein the description information of the character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image.
The character positioning model is obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas.
Fig. 2 is a flowchart of a training method of a character positioning model according to an exemplary embodiment of the present application. As shown in fig. 2, the training process of the character positioning model includes:
step 210, acquiring a character positioning model training data set, wherein the character positioning model training data set consists of an image containing characters and an image not containing characters, and the image containing the characters also carries character labeling information, and the character labeling information comprises the coordinates of the central point, the width, the height and the rotation angle of the characters.
Fig. 3 is a schematic diagram of character labeling information in an image including characters according to an exemplary embodiment of the present application. The character labeling information shown in fig. 3 is labeling information for the character "L", and includes the center point coordinates (x, y) of the rectangular frame of the character "L", the width w, the height h, and the rotation angle θ, which is the offset angle of the rectangular frame of the character "L" with respect to the horizontal axis. The rectangular frame of the character "L" is the smallest circumscribed rectangular frame that can cover the character "L".
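As an illustration of this annotation format, the following minimal Python sketch shows one way such a rotated-box label could be represented and expanded into corner points; the class name, field names and the example values are assumptions for illustration and are not part of the application.

```python
from dataclasses import dataclass
import math

@dataclass
class CharBox:
    """Rotated bounding box of one character: center point, size, rotation (assumed layout)."""
    x: float      # center-point x coordinate (pixels)
    y: float      # center-point y coordinate (pixels)
    w: float      # width of the minimum enclosing rectangle
    h: float      # height of the minimum enclosing rectangle
    theta: float  # rotation angle relative to the horizontal axis (radians)

    def corners(self):
        """Return the four corner points of the rotated rectangle."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        pts = []
        for dx, dy in ((-self.w / 2, -self.h / 2), (self.w / 2, -self.h / 2),
                       (self.w / 2, self.h / 2), (-self.w / 2, self.h / 2)):
            pts.append((self.x + dx * c - dy * s, self.y + dx * s + dy * c))
        return pts

# Example label for a character like the "L" in Fig. 3 (values are illustrative only)
label = CharBox(x=120.0, y=85.0, w=32.0, h=48.0, theta=math.radians(15.0))
print(label.corners())
```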
The images containing characters may be obtained from a public dataset and from an annotated dataset in the industrial scene. The public dataset may include SynthText; for this dataset, the rectangular box of each character in each image can be extracted, and the offset angle of each rectangular box with respect to the horizontal axis, i.e. the rotation angle described below, is determined. Specifically, acquiring the character positioning model training dataset includes:
acquiring a public character data set, wherein the public character data set comprises a plurality of images containing characters;
Performing data preprocessing on the public character data set, wherein the data preprocessing comprises the following steps: extracting rectangular frames containing characters from the plurality of images respectively, and marking the categories of the plurality of images as specified categories;
generating an industrial character dataset based on a plurality of images containing characters and a plurality of images not containing characters in the industrial scene;
a character positioning model training dataset is generated based on the public character dataset and the industrial character dataset.
Marking the categories of the plurality of images in the public data set as a specified category means marking them all as the same category, so that the character positioning model trained on this sample data does not need to pay attention to the character category.
It should be appreciated that the number of images containing characters in an industrial scenario is generally small and often cannot meet the sample-size requirements for training a deep learning model. Based on this, embodiments of the present application may obtain images containing characters from public datasets such as SynthText as part of the sample data for training the character positioning model. The character positioning model in the embodiment of the application is also applied to character detection and positioning in an industrial scene, and in order for the trained character positioning model to accurately detect characters in the industrial scene, the sample data for training the character positioning model also needs to include workpiece images containing characters from the industrial scene.
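The preprocessing of a public dataset described above can be sketched as follows. The polygon input format and the helper name are assumptions made for illustration; cv2.minAreaRect is used here because it returns the center, size and angle of the minimum-area rotated rectangle enclosing a point set, which matches the (x, y, w, h, θ) description used in this application.

```python
import numpy as np
import cv2

SPECIFIED_CLASS_ID = 0  # every character is mapped to the same single class

def polygon_to_rotated_label(polygon_xy):
    """Convert one character polygon (N x 2 points) into (class, x, y, w, h, theta)."""
    pts = np.asarray(polygon_xy, dtype=np.float32)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    return (SPECIFIED_CLASS_ID, cx, cy, w, h, np.deg2rad(angle_deg))

# Illustrative word polygon from a SynthText-style annotation (values assumed)
poly = [(10, 20), (90, 25), (88, 60), (8, 55)]
print(polygon_to_rotated_label(poly))
```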
In some exemplary embodiments, since the number of images containing characters in an industrial scene is typically small, to expand this portion of sample data, embodiments of the present application may regenerate a portion of the images containing characters based on the images containing characters and the images not containing characters in the industrial scene to enrich the industrial character dataset. Specifically, generating an industrial character dataset based on a plurality of images containing characters and a plurality of images not containing characters in an industrial scene, comprising:
Acquiring character labeling information of a plurality of images containing characters, and cutting the plurality of images containing the characters based on the character labeling information to obtain a plurality of character images;
generating a plurality of character images similar to the character styles of the plurality of character images through a style conversion network so as to expand the number of the plurality of character images;
Randomly selecting a target image from a plurality of images not containing characters, and randomly determining a specified number of character images from the plurality of character images after expansion, the specified number being randomly determined in a range from 1 to the number of character images of the plurality of character images after expansion;
Performing a geometric image transformation operation on a specified number of character images, the geometric image transformation operation including at least one of scaling, blurring, noise adding, color conversion;
And pasting the specified number of character images after the geometric image transformation operation into the target image to generate an industrial character data set, wherein the character images in the target image are not overlapped.
The character labeling information of the plurality of images containing characters labels, for each image, the center point coordinates, width and height of the minimum rectangular box containing the characters, and the rotation angle of that rectangular box relative to the horizontal axis.
As one example, generating an industrial character dataset based on a plurality of images containing characters and a plurality of images not containing characters in an industrial scene may include the steps of:
S1, acquiring character labeling information of a plurality of images containing characters in an industrial scene, and cutting the plurality of images containing the characters based on the character labeling information to obtain a plurality of character images, wherein the number of the plurality of character images can be marked as N0.
S2, respectively generating a plurality of character images similar to the character styles of the N0 character images through a style conversion network so as to expand the number of the plurality of character images and finally obtain the N character images, wherein N is more than N0. Wherein the style conversion network is used for generating character images similar to the character style of the input character images. It will be appreciated that the character image generated by the style conversion network is similar to the input character image in overall style, but is not exactly the same, e.g., there may be some differences in character size, angle of rotation of the character, and font style.
S3, randomly selecting a target image from a plurality of images which do not contain characters as a background image, randomly determining a designated number x from the range from 1 to the number of character images of the expanded plurality of character images, and randomly selecting x character images from the expanded plurality of character images.
S4, performing geometrical image transformation operation on the x character images respectively, wherein the geometrical image transformation operation comprises at least one operation of zooming, blurring, noise adding and color conversion.
And S5, pasting the x character images subjected to the geometric image transformation operation into a target image to obtain a newly generated image containing the characters, wherein the character images in the target image are not overlapped.
Specifically, for the i-th character image of the x character images, where i ∈ [1, x], the following operations are performed:
S51, determining a pre-pasting area of the ith character image in the target image.
S52, calculating the overlap between the pre-pasting area of the i-th character image in the target image and the character images already pasted into the target image; if the overlap value is less than 0.1, indicating that the pre-pasting area of the i-th character image overlaps little or not at all with the character images already pasted into the target image, the i-th character image is pasted at the pre-pasting position in the target image.
If the overlap value is greater than 0.1, return to S51. It should be understood that 0.1 is only an example of the set threshold and does not limit the threshold set for the overlap.
S6, repeating the steps S3-S5 to obtain a plurality of newly generated images containing the characters, and generating an industrial character data set based on the plurality of newly generated images containing the characters.
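A condensed sketch of steps S1 to S6 is given below; the style-conversion network of S2 is omitted, the overlap measure and retry limit are assumptions, and only scaling is shown among the geometric image transformations.

```python
import random
import numpy as np
import cv2

def overlap_ratio(box_a, box_b):
    """Intersection area of two axis-aligned boxes divided by the smaller box area (assumed measure)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return (ix * iy) / max(1.0, min(aw * ah, bw * bh))

def synthesize_image(background, char_images, max_tries=20, thresh=0.1):
    """Paste a random number of character crops onto a character-free background image."""
    canvas = background.copy()
    h, w = canvas.shape[:2]
    x = random.randint(1, len(char_images))            # S3: random number of characters to paste
    placed = []                                         # boxes already pasted into the target image
    for crop in random.sample(char_images, x):
        scale = random.uniform(0.5, 1.5)                # S4: geometric transform (scaling shown;
        crop = cv2.resize(crop, None, fx=scale, fy=scale)  # blur, noise, colour conversion analogous)
        ch, cw = crop.shape[:2]
        if cw >= w or ch >= h:
            continue
        for _ in range(max_tries):                      # S51/S52: retry until the overlap is small
            px, py = random.randint(0, w - cw), random.randint(0, h - ch)
            box = (px, py, cw, ch)
            if all(overlap_ratio(box, b) < thresh for b in placed):
                canvas[py:py + ch, px:px + cw] = crop   # S5: paste without overlap
                placed.append(box)
                break
    return canvas, placed
```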
Step 220, training the character positioning model based on the character positioning model training data set and the loss function of the character positioning model; the character positioning model is constructed based on the object detection network YOLOv5s.
The character positioning model comprises a feature extraction module, a feature fusion module and a target positioning module. The number of convolution kernels in each layer of the feature extraction module of the character positioning model is smaller than that in the feature extraction module of the object detection network YOLOv5s, and the number of convolution layers in the feature extraction module of the character positioning model is likewise smaller than that in the feature extraction module of YOLOv5s. The description of a predicted target box in the target positioning module of the character positioning model comprises the center point coordinates, width, height and rotation angle of the target box.
The number of convolution kernels in each layer of the feature extraction module of the character positioning model being smaller than that in the feature extraction module of the object detection network YOLOv5s specifically means that the number of convolution kernels in each layer can be reduced by a certain proportion; for example, it can be reduced to one half of the number of convolution kernels in the feature extraction module of YOLOv5s, so that the parameters of the model are reduced, the amount of computation during model training is reduced, and the training of the model is simplified. The number of convolution layers in the feature extraction module of the character positioning model being smaller than that in the feature extraction module of YOLOv5s specifically means that convolution layers in the feature extraction module of the character positioning model can be merged according to the attributes of the features.
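One possible way to realise this reduction, assuming the standard YOLOv5 practice of scaling channel counts with a width multiplier and layer repeats with a depth multiplier, is sketched below; the scaling rule and the concrete factors are illustrative assumptions, since the application only states that the kernel counts are reduced in a certain proportion and convolution layers are merged.

```python
# Assumed YOLOv5s-style compound scaling factors (not prescribed by the application).
yolov5s_cfg = {
    "depth_multiple": 0.33,   # layer-repeat factor of the original YOLOv5s
    "width_multiple": 0.50,   # channel factor of the original YOLOv5s
}

char_det_cfg = {
    **yolov5s_cfg,
    "width_multiple": yolov5s_cfg["width_multiple"] / 2,   # halve every layer's channel count
    "depth_multiple": yolov5s_cfg["depth_multiple"] / 2,   # fewer convolution layers per stage
}

def scaled_channels(base_channels, cfg):
    """Channel count instantiated for a layer, rounded to a multiple of 8 (assumed rounding rule)."""
    return max(8, int(round(base_channels * cfg["width_multiple"] / 8)) * 8)

for c in (64, 128, 256, 512, 1024):
    print(c, "->", scaled_channels(c, yolov5s_cfg), "->", scaled_channels(c, char_det_cfg))
```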
In some exemplary embodiments, training the character positioning model based on the character positioning model training data set and a loss function of the character positioning model, comprises:
Inputting a plurality of images in a training data set of the character positioning model into the character positioning model to determine center point loss, width and height loss of the characters, confidence coefficient loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
Determining a value of a loss function of the character positioning model based on center point loss of characters, width and height loss of the characters, confidence loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
and optimizing model parameters of the character positioning model based on the value of the loss function of the character positioning model, and training to obtain the character positioning model.
The character positioning model has three detection heads; each detection head of the target positioning module of the character positioning model can contain 3 to 4 anchor boxes of different scales, and the output dimension of the target positioning module of the character positioning model is 4 × (x, y, w, h, θ, conf) = 24. Here x, y, w, h and θ are the center point coordinates (x, y), width w, height h and angle θ of an anchor box, and conf is the confidence of whether characters exist, that is, the probability that characters are present in the anchor box. The loss function of the character positioning model is calculated as Loss = λ_coor·Σ loss(xy) + λ_coor·Σ loss(wh) + λ_conf·Σ loss(conf) + λ_angle·Σ loss(θ), where loss(xy) represents the center point loss of the characters predicted by the character positioning model, loss(wh) represents their width and height loss, loss(conf) represents the confidence loss of whether characters exist, and loss(θ) represents the loss of the rotation angle of the characters predicted by the character positioning model. The optimizer of the character positioning model may be the Adam optimization algorithm, with momentum 0.937 and an initial learning rate of 0.01; the training data scale is 500,000 images, and training stops after 90 epochs.
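A schematic implementation of this weighted loss is shown below; the choice of the individual loss terms (MSE, binary cross-entropy, smooth-L1) and the λ values are assumptions for illustration, since the application only specifies the overall weighted sum.

```python
import torch
import torch.nn.functional as F

def rotated_detection_loss(pred, target, obj_mask,
                           lam_coor=0.05, lam_conf=1.0, lam_angle=0.1):
    """pred/target: (..., 6) tensors ordered (x, y, w, h, theta, conf);
    obj_mask marks anchors responsible for a ground-truth character."""
    loss_xy = F.mse_loss(pred[..., 0:2][obj_mask], target[..., 0:2][obj_mask])
    loss_wh = F.mse_loss(pred[..., 2:4][obj_mask], target[..., 2:4][obj_mask])
    loss_conf = F.binary_cross_entropy_with_logits(pred[..., 5], target[..., 5])
    loss_angle = F.smooth_l1_loss(pred[..., 4][obj_mask], target[..., 4][obj_mask])
    return (lam_coor * loss_xy + lam_coor * loss_wh
            + lam_conf * loss_conf + lam_angle * loss_angle)

# Optimizer settings described above (Adam, momentum 0.937, initial lr 0.01);
# mapping the momentum to Adam's beta1 is an assumption.
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01, betas=(0.937, 0.999))
```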
The training method for the character positioning model provided by the embodiment of the application acquires a character positioning model training data set, which consists of images containing characters and images not containing characters, where the images containing characters also carry character labeling information comprising the center point coordinates, width, height and rotation angle of the characters, and trains the character positioning model based on this training data set and the loss function of the character positioning model. The character positioning model is constructed based on the object detection network YOLOv5s and comprises a feature extraction module, a feature fusion module and a target positioning module. Since the number of convolution kernels in each layer of the feature extraction module of the character positioning model is smaller than that in the feature extraction module of YOLOv5s, and the number of convolution layers in the feature extraction module of the character positioning model is smaller than that in the feature extraction module of YOLOv5s, the number of parameters of the trained character positioning model is reduced compared with YOLOv5s, the amount of computation of the model is correspondingly reduced, and the complexity of the trained character positioning model is reduced. Moreover, the description of a predicted target box in the target positioning module of the character positioning model adds the rotation angle of the characters in addition to the center point coordinates, width and height of the target box, which enables the detection of characters with a rotation angle.
It should be understood that, after the target workpiece image is input to the character positioning model, the character positioning model may detect all the characters included in the target workpiece image, that is, the description information of the character area of the target workpiece image output by the character positioning model includes the description information of the character area of at least one character, that is, all the characters in the target workpiece image. The character area of each character is a circumscribed rectangular frame containing each character.
Step 130, determining a character mask area of the target workpiece image based on the description information of the character area of the target workpiece image.
The character mask area of the target workpiece image is determined based on the description information of the character area of the target workpiece image. Specifically, the area to be masked in the target workpiece image can be determined based on the description information of the character area, and a masking operation is performed on the area to be masked through a mask network to determine the character mask area of the target workpiece image.
In some exemplary embodiments, when the number of characters in the target workpiece image is small, directly erasing the rectangular box areas containing the characters has little influence on the background, because the characters occupy only a small proportion of the target workpiece image. Based on this, in order to improve the efficiency of masking the character areas, the character boxes of the characters in the target workpiece image can be used directly as the character mask area of the target workpiece image. Specifically, determining the character mask area of the target workpiece image based on the description information of the character area of the target workpiece image includes:
determining the number of characters in the target workpiece image based on the description information of the character area of the target workpiece image;
And if the number of the characters in the target workpiece image is smaller than the preset number, taking the character frame of the characters in the target workpiece image as a character mask area of the target workpiece image.
In some exemplary embodiments, when the number of characters in the target workpiece image is larger, in order to preserve more background information in the target workpiece image, the masking operation may be performed only on the character portions to determine the character mask area of the target workpiece image. Specifically, if the number of characters in the target workpiece image is greater than or equal to the preset number, the characters in the target workpiece image are clustered over specified dimensions of the characters to obtain a plurality of clusters, wherein the specified dimensions comprise the center point coordinates, width, height and angle of the characters;
determining the connected domain of the target characters contained in each of the plurality of clusters, wherein a target character is any character in the target workpiece image;
Merging the connected domains of the target characters to obtain mask areas of the target characters;
a character mask area of the target workpiece image is determined based on the mask area of the target character.
As an example, in the case where the number of characters in the target workpiece image is greater than or equal to the preset number, the following operations may be performed on the respective characters in the target workpiece image:
1) Cluster the characters using a density peak clustering algorithm, where the features are the five values x, y, w, h and θ, obtaining M clusters; here x and y are the center point coordinates of a character, w and h are its width and height, and θ is its rotation angle.
2) Compute the connected domain of the characters contained in each cluster.
3) Merge all connected domains to obtain the character mask area corresponding to the target workpiece image.
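A sketch of this mask-generation procedure is given below. DBSCAN is used here as a stand-in for the density peak clustering algorithm named above, and rotated-box rasterisation plus dilation stands in for the connected-domain computation; the parameter values and the assumption that θ is in radians are illustrative.

```python
import numpy as np
import cv2
from sklearn.cluster import DBSCAN

def build_char_mask(image_shape, char_boxes, eps=50.0, min_samples=1):
    """char_boxes: list of (x, y, w, h, theta) tuples from the character positioning model."""
    feats = np.array(char_boxes, dtype=np.float32)          # cluster over (x, y, w, h, theta)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

    mask = np.zeros(image_shape[:2], dtype=np.uint8)
    for cluster_id in np.unique(labels):
        cluster_mask = np.zeros_like(mask)
        for (x, y, w, h, theta) in feats[labels == cluster_id]:
            # draw the rotated character box of every character in this cluster
            rect = ((float(x), float(y)), (float(w), float(h)), float(np.degrees(theta)))
            pts = cv2.boxPoints(rect).astype(np.int32)
            cv2.fillPoly(cluster_mask, [pts], 255)
        # dilate so nearby characters of one cluster form a single connected domain
        cluster_mask = cv2.dilate(cluster_mask, np.ones((5, 5), np.uint8))
        mask = cv2.bitwise_or(mask, cluster_mask)            # merge all connected domains
    return mask
```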
And 140, performing character erasing operation on the character mask area of the target workpiece image through the diffusion model.
In some exemplary embodiments, performing a character erasure operation on a character mask region of a target workpiece image by a diffusion model includes:
and performing character erasing operation on the character frame of the target workpiece image through the diffusion model.
The diffusion model adopted in this scheme (Stable Diffusion) performs the diffusion and encoding process in the latent representation space (latent space), which greatly reduces the computational complexity and, compared with a generative adversarial network, achieves higher-quality image generation. The selected diffusion model has already been trained on a large amount of data and does not require retraining. The network of the diffusion model comprises a pre-trained autoencoder that extracts a latent feature space representation of the image whose characters are to be erased; the information of the character area to be erased is then added into the latent feature space through a cross-attention mechanism, and conditioned on this, the decoder in the network of the diffusion model decodes back to the original pixel space, thereby erasing the character area while preserving the information of other areas unrelated to the characters.
The input of the diffusion model is a target workpiece image and a character mask area corresponding to the target workpiece image, and the output of the diffusion model is a target workpiece image with characters erased. Fig. 4 is a schematic diagram of a character erasing method according to an exemplary embodiment of the present application applied to a practical scenario. As shown in fig. 4, the process of the character erasing method may include: s11, inputting the target workpiece Image into a character positioning model char_det to determine a character area in the target workpiece Image; s12, performing masking operation on the character areas in the target workpiece Image through a masking network mask_Gen to determine character masking areas in the target workpiece Image; s13, performing character erasing operation on the character mask area in the target workpiece Image through the diffusion model.
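The application does not name a particular diffusion-model implementation; as one possible realisation, the sketch below uses the Stable Diffusion inpainting pipeline from the Hugging Face diffusers library, where the checkpoint name, file names and prompt are assumptions for illustration.

```python
from PIL import Image
import torch
from diffusers import StableDiffusionInpaintPipeline

# Checkpoint name, file names and prompt are illustrative assumptions, not part of the application.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

workpiece = Image.open("workpiece.png").convert("RGB")    # cf. Fig. 5: image with characters
char_mask = Image.open("char_mask.png").convert("L")      # cf. Fig. 6: white = area to erase

erased = pipe(
    prompt="clean metal workpiece surface, no text",      # condition towards a character-free background
    image=workpiece,
    mask_image=char_mask,
).images[0]
erased.save("workpiece_erased.png")                       # cf. Fig. 7: characters removed
```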
Fig. 5 is a schematic diagram of an image of characters to be erased in the character erasing method according to an exemplary embodiment of the present application. Fig. 6 is a schematic diagram of the character mask area corresponding to the image of characters to be erased in the character erasing method according to an exemplary embodiment of the present application. The character mask area shown in Fig. 6 is obtained by determining the character areas of the image of characters to be erased shown in Fig. 5 through the character positioning model and then performing the masking operation through the mask network. Fig. 7 is a schematic diagram of the image after the characters are erased in the character erasing method according to an exemplary embodiment of the present application. The character-erased image shown in Fig. 7 is obtained by inputting the image of characters to be erased shown in Fig. 5 and the character mask area image shown in Fig. 6 into the diffusion model, which performs the character erasing operation on the character areas in the image of Fig. 5 based on the character mask area image of Fig. 6.
According to the character erasing method provided by the embodiment of the application, a target workpiece image of characters to be erased can be acquired, and the character areas in the target workpiece image are located by the character positioning model obtained through the above training method, so as to determine the description information of the character areas in the target workpiece image; the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image. A character mask area of the target workpiece image is then determined based on the description information of the character areas, and finally a character erasing operation is performed on the character mask area of the target workpiece image through a diffusion model used for erasing characters, so that the characters in the target workpiece image can be accurately erased. The character positioning model only needs to detect the position of the character areas and does not need to recognize the character category; combined with the generality of the diffusion model used for erasing characters, character erasure can be accurately realized for various workpiece images.
In addition, the method provided by this embodiment can be applied to any application scenario requiring character erasure: only the character positioning model for locating the character areas needs to be trained. Once the training of the model is completed, subsequent character erasing operations only require performing the masking operation, in combination with the mask network, on the character areas located by the character positioning model to determine the character mask area of the image whose characters are to be erased, and finally inputting the image of the characters to be erased and the corresponding character mask area into the diffusion model to perform the character erasing operation. On the one hand, training the character positioning model only needs to pay attention to the character areas, and the description of the rotation angle of the character areas added during training can effectively improve the accuracy of character area positioning, while the reduced number of convolution layers and convolution kernels in the feature extraction module makes the trained character positioning model more lightweight. On the other hand, the diffusion model used for erasing characters does not need to be retrained for different industrial scenes; it has strong generality and can be applied to various scenes where characters on a workpiece surface need to be erased.
It should be noted that, the execution subjects of each step of the method provided in the above embodiment may be the same device, or the method may also be executed by different devices. For example, the execution subject of steps 110 to 130 may be device a; for another example, the execution subject of steps 110 to 120 may be device a, and the execution subject of step 130 may be device B; etc.
It should be further noted that the concept of the character erasing method provided by the embodiment of the application is not limited to character erasing scenarios: if the character detection is replaced by other target detection, such as more general object detection (vehicles, persons, cats, etc.), the model training and erasing method of the inventive concept is also applicable to such scenarios.
In addition, in some of the above embodiments and the flows described in the drawings, a plurality of operations appearing in a specific order are included, but it should be clearly understood that the operations may be performed out of the order in which they appear herein or in parallel, the sequence numbers of the operations such as 110, 120, 210, 220, etc. are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequential order, and the descriptions of "first" and "second" are not limited to different categories.
Fig. 8 is a schematic diagram of a character erasing device 800 according to an exemplary embodiment of the present application. As shown in fig. 8, the apparatus 800 includes: an image acquisition module 810, a character positioning module 820, a region determination module 830, and a character erasure module 840, wherein:
An image acquisition module 810 for acquiring a target workpiece image of a character to be erased;
the character positioning module 820 is configured to input the target workpiece image into a character positioning model to output description information of a character area for obtaining the target workpiece image, where the character positioning model is obtained based on labeling training of a plurality of images including characters and description information of corresponding character areas, and the description information of the character areas includes coordinates of a center point, width, height and angle of the characters in the target workpiece image;
a region determining module 830, configured to determine a character mask region of the target workpiece image based on description information of a character region of the target workpiece image;
and a character erasing module 840, configured to perform a character erasing operation on the character mask area of the target workpiece image through the diffusion model.
Optionally, the area determining module 830 is specifically configured to:
Determining the number of characters in the target workpiece image based on the description information of the character area of the target workpiece image;
and if the number of the characters in the target workpiece image is smaller than the preset number, taking the character frame of the characters in the target workpiece image as a character mask area of the target workpiece image.
Optionally, the character erasing module 840 is specifically configured to:
and performing character erasing operation on the character frame of the character of the target workpiece image through the diffusion model.
Optionally, the area determining module 830 is specifically configured to:
If the number of characters in the target workpiece image is greater than or equal to the preset number, clustering the characters in the target workpiece image over specified dimensions of the characters to obtain a plurality of clusters, wherein the specified dimensions comprise the center point coordinates, width, height and angle of the characters;
Determining a connected domain of a target character contained in each cluster in the plurality of clusters, wherein the target character is any character in the target workpiece image;
merging the connected domains of the target characters to obtain mask areas of the target characters;
A character mask area of the target workpiece image is determined based on the mask area of the target character.
Optionally, the apparatus further comprises:
The data acquisition module is used for acquiring a character positioning model training data set, wherein the character positioning model training data set consists of an image containing characters and an image not containing the characters, the image containing the characters also carries character marking information, and the character marking information comprises the coordinates of the central point, the width, the height and the rotation angle of the characters;
The model training module is used for training the character positioning model based on the character positioning model training data set and the loss function of the character positioning model; the character positioning model is constructed based on the object detection network YOLOv5s and comprises a feature extraction module, a feature fusion module and a target positioning module; the number of convolution kernels in each layer of the feature extraction module of the character positioning model is smaller than that in the feature extraction module of YOLOv5s, the number of convolution layers in the feature extraction module of the character positioning model is smaller than that in the feature extraction module of YOLOv5s, and the description of a predicted target box in the target positioning module of the character positioning model comprises the center point coordinates, width, height and rotation angle of the target box.
Optionally, the data acquisition module is specifically configured to:
Acquiring a public character data set, wherein the public character data set comprises a plurality of images containing characters;
Performing data preprocessing on the public character data set, wherein the data preprocessing comprises the following steps: extracting rectangular frames containing characters from a plurality of images respectively, and marking the categories of the images as specified categories;
generating an industrial character dataset based on a plurality of images containing characters and a plurality of images not containing characters in the industrial scene;
The character positioning model training dataset is generated based on the public character dataset and the industrial character dataset.
Optionally, the data acquisition module is specifically configured to:
Acquiring character labeling information of the plurality of images containing the characters, and cutting the plurality of images containing the characters based on the character labeling information to obtain a plurality of character images;
Generating a plurality of character images similar to the character styles of the plurality of character images through a style conversion network so as to expand the number of the plurality of character images;
randomly selecting a target image from the plurality of images not containing characters, and randomly determining a specified number of character images from the expanded plurality of character images, the specified number being randomly determined in a range from 1 to the number of character images of the expanded plurality of character images;
performing geometric image transformation operation on the specified number of character images, wherein the geometric image transformation operation comprises at least one operation of zooming, blurring, noise adding and color conversion;
And pasting the specified number of character images after the geometric image transformation operation into the target image to generate an industrial character data set, wherein the character images in the target image are not overlapped.
Optionally, the model training module is specifically configured to:
Inputting a plurality of images in the character positioning model training data set into the character positioning model to determine center point loss, width and height loss of characters, confidence coefficient loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
Determining a value of a loss function of the character positioning model based on center point loss of characters, width and height loss of the characters, confidence loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
And optimizing model parameters of the character positioning model based on the value of the loss function of the character positioning model, and training to obtain the character positioning model.
The character erasing device can implement the methods of the method embodiments of Figs. 1 to 7; for details, reference may be made to the character erasing method of the embodiments shown in Figs. 1 to 7, which will not be described again here.
The embodiment of the application also provides an electronic device, which comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate through the bus, and the machine-readable instructions are executed by the processor to perform the character erasing method. Specifically, Fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in Fig. 9, the device includes: a memory 91 and a processor 92.
Memory 91 is used to store computer programs and may be configured to store various other data to support operations on the computing device. Examples of such data include instructions for any application or method operating on a computing device, contact data, phonebook data, messages, images, video, and the like.
A processor 92, coupled to the memory 91, for executing the computer program in the memory 91 to: acquire a target workpiece image of characters to be erased; input the target workpiece image into a character positioning model, which outputs description information of the character area of the target workpiece image, wherein the character positioning model is obtained by training on a plurality of annotated images containing characters and the description information of the corresponding character areas, and the description information of a character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image; determine a character mask area of the target workpiece image based on the description information of the character area of the target workpiece image; and perform a character erasing operation on the character mask area of the target workpiece image through a diffusion model.
Further, as shown in Fig. 9, the electronic device further includes: a communication component 93, a display 94, a power supply component 95, an audio component 96, and other components. Only some of the components are schematically shown in Fig. 9, which does not mean that the electronic device only comprises the components shown in Fig. 9. In addition, the components within the dashed box in Fig. 9 are optional components rather than mandatory components, depending on the implementation form of the electronic device. For example, when the electronic device is implemented as a terminal device such as a smart phone, tablet computer or desktop computer, the components within the dashed box in Fig. 9 may be included; when the electronic device is implemented as a server-side device such as a conventional server, cloud server, data center or server array, the components within the dashed box in Fig. 9 may not be included.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps in the above-described character erasure method embodiments.
The communication component of Fig. 9 is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may further include a Near Field Communication (NFC) module, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and the like.
The memory of fig. 9 described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The display in fig. 9 described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
The power supply assembly shown in fig. 9 provides power to various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
The audio component of fig. 9 described above may be configured to output and/or input audio signals. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (8)

1. A character erasure method, comprising:
Acquiring a target workpiece image of a character to be erased;
Inputting the target workpiece image into a character positioning model to output and obtain description information of a character area of the target workpiece image, wherein the character positioning model is obtained based on labeling training of a plurality of images containing characters and the description information of the corresponding character area, and the description information of the character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image;
Determining the number of characters in the target workpiece image based on the description information of the character area of the target workpiece image;
If the number of characters in the target workpiece image is smaller than the preset number, taking the character frame in the target workpiece image as a character mask area of the target workpiece image;
If the number of the characters in the target workpiece image is greater than or equal to the preset number, clustering the characters in the target workpiece image from the specified dimensions of the characters to obtain a plurality of class clusters, wherein the specified dimensions comprise the center point coordinates, the width, the height and the angle of the characters;
Determining a connected domain of a target character contained in each cluster in the plurality of clusters, wherein the target character is any character in the target workpiece image;
merging the connected domains of the target character to obtain a mask area of the target character;
determining a character mask area of the target workpiece image based on the mask area of the target character;
And performing character erasing operation on the character mask area of the target workpiece image through a diffusion model.
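Purely as an illustrative sketch of the mask-determination steps recited in claim 1, and not the claimed implementation itself, the following Python code counts the located characters, falls back to the character frames when there are few of them, and otherwise clusters the characters on (center point coordinates, width, height, angle), binarizes the strokes inside each cluster region to obtain connected domains, and merges them into the final mask. The KMeans cluster count, the Otsu binarization, and the bright-character assumption are all assumptions of this sketch.

    import numpy as np
    import cv2
    from sklearn.cluster import KMeans

    def character_mask(image_gray, boxes, min_count=3, n_clusters=2):
        """Illustrative sketch of the claim-1 mask logic; thresholds and cluster count are assumed."""
        mask = np.zeros(image_gray.shape, dtype=np.uint8)

        def frame(box):  # rotated character frame -> integer corner points
            cx, cy, w, h, angle = box
            return cv2.boxPoints(((cx, cy), (w, h), angle)).astype(np.int32)

        if len(boxes) < min_count:
            # few characters: use the character frames directly as the character mask area
            for b in boxes:
                cv2.fillPoly(mask, [frame(b)], 255)
            return mask

        # many characters: cluster on the specified dimensions (cx, cy, w, h, angle)
        feats = np.asarray(boxes, dtype=np.float32)
        k = min(n_clusters, len(boxes))
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)

        for c in range(k):
            roi = np.zeros_like(mask)                       # area covered by this cluster's frames
            for b, lab in zip(boxes, labels):
                if lab == c:
                    cv2.fillPoly(roi, [frame(b)], 255)
            vals = image_gray[roi > 0]
            if vals.size == 0:
                continue
            # connected domain of the characters: Otsu-binarized strokes inside the cluster region
            thr, _ = cv2.threshold(vals.reshape(1, -1), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            strokes = (image_gray > thr).astype(np.uint8) * 255   # assumes characters brighter than background
            mask |= cv2.bitwise_and(strokes, roi)                 # merge the connected domains into the mask
        return mask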
2. The method of claim 1, wherein the training process of the character positioning model comprises:
acquiring a character positioning model training data set, wherein the character positioning model training data set consists of an image containing characters and an image not containing the characters, the image containing the characters also carries character marking information, and the character marking information comprises the coordinates of the central point, the width, the height and the rotation angle of the characters;
Training to obtain the character positioning model based on the character positioning model training data set and a loss function of the character positioning model; the character positioning model is constructed based on a target detection network YOLOV S and comprises a feature extraction module, a feature fusion module and a target positioning module; the number of convolution kernels in each layer of the feature extraction module of the character positioning model is smaller than the number of convolution kernels in the feature extraction module of the target detection network YOLOV S; the number of convolutional layers in the feature extraction module of the character positioning model is smaller than the number of convolutional layers in the feature extraction module of the target detection network YOLOV S; and the description of a predicted target frame in the target positioning module of the character positioning model comprises the center point coordinates, width, height and rotation angle of the target frame.
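For illustration only, the target-frame description recited in claim 2 (center point coordinates, width, height and rotation angle, plus a character confidence) could be produced by a prediction head of the following shape. This hypothetical PyTorch module is not the patented slimmed network; the channel count and single-scale layout are assumptions.

    import torch
    import torch.nn as nn

    class RotatedFramePredictionHead(nn.Module):
        """Hypothetical head that regresses, per feature-map location,
        the five target-frame parameters plus a character confidence."""
        def __init__(self, in_channels=64):
            super().__init__()
            self.pred = nn.Conv2d(in_channels, 6, kernel_size=1)   # cx, cy, w, h, angle, confidence

        def forward(self, fused_feature):            # (N, C, H, W) output of the feature fusion module
            out = self.pred(fused_feature)           # (N, 6, H, W)
            cx, cy = out[:, 0], out[:, 1]            # center point coordinates of the predicted frame
            w, h = out[:, 2], out[:, 3]              # width and height of the predicted frame
            angle = out[:, 4]                        # rotation angle of the predicted frame
            conf = out[:, 5].sigmoid()               # character confidence
            return cx, cy, w, h, angle, conf

Reducing the convolution kernel counts and the number of convolutional layers relative to the reference detector, as the claim describes, would apply to the feature extraction module feeding such a head; that slimming is not shown here.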
3. The method of claim 2, wherein the acquiring a character positioning model training dataset comprises:
Acquiring a public character data set, wherein the public character data set comprises a plurality of images containing characters;
Performing data preprocessing on the public character data set, wherein the data preprocessing comprises the following steps: extracting rectangular frames containing characters from a plurality of images respectively, and marking the categories of the images as specified categories;
generating an industrial character dataset based on a plurality of images containing characters and a plurality of images not containing characters in the industrial scene;
The character positioning model training dataset is generated based on the public character dataset and the industrial character dataset.
4. The method of claim 3, wherein generating the industrial character dataset based on the plurality of character-containing images and the plurality of non-character-containing images in the industrial scene comprises:
Acquiring character labeling information of the plurality of images containing the characters, and cutting the plurality of images containing the characters based on the labeling information to obtain a plurality of character images;
Generating a plurality of character images similar to the character styles of the plurality of character images through a style conversion network so as to expand the number of the plurality of character images;
randomly selecting a target image from the plurality of images not containing characters, and randomly determining a specified number of character images from the expanded plurality of character images, the specified number being randomly determined in a range from 1 to the number of character images of the expanded plurality of character images;
performing geometric image transformation operation on the specified number of character images, wherein the geometric image transformation operation comprises at least one operation of zooming, blurring, noise adding and color conversion;
And pasting the specified number of character images after the geometric image transformation operation into the target image to generate an industrial character data set, wherein the character images in the target image are not overlapped.
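As a non-authoritative sketch of the synthesis recited in claim 4, the following Python function jitters a randomly chosen set of character images (scaling, blurring, noise, a simple colour change) and pastes them onto a character-free target image without overlap. The jitter ranges, the axis-aligned pasting, and the returned annotation format are assumptions of this sketch, and the style conversion network is not reproduced.

    import random
    import numpy as np
    import cv2

    def synthesize_sample(background, char_crops, max_tries=50):
        """Sketch of a claim-4 style synthesis step; parameter ranges are illustrative assumptions.
        `background` is an image without characters; `char_crops` are cropped character images
        with the same number of channels as `background`."""
        canvas = background.copy()
        occupied = np.zeros(canvas.shape[:2], dtype=np.uint8)
        boxes = []
        n = random.randint(1, len(char_crops))                         # specified number, chosen at random
        for crop in random.sample(char_crops, n):
            s = random.uniform(0.7, 1.3)                               # scaling
            crop = cv2.resize(crop, None, fx=s, fy=s)
            if random.random() < 0.5:                                  # blurring
                crop = cv2.GaussianBlur(crop, (3, 3), 0)
            if random.random() < 0.5:                                  # simple colour conversion stand-in
                crop = np.clip(crop * random.uniform(0.8, 1.2), 0, 255).astype(np.uint8)
            crop = np.clip(crop.astype(np.int16)                       # noise adding
                           + np.random.normal(0, 5, crop.shape), 0, 255).astype(np.uint8)

            ch, cw = crop.shape[:2]
            H, W = canvas.shape[:2]
            if ch >= H or cw >= W:
                continue
            for _ in range(max_tries):                                 # look for a non-overlapping slot
                y, x = random.randint(0, H - ch), random.randint(0, W - cw)
                if occupied[y:y + ch, x:x + cw].max() == 0:
                    canvas[y:y + ch, x:x + cw] = crop
                    occupied[y:y + ch, x:x + cw] = 1
                    boxes.append((x + cw / 2, y + ch / 2, cw, ch, 0.0))  # cx, cy, w, h, angle
                    break
        return canvas, boxes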
5. The method of claim 2, wherein training a character positioning model based on the character positioning model training data set and a loss function of the character positioning model comprises:
Inputting a plurality of images in the character positioning model training data set into the character positioning model to determine center point loss, width and height loss of characters, confidence coefficient loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
Determining a value of a loss function of the character positioning model based on center point loss of characters, width and height loss of the characters, confidence loss of the characters and angle loss of the characters in each image predicted by the character positioning model;
And optimizing model parameters of the character positioning model based on the value of the loss function of the character positioning model, and training to obtain the character positioning model.
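The loss composition in claim 5 can be illustrated, under assumed loss choices, as a simple weighted sum: smooth-L1 for the regression terms and binary cross-entropy for the confidence are assumptions of this sketch, not the losses actually used by the character positioning model.

    import torch.nn.functional as F

    def locator_loss(pred, target, weights=(1.0, 1.0, 1.0, 1.0)):
        """Illustrative weighted sum of the four loss terms named in claim 5.
        `pred` and `target` are dicts of tensors keyed by 'xy', 'wh', 'conf', 'angle'."""
        w_xy, w_wh, w_conf, w_ang = weights
        loss_xy   = F.smooth_l1_loss(pred["xy"], target["xy"])          # center point loss
        loss_wh   = F.smooth_l1_loss(pred["wh"], target["wh"])          # width and height loss
        loss_conf = F.binary_cross_entropy_with_logits(pred["conf"],    # confidence loss of the characters
                                                       target["conf"])
        loss_ang  = F.smooth_l1_loss(pred["angle"], target["angle"])    # angle loss of the characters
        return w_xy * loss_xy + w_wh * loss_wh + w_conf * loss_conf + w_ang * loss_ang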
6. A character erasing apparatus, comprising:
the image acquisition module is used for acquiring a target workpiece image of the character to be erased;
The character positioning module is used for inputting the target workpiece image into a character positioning model to output and obtain description information of a character area of the target workpiece image, wherein the character positioning model is obtained based on labeling training of a plurality of images containing characters and the description information of the corresponding character area, and the description information of the character area comprises the center point coordinates, width, height and angle of the characters in the target workpiece image;
The area determining module is used for determining the number of characters in the target workpiece image based on the description information of the character area of the target workpiece image; if the number of characters in the target workpiece image is smaller than the preset number, taking the character frame in the target workpiece image as a character mask area of the target workpiece image; if the number of the characters in the target workpiece image is greater than or equal to the preset number, clustering the characters in the target workpiece image from the specified dimensions of the characters to obtain a plurality of class clusters, wherein the specified dimensions comprise the center point coordinates, the width, the height and the angle of the characters; determining a connected domain of a target character contained in each cluster in the plurality of clusters, wherein the target character is any character in the target workpiece image; merging the connected domains of the target character to obtain a mask area of the target character; determining a character mask area of the target workpiece image based on the mask area of the target character;
And the character erasing module is used for carrying out character erasing operation on the character mask area of the target workpiece image through the diffusion model.
7. An electronic device, comprising: a memory and a processor;
The memory is used for storing a computer program;
the processor, coupled to the memory, is configured to execute the computer program for implementing the steps in the character erasure method according to any of claims 1 to 5.
8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, causes the processor to implement the steps in the character erasure method according to any one of claims 1 to 5.
CN202311142364.9A 2023-09-05 2023-09-05 Character erasing method, device and equipment Active CN117253233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311142364.9A CN117253233B (en) 2023-09-05 2023-09-05 Character erasing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311142364.9A CN117253233B (en) 2023-09-05 2023-09-05 Character erasing method, device and equipment

Publications (2)

Publication Number Publication Date
CN117253233A (en) 2023-12-19
CN117253233B (en) 2024-05-17

Family

ID=89132274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311142364.9A Active CN117253233B (en) 2023-09-05 2023-09-05 Character erasing method, device and equipment

Country Status (1)

Country Link
CN (1) CN117253233B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133208A (en) * 2016-12-01 2018-06-08 北京新唐思创教育科技有限公司 Character cutting method and its device in a kind of printed page analysis
CN108268871A (en) * 2018-02-01 2018-07-10 武汉大学 A kind of licence plate recognition method end to end and system based on convolutional neural networks
CN112819840A (en) * 2021-02-24 2021-05-18 北京航空航天大学 High-precision image instance segmentation method integrating deep learning and traditional processing
CN113095327A (en) * 2021-03-16 2021-07-09 深圳市雄帝科技股份有限公司 Method and system for positioning optical character recognition area and storage medium thereof
CN113221906A (en) * 2021-05-27 2021-08-06 江苏奥易克斯汽车电子科技股份有限公司 Image sensitive character detection method and device based on deep learning
CN113762109A (en) * 2021-08-23 2021-12-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method
CN114066900A (en) * 2021-11-12 2022-02-18 北京百度网讯科技有限公司 Image segmentation method and device, electronic equipment and storage medium
CN114170099A (en) * 2021-12-02 2022-03-11 中国科学技术大学 Method, system, equipment and storage medium for erasing characters in scenes with arbitrary shapes
CN114267032A (en) * 2021-12-10 2022-04-01 广东省电子口岸管理有限公司 Container positioning identification method, device, equipment and storage medium
CN114419632A (en) * 2021-12-29 2022-04-29 北京易道博识科技有限公司 OCR training sample generation method, device and system
CN114724145A (en) * 2022-04-12 2022-07-08 济南博观智能科技有限公司 Character image recognition method, device, equipment and medium
CN116009749A (en) * 2022-11-08 2023-04-25 福建亿能达信息技术股份有限公司 Handwritten character erasing method and system based on attention mechanism
CN116051686A (en) * 2023-01-13 2023-05-02 中国科学技术大学 Method, system, equipment and storage medium for erasing characters on graph
CN116092086A (en) * 2023-01-11 2023-05-09 上海智能制造功能平台有限公司 Machine tool data panel character extraction and recognition method, system, device and terminal
WO2023083280A1 (en) * 2021-11-12 2023-05-19 虹软科技股份有限公司 Scene text recognition method and device
CN116189201A (en) * 2023-02-28 2023-05-30 支付宝(杭州)信息技术有限公司 Image recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gaussian-guided character erasure for data augment of industrial characters; Hongchao Gao et al.; 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning; 2023-05-29; pp. 394-401 *
Research on digital image inpainting algorithm based on sample block filling; Wang Zhen; China Master's Theses Full-text Database, Information Science and Technology; 2021-06-15; pp. I138-368 *

Also Published As

Publication number Publication date
CN117253233A (en) 2023-12-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant