CN117079263B - Method, device, equipment and medium for extracting stele characters - Google Patents

Method, device, equipment and medium for extracting stele characters Download PDF

Info

Publication number
CN117079263B
CN117079263B CN202311336471.5A
Authority
CN
China
Prior art keywords
generator
training
decoder
network
inscription
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311336471.5A
Other languages
Chinese (zh)
Other versions
CN117079263A (en)
Inventor
张攀
李超
黄蓝蓝
魏星如
杨彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neijiang Normal University
Original Assignee
Neijiang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neijiang Normal University filed Critical Neijiang Normal University
Priority to CN202311336471.5A priority Critical patent/CN117079263B/en
Publication of CN117079263A publication Critical patent/CN117079263A/en
Application granted granted Critical
Publication of CN117079263B publication Critical patent/CN117079263B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/80Recognising image objects characterised by unique random patterns
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method, a device, equipment and a medium for extracting inscription characters, relating to the field of image processing. The technical scheme is as follows: a U-Net network is used as a repeated network unit and stacked to obtain a second generator, wherein the convolution operation of the U-Net network is completed sequentially by a convolution kernel of size 1 multiplied by 1 and average pooling of size 3 multiplied by 3; a first group of generation countermeasure network is formed by a first generator and a first decision device, a second group of generation countermeasure network is formed by the second generator and a second decision device, and the first generator and the second generator are cross-trained with the two groups of generation countermeasure networks; when the number of training iterations reaches a preset number, a second generator capable of segmenting and extracting the foreground and background of the inscription picture is obtained, the last U-Net network of the second generator being the first generator. The method solves the problem that the prior art cannot effectively extract inscriptions whose foreground and background features are highly self-similar.

Description

Method, device, equipment and medium for extracting stele characters
Technical Field
The invention relates to the field of image processing, in particular to a method, a device, equipment and a medium for extracting inscription characters.
Background
Text extraction separates text, as foreground information, from various backgrounds. It is widely applied to text acquisition in scenes such as ancient cultural relics, bill signatures and road street-view signs, and is one of the basic steps underlying higher-level tasks such as digital restoration and preservation of cultural relics, information security and judicial identification, and guideboard-assisted automatic driving. Text, and in particular the unique sparse spatial structure of Chinese characters combined with random, individual stroke-style characteristics, forms handwriting systems of artistic value, so existing text extraction and segmentation algorithms face significant challenges.
Common text can generally be divided into printed fonts and non-printed fonts, and the difficulty of segmenting and extracting the corresponding text varies greatly under different environmental backgrounds. Printed fonts are typically produced by standardized industrial equipment, and the interior fill of each character is highly uniform, which gives various conventional extraction algorithms based on similarity features the potential to work well. At the same time, the self-similarity of the background information, superimposed on a strong color contrast with the characters, means that many traditional machine learning methods also achieve good results. With the development of artificial intelligence technology in recent years, a large number of algorithm models based on deep convolutional neural networks have realized the positioning, extraction and recognition of characters in various indoor and outdoor scenes; the positioning accuracy in particular is approaching its limit, so current research focuses more on the extraction and recognition of characters. However, non-printed fonts, such as the stele characters of stone memorial archways, are produced by craftsmen engraving over many years. Unlike text in the stable physical environment of indoor scenes, the character regions develop block-wise self-similarity under the influence of chemical corrosion by rainwater, the secretions of covering and climbing plants, and so on, which inevitably causes many text extraction algorithms designed for printed fonts to fail when applied directly. Meanwhile, for natural stone, the entire background region before and after the characters are carved has even more complex self-similar repeated texture characteristics, so the carving traces under the effect of weathering are even more difficult to extract. The prior art mostly relies on physical rubbing, which is extremely time-consuming, and the rubbed text still exhibits stroke-detail deviations.
Therefore, how to solve the problem that the prior art cannot effectively extract inscriptions whose foreground and background features are highly self-similar is an urgent problem to be solved at present.
Disclosure of Invention
The object of the present application is to provide a method, an apparatus, a device and a medium for extracting inscription characters, which are used for dividing and extracting inscription picture fonts with highly self-similar foreground and background characteristics.
The technical aim of the application is achieved through the following technical scheme:
in a first aspect of the present invention, there is provided a method for extracting stele characters, the method comprising:
stacking the networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1;
forming a first group of generating countermeasure network according to the first generator and the first decision device, forming a second group of generating countermeasure network according to the second generator and the second decision device, performing cross training on the first generator and the second generator according to the two groups of generating countermeasure networks, and obtaining a second generator with segmentation and extraction capability on the foreground and the background of the inscription picture when the training times reach a preset number of times, wherein the last U-Net network of the second generator is the first generator;
dividing and extracting the inscription words of the inscription picture to be processed according to the second generator.
In one implementation, the convolution operation of the U-Net network is expressed as F_out = AvgPool_3×3(Conv_1×1(F_in)), wherein AvgPool_3×3 represents average pooling of size 3×3, Conv_1×1 represents the operation of a convolution kernel of size 1×1, and F_in represents a shallow feature map of the inscription picture.
In one implementation, the activation function of the encoder and the activation function of the decoder are both SlopeLeakyReLU functions.
In one implementation scheme, feature images of channel dimensions of the minimum scale scaling of the U-Net network are divided into two groups equally, after multiplication operation is carried out on one group of feature images and single-channel text mask templates with the same scale, the feature images are spliced with the feature images of the other group of channel dimensions, so that the constraint on the compression process of an encoder and the guidance on the generation process of a decoder are realized; wherein the U-Net network comprises a four-level scaling process.
In one implementation, a weighted two-class cross entropy loss function is employed as the loss function for the second generator training.
In one implementation scheme, a first group of generating countermeasure network is formed according to a first generator and a first decision device, a second group of generating countermeasure network is formed according to a second generator and a second decision device, the first generator and the second generator are cross-trained according to the two groups of generating countermeasure network, and when training times reach preset times, a second generator with segmentation and extraction capability on the foreground and the background of the inscription picture is obtained, and the method comprises the following steps:
the decoder of the first generator and a pre-trained first decision device form a first group of generation countermeasure network, and the decoder of the first generator is trained according to the first group of generation countermeasure network, so that the decoder of the first generator has character generation and segmentation capability, and the trained decoder of the first generator is obtained; wherein the pre-training of the first decision device is accomplished by a dataset generated from text pictures of the same character in a plurality of different fonts;
the second generator and a pre-trained second decision device form a second group of generation countermeasure network, the training parameters of the trained decoder of the first generator are loaded into the decoders of all repeated network units of the second generator, the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, and when the number of training iterations reaches the preset number, the second generator with segmentation and extraction capability on the foreground and the background of the inscription picture is obtained, wherein the training of the second decision device is completed with data annotated from text pictures of real inscriptions.
In one implementation, when the decoder of the last U-Net network is trained according to the first group of generation countermeasure network, the feature layers transmitted by the equal-scale encoder and required by the decoder for the splicing operation are randomly generated;
when the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, text pictures of the same Chinese character in other fonts similar to the target font are added between the encoder and the decoder of each repeated structural unit of the generator, so as to correct the compression process of the encoder, constrain the generation process of the decoder, and finely adjust the network parameters of the second generator.
In a second aspect of the present invention, there is provided a stele character extraction apparatus, the apparatus comprising:
the second generator construction module is used for stacking networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1;
the cross training module is used for forming a first group of generating countermeasure network according to the first generator and the first decision device, forming a second group of generating countermeasure network according to the second generator and the second decision device, carrying out cross training on the first generator and the second generator according to the two groups of generating countermeasure network, and obtaining the second generator with segmentation and extraction capability on the foreground and the background of the stele picture when the training times reach the preset times, wherein the last U-Net network of the second generator is the first generator;
and the character extraction module is used for dividing and extracting the inscription characters of the inscription picture to be processed according to the second generator.
In a third aspect of the present invention, an electronic device is provided, the electronic device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the method for extracting stele characters as provided in the first aspect of the present invention.
In a fourth aspect of the present invention, there is provided a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of a method for extracting inscription words as provided in the first aspect of the present invention.
Compared with the prior art, the application has the following beneficial effects:
according to the method for extracting the inscription text, the improved convolution operation method suitable for extracting the self-similarity features is added in the U-Net network, so that the segmentation effect of the self-similarity regions and boundaries in the inscription picture is improved, different slopes are added in the activation function to form different slopes of the activation function to assist foreground and background separation of the inscription picture, the activation operation is respectively carried out in the encoding and decoding structures of the encoder, rough extraction of different information in respective branches is promoted, the subsequent decoder is promoted to generate a characteristic plane with better differentiation, the foreground and background is further effectively separated, further, the improved U-Net network is used for carrying out network stacking to construct a second generator, training of the second generator is carried out by two groups of generation countermeasure networks, the pre-training of the text generation capacity of the decoder and the iterative training of the generator are respectively realized under different data sets, and therefore the second generator has the capacity of segmenting and extracting the text of the background foreground from the similar inscription picture.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
fig. 1 shows a schematic flow chart of a method for extracting stele characters according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a second generator stacked from U-Net networks according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a comparison of a conventional convolution and an improved convolution operation provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a prior-art LeakyReLU activation function;
FIG. 5 shows a schematic diagram of the improved LeakyReLU activation function provided by an embodiment of the present invention;
FIG. 6 shows a schematic diagram of an improved U-Net network provided by an embodiment of the present invention;
fig. 7 is a schematic block diagram of a device for extracting inscription words according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
It is noted that the terms "comprises" or "comprising" when utilized in various embodiments of the present application are indicative of the existence of, and do not limit the addition of, one or more functions, operations or elements of the subject application. Furthermore, as used in various embodiments of the present application, the terms "comprises," "comprising," and their cognate terms are intended to refer to a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be interpreted as first excluding the existence of or increasing likelihood of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
It should be appreciated that terms such as "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Non-printed fonts, such as the stele characters of stone memorial archways, are produced by craftsmen engraving over many years. Unlike text in the stable physical environment of indoor scenes, the character regions develop block-wise self-similarity under the influence of chemical corrosion by rainwater, the secretions of covering and climbing plants, and so on, which inevitably causes many text extraction algorithms designed for printed fonts to fail when applied directly. Meanwhile, for natural stone, the entire background region before and after the characters are carved has even more complex self-similar repeated texture characteristics, so the carving traces under the effect of weathering are even more difficult to extract. The prior art mostly relies on physical rubbing, which is extremely time-consuming, and the rubbed text still exhibits stroke-detail deviations.
Therefore, in order to overcome the above-mentioned shortcomings in the segmentation and extraction of stone memorial archway stele text pictures, the present embodiment provides a method for extracting stele characters. An improved convolution operation suited to self-similar feature extraction is added to the U-Net network, which improves the segmentation of self-similar regions and their boundaries in the stele picture; different slopes are added to the activation function, and the resulting activation functions with different slopes assist the foreground-background separation of the stele picture, performing the activation operations separately in the encoding and decoding structures, thereby promoting the coarse extraction of different information in the respective branches and helping the subsequent decoder generate feature planes with better discrimination, so that the foreground and background are further effectively separated.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the examples and the accompanying drawings, and the exemplary embodiments of the present application and the descriptions thereof are only for explaining the present application and are not limiting the present application.
Referring to fig. 1, fig. 1 shows a flow chart of a method for extracting stele characters according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
s101, stacking networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; the U-Net network is constructed by an encoder and a decoder, the slope of the activation function of the encoder is greater than 1, and the slope of the activation function of the decoder is less than 1.
In this embodiment, as is known to those skilled in the art, a U-Net network is a neural network for image segmentation and can be constructed from an encoder and a decoder. Accordingly, network stacking is performed with the U-Net network as the repeating unit to construct the second generator. As shown in fig. 2, the decoder of one U-Net network is connected to the input end of the encoder of the next U-Net network; in fig. 2, En1, En2 and En3 respectively denote the encoders of the stacked U-Net networks, and De1, De2 and De3 respectively denote their decoders. It can be understood that the number of encoders and decoders in fig. 2 can be adjusted adaptively according to the processing requirements; in this example, three U-Net networks are stacked in sequence as shown in fig. 2, thereby constructing the generator in its initial state, and after iterative training the generator can segment and extract the text of stele pictures. Di1 and Di2 denote the first decision device and the second decision device in the generation countermeasure networks; the Chinese characters appearing in fig. 2 are the data label objects used to train the generator, and the English word True denotes real label data. Correspondingly, gloss, sloss and dloss correspond to the loss functions of the second generation countermeasure network, the generator, and the first generation countermeasure network during training, respectively.
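For illustration only, the following is a minimal PyTorch sketch of how such U-Net repeat units could be chained into the stacked generator. The class name StackedGenerator, the unet_factory argument and the choice of three units are assumptions for the sketch, not details taken from the patent.

```python
import torch.nn as nn

class StackedGenerator(nn.Module):
    """Minimal sketch: chain several U-Net repeat units so each decoder output
    feeds the next encoder; the last unit (En3/De3) plays the role of the first generator."""

    def __init__(self, unet_factory, num_units=3):
        super().__init__()
        self.units = nn.ModuleList([unet_factory() for _ in range(num_units)])

    def forward(self, x):
        outputs = []          # intermediate predictions, usable for intermediate (sloss) supervision
        for unet in self.units:
            x = unet(x)       # decoder output of one unit feeds the next unit's encoder
            outputs.append(x)
        return outputs        # the final element is the overall segmentation prediction
```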
Aiming at the convolution operation method suitable for the self-similar feature extraction, which is provided in the U-Net network, the basic constraints such as the same label of the similar region and the same feature value after the deep convolution operation are mainly utilized to assimilate the partial convolution kernel parameters of the shallow layer, so that the excessive detail feature extraction capability of the deep convolution neural network model in the shallow layer is weakened, the macro feature extraction capability is enhanced, and finally more accurate self-similar region segmentation is formed.
The 3×3 convolution kernel calculation procedure used by a conventional convolutional neural network is shown in part (a) of fig. 3: the feature extracted at the center point is computed jointly with its 8-connected neighborhood. Because the target region in stele segmentation contains only two kinds of objects, the foreground and the background, i.e. the data labels contain only foreground and background information, the highly self-similar regions of the target should, in each feature plane of the deep convolutional neural network model, contain feature values that are block-wise close to each other. However, the shallower a feature extraction layer of the model is, the more shallow features such as image details it captures, which is not conducive to the effective extraction of self-similar features. Therefore, this embodiment improves part of the convolution operations in the U-Net network; the principle is shown in formula (1) and part (b) of fig. 3. The conventional 3×3 convolution is replaced by a 1×1 convolution, but the activation operation is not performed immediately afterwards; instead, the feature plane obtained by the convolution is first subjected to an average pooling operation of size 3×3, and the conventional activation operation is then performed. The 1×1 convolution kernel ensures the equalization of the kernel parameters, so that under the constraint that similar regions share the same labels, the low-level front-end network is indirectly driven to extract feature values that are block-wise close to each other for similar regions. The receptive field, however, is smaller than that of a 3×3 convolution kernel, and it is restored by the 3×3 average pooling applied before the re-activation.
The convolution operation of the U-Net network is expressed as F_out = AvgPool_3×3(Conv_1×1(F_in)) (1), wherein AvgPool_3×3 represents average pooling of size 3×3, Conv_1×1 represents the operation of a convolution kernel of size 1×1, and F_in represents a shallow feature map of the inscription picture.
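As an illustration, a minimal PyTorch sketch of the improved convolution operation of formula (1) could look as follows. The class name SelfSimilarConv, the channel arguments and the stride/padding of the pooling layer are assumptions; the patent only fixes the 1×1 kernel and the 3×3 pooling size and the order "convolve, pool, then activate".

```python
import torch.nn as nn

class SelfSimilarConv(nn.Module):
    def __init__(self, in_ch, out_ch, activation):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)          # equalized 1×1 kernel
        # stride 1 with padding keeps the spatial size (an assumption of this sketch)
        self.pool3x3 = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
        self.activation = activation                                     # activation only after pooling

    def forward(self, f_in):
        # F_out = activation(AvgPool_3x3(Conv_1x1(F_in))), matching formula (1)
        return self.activation(self.pool3x3(self.conv1x1(f_in)))
```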
Further, the activation function of the encoder and the activation function of the decoder are both SlopeLeakyReLU functions, whose expression is shown in formula (2): y = k·max(0, x) + a·min(0, x) (2), wherein x represents the input characteristic value, max(0, x) indicates that the larger of 0 and the input characteristic value is taken, a represents the negative half-axis slope, k represents the positive half-axis slope, and y represents the output characteristic value. When k is 1 and a takes a very small value, the LeakyReLU function shown in fig. 4 is formed. The slope of the activation function on the positive half-axis is 1 by default, indicating that the activation function there is a direct mapping of the input characteristic value. Since there are only two objects to be segmented in the inscription extraction scene, namely the foreground and the background, this embodiment modifies the slope of the traditional LeakyReLU activation function in its active (positive) portion. The detailed design is shown in fig. 5: the slopes of the activation functions in the encoding and decoding branches of the U-Net network are α and β respectively, and these are obtained through experimental tests as hyper-parameters of the network model. In summary, this embodiment adds different slopes to the LeakyReLU activation function to form the SlopeLeakyReLU activation function, which performs the activation operations separately in the encoding and decoding structures, thereby promoting the coarse extraction of different information in the respective branches, helping the subsequent decoder generate feature planes with better discrimination, and realizing the effective separation of the foreground and background of the inscription picture.
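A minimal sketch of such an activation, assuming PyTorch, is given below. The parameter names slope_pos and slope_neg are hypothetical; slope_pos would be set to α (greater than 1) in the encoder branch and β (less than 1) in the decoder branch, while slope_neg plays the role of the usual small LeakyReLU negative slope.

```python
import torch
import torch.nn as nn

class SlopeLeakyReLU(nn.Module):
    def __init__(self, slope_pos=1.0, slope_neg=0.01):
        super().__init__()
        self.slope_pos = slope_pos   # positive half-axis slope (e.g. > 1 in the encoder branch)
        self.slope_neg = slope_neg   # negative half-axis slope (kept small, as in LeakyReLU)

    def forward(self, x):
        # y = slope_pos * max(0, x) + slope_neg * min(0, x), as in formula (2)
        return self.slope_pos * torch.clamp(x, min=0) + self.slope_neg * torch.clamp(x, max=0)
```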
S102, forming a first group of generating countermeasure network according to the first generator and the first decision device, forming a second group of generating countermeasure network according to the second generator and the second decision device, performing cross training on the first generator and the second generator according to the two groups of generating countermeasure networks, and obtaining the second generator with segmentation and extraction capability on the foreground and the background of the inscription picture when the training times reach the preset times, wherein the last U-Net network of the second generator is the first generator.
In this embodiment, the cross training includes the following steps: the decoder of the first generator and a pre-trained first decision device form a first group of generation countermeasure network, and the decoder of the first generator is trained according to the first group of generation countermeasure network, so that the decoder of the first generator has character generation and segmentation capability, and the trained decoder of the first generator is obtained; wherein the pre-training of the first decision device is accomplished by a dataset generated from text pictures of the same character in a plurality of different fonts;
the second generator and a pre-trained second decision device form a second group of generation countermeasure network, the training parameters of the trained decoder of the first generator are loaded into the decoders of all repeated network units of the second generator, the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, and when the number of training iterations reaches the preset number, the second generator with segmentation and extraction capability on the foreground and the background of the inscription picture is obtained, wherein the training of the second decision device is completed with data annotated from text pictures of real inscriptions.
Specifically, in the framework shown in fig. 2, two sets of generated countermeasure networks are included, and training of the second set of networks is performed only after training of the first set of networks is completed.
The first group of generation countermeasure network is composed of the decoder De3 of the first generator and the first decision device Di1, and is used to train the decoder De3 so that it has basic character generation and segmentation capability, preventing the severe blurring of inscription pictures from making the whole generator difficult to train. The dataset consists of text pictures of the same characters in several other fonts similar to the font of the inscription to be segmented. In particular, when the decoder of the first generator is trained according to the first group of generation countermeasure network, the feature layers transmitted by the equal-scale encoder and required by the decoder for the splicing operation are randomly generated, so that overfitting of the first generator during training is avoided.
The second group of generation countermeasure network is composed of the second generator and the second decision device Di2, and is used to train the complete inscription segmentation and extraction second generator, so that the second generator can segment inscription characters whose background and foreground are highly self-similar. Before training, the second generator loads the training parameters of the decoder De3, obtained from the first group of generation countermeasure network, into the decoders De1, De2 and De3 respectively. The dataset consists of real, valid stele picture data and its augmented samples. When the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, text pictures of the same Chinese character in other fonts similar to the target font are added between the encoder and the decoder of the second generator, so as to correct the compression process of each encoder, constrain the generation process of each decoder, and finely adjust the network parameters of the second generator. The complete training process is shown in Algorithm 1, which is as follows:
1. Load the pre-trained parameters into the decoder De3 of the first generator;
2. Randomly generate a set of m two-dimensional Gaussian samples;
3. Randomly sample m samples from the dataset used to train the first decision device;
4. Train the second decision device Di2 using the gradient descent method;
5. Randomly generate a set of m two-dimensional Gaussian samples;
6. Train the decoder De3 using the gradient descent method;
7. Load the decoder De3 parameters into decoder De3, decoder De2 and decoder De1;
8. Randomly generate a set of m two-dimensional Gaussian samples;
9. Randomly sample m samples from the dataset used to train the first decision device;
10. Train the first decision device Di1 using the gradient descent method;
11. Randomly generate a set of m two-dimensional Gaussian samples;
12. Train the stacked U-Net network, i.e. the generator, using the gradient descent method.
the explanation of the parameters in algorithm 1 above is as follows:、/>、/>、/>respectively representing the gradient of the scalar of the corresponding lower right corner; />Representing Di3 decider,/->Representing Di2 decider,/->Representing Di1 decision maker,>representing a second generator->Representing a first generator.
The above procedure based on algorithm 1 is a routine technical means for a person skilled in the art, so this embodiment is given as a simple example and is not described in detail.
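Purely as an illustration of the training order, the sketch below shows one alternating adversarial update in PyTorch under a standard binary cross entropy objective; the patent's own dloss, gloss and sloss formulas are not reproduced, and all names (adversarial_step, gen_input, the optimizers) are hypothetical. The decider is assumed to end with a sigmoid so that BCELoss applies.

```python
import torch

def adversarial_step(generator, decider, gen_input, real_batch, opt_g, opt_d):
    # "generator" is whichever network is being trained in the current stage
    # (the decoder De3 in stage one, the full stacked generator in stage two).
    bce = torch.nn.BCELoss()

    # decider update: real samples are labelled 1, generated samples 0
    fake = generator(gen_input).detach()
    d_real, d_fake = decider(real_batch), decider(fake)
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # generator update: push the decider to label generated samples as real
    g_pred = decider(generator(gen_input))
    g_loss = bce(g_pred, torch.ones_like(g_pred))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

In the two-stage flow of Algorithm 1, such a step would first be applied to the decoder of the first generator and its decision device on the multi-font character dataset; the trained De3 parameters would then be loaded into De1, De2 and De3, and the step repeated with the complete stacked generator and its decision device on the real, annotated inscription pictures.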
S103, dividing and extracting the inscription of the inscription picture to be processed according to the second generator.
In this embodiment, the trained second generator may be regarded as an image segmentation and extraction network; the two names refer to the same network used in different contexts. It solves the problem that the prior art cannot effectively extract inscriptions whose foreground and background features are highly self-similar, realizes the separation of characters from inscription pictures whose background and foreground are highly self-similar, and generates the character content contained in the inscription pictures.
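For completeness, a minimal inference sketch is given below, assuming a trained stacked generator whose final output is a sigmoid probability map of the text foreground. The function name extract_inscription and the 0.5 threshold are assumptions of the sketch.

```python
import torch

@torch.no_grad()
def extract_inscription(generator, stele_image, threshold=0.5):
    # stele_image: (1, C, 256, 256) tensor of the inscription picture to be processed
    generator.eval()
    prob_map = generator(stele_image)[-1]        # last U-Net unit gives the final prediction
    return (prob_map > threshold).float()        # binary foreground/background mask
```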
In one embodiment, the method further comprises: dividing the characteristic diagram of the channel dimension of the minimum scaling of the U-Net network into two groups, and splicing one group of characteristic diagram with the characteristic diagram of the other group of channel dimension after multiplying the characteristic diagram with a single-channel character mask template with the same scale so as to realize the constraint on the compression process of the encoder and the guidance on the generation process of the decoder; wherein the U-Net network comprises a four-level scaling process.
In this embodiment, the overall improved structure of the U-Net network is shown in fig. 6. The input and output feature maps are 256×256, and the network includes a four-level scaling process. The feature map of the minimum-scale branch is 32×32; as shown in fig. 6, the horizontal row corresponding to the smallest boxes in the figure is the minimum-scale branch. The 1024-channel feature map in the middle is equally divided into two groups; one group is multiplied by a single-channel text mask template of the same scale and then spliced with the other group again in the channel dimension, which realizes a basic constraint on the compression process of the encoder as well as basic guidance for the generation process of the decoder, and avoids the difficulty of fitting the model caused by the self-similarity of the foreground and background. As can be appreciated by those skilled in the art, before the U-Net network finally outputs the prediction feature map, a sigmoid activation function can be used to realize the two-class classification of the foreground and background of the inscription picture.
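The mask-guided constraint at the minimum-scale branch could be sketched in PyTorch as follows. The function name mask_guided_bottleneck and the tensor shapes in the comments are assumptions drawn from the description above (1024 channels at 32×32).

```python
import torch

def mask_guided_bottleneck(features, mask):
    # features: (B, 1024, 32, 32) minimum-scale feature map; mask: (B, 1, 32, 32) text mask template
    group_a, group_b = torch.chunk(features, chunks=2, dim=1)   # two groups of 512 channels each
    group_a = group_a * mask                                    # constrain one group with the mask
    return torch.cat([group_a, group_b], dim=1)                 # re-splice along the channel dimension
```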
In one embodiment, a weighted two-class cross entropy loss function is employed as the loss function for the second generator training.
In this embodiment, as shown in fig. 2, sloss is a weighted two-class cross entropy loss function. With reference to the intermediate supervision method used in stacked hourglass networks, a corresponding image segmentation supervision process is also added between every two U-Net networks for more efficient training of the model. Combining the intermediate supervision process and the finally required binary segmentation output, the weighted two-class cross entropy shown in formula (3) is adopted as the loss function, namely L = -(1/N)·Σ_{i=1..N}[w_1·y_i·log(p_i) + w_2·(1−y_i)·log(1−p_i)] (3), wherein N represents the number of pixels of the image, p_i represents the probability that a sample is predicted to be positive, y_i takes 0 or 1 to indicate that the predicted output does not match or matches the label, respectively, and w_1 and w_2 both represent hyper-parameters.
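A minimal sketch of this weighted loss, assuming PyTorch, is shown below; w1 and w2 are the hyper-parameter weights of formula (3), and the same loss can also be applied to the intermediate supervision outputs between U-Net units. The function name and the eps clamp are assumptions of the sketch.

```python
import torch

def weighted_bce(pred, target, w1=1.0, w2=1.0, eps=1e-7):
    # pred: sigmoid probabilities, target: 0/1 labels, both of shape (B, 1, H, W)
    pred = pred.clamp(eps, 1.0 - eps)                  # numerical stability
    loss = -(w1 * target * torch.log(pred) + w2 * (1.0 - target) * torch.log(1.0 - pred))
    return loss.mean()                                 # average over the N pixels
```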
The invention also provides a device for extracting the inscription, which can be used for executing the method for extracting the inscription according to any one of the embodiments of the invention.
Referring to fig. 7, fig. 7 shows a schematic block diagram of a stele character extraction device according to an embodiment of the present invention, where the device includes:
a second generator construction module 710, configured to perform network stacking with a U-Net network as a repeating network unit to obtain a second generator, where convolution operation of the U-Net network is sequentially completed by convolution kernel of 1×1 size and average pooling of 3×3 size; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1;
the cross training module 720 is configured to form a first group of generating countermeasure network according to the first generator and the first decision device, form a second group of generating countermeasure network according to the second generator and the second decision device, perform cross training on the first generator and the second generator according to the two groups of generating countermeasure networks, and obtain the second generator with segmentation and extraction capabilities on the foreground and the background of the stele picture when the training times reach the preset times, wherein the last U-Net network of the second generator is the first generator;
the text extraction module 730 is configured to segment and extract the inscription text of the inscription picture to be processed according to the second generator.
The device for extracting inscription words in the embodiment of the present application is based on the same concept as the method for extracting inscription words shown in fig. 1. From the detailed description of that method above, a person skilled in the art can clearly understand the implementation process of the device in this embodiment, so for brevity it is not described here again.
Correspondingly, in the device for extracting inscription text provided by this embodiment, an improved convolution operation suited to self-similar feature extraction is added to the U-Net network, which improves the segmentation of self-similar regions and their boundaries in the inscription picture. Different slopes are added to the activation function, and the resulting activation functions with different slopes assist the foreground-background separation of the inscription picture: the activation operations are performed separately in the encoding and decoding structures, which promotes the coarse extraction of different information in the respective branches and helps the subsequent decoder generate feature planes with better discrimination, so that the foreground and background are further effectively separated. Furthermore, the improved U-Net network is stacked to construct a second generator, and the second generator is trained with two groups of generation countermeasure networks, which realize, on different datasets, the pre-training of the text generation capability of the decoder and the iterative training of the generator respectively, so that the second generator acquires the capability of segmenting and extracting the text of inscription pictures whose background and foreground are highly self-similar.
In still another embodiment of the present invention, an electronic device is further provided. Referring to fig. 8, fig. 8 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 800 comprises a processor 810, a memory 820, a communication interface 830, and at least one communication bus for connecting the processor 810, the memory 820 and the communication interface 830. The memory 820 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and the memory 820 is used for storing related instructions and data.
The communication interface 830 is used to receive and transmit data. The processor 810 may be one or more CPUs, and in the case where the processor 810 is one CPU, the CPU may be a single core CPU or a multi-core CPU. The processor 810 in the electronic device 800 is configured to read one or more programs 821 stored in the memory 820, and perform the following operations: stacking the networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1; forming a first group of generating countermeasure network according to the first generator and the first decision device, forming a second group of generating countermeasure network according to the second generator and the second decision device, performing cross training on the first generator and the second generator according to the two groups of generating countermeasure networks, and obtaining a second generator with segmentation and extraction capability on the foreground and the background of the inscription picture when the training times reach a preset number of times, wherein the last U-Net network of the second generator is the first generator; dividing and extracting the inscription words of the inscription picture to be processed according to the second generator.
It should be noted that, the specific implementation of each operation may be described in the foregoing corresponding description of the method embodiment shown in fig. 1, and the electronic device 800 may be used to execute the method for extracting the inscription text in the foregoing method embodiment of the present application, which is not described herein in detail.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium that is a memory device in a computer device for storing a program and data. It is understood that the computer readable storage medium herein may include both built-in storage media in a computer device and extended storage media supported by the computer device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also stored in the memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer readable storage medium herein may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the corresponding steps of the method for extracting inscription words in the above embodiments. It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing description of the embodiments is provided to illustrate the objects, technical solutions and advantages of the present invention in further detail. It should be understood that the foregoing describes only specific embodiments of the present invention and is not intended to limit the scope of protection of the present invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (9)

1. A method for extracting stele characters is characterized by comprising the following steps:
stacking the networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1;
forming a first group of generation countermeasure network according to the first generator and the first decision device, forming a second group of generation countermeasure network according to the second generator and the second decision device, performing cross training on the first generator and the second generator according to the two groups of generation countermeasure networks, and obtaining a second generator with segmentation and extraction capability on the foreground and the background of the inscription picture when the training times reach a preset number of times, wherein the last U-Net network of the second generator is the first generator; the decoder of the first generator is trained according to the first group of generation countermeasure network, so that the decoder of the first generator has character generation and segmentation capability, and the trained decoder of the first generator is obtained; wherein the pre-training of the first decision device is accomplished by a dataset generated from text pictures of the same character in a plurality of different fonts; the second generator and a pre-trained second decision device form a second group of generation countermeasure network, the training parameters of the trained decoder of the first generator are loaded into the decoders of all repeated network units of the second generator, the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, and when the number of training iterations reaches the preset number, the second generator with segmentation and extraction capability on the foreground and the background of the inscription picture is obtained, wherein the training of the second decision device is completed with data annotated from text pictures of real inscriptions;
dividing and extracting the inscription words of the inscription picture to be processed according to the second generator.
2. The method for extracting stele characters according to claim 1, wherein the convolution operation of the U-Net network is expressed as F_out = AvgPool_3×3(Conv_1×1(F_in)), wherein AvgPool_3×3 represents average pooling of size 3×3, Conv_1×1 represents the operation of a convolution kernel of size 1×1, and F_in represents a shallow feature map of the inscription picture.
3. The method for extracting stele characters according to claim 1, wherein the activation function of the encoder and the activation function of the decoder are both SlopeLeakyReLU functions.
4. The method for extracting stele characters according to claim 1, wherein the feature maps of the channel dimension at the minimum scaling scale of the U-Net network are equally divided into two groups, one group of feature maps is multiplied by a single-channel text mask template of the same scale and then spliced with the other group of feature maps in the channel dimension, so as to realize the constraint on the compression process of the encoder and the guidance on the generation process of the decoder; wherein the U-Net network comprises a four-level scaling process.
5. The method for extracting stele characters according to claim 1, wherein a weighted two-class cross entropy loss function is adopted as the loss function for the second generator training.
6. The method for extracting stele characters according to claim 1, wherein, when the decoder of the last U-Net network is trained according to the first group of generation countermeasure network, the feature layers transmitted by the equal-scale encoder and required by the decoder for the splicing operation are randomly generated;
when the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, text pictures of the same Chinese character in other fonts similar to the target font are added between the encoder and the decoder of each repeated structural unit of the generator, so as to correct the compression process of the encoder, constrain the generation process of the decoder, and finely adjust the network parameters of the second generator.
7. A stele character extraction device, the device comprising:
the second generator construction module is used for stacking networks by taking a U-Net network as a repeated network unit to obtain a second generator, wherein the convolution operation of the U-Net network is sequentially completed by convolution kernels with the size of 1 multiplied by 1 and average pooling with the size of 3 multiplied by 3; constructing a U-Net network by an encoder and a decoder, wherein the slope of an activation function of the encoder is greater than 1, and the slope of an activation function of the decoder is less than 1;
the cross training module is used for forming a first group of generation countermeasure network according to the first generator and the first decision device, forming a second group of generation countermeasure network according to the second generator and the second decision device, carrying out cross training on the first generator and the second generator according to the two groups of generation countermeasure networks, and obtaining the second generator with segmentation and extraction capability on the foreground and the background of the stele picture when the training times reach the preset times, wherein the last U-Net network of the second generator is the first generator; the decoder of the first generator is trained according to the first group of generation countermeasure network, so that the decoder of the first generator has character generation and segmentation capability, and the trained decoder of the first generator is obtained; wherein the pre-training of the first decision device is accomplished by a dataset generated from text pictures of the same character in a plurality of different fonts; the second generator and a pre-trained second decision device form a second group of generation countermeasure network, the training parameters of the trained decoder of the first generator are loaded into the decoders of all repeated network units of the second generator, the second generator loaded with the training parameters is trained according to the second group of generation countermeasure network, and when the number of training iterations reaches the preset number, the second generator with segmentation and extraction capability on the foreground and the background of the inscription picture is obtained, wherein the training of the second decision device is completed with data annotated from text pictures of real inscriptions;
and a character extraction module, configured to segment and extract the stele characters from a stele picture to be processed using the second generator.
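A minimal sketch of one convolution stage of the repeated U-Net unit specified by the second generator construction module in claim 7: a 1×1 convolution followed by 3×3 average pooling, with an encoder activation slope greater than 1 and a decoder activation slope less than 1. Interpreting "slope" as the positive-side slope of a ReLU-like activation, the pooling stride/padding, the channel counts, and the slope values 1.5 and 0.5 are all illustrative assumptions rather than values fixed by the claim.

```python
import torch
import torch.nn as nn

class SlopedActivation(nn.Module):
    """ReLU-like activation whose positive-side slope is configurable.
    The claim only constrains the slope to be >1 (encoder) or <1 (decoder)."""
    def __init__(self, slope: float):
        super().__init__()
        self.slope = slope

    def forward(self, x):
        return self.slope * torch.relu(x)

def conv_stage(in_ch: int, out_ch: int, slope: float) -> nn.Sequential:
    """One stage of the U-Net unit: 1x1 convolution, then 3x3 average pooling
    (stride 2 halves the spatial scale here), then the sloped activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.AvgPool2d(kernel_size=3, stride=2, padding=1),
        SlopedActivation(slope),
    )

# encoder stages use slope > 1, decoder stages slope < 1 (values are illustrative)
encoder_stage = conv_stage(3, 16, slope=1.5)
decoder_act = SlopedActivation(0.5)
print(encoder_stage(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 16, 128, 128])
```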
8. An electronic device comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the method for extracting stele characters according to any one of claims 1 to 6.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for extracting stele characters according to any one of claims 1 to 6.
CN202311336471.5A 2023-10-16 2023-10-16 Method, device, equipment and medium for extracting stele characters Active CN117079263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311336471.5A CN117079263B (en) 2023-10-16 2023-10-16 Method, device, equipment and medium for extracting stele characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311336471.5A CN117079263B (en) 2023-10-16 2023-10-16 Method, device, equipment and medium for extracting stele characters

Publications (2)

Publication Number Publication Date
CN117079263A CN117079263A (en) 2023-11-17
CN117079263B true CN117079263B (en) 2024-01-02

Family

ID=88706447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311336471.5A Active CN117079263B (en) 2023-10-16 2023-10-16 Method, device, equipment and medium for extracting stele characters

Country Status (1)

Country Link
CN (1) CN117079263B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524205A (en) * 2020-04-23 2020-08-11 北京信息科技大学 Image coloring processing method and device based on loop generation countermeasure network
CN113837366A (en) * 2021-09-23 2021-12-24 中国计量大学 Multi-style font generation method
CN114092926A (en) * 2021-10-20 2022-02-25 杭州电子科技大学 License plate positioning and identifying method in complex environment
CN114943204A (en) * 2022-05-30 2022-08-26 华南理工大学 Chinese character font synthesis method based on generation countermeasure network
CN115205420A (en) * 2022-07-13 2022-10-18 陕西师范大学 Method for generating ancient character fonts based on GAN network
CN115221842A (en) * 2022-08-31 2022-10-21 内江师范学院 Font style migration method, system and equipment based on small sample dataset
CN115457568A (en) * 2022-09-20 2022-12-09 吉林大学 Historical document image noise reduction method and system based on generation countermeasure network
CN115797216A (en) * 2022-12-14 2023-03-14 齐鲁工业大学 Inscription character restoration model and restoration method based on self-coding network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210287096A1 (en) * 2020-03-13 2021-09-16 Nvidia Corporation Microtraining for iterative few-shot refinement of a neural network
US20230005107A1 (en) * 2021-06-30 2023-01-05 Palo Alto Research Center Incorporated Multi-task text inpainting of digital images

Also Published As

Publication number Publication date
CN117079263A (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US10817741B2 (en) Word segmentation system, method and device
CN111723585B (en) Style-controllable image text real-time translation and conversion method
US20190180154A1 (en) Text recognition using artificial intelligence
CN111160343B (en) Off-line mathematical formula symbol identification method based on Self-Attention
WO2022142611A1 (en) Character recognition method and apparatus, storage medium and computer device
CN111444919A (en) Method for detecting text with any shape in natural scene
CN106570521B (en) Multilingual scene character recognition method and recognition system
CN110188762B (en) Chinese-English mixed merchant store name identification method, system, equipment and medium
US10373022B1 (en) Text image processing using stroke-aware max-min pooling for OCR system employing artificial neural network
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN112966685B (en) Attack network training method and device for scene text recognition and related equipment
CN116110036B (en) Electric power nameplate information defect level judging method and device based on machine vision
CN115082693A (en) Multi-granularity multi-mode fused artwork image description generation method
CN110610006B (en) Morphological double-channel Chinese word embedding method based on strokes and fonts
CN110858307B (en) Character recognition model training method and device and character recognition method and device
Wu et al. STR transformer: a cross-domain transformer for scene text recognition
CN111242114B (en) Character recognition method and device
Al Ghamdi A novel approach to printed Arabic optical character recognition
CN110852102B (en) Chinese part-of-speech tagging method and device, storage medium and electronic equipment
CN112183494A (en) Character recognition method and device based on neural network and storage medium
CN117079263B (en) Method, device, equipment and medium for extracting stele characters
Wijerathna et al. Recognition and translation of Ancient Brahmi Letters using deep learning and NLP
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN112036290A (en) Complex scene character recognition method and system based on class mark coding representation
CN114092931B (en) Scene character recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant