CN115546801A - Method for extracting paper image data features of test document - Google Patents

Method for extracting paper image data features of test document

Info

Publication number
CN115546801A
Authority
CN
China
Prior art keywords
image
input
area
sequence
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210725519.0A
Other languages
Chinese (zh)
Inventor
严浩
王芳潇
范强
江春
周晓磊
张骁雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202210725519.0A
Publication of CN115546801A
Legal status: Pending

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1914 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries, e.g. user dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for extracting features from the paper image data of test documents. The method comprises: image preprocessing, namely performing paging, skew correction and binarization on the paper image data of the test document; layout analysis, detecting the paragraph areas, table areas, figure/table caption areas, page-number areas and image areas contained in the image based on RefineNet; index establishment, indexing the detected paragraph areas, table areas, figure/table caption areas, page-number areas and image areas through a data dictionary; and character recognition, namely recognizing paragraph-area characters, table-area characters and page numbers through CRNN-based character recognition. Compared with the prior art, the method features a lightweight model and short detection and recognition times; it can effectively shorten the time needed for manual entry of paper image data and save labor costs.

Description

Method for extracting paper image data features of test document
Technical Field
The invention belongs to the field of image feature extraction, and particularly relates to a method for extracting features from the paper image data of test documents.
Background
Digitizing paper documents is a current trend in informatization. Enterprises and public institutions currently lack an intelligent digital extraction method for acquiring the paper image data of test documents, making uniform, standardized extraction of paper image data difficult. As a result, no standard, normalized dataset foundation can be provided for data-mining analysis and the training of artificial intelligence models, and the traditional data collected cannot support the intelligent service applications these institutions require.
Disclosure of Invention
Aiming at the technical problems of poor acquisition quality and long processing time for paper image data in the prior art, the invention provides a method for extracting features from the paper image data of test documents. The method provides an index function for the paragraph areas, table areas, figure/table caption areas, page-number areas and image areas of the paper image data, and supports fast recognition of paragraph-area characters, table-area characters and page numbers.
The invention specifically adopts the following technical scheme: a method for extracting paper image data features of a test document comprises the following steps:
step SS1: uploading the image to be recognized, comprising: uploading the PDF image to be recognized to the processing program;
step SS2: image preprocessing, comprising: strengthening the effective information of the PDF image to be recognized and weakening redundant or invalid information; the preprocessing comprises paging of the input multi-page PDF data, skew correction, and image binarization;
step SS3: layout analysis, comprising: detecting the paragraph areas, table areas, figure/table caption areas, page-number areas and image areas contained in the preprocessed image data through RefineNet-based intelligent recognition;
step SS4: index establishment, comprising: building indexes for the different areas recognized in the paper image data through a data dictionary and mapping the positional relations of the different area types (a minimal sketch of such an index follows this step list);
step SS5: character recognition, comprising: building a CRNN character recognition model consisting of a CNN layer, an RNN layer and a CTC layer; first, a convolutional neural network extracts image features to obtain an input feature sequence; an LSTM recurrent neural network then predicts over the feature sequence to capture more context information; finally, CTC is used as the loss function to solve the alignment problem of variable-length inputs;
step SS6: post-recognition verification, comprising: after intelligent character feature extraction, checking and correcting recognition errors on a front-end web interface.
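For illustration, the following is a minimal Python sketch, not taken from the patent, of the kind of data dictionary the index in step SS4 could use; the field names and the key format are assumptions introduced here:

```python
# Hypothetical data-dictionary index for detected layout regions (step SS4).
# All field names and the key format are illustrative assumptions.

from typing import TypedDict


class Region(TypedDict):
    region_type: str                    # "paragraph", "table", "caption", "page_number", "image"
    page: int                           # page index from the SS21 paging step
    bbox: tuple[int, int, int, int]     # (x, y, width, height) in pixels


def build_index(regions: list[Region]) -> dict[str, Region]:
    """Key each region as '<page>:<type>:<ordinal>' so paragraph text,
    table text, and page numbers can be fetched later for recognition."""
    index: dict[str, Region] = {}
    counters: dict[tuple[int, str], int] = {}
    for region in regions:
        group = (region["page"], region["region_type"])
        counters[group] = counters.get(group, 0) + 1
        index[f'{region["page"]}:{region["region_type"]}:{counters[group]}'] = region
    return index
```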
As a preferred embodiment, the step SS2 includes the following steps:
step SS21: paging the multi-page PDF image;
step SS22: image skew correction, namely applying a Hough transform to the photographed, skewed paper image to obtain a corrected image;
step SS23: image binarization, using one-dimensional maximum entropy threshold segmentation so that the quality of the input PDF image is improved as far as possible and the requirements of the subsequent automatic entry system on the input image are met.
As a preferred embodiment, the layout analysis in step SS3 adopts a RefineNet-based intelligent recognition method, whose framework consists of two modules, an ARM module and an ODM module, connected by TCBs. The loss function, shown below, has an ARM part and an ODM part: the ARM part comprises a binary classification loss $L_b$ and a regression loss $L_r$, and likewise the ODM part comprises a multi-class classification loss $L_m$ and a regression loss $L_r$:
$$L(\{p_i\},\{x_i\},\{c_i\},\{t_i\}) = \frac{1}{N_{arm}}\Big(\sum_i L_b\big(p_i,[l_i^{*}\ge 1]\big) + \sum_i [l_i^{*}\ge 1]\,L_r\big(x_i,g_i^{*}\big)\Big) + \frac{1}{N_{odm}}\Big(\sum_i L_m\big(c_i,l_i^{*}\big) + \sum_i [l_i^{*}\ge 1]\,L_r\big(t_i,g_i^{*}\big)\Big)$$

where $p_i$ and $x_i$ are the objectness confidence and regression coordinates of anchor $i$ in the ARM module; $c_i$ and $t_i$ are the class confidence and coordinate regression of the refined anchor in the ODM module; $N_{arm}$ and $N_{odm}$ are the numbers of positive samples in the batch; $l_i^{*}$ is the ground-truth class label of anchor $i$; and $g_i^{*}$ is the ground-truth position and size of the $i$-th anchor.
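The following PyTorch-style sketch shows, under stated assumptions, how the four terms could be combined; anchor matching is assumed to have happened elsewhere, and all tensor names (arm_cls, arm_loc, odm_cls, odm_loc) are invented here for illustration:

```python
# Illustrative combination of the two-branch loss above (not the patent's code).
# arm_cls: (N,) objectness logits, odm_cls: (N, C) class logits,
# arm_loc/odm_loc: (N, 4) box regressions, labels: (N,) long, 0 = background.

import torch
import torch.nn.functional as F


def refine_loss(arm_cls, arm_loc, odm_cls, odm_loc, labels, gt_boxes):
    pos = labels >= 1                                  # the indicator [l_i* >= 1]
    n_arm = pos.sum().clamp(min=1).float()             # N_arm
    n_odm = n_arm                                      # N_odm, positives in the batch

    # ARM part: binary classification loss L_b plus regression loss L_r on positives.
    l_b = F.binary_cross_entropy_with_logits(arm_cls, pos.float(), reduction="sum")
    l_r_arm = F.smooth_l1_loss(arm_loc[pos], gt_boxes[pos], reduction="sum")

    # ODM part: multi-class classification loss L_m plus refined regression L_r.
    l_m = F.cross_entropy(odm_cls, labels, reduction="sum")
    l_r_odm = F.smooth_l1_loss(odm_loc[pos], gt_boxes[pos], reduction="sum")

    return (l_b + l_r_arm) / n_arm + (l_m + l_r_odm) / n_odm
```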
As a preferred embodiment, the CRNN-based character recognition of step SS5 predicts the sequence through the following steps:
step SS51: CNN model design, using the convolutional and max-pooling layers of the VGG structure to extract features from the image sequence;
step SS52: RNN layer design, using a deep bidirectional recurrent neural network (Bi-LSTM) as the RNN layer, which produces one output for each input of the feature sequence supplied by the CNN layer;
step SS53: CTC layer design. Define the alphabet/syllable set of the sequence labelling task as $A$, and let $A'$ be the extended set with the blank character added. Let $y_k^t$ denote the probability that the CTC network outputs element $k$ at time $t$. Given an input sequence $x$ of length $T$, let $A'^T$ be the set of sequences of length $T$ over $A'$. Assuming the outputs at different times are conditionally independent given $x$, the probability of any path $\pi \in A'^T$ is

$$p(\pi \mid x) = \prod_{t=1}^{T} y_{\pi_t}^{t}, \qquad \forall \pi \in A'^{T}.$$

Let $l$ denote the output label sequence. Since multiple paths in $A'^T$ map to the same result, a function $B: A'^{T} \to A^{\le T}$ is defined to map the path set to the final prediction sequence. The probability of predicting the true label sequence is

$$p(l \mid x) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid x),$$

where $p(l \mid x)$ is the probability that the true label sequence is predicted.
As a preferred embodiment, the step SS51 further includes fine-tuning the VGG network: the kernel size of the third and fourth max-pooling layers is changed from 2 x 2 to 1 x 2, and a Batch Normalization layer is added after the fifth and sixth convolutional layers to speed up training.
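A hedged sketch of such a backbone follows; the channel counts are an assumption taken from the common CRNN configuration, and the rectangular pooling is written as a (2, 1) kernel in PyTorch's (height, width) convention, which halves the height only, matching the 1 x 2 window of the text's width-by-height notation:

```python
# Illustrative VGG-style CRNN backbone with the fine-tuning described above:
# pools 3 and 4 are rectangular, and BatchNorm follows convolutions 5 and 6.

import torch.nn as nn

crnn_cnn = nn.Sequential(
    nn.Conv2d(1, 64, 3, 1, 1), nn.ReLU(True), nn.MaxPool2d(2, 2),       # conv 1 + square pool
    nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(True), nn.MaxPool2d(2, 2),     # conv 2 + square pool
    nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(True),                        # conv 3
    nn.Conv2d(256, 256, 3, 1, 1), nn.ReLU(True),                        # conv 4
    nn.MaxPool2d((2, 1), (2, 1)),                                       # pool 3: rectangular
    nn.Conv2d(256, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True),   # conv 5 + BatchNorm
    nn.Conv2d(512, 512, 3, 1, 1), nn.BatchNorm2d(512), nn.ReLU(True),   # conv 6 + BatchNorm
    nn.MaxPool2d((2, 1), (2, 1)),                                       # pool 4: rectangular
    nn.Conv2d(512, 512, 2, 1, 0), nn.ReLU(True),                        # conv 7: height -> 1
)
```

For a 32-pixel-high text-line image this leaves a feature map of height 1, whose width positions form the feature sequence handed to the RNN layer.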
As a preferred embodiment, the step SS52 specifically includes: to prevent the gradient from vanishing during training and to use both the forward and backward information of the sequence for prediction, the deep bidirectional recurrent neural network Bi-LSTM controls the long-term state $c$ through three "gates", each of the form

$$g(x) = \sigma(Wx + b),$$

where $g(x)$ is the gate function, $\sigma$ is the sigmoid function, $W$ is the gate's weight vector, $b$ is a bias term, and $x$ is the input. Since $\sigma$ has range $(0, 1)$, each gate is partially open, between fully closed and fully open.

The first gate controls what is kept in the long-term state $c$ and is called the forget gate $f_t$:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$

where $W_f$ is the forget-gate weight matrix, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $b_f$ is a bias term.

The second gate controls how much of the instantaneous state enters the long-term state $c$ and is called the input gate $i_t$:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$

where $W_i$ is a weight matrix and $b_i$ is a bias term.

The candidate cell state $\tilde{c}_t$, which describes the contribution of the current input, is

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c),$$

where $W_c$ is a weight matrix and $b_c$ is a bias term.

The current state $c_t$ is the previous cell state $c_{t-1}$ multiplied element-wise by the forget gate $f_t$, plus the candidate cell state $\tilde{c}_t$ multiplied element-wise by the input gate $i_t$:

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t.$$

The forget gate thus controls the cell state so that early state information can be retained, while the input gate controls how much of the current input enters the memory.

The third gate, the output gate $o_t$, controls the effect of the long-term state on the current output:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$

where $W_o$ is a weight matrix and $b_o$ is a bias term.

The output gate and the cell state together determine the final output $h_t$ of the LSTM:

$$h_t = o_t \circ \tanh(c_t).$$
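A minimal sketch of such an RNN layer, with illustrative (assumed) sizes, could look as follows:

```python
# Illustrative Bi-LSTM RNN layer: one per-step class-score output for each
# input step of the CNN feature sequence. Sizes are assumptions.

import torch
import torch.nn as nn


class BiLSTMLayer(nn.Module):
    def __init__(self, in_features: int = 512, hidden: int = 256, num_classes: int = 100):
        super().__init__()
        self.rnn = nn.LSTM(in_features, hidden, num_layers=2, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)   # fuse forward and backward states

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (T, N, in_features) sequence from the CNN layer
        out, _ = self.rnn(features)                    # (T, N, 2 * hidden)
        return self.fc(out)                            # (T, N, num_classes), fed to the CTC layer


scores = BiLSTMLayer()(torch.randn(32, 4, 512))        # one output per input step: (32, 4, 100)
```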
As a preferred embodiment, the step SS6 specifically includes: framing the detected text area and adding an editable window so that recognition errors can be corrected.
Compared with the prior art, the invention has the following beneficial effects. Aiming at the poor acquisition quality and long processing time of existing paper image data entry, the invention provides a method for extracting features from the paper image data of test documents. The PDF image to be recognized is uploaded to the processing program. The effective information of the image is strengthened and redundant or invalid information is weakened through paging of the multi-page PDF data, skew correction, and image binarization. Paragraph areas, table areas, figure/table caption areas, page-number areas and image areas are detected through RefineNet-based intelligent recognition of the preprocessed image data. Indexes are built for the recognized areas through a data dictionary, mapping the positional relations of the different area types. A CRNN character recognition model is built, consisting of a CNN layer, an RNN layer and a CTC layer: the convolutional neural network first extracts image features to obtain an input feature sequence, the LSTM recurrent neural network then predicts over the feature sequence to capture more context information, and finally CTC serves as the loss function to solve the alignment problem of variable-length inputs. After intelligent character extraction, recognition errors are checked and corrected on a front-end web interface. On the premise of not sacrificing much precision, the method performs fast character extraction on paper image data under limited computing resources, solving as a whole the poor acquisition quality and long processing time of existing paper image data entry.
Drawings
FIG. 1 is a flow chart of a method for extracting paper image data features of a test document according to the present invention;
FIG. 2 is a CRNN network structure;
FIG. 3 is the verification view after recognition.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1: as shown in FIGS. 1, 2 and 3, the invention provides a method for extracting features from the paper image data of a test document, with the following specific steps:
and S1, uploading PDF images to be identified.
And S2, image preprocessing, namely performing paging processing, inclination correction and binarization processing on the uploaded image.
Specifically, in step S2, the following steps are further included:
step S21, paging processing is carried out on a plurality of pages of PDF images;
s22, correcting the image inclination, namely performing Hough transformation on the shot inclined papery image to obtain a corrected image;
and S23, carrying out image binarization processing, wherein one-dimensional maximum entropy threshold segmentation is adopted for image binarization, so that the quality of the input PDF image is improved to the greatest extent, and the requirement of a subsequent automatic input system on the input image is met.
Step S3: layout analysis, detecting the paragraph areas, table areas, figure/table caption areas and page-number areas of the preprocessed image according to the different area types.
Step S4: the detected areas of different types are mapped to corresponding identifiers to implement indexing of the different area types.
Step S5: CRNN-based character recognition. FIG. 2 shows the structure of the CRNN network, which comprises a CNN layer, an RNN layer and a CTC layer.
Specifically, the CNN layer performs feature extraction on the image sequence using the convolutional and max-pooling layers of the VGG structure, and the VGG network is fine-tuned as follows:
the kernel size of the third and fourth max-pooling layers is changed from 2 x 2 to 1 x 2;
a Batch Normalization layer is added after the fifth and sixth convolutional layers to speed up training.
The RNN layer adopts a deep bidirectional recurrent neural network (Bi-LSTM), producing one output for each input of the feature sequence supplied by the CNN layer. To prevent the gradient from vanishing during training and to use both forward and backward information of the sequence for prediction, the Bi-LSTM controls the long-term state $c$ through three "gates", each of the form

$$g(x) = \sigma(Wx + b),$$

where $g(x)$ is the gate function, $\sigma$ is the sigmoid function, $W$ is the gate's weight vector, $b$ is a bias term, and $x$ is the input. Since $\sigma$ has range $(0, 1)$, each gate is partially open, between fully closed and fully open.
The first "gate" controlling the saving of the long-term state c is called the forgetting gate f t
f t =σ(W f ·[h t-1 ,x t ]+b f )
Wherein, W f Is the forgetting gate weight matrix, [ h ] t-1 ,x t ]Is a combined matrix of the combined hidden layer and the current input, b f Is a weight matrix.
The second "gate" controlling the entry of the instantaneous state into the long-term state c, called entry gate i t
i t =σ(W i ·[h t-1 ,x t ]+b i )
Wherein, W i Is a weight matrix, [ h ] t-1 ,x t ]Is a combined matrix of the combined hidden layer and the current input, b i Is a bias term.
The third "gate" for describing the currently input cell state
Figure BDA0003713130520000081
The output of the long-term state c at the current LSTM is controlled.
Figure BDA0003713130520000082
Wherein, W c Is a weight matrix, [ h ] t-1 ,x t ]Is a combined matrix of the combined hidden layer and the current input, b c Is a bias term.
Current state c (t) State c represented as the last cell (t-1) Forgetting gate f by element t Plus the current input cell state
Figure BDA0003713130520000083
Multiply input Gate by element i t Obtaining:
Figure BDA0003713130520000091
the forgetting door controls the state of the unit, so that the early state information can be stored, and the input door controls the current input and controls the input quantity of the current state to enter the memory.
Output gate controls the effect of long-term conditions on the current output:
o (t) =σ(W o ·[h t-1 ,x t ]+b o )
wherein, W o Is a weight matrix, [ h ] t-1 ,x t ]Is a combined matrix of the combined hidden layer and the current input, b o Is the bias term.
The output gate and cell states together determine the final output h of the LSTM t
h t =o (t) ·tanh(c t )
The CTC layer is designed as follows. Define the alphabet/syllable set of the sequence labelling task as $A$, and let $A'$ be the extended set with the blank character added. Let $y_k^t$ denote the probability that the CTC network outputs element $k$ at time $t$. Given an input sequence $x$ of length $T$, let $A'^T$ be the set of sequences of length $T$ over $A'$. Assuming the outputs at different times are conditionally independent given $x$, the probability of any path $\pi \in A'^T$ is

$$p(\pi \mid x) = \prod_{t=1}^{T} y_{\pi_t}^{t}, \qquad \forall \pi \in A'^{T}.$$

Let $l$ denote the output label sequence. Since multiple paths in $A'^T$ map to the same result, a function $B: A'^{T} \to A^{\le T}$ is defined to map the path set to the final prediction sequence. The probability of predicting the true label sequence can be expressed as

$$p(l \mid x) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid x).$$
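For illustration, the standard greedy decoding that inverts the mapping B can be sketched as follows; the label ids and blank index are assumptions:

```python
# Greedy CTC decoding sketch: take the arg-max label per time step (upstream),
# then collapse consecutive repeats and drop blanks, i.e. apply the mapping B.

def ctc_greedy_decode(step_labels: list[int], blank: int = 0) -> list[int]:
    decoded: list[int] = []
    prev = None
    for k in step_labels:
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded


# With 'a' = 1 and 'b' = 2, the path [1, 1, 0, 1, 2, 2] collapses to "aab".
assert ctc_greedy_decode([1, 1, 0, 1, 2, 2]) == [1, 1, 2]
```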
Step S6: post-recognition verification, which mainly includes framing the detected text area and adding an editable window so that recognition errors can be corrected.
The invention provides a deep-learning-based intelligent detection and recognition technique for the image data of paper documents, capable of intelligent feature extraction from large numbers of scanned paper documents. The paper image data are input and converted into a picture sequence, and a RefineNet-based layout recognition algorithm detects the paragraph areas, table areas, figure/table caption areas and page-number areas in the document. For the detected paragraphs, tables, figure/table captions and page numbers, the CRNN-based intelligent recognition technique recognizes the characters. Manual verification is added after recognition, so that checking and error correction can be performed on a front-end web interface.
It should be noted that ARM is short for Anchor Refinement Module, ODM for Object Detection Module, and TCB for Transfer Connection Block.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (7)

1. A method for extracting paper image data features of a test document is characterized by comprising the following steps:
step SS1: uploading the image to be recognized, comprising: uploading the PDF image to be recognized to the processing program;
step SS2: image preprocessing, comprising: strengthening the effective information of the PDF image to be recognized and weakening redundant or invalid information; the preprocessing comprises paging of the input multi-page PDF data, skew correction, and image binarization;
step SS3: layout analysis, comprising: detecting the paragraph areas, table areas, figure/table caption areas, page-number areas and image areas from the preprocessed image data through RefineNet-based intelligent recognition;
step SS4: index establishment, comprising: building indexes for the different areas recognized in the paper image data through a data dictionary and mapping the positional relations of the different area types;
step SS5: character recognition, comprising: building a CRNN character recognition model consisting of a CNN layer, an RNN layer and a CTC layer; first, a convolutional neural network extracts image features to obtain an input feature sequence; an LSTM recurrent neural network then predicts over the feature sequence to capture more context information; finally, CTC is used as the loss function to solve the alignment problem of variable-length inputs;
step SS6: post-recognition verification, comprising: after intelligent character extraction, checking and correcting recognition errors on a front-end web interface.
2. The method for extracting paper image data features of a test document as claimed in claim 1, wherein the step SS2 comprises the following steps:
step SS21: paging the multi-page PDF image;
step SS22: image skew correction, namely applying a Hough transform to the photographed, skewed paper image to obtain a corrected image;
step SS23: image binarization, using one-dimensional maximum entropy threshold segmentation so that the quality of the input PDF image is improved as far as possible and the requirements of the subsequent automatic entry system on the input image are met.
3. The method for extracting paper image data features of a test document as claimed in claim 1, wherein the layout analysis in step SS3 comprises: a RefineNet-based intelligent recognition method, the RefineNet framework consisting of two modules, an ARM module and an ODM module, connected by TCBs; the loss function, shown below, has an ARM part and an ODM part: the ARM part comprises a binary classification loss $L_b$ and a regression loss $L_r$, and likewise the ODM part comprises a multi-class classification loss $L_m$ and a regression loss $L_r$:

$$L(\{p_i\},\{x_i\},\{c_i\},\{t_i\}) = \frac{1}{N_{arm}}\Big(\sum_i L_b\big(p_i,[l_i^{*}\ge 1]\big) + \sum_i [l_i^{*}\ge 1]\,L_r\big(x_i,g_i^{*}\big)\Big) + \frac{1}{N_{odm}}\Big(\sum_i L_m\big(c_i,l_i^{*}\big) + \sum_i [l_i^{*}\ge 1]\,L_r\big(t_i,g_i^{*}\big)\Big)$$

where $p_i$ and $x_i$ are the objectness confidence and regression coordinates of anchor $i$ in the ARM module; $c_i$ and $t_i$ are the class confidence and coordinate regression of the refined anchor in the ODM module; $N_{arm}$ and $N_{odm}$ are the numbers of positive samples in the batch; $l_i^{*}$ is the ground-truth class label of anchor $i$; and $g_i^{*}$ is the ground-truth position and size of the $i$-th anchor.
4. The method for extracting paper image data features of a test document as claimed in claim 1, wherein the CRNN-based character recognition of step SS5 predicts the sequence through the following steps:
step SS51: CNN model design, using the convolutional and max-pooling layers of the VGG structure to extract features from the image sequence;
step SS52: RNN layer design, using a deep bidirectional recurrent neural network (Bi-LSTM) as the RNN layer, which produces one output for each input of the feature sequence supplied by the CNN layer;
step SS53: CTC layer design: define the alphabet/syllable set of the sequence labelling task as $A$, and let $A'$ be the extended set with the blank character added; let $y_k^t$ denote the probability that the CTC network outputs element $k$ at time $t$; given an input sequence $x$ of length $T$, let $A'^T$ be the set of sequences of length $T$ over $A'$; assuming the outputs at different times are conditionally independent given $x$, the probability of any path $\pi \in A'^T$ is

$$p(\pi \mid x) = \prod_{t=1}^{T} y_{\pi_t}^{t}, \qquad \forall \pi \in A'^{T};$$

let $l$ denote the output label sequence; since multiple paths in $A'^T$ map to the same result, a function $B: A'^{T} \to A^{\le T}$ is defined to map the path set to the final prediction sequence; the probability of predicting the true label sequence is expressed as

$$p(l \mid x) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid x),$$

where $p(l \mid x)$ is the probability that the true label sequence is predicted.
5. The method for extracting paper image data features of a test document as claimed in claim 4, wherein the step SS51 further comprises fine-tuning the VGG network: the kernel size of the third and fourth max-pooling layers is changed from 2 x 2 to 1 x 2, and a Batch Normalization layer is added after the fifth and sixth convolutional layers to speed up training.
6. The method for extracting paper image data features of a test document as claimed in claim 4, wherein the step SS52 specifically comprises: to prevent the gradient from vanishing during training and to use both the forward and backward information of the sequence for prediction, the deep bidirectional recurrent neural network Bi-LSTM controls the long-term state $c$ through three "gates", each of the form

$$g(x) = \sigma(Wx + b),$$

where $g(x)$ is the gate function, $\sigma$ is the sigmoid function, $W$ is the gate's weight vector, $b$ is a bias term, and $x$ is the input; since $\sigma$ has range $(0, 1)$, each gate is partially open, between fully closed and fully open;

the first gate controls what is kept in the long-term state $c$ and is called the forget gate $f_t$:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$

where $W_f$ is the forget-gate weight matrix, $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input, and $b_f$ is a bias term;

the second gate controls how much of the instantaneous state enters the long-term state $c$ and is called the input gate $i_t$:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$

where $W_i$ is a weight matrix and $b_i$ is a bias term;

the candidate cell state $\tilde{c}_t$, which describes the contribution of the current input, is

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c),$$

where $W_c$ is a weight matrix and $b_c$ is a bias term;

the current state $c_t$ is the previous cell state $c_{t-1}$ multiplied element-wise by the forget gate $f_t$, plus the candidate cell state $\tilde{c}_t$ multiplied element-wise by the input gate $i_t$:

$$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t;$$

the forget gate thus controls the cell state so that early state information can be retained, while the input gate controls how much of the current input enters the memory;

the third gate, the output gate $o_t$, controls the effect of the long-term state on the current output:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$

where $W_o$ is a weight matrix and $b_o$ is a bias term;

the output gate and the cell state together determine the final output $h_t$ of the LSTM:

$$h_t = o_t \circ \tanh(c_t).$$
7. The method for extracting paper image data features of a test document as claimed in claim 1, wherein the step SS6 specifically comprises: framing the detected text area and adding an editable window so that recognition errors can be corrected.
CN202210725519.0A 2022-06-24 2022-06-24 Method for extracting paper image data features of test document Pending CN115546801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210725519.0A CN115546801A (en) 2022-06-24 2022-06-24 Method for extracting paper image data features of test document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210725519.0A CN115546801A (en) 2022-06-24 2022-06-24 Method for extracting paper image data features of test document

Publications (1)

Publication Number Publication Date
CN115546801A true CN115546801A (en) 2022-12-30

Family

ID=84724021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210725519.0A Pending CN115546801A (en) 2022-06-24 2022-06-24 Method for extracting paper image data features of test document

Country Status (1)

Country Link
CN (1) CN115546801A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434266A (en) * 2023-06-14 2023-07-14 邹城市人民医院 Automatic extraction and analysis method for data information of medical examination list
CN116434266B (en) * 2023-06-14 2023-08-18 邹城市人民医院 Automatic extraction and analysis method for data information of medical examination list


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination