CN111291672A - Method and device for combined image text recognition and fuzzy judgment and storage medium - Google Patents
- Publication number
- CN111291672A CN202010077341.4A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- text
- fuzzy
- image set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method, a device and a storage medium for combined image text recognition and fuzzy judgment (i.e., blur determination). In the method, a target model is obtained and a text image to be detected is input into it. The model uses a convolutional neural network shared by an image sequence recognition network and an image fuzzy judgment network: the high-dimensional feature image output by the convolutional neural network is input into both networks, so that they obtain it simultaneously and process image text recognition and image fuzzy judgment in parallel. By using the target model, the invention processes image text recognition and image fuzzy judgment in parallel, thereby further improving the recognition accuracy for text images.
Description
Technical Field
The invention relates to the technical field of text image processing, in particular to a method and a device for text recognition and fuzzy judgment of a combined image and a storage medium.
Background
Text information in a text image is relatively high-level semantic content within visual information and is important for understanding and obtaining visual content. Existing image text recognition technology is affected by the quality of the text image, so image fuzzy judgment (blur determination) technology often has to be applied first to preprocess the text images and filter out low-quality ones. However, because the image text recognition technology and the image blur determination technology each process text images independently, resources are easily wasted, the two tasks cannot share feature information with each other, and it is difficult to further improve the recognition accuracy of text images.
Disclosure of Invention
The invention provides a method, a device and a storage medium for combined image text recognition and fuzzy judgment, which overcome the defects of the prior art and can realize parallel processing of image text recognition and image fuzzy judgment by utilizing a target model so as to further improve the recognition precision of text images.
In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a combined image text recognition and blur determination method, including:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
acquiring a text image set and the real text information and real fuzzy probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence identification network, and enabling the image sequence identification network to output predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image fuzzy judgment network, and enabling the image fuzzy judgment network to output a prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
reversely inputting the identification error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and finishing training the initial model when the convolutional neural network is converged to obtain a target model;
and inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
Further, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence identification network comprises a sequence conversion network.
Further, before the acquiring the text image set and the real text information and the real fuzzy probability corresponding to the text image set, the method further includes:
acquiring text images, and labeling the real text information and the real fuzzy probability for each text image;
and dividing the labeled text image into the text image set.
Further, after acquiring the text image set and the actual text information and the actual fuzzy probability corresponding to the text image set, before inputting the text image set into the convolutional neural network, the method further includes:
preprocessing the text image set; wherein the preprocessing comprises data enhancement and data normalization.
Further, the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set, and the method includes:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, and enabling the LSTM network to output a characteristic sequence according to the input sequence;
and inputting the characteristic sequence into a decoding network with an attention mechanism, and enabling the decoding network to output the predicted text information according to the characteristic sequence.
Further, the image blur judgment network outputs the prediction blur probability corresponding to the text image set according to the high-dimensional feature image set, including:
performing dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly stretching the low-dimensional feature image set into input vectors;
inputting the input vector into a two-class network, and enabling the two-class network to output a target vector according to the input vector;
and converting the target vector into the prediction fuzzy probability through a softmax function.
Further, the two-class network is composed of three fully-connected layers.
In a second aspect, an embodiment of the present invention provides a combined image text recognition and blur determination apparatus, including:
the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
the convolutional neural network training module is used for acquiring text image sets and real text information and real fuzzy probability corresponding to the text image sets, inputting the text image sets into the convolutional neural network, and enabling the convolutional neural network to output high-dimensional feature image sets according to the text image sets;
the image sequence recognition network training module is used for inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;
the image fuzzy judgment network training module is used for inputting the high-dimensional characteristic image set into the image fuzzy judgment network so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional characteristic image set;
the network error calculation module is used for calculating the identification error of the image sequence identification network according to the real text information and the predicted text information and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
a target model obtaining module, configured to reversely input the identification error and the determination error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model;
and the text image detection module is used for inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where the computer program, when running, controls an apparatus where the computer-readable storage medium is located to execute the method for joint image text recognition and blur determination as described above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the method comprises the steps of inputting a text image to be detected into a target model by obtaining the target model, utilizing a convolutional neural network shared by an image sequence identification network and an image fuzzy judgment network, and respectively inputting a high-dimensional characteristic image output by the convolutional neural network into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process image text identification and image fuzzy judgment in parallel. The invention can realize parallel processing of image text recognition and image fuzzy judgment by utilizing the target model, thereby further improving the recognition precision of the text image.
Drawings
FIG. 1 is a flowchart illustrating a combined image text recognition and fuzzy determination method according to a first embodiment of the present invention;
FIG. 2 is a network architecture diagram of an initial model in a first embodiment of the invention;
FIG. 3 is a schematic flow chart of a preferred embodiment of the first embodiment of the present invention;
FIG. 4 is a schematic flow chart of another preferred embodiment of the first embodiment of the present invention;
FIG. 5 is a network configuration diagram of an image blur determination network in the first embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a combined image text recognition and blur determination apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, the step numbers in the text are only for convenience of explanation of the specific embodiments, and do not serve to limit the execution sequence of the steps. The method provided by the embodiment can be executed by the relevant server, and the server is taken as an example for explanation below.
Please refer to fig. 1-5.
As shown in fig. 1, the first embodiment provides a combined image text recognition and blur determination method, including steps S1 to S7:
S1. Construct an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network.
S2. Acquire a text image set and the real text information and real fuzzy probability corresponding to the text image set, and input the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set.
S3. Input the high-dimensional feature image set into the image sequence identification network, so that the image sequence identification network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set.
S4. Input the high-dimensional feature image set into the image fuzzy judgment network, so that the image fuzzy judgment network outputs the predicted fuzzy probability corresponding to the text image set according to the high-dimensional feature image set.
S5. Calculate the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculate the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability.
S6. Reversely input the identification error and the judgment error into the convolutional neural network, update the parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain the target model.
S7. Input the text image to be detected into the target model to obtain target text information and target fuzzy probability.
It should be noted that the identification error is a relative error between the real text information and the predicted text information, and the determination error is a relative error between the real fuzzy probability and the predicted fuzzy probability.
In a preferred embodiment of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
In step S1, the convolutional neural network is used as a shared network between the image sequence identification network and the image blur determination network by constructing the initial model, so that the image sequence identification network and the image blur determination network can simultaneously acquire the high-dimensional feature image set output by the convolutional neural network to process the image text identification and the image blur determination in parallel. The network structure of the initial model is shown in fig. 2.
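The sharing arrangement described for step S1 can be illustrated with a small sketch. It is only a structural illustration under stated assumptions: `make_joint_model` and the three toy functions are hypothetical stand-ins, not the networks of the embodiment, and plain Python lists stand in for feature maps. The point it shows is that the backbone runs once and both branches read the same features.

```python
from typing import Callable, List, Tuple

FeatureMap = List[float]  # stand-in for a high-dimensional feature map


def make_joint_model(
    backbone: Callable[[List[float]], FeatureMap],
    recognition_head: Callable[[FeatureMap], str],
    blur_head: Callable[[FeatureMap], float],
) -> Callable[[List[float]], Tuple[str, float]]:
    """Compose one shared backbone with two task-specific heads."""

    def forward(image: List[float]) -> Tuple[str, float]:
        features = backbone(image)         # computed once, shared by both heads
        text = recognition_head(features)  # text recognition branch
        blur_prob = blur_head(features)    # blur determination branch
        return text, blur_prob

    return forward


# Hypothetical toy components, used only to exercise the wiring.
calls = {"backbone": 0}


def toy_backbone(image: List[float]) -> FeatureMap:
    calls["backbone"] += 1
    return [2.0 * p for p in image]


def toy_recognition_head(features: FeatureMap) -> str:
    return "len=%d" % len(features)


def toy_blur_head(features: FeatureMap) -> float:
    return 1.0 / (1.0 + sum(features))


model = make_joint_model(toy_backbone, toy_recognition_head, toy_blur_head)
text, blur_prob = model([0.5, 0.25, 0.25])
```

In a real system the two heads could be evaluated concurrently; the sketch evaluates them one after the other for simplicity.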
In step S2, the text image set is input into the convolutional neural network, which outputs a high-dimensional feature image set according to the text image set. This trains the convolutional neural network and allows it to pass the high-dimensional feature image set to the image sequence recognition network and the image blur determination network.
In step S3, the high-dimensional feature image set is input into the image sequence recognition network, so that the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, thereby implementing the training of the image sequence recognition network, and facilitating improvement of the text recognition accuracy of the image sequence recognition network.
In step S4, the image blur determination network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set by inputting the high-dimensional feature image set into the image blur determination network, so as to implement training of the image blur determination network, which is beneficial to improving the blur determination accuracy of the image blur determination network.
In step S5, the prediction accuracy of the target model is further improved by calculating the recognition error of the image sequence recognition network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability, so as to subsequently optimize the initial model to obtain the target model.
In step S6, the identification error and the determination error are reversely input to the convolutional neural network, the parameters of the convolutional neural network are updated, and the training of the initial model is ended when the convolutional neural network converges, so as to obtain the target model, so that the image sequence identification network and the image blur determination network can jointly optimize and adjust the parameters of the convolutional neural network by learning respective tasks, which is beneficial to improving the prediction accuracy of the target model.
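The joint update in step S6 can be illustrated with a deliberately tiny example. This is a sketch under loud assumptions: the shared network is reduced to a single scalar weight `w`, each head is a fixed linear map, and squared error replaces the real task losses. None of these choices come from the embodiment; they only show how the gradients of two task errors add up on a shared parameter and how training stops at convergence.

```python
def joint_step(w, x, t_rec, t_blur, a=1.0, b=0.5, lr=0.05):
    """One gradient step on a shared weight w driven by two task errors."""
    f = w * x                      # shared feature (stand-in for the CNN output)
    err_rec = a * f - t_rec        # error of the recognition branch
    err_blur = b * f - t_blur      # error of the blur branch
    loss = err_rec ** 2 + err_blur ** 2
    # Both errors are propagated back to the same parameter and summed.
    grad_w = 2.0 * err_rec * a * x + 2.0 * err_blur * b * x
    return w - lr * grad_w, loss


def train(w=0.0, x=1.0, t_rec=2.0, t_blur=1.0, tol=1e-10, max_steps=1000):
    """Repeat joint steps until the loss stops changing (assumed convergence test)."""
    prev = float("inf")
    loss = prev
    for _ in range(max_steps):
        w, loss = joint_step(w, x, t_rec, t_blur)
        if abs(prev - loss) < tol:
            break
        prev = loss
    return w, loss
```

With the default toy targets, both errors vanish at the same value of `w`, so the two tasks cooperate; with conflicting targets the shared weight would settle on a compromise.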
In step S7, the target text information and the target fuzzy probability are obtained by inputting the text image to be detected into the target model, and parallel processing of image text recognition and image fuzzy judgment can be realized by using the target model, thereby further improving the recognition accuracy of the text image.
In this embodiment, the acquired text image set is first input into the convolutional neural network, which outputs a high-dimensional feature image set according to the text image set. The high-dimensional feature image set is then input into the image sequence identification network and the image fuzzy judgment network respectively, which output the predicted text information and the predicted fuzzy probability corresponding to the text image set. Next, the identification error of the image sequence identification network is calculated from the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated from the real fuzzy probability and the predicted fuzzy probability. The identification error and the judgment error are reversely input into the convolutional neural network to update its parameters, and training of the initial model ends when the convolutional neural network converges, yielding the target model. Finally, the text image to be detected is input into the target model to obtain the target text information and the target fuzzy probability.
In the embodiment, the target model is obtained, the text image to be detected is input into the target model, the convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network is utilized, and the high-dimensional characteristic image output by the convolutional neural network is respectively input into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process the image text identification and the image fuzzy judgment in parallel.
According to the embodiment, the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, so that the recognition accuracy of the text image is further improved.
In a preferred embodiment, before acquiring the text image set and the real text information and real fuzzy probability corresponding to the text image set, step S2 further includes: collecting text images and labeling each text image with real text information and a real fuzzy probability; and grouping the labeled text images into the text image set.
In a preferred implementation of this embodiment, the labels are determined by voting: the votes cast for the text information and for the fuzzy probability of each text image are counted, the text information with the most votes is taken as the corresponding real text information, and the fuzzy probability with the most votes is taken as the corresponding real fuzzy probability.
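The voting rule can be sketched in a few lines. The function name `majority_label` and the tie-breaking behaviour (the first label to reach the top count wins) are assumptions of this sketch; the text does not say how ties are resolved.

```python
from collections import Counter


def majority_label(votes):
    """Return the label that received the most votes."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]


# Hypothetical annotations from three labelers for one text image.
real_text = majority_label(["invoice", "invoice", "invo1ce"])
real_blur = majority_label([1, 0, 1])  # 1 = blurred, 0 = sharp
```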
In the embodiment, before the text image set is obtained, each text image in the text image set is labeled with the real text information and the real fuzzy probability, so that the identification error of the image sequence identification network and the judgment error of the image fuzzy judgment network are calculated according to the real text information and the real fuzzy probability, and the initial model is optimized to obtain the target model.
In a preferred embodiment, the step S2, after obtaining the text image set and the corresponding true text information and true fuzzy probability of the text image set, before inputting the text image set into the convolutional neural network, further includes: preprocessing a text image set; wherein the preprocessing comprises data enhancement and data normalization.
In the embodiment, before the text image set is input into the convolutional neural network, the text image set is subjected to preprocessing such as data enhancement and data normalization, so that the identification precision of the text image is further improved.
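The embodiment does not specify which normalization is used. As one possible illustration, the sketch below assumes 8-bit pixel values that are scaled to [0, 1] and then shifted and scaled with a mean and standard deviation of 0.5; the function name and both constants are assumptions, and data enhancement is left out because the text gives no details about it.

```python
def normalize(pixels, mean=0.5, std=0.5):
    """Map 8-bit pixel values to roughly [-1, 1] (assumed scheme)."""
    return [((p / 255.0) - mean) / std for p in pixels]


normalized = normalize([0, 255])
```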
As shown in fig. 3, in a preferred embodiment, step S3 includes steps S31-S33:
S31. Slice the high-dimensional feature image set to obtain an input sequence.
S32. Input the input sequence into the LSTM network, so that the LSTM network outputs a feature sequence according to the input sequence.
S33. Input the feature sequence into a decoding network with an attention mechanism, so that the decoding network outputs the predicted text information according to the feature sequence.
Take a high-dimensional feature map as an example.
After the convolutional neural network outputs a 12 × 25 × 512 high-dimensional feature map for an input text image, the feature map is sliced along its width direction. The resulting 25 × 6144 sequence is input as the input sequence into a bidirectional LSTM network (i.e., the recurrent layer), which outputs a 25 × 512 feature sequence; this feature sequence is then input into a decoding network provided with an attention mechanism.
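The slicing step can be checked with a short sketch: a 12 × 25 × 512 map has 25 columns, and each column holds 12 × 512 = 6144 values. The function name and the order in which the rows of a column are concatenated are assumptions of this sketch; nested Python lists stand in for a tensor.

```python
def slice_along_width(feature_map):
    """Turn feature_map[h][w][c] into W vectors, each of length H * C."""
    height = len(feature_map)
    width = len(feature_map[0])
    return [
        [value for h in range(height) for value in feature_map[h][w]]
        for w in range(width)
    ]


H, W, C = 12, 25, 512
feature_map = [[[0.0] * C for _ in range(W)] for _ in range(H)]
sequence = slice_along_width(feature_map)
sequence_shape = (len(sequence), len(sequence[0]))
```

Each of the 25 vectors corresponds to one vertical strip of the image, which is what lets the recurrent layer read the text from left to right.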
The basic idea of the attention mechanism is to retain the intermediate outputs of the LSTM encoder over the input sequence, train a model to learn selectively from those outputs, and associate the output sequence with the input sequence when the model produces its output. It is implemented as follows:
Input: $c = \{c_1, c_2, \dots, c_i, \dots, c_L\},\ L = 25$ (1)
In formula (1), $c_i$ represents the feature at a given spatial location computed by the LSTM network.
The process is as follows:
Contextual attention parameter $e$: $e_i = f_{ATT}(h, c_i)$ (2)
In formulas (2) to (4), $f_{ATT}$ is the attention function, and $h$ represents a hidden state parameter of the multilayer network.
Output: $c_t$.
In formula (5), $M$ represents the maximum length of the output sequence, $N$ represents the number of samples participating in training, $K$ represents the number of classes to be classified, $b_{i,j}$ represents network parameters, $x$ is a feature vector of the network, and $s_{i,j}$ represents the softmax output of the $j$-th training sample at the $i$-th position.
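Only formulas (1) and (2) appear in the text, so the following sketch has to assume the rest. It assumes that the scores $e_i$ are normalized with a softmax and that the output $c_t$ is the resulting weighted sum of the $c_i$, which is the common form of attention but is not confirmed by the source. The dot product used for `f_att` is likewise a placeholder for whatever network the embodiment uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h, c, f_att):
    """Score each c_i against h, normalize, and return (weights, context)."""
    e = [f_att(h, c_i) for c_i in c]   # formula (2): e_i = f_ATT(h, c_i)
    alpha = softmax(e)                 # assumed normalization step
    dim = len(c[0])
    c_t = [sum(a * c_i[d] for a, c_i in zip(alpha, c)) for d in range(dim)]
    return alpha, c_t


def dot(h, c_i):
    """Placeholder scoring function (assumption, not from the source)."""
    return sum(a * b for a, b in zip(h, c_i))


alpha, c_t = attend([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], dot)
```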
In another preferred embodiment, as shown in FIG. 4, step S4 includes steps S41-S43:
S41. Perform dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and stretch the low-dimensional feature image set into an input vector.
S42. Input the input vector into the two-class network, so that the two-class network outputs the target vector according to the input vector.
S43. Convert the target vector into the predicted fuzzy probability through a softmax function.
In a preferred implementation of this embodiment, the two-class network consists of three fully-connected layers.
Take a high-dimensional feature map as an example. A network structure diagram of the image blur determination network is shown in fig. 5.
After the convolutional neural network outputs 12 × 25 × 512 high-dimensional feature maps according to the input text images, the high-dimensional feature maps are input into a 1 × 1 convolution layer, the high-dimensional feature map set is subjected to dimensionality reduction by the 1 × 1 convolution layer, namely, the 12 × 25 × 512 high-dimensional feature maps are processed into 12 × 25 × 256 feature maps, and the 12 × 25 × 256 feature maps are stretched into 1 × 76800 vectors to serve as input vectors.
And inputting the input vector into a two-classification network consisting of three fully-connected layers, wherein the dimensionality is 1 x 768 after the input vector passes through a first fully-connected layer, the dimensionality is 1 x 128 after the input vector passes through a second fully-connected layer, and the dimensionality is 1 after the input vector passes through a third fully-connected layer, so that the target vector is output. If the target vector is 0, a sharp image is represented, and if the target vector is 1, a blurred image is represented.
And converting the target vector into probability through a softmax function, and taking the output probability as the prediction fuzzy probability. When the prediction blur probability is larger, the corresponding text image is more likely to be a blurred image.
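The dimension bookkeeping and the final conversion can be sketched as follows. One point is an assumption made here rather than a statement of the source: the text gives the last fully-connected layer a dimension of 1, but a softmax over a single value is always 1, so the sketch uses two logits (sharp, blurred) to make the softmax meaningful. The names `flattened_length`, `blur_probability` and `LAYER_SIZES` are hypothetical.

```python
import math


def flattened_length(height, width, channels):
    """Length of the vector obtained by stretching an H x W x C feature map."""
    return height * width * channels


def blur_probability(logits):
    """Softmax over [sharp_logit, blurred_logit]; returns P(blurred)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    return exps[1] / sum(exps)


# 12 x 25 x 256 map after the 1 x 1 convolution, then the three layers.
# The final size of 2 is an assumption of this sketch (see the note above).
LAYER_SIZES = [flattened_length(12, 25, 256), 768, 128, 2]

p_blurred = blur_probability([0.0, 5.0])
```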
The image fuzzy judgment network is then trained by feeding back the error computed with a cross-entropy function.
The loss function $L_{blur}$ of the image fuzzy judgment network is:
$L_{blur} = -\left(y \log(y_p) + (1 - y)\log(1 - y_p)\right)$ (6)
In formula (6), $y_p$ represents the predicted fuzzy probability and $y$ represents the real fuzzy probability.
The loss function of the initial model is: $loss = L + L_{blur}$ (7)
In formula (7), $L$ represents the loss function of the image sequence recognition network, and $L_{blur}$ represents the loss function of the image fuzzy judgment network.
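Formulas (6) and (7) can be written out directly, reading formula (6) as the standard binary cross-entropy. The clamping constant `eps` is an addition of this sketch to avoid taking the logarithm of zero, and the recognition loss is passed in as a plain number because its formula is not given in the text.

```python
import math


def blur_loss(y, y_p, eps=1e-12):
    """Binary cross-entropy between true label y and predicted probability y_p."""
    y_p = min(max(y_p, eps), 1.0 - eps)  # clamp to keep log() finite (assumption)
    return -(y * math.log(y_p) + (1.0 - y) * math.log(1.0 - y_p))


def total_loss(recognition_loss, y, y_p):
    """Formula (7): sum of the recognition loss and the blur loss."""
    return recognition_loss + blur_loss(y, y_p)


example = total_loss(1.0, 1.0, 0.5)
```

The two terms are added with equal weight, as in formula (7); a weighted sum would be a different design and is not described in the source.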
And then, the image sequence identification network and the image fuzzy judgment network reversely input the identification error and the judgment error into the convolutional neural network respectively, update the parameters of the convolutional neural network, finish training the initial model when the convolutional neural network is converged, and take the derived optimal model as a target model.
Please refer to fig. 6.
As shown in FIG. 6, the second embodiment provides a joint image text recognition and blur determination device including:
an initial model building module 21, configured to build an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
a convolutional neural network training module 22, configured to acquire a text image set and the real text information and real fuzzy probability corresponding to the text image set, and input the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set;
an image sequence recognition network training module 23, configured to input the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
an image fuzzy judgment network training module 24, configured to input the high-dimensional feature image set into the image fuzzy judgment network, so that the image fuzzy judgment network outputs a predicted fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
a network error calculation module 25, configured to calculate the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculate the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
a target model obtaining module 26, configured to reversely input the identification error and the judgment error into the convolutional neural network, update the parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model;
and a text image detection module 27, configured to input the text image to be detected into the target model, so as to obtain the target fuzzy probability and the target text information.
It should be noted that the identification error is the relative error between the real text information and the predicted text information, and the judgment error is the relative error between the real fuzzy probability and the predicted fuzzy probability.
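As an illustration of the two error terms, the following minimal sketch computes a relative error for each branch. The text does not state how a relative error between two text strings is measured, so measuring it as edit distance divided by the length of the real text is an assumption made here, as are all function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def identification_error(real_text: str, predicted_text: str) -> float:
    # Assumption: relative error = edit distance / length of the real text.
    if not real_text:
        return 1.0 if predicted_text else 0.0
    return edit_distance(real_text, predicted_text) / len(real_text)


def judgment_error(real_prob: float, predicted_prob: float,
                   eps: float = 1e-8) -> float:
    # Relative error between the real and the predicted fuzzy probability;
    # eps (hypothetical) avoids division by zero when real_prob is 0.
    return abs(real_prob - predicted_prob) / max(abs(real_prob), eps)
```

A practical training setup would more likely use differentiable losses for both branches; the functions above only make the "relative error" wording concrete.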
In a preferred implementation of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
The initial model is constructed by the initial model construction module 21, with the convolutional neural network serving as a network shared by the image sequence identification network and the image fuzzy judgment network. Both networks can therefore obtain the high-dimensional feature image set output by the convolutional neural network at the same time, so that image text recognition and image fuzzy judgment are processed in parallel.
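The shared-network arrangement can be sketched as a model whose feature extractor runs once per image and whose output is handed to both branches. This is only a structural sketch: `JointModel` and the three toy functions are hypothetical stand-ins and do not implement the convolutional, sequence recognition or fuzzy judgment networks described here.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Features = List[float]


class JointModel:
    """Shared backbone whose output feeds two independent branches."""

    def __init__(self,
                 backbone: Callable[[Image], Features],
                 recognizer: Callable[[Features], str],
                 blur_judge: Callable[[Features], float]) -> None:
        self.backbone = backbone
        self.recognizer = recognizer
        self.blur_judge = blur_judge
        self.backbone_calls = 0  # counts how often features are computed

    def forward(self, image: Image) -> Tuple[str, float]:
        self.backbone_calls += 1
        features = self.backbone(image)    # computed once per image
        text = self.recognizer(features)   # branch 1: text recognition
        blur = self.blur_judge(features)   # branch 2: fuzzy (blur) judgment
        return text, blur


# Hypothetical stand-ins so the sketch runs; not the networks of the patent.
def toy_backbone(image: Image) -> Features:
    # Column means as a crude "feature" vector.
    return [sum(col) / len(col) for col in zip(*image)]


def toy_recognizer(features: Features) -> str:
    return "".join("1" if v > 0.5 else "0" for v in features)


def toy_blur_judge(features: Features) -> float:
    # Low contrast between features is treated as "more blurred".
    return 1.0 - (max(features) - min(features))


model = JointModel(toy_backbone, toy_recognizer, toy_blur_judge)
text, blur = model.forward([[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
```

Because neither branch depends on the other's output, the two calls inside `forward` could run concurrently; they are written sequentially here only for brevity.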
Through the convolutional neural network training module 22, the text image set is input into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set. The convolutional neural network is thereby trained, and the high-dimensional feature image set it outputs is passed on to the image sequence identification network and the image fuzzy judgment network.
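Claim 4 adds a preprocessing step (data enhancement and data normalization) before the text image set enters the convolutional neural network. The claim does not name specific operations, so the sketch below assumes zero-mean, unit-variance normalization and a random brightness shift as one possible enhancement; both choices and all names are illustrative.

```python
import random
import statistics
from typing import List

Image = List[List[float]]


def normalize(image: Image) -> Image:
    """Scale pixel values to zero mean and unit variance (assumed scheme)."""
    pixels = [p for row in image for p in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels) or 1.0  # constant image: avoid dividing by 0
    return [[(p - mean) / std for p in row] for row in image]


def augment_brightness(image: Image, rng: random.Random,
                       max_shift: float = 20.0) -> Image:
    """Shift all pixels by one random offset, clipped to the range [0, 255]."""
    shift = rng.uniform(-max_shift, max_shift)
    return [[min(255.0, max(0.0, p + shift)) for p in row] for row in image]


rng = random.Random(0)  # fixed seed so the example is repeatable
raw = [[0.0, 255.0], [0.0, 255.0]]
prepared = normalize(augment_brightness(raw, rng))
```

Enhancement is applied before normalization here so that the network always sees inputs on the same scale.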
The image sequence recognition network training module 23 inputs the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, the image sequence recognition network is trained, and the text recognition accuracy of the image sequence recognition network is improved.
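Claim 5 states that the image sequence recognition network first slices the high-dimensional feature image set into an input sequence before passing it to an LSTM network and an attention-based decoding network. A minimal sketch of the slicing step follows. The slicing direction is not specified, so column-wise slicing from left to right (one time step per column) is an assumption, and the LSTM and decoder are not shown.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def slice_feature_map(feature_map: FeatureMap) -> List[List[float]]:
    """Turn a C x H x W feature map into W vectors of length C * H."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    sequence = []
    for x in range(width):  # one time step per column, left to right
        step = [feature_map[c][y][x]
                for c in range(channels)
                for y in range(height)]
        sequence.append(step)
    return sequence


# Toy feature map with 2 channels, 2 rows and 3 columns.
fm = [[[1, 2, 3], [4, 5, 6]],
      [[7, 8, 9], [10, 11, 12]]]
seq = slice_feature_map(fm)
```

Each element of `seq` would then be one input step for the recurrent network, which is why the sequence length equals the width of the feature map.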
The high-dimensional feature image set is input into the image fuzzy judgment network through the image fuzzy judgment network training module 24, so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set, the image fuzzy judgment network is trained, and the fuzzy judgment accuracy of the image fuzzy judgment network is improved.
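Claims 6 and 7 describe the image fuzzy judgment network as dimensionality reduction, stretching into an input vector, a two-class network of three fully connected layers, and a softmax. The sketch below follows those steps under stated assumptions: 2x2 average pooling as the reduction, ReLU between layers, random untrained weights, and index 1 of the softmax output taken as the "blurred" class. None of these details come from the text.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias)


def avg_pool_2x2(channel: Matrix) -> Matrix:
    """Assumed dimensionality reduction: non-overlapping 2x2 average pooling."""
    return [[(channel[y][x] + channel[y][x + 1]
              + channel[y + 1][x] + channel[y + 1][x + 1]) / 4.0
             for x in range(0, len(channel[0]) - 1, 2)]
            for y in range(0, len(channel) - 1, 2)]


def dense(x: List[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(t - m) for t in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)], [0.0] * n_out)


def blur_probability(feature_map: List[Matrix], layers: List[Layer]) -> float:
    pooled = [avg_pool_2x2(ch) for ch in feature_map]      # reduce dimensions
    x = [v for ch in pooled for row in ch for v in row]    # stretch to a vector
    for layer in layers[:-1]:
        x = [max(0.0, v) for v in dense(x, layer)]         # hidden layer + ReLU
    target_vector = dense(x, layers[-1])                   # two output values
    return softmax(target_vector)[1]                       # assumed "blurred" class


rng = random.Random(0)
layers = [init_layer(8, 6, rng), init_layer(6, 4, rng), init_layer(4, 2, rng)]
feature_map = [[[rng.random() for _ in range(4)] for _ in range(4)]
               for _ in range(2)]  # 2 channels of size 4 x 4
p_blur = blur_probability(feature_map, layers)
```

With untrained weights the value of `p_blur` carries no meaning; the example only shows how the shapes move from a 2 x 4 x 4 feature map to a single probability.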
Through the network error calculation module 25, the identification error of the image sequence identification network is calculated according to the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated according to the real fuzzy probability and the predicted fuzzy probability, so that the initial model is optimized subsequently to obtain the target model, and the prediction accuracy of the target model is further improved.
Through the target model obtaining module 26, the identification error and the judgment error are reversely input into the convolutional neural network, the parameters of the convolutional neural network are updated, and training of the initial model ends when the convolutional neural network converges, yielding the target model. In this way, the image sequence identification network and the image fuzzy judgment network jointly optimize and adjust the parameters of the convolutional neural network through their respective learning tasks, which improves the prediction accuracy of the target model.
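The joint update can be illustrated with a deliberately tiny model in which a single shared parameter `w` plays the role of the convolutional neural network. Both branch errors contribute to the gradient of `w`, and training stops once the loss no longer changes. The squared-error stand-in for the recognition branch, the cross-entropy for the blur branch, the learning rate and the stopping tolerance are all assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, int]  # (input, recognition target, blur label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def joint_step(w: float, samples: List[Sample],
               lr: float = 0.1) -> Tuple[float, float]:
    """One gradient step on the shared parameter using both branch errors."""
    grad, loss = 0.0, 0.0
    for x, y_rec, y_blur in samples:
        h = w * x            # shared feature
        rec = h              # recognition branch (stand-in: identity head)
        p = sigmoid(h)       # blur branch
        loss += 0.5 * (rec - y_rec) ** 2
        loss -= y_blur * math.log(p) + (1 - y_blur) * math.log(1.0 - p)
        grad += (rec - y_rec) * x  # gradient of the recognition error w.r.t. w
        grad += (p - y_blur) * x   # gradient of the judgment error w.r.t. w
    n = len(samples)
    return w - lr * grad / n, loss / n  # loss is measured before the update


def train(samples: List[Sample], w: float = 0.0,
          tol: float = 1e-6, max_steps: int = 1000) -> Tuple[float, float]:
    prev = float("inf")
    loss = prev
    for _ in range(max_steps):
        w, loss = joint_step(w, samples, lr=0.1)
        if abs(prev - loss) < tol:  # treated as convergence
            break
        prev = loss
    return w, loss


samples = [(1.0, 2.0, 1), (-1.0, -2.0, 0), (0.5, 1.0, 1)]
_, initial_loss = joint_step(0.0, samples)
w_final, final_loss = train(samples)
```

The recognition targets alone would pull `w` to 2.0; the blur labels push it slightly higher, so the converged value reflects both tasks, which is the point of sharing the parameter.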
The text image to be detected is input into the target model through the text image detection module 27 to obtain target text information and target fuzzy probability, and parallel processing of image text recognition and image fuzzy judgment can be realized by using the target model, so that the recognition accuracy of the text image is further improved.
In this embodiment, after the initial model is constructed by the initial model construction module 21, the obtained text image set is input into the convolutional neural network by the convolutional neural network training module 22, so that the convolutional neural network outputs the high-dimensional feature image set according to the text image set. The high-dimensional feature image set is then input into the image sequence identification network and the image blur determination network by the image sequence identification network training module 23 and the image blur determination network training module 24, respectively, so that the two networks output the predicted text information and the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set. Next, the network error calculation module 25 calculates the identification error of the image sequence identification network according to the real text information and the predicted text information, and the judgment error of the image blur determination network according to the real blur probability and the predicted blur probability. The target model obtaining module 26 reversely inputs the identification error and the judgment error into the convolutional neural network to update the parameters of the convolutional neural network, and ends training of the initial model when the convolutional neural network converges, to obtain the target model. Finally, the text image to be detected is input into the target model by the text image detection module 27 to obtain the target text information and the target blur probability.
In the embodiment, the target model is obtained, the text image to be detected is input into the target model, the convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network is utilized, and the high-dimensional characteristic image output by the convolutional neural network is respectively input into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process the image text identification and the image fuzzy judgment in parallel.
According to the embodiment, the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, so that the recognition accuracy of the text image is further improved.
A third embodiment provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for joint image text recognition and blur determination according to the first embodiment, and the same beneficial effects can be achieved.
In summary, the embodiment of the present invention has the following advantages:
By obtaining the target model and inputting the text image to be detected into it, the embodiment uses the convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network, and inputs the high-dimensional feature image output by the convolutional neural network into both networks, so that the image fuzzy judgment network and the image sequence identification network obtain the high-dimensional feature image simultaneously and process image text recognition and image fuzzy judgment in parallel. The target model can thus realize parallel processing of image text recognition and image fuzzy judgment, which further improves the recognition accuracy of text images.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Claims (9)
1. A combined image text recognition and fuzzy judgment method is characterized by comprising the following steps:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
acquiring a text image set and the real text information and real fuzzy probability corresponding to the text image set, and inputting the text image set into the convolutional neural network to enable the convolutional neural network to output a high-dimensional feature image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence identification network, and enabling the image sequence identification network to output predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image fuzzy judgment network, and enabling the image fuzzy judgment network to output a prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
reversely inputting the identification error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and finishing training the initial model when the convolutional neural network is converged to obtain a target model;
and inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
2. The joint image text recognition and blur determination method of claim 1, wherein the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
3. The method for joint image text recognition and fuzzy judgment of claim 1, further comprising, before the obtaining of the true text information and the true fuzzy probability corresponding to the text image set, the steps of:
acquiring text images, and labeling the real text information and the real fuzzy probability for each text image;
and dividing the labeled text image into the text image set.
4. The method for joint image text recognition and blur determination of claim 1, wherein after acquiring the text image set and the true text information and true blur probability corresponding to the text image set, and before inputting the text image set into the convolutional neural network, the method further comprises:
preprocessing the text image set; wherein the preprocessing comprises data enhancement and data normalization.
5. The method for joint image text recognition and fuzzy judgment of claim 1, wherein the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, comprising:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, and enabling the LSTM network to output a characteristic sequence according to the input sequence;
and inputting the characteristic sequence into a decoding network with an attention mechanism, and enabling the decoding network to output the predicted text information according to the characteristic sequence.
6. The method for joint image text recognition and fuzzy judgment of claim 1, wherein the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set, comprising:
performing dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly stretching the low-dimensional feature image set into input vectors;
inputting the input vector into a two-class network, and enabling the two-class network to output a target vector according to the input vector;
and converting the target vector into the prediction fuzzy probability through a softmax function.
7. The joint image text recognition and fuzzy judgment method of claim 6, wherein the two-class network consists of three fully connected layers.
8. A combined image text recognition and blur determination device, comprising:
the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
the convolutional neural network training module is used for acquiring text image sets and real text information and real fuzzy probability corresponding to the text image sets, inputting the text image sets into the convolutional neural network, and enabling the convolutional neural network to output high-dimensional feature image sets according to the text image sets;
the image sequence recognition network training module is used for inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;
the image fuzzy judgment network training module is used for inputting the high-dimensional characteristic image set into the image fuzzy judgment network so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional characteristic image set;
the network error calculation module is used for calculating the identification error of the image sequence identification network according to the real text information and the predicted text information and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
a target model obtaining module, configured to reversely input the identification error and the determination error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model;
and the text image detection module is used for inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of joint image text recognition and blur determination of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010077341.4A CN111291672B (en) | 2020-01-22 | 2020-01-22 | Combined image text recognition and fuzzy judgment method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010077341.4A CN111291672B (en) | 2020-01-22 | 2020-01-22 | Combined image text recognition and fuzzy judgment method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111291672A (en) | 2020-06-16 |
CN111291672B (en) | 2023-05-12 |
Family
ID=71021436
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010077341.4A Active CN111291672B (en) | 2020-01-22 | 2020-01-22 | Combined image text recognition and fuzzy judgment method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111291672B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019057169A1 (en) * | 2017-09-25 | 2019-03-28 | 腾讯科技(深圳)有限公司 | Text detection method, storage medium, and computer device |
CN110188819A (en) * | 2019-05-29 | 2019-08-30 | 电子科技大学 | A kind of CNN and LSTM image high-level semantic understanding method based on information gain |
CN110543844A (en) * | 2019-08-26 | 2019-12-06 | 中电科大数据研究院有限公司 | metadata extraction method for government affair metadata PDF file |
Non-Patent Citations (1)
Title |
---|
Lin Shuolei (林硕蕾): "Massive information mining model based on parameter fuzzy judgment" (基于参数模糊判断的海量信息挖掘模型) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881881A (en) * | 2020-08-10 | 2020-11-03 | 晶璞(上海)人工智能科技有限公司 | Machine intelligent text recognition credibility judgment method based on multiple dimensions |
CN113486858A (en) * | 2021-08-03 | 2021-10-08 | 济南博观智能科技有限公司 | Face recognition model training method and device, electronic equipment and storage medium |
CN113486858B (en) * | 2021-08-03 | 2024-01-23 | 济南博观智能科技有限公司 | Face recognition model training method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111291672B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108376132B (en) | Method and system for judging similar test questions | |
CN113591546B (en) | Semantic enhancement type scene text recognition method and device | |
CN110033008B (en) | Image description generation method based on modal transformation and text induction | |
CN113010656B (en) | Visual question-answering method based on multi-mode fusion and structural control | |
CN116861014B (en) | Image information extraction method and device based on pre-training language model | |
CN111259940A (en) | Target detection method based on space attention map | |
CN112633420B (en) | Image similarity determination and model training method, device, equipment and medium | |
CN111291672A (en) | Method and device for combined image text recognition and fuzzy judgment and storage medium | |
CN111046771A (en) | Training method of network model for recovering writing track | |
CN112966626A (en) | Face recognition method and device | |
CN114022697A (en) | Vehicle re-identification method and system based on multitask learning and knowledge distillation | |
CN115482385A (en) | Semantic segmentation self-adaptive knowledge distillation method based on channel features | |
CN114694255B (en) | Sentence-level lip language recognition method based on channel attention and time convolution network | |
CN114283325A (en) | Underwater target identification method based on knowledge distillation | |
CN110503090B (en) | Character detection network training method based on limited attention model, character detection method and character detector | |
CN115048870A (en) | Target track identification method based on residual error network and attention mechanism | |
CN109685823B (en) | Target tracking method based on deep forest | |
CN115797808A (en) | Unmanned aerial vehicle inspection defect image identification method, system, device and medium | |
CN112560440A (en) | Deep learning-based syntax dependence method for aspect-level emotion analysis | |
Huang et al. | Flow of renyi information in deep neural networks | |
CN114357166A (en) | Text classification method based on deep learning | |
Hallyal et al. | Optimized recognition of CAPTCHA through attention models | |
Cao et al. | Separable-programming based probabilistic-iteration and restriction-resolving correlation filter for robust real-time visual tracking | |
Feng et al. | Research on optimization method of convolutional nerual network | |
CN116050391B (en) | Speech recognition error correction method and device based on subdivision industry error correction word list |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||