CN111291672A - Method and device for combined image text recognition and fuzzy judgment and storage medium - Google Patents

Publication number
CN111291672A
Authority
CN
China
Prior art keywords
image
network
text
fuzzy
image set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010077341.4A
Other languages
Chinese (zh)
Other versions
CN111291672B (en)
Inventor
牟永强
范宝杰
谭磊
林凌帆
黄耀鸿
王芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Imagedt Co ltd
Original Assignee
Imagedt Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Imagedt Co ltd
Priority to CN202010077341.4A
Publication of CN111291672A
Application granted
Publication of CN111291672B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for joint image text recognition and fuzzy (blur) judgment, and a storage medium. In the method, a target model is obtained and a text image to be detected is input into it. The target model uses a convolutional neural network shared by an image sequence recognition network and an image fuzzy judgment network: the high-dimensional feature image output by the convolutional neural network is input into both networks, so that both obtain the high-dimensional feature image simultaneously and image text recognition and image fuzzy judgment are processed in parallel. By using the target model to process image text recognition and image fuzzy judgment in parallel, the invention can further improve the recognition precision of text images.

Description

Method and device for combined image text recognition and fuzzy judgment and storage medium
Technical Field
The invention relates to the technical field of text image processing, in particular to a method and a device for text recognition and fuzzy judgment of a combined image and a storage medium.
Background
Text information in a text image is relatively high-level semantic content within the visual information and is important for understanding the visual content. Existing image text recognition technology is affected by the quality of the text image, so an image fuzzy judgment technology often has to be applied first to preprocess the text images and filter out low-quality ones. However, because the image text recognition technology and the image fuzzy judgment technology each process text images independently, resources are easily wasted, the two tasks cannot share feature information with each other, and it is difficult to further improve the recognition accuracy of text images.
Disclosure of Invention
The invention provides a method, a device and a storage medium for combined image text recognition and fuzzy judgment, which overcome the defects of the prior art and can realize parallel processing of image text recognition and image fuzzy judgment by utilizing a target model so as to further improve the recognition precision of text images.
In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a combined image text recognition and blur determination method, including:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
acquiring a text image set and the real text information and real fuzzy probability corresponding to the text image set, and inputting the text image set into the convolutional neural network to enable the convolutional neural network to output a high-dimensional feature image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence identification network, and enabling the image sequence identification network to output predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image fuzzy judgment network, and enabling the image fuzzy judgment network to output a prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
reversely inputting the identification error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and finishing training the initial model when the convolutional neural network is converged to obtain a target model;
and inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
Further, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence identification network comprises a sequence conversion network.
Further, before the acquiring the text image set and the real text information and the real fuzzy probability corresponding to the text image set, the method further includes:
acquiring text images, and labeling the real text information and the real fuzzy probability for each text image;
and dividing the labeled text image into the text image set.
Further, after acquiring the text image set and the actual text information and the actual fuzzy probability corresponding to the text image set, before inputting the text image set into the convolutional neural network, the method further includes:
preprocessing the text image set; wherein the preprocessing comprises data enhancement and data normalization.
Further, the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set, and the method includes:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, and enabling the LSTM network to output a characteristic sequence according to the input sequence;
and inputting the characteristic sequence into a decoding network with an attention mechanism, and enabling the decoding network to output the predicted text information according to the characteristic sequence.
Further, the image blur judgment network outputs the prediction blur probability corresponding to the text image set according to the high-dimensional feature image set, including:
performing dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly stretching the low-dimensional feature image set into input vectors;
inputting the input vector into a two-class network, and enabling the two-class network to output a target vector according to the input vector;
and converting the target vector into the prediction fuzzy probability through a softmax function.
Further, the two-class network is composed of three fully-connected layers.
In a second aspect, an embodiment of the present invention provides a combined image text recognition and blur determination apparatus, including:
the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
the convolutional neural network training module is used for acquiring text image sets and real text information and real fuzzy probability corresponding to the text image sets, inputting the text image sets into the convolutional neural network, and enabling the convolutional neural network to output high-dimensional feature image sets according to the text image sets;
the image sequence recognition network training module is used for inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;
the image fuzzy judgment network training module is used for inputting the high-dimensional characteristic image set into the image fuzzy judgment network so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional characteristic image set;
the network error calculation module is used for calculating the identification error of the image sequence identification network according to the real text information and the predicted text information and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
a target model obtaining module, configured to reversely input the identification error and the determination error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model;
and the text image detection module is used for inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where the computer program, when running, controls an apparatus where the computer-readable storage medium is located to execute the method for joint image text recognition and blur determination as described above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the method comprises the steps of inputting a text image to be detected into a target model by obtaining the target model, utilizing a convolutional neural network shared by an image sequence identification network and an image fuzzy judgment network, and respectively inputting a high-dimensional characteristic image output by the convolutional neural network into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process image text identification and image fuzzy judgment in parallel. The invention can realize parallel processing of image text recognition and image fuzzy judgment by utilizing the target model, thereby further improving the recognition precision of the text image.
Drawings
FIG. 1 is a flowchart illustrating a combined image text recognition and fuzzy determination method according to a first embodiment of the present invention;
FIG. 2 is a network architecture diagram of an initial model in a first embodiment of the invention;
FIG. 3 is a schematic flow chart of a preferred embodiment of the first embodiment of the present invention;
FIG. 4 is a schematic flow chart of another preferred embodiment of the first embodiment of the present invention;
FIG. 5 is a network configuration diagram of an image blur determination network in the first embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a combined image text recognition and blur determination apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, the step numbers in the text are only for convenience of explanation of the specific embodiments, and do not serve to limit the execution sequence of the steps. The method provided by the embodiment can be executed by the relevant server, and the server is taken as an example for explanation below.
Please refer to fig. 1-5.
As shown in fig. 1, the first embodiment provides a combined image text recognition and blur determination method, including steps S1 to S7:
S1, constructing an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network.
S2, acquiring a text image set and the real text information and real fuzzy probability corresponding to the text image set, and inputting the text image set into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set.
S3, inputting the high-dimensional feature image set into the image sequence identification network, and enabling the image sequence identification network to output predicted text information corresponding to the text image set according to the high-dimensional feature image set.
S4, inputting the high-dimensional feature image set into the image fuzzy judgment network, and enabling the image fuzzy judgment network to output the predicted fuzzy probability corresponding to the text image set according to the high-dimensional feature image set.
S5, calculating the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability.
S6, back-propagating the identification error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and ending the training of the initial model when the convolutional neural network converges, to obtain the target model.
S7, inputting the text image to be detected into the target model to obtain target text information and target fuzzy probability.
It should be noted that the identification error is a relative error between the real text information and the predicted text information, and the determination error is a relative error between the real fuzzy probability and the predicted fuzzy probability.
In a preferred embodiment of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
In step S1, the convolutional neural network is used as a shared network between the image sequence identification network and the image blur determination network by constructing the initial model, so that the image sequence identification network and the image blur determination network can simultaneously acquire the high-dimensional feature image set output by the convolutional neural network to process the image text identification and the image blur determination in parallel. The network structure of the initial model is shown in fig. 2.
In step S2, the convolutional neural network is trained by inputting the text image set into it so that it outputs a high-dimensional feature image set according to the text image set; the convolutional neural network then passes the high-dimensional feature image set to the image sequence recognition network and the image blur determination network.
In step S3, the high-dimensional feature image set is input into the image sequence recognition network, so that the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, thereby implementing the training of the image sequence recognition network, and facilitating improvement of the text recognition accuracy of the image sequence recognition network.
In step S4, the image blur determination network outputs the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set by inputting the high-dimensional feature image set into the image blur determination network, so as to implement training of the image blur determination network, which is beneficial to improving the blur determination accuracy of the image blur determination network.
In step S5, the recognition error of the image sequence recognition network is calculated according to the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated according to the real fuzzy probability and the predicted fuzzy probability, so that the initial model can subsequently be optimized to obtain the target model, which further improves the prediction accuracy of the target model.
In step S6, the identification error and the determination error are reversely input to the convolutional neural network, the parameters of the convolutional neural network are updated, and the training of the initial model is ended when the convolutional neural network converges, so as to obtain the target model, so that the image sequence identification network and the image blur determination network can jointly optimize and adjust the parameters of the convolutional neural network by learning respective tasks, which is beneficial to improving the prediction accuracy of the target model.
In step S7, the target text information and the target fuzzy probability are obtained by inputting the text image to be detected into the target model, and parallel processing of image text recognition and image fuzzy judgment can be realized by using the target model, thereby further improving the recognition accuracy of the text image.
In this embodiment, the acquired text image set is first input into the convolutional neural network, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set. The high-dimensional feature image set is then input into the image sequence identification network and the image fuzzy judgment network respectively, so that the two networks output the predicted text information and the predicted fuzzy probability corresponding to the text image set. Next, the identification error of the image sequence identification network is calculated from the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated from the real fuzzy probability and the predicted fuzzy probability. The identification error and the judgment error are back-propagated into the convolutional neural network to update its parameters, and training of the initial model ends when the convolutional neural network converges, giving the target model. Finally, the text image to be detected is input into the target model to obtain the target text information and the target fuzzy probability.
In the embodiment, the target model is obtained, the text image to be detected is input into the target model, the convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network is utilized, and the high-dimensional characteristic image output by the convolutional neural network is respectively input into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process the image text identification and the image fuzzy judgment in parallel.
According to the embodiment, the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, so that the recognition accuracy of the text image is further improved.
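The shared-backbone arrangement described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: `make_joint_model` and the three toy functions are hypothetical stand-ins for the convolutional neural network, the image sequence identification network and the image fuzzy judgment network. The sketch only shows that the shared features are computed once per image and then consumed by both heads.

```python
from typing import Callable, List, Tuple

Feature = List[float]


def make_joint_model(
    backbone: Callable[[List[float]], Feature],
    text_head: Callable[[Feature], str],
    blur_head: Callable[[Feature], float],
) -> Callable[[List[float]], Tuple[str, float]]:
    """Build a model in which both heads read the same backbone output."""

    def predict(image: List[float]) -> Tuple[str, float]:
        features = backbone(image)  # computed once, shared by both heads
        return text_head(features), blur_head(features)

    return predict


# Toy stand-ins (hypothetical, not the networks of the patent).
calls = {"backbone": 0}


def toy_backbone(image: List[float]) -> Feature:
    calls["backbone"] += 1
    return [sum(image) / len(image)]


def toy_text_head(features: Feature) -> str:
    return "A" if features[0] > 0.5 else "B"


def toy_blur_head(features: Feature) -> float:
    return 1.0 - features[0]


model = make_joint_model(toy_backbone, toy_text_head, toy_blur_head)
text, blur_prob = model([0.9, 0.7, 0.8])
```

With two independent models the feature extraction would run twice per image; here the counter `calls["backbone"]` stays at one per call.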
In a preferred embodiment, before obtaining the text image set and the real text information and real fuzzy probability corresponding to the text image set, step S2 further includes: collecting text images, and labeling each text image with real text information and a real fuzzy probability; and dividing the labeled text images into the text image set.
In a preferred implementation manner of this embodiment, the text information and the fuzzy probability proposed for each text image are put to a vote and the votes are counted: the text information with the most votes is taken as the corresponding real text information, and the fuzzy probability with the most votes is taken as the corresponding real fuzzy probability.
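The vote counting can be illustrated with a short sketch. The function name `majority_label` and the sample votes are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the source does not say how ties are resolved.

```python
from collections import Counter


def majority_label(votes):
    """Return the label that received the most votes.

    Ties go to the label that appears first in `votes` (an assumption).
    """
    return Counter(votes).most_common(1)[0][0]


# Hypothetical votes from three annotators for one text image.
text_votes = ["MILK 250ml", "MILK 250ml", "MILK 25Oml"]
blur_votes = [0, 0, 1]  # 0 = sharp, 1 = blurred

real_text = majority_label(text_votes)
real_blur = majority_label(blur_votes)
```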
In the embodiment, before the text image set is obtained, each text image in the text image set is labeled with the real text information and the real fuzzy probability, so that the identification error of the image sequence identification network and the judgment error of the image fuzzy judgment network are calculated according to the real text information and the real fuzzy probability, and the initial model is optimized to obtain the target model.
In a preferred embodiment, the step S2, after obtaining the text image set and the corresponding true text information and true fuzzy probability of the text image set, before inputting the text image set into the convolutional neural network, further includes: preprocessing a text image set; wherein the preprocessing comprises data enhancement and data normalization.
In the embodiment, before the text image set is input into the convolutional neural network, the text image set is subjected to preprocessing such as data enhancement and data normalization, so that the identification precision of the text image is further improved.
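The source names data enhancement and data normalization without specifying them, so the following is only a sketch under stated assumptions: normalization is taken to mean scaling 8-bit pixels to [0, 1] and standardizing with a mean and standard deviation of 0.5, and enhancement is represented by a random brightness change. Both function names and all constants are hypothetical.

```python
import random


def normalize(pixels, mean=0.5, std=0.5):
    """Scale 8-bit pixel values to [0, 1], then standardize them."""
    return [((p / 255.0) - mean) / std for p in pixels]


def jitter_brightness(pixels, factor):
    """Multiply pixel values by `factor` and clamp them to the 8-bit range."""
    return [min(255, max(0, round(p * factor))) for p in pixels]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
row = [0, 128, 255]     # one row of a grayscale text image

augmented = jitter_brightness(row, rng.uniform(0.8, 1.2))
normalized = normalize(row)
```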
As shown in fig. 3, in a preferred embodiment, step S3 includes steps S31-S33:
S31, slicing the high-dimensional feature image set to obtain an input sequence.
S32, inputting the input sequence into the LSTM network, and enabling the LSTM network to output the feature sequence according to the input sequence.
S33, inputting the feature sequence into a decoding network with an attention mechanism, and enabling the decoding network to output the predicted text information according to the feature sequence.
Take a high-dimensional feature map as an example.
After the convolutional neural network outputs a 12 × 25 × 512 high-dimensional feature map for an input text image, the feature map is sliced along its width direction. The resulting 25 × 6144 sequence is input as the input sequence into a bidirectional LSTM network (namely the recurrent layer), which outputs a 25 × 512 feature sequence; the feature sequence is then input into a decoding network equipped with an attention mechanism.
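The slicing step can be checked with a small sketch. It assumes the feature map is stored as height × width × channels (12 × 25 × 512), which the source implies but does not state; the function name `slice_along_width` is hypothetical. Each of the 25 width positions yields one vector of 12 × 512 = 6144 values.

```python
def slice_along_width(feature_map):
    """Turn feature_map[h][w][c] into a list of W vectors of length H * C."""
    height = len(feature_map)
    width = len(feature_map[0])
    sequence = []
    for w in range(width):
        column = []
        for h in range(height):
            column.extend(feature_map[h][w])  # append the C channel values
        sequence.append(column)
    return sequence


H, W, C = 12, 25, 512
feature_map = [[[0.0] * C for _ in range(W)] for _ in range(H)]
sequence = slice_along_width(feature_map)
```

The resulting `sequence` has 25 time steps of 6144 values each, matching the 25 × 6144 input sequence fed to the bidirectional LSTM network.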
The basic idea of the attention mechanism is to keep the intermediate outputs of the LSTM encoder for the input sequence, then train a model to learn selectively from these inputs and to associate the output sequence with them when the model produces its output. It is implemented as follows:
Input:
$$c = \{c_1, c_2, \ldots, c_i, \ldots, c_L\}, \quad L = 25 \tag{1}$$
In formula (1), $c_i$ represents the feature of a spatial location computed by the LSTM network.
The process is as follows:
Contextual attention parameter $e$:
$$e_i = f_{ATT}(h, c_i) \tag{2}$$
The weight parameter $a$ is obtained by normalizing $e$ with the softmax function:
$$a_i = \frac{\exp(e_i)}{\sum_{k=1}^{L} \exp(e_k)} \tag{3}$$
The feature obtained after applying the attention mechanism is denoted $c_t$:
[Formula (4): image BDA0002378318990000082]
In formulae (2) to (4), $f_{ATT}$ is a function and $h$ represents a hidden state parameter of the multilayer network.
Output: $c_t$
The loss function $L$ of the image sequence identification network is:
[Formula (5): image BDA0002378318990000091]
In formula (5), [image BDA0002378318990000092], $M$ represents the maximum length of the output sequence, $N$ represents the number of samples participating in training, $K$ represents the number of classes to be classified, [image BDA0002378318990000093] and $b_{i,j}$ represent network parameters, $x$ is a feature vector of the network, and $s_{i,j}$ represents the softmax output at the $i$-th position of the $j$-th training sample.
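The attention step of formulae (1) to (4) can be sketched as below. Two parts are assumptions rather than content of the source: the scoring function `dot_score` is a hypothetical dot-product stand-in for $f_{ATT}$, and the attended feature is computed as the weighted sum of the $c_i$ with weights $a_i$, which is the usual form of an attention context vector but is not confirmed by the text, because formula (4) is only available as an image.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(h, features, score_fn):
    """Return attention weights a and the attended feature (assumed weighted sum)."""
    e = [score_fn(h, c) for c in features]  # formula (2)
    a = softmax(e)                          # softmax normalization
    dim = len(features[0])
    context = [sum(a_i * c[d] for a_i, c in zip(a, features)) for d in range(dim)]
    return a, context


def dot_score(h, c):
    """Hypothetical scoring function standing in for f_ATT."""
    return sum(x * y for x, y in zip(h, c))


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy c_1..c_3 (L = 3 here)
h = [0.0, 0.0]                                   # toy hidden state
weights, context = attend(h, features, dot_score)
```

With a zero hidden state every score is equal, so the weights are uniform and the attended feature is the plain average of the inputs.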
In another preferred embodiment, as shown in FIG. 4, step S4 includes steps S41-S43:
S41, performing dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly stretching the low-dimensional feature image set into an input vector.
S42, inputting the input vector into the two-class network, and enabling the two-class network to output the target vector according to the input vector.
S43, converting the target vector into the predicted fuzzy probability through a softmax function.
In a preferred implementation of this embodiment, the two-class network consists of three fully-connected layers.
Take a high-dimensional feature map as an example. A network structure diagram of the image blur determination network is shown in fig. 5.
After the convolutional neural network outputs a 12 × 25 × 512 high-dimensional feature map for an input text image, the feature map is input into a 1 × 1 convolution layer, which performs the dimensionality reduction: the 12 × 25 × 512 high-dimensional feature map is processed into a 12 × 25 × 256 feature map, which is then stretched into a 1 × 76800 vector that serves as the input vector.
The input vector is input into a two-class network consisting of three fully-connected layers. The dimensionality is 1 × 768 after the first fully-connected layer, 1 × 128 after the second fully-connected layer, and 1 after the third fully-connected layer, which outputs the target vector. A target vector of 0 represents a sharp image, and a target vector of 1 represents a blurred image.
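The dimensions quoted above can be verified with simple arithmetic. The sketch below only recomputes the sizes stated in the text; the weight counts are an added illustration that ignores bias terms, and the helper name `flatten_size` is hypothetical.

```python
def flatten_size(height, width, channels):
    """Number of values obtained when a feature map is stretched into a vector."""
    return height * width * channels


backbone_shape = (12, 25, 512)  # output of the convolutional neural network
reduced_shape = (12, 25, 256)   # after the 1 x 1 convolution layer

# Input vector followed by the outputs of the three fully-connected layers.
fc_sizes = [flatten_size(*reduced_shape), 768, 128, 1]

# Weights per fully-connected layer (bias terms omitted).
fc_weights = [n_in * n_out for n_in, n_out in zip(fc_sizes, fc_sizes[1:])]
```

Nearly all of the weights sit in the first fully-connected layer. Without the 1 × 1 convolution the input vector would be twice as long (153600 values), which doubles that layer.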
The target vector is converted into a probability through a softmax function, and the output probability is taken as the predicted fuzzy probability. The larger the predicted fuzzy probability, the more likely the corresponding text image is a blurred image.
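One practical point about this conversion: a softmax over a single value always returns 1, so a usable probability needs either two logits or a sigmoid on the single output. The sketch below assumes the single output is treated as the logit of the "blurred" class, with an implicit logit of 0 for the "sharp" class. This reading is an assumption, not something stated in the source. Under it, the softmax and the sigmoid give the same probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


logit = 1.5  # hypothetical output of the third fully-connected layer

p_softmax = softmax([0.0, logit])[1]  # probability of the "blurred" class
p_sigmoid = sigmoid(logit)            # same value via the sigmoid
single = softmax([logit])             # a one-element softmax is always [1.0]
```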
And then, feeding back the training image fuzzy judgment network through a cross entropy function.
The loss function $L_{blur}$ of the image blur determination network is:
$$L_{blur} = -\left( y \log(y_p) + (1 - y) \log(1 - y_p) \right) \quad (6)$$
In formula (6), $y_p$ represents the predicted blur probability and $y$ represents the true blur probability.
The loss function of the initial model is:
$$loss = L + L_{blur} \quad (7)$$
In formula (7), $L$ represents the loss function of the image sequence recognition network, and $L_{blur}$ represents the loss function of the image blur determination network.
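Formulas (6) and (7) can be checked numerically with a few lines of Python. The sketch below treats formula (6) as the standard binary cross-entropy; the function names, the clamping constant `eps` and the sample values are assumptions made for illustration.

```python
import math


def blur_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy of formula (6) for one image.

    y_pred is clamped to [eps, 1 - eps] so that log(0) never occurs;
    the clamp is an implementation detail, not part of the formula.
    """
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))


def total_loss(recognition_loss, y_true, y_pred):
    """Formula (7): recognition loss L plus blur loss L_blur."""
    return recognition_loss + blur_loss(y_true, y_pred)


# A blurred image (y = 1) predicted as blurred with probability 0.9 ...
confident_and_right = blur_loss(1, 0.9)
# ... versus a sharp image (y = 0) given the same prediction.
confident_and_wrong = blur_loss(0, 0.9)
# Hypothetical recognition loss of 0.5 combined with the blur loss.
combined = total_loss(0.5, 1, 0.9)
```

A confident wrong prediction is penalized far more heavily than a confident right one, which is what pushes the predicted blur probability towards the label during training.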
Then, the image sequence identification network and the image fuzzy judgment network reversely input (back-propagate) the identification error and the judgment error, respectively, into the convolutional neural network and update the parameters of the convolutional neural network. Training of the initial model ends when the convolutional neural network converges, and the exported optimal model is taken as the target model.
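The joint update of the shared network can be illustrated with a deliberately tiny model. In the sketch below a single scalar `w` stands in for all parameters of the convolutional neural network, a squared-error head stands in for the recognition loss, and the head weights `a` and `b` are fixed; all of these are simplifying assumptions. The point it shows is that the gradient reaching the shared parameter is the sum of the gradients from the two heads.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def losses_and_shared_grad(w, x, target_value, blur_label, a=0.8, b=1.5):
    """Return (total loss, d(total loss)/dw) for one sample.

    w is the shared parameter; a and b are fixed, hypothetical head weights.
    """
    feature = w * x  # output of the shared "backbone"
    # Head 1: stand-in for the recognition branch (squared error, an assumption).
    prediction = a * feature
    loss_rec = (prediction - target_value) ** 2
    grad_rec = 2.0 * (prediction - target_value) * a * x
    # Head 2: blur branch with binary cross-entropy.
    p = sigmoid(b * feature)
    loss_blur = -(blur_label * math.log(p) + (1 - blur_label) * math.log(1.0 - p))
    grad_blur = (p - blur_label) * b * x
    # Both errors flow back into the same shared parameter.
    return loss_rec + loss_blur, grad_rec + grad_blur


def train(w, samples, learning_rate=0.05, steps=200):
    """Plain gradient descent on the summed loss; returns (w, loss history)."""
    history = []
    for _ in range(steps):
        total, grad = 0.0, 0.0
        for x, target_value, blur_label in samples:
            loss, g = losses_and_shared_grad(w, x, target_value, blur_label)
            total += loss
            grad += g
        history.append(total)
        w -= learning_rate * grad
    return w, history


# Hypothetical samples: (input, recognition target, blur label).
samples = [(1.0, 0.8, 1), (-1.0, -0.8, 0), (0.5, 0.4, 1)]
trained_w, history = train(0.0, samples)
```

In a real framework the same effect is obtained by summing the two losses before calling the framework's backward pass; the sketch only makes the summation of gradients explicit.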
Please refer to fig. 6.
As shown in fig. 6, the second embodiment provides a joint image text recognition and blur determination device including:
an initial model building module 21, configured to build an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
the convolutional neural network training module 22, configured to acquire a text image set and real text information and real fuzzy probability corresponding to the text image set, input the text image set into the convolutional neural network, and enable the convolutional neural network to output a high-dimensional feature image set according to the text image set;
the image sequence recognition network training module 23, configured to input the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional feature image set;
the image fuzzy judgment network training module 24, configured to input the high-dimensional feature image set into the image fuzzy judgment network, so that the image fuzzy judgment network outputs a prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
the network error calculation module 25, configured to calculate an identification error of the image sequence identification network according to the real text information and the predicted text information, and calculate a judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
the target model obtaining module 26, configured to reversely input the identification error and the judgment error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges to obtain a target model;
and the text image detection module 27, configured to input the text image to be detected into the target model, so as to obtain the target fuzzy probability and the target text information.
It should be noted that the identification error is a relative error between the real text information and the predicted text information, and the determination error is a relative error between the real fuzzy probability and the predicted fuzzy probability.
In a preferred embodiment of this embodiment, the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
An initial model is constructed through the initial model construction module 21, and the convolutional neural network is used as a shared network between the image sequence identification network and the image fuzzy judgment network, so that the image sequence identification network and the image fuzzy judgment network can simultaneously acquire a high-dimensional characteristic image set output by the convolutional neural network, and the image text identification and the image fuzzy judgment are processed in parallel.
The text image set is input into the convolutional neural network through the convolutional neural network training module 22, so that the convolutional neural network outputs a high-dimensional feature image set according to the text image set, the convolutional neural network is trained, and the high-dimensional feature image set is input into the image sequence identification network and the image fuzzy judgment network through the convolutional neural network.
The image sequence recognition network training module 23 inputs the high-dimensional feature image set into the image sequence recognition network, so that the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, the image sequence recognition network is trained, and the text recognition accuracy of the image sequence recognition network is improved.
The high-dimensional feature image set is input into the image fuzzy judgment network through the image fuzzy judgment network training module 24, so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set, the image fuzzy judgment network is trained, and the fuzzy judgment accuracy of the image fuzzy judgment network is improved.
Through the network error calculation module 25, the identification error of the image sequence identification network is calculated according to the real text information and the predicted text information, and the judgment error of the image fuzzy judgment network is calculated according to the real fuzzy probability and the predicted fuzzy probability, so that the initial model is optimized subsequently to obtain the target model, and the prediction accuracy of the target model is further improved.
Through the target model obtaining module 26, the identification error and the judgment error are reversely input into the convolutional neural network, the parameters of the convolutional neural network are updated, and training of the initial model ends when the convolutional neural network converges, yielding the target model. In this way the image sequence identification network and the image fuzzy judgment network jointly optimize and adjust the parameters of the convolutional neural network through their respective learning tasks, and the prediction accuracy of the target model is improved.
The text image to be detected is input into the target model through the text image detection module 27 to obtain target text information and target fuzzy probability, and parallel processing of image text recognition and image fuzzy judgment can be realized by using the target model, so that the recognition accuracy of the text image is further improved.
In this embodiment, after the initial model is constructed by the initial model construction module 21, the convolutional neural network training module 22 inputs the obtained text image set into the convolutional neural network, so that the convolutional neural network outputs the high-dimensional feature image set according to the text image set. The image sequence identification network training module 23 and the image blur determination network training module 24 then input the high-dimensional feature image set into the image sequence identification network and the image blur determination network, respectively, so that the two networks output the predicted text information and the predicted blur probability corresponding to the text image set according to the high-dimensional feature image set. Next, the network error calculation module 25 calculates the identification error of the image sequence identification network according to the real text information and the predicted text information, and the judgment error of the image blur determination network according to the real blur probability and the predicted blur probability. The target model obtaining module 26 reversely inputs the identification error and the judgment error into the convolutional neural network to update the parameters of the convolutional neural network, and ends training of the initial model when the convolutional neural network converges, obtaining the target model. Finally, the text image detection module 27 inputs the text image to be detected into the target model to obtain the target text information and the target blur probability.
In the embodiment, the target model is obtained, the text image to be detected is input into the target model, the convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network is utilized, and the high-dimensional characteristic image output by the convolutional neural network is respectively input into the image sequence identification network and the image fuzzy judgment network, so that the image fuzzy judgment network and the image sequence identification network can simultaneously obtain the high-dimensional characteristic image to process the image text identification and the image fuzzy judgment in parallel.
According to the embodiment, the target model can be utilized to realize parallel processing of image text recognition and image fuzzy judgment, so that the recognition accuracy of the text image is further improved.
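The shared-network arrangement at detection time can be sketched as follows. The class `JointModel` and the three toy functions are hypothetical stand-ins, not the networks of the embodiment; the sketch only shows that the shared network runs once per image and that both branches read the same features.

```python
class JointModel:
    """Runs a shared backbone once and feeds its features to two heads."""

    def __init__(self, backbone, recognition_head, blur_head):
        self.backbone = backbone
        self.recognition_head = recognition_head
        self.blur_head = blur_head
        self.backbone_calls = 0

    def predict(self, image):
        features = self.backbone(image)  # computed once, shared by both heads
        self.backbone_calls += 1
        return self.recognition_head(features), self.blur_head(features)


def toy_backbone(image):
    """Stand-in feature extractor: mean intensity of each image column."""
    return [sum(column) / len(column) for column in zip(*image)]


def toy_recognition_head(features):
    """Stand-in recognizer: one symbol per feature."""
    return "".join("1" if value >= 0.5 else "0" for value in features)


def toy_blur_head(features):
    """Stand-in blur score: low contrast between columns reads as blurred."""
    return 1.0 - (max(features) - min(features))


model = JointModel(toy_backbone, toy_recognition_head, toy_blur_head)
image = [[0.0, 1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0, 1.0]]
target_text, target_blur_probability = model.predict(image)
```

Because the features are computed a single time, adding the blur branch costs only the extra head, not a second pass through the shared network.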
A third embodiment provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the method for joint image text recognition and blur determination according to the first embodiment, and the same beneficial effects can be achieved.
In summary, the embodiment of the present invention has the following advantages:
By obtaining the target model and inputting the text image to be detected into it, the embodiment uses a convolutional neural network shared by the image sequence identification network and the image fuzzy judgment network, and inputs the high-dimensional characteristic image output by the convolutional neural network into the image sequence identification network and the image fuzzy judgment network respectively, so that both networks obtain the high-dimensional characteristic image simultaneously and image text identification and image fuzzy judgment are processed in parallel. The target model can thus be used to realize parallel processing of image text recognition and image fuzzy judgment, which further improves the recognition accuracy of the text image.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that all or part of the processes of the above embodiments may be implemented by hardware related to instructions of a computer program, and the computer program may be stored in a computer readable storage medium, and when executed, may include the processes of the above embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (9)

1. A combined image text recognition and fuzzy judgment method is characterized by comprising the following steps:
constructing an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
acquiring real text information and real fuzzy probability corresponding to a text image set and the text image set, and inputting the text image set into the convolutional neural network to enable the convolutional neural network to output a high-dimensional characteristic image set according to the text image set;
inputting the high-dimensional feature image set into the image sequence identification network, and enabling the image sequence identification network to output predicted text information corresponding to the text image set according to the high-dimensional feature image set;
inputting the high-dimensional feature image set into the image fuzzy judgment network, and enabling the image fuzzy judgment network to output a prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set;
calculating the identification error of the image sequence identification network according to the real text information and the predicted text information, and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
reversely inputting the identification error and the judgment error into the convolutional neural network, updating the parameters of the convolutional neural network, and finishing training the initial model when the convolutional neural network is converged to obtain a target model;
and inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
2. The joint image text recognition and blur determination method of claim 1, wherein the convolutional neural network comprises a residual connection network or a dense connection network, and the image sequence recognition network comprises a sequence conversion network.
3. The method for joint image text recognition and fuzzy judgment of claim 1, further comprising, before the obtaining of the true text information and the true fuzzy probability corresponding to the text image set, the steps of:
acquiring text images, and labeling the real text information and the real fuzzy probability for each text image;
and dividing the labeled text image into the text image set.
4. The method for joint image text recognition and blur determination of claim 1, wherein after obtaining the true text information and true blur probability corresponding to the text image set and the text image set, before inputting the text image set into the convolutional neural network, further comprising:
preprocessing the text image set; wherein the preprocessing comprises data enhancement and data normalization.
5. The method for joint image text recognition and fuzzy judgment of claim 1, wherein the image sequence recognition network outputs the predicted text information corresponding to the text image set according to the high-dimensional feature image set, comprising:
slicing the high-dimensional feature image set to obtain an input sequence;
inputting the input sequence into an LSTM network, and enabling the LSTM network to output a characteristic sequence according to the input sequence;
and inputting the characteristic sequence into a decoding network with an attention mechanism, and enabling the decoding network to output the predicted text information according to the characteristic sequence.
6. The method for joint image text recognition and fuzzy judgment of claim 1, wherein the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional feature image set, comprising:
performing dimensionality reduction on the high-dimensional feature image set to obtain a low-dimensional feature image set, and correspondingly stretching the low-dimensional feature image set into input vectors;
inputting the input vector into a two-class network, and enabling the two-class network to output a target vector according to the input vector;
and converting the target vector into the prediction fuzzy probability through a softmax function.
7. The joint image text recognition and ambiguity resolution method of claim 6, wherein the two-class network consists of three fully connected layers.
8. A combined image text recognition and blur determination device, comprising:
the initial model building module is used for building an initial model; the initial model comprises a convolutional neural network, an image sequence identification network and an image fuzzy judgment network;
the convolutional neural network training module is used for acquiring text image sets and real text information and real fuzzy probability corresponding to the text image sets, inputting the text image sets into the convolutional neural network, and enabling the convolutional neural network to output high-dimensional feature image sets according to the text image sets;
the image sequence recognition network training module is used for inputting the high-dimensional characteristic image set into the image sequence recognition network, so that the image sequence recognition network outputs predicted text information corresponding to the text image set according to the high-dimensional characteristic image set;
the image fuzzy judgment network training module is used for inputting the high-dimensional characteristic image set into the image fuzzy judgment network so that the image fuzzy judgment network outputs the prediction fuzzy probability corresponding to the text image set according to the high-dimensional characteristic image set;
the network error calculation module is used for calculating the identification error of the image sequence identification network according to the real text information and the predicted text information and calculating the judgment error of the image fuzzy judgment network according to the real fuzzy probability and the predicted fuzzy probability;
a target model obtaining module, configured to reversely input the identification error and the determination error into the convolutional neural network, update parameters of the convolutional neural network, and end training of the initial model when the convolutional neural network converges, to obtain a target model;
and the text image detection module is used for inputting the text image to be detected into the target model to obtain target fuzzy probability and target text information.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of joint image text recognition and blur determination of any one of claims 1-7.
CN202010077341.4A 2020-01-22 2020-01-22 Combined image text recognition and fuzzy judgment method, device and storage medium Active CN111291672B (en)

Publications (2)

Publication Number Publication Date
CN111291672A (en) 2020-06-16
CN111291672B (en) 2023-05-12

Family

ID=71021436

Country Status (1)

Country Link
CN (1) CN111291672B (en)

Also Published As

Publication number Publication date
CN111291672B (en) 2023-05-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant