CN110070042A - Character recognition method, device and electronic equipment - Google Patents
- Publication number
- CN110070042A (Application No. CN201910327434.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- text
- region
- decoding
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Character Discrimination (AREA)
Abstract
The present disclosure provides a character recognition method, a character recognition apparatus, and an electronic device. The character recognition method includes: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing a second encoding on the first encoded image at least once to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text. By adding multiple image encoding passes to the recognition process, the disclosure addresses the technical problem that the accuracy of text recognition is difficult to improve in the prior art.
Description
Technical field
The present disclosure relates to the field of information processing, and in particular to a character recognition method, a character recognition apparatus, and an electronic device.
Background technique
Text recognition generally refers to the process of analyzing and recognizing an image file that contains text information in order to obtain the text and its layout. Text recognition usually comprises two stages, detection and recognition: the detection stage finds the regions of the image that contain text, and the recognition stage identifies the text within those regions.
Traditional recognition methods typically use template matching or feature extraction and comparison, but such methods are easily affected by the state of the text, such as its orientation or the lighting conditions, which limits both the accuracy and the speed of recognition. In recent years, fully connected neural networks have also been used for recognition, but a fully connected network cannot capture the semantic information of the text, so the recognition accuracy cannot be improved further. Some techniques add a semantic model to the recognition process, but the semantic features such a model can capture are rather limited, and it remains difficult to further improve the recognition accuracy.
Summary of the invention
According to one aspect of the disclosure, the following technical solution is provided:
A character recognition method, comprising: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing a second encoding on the first encoded image at least once to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text.
Further, extracting the image features of the text from the image region to generate the text feature image comprises: inputting the image region into a convolutional neural network; and outputting, by the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image and C >= 1, H is the height of the text feature image and H >= 1, and W is the width of the text feature image and W >= 1.
Further, performing the first encoding on the text feature image to generate the first encoded image comprises: inputting the text feature image into a first LSTM network; and outputting, by the first LSTM network, the first encoded image.
Further, performing the second encoding on the first encoded image at least once to generate the second encoded image comprises: inputting the first encoded image into an intermediate network, the intermediate network comprising at least one layer of LSTM network; and outputting, by the intermediate network, the second encoded image.
Further, decoding the second encoded image to generate the decoded image comprises: inputting the second encoded image into a decoding LSTM network; and outputting, by the decoding LSTM network, the decoded image.
Further, classifying the image features in the decoded image to recognize the text comprises: inputting the decoded image into a first fully connected network; outputting, by the first fully connected network, the classes of the characters contained in the decoded image; and recognizing the text in the image region according to the character classes.
Further, recognizing the text in the image region according to the character classes comprises: merging adjacent characters recognized as the same character class into a single character; and outputting the merged result as the recognition result.
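The merging step described above resembles the collapse step of CTC-style decoders: consecutive positions assigned the same class are treated as one character. A minimal sketch in Python (the blank symbol `'-'` and the label sequence are illustrative assumptions, not taken from this disclosure):

```python
from itertools import groupby

def collapse(labels, blank="-"):
    """Merge adjacent identical class labels, then drop blanks.

    Each element of `labels` is the class predicted for one position
    (one column of the decoded feature image)."""
    merged = [k for k, _ in groupby(labels)]  # adjacent duplicates -> one
    return "".join(c for c in merged if c != blank)

# Seven per-column predictions collapse to a three-character result.
print(collapse(["h", "h", "-", "e", "e", "-", "y"]))  # -> "hey"
```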
According to another aspect of the disclosure, the following technical solution is also provided:
A training method for a text recognition model, comprising:
Initializing the parameters of the text recognition model, wherein the text recognition model comprises one convolutional neural network, at least three LSTM networks, and one fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network;
Obtaining a training image from a training set, the training image containing text and class labels for the text;
Passing the training image through the convolutional neural network to output a text feature image;
Passing the text feature image through the at least three LSTM networks to output a decoded image;
Passing the decoded image through the fully connected layer to output the character classes in the training image;
Computing the value of the loss function of the text recognition model from the output character classes and the class labels of the text; and
Adjusting the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimized.
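The last two steps describe an ordinary loss-driven parameter update. As an illustration of the idea only (the model below is a one-parameter toy with a squared-error loss, not the CNN+LSTM model of this disclosure), a gradient-descent loop that adjusts a parameter until the loss is minimized might look like:

```python
def train(initial_w, target, lr=0.1, steps=100):
    """Minimize a squared-error loss L(w) = (w - target)**2 by gradient descent.

    Stands in for 'adjust the model parameters according to the value of
    the loss function until the value of the loss function is minimized'."""
    w = initial_w
    for _ in range(steps):
        grad = 2.0 * (w - target)  # dL/dw
        w -= lr * grad             # parameter update step
    return w

w = train(initial_w=5.0, target=1.0)
print(round(w, 3))  # -> 1.0 (converges to the loss minimum)
```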
According to another aspect of the disclosure, the following technical solution is also provided:
A character recognition method, comprising: obtaining an original image, the original image containing text; preprocessing the original image to obtain an image region containing the text; inputting the image region into a text recognition model obtained by training with the above training method for a text recognition model; and outputting, by the text recognition model, the classes of the text.
According to another aspect of the disclosure, the following technical solution is also provided:
A character recognition apparatus, comprising:
an image region recognition module, configured to obtain an image region containing text from an original image;
a text feature image generation module, configured to extract image features of the text from the image region to generate a text feature image;
a first encoded image generation module, configured to perform a first encoding on the text feature image to generate a first encoded image;
a second encoded image generation module, configured to perform a second encoding on the first encoded image at least once to generate a second encoded image;
a decoded image generation module, configured to decode the second encoded image to generate a decoded image; and
a first classification module, configured to classify the image features in the decoded image to recognize the text.
Further, the text feature image generation module further comprises:
a convolutional neural network input module, configured to input the image region into a convolutional neural network; and
a convolutional neural network output module, configured to output, by the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image and C >= 1, H is the height of the text feature image and H >= 1, and W is the width of the text feature image and W >= 1.
Further, the first encoded image generation module further comprises:
a first LSTM network input module, configured to input the text feature image into the first LSTM network; and
a first LSTM network output module, configured to output the first encoded image from the first LSTM network.
Further, the second encoded image generation module further comprises:
an intermediate network input module, configured to input the first encoded image into an intermediate network, the intermediate network comprising at least one layer of LSTM network; and
an intermediate network output module, configured to output the second encoded image from the intermediate network.
Further, the decoded image generation module further comprises:
a decoding LSTM network input module, configured to input the second encoded image into a decoding LSTM network; and
a decoding LSTM network output module, configured to output the decoded image from the decoding LSTM network.
Further, the first classification module further comprises:
a fully connected network input module, configured to input the decoded image into a first fully connected network;
a fully connected network output module, configured to output, by the first fully connected network, the classes of the characters contained in the decoded image; and
a text recognition module, configured to recognize the text in the image region according to the character classes.
Further, the text recognition module further comprises:
a merging module, configured to merge adjacent characters recognized as the same character class into a single character; and
a result output module, configured to output the merged result as the recognition result.
According to another aspect of the disclosure, the following technical solution is also provided:
A training apparatus for a text recognition model, comprising:
a parameter initialization module, configured to initialize the parameters of the text recognition model, wherein the text recognition model comprises one convolutional neural network, at least three LSTM networks, and one fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network;
a training image obtaining module, configured to obtain a training image from a training set, the training image containing text and class labels for the text;
a convolution module, configured to pass the training image through the convolutional neural network to output a text feature image;
an encoding/decoding module, configured to pass the text feature image through the at least three LSTM networks to output a decoded image;
a second classification module, configured to pass the decoded image through the fully connected layer to output the character classes in the training image;
an error calculation module, configured to compute the value of the loss function of the text recognition model from the character classes and the class labels of the text; and
an adjustment module, configured to adjust the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimized.
According to yet another aspect of the disclosure, the following technical solution is also provided:
A character recognition apparatus, comprising:
an original image obtaining module, configured to obtain an original image, the original image containing text;
a preprocessing module, configured to preprocess the original image to obtain an image region containing the text;
an input module, configured to input the image region into a text recognition model obtained by training with the above training method for a text recognition model; and
an output module, configured to output, by the text recognition model, the classes of the text.
According to yet another aspect of the disclosure, the following technical solution is also provided:
An electronic device, comprising: a memory configured to store non-transitory computer-readable instructions; and a processor configured to run the computer-readable instructions such that, when executing them, the processor carries out the steps of any of the character recognition methods described above.
According to yet another aspect of the disclosure, the following technical solution is also provided:
A computer-readable storage medium configured to store non-transitory computer-readable instructions which, when executed by a computer, cause the computer to carry out the steps of any of the methods described above.
The present disclosure provides a character recognition method, a character recognition apparatus, and an electronic device. The character recognition method includes: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing a second encoding on the first encoded image at least once to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text. By adding multiple image encoding passes to the recognition process, the disclosure addresses the technical problem that the accuracy of text recognition is difficult to improve in the prior art.
The foregoing is only an overview of the technical solutions of the disclosure. In order that the technical means of the disclosure may be understood more clearly and implemented according to the contents of the specification, and in order to make the above and other objects, features, and advantages of the disclosure more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Detailed description of the invention
Fig. 1 is a flow diagram of a character recognition method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a located image region containing text in a character recognition method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of classification and character merging in a character recognition method according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a training method for a text recognition model according to an embodiment of the present disclosure;
Fig. 5 is a structural schematic diagram of a character recognition apparatus according to an embodiment of the present disclosure;
Fig. 6 is a structural schematic diagram of an electronic device provided according to an embodiment of the present disclosure.
Specific embodiment
The embodiments of the present disclosure are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the disclosure from the contents disclosed in this specification. Obviously, the described embodiments are only some of the embodiments of the disclosure, not all of them. The disclosure may also be implemented or applied through other, different embodiments, and various details in this specification may be modified or changed from different viewpoints and for different applications without departing from the spirit of the disclosure. It should be noted that, in the absence of conflict, the following embodiments and the features within them may be combined with one another. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the disclosure without creative effort fall within the scope of protection of the disclosure.
It should be noted that various aspects of the embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein may be embodied in a wide variety of forms, and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, those skilled in the art will appreciate that an aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the drawings provided in the following embodiments illustrate the basic concept of the disclosure only in a schematic way; they show only the components related to the disclosure and are not drawn according to the number, shape, and size of the components in an actual implementation. In an actual implementation, the form, quantity, and proportion of each component may vary arbitrarily, and the component layout may also be more complex.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects may be practiced without these specific details.
An embodiment of the present disclosure provides a character recognition method. The character recognition method provided in this embodiment may be executed by a computing device, which may be implemented as software or as a combination of software and hardware, and which may be integrated into a server, a terminal device, or the like. As shown in Fig. 1, the character recognition method mainly comprises the following steps S101 to S106, wherein:
Step S101: obtaining an image region containing text from an original image.
In the disclosure, the original image is obtained from an image source, where the image source may be a local storage space or a network storage space; obtaining the original image from the image source thus includes obtaining the original image from a local storage space or from a network storage space. Wherever the original image comes from, the storage address of the original image first needs to be obtained, after which the original image is fetched from that storage address.
The image source may also be an image sensor, in which case obtaining the original image from the image source includes capturing the original image with the image sensor. An image sensor refers to any device capable of capturing images; typical image sensors are video cameras, still cameras, webcams, and so on. In this embodiment, the image sensor may be the camera of a mobile terminal, for example the front or rear camera of a smartphone, and the image captured by the camera may be displayed directly on the screen of the phone. In this step, the video captured by the image sensor is obtained for further recognition of the text in the image in the next step.
In the disclosure, the original image contains text. In a typical application, a user photographs an object in the environment with the camera of a mobile terminal, and the object may bear text; the object may be a book, a road sign, a shop sign, and so on. In another typical application, the image is a video frame, and the video frame contains text on an object in the video or subtitles in the video.
In this step, obtaining the image region containing text from the original image may comprise:
preprocessing the original image to obtain a preprocessed image; and
locating the image region containing the text within the preprocessed image.
In one embodiment, the preprocessing includes denoising the original image, skew correction, and various filtering operations; if the input image is a grayscale or color image, binarization may also be performed. Binarization of an image is the process of converting a grayscale image into an image with only the two values black and white. Binarization also belongs to the image segmentation techniques of image processing; image segmentation methods fall mainly into three classes: thresholding, edge detection, and region growing. The most common of these is thresholding, where the threshold is the value that separates foreground from background during binarization: pixels less than or equal to the threshold belong to the foreground, and the others belong to the background. Binarization is generally divided into global binarization and local binarization. Global binarization uses a static threshold, applying the same threshold to the entire image according to its overall statistics; typical global methods include the maximum between-class variance method, entropy-based thresholding, clustering-based thresholding, and fuzzy-set binarization. Local binarization uses a dynamic threshold: a different threshold is selected for each pixel or small region according to the features of the pixel's neighborhood, so the threshold depends only on the surrounding pixels and is independent of pixels at other positions.
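As a concrete illustration of global binarization, the maximum between-class variance method mentioned above (commonly known as Otsu's method) picks the threshold that maximizes the variance between the foreground and background classes. A minimal pure-Python sketch over a flat list of 8-bit grayscale values (the tiny two-cluster test image is a made-up example):

```python
def otsu_threshold(pixels):
    """Return the threshold t maximizing between-class variance.

    `pixels` is a flat list of 8-bit grayscale values; pixels <= t are
    treated as foreground, the rest as background."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    grand_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    cum_n, cum_sum = 0, 0
    for t in range(256):
        cum_n += hist[t]
        cum_sum += t * hist[t]
        if cum_n == 0 or cum_n == total:
            continue
        w0 = cum_n / total                 # foreground weight
        w1 = 1.0 - w0                      # background weight
        mu0 = cum_sum / cum_n              # foreground mean
        mu1 = (grand_sum - cum_sum) / (total - cum_n)  # background mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters: the threshold falls between them.
img = [10, 12, 11, 13] * 8 + [200, 210, 205, 199] * 8
t = otsu_threshold(img)
print(10 <= t < 199)  # -> True
```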
After the image has been preprocessed into a preprocessed image, the image region of the text is located within it. The text image then needs to be segmented: so-called segmentation is the process of extracting individual character images from the whole image. The segmentation method may be based on one of the following strategies, or on a combination of several of them:
1. Classical segmentation: also called standard segmentation, this cuts the image into single-character images according to attributes of the text image itself, such as width, height, and baseline position. The main classical methods include the spacing method and segmentation based on projection-profile analysis.
2. Connected-component analysis: all connected pixel regions are found first, and the image is then segmented by analyzing the features of each connected component and the relationships between components; this method is mostly suitable for segmenting Western handwriting.
3. Recognition-based segmentation: this method first generates multiple tentative segmentation hypotheses and then recognizes them; the uncertain segmentation results are corrected and selected using the results supplied by the classifier together with recognition confidence and syntactic and semantic analysis, yielding the optimal segmentation.
4. Holistic segmentation: a whole word is recognized as a single unit, segmenting the text image against the words of a predefined dictionary.
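Among the classical methods above, projection-profile segmentation is the easiest to sketch: a binary text line is summed column by column, and blank columns mark the cuts between characters. A minimal pure-Python illustration (the 0/1 test image is a made-up example, not data from this disclosure):

```python
def segment_columns(binary_rows):
    """Split a binary text line into (start, end) column spans of characters.

    `binary_rows` is a list of rows; 1 = ink, 0 = background.  A column
    whose vertical projection is zero separates two characters."""
    width = len(binary_rows[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary_rows) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # it ends before a blank column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

# Two strokes separated by one blank column -> two character spans.
line = [[1, 1, 0, 1, 1],
        [1, 0, 0, 0, 1]]
print(segment_columns(line))  # -> [(0, 2), (3, 5)]
```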
Through the above segmentation process, the characters in the preprocessed image are cut out and together form the image region of the text. As shown in Fig. 2, the image contains the phrase "I am Chinese" (five characters in the original Chinese); after S101, the image region of "I am Chinese" is located and outlined. All subsequent recognition steps are based on the located image region of the text.
It is to be understood that the methods for obtaining the image region containing text are not limited to those listed in the above step, which are only the methods used in traditional OCR. In practice a deep-learning algorithm, a typical example being object detection, may also be used to locate the image region of the text; this is not described in detail here. Any method that can locate the image region of the text may be used in this step.
Step S102: extracting image features of the text from the image region to generate a text feature image.
In the disclosure, extracting the image features of the text from the image region to generate the text feature image may comprise:
inputting the image region into a convolutional neural network; and
outputting, by the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image and C >= 1, H is the height of the text feature image and H >= 1, and W is the width of the text feature image and W >= 1.
The convolutional neural network may contain only an input layer and convolutional layers, or it may additionally contain pooling layers. Optionally, the image region of the text selected in step S101 is fed into the input layer of the convolutional neural network, and after the convolutions of the convolutional layers, the image region of the text is converted into a text feature image of size C*1*W, where C, the number of channels of the text feature image, is related to the number of convolution kernels in the last convolutional layer and is a positive integer greater than or equal to 1; W, the width of the text feature image, is also a positive integer greater than or equal to 1; and the 1 in C*1*W is the height of the text feature image. In other words, the convolutional neural network extracts the image features of the text image region through its convolutional layers and generates a text feature image of height 1, width W, and C channels. Typically, suppose the input image has size 1*32*1024; the image here has 1 channel, assuming a binarized or grayscale image is used (a color image generally has 3 channels). Suppose the convolutional neural network comprises an input layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, and a pooling layer, where the first convolutional layer contains 3 kernels of size 5*5 with stride 1; then the input of the second convolutional layer is a 3*(32-5+1)*(1024-5+1) = 3*28*1020 feature image. The second convolutional layer contains 16 kernels of size 7*7, so the input of the third convolutional layer is a 16*(28-7+1)*(1020-7+1) = 16*22*1014 feature image. The third convolutional layer contains 128 kernels of size 15*15, so the input of the pooling layer is a 128*(22-15+1)*(1014-15+1) = 128*8*1000 feature image. The pooling layer is a max-pooling layer with window size 8*8 and stride 8, so after the pooling layer the convolutional neural network yields a 128*1*125 feature image. It should be understood that the structure of this convolutional neural network is only an example; in practice any structure may be designed as needed. The example serves only to illustrate that a convolutional neural network can extract features from the image region of the text and generate a text feature image.
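The shape arithmetic in the example above (valid stride-1 convolutions followed by non-overlapping max pooling) can be checked mechanically. A small pure-Python sketch reproducing the 1*32*1024 to 128*1*125 chain (the layer sizes are the ones assumed in the example, not fixed by the method):

```python
def conv_valid(c, h, w, kernels, k):
    """Shape after a valid (no padding), stride-1 convolution
    with `kernels` filters of size k*k."""
    return kernels, h - k + 1, w - k + 1

def max_pool(c, h, w, window):
    """Shape after non-overlapping max pooling with a window*window window."""
    return c, h // window, w // window

shape = (1, 32, 1024)                          # binarized input, C*H*W
shape = conv_valid(*shape, kernels=3, k=5)     # -> (3, 28, 1020)
shape = conv_valid(*shape, kernels=16, k=7)    # -> (16, 22, 1014)
shape = conv_valid(*shape, kernels=128, k=15)  # -> (128, 8, 1000)
shape = max_pool(*shape, window=8)             # -> (128, 1, 125)
print(shape)  # -> (128, 1, 125)
```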
Step S103: performing a first encoding on the text feature image to generate a first encoded image.
In the disclosure, the first encoding may be implemented by an LSTM network, and performing the first encoding on the text feature image to generate the first encoded image comprises:
inputting the text feature image into a first LSTM network; and
outputting, by the first LSTM network, the first encoded image.
Take the feature image of 128*1*125 obtained in step S102 as an example, where 128 is the number of channels of the feature image, 1 is its height and 125 is its width. The feature image is input into the first LSTM network as a time series of length 125, with each 128*1 column serving as the input at one time step. Suppose the final output of the first LSTM network is a feature image of 128*1*256; this feature image is the first coded image. Because an LSTM network is used, and the input of an LSTM network at each time step includes its output at the previous time step, the network has memory and can remember the contextual information of the text.
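How a C*H*W feature image is turned into a time series for the LSTM can be sketched as follows (a minimal illustration using the 128*1*125 sizes from the example; the zero-filled image is a placeholder for real features):

```python
# Sketch: slicing a C*H*W feature image into a length-W time series, where
# each 128*1 column (all channels at one width position) is one step input.
C, H, W = 128, 1, 125
feature = [[[0.0] * W for _ in range(H)] for _ in range(C)]  # dummy C*H*W image

sequence = [
    [feature[c][h][w] for c in range(C) for h in range(H)]
    for w in range(W)
]
print(len(sequence), len(sequence[0]))  # 125 steps, 128 features per step
```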
Step S104: performing second encoding at least once on the first coded image to generate a second coded image.
In the disclosure, the second encoding may be implemented by an intermediate network. Performing second encoding at least once on the first coded image to generate the second coded image includes: inputting the first coded image into an intermediate network, the intermediate network including at least one LSTM layer; the intermediate network outputting the second coded image.
In this step, the intermediate network may be formed by at least one LSTM layer. Specifically, the dimension of the input layer of the intermediate network should match the output dimension of the feature image of the first LSTM network. For example, in the example of step S103, the output feature image of the first LSTM network has size 128*1*256, so the time-series length of the intermediate network is 256, and each 128*1 column of the first coded image serves as the input of the intermediate network at one time step. The intermediate network includes at least one LSTM layer; it can be understood that more LSTM layers can learn and capture more semantic information, so the intermediate network can learn more features and improve the accuracy of the final text recognition. After the first encoding of the image, this step performs a further second encoding to generate the second coded image, which contains more semantic information about the text.
Step S105: decoding the second coded image to generate a decoded image.
In the disclosure, the decoding may be completed by a decoding LSTM network. Decoding the second coded image to generate the decoded image includes:
inputting the second coded image into a decoding LSTM network;
the decoding LSTM network outputting the decoded image.
In this step, the decoding still passes the output of the intermediate network through another LSTM network to generate the feature image used for classification. For the decoding LSTM network, it is only required that the dimension of its input equals the dimension of the output of the intermediate network, and that the dimension of its output equals the dimension expected by the subsequent processing. If the output of the intermediate network, i.e. the second coded image, is a feature image of 256*1*256, then the time-series length is 256, and the output of the decoding LSTM network can be designed according to the requirements of the next processing step, so that the later processing steps can use the decoded image. Specifically, the output of the decoding LSTM network may be a feature image of 128*1*256.
Step S106: classifying the image features in the decoded image to identify the text.
In the disclosure, classifying the image features in the decoded image to identify the text includes:
inputting the decoded image into a first fully-connected network;
the first fully-connected network outputting the text category contained in the decoded image;
identifying the text in the image region according to the text category.
In this step, the feature image obtained in step S105 is input into a fully-connected network. For example, if the decoded image obtained in step S105 is 128*1*256, then the fully-connected network can be designed to have 128*256 = 32768 inputs, with every 128 inputs forming one group (that is, the 1*1 pixel across all channels at one position forms one group) that is mapped by full connection to N outputs, where N is the number of categories of text to be classified. For Chinese characters there are roughly 6000 commonly used characters, which means at least 6000 outputs are needed. The N outputs of the fully-connected network for each group are passed through a softmax activation to calculate the character represented by that column, and the character with the maximum softmax value is taken as the recognized character.
Further, identifying the text in the image region according to the text category may also include:
merging adjacent characters identified as the same text category into a single character;
outputting the merged result as the recognition result.
As shown in Fig. 3, in the recognition result of the 128*1*256 decoded image, a character that falls across adjacent segmentation blocks may be identified as the same character more than once. In this case, two identical adjacent characters between two spaces can be merged into one character, forming the final recognition result.
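One possible reading of this merging rule can be sketched as follows (the space separator and the sample label string are assumptions for illustration, not the disclosure's exact format):

```python
# Sketch: collapse adjacent columns identified as the same class into one
# character; separator columns (modeled here as spaces) reset the run and
# are dropped from the output.
def merge_adjacent(column_labels, separator=" "):
    out, prev = [], separator
    for label in column_labels:
        if label != prev and label != separator:
            out.append(label)
        prev = label
    return "".join(out)

print(merge_adjacent(list("aabb ccaa")))  # abca
```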
As shown in Fig. 4, the character recognition method of the disclosure can be executed by a text recognition model, which needs to be trained in advance. The training method of the text recognition model includes:
Step S401: initializing the parameters of the text recognition model, wherein the text recognition model includes a convolutional neural network, at least three LSTM networks and a fully-connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks and the fully-connected network;
Step S402: obtaining a training image from a training set, the training image containing text and category labels of the text;
Step S403: passing the training image through the convolutional neural network to output a character feature image;
Step S404: passing the character feature image through the at least three LSTM networks to output a decoded image;
Step S405: passing the decoded image through the fully-connected layer to output the text classification in the training image;
Step S406: calculating the value of the loss function of the text recognition model according to the text classification and the category labels of the text;
Step S407: adjusting the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimal.
The text recognition model in the disclosure includes at least three parts: a feature extraction part, completed by the convolutional neural network; a semantic recognition part, completed by the at least three LSTM networks; and a classification part, completed by the fully-connected network.
Initializing the parameters of the text recognition model in step S401 covers the parameters of the above convolutional neural network, LSTM networks and fully-connected network. Specifically, the parameters of the convolutional neural network include at least the number and size of the convolution kernels used in each convolutional layer, the weight values in the convolution kernels and the size of the pooling window; the parameters of the LSTM networks include at least the weight matrices in the LSTMs; and the parameters of the fully-connected network include at least the full-connection weight coefficients of each layer. These parameters can be generated randomly during initialization, or initial specified parameters can be read; details are not repeated here.
After the parameters are initialized, in step S402 the training set is input into the text recognition model, where the training set is a set of training images containing text and label characters. The label characters may be implemented as segmentation labels, that is, the training images are segmented into labels whose form corresponds to the prediction result. If the prediction result finally segments the input image into 256 parts along the width, then the label characters need to be processed into labels segmented into 256 parts along the width, i.e. images of multiple width parts can be labeled as the same character.
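Processing labels into per-width-slice form, where several adjacent slices carry the same character, can be sketched as follows (the `(char, start, end)` span format and the `<blank>` filler are hypothetical, chosen only for illustration):

```python
# Sketch: expand character spans into one label per width slice, so that
# multiple adjacent slices are labeled as the same character.
def columns_to_labels(spans, num_columns, blank="<blank>"):
    """spans: list of (char, start_col, end_col), end exclusive."""
    labels = [blank] * num_columns
    for char, start, end in spans:
        for col in range(start, end):
            labels[col] = char
    return labels

print(columns_to_labels([("T", 0, 3), ("O", 3, 6)], 8))
# ['T', 'T', 'T', 'O', 'O', 'O', '<blank>', '<blank>']
```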
In steps S403 to S405, the images in the training set pass through the whole network of the text model to obtain a prediction result; this process is the same as an ordinary forward pass during training and is not repeated here.
In step S406, the predicted value and the label value are substituted into the loss function to calculate the loss value. Any suitable loss function can be used for the loss function setting; it is not the focus of the disclosure and is not repeated here.
In step S407, the parameters in the text recognition model are adjusted according to the loss value calculated by the loss function, and a new loss value is obtained again by passing the training set through the text model with the adjusted parameters. The above process is repeated until the value of the loss function is minimal.
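The loop of steps S402 to S407 can be sketched with a toy example (the quadratic "loss" and its gradient step stand in for the real network and optimizer, which the disclosure leaves unspecified):

```python
# Toy sketch: repeatedly evaluate a loss and adjust a parameter until the
# loss stops decreasing, mirroring the repeat-until-minimal loop above.
def loss(w, target=3.0):
    return (w - target) ** 2       # stand-in loss function

def grad(w, target=3.0):
    return 2 * (w - target)        # its derivative

w, lr = 0.0, 0.1                   # "initialized parameter" and step size
for epoch in range(100):
    w -= lr * grad(w)              # adjust parameters from the loss
    if loss(w) < 1e-9:             # stop once the loss is (near) minimal
        break
print(round(w, 3))  # ≈ 3.0
```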
The disclosure further includes a method for performing text recognition using the above text recognition model, including:
obtaining an original image, the original image containing text;
preprocessing the original image to obtain an image region containing the text;
inputting the image region into a text recognition model obtained by training with the above training method of the text recognition model;
the text recognition model outputting the type of the text.
This process is the prediction process of the above text recognition model; for specific prediction details, refer to the process shown in Fig. 1, which is not repeated here.
The present disclosure discloses a character recognition method, a character recognition device and an electronic device. The character recognition method includes: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a character feature image; performing first encoding on the character feature image to generate a first coded image; performing second encoding at least once on the first coded image to generate a second coded image; decoding the second coded image to generate a decoded image; and classifying the image features in the decoded image to identify the text. By adding multiple image-text encoding processes to the text recognition process, the disclosure solves the technical problem in the prior art that the accuracy of text recognition is difficult to improve.
Although the steps in the above method embodiments are described in the above order, those skilled in the art should understand that the steps in the embodiments of the disclosure are not necessarily executed in that order; they may also be executed in reverse order, in parallel, interleaved or in other orders. Moreover, on the basis of the above steps, those skilled in the art may also add other steps; these obvious variants or equivalent replacements should also be included within the protection scope of the disclosure and are not repeated here.
The following are device embodiments of the present disclosure, which can be used to execute the steps implemented by the method embodiments of the present disclosure. For ease of description, only the parts relevant to the embodiments of the present disclosure are shown; for specific technical details not disclosed, please refer to the method embodiments of the present disclosure.
An embodiment of the present disclosure provides a character recognition device. The device can execute the steps described in the above character recognition method embodiments. As shown in Fig. 5, the device 500 mainly includes: an image region identification module 501, a character feature image generation module 502, a first coded image generation module 503, a second coded image generation module 504, a decoded image generation module 505 and a first classification module 506. Specifically:
the image region identification module 501 is configured to obtain an image region containing text from an original image;
the character feature image generation module 502 is configured to extract image features of the text from the image region to generate a character feature image;
the first coded image generation module 503 is configured to perform first encoding on the character feature image to generate a first coded image;
the second coded image generation module 504 is configured to perform second encoding at least once on the first coded image to generate a second coded image;
the decoded image generation module 505 is configured to decode the second coded image to generate a decoded image;
the first classification module 506 is configured to classify the image features in the decoded image to identify the text.
Further, the character feature image generation module 502 further includes:
a convolutional neural network input module, configured to input the image region into a convolutional neural network;
a convolutional neural network output module, configured to output a character feature image of size C*H*W through the convolutional neural network, where C is the number of channels of the character feature image, C >= 1, H is the height of the character feature image, H >= 1, and W is the width of the character feature image, W >= 1.
Further, the first coded image generation module 503 further includes:
a first LSTM network input module, configured to input the character feature image into the first LSTM network;
a first LSTM network output module, configured to output the first coded image through the first LSTM network.
Further, the second coded image generation module 504 further includes:
an intermediate network input module, configured to input the first coded image into an intermediate network, the intermediate network including at least one LSTM layer;
an intermediate network output module, configured to output the second coded image through the intermediate network.
Further, the decoded image generation module 505 further includes:
a decoding LSTM network input module, configured to input the second coded image into a decoding LSTM network;
a decoding LSTM network output module, configured to output the decoded image through the decoding LSTM network.
Further, the first classification module 506 further includes:
a fully-connected network input module, configured to input the decoded image into a first fully-connected network;
a fully-connected network output module, configured to output the text category contained in the decoded image through the first fully-connected network;
a text recognition module, configured to identify the text in the image region according to the text category.
Further, the text recognition module further includes:
a merging module, configured to merge adjacent characters identified as the same text category into a single character;
a result output module, configured to output the merged result as the recognition result.
The device shown in Fig. 5 can execute the methods of the embodiments shown in Figs. 1 to 3. For parts not described in detail in this embodiment, refer to the related description of the embodiments shown in Figs. 1 to 3. For the execution process and technical effects of this technical solution, see the description in the embodiments shown in Figs. 1 to 3, which is not repeated here.
An embodiment of the present disclosure also provides a training device for a text recognition model, including:
a parameter initialization module, configured to initialize the parameters of the text recognition model, wherein the text recognition model includes a convolutional neural network, at least three LSTM networks and a fully-connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks and the fully-connected network;
a training image obtaining module, configured to obtain a training image from a training set, the training image containing text and category labels of the text;
a convolution module, configured to pass the training image through the convolutional neural network to output a character feature image;
an encoding/decoding module, configured to pass the character feature image through the at least three LSTM networks to output a decoded image;
a second classification module, configured to pass the decoded image through the fully-connected layer to output the text classification in the training image;
an error calculation module, configured to calculate the value of the loss function of the text recognition model according to the text classification and the category labels of the text;
an adjustment module, configured to adjust the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimal.
An embodiment of the present disclosure also provides a character recognition device, including:
an original image obtaining module, configured to obtain an original image, the original image containing text;
a preprocessing module, configured to preprocess the original image to obtain an image region containing the text;
an input module, configured to input the image region into a text recognition model obtained by training with the above training method of the text recognition model;
an output module, configured to cause the text recognition model to output the type of the text.
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present disclosure. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 6 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing device 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 allows the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or provide all the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program, which can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device: obtains an image region containing text from an original image; extracts image features of the text from the image region to generate a character feature image; performs first encoding on the character feature image to generate a first coded image; performs second encoding at least once on the first coded image to generate a second coded image; decodes the second coded image to generate a decoded image; and classifies the image features in the decoded image to identify the text.
The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device: initializes the parameters of a text recognition model, wherein the text recognition model includes a convolutional neural network, at least three LSTM networks and a fully-connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks and the fully-connected network; obtains a training image from a training set, the training image containing text and category labels of the text; passes the training image through the convolutional neural network to output a character feature image; passes the character feature image through the at least three LSTM networks to output a decoded image; passes the decoded image through the fully-connected layer to output the text classification in the training image; calculates the value of the loss function of the text recognition model according to the text classification and the category labels of the text; and adjusts the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimal.
The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device: obtains an original image, the original image containing text; preprocesses the original image to obtain an image region containing the text; inputs the image region into a text recognition model obtained by training with the above training method of the text recognition model; and the text recognition model outputs the type of the text.
The computer program code for executing the operations of the present disclosure can be written in one or more programming languages or a combination thereof. The above programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions and operations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram can represent a module, program segment or part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes can occur in an order different from that marked in the drawings. For example, two boxes shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that executes the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure can be implemented by software or by hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.
The above description is only a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Claims (18)
1. A character recognition method, comprising:
obtaining an image region containing text from an original image;
extracting image features of the text from the image region to generate a character feature image;
performing first encoding on the character feature image to generate a first coded image;
performing second encoding at least once on the first coded image to generate a second coded image;
decoding the second coded image to generate a decoded image; and
classifying image features in the decoded image to identify the text.
2. The character recognition method of claim 1, wherein extracting image features of the text from the image region to generate a character feature image comprises:
inputting the image region into a convolutional neural network; and
outputting, by the convolutional neural network, a character feature image of size C*H*W, wherein C is the number of channels of the character feature image, C >= 1, H is the height of the character feature image, H >= 1, and W is the width of the character feature image, W >= 1.
3. The character recognition method of claim 1, wherein performing first encoding on the character feature image to generate a first coded image comprises:
inputting the character feature image into a first LSTM network; and
outputting, by the first LSTM network, the first coded image.
4. The character recognition method of claim 1, wherein performing second encoding at least once on the first coded image to generate a second coded image comprises:
inputting the first coded image into an intermediate network, the intermediate network comprising at least one LSTM layer; and
outputting, by the intermediate network, the second coded image.
5. The character recognition method of claim 1, wherein decoding the second coded image to generate a decoded image comprises:
inputting the second coded image into a decoding LSTM network; and
outputting, by the decoding LSTM network, the decoded image.
6. The character recognition method of claim 1, wherein classifying image features in the decoded image to identify the text comprises:
inputting the decoded image into a first fully-connected network;
outputting, by the first fully-connected network, the text category contained in the decoded image; and
identifying the text in the image region according to the text category.
7. The character recognition method of claim 6, wherein identifying the text in the image region according to the text category comprises:
merging adjacent characters identified as the same text category into a single character; and
outputting the merged result as the recognition result.
8. A training method for a text recognition model, comprising:
initializing parameters of the text recognition model, wherein the text recognition model comprises a convolutional neural network, at least three LSTM networks and a fully-connected network, and the parameters comprise parameters of the convolutional neural network, the LSTM networks and the fully-connected network;
obtaining a training image from a training set, the training image containing text and category labels of the text;
passing the training image through the convolutional neural network to output a character feature image;
passing the character feature image through the at least three LSTM networks to output a decoded image;
passing the decoded image through the fully-connected layer to output the text classification in the training image;
calculating a value of a loss function of the text recognition model according to the text classification and the category labels of the text; and
adjusting the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimal.
9. A character recognition method, comprising:
obtaining an original image, the original image containing text;
preprocessing the original image to obtain an image region containing the text;
inputting the image region into a text recognition model obtained by training with the method of claim 8; and
outputting, by the text recognition model, the type of the text.
10. A character recognition apparatus, comprising:
an image region identification module for obtaining an image region including text from an original image;
a character feature image generation module for extracting image features of the text from the image region to generate a character feature image;
a first encoded image generation module for applying a first encoding to the character feature image to generate a first encoded image;
a second encoded image generation module for applying a second encoding at least once to the first encoded image to generate a second encoded image;
a decoded image generation module for decoding the second encoded image to generate a decoded image;
a first classification module for classifying the image features in the decoded image to recognize the text.
11. An apparatus for training a text recognition model, comprising:
a parameter initialization module for initializing parameters of the text recognition model, wherein the text recognition model comprises a convolutional neural network, at least three LSTM networks, and a fully-connected network, and the parameters comprise parameters of the convolutional neural network, the LSTM networks, and the fully-connected network;
a training image obtaining module for obtaining a training image from a training set, the training image containing text and a class label of the text;
a convolution module for passing the training image through the convolutional neural network to output a character feature image;
an encoding/decoding module for passing the character feature image through the at least three LSTM networks to output a decoded image;
a second classification module for passing the decoded image through the fully-connected layer to output the text class in the training image;
an error calculation module for calculating a value of a loss function of the text recognition model from the text class and the class label of the text;
an adjustment module for adjusting the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimized.
12. A character recognition apparatus, comprising:
an original image obtaining module for obtaining an original image, the original image containing text;
a pre-processing module for pre-processing the original image to obtain an image region including the text;
an input module for inputting the image region into a text recognition model trained by the method of claim 8;
an output module for outputting, by the text recognition model, the class of the text.
13. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions, such that the processor, when running, implements the character recognition method of any one of claims 1-7.
14. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions, such that the processor, when running, implements the method of claim 8 for training a text recognition model.
15. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions, such that the processor, when running, implements the character recognition method of claim 9.
16. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the character recognition method of any one of claims 1-7.
17. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the method of claim 8 for training a text recognition model.
18. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the character recognition method of claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910327434.5A CN110070042A (en) | 2019-04-23 | 2019-04-23 | Character recognition method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110070042A true CN110070042A (en) | 2019-07-30 |
Family
ID=67368425
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910327434.5A Pending CN110070042A (en) | 2019-04-23 | 2019-04-23 | Character recognition method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070042A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570457A (en) * | 2016-10-14 | 2017-04-19 | 上海新同惠自动化***有限公司 | Chinese and Japanese character identification method |
CN107239733A (en) * | 2017-04-19 | 2017-10-10 | 上海嵩恒网络科技有限公司 | Continuous hand-written character recognizing method and system |
CN108320624A (en) * | 2017-12-22 | 2018-07-24 | 昆山遥矽微电子科技有限公司 | Text region phonetic machine |
CN108399419A (en) * | 2018-01-25 | 2018-08-14 | 华南理工大学 | Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks |
CN108427953A (en) * | 2018-02-26 | 2018-08-21 | 北京易达图灵科技有限公司 | A kind of character recognition method and device |
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
US20180336452A1 (en) * | 2017-05-22 | 2018-11-22 | Sap Se | Predicting wildfires on the basis of biophysical indicators and spatiotemporal properties using a long short term memory network |
CN109074494A (en) * | 2016-03-28 | 2018-12-21 | 松下知识产权经营株式会社 | Character and graphic identification device, character and graphic recognition methods and character and graphic recognizer |
US10176388B1 (en) * | 2016-11-14 | 2019-01-08 | Zoox, Inc. | Spatial and temporal information for semantic segmentation |
CN109492679A (en) * | 2018-10-24 | 2019-03-19 | 杭州电子科技大学 | Based on attention mechanism and the character recognition method for being coupled chronological classification loss |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674813A (en) * | 2019-09-24 | 2020-01-10 | 北京字节跳动网络技术有限公司 | Chinese character recognition method and device, computer readable medium and electronic equipment |
CN110674813B (en) * | 2019-09-24 | 2022-04-05 | 北京字节跳动网络技术有限公司 | Chinese character recognition method and device, computer readable medium and electronic equipment |
CN110738262A (en) * | 2019-10-16 | 2020-01-31 | 北京市商汤科技开发有限公司 | Text recognition method and related product |
CN111476853A (en) * | 2020-03-17 | 2020-07-31 | 西安万像电子科技有限公司 | Method, equipment and system for encoding and decoding character image |
CN111476853B (en) * | 2020-03-17 | 2024-05-24 | 西安万像电子科技有限公司 | Method, equipment and system for encoding and decoding text image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111755078B (en) | Drug molecule attribute determination method, device and storage medium | |
CN111898696B (en) | Pseudo tag and tag prediction model generation method, device, medium and equipment | |
CN110084172A (en) | Character recognition method, device and electronic equipment | |
Zhang et al. | Vsa: Learning varied-size window attention in vision transformers | |
Zhao et al. | Cutie: Learning to understand documents with convolutional universal text information extractor | |
CN112396613B (en) | Image segmentation method, device, computer equipment and storage medium | |
CN111475613A (en) | Case classification method and device, computer equipment and storage medium | |
CN110070042A (en) | Character recognition method, device and electronic equipment | |
Zhou et al. | Reverse-engineering bar charts using neural networks | |
CN114596566B (en) | Text recognition method and related device | |
CN114495129A (en) | Character detection model pre-training method and device | |
CN111507337A (en) | License plate recognition method based on hybrid neural network | |
Capobianco et al. | Historical handwritten document segmentation by using a weighted loss | |
CN114972847A (en) | Image processing method and device | |
CN114581710A (en) | Image recognition method, device, equipment, readable storage medium and program product | |
Yao et al. | Deep capsule network for recognition and separation of fully overlapping handwritten digits | |
Hu et al. | Octave convolution-based vehicle detection using frame-difference as network input | |
CN116980541B (en) | Video editing method, device, electronic equipment and storage medium | |
Peng et al. | Recognizing micro-expression in video clip with adaptive key-frame mining | |
CN117056474A (en) | Session response method and device, electronic equipment and storage medium | |
Zhang et al. | Saliency detection via sparse reconstruction and joint label inference in multiple features | |
CN113111684A (en) | Training method and device of neural network model and image processing system | |
Zheng et al. | Cmfn: Cross-modal fusion network for irregular scene text recognition | |
Panchal et al. | An investigation on feature and text extraction from images using image recognition in Android | |
CN110059739A (en) | Image composition method, device, electronic equipment and computer readable storage medium |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190730 |