CN110070042A - Character recognition method, device and electronic equipment - Google Patents

Character recognition method, device and electronic equipment

Info

Publication number
CN110070042A
CN110070042A (application CN201910327434.5A)
Authority
CN
China
Prior art keywords
image
text
region
decoding
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910327434.5A
Other languages
Chinese (zh)
Inventor
卢永晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910327434.5A
Publication of CN110070042A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The present disclosure discloses a character recognition method, a character recognition apparatus, and an electronic device. The character recognition method includes: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing at least one second encoding on the first encoded image to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text. By adding multiple image-text encoding passes to the recognition process, the disclosure addresses the technical problem that character recognition accuracy is difficult to improve in the prior art.

Description

Character recognition method, device and electronic equipment
Technical field
The present disclosure relates to the field of information processing, and more particularly to a character recognition method, a character recognition apparatus, and an electronic device.
Background art
Character recognition generally refers to the process of analyzing and recognizing an image file that contains text information to obtain its text and layout information. In general, character recognition comprises two stages, detection and recognition: detection finds the regions of an image that contain text, and recognition identifies the characters within those regions.
Traditional recognition methods typically use template matching or feature extraction and comparison, but such methods are usually affected by the state of the text, such as its orientation or the lighting intensity, which limits the accuracy and speed of recognition. In recent years, fully connected neural networks have also been used for recognition, but a fully connected network cannot capture the semantic information of the text, so the recognition accuracy cannot be improved further. Some techniques add a semantic model to the recognition process, but the semantic features such a model can capture are rather limited, making it difficult to further raise the recognition accuracy.
Summary of the invention
According to one aspect of the present disclosure, the following technical solution is provided:
A character recognition method, comprising: obtaining an image region containing text from an original image;
extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing at least one second encoding on the first encoded image to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text.
Further, extracting image features of the text from the image region to generate a text feature image comprises: inputting the image region into a convolutional neural network; and outputting, through the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image, C ≥ 1; H is the height of the text feature image, H ≥ 1; and W is the width of the text feature image, W ≥ 1.
Further, performing the first encoding on the text feature image to generate the first encoded image comprises: inputting the text feature image into a first LSTM network; and outputting, through the first LSTM network, the first encoded image.
Further, performing at least one second encoding on the first encoded image to generate the second encoded image comprises: inputting the first encoded image into an intermediate network, the intermediate network comprising at least one LSTM layer; and outputting, through the intermediate network, the second encoded image.
Further, decoding the second encoded image to generate the decoded image comprises: inputting the second encoded image into a decoding LSTM network; and outputting, through the decoding LSTM network, the decoded image.
Further, classifying the image features in the decoded image to recognize the text comprises: inputting the decoded image into a first fully connected network; outputting, through the first fully connected network, the character classes contained in the decoded image; and recognizing the text in the image region according to the character classes.
Further, recognizing the text in the image region according to the character classes comprises: merging adjacent characters recognized as the same character class into a single character; and outputting the merged result as the recognition result.
According to another aspect of the present disclosure, the following technical solution is also provided:
A training method for a character recognition model, comprising:
initializing the parameters of the character recognition model, wherein the character recognition model comprises a convolutional neural network, at least three LSTM networks, and a fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network;
obtaining a training image from a training set, the training image containing text and class labels of the text;
passing the training image through the convolutional neural network to output a text feature image;
passing the text feature image through the at least three LSTM networks to output a decoded image;
passing the decoded image through the fully connected layer to output the character classes in the training image;
computing the value of the loss function of the character recognition model from the character classes and the class labels of the text; and
adjusting the parameters of the character recognition model according to the value of the loss function until the value of the loss function is minimized.
According to another aspect of the present disclosure, the following technical solution is also provided:
A character recognition method, comprising: obtaining an original image that contains text; preprocessing the original image to obtain an image region containing the text; inputting the image region into a character recognition model trained with the above training method for the character recognition model; and outputting, through the character recognition model, the classes of the text.
According to another aspect of the present disclosure, the following technical solution is also provided:
A character recognition apparatus, comprising:
an image region locating module, configured to obtain an image region containing text from an original image;
a text feature image generation module, configured to extract image features of the text from the image region to generate a text feature image;
a first encoded image generation module, configured to perform a first encoding on the text feature image to generate a first encoded image;
a second encoded image generation module, configured to perform at least one second encoding on the first encoded image to generate a second encoded image;
a decoded image generation module, configured to decode the second encoded image to generate a decoded image; and
a first classification module, configured to classify the image features in the decoded image to recognize the text.
Further, the text feature image generation module also comprises:
a convolutional neural network input module, configured to input the image region into a convolutional neural network; and
a convolutional neural network output module, configured to output, through the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image, C ≥ 1; H is the height of the text feature image, H ≥ 1; and W is the width of the text feature image, W ≥ 1.
Further, the first encoded image generation module also comprises:
a first LSTM network input module, configured to input the text feature image into a first LSTM network; and
a first LSTM network output module, configured to output, through the first LSTM network, the first encoded image.
Further, the second encoded image generation module also comprises:
an intermediate network input module, configured to input the first encoded image into an intermediate network, the intermediate network comprising at least one LSTM layer; and
an intermediate network output module, configured to output, through the intermediate network, the second encoded image.
Further, the decoded image generation module also comprises:
a decoding LSTM network input module, configured to input the second encoded image into a decoding LSTM network; and
a decoding LSTM network output module, configured to output, through the decoding LSTM network, the decoded image.
Further, the first classification module also comprises:
a fully connected network input module, configured to input the decoded image into a first fully connected network;
a fully connected network output module, configured to output, through the first fully connected network, the character classes contained in the decoded image; and
a character recognition module, configured to recognize the text in the image region according to the character classes.
Further, the character recognition module also comprises:
a merging module, configured to merge adjacent characters recognized as the same character class into a single character; and
a result output module, configured to output the merged result as the recognition result.
According to another aspect of the present disclosure, the following technical solution is also provided:
A training apparatus for a character recognition model, comprising:
a parameter initialization module, configured to initialize the parameters of the character recognition model, wherein the character recognition model comprises a convolutional neural network, at least three LSTM networks, and a fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network;
a training image acquisition module, configured to obtain a training image from a training set, the training image containing text and class labels of the text;
a convolution module, configured to pass the training image through the convolutional neural network to output a text feature image;
an encoding/decoding module, configured to pass the text feature image through the at least three LSTM networks to output a decoded image;
a second classification module, configured to pass the decoded image through the fully connected layer to output the character classes in the training image;
an error computation module, configured to compute the value of the loss function of the character recognition model from the character classes and the class labels of the text; and
an adjustment module, configured to adjust the parameters of the character recognition model according to the value of the loss function until the value of the loss function is minimized.
According to yet another aspect of the present disclosure, the following technical solution is also provided:
A character recognition apparatus, comprising:
an original image acquisition module, configured to obtain an original image that contains text;
a preprocessing module, configured to preprocess the original image to obtain an image region containing the text;
an input module, configured to input the image region into a character recognition model trained with the above training method for the character recognition model; and
an output module, configured to output, through the character recognition model, the classes of the text.
According to yet another aspect of the present disclosure, the following technical solution is also provided:
An electronic device, comprising: a memory configured to store non-transitory computer-readable instructions; and a processor configured to run the computer-readable instructions such that, when executing them, the processor implements the steps of any of the character recognition methods described above.
According to yet another aspect of the present disclosure, the following technical solution is also provided:
A computer-readable storage medium configured to store non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the steps of any of the methods described above.
The present disclosure discloses a character recognition method, a character recognition apparatus, and an electronic device. The character recognition method includes: obtaining an image region containing text from an original image; extracting image features of the text from the image region to generate a text feature image; performing a first encoding on the text feature image to generate a first encoded image; performing at least one second encoding on the first encoded image to generate a second encoded image; decoding the second encoded image to generate a decoded image; and classifying the image features in the decoded image to recognize the text. By adding multiple image-text encoding passes to the recognition process, the disclosure addresses the technical problem that character recognition accuracy is difficult to improve in the prior art.
The above description is only an overview of the technical solutions of the present disclosure. In order to understand the technical means of the disclosure more clearly, so that they can be implemented in accordance with the contents of the specification, and to make the above and other objects, features, and advantages of the disclosure more apparent and comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a character recognition method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the located image region containing text in a character recognition method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of classification and character merging in a character recognition method according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a training method for a character recognition model according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed description of the embodiments
The embodiments of the present disclosure are described below by way of specific examples, and those skilled in the art can easily understand other advantages and effects of the disclosure from the contents disclosed in this specification. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the disclosure. The disclosure may also be implemented or applied through other different embodiments, and the details in this specification may be modified or changed from different viewpoints and applications without departing from the spirit of the disclosure. It should be noted that, unless they conflict, the embodiments below and the features within them may be combined with each other. Based on the embodiments of the disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the disclosure.
It should be noted that various aspects of the embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein may be embodied in a wide variety of forms, and any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, those skilled in the art will appreciate that an aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method may be practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method may be practiced using structures and/or functionality other than one or more of the aspects set forth herein.
It should also be noted that the figures provided in the following embodiments only illustrate the basic idea of the disclosure in a schematic manner; they show only the components related to the disclosure rather than being drawn according to the number, shape, and size of components in an actual implementation. In an actual implementation, the form, quantity, and proportions of each component may vary, and the layout of components may be more complex.
Moreover, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects may be practiced without these specific details.
An embodiment of the present disclosure provides a character recognition method. The character recognition method provided in this embodiment may be executed by a computing device, which may be implemented as software or as a combination of software and hardware, and may be integrated into a server, a terminal device, or the like. As shown in Fig. 1, the character recognition method mainly comprises the following steps S101 to S106. Wherein:
Step S101: obtaining an image region containing text from an original image.
In the present disclosure, the original image is obtained from an image source, where the image source is a local storage space or a network storage space. Obtaining the original image from the image source includes obtaining the original image from the local storage space or obtaining the original image from the network storage space. Wherever the original image is obtained from, the storage address of the original image first needs to be acquired, and the original image is then fetched from that storage address.
The image source may also be an image sensor, in which case obtaining the original image from the image source includes capturing the original image from the image sensor. An image sensor refers to any device that can capture an image; typical image sensors are video cameras, still cameras, webcams, and the like. In this embodiment, the image sensor may be the camera on a mobile terminal, such as the front or rear camera of a smartphone, and the image captured by the camera can be displayed directly on the phone's screen. In this step, the video captured by the image sensor is obtained for further recognition of the text in the image in the next step.
In the present disclosure, the original image contains text. In a typical application, a user uses the camera of a mobile terminal to photograph an object in the environment, and the object may bear text; the object may be a book, a road sign, a shop sign, and so on. In another typical application, the image is a video frame of a video, and the video frame contains text on an object in the video or subtitles of the video.
In this step, obtaining the image region containing text from the original image may include:
preprocessing the original image to obtain a preprocessed image; and
locating the image region containing text in the preprocessed image.
In one embodiment, the preprocessing includes denoising, skew correction, and various filtering operations on the original image; if the input image is a grayscale or color image, binarization may also be performed. Binarization is the process of converting a grayscale image into an image with only two values, black and white. Binarization also belongs to the image segmentation techniques of image processing; image segmentation methods fall mainly into three categories: thresholding, edge detection, and region growing. The most common is thresholding, in which a threshold separates foreground from background during binarization: pixels at or below the threshold belong to the foreground, and the others belong to the background. Binarization is generally divided into two classes, global and local. Global binarization uses a static threshold, applying the same threshold to the whole image according to its overall statistics; typical global methods include the maximum between-class variance (Otsu) method, entropy-based thresholding, clustering-based thresholding, and fuzzy binarization. Local binarization uses a dynamic threshold, selecting a different threshold according to the features of each pixel's neighborhood; the threshold of each pixel or small region depends only on its surrounding pixels and is independent of pixels elsewhere.
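To make the two thresholding families concrete, the following is a minimal Python sketch using OpenCV's thresholding APIs; the file name, window size, and constant are illustrative assumptions, not values from the patent.

```python
import cv2

gray = cv2.imread("text_line.png", cv2.IMREAD_GRAYSCALE)

# Global binarization: one static threshold for the whole image,
# chosen by Otsu's maximum between-class variance method.
_, global_bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local binarization: a dynamic threshold computed from each pixel's
# neighborhood (here a 31x31 Gaussian-weighted window).
local_bw = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 10)
```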
After the image is preprocessed to obtain the preprocessed image, the image region of the text is located in the preprocessed image. The text image then needs to be segmented; segmentation is the process of extracting individual character images from the whole image. The segmentation method can be based on one of the following strategies or a combination of several of them:
1. Classical segmentation: also called standard segmentation, this method cuts the text image into individual character images according to attributes of the image itself, such as width, height, and baseline position. The main classical methods include the spacing method and segmentation based on projection profile analysis.
2. Connected-component analysis: this method first finds all connected pixel regions and then segments by analyzing the features of the connected components and the relationships between them; it is mostly suitable for segmenting Western handwriting.
3. Recognition-based segmentation: this method first generates multiple tentative segmentation hypotheses and then recognizes them; combining the classifier outputs, it corrects and selects among the uncertain segmentation results using recognition confidence and syntactic and semantic analysis to obtain the optimal segmentation.
4. Holistic segmentation: this method recognizes a whole word as a single unit, segmenting the text image based on the words in a predefined lexicon.
Through the above segmentation process, the characters in the preprocessed image are segmented out, and together they compose the image region of the text. As shown in Fig. 2, the image contains the five characters of the sentence "I am Chinese"; after S101, the image region of "I am Chinese" is located and outlined. The subsequent recognition steps are all based on the located image region of the text.
It should be understood that the method of obtaining the image region containing text is not limited to the methods listed in the above step, which only cover the methods used in traditional OCR. In practice, deep learning algorithms can also be used to locate the image region of the text, object detection being a typical example; details are not repeated here. Any method that can locate the image region of text can be used in this step.
Step S102: extracting image features of the text from the image region to generate a text feature image.
In the present disclosure, extracting the image features of the text from the image region to generate a text feature image may include:
inputting the image region into a convolutional neural network; and
outputting, through the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image, C ≥ 1; H is the height of the text feature image, H ≥ 1; and W is the width of the text feature image, W ≥ 1.
The convolutional neural network may contain only an input layer and convolutional layers, and may optionally also contain pooling layers. The image region of the text selected in step S101 enters the input layer of the convolutional neural network and, after the convolutions of the convolutional layers, is converted into a text feature image of size C*1*W. Here C, the number of channels of the text feature image, is related to the number of convolution kernels in the last convolutional layer and is a positive integer greater than or equal to 1; W, the width of the text feature image, is also a positive integer greater than or equal to 1; and the 1 in C*1*W is the height of the text feature image. In other words, the convolutional neural network extracts the image features of the text image region through its convolutional layers and generates a text feature image of height 1, width W, and C channels. As a typical example, suppose the input image is of size 1*32*1024, i.e., a single-channel binarized or grayscale image (a color image would generally have 3 channels), and suppose the convolutional neural network comprises an input layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, and a pooling layer. If the first convolutional layer has 3 convolution kernels of size 5*5 with stride 1, the input to the second convolutional layer is a feature image of size 3*(32-5+1)*(1024-5+1) = 3*28*1020. The second convolutional layer has 16 kernels of size 7*7, so the input to the third convolutional layer is a feature image of size 16*(28-7+1)*(1020-7+1) = 16*22*1014. The third convolutional layer has 128 kernels of size 15*15, so the input to the pooling layer is a feature image of size 128*(22-15+1)*(1014-15+1) = 128*8*1000. The pooling layer is a max-pooling layer with an 8*8 window and stride 8, so after it the convolutional neural network yields a feature image of size 128*1*125. It should be understood that the above network structure is only an example and can in practice be designed with any structure as needed; it merely illustrates that a convolutional neural network can extract features from the image region of text and generate a text feature image.
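The worked example above maps directly onto code. Below is a minimal PyTorch sketch of that toy network, layer for layer; the class name is ours, and activation functions are omitted because the patent does not specify any.

```python
import torch
import torch.nn as nn

class TextFeatureCNN(nn.Module):
    """Toy CNN following the patent's worked example (no activations given)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=5),         # 1*32*1024  -> 3*28*1020
            nn.Conv2d(3, 16, kernel_size=7),        # 3*28*1020  -> 16*22*1014
            nn.Conv2d(16, 128, kernel_size=15),     # 16*22*1014 -> 128*8*1000
            nn.MaxPool2d(kernel_size=8, stride=8),  # 128*8*1000 -> 128*1*125
        )

    def forward(self, x):
        return self.features(x)

x = torch.randn(1, 1, 32, 1024)      # one binarized text-line image
print(TextFeatureCNN()(x).shape)     # torch.Size([1, 128, 1, 125])
```

Running it confirms the arithmetic in the paragraph: a 1*32*1024 input yields a 128*1*125 text feature image.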
Step S103: performing a first encoding on the text feature image to generate a first encoded image.
In the present disclosure, the first encoding may be implemented with an LSTM network. Performing the first encoding on the text feature image to generate the first encoded image comprises:
inputting the text feature image into a first LSTM network; and
outputting, through the first LSTM network, the first encoded image.
Taking the feature image from step S102 as an example, it is a feature image of size 128*1*125, where 128 is the number of channels, 1 is the height, and 125 is the width. The feature image is fed into the first LSTM network as a time series of length 125, with each 128*1 column as the input at one time step. Suppose the final output of the first LSTM network is a feature image of size 128*1*256; this feature image is the first encoded image. Because an LSTM network is used, and the input of an LSTM at each step includes its output at the previous step, the network has memory and can retain the contextual information of the text.
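A sketch of this first encoding step under one reading of the example: the 128*1*125 feature image is treated as a 125-step sequence of 128-dim columns. The patent's stated output size (128*1*256) implies a change of sequence length that it does not explain, so this sketch keeps the length unchanged; the hidden size is likewise an assumption.

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 128, 1, 125)                  # N*C*H*W from the CNN
seq = feat.squeeze(2).permute(2, 0, 1)              # -> (T=125, N=1, 128)

encoder = nn.LSTM(input_size=128, hidden_size=128)  # first LSTM network
first_encoded, _ = encoder(seq)                     # (125, 1, 128)
```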
Step S104: performing at least one second encoding on the first encoded image to generate a second encoded image.
In the present disclosure, the second encoding may be implemented by an intermediate network. Performing at least one second encoding on the first encoded image to generate the second encoded image comprises: inputting the first encoded image into an intermediate network, the intermediate network comprising at least one LSTM layer; and outputting, through the intermediate network, the second encoded image.
In this step, the intermediate network may be formed by at least one LSTM layer. Specifically, the input dimension of the intermediate network should match the output dimension of the feature image of the first LSTM network. In the example of step S103, the output feature image of the first LSTM network has size 128*1*256, so the length of the intermediate network's time series is 256, with each 128*1 column of the first encoded image as the input at one time step. The intermediate network contains at least one LSTM network; understandably, more LSTM layers can learn and capture more semantic information, so the intermediate network can learn more features and strengthen the accuracy of the final character recognition. In this step, after the first encoding of the image, a further second encoding is performed to generate the second encoded image, which contains more of the text's semantic information.
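A minimal sketch of the intermediate network as a stack of LSTM layers re-encoding the first encoded image; the depth of 2 is an arbitrary illustrative choice, since the patent only requires at least one LSTM layer here.

```python
import torch
import torch.nn as nn

first_encoded = torch.randn(125, 1, 128)  # stand-in for the first encoded image
intermediate = nn.LSTM(input_size=128, hidden_size=128, num_layers=2)
second_encoded, _ = intermediate(first_encoded)     # same (T, N, 128) shape
```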
Step S105: decoding the second encoded image to generate a decoded image.
In the present disclosure, the decoding may be performed by a decoding LSTM network. Decoding the second encoded image to generate the decoded image comprises:
inputting the second encoded image into a decoding LSTM network; and
outputting, through the decoding LSTM network, the decoded image.
In this step, the decoding in fact still passes the output of the intermediate network through another LSTM network to generate the feature image used for classification. The only requirements on the decoding LSTM network are that its input dimension equal the output dimension of the intermediate network and that its output dimension equal the dimension expected by the subsequent processing. If the output of the intermediate network, i.e., the second encoded image, is a feature image of size 256*1*256, the length of the time series is 256, and the output of the decoding LSTM network can be designed according to the requirements of the next processing step so that the later steps can use the decoded image. Specifically, the output of the decoding LSTM may be a feature image of size 128*1*256.
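A sketch of the decoding LSTM under the example's sizes: the input dimension matches the intermediate network's output (256 per step) and a hidden size of 128 yields per-step 128-dim features for the classifier. The numbers follow the patent's example; the wiring is our assumption.

```python
import torch
import torch.nn as nn

second_encoded = torch.randn(256, 1, 256)  # stand-in for the second encoded image
decoder = nn.LSTM(input_size=256, hidden_size=128)
decoded, _ = decoder(second_encoded)       # (256, 1, 128), i.e. 128*1*256
```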
Step S106: classifying the image features in the decoded image to recognize the text.
In the present disclosure, classifying the image features in the decoded image to recognize the text comprises:
inputting the decoded image into a first fully connected network;
outputting, through the first fully connected network, the character classes contained in the decoded image; and
recognizing the text in the image region according to the character classes.
In this step, the feature image obtained in step S105 is fed into a fully connected network. For example, if the decoded image obtained in step S105 has size 128*1*256, the fully connected network can be designed with 128*256 = 32768 inputs, where every 128 inputs (i.e., the 1*1 pixel across all channels) form one group that is mapped through the full connection to N outputs, where N is the number of character classes to be distinguished. For Chinese characters there are roughly 6000 commonly used characters, which means at least 6000 outputs are needed. Each group of N fully connected outputs is passed through a softmax activation to compute the character represented by each channel, and the character with the maximum softmax value is taken as the recognized character.
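A sketch of this per-slice classification: one linear layer shared across the 256 time steps, with softmax over the character classes. N = 6000 follows the patent's estimate of commonly used Chinese characters; the shared-layer formulation is our reading of the grouping described above.

```python
import torch
import torch.nn as nn

N = 6000                             # assumed number of character classes
decoded = torch.randn(256, 1, 128)   # stand-in decoded image (T, batch, C)
classifier = nn.Linear(128, N)       # shared across all 256 groups
probs = torch.softmax(classifier(decoded), dim=-1)
pred_classes = probs.argmax(dim=-1)  # one class index per width slice
```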
Further, recognizing the text in the image region according to the character classes may also include:
merging adjacent characters recognized as the same character class into a single character; and
outputting the merged result as the recognition result.
As shown in Fig. 3, in the recognition result over the 128*1*256 decoded image, a character spanning adjacent segmentation slices may be recognized more than once as the same character; in that case, the two adjacent identical characters between two blanks can be merged into one character to form the final recognition result.
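The merging rule can be sketched as collapsing runs of identical adjacent predictions and dropping a "blank" class for the spaces between characters; the blank index and the exact collapse rule are our reading of Fig. 3, not spelled out in the patent.

```python
from itertools import groupby

def merge_predictions(pred_classes, blank=0):
    """Collapse repeated adjacent classes and drop blanks."""
    return [c for c, _ in groupby(pred_classes) if c != blank]

print(merge_predictions([0, 12, 12, 0, 7, 7, 7, 0]))   # -> [12, 7]
```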
As shown in Fig. 4, which illustrates the training method of the character recognition model in the present disclosure, the character recognition method of the present disclosure can be executed by a character recognition model, and the character recognition model needs to be trained in advance. The training method of the character recognition model comprises:
Step S401: initializing the parameters of the character recognition model, wherein the character recognition model comprises a convolutional neural network, at least three LSTM networks, and a fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network.
Step S402: obtaining a training image from a training set, the training image containing text and class labels of the text.
Step S403: passing the training image through the convolutional neural network to output a text feature image.
Step S404: passing the text feature image through the at least three LSTM networks to output a decoded image.
Step S405: passing the decoded image through the fully connected layer to output the character classes in the training image.
Step S406: computing the value of the loss function of the character recognition model from the character classes and the class labels of the text.
Step S407: adjusting the parameters of the character recognition model according to the value of the loss function until the value of the loss function is minimized.
The character recognition model in the present disclosure comprises at least three parts: a feature extraction part, performed by the convolutional neural network; a semantic recognition part, performed by the at least three LSTM networks; and a classification part, performed by the fully connected network.
Initializing the parameters of the character recognition model in step S401 covers the parameters of the above convolutional neural network, LSTM networks, and fully connected network. Specifically, the parameters of the convolutional neural network include at least the number of convolution kernels used in each convolutional layer, the sizes of the kernels, the weight values in the kernels, the size of the pooling windows, and so on; the parameters of the LSTM networks include at least the weight matrices in the LSTMs; and the parameters of the fully connected network include at least the connection weight coefficients of each layer. These parameters may be generated randomly, or the initialization may read specified initial parameters; details are not repeated here.
After the parameters are initialized, in step S402 the training set is input into the character recognition model, where the training set comprises training images containing text together with label characters. The label characters may be implemented as segmentation labels, i.e., the training images are segmented into a form that corresponds to the prediction results. If the prediction finally segments the input image into 256 slices along its width, the label characters need to be processed into labels for 256 slices along the width; that is, several adjacent width slices may be labeled as the same character.
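One hypothetical way to produce such per-slice labels, assuming each character's horizontal extent in the training image is known; the helper name, box format, and slice count are illustrative, not from the patent.

```python
def labels_per_slice(char_boxes, image_width, num_slices=256, blank=0):
    """char_boxes: list of (class_id, x_start, x_end) in image pixels."""
    labels = [blank] * num_slices
    scale = num_slices / image_width
    for class_id, x0, x1 in char_boxes:
        for s in range(int(x0 * scale), int(x1 * scale)):
            labels[s] = class_id         # several slices share one character
    return labels

# Two characters, each 256 px wide, in a 1024-px-wide line image.
print(labels_per_slice([(12, 0, 256), (7, 256, 512)], image_width=1024))
```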
In steps S403 to S405, the images in the training set pass through all the networks of the character model to obtain a prediction result. This process is the same as an ordinary forward pass during training and is not repeated here.
In step S406, the predicted values and the label values are plugged into the loss function to compute the loss value. Any suitable loss function can be used; its choice is not the focus of the present disclosure and is not discussed further.
In step S407, the parameters of the character recognition model are adjusted according to the loss value computed by the loss function, and new loss values are then obtained by passing the training set through the character model with the adjusted parameters; the above process is repeated until the value of the loss function is minimized.
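Steps S401 to S407 amount to an ordinary supervised training loop. Below is a minimal sketch, assuming per-slice cross-entropy as the unspecified loss function and SGD as the optimizer; `model` stands for the whole CNN + LSTMs + fully connected pipeline.

```python
import torch
import torch.nn as nn

def train(model, loader, num_classes, epochs=10, lr=0.01):
    criterion = nn.CrossEntropyLoss()                    # assumed loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, slice_labels in loader:              # S402: training batch
            logits = model(images)                       # S403-S405: (N, T, classes)
            loss = criterion(logits.reshape(-1, num_classes),
                             slice_labels.reshape(-1))   # S406: loss value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                             # S407: adjust parameters
```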
The present disclosure also includes a method for performing character recognition with the above character recognition model, comprising:
obtaining an original image that contains text;
preprocessing the original image to obtain an image region containing the text;
inputting the image region into a character recognition model trained with the above training method for the character recognition model; and
outputting, through the character recognition model, the classes of the text.
This procedure is the prediction process of the above character recognition model; for specific prediction details, refer to the process shown in Fig. 1, which is not repeated here.
Although the steps in the above method embodiments are described in the order given, those skilled in the art will understand that the steps in the embodiments of the present disclosure are not necessarily executed in that order; they may also be executed in reverse order, in parallel, interleaved, or in other orders. Moreover, on the basis of the above steps, those skilled in the art may add other steps; such obvious variants or equivalent substitutions shall also fall within the protection scope of the present disclosure and are not repeated here.
The following are apparatus embodiments of the present disclosure, which can be used to execute the steps implemented by the method embodiments of the present disclosure. For ease of description, only the parts relevant to the embodiments of the present disclosure are shown; for specific technical details not disclosed, refer to the method embodiments of the present disclosure.
An embodiment of the present disclosure provides a character recognition apparatus. The apparatus can execute the steps described in the above character recognition method embodiments. As shown in Fig. 5, the apparatus 500 mainly comprises: an image region locating module 501, a text feature image generation module 502, a first encoded image generation module 503, a second encoded image generation module 504, a decoded image generation module 505, and a first classification module 506. Wherein:
an image region locating module 501, configured to obtain an image region containing text from an original image;
a text feature image generation module 502, configured to extract image features of the text from the image region to generate a text feature image;
a first encoded image generation module 503, configured to perform a first encoding on the text feature image to generate a first encoded image;
a second encoded image generation module 504, configured to perform at least one second encoding on the first encoded image to generate a second encoded image;
a decoded image generation module 505, configured to decode the second encoded image to generate a decoded image; and
a first classification module 506, configured to classify the image features in the decoded image to recognize the text.
Further, the text feature image generation module 502 also comprises:
a convolutional neural network input module, configured to input the image region into a convolutional neural network; and
a convolutional neural network output module, configured to output, through the convolutional neural network, a text feature image of size C*H*W, where C is the number of channels of the text feature image, C ≥ 1; H is the height of the text feature image, H ≥ 1; and W is the width of the text feature image, W ≥ 1.
Further, the first encoded image generation module 503 also comprises:
a first LSTM network input module, configured to input the text feature image into a first LSTM network; and
a first LSTM network output module, configured to output, through the first LSTM network, the first encoded image.
Further, the second encoded image generation module 504 also comprises:
an intermediate network input module, configured to input the first encoded image into an intermediate network, the intermediate network comprising at least one LSTM layer; and
an intermediate network output module, configured to output, through the intermediate network, the second encoded image.
Further, the decoded image generation module 505 also comprises:
a decoding LSTM network input module, configured to input the second encoded image into a decoding LSTM network; and
a decoding LSTM network output module, configured to output, through the decoding LSTM network, the decoded image.
Further, the first classification module 506 also comprises:
a fully connected network input module, configured to input the decoded image into a first fully connected network;
a fully connected network output module, configured to output, through the first fully connected network, the character classes contained in the decoded image; and
a character recognition module, configured to recognize the text in the image region according to the character classes.
Further, the character recognition module also comprises:
a merging module, configured to merge adjacent characters recognized as the same character class into a single character; and
a result output module, configured to output the merged result as the recognition result.
The apparatus shown in Fig. 5 can execute the methods of the embodiments shown in Figs. 1-3; for parts of this embodiment not described in detail, refer to the related descriptions of those embodiments. For the execution process and technical effects of this technical solution, also see the descriptions of the embodiments shown in Figs. 1-3, which are not repeated here.
An embodiment of the present disclosure also provides a training apparatus for a character recognition model, comprising:
a parameter initialization module, configured to initialize the parameters of the character recognition model, wherein the character recognition model comprises a convolutional neural network, at least three LSTM networks, and a fully connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully connected network;
a training image acquisition module, configured to obtain a training image from a training set, the training image containing text and class labels of the text;
a convolution module, configured to pass the training image through the convolutional neural network to output a text feature image;
an encoding/decoding module, configured to pass the text feature image through the at least three LSTM networks to output a decoded image;
a second classification module, configured to pass the decoded image through the fully connected layer to output the character classes in the training image;
an error computation module, configured to compute the value of the loss function of the character recognition model from the character classes and the class labels of the text; and
an adjustment module, configured to adjust the parameters of the character recognition model according to the value of the loss function until the value of the loss function is minimized.
An embodiment of the present disclosure also provides a character recognition apparatus, comprising:
an original image acquisition module, configured to obtain an original image that contains text;
a preprocessing module, configured to preprocess the original image to obtain an image region containing the text;
an input module, configured to input the image region into a character recognition model trained with the above training method for the character recognition model; and
an output module, configured to output, through the character recognition model, the classes of the text.
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing unit (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following may be connected to the I/O interface 605: input units 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, and the like; output units 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage units 608 including, for example, a magnetic tape, a hard disk, and the like; and a communication unit 609. The communication unit 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 with various units, it should be understood that not all of the units shown are required to be implemented or provided; more or fewer units may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication unit 609, installed from the storage unit 608, or installed from the ROM 602. When the computer program is executed by the processing unit 601, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted over any suitable medium, including but not limited to: electric wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
Above-mentioned computer-readable medium carries one or more program, when said one or multiple programs are described When electronic equipment executes, so that the electronic equipment: obtaining the image-region including text from original image;From described image The characteristics of image that text is extracted in region generates character features image;The character features image is subjected to the first coding and generates the One coded image;First coded image is subjected to the second coding at least once and generates the second coded image;To described second Coded image is decoded generation decoding image;Characteristics of image in the decoding image is classified to identify the text Word.
Above-mentioned computer-readable medium carries one or more program, when said one or multiple programs are described When electronic equipment executes, so that the electronic equipment: the parameter of initialization Text region model, wherein the Text region model In include a convolutional neural networks, at least three LSTM networks and a fully-connected network, the parameter includes the convolution The parameter of neural network, LSTM network and fully-connected network;Training image is obtained from training set, is wrapped in the training image Include the classification mark of text and text;The training image is exported into a character features figure by the convolutional neural networks Picture;The character features image is exported into a decoding image by at least three LSTM network;By the decoding image The text classification in the training image is exported by the full articulamentum;According to the text classification and the classification mark of text Note calculates the value of the loss function of the Text region model;The Text region model is adjusted according to the value of the loss function Parameter until the value of the loss function is minimum.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain an original image including text; preprocess the original image to obtain an image region including the text; input the image region into a text recognition model trained by the above training method for a text recognition model; and output the type of the text from the text recognition model.
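A correspondingly hedged inference call might look like the following, where crop_text_region stands in for whatever preprocessing produces the text region and is purely a placeholder name, not an API from the disclosure.

    # Illustrative recognition call on a trained model.
    model.eval()
    with torch.no_grad():
        region = crop_text_region(original_image)    # assumed: (1, 3, H, W) tensor
        logits = model(region)                       # (1, W', num_classes)
        per_step_classes = logits.argmax(dim=-1)     # predicted class per position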
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. Each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself.
The above description is merely an account of the preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art should understand that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the above technical features; without departing from the disclosed concept, it also covers other technical solutions formed by any combination of the above technical features or their equivalents, for example, solutions in which the above features are interchanged with (but not limited to) technical features with similar functions disclosed in the present disclosure.

Claims (18)

1. A character recognition method, comprising:
obtaining an image region including text from an original image;
extracting image features of the text from the image region to generate a character feature image;
performing a first encoding on the character feature image to generate a first coded image;
performing a second encoding on the first coded image at least once to generate a second coded image;
decoding the second coded image to generate a decoded image;
classifying the image features in the decoded image to identify the text.
2. The character recognition method of claim 1, wherein extracting the image features of the text from the image region to generate the character feature image comprises:
inputting the image region into a convolutional neural network;
outputting, by the convolutional neural network, a character feature image of size C*H*W, where C is the number of channels of the character feature image (C >= 1), H is its height (H >= 1), and W is its width (W >= 1).
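(Illustrative note, not part of the claims: for readers unfamiliar with the C*H*W convention, the toy snippet below, with arbitrary assumed sizes, shows the shape of such a feature image and one common way to hand it to an LSTM, treating width as the time axis.)

    import torch
    feature_image = torch.randn(256, 1, 64)      # C=256, H=1, W=64 (assumed)
    C, H, W = feature_image.shape
    # Width becomes the sequence axis; channels (times height) the features.
    sequence = feature_image.permute(2, 0, 1).reshape(W, C * H)   # (W, C*H)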
3. The character recognition method of claim 1, wherein performing the first encoding on the character feature image to generate the first coded image comprises:
inputting the character feature image into a first LSTM network;
outputting, by the first LSTM network, the first coded image.
4. The character recognition method of claim 1, wherein performing the second encoding on the first coded image at least once to generate the second coded image comprises:
inputting the first coded image into an intermediate network, the intermediate network comprising at least one LSTM layer;
outputting, by the intermediate network, the second coded image.
5. The character recognition method of claim 1, wherein decoding the second coded image to generate the decoded image comprises:
inputting the second coded image into a decoding LSTM network;
outputting, by the decoding LSTM network, the decoded image.
6. The character recognition method of claim 1, wherein classifying the image features in the decoded image to identify the text comprises:
inputting the decoded image into a first fully-connected network;
outputting, by the first fully-connected network, the text classes contained in the decoded image;
identifying the text in the image region according to the text classes.
7. The character recognition method of claim 6, wherein identifying the text in the image region according to the text classes comprises:
merging adjacent characters identified as belonging to the same text class into a single character;
outputting the merged result as the recognition result.
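(Illustrative note, not part of the claims: a hedged sketch of this merging step follows. The blank class and the CTC-style collapsing convention are assumptions added for illustration, not part of the claim.)

    def merge_adjacent(per_step_classes, blank=0):
        """Collapse runs of identical class ids; drop the assumed blank."""
        merged, prev = [], None
        for c in per_step_classes:
            if c != prev and c != blank:
                merged.append(c)
            prev = c
        return merged

    # e.g. [5, 5, 0, 5, 7, 7] -> [5, 5, 7]: the run of 5s merges, the blank
    # keeps the two distinct 5s apart, and the run of 7s merges.
    print(merge_adjacent([5, 5, 0, 5, 7, 7]))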
8. A training method for a text recognition model, comprising:
initializing the parameters of the text recognition model, wherein the text recognition model includes a convolutional neural network, at least three LSTM networks, and a fully-connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully-connected network;
obtaining a training image from a training set, the training image including text and class labels of the text;
passing the training image through the convolutional neural network to output a character feature image;
passing the character feature image through the at least three LSTM networks to output a decoded image;
passing the decoded image through the fully-connected layer to output the text classes in the training image;
computing the value of the loss function of the text recognition model from the text classes and the class labels of the text;
adjusting the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimized.
9. A character recognition method, comprising:
obtaining an original image, the original image including text;
preprocessing the original image to obtain an image region including the text;
inputting the image region into a text recognition model trained by the training method of claim 8;
outputting, by the text recognition model, the type of the text.
10. A character recognition apparatus, comprising:
an image region identification module, configured to obtain an image region including text from an original image;
a character feature image generation module, configured to extract image features of the text from the image region to generate a character feature image;
a first coded image generation module, configured to perform a first encoding on the character feature image to generate a first coded image;
a second coded image generation module, configured to perform a second encoding on the first coded image at least once to generate a second coded image;
a decoded image generation module, configured to decode the second coded image to generate a decoded image;
a first classification module, configured to classify the image features in the decoded image to identify the text.
11. A training apparatus for a text recognition model, comprising:
a parameter initialization module, configured to initialize the parameters of the text recognition model, wherein the text recognition model includes a convolutional neural network, at least three LSTM networks, and a fully-connected network, and the parameters include the parameters of the convolutional neural network, the LSTM networks, and the fully-connected network;
a training image acquisition module, configured to obtain a training image from a training set, the training image including text and class labels of the text;
a convolution module, configured to pass the training image through the convolutional neural network to output a character feature image;
an encoding/decoding module, configured to pass the character feature image through the at least three LSTM networks to output a decoded image;
a second classification module, configured to pass the decoded image through the fully-connected layer to output the text classes in the training image;
an error calculation module, configured to compute the value of the loss function of the text recognition model from the text classes and the class labels of the text;
an adjustment module, configured to adjust the parameters of the text recognition model according to the value of the loss function until the value of the loss function is minimized.
12. A character recognition apparatus, comprising:
an original image acquisition module, configured to obtain an original image including text;
a preprocessing module, configured to preprocess the original image to obtain an image region including the text;
an input module, configured to input the image region into a text recognition model trained by the training method of claim 8;
an output module, configured to output the type of the text from the text recognition model.
13. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions such that, when they are executed, the processor implements the character recognition method of any one of claims 1-7.
14. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions such that, when they are executed, the processor implements the training method for a text recognition model of claim 8.
15. An electronic device, comprising:
a memory for storing computer-readable instructions; and
a processor for executing the computer-readable instructions such that, when they are executed, the processor implements the character recognition method of claim 9.
16. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the character recognition method of any one of claims 1-7.
17. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the training method for a text recognition model of claim 8.
18. A non-transitory computer-readable storage medium storing computer-readable instructions which, when executed by a computer, cause the computer to perform the character recognition method of claim 9.
CN201910327434.5A 2019-04-23 2019-04-23 Character recognition method, device and electronic equipment Pending CN110070042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327434.5A CN110070042A (en) 2019-04-23 2019-04-23 Character recognition method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110070042A (en) 2019-07-30

Family

ID=67368425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327434.5A Pending CN110070042A (en) 2019-04-23 2019-04-23 Character recognition method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110070042A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074494A (en) * 2016-03-28 2018-12-21 松下知识产权经营株式会社 Character and graphic identification device, character and graphic recognition methods and character and graphic recognizer
CN106570457A (en) * 2016-10-14 2017-04-19 上海新同惠自动化***有限公司 Chinese and Japanese character identification method
US10176388B1 (en) * 2016-11-14 2019-01-08 Zoox, Inc. Spatial and temporal information for semantic segmentation
CN107239733A (en) * 2017-04-19 2017-10-10 上海嵩恒网络科技有限公司 Continuous hand-written character recognizing method and system
US20180336452A1 (en) * 2017-05-22 2018-11-22 Sap Se Predicting wildfires on the basis of biophysical indicators and spatiotemporal properties using a long short term memory network
CN108320624A (en) * 2017-12-22 2018-07-24 昆山遥矽微电子科技有限公司 Text region phonetic machine
CN108399419A (en) * 2018-01-25 2018-08-14 华南理工大学 Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks
CN108427953A (en) * 2018-02-26 2018-08-21 北京易达图灵科技有限公司 A kind of character recognition method and device
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN109492679A (en) * 2018-10-24 2019-03-19 杭州电子科技大学 Based on attention mechanism and the character recognition method for being coupled chronological classification loss

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674813A (en) * 2019-09-24 2020-01-10 北京字节跳动网络技术有限公司 Chinese character recognition method and device, computer readable medium and electronic equipment
CN110674813B (en) * 2019-09-24 2022-04-05 北京字节跳动网络技术有限公司 Chinese character recognition method and device, computer readable medium and electronic equipment
CN110738262A (en) * 2019-10-16 2020-01-31 北京市商汤科技开发有限公司 Text recognition method and related product
CN111476853A (en) * 2020-03-17 2020-07-31 西安万像电子科技有限公司 Method, equipment and system for encoding and decoding character image
CN111476853B (en) * 2020-03-17 2024-05-24 西安万像电子科技有限公司 Method, equipment and system for encoding and decoding text image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190730