CN110008961A - Text real-time identification method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110008961A CN110008961A CN201910256927.4A CN201910256927A CN110008961A CN 110008961 A CN110008961 A CN 110008961A CN 201910256927 A CN201910256927 A CN 201910256927A CN 110008961 A CN110008961 A CN 110008961A
- Authority
- CN
- China
- Prior art keywords
- output result
- convolution
- carried out
- result
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to a text real-time identification method, device, computer equipment and storage medium. The method includes: obtaining image data to be recognized; inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result; and aligning the recognition result using a CTC loss function, to obtain a character sequence. The text recognition model is obtained by training a convolutional neural network with labeled image data as sample data. The present invention performs text recognition by inputting the image data to be recognized into the text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that text recognition at low power can be guaranteed while the rate of text recognition is also improved.
Description
Technical field
The present invention relates to character recognition methods, and more specifically to a text real-time identification method, device, computer equipment and storage medium.
Background technique
Text detection comprises text localization and text recognition. Most existing character recognition systems use traditional computer vision algorithms rather than neural networks, so their accuracy is low, and most require character segmentation in advance; segmentation errors then further degrade recognition. A typical scheme segments the characters, classifies each segmented character separately, and in post-processing concatenates all recognized characters into the final recognition result. Such algorithms split recognition into two steps: the error produced by the first step, which is only an intermediate step whose segmentation result is not necessarily needed in itself, propagates to the next step and seriously degrades the accuracy of single-character classification, thereby affecting the final recognition result.
In addition, there are newer recognition methods that train a text recognition model with neural networks that currently perform well and use that model to recognize text. In general, text-line recognition is a sequence-to-sequence problem: the input is the picture information, i.e. a pixel sequence, and the output is a text sequence. RNN models based on LSTM, with their good sequence-modeling ability, can solve such sequence problems well; however, in terms of power consumption and speed, LSTM is much less suitable than convolution for deployment on mobile terminals. Moreover, a picture sequence has no inherent temporal dependence, so modeling it with a heavy LSTM is not the only or best choice; most neural-network text recognition consumes large amounts of computing resources and cannot be detached from a cloud environment.
Therefore, it is necessary to design a new method that can both guarantee low-power text recognition and improve the rate of text recognition.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the existing technologies and to provide a text real-time identification method, device, computer equipment and storage medium.
To achieve the above object, the invention adopts the following technical scheme. The text real-time identification method includes:
obtaining image data to be recognized;
inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result;
aligning the recognition result using a CTC loss function, to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In a further technical solution, the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data comprises:
constructing a loss function and a convolutional neural network;
obtaining labeled image data, to obtain sample data;
inputting the sample data into the convolutional neural network and performing convolution computation, to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network on the sample data using a deep learning framework, to obtain the text recognition model.
In a further technical solution, the step of inputting the sample data into the convolutional neural network and performing convolution computation, to obtain a sample output result, comprises:
performing convolution with a 3*3 kernel on the sample data, to obtain a first output result;
performing max pooling on the first output result, to obtain a second output result;
performing cross convolution on the second output result, to obtain a third output result;
performing mean pooling on the third output result, to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, to obtain a fifth output result;
concatenating the fourth output result and the fifth output result, to obtain a sixth output result;
performing cross convolution on the sixth output result, to obtain a seventh output result;
concatenating the seventh output result and the fourth output result, to obtain the sixth output result;
performing cross convolution on the sixth output result, to obtain an eighth output result;
performing max pooling on the eighth output result, to obtain a ninth output result;
performing dropout of adjacent feature-map regions on the ninth output result, to obtain a tenth output result;
performing mean pooling on the seventh output result, to obtain an eleventh output result;
concatenating the tenth output result and the eleventh output result, to obtain a twelfth output result;
performing cross convolution on the twelfth output result, to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, to obtain a fourteenth output result;
performing dropout of adjacent feature-map regions on the fourteenth output result, to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, to obtain a seventeenth output result;
fully connecting the seventeenth output result, to obtain an eighteenth output result;
tiling the eighteenth output result, to obtain a nineteenth output result;
concatenating the nineteenth output result and the sixteenth output result, to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, to obtain the sample output result.
In a further technical solution, the step of performing cross convolution on the second output result, to obtain a third output result, comprises:
performing convolution with a 1*1 kernel on the second output result, to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, to obtain the third output result.
In a further technical solution, the step of performing mean pooling on the third output result, to obtain a fourth output result, comprises:
averaging adjacent pixels in the third output result, to obtain the fourth output result.
In a further technical solution, after the recognition result is aligned using the CTC loss function to obtain a character sequence, the method further includes:
outputting the character sequence.
The present invention also provides a text real-time distinguishing apparatus, comprising:
a data capture unit, for obtaining image data to be recognized;
a recognition unit, for inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result;
an alignment unit, for aligning the recognition result using a CTC loss function, to obtain a character sequence.
In a further technical solution, the device further includes:
a training unit, for training a convolutional neural network with labeled image data as sample data, to obtain the text recognition model.
The present invention also provides a computer equipment comprising a memory and a processor, the memory storing a computer program; the processor implements the above method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the invention has the following advantages: the present invention performs text recognition by inputting the image data to be recognized into a text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that text recognition at low power can be guaranteed while the rate of text recognition is also improved.
The invention will be further described below with reference to the drawings and specific embodiments.
Description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the application scenario of the text real-time identification method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 3 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 4 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 5 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the cross convolution processing provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the mean-pooling (averaging) processing provided by an embodiment of the present invention;
Fig. 8 is a flow diagram of the text real-time identification method provided by another embodiment of the present invention;
Fig. 9 is a schematic block diagram of the text real-time distinguishing apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic block diagram of the text real-time distinguishing apparatus provided by another embodiment of the present invention;
Fig. 11 is a schematic block diagram of the computer equipment provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be appreciated that, when used in this specification and the appended claims, the terms "includes" and "comprising" indicate the presence of the described features, entireties, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, entireties, steps, operations, elements, components and/or sets thereof.
It is also to be understood that the terms used in this description of the invention are merely for the purpose of describing specific embodiments and are not intended to limit the present invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in the description of the invention and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to Fig. 1 and Fig. 2. Fig. 1 is a schematic diagram of the application scenario of the text real-time identification method provided by an embodiment of the present invention. Fig. 2 is a schematic flow chart of the text real-time identification method provided by an embodiment of the present invention. The text real-time identification method is applied in a server, and the server and a terminal carry out data interaction. The terminal shoots to obtain image data to be recognized and transmits the image data to the server; the text recognition model in the server performs text recognition on it, and the recognition result is aligned to obtain the true character sequence, i.e. the text information. The text information can be transmitted to the terminal, or the terminal can be controlled with the text information to make a corresponding response.
Fig. 2 is the flow diagram of the text real-time identification method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps S110 to S130.
S110, obtain image data to be recognized.
In the present embodiment, image data to be recognized refers to image data shot by a terminal, or image data obtained by scanning or similar means.
S120, input the image data to be recognized into the text recognition model to perform text recognition, to obtain a recognition result.
In the present embodiment, the recognition result refers to a probability sequence whose length is about 50 to 200 characters.
Wherein, the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, referring to Fig. 3, the above text recognition model training step may include steps S121 to S126.
S121, construct a loss function and a convolutional neural network.
In the present embodiment, the convolutional neural network is constructed to carry out convolution computation on the image data so as to achieve classification and target-position calibration. Every network needs to compute a loss value with the loss function during training; the loss value represents the gap between the output result and the actual result. The smaller the loss value, the smaller the gap and the better the network is trained, and vice versa. Convolutional neural networks are widely used in computer vision tasks such as target detection, semantic segmentation and object classification, and have achieved very good results, showing their good adaptability to visual tasks.
S122, obtain labeled image data, to obtain sample data.
In the present embodiment, sample data refers to image data labeled with text. The sample data may be divided into several training sets and a small number of test sets; the convolutional neural network is trained on the training sets so as to select the convolutional neural network with the smaller loss value, which is then tested on the test set.
S123, input the sample data into the convolutional neural network and perform convolution computation, to obtain a sample output result.
In the present embodiment, the sample output result refers to a probability sequence, that is, the predicted text sequence numbers of the sample data.
In one embodiment, referring to Fig. 4, the above step S123 may include steps S123a to S123v.
S123a, perform convolution with a 3*3 kernel on the sample data, to obtain a first output result;
S123b, perform max pooling on the first output result, to obtain a second output result.
In the present embodiment, max pooling refers to taking the maximum pixel value of each image region.
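As a minimal numpy sketch of that step (the 2*2 window size is an assumption for illustration, not taken from this description), max pooling keeps only the largest pixel of each block:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block, halving both dimensions."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 9, 2, 4],
               [3, 5, 8, 6],
               [7, 2, 1, 1],
               [0, 4, 3, 5]])

# Maxima of each 2x2 block: [[9, 8], [7, 5]]
print(max_pool_2x2(fm))
```

In a real network the pooling would run per channel over a batch; a single feature map is enough to show the downsampling.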
S123c, perform cross convolution on the second output result, to obtain a third output result.
In the present embodiment, referring to Fig. 5, the above step S123c may include steps S123c1 to S123c4.
S123c1, perform convolution with a 1*1 kernel on the second output result, to obtain a preliminary result;
S123c2, perform convolution with a 1*3 kernel on the preliminary result, to obtain a secondary result;
S123c3, perform convolution with a 3*1 kernel on the secondary result, to obtain a tertiary result;
S123c4, perform convolution with a 1*1 kernel on the tertiary result, to obtain the third output result.
As shown in Fig. 6, the middle section of the figure is the convolution kernel. What is commonly used at present is a 3*3 kernel; the middle-layer kernel can multiply over the mutual relations of the preceding features, but the amount of computation is large. Replacing the 3*3 convolution with a superposition of 1*3 and 3*1 convolutions, and forming a bottleneck with 1*1 convolutions before and after, reduces the amount of computation.
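The saving can be counted directly. The channel counts below are hypothetical (none are given here); the comparison is between one 3*3 convolution and the 1*1 → 1*3 → 3*1 → 1*1 bottleneck just described:

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of one conv layer (biases ignored)."""
    return k_h * k_w * c_in * c_out

C = 64   # hypothetical input/output channel count
B = 16   # hypothetical bottleneck channel count

# Plain 3*3 convolution, C -> C channels.
plain = conv_params(3, 3, C, C)

# Cross convolution: 1*1 squeeze, then 1*3 and 3*1, then 1*1 expand.
cross = (conv_params(1, 1, C, B)     # bottleneck in
         + conv_params(1, 3, B, B)   # horizontal
         + conv_params(3, 1, B, B)   # vertical
         + conv_params(1, 1, B, C))  # bottleneck out

print(plain, cross)  # 36864 3584
```

With these assumed channel counts the factorized form needs roughly a tenth of the weights, which is the point of the bottleneck.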
S123d, perform mean pooling on the third output result, to obtain a fourth output result.
Specifically, adjacent pixels in the third output result are averaged, to obtain the fourth output result.
When concatenating, the resolutions of the features differ, and the picture information of the larger resolution is aligned by means of mean pooling. So-called mean pooling means averaging adjacent pixels, reducing them to their average value, as shown in Fig. 7.
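A minimal numpy sketch of this mean pooling (the 2*2 window is an assumption; the text only says adjacent pixels are averaged):

```python
import numpy as np

def mean_pool_2x2(x):
    """Average each non-overlapping 2x2 block, halving both dimensions."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fm = np.array([[1., 3., 5., 7.],
               [1., 3., 5., 7.],
               [2., 2., 8., 8.],
               [4., 4., 6., 6.]])

# Averages of each 2x2 block: [[2, 6], [3, 7]]
print(mean_pool_2x2(fm))
```

The output is half the resolution in each dimension, which is what lets features of different resolutions be aligned before concatenation.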
S123e, perform convolution with a 3*3 kernel and cross convolution on the third output result, to obtain a fifth output result;
S123f, concatenate the fourth output result and the fifth output result, to obtain a sixth output result;
S123g, perform cross convolution on the sixth output result, to obtain a seventh output result;
S123h, concatenate the seventh output result and the fourth output result, to obtain the sixth output result;
S123i, perform cross convolution on the sixth output result, to obtain an eighth output result;
S123j, perform max pooling on the eighth output result, to obtain a ninth output result;
S123k, perform dropout of adjacent feature-map regions on the ninth output result, to obtain a tenth output result;
S123l, perform mean pooling on the seventh output result, to obtain an eleventh output result;
S123m, concatenate the tenth output result and the eleventh output result, to obtain a twelfth output result;
S123n, perform cross convolution on the twelfth output result, to obtain a thirteenth output result;
S123o, perform convolution with a 3*3 kernel on the thirteenth output result, to obtain a fourteenth output result;
S123p, perform dropout of adjacent feature-map regions on the fourteenth output result, to obtain a fifteenth output result;
S123q, perform convolution with a 3*3 kernel on the fifteenth output result, to obtain a sixteenth output result;
S123r, perform global pooling on the sixteenth output result, to obtain a seventeenth output result;
S123s, fully connect the seventeenth output result, to obtain an eighteenth output result;
S123t, tile the eighteenth output result, to obtain a nineteenth output result;
S123u, concatenate the nineteenth output result and the sixteenth output result, to obtain a twentieth output result;
S123v, perform convolution with 1*8 and 8*1 kernels on the twentieth output result, to obtain the sample output result.
Shallow and deep features are connected repeatedly to extract the features of the image sequence. Through concatenation, the features extracted by the early part of the network, i.e. the shallow features, are continually spliced along the channel dimension with the features later extracted from them by cross convolution or 3*3 convolution. This yields a feature map whose length is the number of classes of the characters to be recognized (taking the number of commonly used Chinese characters, 8500) and whose width is W (a number that can be set between 50 and 200); cutting the feature map along its width gives a feature sequence of length W, that is, the probability sequence.
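The final cut along the width can be sketched as follows, using the 8500-class figure from the text and an assumed W of 50:

```python
import numpy as np

NUM_CLASSES = 8500   # commonly used Chinese characters, per the text
W = 50               # sequence length, chosen from the stated 50-200 range

# Hypothetical final feature map: one column of class scores per position.
feature_map = np.random.rand(NUM_CLASSES, W)

# Cutting along the width gives W feature vectors -> the probability sequence.
prob_sequence = [feature_map[:, i] for i in range(W)]
print(len(prob_sequence), prob_sequence[0].shape)  # 50 (8500,)
```

Each of the W vectors plays the role of a per-position distribution over the character classes, which is exactly the probability sequence the CTC alignment consumes.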
The above cross convolution first performs a 1*1 convolution, then a 1*3 convolution, then a 3*1 convolution, and finally a 1*1 convolution. Convolution computation is combined with pooling-layer downsampling, and batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting. Randomly dropping features is effective for fully connected layers, but experiments show it is not as effective for convolutional layers; therefore the newest dropout variant for convolutional layers is used, enhancing the robustness of the network.
At the end of the convolutional network, large horizontal and vertical convolution kernels (1*8 and 8*1) are used. While keeping the amount of computation small, the kernels' horizontal and vertical extents (both 8) are very long, which makes good use of the related information between horizontal positions and between vertical positions, replacing LSTM's ability to process the horizontal-position features and character-sequence features of an image and compensating for the influence of lacking an LSTM. LSTM was originally used mainly in speech processing, natural language processing and similar fields, where it handles sequence-input to sequence-output problems well. In a text recognition task, since the picture can be segmented into a picture sequence and the output is also a word sequence, the same sequence-to-sequence framework can be applied. However, unlike speech, a picture naturally has only a left-right structure, and the left-to-right sequence relation of a text picture has no dependence like that of speech; therefore processing the text picture with long convolution kernels can well substitute for an LSTM network.
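As a rough sketch of why a long 1*8 kernel can relate distant horizontal positions (the averaging kernel and the feature row below are invented for illustration):

```python
import numpy as np

def conv1d_valid(row, kernel):
    """'Valid' 1-D cross-correlation: each output sees len(kernel) inputs."""
    k = len(kernel)
    return np.array([np.dot(row[i:i + k], kernel)
                     for i in range(len(row) - k + 1)])

row = np.arange(12, dtype=float)   # one row of a feature map, width 12
kernel = np.full(8, 1.0 / 8)       # a 1*8 averaging kernel (illustrative)

out = conv1d_valid(row, kernel)
print(out.shape)  # (5,): each output mixes 8 horizontal positions
```

Every output element depends on eight neighbouring horizontal positions at once, which is the kind of long-range lateral context an LSTM would otherwise have to carry.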
S124, feed the sample output result and the labeled image data into the loss function, to obtain a loss value;
S125, adjust the parameters of the convolutional neural network according to the loss value;
S126, train the convolutional neural network on the sample data using a deep learning framework, to obtain the text recognition model.
By continuously adjusting the parameters of the convolutional neural network and repeatedly learning and training, a convolutional neural network that meets the requirements is obtained. Specifically, training uses tensorflow; after conversion to the corresponding text recognition model, it is very conveniently deployed on a server or terminal through tensorflow tflite and tensorflow mace. Not only does it support running on common controllers, it can also be accelerated on the controllers of relevant devices through opencl (Open Computing Language).
The obtained text recognition model needs only about 0.22 Gflops for a single forward pass, and the forward computation can handle a large number of text recognition tasks in real time. It eliminates the large computing power and memory requirements of complicated RNN (recurrent neural network) models on embedded devices. In addition, a text recognition algorithm put into actual use needs to face a series of problems such as blurred pictures, bad illumination and physical deformation. Through fine and extensive text augmentation and generation, this problem is handled carefully and well, so that the algorithm achieves a very good effect in real scenes, as shown by specific service tests.
S130, align the recognition result using the CTC loss function, to obtain a character sequence.
The text recognition model outputs a probability sequence whose length is about 50 to 200 characters. Since the final purpose is to obtain the true character sequence, i.e. the characters in the image data to be recognized — for example a license plate is usually a 7-digit sequence — the two need to be aligned. The CTC loss function, widely used in speech recognition, is used to align them and obtain the character sequence.
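The alignment the CTC criterion induces can be illustrated with a greedy best-path decode — collapse repeated labels, then drop blanks. The tiny label set and probabilities below are invented for the example, and a real system would train with the full CTC loss rather than rely on this greedy decode:

```python
import numpy as np

BLANK = 0                             # CTC blank index
ALPHABET = {1: "A", 2: "B", 3: "C"}   # hypothetical label set

def ctc_greedy_decode(probs):
    """probs: (T, num_labels) per-timestep distributions.
    Take the argmax path, merge consecutive repeats, remove blanks."""
    path = probs.argmax(axis=1)
    out, prev = [], None
    for p in path:
        if p != prev and p != BLANK:
            out.append(ALPHABET[p])
        prev = p
    return "".join(out)

# 6 timesteps whose best path is A A blank B B C -> "ABC"
probs = np.array([[.1, .8, .05, .05],
                  [.1, .7, .1,  .1 ],
                  [.9, .05, .03, .02],
                  [.1, .1, .7,  .1 ],
                  [.2, .1, .6,  .1 ],
                  [.1, .1, .1,  .7 ]])
print(ctc_greedy_decode(probs))  # ABC
```

This is how a probability sequence longer than the target text (here 6 timesteps for 3 characters) collapses to the true character sequence without any prior character segmentation.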
This method runs on an Android device with RK3399. For several typical text recognition tasks: recognizing 8 digits, the accuracy on a private test set is about 99.1% with a speed of about 20 milliseconds; recognizing 14 Chinese characters, the accuracy on the private test set is about 98.8% with a speed of about 46 milliseconds.
The above text real-time identification method performs text recognition by inputting the image data to be recognized into the text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that low-power text recognition can be guaranteed while the rate of text recognition is also improved.
Fig. 8 is a flow diagram of a text real-time identification method provided by another embodiment of the present invention. As shown in Fig. 8, the text real-time identification method of this embodiment includes steps S210-S240. Steps S210-S230 are similar to steps S110-S130 in the above embodiment and are not described again here. The step S240 added in this embodiment is described in detail below.
S240, output the character sequence.
The character sequence obtained by recognition is output to a terminal for display, or a corresponding response is made according to the output character sequence numbers, for example fetching the corresponding data.
Fig. 9 is a schematic block diagram of a real-time text recognition device 300 provided by an embodiment of the present invention. As shown in Fig. 9, corresponding to the above real-time text recognition method, the present invention also provides a real-time text recognition device 300. The device 300 includes units for executing the above real-time text recognition method and can be configured in a server or a terminal.
Specifically, referring to Fig. 9, the real-time text recognition device 300 includes:
a data acquisition unit 301 for acquiring image data to be recognized;
a recognition unit 302 for inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
an alignment unit 303 for aligning the recognition result using a CTC loss function, so as to obtain a character sequence.
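At inference time, the alignment learned through the CTC loss is usually recovered by best-path (greedy) decoding: collapse repeated per-frame labels, then remove blanks. A minimal sketch under that assumption (the blank index 0 and the integer labels are illustrative; the patent does not specify them):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC best-path decoding: collapse repeats, then drop blank labels."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Per-frame argmax labels over 7 time steps; 0 is the blank symbol.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # -> [1, 2, 2]
```

Note that the blank between the two runs of 2 is what allows a doubled character to survive the collapse step.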
In one embodiment, the device further includes:
a training unit for training a convolutional neural network with labeled image data as sample data, so as to obtain the text recognition model.
In one embodiment, the training unit includes:
a construction subunit for constructing a loss function and a convolutional neural network;
a sample data forming subunit for acquiring labeled image data, so as to obtain sample data;
a computation subunit for inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
a loss value obtaining subunit for feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
a parameter adjustment subunit for adjusting the parameters of the convolutional neural network according to the loss value;
a learning subunit for training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
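The training unit's cycle (forward pass, loss evaluation, parameter adjustment) can be illustrated with a deliberately tiny stand-in model; the single-weight linear "network" and learning rate below are illustrative only, not part of the patent:

```python
# (input, label) sample pairs for the toy target y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the single network parameter
lr = 0.05  # learning rate
for epoch in range(100):
    for x, label in samples:
        out = w * x                     # forward pass: sample output result
        grad = 2.0 * (out - label) * x  # gradient of the squared-error loss
        w -= lr * grad                  # adjust the parameter according to the loss
print(round(w, 3))  # -> 2.0
```

In the patent's setting the same loop runs inside a deep learning framework, with the convolutional network in place of `w * x` and the CTC loss in place of squared error.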
In one embodiment, the computation subunit includes:
a first convolution processing module for performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
a first max pooling module for performing max pooling on the first output result, so as to obtain a second output result;
a second convolution processing module for performing cross convolution on the second output result, so as to obtain a third output result;
a first mean pooling module for performing mean pooling on the third output result, so as to obtain a fourth output result;
a third convolution processing module for performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
a first splicing module for splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
a fourth convolution processing module for performing cross convolution on the sixth output result, so as to obtain a seventh output result;
a second splicing module for splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
a fifth convolution processing module for performing cross convolution on the sixth output result, so as to obtain an eighth output result;
a second max pooling module for performing max pooling on the eighth output result, so as to obtain a ninth output result;
a first dropout module for applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
a second mean pooling module for performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
a third splicing module for splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
a sixth convolution processing module for performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
a seventh convolution processing module for performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
a second dropout module for applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
an eighth convolution processing module for performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
a global pooling module for performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
a full connection module for performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
a tiling module for performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
a fourth splicing module for splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
a ninth convolution processing module for performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
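The splicing operations in the modules above join feature maps from parallel branches, in the style of Inception blocks; this joining is commonly done along the channel dimension. A sketch assuming an NCHW layout (the layout and channel counts are illustrative, not stated in the patent):

```python
import numpy as np

# Feature maps from two parallel branches: batch 1, 16 and 24 channels, 8*8 spatial.
branch_a = np.zeros((1, 16, 8, 8))
branch_b = np.ones((1, 24, 8, 8))

# Splicing concatenates along the channel axis; spatial sizes must match.
spliced = np.concatenate([branch_a, branch_b], axis=1)
print(spliced.shape)  # -> (1, 40, 8, 8)
```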
In one embodiment, the second convolution processing module includes:
a preliminary convolution submodule for performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
a secondary convolution submodule for performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
a tertiary convolution submodule for performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
a quaternary convolution submodule for performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
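Replacing a dense k*k kernel with a 1*k kernel followed by a k*1 kernel keeps the same receptive field while cutting weights (6 instead of 9 per channel pair for k=3), which is one way the modified kernels reduce computation. The sketch below checks this equivalence for a separable single-channel 3*3 kernel with "valid" padding (the kernels and image are illustrative):

```python
import numpy as np

def correlate2d_valid(img, kernel):
    """Plain 2-D correlation with 'valid' output size (no padding)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

row = np.array([[1.0, 2.0, 1.0]])       # 1*3 kernel
col = np.array([[1.0], [0.0], [-1.0]])  # 3*1 kernel
full = col @ row                        # the equivalent separable 3*3 kernel

a = correlate2d_valid(correlate2d_valid(img, row), col)
b = correlate2d_valid(img, full)
print(np.allclose(a, b))  # -> True
```

A general (non-separable) 3*3 kernel cannot be factored exactly this way, which is why the block also includes the 1*1 convolutions and is trained end to end.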
Figure 10 is a schematic block diagram of a real-time text recognition device 300 provided by another embodiment of the present invention. As shown in Figure 10, the real-time text recognition device 300 of this embodiment adds an output unit 304 on the basis of the above embodiment. The output unit 304 is used to output the character sequence.
It should be noted that, as is clear to those skilled in the art, for the specific implementation process of the above real-time text recognition device 300 and each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above real-time text recognition device 300 can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in Figure 11.
Please refer to Figure 11, which is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 can be a terminal or a server, where the terminal can be an electronic device with a communication function such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device, and the server can be an independent server or a server cluster composed of multiple servers.
Referring to Figure 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, can cause the processor 502 to execute a real-time text recognition method.
The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute a real-time text recognition method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Figure 11 is only a block diagram of the part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is used to run the computer program 5032 stored in the memory, so as to realize the following steps:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, when realizing the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data, the processor 502 specifically realizes the following steps:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
In one embodiment, when realizing the step of inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result, the processor 502 specifically realizes the following steps:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
In one embodiment, when realizing the step of performing cross convolution on the second output result so as to obtain a third output result, the processor 502 specifically realizes the following steps:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
In one embodiment, when realizing the step of performing mean pooling on the third output result so as to obtain a fourth output result, the processor 502 specifically realizes the following step:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
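Averaging adjacent pixels as described can be realized as non-overlapping mean pooling; a 2*2-window sketch follows (the window size is assumed for illustration; the patent does not fix it):

```python
import numpy as np

def mean_pool_2x2(feature_map):
    """Average each non-overlapping 2*2 block of adjacent pixels."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(mean_pool_2x2(fm))
# -> [[ 2.5  4.5]
#     [10.5 12.5]]
```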
In one embodiment, after realizing the step of aligning the recognition result using a CTC loss function so as to obtain a character sequence, the processor 502 further realizes the following step:
outputting the character sequence.
It should be understood that in the embodiments of the present application, the processor 502 can be a central processing unit (Central Processing Unit, CPU), and can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor, or any conventional processor, etc.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The computer program includes program instructions and can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to realize the process steps of the embodiments of the above methods.
Therefore, the present invention also provides a storage medium. The storage medium can be a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to execute the following steps:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, when executing the computer program to realize the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data, the processor specifically realizes the following steps:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
In one embodiment, when executing the computer program to realize the step of inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result, the processor specifically realizes the following steps:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
In one embodiment, when executing the computer program to realize the step of performing cross convolution on the second output result so as to obtain a third output result, the processor specifically realizes the following steps:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
In one embodiment, when executing the computer program to realize the step of performing mean pooling on the third output result so as to obtain a fourth output result, the processor specifically realizes the following step:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
In one embodiment, after executing the computer program to realize the step of aligning the recognition result using a CTC loss function so as to obtain a character sequence, the processor further realizes the following step:
outputting the character sequence.
The storage medium can be any of various computer-readable storage media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will recognize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can be realized in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
The steps in the embodiments of the present invention can be reordered, merged, and deleted according to actual needs, and the units in the devices of the embodiments of the present invention can be combined, divided, and deleted according to actual needs. In addition, the functional units in each embodiment of the present invention may be integrated in one processing unit, may each physically exist separately, or two or more units may be integrated in one unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes instructions for causing a computer device (which can be a personal computer, a terminal, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
The above description is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A real-time text recognition method, characterized by comprising:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result; and
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
2. The real-time text recognition method according to claim 1, characterized in that the text recognition model being obtained by training a convolutional neural network with labeled image data as sample data comprises:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value; and
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
3. The real-time text recognition method according to claim 2, characterized in that inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result comprises:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result; and
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
4. The real-time text recognition method according to claim 3, characterized in that performing cross convolution on the second output result so as to obtain a third output result comprises:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result; and
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
5. The real-time text recognition method according to claim 3, characterized in that performing mean pooling on the third output result so as to obtain a fourth output result comprises:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
6. The real-time text recognition method according to any one of claims 1 to 5, characterized by further comprising, after aligning the recognition result using the CTC loss function so as to obtain the character sequence:
outputting the character sequence.
7. A real-time text recognition device, characterized by comprising:
a data acquisition unit for acquiring image data to be recognized;
a recognition unit for inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result; and
an alignment unit for aligning the recognition result using a CTC loss function, so as to obtain a character sequence.
8. The real-time text recognition device according to claim 7, characterized in that the device further comprises:
a training unit for training a convolutional neural network with labeled image data as sample data, so as to obtain the text recognition model.
9. A computer device, characterized in that the computer device comprises a memory and a processor, a computer program is stored on the memory, and the processor, when executing the computer program, realizes the method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, can realize the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910256927.4A CN110008961B (en) | 2019-04-01 | 2019-04-01 | Text real-time identification method, text real-time identification device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110008961A true CN110008961A (en) | 2019-07-12 |
CN110008961B CN110008961B (en) | 2023-05-12 |
Family
ID=67169203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910256927.4A Active CN110008961B (en) | 2019-04-01 | 2019-04-01 | Text real-time identification method, text real-time identification device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008961B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335754A (en) * | 2015-10-29 | 2016-02-17 | Xiaomi Technology Co., Ltd. | Character recognition method and device |
CN106354701A (en) * | 2016-08-30 | 2017-01-25 | Tencent Technology (Shenzhen) Co., Ltd. | Chinese character processing method and device |
CN106570509A (en) * | 2016-11-04 | 2017-04-19 | Tianjin University | Dictionary learning and coding method for extracting digital image features |
CN108182455A (en) * | 2018-01-18 | 2018-06-19 | Qilu University of Technology | Intelligent garbage image classification method and apparatus, and intelligent garbage bin |
CN108427953A (en) * | 2018-02-26 | 2018-08-21 | Beijing Yida Turing Technology Co., Ltd. | Character recognition method and device |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | Beijing Megvii Technology Co., Ltd. | Image processing method, image processing apparatus and computer-readable storage medium |
- 2019-04-01: CN application CN201910256927.4A granted as patent CN110008961B (status: Active)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688411A (en) * | 2019-09-25 | 2020-01-14 | Beijing Horizon Robotics Technology R&D Co., Ltd. | Text recognition method and device |
CN112668600A (en) * | 2019-10-16 | 2021-04-16 | SenseTime International Pte. Ltd. | Text recognition method and device |
CN112668600B (en) * | 2019-10-16 | 2024-05-21 | SenseTime International Pte. Ltd. | Text recognition method and device |
CN111428656A (en) * | 2020-03-27 | 2020-07-17 | Sunyard *** Engineering Co., Ltd. | Deep-learning-based ID card recognition method for mobile terminals, and mobile device |
CN112215229A (en) * | 2020-08-27 | 2021-01-12 | Beijing Yingtaizhi Technology Co., Ltd. | End-to-end license plate recognition method and device based on a lightweight network |
CN112215229B (en) * | 2020-08-27 | 2023-07-18 | Beijing Yingtaizhi Technology Co., Ltd. | End-to-end license plate recognition method and device based on a lightweight network |
CN112116001A (en) * | 2020-09-17 | 2020-12-22 | Suzhou Inspur Intelligent Technology Co., Ltd. | Image recognition method, image recognition device and computer-readable storage medium |
CN112116001B (en) * | 2020-09-17 | 2022-06-07 | Suzhou Inspur Intelligent Technology Co., Ltd. | Image recognition method, image recognition device and computer-readable storage medium |
CN113283427A (en) * | 2021-07-20 | 2021-08-20 | Beijing Century TAL Education Technology Co., Ltd. | Text recognition method, device, equipment and medium |
CN113283427B (en) * | 2021-07-20 | 2021-10-01 | Beijing Century TAL Education Technology Co., Ltd. | Text recognition method, device, equipment and medium |
WO2024088269A1 (en) * | 2022-10-26 | 2024-05-02 | Vivo Mobile Communication Co., Ltd. | Character recognition method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110008961B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110008961A (en) | Text real-time identification method, device, computer equipment and storage medium | |
WO2020199931A1 (en) | Face key point detection method and apparatus, and storage medium and electronic device | |
CN109685819B (en) | Three-dimensional medical image segmentation method based on feature enhancement | |
KR20210073569A (en) | Method, apparatus, device and storage medium for training image semantic segmentation network | |
CN110188331A (en) | Model training method, conversational system evaluation method, device, equipment and storage medium | |
CN109902546A (en) | Face identification method, device and computer-readable medium | |
CN109522967A (en) | Commodity attribute recognition method, device, equipment and storage medium | |
CN108288075A (en) | Lightweight small-object detection method based on improved SSD | |
CN110378348A (en) | Video instance segmentation method, equipment and computer-readable storage medium | |
CN106203363A (en) | Activity recognition method for human skeleton motion sequences | |
CN106599800A (en) | Face micro-expression recognition method based on deep learning | |
CN108776983A (en) | Face reconstruction method and device based on a reconstruction network, and equipment, medium and product | |
CN109766840A (en) | Facial expression recognition method, device, terminal and storage medium | |
CN109902548A (en) | Object attribute recognition method, device, computing equipment and system | |
CN111160350A (en) | Portrait segmentation method, model training method, device, medium and electronic equipment | |
CN110232373A (en) | Face clustering method, apparatus, equipment and storage medium | |
CN109145717A (en) | Face recognition method with online learning | |
CN109086768A (en) | Semantic image segmentation method based on convolutional neural networks | |
CN107491729B (en) | Handwritten digit recognition method based on cosine similarity activated convolutional neural network | |
CN109344920A (en) | Customer attribute prediction method, storage medium, system and equipment | |
CN110826462A (en) | Human behavior recognition method based on a non-local two-stream convolutional neural network model | |
CN108509833A (en) | Face recognition method, device and equipment based on a structured analysis dictionary | |
CN110097090A (en) | Fine-grained image recognition method based on multi-scale feature fusion | |
CN109086653A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
CN110378208A (en) | Activity recognition method based on deep residual network | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 518000 Room 201, Building A, No. 1, Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong (Shenzhen Qianhai Business Secretary Co., Ltd.); Applicant after: Shenzhen Huafu Technology Co., Ltd.; Address before: 518000 Room 201, Building A, No. 1, Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong (Shenzhen Qianhai Business Secretary Co., Ltd.); Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co., Ltd. ||
GR01 | Patent grant | ||