CN110008961A - Text real-time identification method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110008961A CN110008961A CN201910256927.4A CN201910256927A CN110008961A CN 110008961 A CN110008961 A CN 110008961A CN 201910256927 A CN201910256927 A CN 201910256927A CN 110008961 A CN110008961 A CN 110008961A
- Authority
- CN
- China
- Prior art keywords
- output result
- convolution
- carried out
- result
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to a text real-time identification method, device, computer equipment and storage medium. The method includes: obtaining image data to be recognized; inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result; and aligning the recognition result using a CTC loss function, to obtain a character sequence. The text recognition model is obtained by training a convolutional neural network with labeled image data as sample data. The present invention performs text recognition by inputting the image data to be recognized into the text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that text recognition at low power can be guaranteed while the rate of text recognition is also improved.
Description
Technical field
The present invention relates to character recognition methods, and more specifically to a text real-time identification method, device, computer equipment and storage medium.
Background technique
Text detection comprises text localization and text recognition. Most existing character recognition systems use traditional computer vision algorithms rather than neural networks, so their accuracy is low, and most require character segmentation in advance; segmentation errors then further degrade recognition. A typical scheme segments the characters, classifies each segmented character separately, and in post-processing concatenates all recognized characters into the final recognition result. Such algorithms split recognition into two steps: the error produced by the first step, which is only an intermediate step whose segmentation result is not necessarily needed in itself, propagates to the next step and seriously degrades the accuracy of single-character classification, thereby affecting the final recognition result.
In addition, there are newer recognition methods that train a text recognition model with neural networks that currently perform well and use that model to recognize text. In general, text-line recognition is a sequence-to-sequence problem: the input is the picture information, i.e. a pixel sequence, and the output is a text sequence. RNN models based on LSTM, with their good sequence-modeling ability, can solve such sequence problems well; however, in terms of power consumption and speed, LSTM is much less suitable than convolution for deployment on mobile terminals. Moreover, a picture sequence has no inherent temporal dependence, so modeling it with a heavy LSTM is not the only or best choice; most neural-network text recognition consumes large amounts of computing resources and cannot be detached from a cloud environment.
Therefore, it is necessary to design a new method that can both guarantee low-power text recognition and improve the rate of text recognition.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the existing technologies and to provide a text real-time identification method, device, computer equipment and storage medium.
To achieve the above object, the invention adopts the following technical scheme. The text real-time identification method includes:
obtaining image data to be recognized;
inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result;
aligning the recognition result using a CTC loss function, to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In a further technical solution, the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data comprises:
constructing a loss function and a convolutional neural network;
obtaining labeled image data, to obtain sample data;
inputting the sample data into the convolutional neural network and performing convolution computation, to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network on the sample data using a deep learning framework, to obtain the text recognition model.
In a further technical solution, the step of inputting the sample data into the convolutional neural network and performing convolution computation, to obtain a sample output result, comprises:
performing convolution with a 3*3 kernel on the sample data, to obtain a first output result;
performing max pooling on the first output result, to obtain a second output result;
performing cross convolution on the second output result, to obtain a third output result;
performing mean pooling on the third output result, to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, to obtain a fifth output result;
concatenating the fourth output result and the fifth output result, to obtain a sixth output result;
performing cross convolution on the sixth output result, to obtain a seventh output result;
concatenating the seventh output result and the fourth output result, to obtain the sixth output result;
performing cross convolution on the sixth output result, to obtain an eighth output result;
performing max pooling on the eighth output result, to obtain a ninth output result;
performing dropout of adjacent feature-map regions on the ninth output result, to obtain a tenth output result;
performing mean pooling on the seventh output result, to obtain an eleventh output result;
concatenating the tenth output result and the eleventh output result, to obtain a twelfth output result;
performing cross convolution on the twelfth output result, to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, to obtain a fourteenth output result;
performing dropout of adjacent feature-map regions on the fourteenth output result, to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, to obtain a seventeenth output result;
fully connecting the seventeenth output result, to obtain an eighteenth output result;
tiling the eighteenth output result, to obtain a nineteenth output result;
concatenating the nineteenth output result and the sixteenth output result, to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, to obtain the sample output result.
In a further technical solution, the step of performing cross convolution on the second output result, to obtain a third output result, comprises:
performing convolution with a 1*1 kernel on the second output result, to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, to obtain the third output result.
In a further technical solution, the step of performing mean pooling on the third output result, to obtain a fourth output result, comprises:
averaging adjacent pixels in the third output result, to obtain the fourth output result.
In a further technical solution, after the recognition result is aligned using the CTC loss function to obtain a character sequence, the method further includes:
outputting the character sequence.
The present invention also provides a text real-time distinguishing apparatus, comprising:
a data capture unit, for obtaining image data to be recognized;
a recognition unit, for inputting the image data to be recognized into a text recognition model to perform text recognition, to obtain a recognition result;
an alignment unit, for aligning the recognition result using a CTC loss function, to obtain a character sequence.
In a further technical solution, the device further includes:
a training unit, for training a convolutional neural network with labeled image data as sample data, to obtain the text recognition model.
The present invention also provides a computer equipment comprising a memory and a processor, the memory storing a computer program; the processor implements the above method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the invention has the following advantages: the present invention performs text recognition by inputting the image data to be recognized into a text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that text recognition at low power can be guaranteed while the rate of text recognition is also improved.
The invention will be further described below with reference to the drawings and specific embodiments.
Description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the application scenario of the text real-time identification method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 3 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 4 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 5 is a sub-flow diagram of the text real-time identification method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the cross convolution processing provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the mean-pooling (averaging) processing provided by an embodiment of the present invention;
Fig. 8 is a flow diagram of the text real-time identification method provided by another embodiment of the present invention;
Fig. 9 is a schematic block diagram of the text real-time distinguishing apparatus provided by an embodiment of the present invention;
Fig. 10 is a schematic block diagram of the text real-time distinguishing apparatus provided by another embodiment of the present invention;
Fig. 11 is a schematic block diagram of the computer equipment provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
It should be appreciated that, when used in this specification and the appended claims, the terms "includes" and "comprising" indicate the presence of the described features, entireties, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, entireties, steps, operations, elements, components and/or sets thereof.
It is also to be understood that the terms used in this description of the invention are merely for the purpose of describing specific embodiments and are not intended to limit the present invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in the description of the invention and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to Fig. 1 and Fig. 2. Fig. 1 is a schematic diagram of the application scenario of the text real-time identification method provided by an embodiment of the present invention. Fig. 2 is a schematic flow chart of the text real-time identification method provided by an embodiment of the present invention. The text real-time identification method is applied in a server, and the server and a terminal carry out data interaction. The terminal shoots to obtain image data to be recognized and transmits the image data to the server; the text recognition model in the server performs text recognition on it, and the recognition result is aligned to obtain the true character sequence, i.e. the text information. The text information can be transmitted to the terminal, or the terminal can be controlled with the text information to make a corresponding response.
Fig. 2 is the flow diagram of the text real-time identification method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps S110 to S130.
S110, obtain image data to be recognized.
In the present embodiment, image data to be recognized refers to image data shot by a terminal, or image data obtained by scanning or similar means.
S120, input the image data to be recognized into the text recognition model to perform text recognition, to obtain a recognition result.
In the present embodiment, the recognition result refers to a probability sequence whose length is about 50 to 200 characters.
Wherein, the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, referring to Fig. 3, the above text recognition model training step may include steps S121 to S126.
S121, construct a loss function and a convolutional neural network.
In the present embodiment, the convolutional neural network is constructed to carry out convolution computation on the image data so as to achieve classification and target-position calibration. Every network needs to compute a loss value with the loss function during training; the loss value represents the gap between the output result and the actual result. The smaller the loss value, the smaller the gap and the better the network is trained, and vice versa. Convolutional neural networks are widely used in computer vision tasks such as target detection, semantic segmentation and object classification, and have achieved very good results, showing their good adaptability to visual tasks.
S122, obtain labeled image data, to obtain sample data.
In the present embodiment, sample data refers to image data labeled with text. The sample data may be divided into several training sets and a small number of test sets; the convolutional neural network is trained on the training sets so as to select the convolutional neural network with the smaller loss value, which is then tested on the test set.
S123, input the sample data into the convolutional neural network and perform convolution computation, to obtain a sample output result.
In the present embodiment, the sample output result refers to a probability sequence, that is, the predicted text sequence numbers of the sample data.
In one embodiment, referring to Fig. 4, the above step S123 may include steps S123a to S123v.
S123a, perform convolution with a 3*3 kernel on the sample data, to obtain a first output result;
S123b, perform max pooling on the first output result, to obtain a second output result.
In the present embodiment, max pooling refers to taking the maximum pixel value of each image region.
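As a minimal numpy sketch of that step (the 2*2 window size is an assumption for illustration, not taken from this description), max pooling keeps only the largest pixel of each block:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each non-overlapping 2x2 block, halving both dimensions."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 9, 2, 4],
               [3, 5, 8, 6],
               [7, 2, 1, 1],
               [0, 4, 3, 5]])

# Maxima of each 2x2 block: [[9, 8], [7, 5]]
print(max_pool_2x2(fm))
```

In a real network the pooling would run per channel over a batch; a single feature map is enough to show the downsampling.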
S123c, perform cross convolution on the second output result, to obtain a third output result.
In the present embodiment, referring to Fig. 5, the above step S123c may include steps S123c1 to S123c4.
S123c1, perform convolution with a 1*1 kernel on the second output result, to obtain a preliminary result;
S123c2, perform convolution with a 1*3 kernel on the preliminary result, to obtain a secondary result;
S123c3, perform convolution with a 3*1 kernel on the secondary result, to obtain a tertiary result;
S123c4, perform convolution with a 1*1 kernel on the tertiary result, to obtain the third output result.
As shown in Fig. 6, the middle section of the figure is the convolution kernel. What is commonly used at present is a 3*3 kernel; the middle-layer kernel can multiply over the mutual relations of the preceding features, but the amount of computation is large. Replacing the 3*3 convolution with a superposition of 1*3 and 3*1 convolutions, and forming a bottleneck with 1*1 convolutions before and after, reduces the amount of computation.
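The saving can be counted directly. The channel counts below are hypothetical (none are given here); the comparison is between one 3*3 convolution and the 1*1 → 1*3 → 3*1 → 1*1 bottleneck just described:

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weight count of one conv layer (biases ignored)."""
    return k_h * k_w * c_in * c_out

C = 64   # hypothetical input/output channel count
B = 16   # hypothetical bottleneck channel count

# Plain 3*3 convolution, C -> C channels.
plain = conv_params(3, 3, C, C)

# Cross convolution: 1*1 squeeze, then 1*3 and 3*1, then 1*1 expand.
cross = (conv_params(1, 1, C, B)     # bottleneck in
         + conv_params(1, 3, B, B)   # horizontal
         + conv_params(3, 1, B, B)   # vertical
         + conv_params(1, 1, B, C))  # bottleneck out

print(plain, cross)  # 36864 3584
```

With these assumed channel counts the factorized form needs roughly a tenth of the weights, which is the point of the bottleneck.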
S123d, perform mean pooling on the third output result, to obtain a fourth output result.
Specifically, adjacent pixels in the third output result are averaged, to obtain the fourth output result.
When concatenating, the resolutions of the features differ, and the picture information of the larger resolution is aligned by means of mean pooling. So-called mean pooling means averaging adjacent pixels, reducing them to their average value, as shown in Fig. 7.
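A minimal numpy sketch of this mean pooling (the 2*2 window is an assumption; the text only says adjacent pixels are averaged):

```python
import numpy as np

def mean_pool_2x2(x):
    """Average each non-overlapping 2x2 block, halving both dimensions."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fm = np.array([[1., 3., 5., 7.],
               [1., 3., 5., 7.],
               [2., 2., 8., 8.],
               [4., 4., 6., 6.]])

# Averages of each 2x2 block: [[2, 6], [3, 7]]
print(mean_pool_2x2(fm))
```

The output is half the resolution in each dimension, which is what lets features of different resolutions be aligned before concatenation.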
S123e, perform convolution with a 3*3 kernel and cross convolution on the third output result, to obtain a fifth output result;
S123f, concatenate the fourth output result and the fifth output result, to obtain a sixth output result;
S123g, perform cross convolution on the sixth output result, to obtain a seventh output result;
S123h, concatenate the seventh output result and the fourth output result, to obtain the sixth output result;
S123i, perform cross convolution on the sixth output result, to obtain an eighth output result;
S123j, perform max pooling on the eighth output result, to obtain a ninth output result;
S123k, perform dropout of adjacent feature-map regions on the ninth output result, to obtain a tenth output result;
S123l, perform mean pooling on the seventh output result, to obtain an eleventh output result;
S123m, concatenate the tenth output result and the eleventh output result, to obtain a twelfth output result;
S123n, perform cross convolution on the twelfth output result, to obtain a thirteenth output result;
S123o, perform convolution with a 3*3 kernel on the thirteenth output result, to obtain a fourteenth output result;
S123p, perform dropout of adjacent feature-map regions on the fourteenth output result, to obtain a fifteenth output result;
S123q, perform convolution with a 3*3 kernel on the fifteenth output result, to obtain a sixteenth output result;
S123r, perform global pooling on the sixteenth output result, to obtain a seventeenth output result;
S123s, fully connect the seventeenth output result, to obtain an eighteenth output result;
S123t, tile the eighteenth output result, to obtain a nineteenth output result;
S123u, concatenate the nineteenth output result and the sixteenth output result, to obtain a twentieth output result;
S123v, perform convolution with 1*8 and 8*1 kernels on the twentieth output result, to obtain the sample output result.
Shallow and deep features are connected repeatedly to extract the features of the image sequence. Through concatenation, the features extracted by the early part of the network, i.e. the shallow features, are continually spliced along the channel dimension with the features later extracted from them by cross convolution or 3*3 convolution. This yields a feature map whose length is the number of classes of the characters to be recognized (taking the number of commonly used Chinese characters, 8500) and whose width is W (a number that can be set between 50 and 200); cutting the feature map along its width gives a feature sequence of length W, that is, the probability sequence.
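The final cut along the width can be sketched as follows, using the 8500-class figure from the text and an assumed W of 50:

```python
import numpy as np

NUM_CLASSES = 8500   # commonly used Chinese characters, per the text
W = 50               # sequence length, chosen from the stated 50-200 range

# Hypothetical final feature map: one column of class scores per position.
feature_map = np.random.rand(NUM_CLASSES, W)

# Cutting along the width gives W feature vectors -> the probability sequence.
prob_sequence = [feature_map[:, i] for i in range(W)]
print(len(prob_sequence), prob_sequence[0].shape)  # 50 (8500,)
```

Each of the W vectors plays the role of a per-position distribution over the character classes, which is exactly the probability sequence the CTC alignment consumes.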
The above cross convolution first performs a 1*1 convolution, then a 1*3 convolution, then a 3*1 convolution, and finally a 1*1 convolution. Convolution computation is combined with pooling-layer downsampling, and batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting. Randomly dropping features is effective for fully connected layers, but experiments show it is not as effective for convolutional layers; therefore the newest dropout variant for convolutional layers is used, enhancing the robustness of the network.
At the end of the convolutional network, large horizontal and vertical convolution kernels (1*8 and 8*1) are used. While keeping the amount of computation small, the kernels' horizontal and vertical extents (both 8) are very long, which makes good use of the related information between horizontal positions and between vertical positions, replacing LSTM's ability to process the horizontal-position features and character-sequence features of an image and compensating for the influence of lacking an LSTM. LSTM was originally used mainly in speech processing, natural language processing and similar fields, where it handles sequence-input to sequence-output problems well. In a text recognition task, since the picture can be segmented into a picture sequence and the output is also a word sequence, the same sequence-to-sequence framework can be applied. However, unlike speech, a picture naturally has only a left-right structure, and the left-to-right sequence relation of a text picture has no dependence like that of speech; therefore processing the text picture with long convolution kernels can well substitute for an LSTM network.
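As a rough sketch of why a long 1*8 kernel can relate distant horizontal positions (the averaging kernel and the feature row below are invented for illustration):

```python
import numpy as np

def conv1d_valid(row, kernel):
    """'Valid' 1-D cross-correlation: each output sees len(kernel) inputs."""
    k = len(kernel)
    return np.array([np.dot(row[i:i + k], kernel)
                     for i in range(len(row) - k + 1)])

row = np.arange(12, dtype=float)   # one row of a feature map, width 12
kernel = np.full(8, 1.0 / 8)       # a 1*8 averaging kernel (illustrative)

out = conv1d_valid(row, kernel)
print(out.shape)  # (5,): each output mixes 8 horizontal positions
```

Every output element depends on eight neighbouring horizontal positions at once, which is the kind of long-range lateral context an LSTM would otherwise have to carry.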
S124, feed the sample output result and the labeled image data into the loss function, to obtain a loss value;
S125, adjust the parameters of the convolutional neural network according to the loss value;
S126, train the convolutional neural network on the sample data using a deep learning framework, to obtain the text recognition model.
By continuously adjusting the parameters of the convolutional neural network and repeatedly learning and training, a convolutional neural network that meets the requirements is obtained. Specifically, training uses tensorflow; after conversion to the corresponding text recognition model, it is very conveniently deployed on a server or terminal through tensorflow tflite and tensorflow mace. Not only does it support running on common controllers, it can also be accelerated on the controllers of relevant devices through opencl (Open Computing Language).
The obtained text recognition model needs only about 0.22 Gflops for a single forward pass, and the forward computation can handle a large number of text recognition tasks in real time. It eliminates the large computing power and memory requirements of complicated RNN (recurrent neural network) models on embedded devices. In addition, a text recognition algorithm put into actual use needs to face a series of problems such as blurred pictures, bad illumination and physical deformation. Through fine and extensive text augmentation and generation, this problem is handled carefully and well, so that the algorithm achieves a very good effect in real scenes, as shown by specific service tests.
S130, align the recognition result using the CTC loss function, to obtain a character sequence.
The text recognition model outputs a probability sequence whose length is about 50 to 200 characters. Since the final purpose is to obtain the true character sequence, i.e. the characters in the image data to be recognized — for example a license plate is usually a 7-digit sequence — the two need to be aligned. The CTC loss function, widely used in speech recognition, is used to align them and obtain the character sequence.
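The alignment the CTC criterion induces can be illustrated with a greedy best-path decode — collapse repeated labels, then drop blanks. The tiny label set and probabilities below are invented for the example, and a real system would train with the full CTC loss rather than rely on this greedy decode:

```python
import numpy as np

BLANK = 0                             # CTC blank index
ALPHABET = {1: "A", 2: "B", 3: "C"}   # hypothetical label set

def ctc_greedy_decode(probs):
    """probs: (T, num_labels) per-timestep distributions.
    Take the argmax path, merge consecutive repeats, remove blanks."""
    path = probs.argmax(axis=1)
    out, prev = [], None
    for p in path:
        if p != prev and p != BLANK:
            out.append(ALPHABET[p])
        prev = p
    return "".join(out)

# 6 timesteps whose best path is A A blank B B C -> "ABC"
probs = np.array([[.1, .8, .05, .05],
                  [.1, .7, .1,  .1 ],
                  [.9, .05, .03, .02],
                  [.1, .1, .7,  .1 ],
                  [.2, .1, .6,  .1 ],
                  [.1, .1, .1,  .7 ]])
print(ctc_greedy_decode(probs))  # ABC
```

This is how a probability sequence longer than the target text (here 6 timesteps for 3 characters) collapses to the true character sequence without any prior character segmentation.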
This method runs on an Android device with RK3399. For several typical text recognition tasks: recognizing 8 digits, the accuracy on a private test set is about 99.1% with a speed of about 20 milliseconds; recognizing 14 Chinese characters, the accuracy on the private test set is about 98.8% with a speed of about 46 milliseconds.
The above text real-time identification method performs text recognition by inputting the image data to be recognized into the text recognition model. In training the text model, convolution computation is combined with pooling-layer downsampling, while batch normalization layers and dropout layers accelerate convergence, improve stability and prevent overfitting; the convolution kernels are modified to reduce the amount of computation, so that low-power text recognition can be guaranteed while the rate of text recognition is also improved.
Fig. 8 is a flow diagram of a text real-time identification method provided by another embodiment of the present invention. As shown in Fig. 8, the text real-time identification method of this embodiment includes steps S210-S240. Steps S210-S230 are similar to steps S110-S130 in the above embodiment and are not described again here. The step S240 added in this embodiment is described in detail below.
S240, output the character sequence.
The character sequence obtained by recognition is output to a terminal for display, or a corresponding response is made according to the output character sequence numbers, for example fetching the corresponding data.
Fig. 9 is a schematic block diagram of a real-time text recognition device 300 provided by an embodiment of the present invention. As shown in Fig. 9, corresponding to the above real-time text recognition method, the present invention also provides a real-time text recognition device 300. The device 300 includes units for executing the above real-time text recognition method and can be configured in a server or a terminal.
Specifically, referring to Fig. 9, the real-time text recognition device 300 includes:
a data acquisition unit 301 for acquiring image data to be recognized;
a recognition unit 302 for inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
an alignment unit 303 for aligning the recognition result using a CTC loss function, so as to obtain a character sequence.
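At inference time, the alignment learned through the CTC loss is usually recovered by best-path (greedy) decoding: collapse repeated per-frame labels, then remove blanks. A minimal sketch under that assumption (the blank index 0 and the integer labels are illustrative; the patent does not specify them):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """CTC best-path decoding: collapse repeats, then drop blank labels."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Per-frame argmax labels over 7 time steps; 0 is the blank symbol.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2]))  # -> [1, 2, 2]
```

Note that the blank between the two runs of 2 is what allows a doubled character to survive the collapse step.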
In one embodiment, the device further includes:
a training unit for training a convolutional neural network with labeled image data as sample data, so as to obtain the text recognition model.
In one embodiment, the training unit includes:
a construction subunit for constructing a loss function and a convolutional neural network;
a sample data forming subunit for acquiring labeled image data, so as to obtain sample data;
a computation subunit for inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
a loss value obtaining subunit for feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
a parameter adjustment subunit for adjusting the parameters of the convolutional neural network according to the loss value;
a learning subunit for training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
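The training unit's cycle (forward pass, loss evaluation, parameter adjustment) can be illustrated with a deliberately tiny stand-in model; the single-weight linear "network" and learning rate below are illustrative only, not part of the patent:

```python
# (input, label) sample pairs for the toy target y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0    # the single network parameter
lr = 0.05  # learning rate
for epoch in range(100):
    for x, label in samples:
        out = w * x                     # forward pass: sample output result
        grad = 2.0 * (out - label) * x  # gradient of the squared-error loss
        w -= lr * grad                  # adjust the parameter according to the loss
print(round(w, 3))  # -> 2.0
```

In the patent's setting the same loop runs inside a deep learning framework, with the convolutional network in place of `w * x` and the CTC loss in place of squared error.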
In one embodiment, the computation subunit includes:
a first convolution processing module for performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
a first max pooling module for performing max pooling on the first output result, so as to obtain a second output result;
a second convolution processing module for performing cross convolution on the second output result, so as to obtain a third output result;
a first mean pooling module for performing mean pooling on the third output result, so as to obtain a fourth output result;
a third convolution processing module for performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
a first splicing module for splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
a fourth convolution processing module for performing cross convolution on the sixth output result, so as to obtain a seventh output result;
a second splicing module for splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
a fifth convolution processing module for performing cross convolution on the sixth output result, so as to obtain an eighth output result;
a second max pooling module for performing max pooling on the eighth output result, so as to obtain a ninth output result;
a first dropout module for applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
a second mean pooling module for performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
a third splicing module for splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
a sixth convolution processing module for performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
a seventh convolution processing module for performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
a second dropout module for applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
an eighth convolution processing module for performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
a global pooling module for performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
a full connection module for performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
a tiling module for performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
a fourth splicing module for splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
a ninth convolution processing module for performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
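The splicing operations in the modules above join feature maps from parallel branches, in the style of Inception blocks; this joining is commonly done along the channel dimension. A sketch assuming an NCHW layout (the layout and channel counts are illustrative, not stated in the patent):

```python
import numpy as np

# Feature maps from two parallel branches: batch 1, 16 and 24 channels, 8*8 spatial.
branch_a = np.zeros((1, 16, 8, 8))
branch_b = np.ones((1, 24, 8, 8))

# Splicing concatenates along the channel axis; spatial sizes must match.
spliced = np.concatenate([branch_a, branch_b], axis=1)
print(spliced.shape)  # -> (1, 40, 8, 8)
```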
In one embodiment, the second convolution processing module includes:
a preliminary convolution submodule for performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
a secondary convolution submodule for performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
a tertiary convolution submodule for performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
a quaternary convolution submodule for performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
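Replacing a dense k*k kernel with a 1*k kernel followed by a k*1 kernel keeps the same receptive field while cutting weights (6 instead of 9 per channel pair for k=3), which is one way the modified kernels reduce computation. The sketch below checks this equivalence for a separable single-channel 3*3 kernel with "valid" padding (the kernels and image are illustrative):

```python
import numpy as np

def correlate2d_valid(img, kernel):
    """Plain 2-D correlation with 'valid' output size (no padding)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

row = np.array([[1.0, 2.0, 1.0]])       # 1*3 kernel
col = np.array([[1.0], [0.0], [-1.0]])  # 3*1 kernel
full = col @ row                        # the equivalent separable 3*3 kernel

a = correlate2d_valid(correlate2d_valid(img, row), col)
b = correlate2d_valid(img, full)
print(np.allclose(a, b))  # -> True
```

A general (non-separable) 3*3 kernel cannot be factored exactly this way, which is why the block also includes the 1*1 convolutions and is trained end to end.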
Figure 10 is a schematic block diagram of a real-time text recognition device 300 provided by another embodiment of the present invention. As shown in Figure 10, the real-time text recognition device 300 of this embodiment adds an output unit 304 on the basis of the above embodiment. The output unit 304 is used to output the character sequence.
It should be noted that, as is clear to those skilled in the art, for the specific implementation process of the above real-time text recognition device 300 and each of its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above real-time text recognition device 300 can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in Figure 11.
Please refer to Figure 11, which is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 can be a terminal or a server, where the terminal can be an electronic device with a communication function such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device, and the server can be an independent server or a server cluster composed of multiple servers.
Referring to Figure 11, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, can cause the processor 502 to execute a real-time text recognition method.
The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute a real-time text recognition method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in Figure 11 is only a block diagram of the part of the structure relevant to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 502 is used to run the computer program 5032 stored in the memory, so as to realize the following steps:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, when realizing the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data, the processor 502 specifically realizes the following steps:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
In one embodiment, when realizing the step of inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result, the processor 502 specifically realizes the following steps:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
In one embodiment, when realizing the step of performing cross convolution on the second output result so as to obtain a third output result, the processor 502 specifically realizes the following steps:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
In one embodiment, when realizing the step of performing mean pooling on the third output result so as to obtain a fourth output result, the processor 502 specifically realizes the following step:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
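Averaging adjacent pixels as described can be realized as non-overlapping mean pooling; a 2*2-window sketch follows (the window size is assumed for illustration; the patent does not fix it):

```python
import numpy as np

def mean_pool_2x2(feature_map):
    """Average each non-overlapping 2*2 block of adjacent pixels."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

fm = np.arange(16, dtype=float).reshape(4, 4)
print(mean_pool_2x2(fm))
# -> [[ 2.5  4.5]
#     [10.5 12.5]]
```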
In one embodiment, after realizing the step of aligning the recognition result using a CTC loss function so as to obtain a character sequence, the processor 502 further realizes the following step:
outputting the character sequence.
It should be understood that in the embodiments of the present application, the processor 502 can be a central processing unit (Central Processing Unit, CPU), and can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor, or any conventional processor, etc.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The computer program includes program instructions and can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to realize the process steps of the embodiments of the above methods.
Therefore, the present invention also provides a storage medium. The storage medium can be a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to execute the following steps:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result;
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
In one embodiment, when executing the computer program to realize the step in which the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data, the processor specifically realizes the following steps:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value;
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
In one embodiment, when executing the computer program to realize the step of inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result, the processor specifically realizes the following steps:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result;
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
In one embodiment, when executing the computer program to realize the step of performing cross convolution on the second output result so as to obtain a third output result, the processor specifically realizes the following steps:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result;
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
In one embodiment, when executing the computer program to realize the step of performing mean pooling on the third output result so as to obtain a fourth output result, the processor specifically realizes the following step:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
In one embodiment, after executing the computer program to realize the step of aligning the recognition result using a CTC loss function so as to obtain a character sequence, the processor further realizes the following step:
outputting the character sequence.
The storage medium can be any of various computer-readable storage media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will recognize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can be realized in other ways. For example, the device embodiments described above are merely illustrative: the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
The steps in the embodiments of the present invention can be reordered, merged, and deleted according to actual needs, and the units in the devices of the embodiments of the present invention can be combined, divided, and deleted according to actual needs. In addition, the functional units in each embodiment of the present invention may be integrated in one processing unit, may each physically exist separately, or two or more units may be integrated in one unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes instructions for causing a computer device (which can be a personal computer, a terminal, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
The above description is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A real-time text recognition method, characterized by comprising:
acquiring image data to be recognized;
inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result; and
aligning the recognition result using a CTC loss function, so as to obtain a character sequence;
wherein the text recognition model is obtained by training a convolutional neural network with labeled image data as sample data.
2. The real-time text recognition method according to claim 1, characterized in that the text recognition model being obtained by training a convolutional neural network with labeled image data as sample data comprises:
constructing a loss function and a convolutional neural network;
acquiring labeled image data, so as to obtain sample data;
inputting the sample data into the convolutional neural network for convolutional computation, so as to obtain a sample output result;
feeding the sample output result and the labeled image data into the loss function, so as to obtain a loss value;
adjusting the parameters of the convolutional neural network according to the loss value; and
training the convolutional neural network with the sample data using a deep learning framework, so as to obtain the text recognition model.
3. The real-time text recognition method according to claim 2, characterized in that inputting the sample data into the convolutional neural network for convolutional computation so as to obtain a sample output result comprises:
performing convolution with a 3*3 kernel on the sample data, so as to obtain a first output result;
performing max pooling on the first output result, so as to obtain a second output result;
performing cross convolution on the second output result, so as to obtain a third output result;
performing mean pooling on the third output result, so as to obtain a fourth output result;
performing convolution with a 3*3 kernel and cross convolution on the third output result, so as to obtain a fifth output result;
splicing the fourth output result and the fifth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain a seventh output result;
splicing the seventh output result and the fourth output result, so as to obtain a sixth output result;
performing cross convolution on the sixth output result, so as to obtain an eighth output result;
performing max pooling on the eighth output result, so as to obtain a ninth output result;
applying dropout over adjacent regions of the feature map of the ninth output result, so as to obtain a tenth output result;
performing mean pooling on the seventh output result, so as to obtain an eleventh output result;
splicing the tenth output result and the eleventh output result, so as to obtain a twelfth output result;
performing cross convolution on the twelfth output result, so as to obtain a thirteenth output result;
performing convolution with a 3*3 kernel on the thirteenth output result, so as to obtain a fourteenth output result;
applying dropout over adjacent regions of the feature map of the fourteenth output result, so as to obtain a fifteenth output result;
performing convolution with a 3*3 kernel on the fifteenth output result, so as to obtain a sixteenth output result;
performing global pooling on the sixteenth output result, so as to obtain a seventeenth output result;
performing full connection on the seventeenth output result, so as to obtain an eighteenth output result;
performing tiling on the eighteenth output result, so as to obtain a nineteenth output result;
splicing the nineteenth output result and the sixteenth output result, so as to obtain a twentieth output result; and
performing convolution with 1*8 and 8*1 kernels on the twentieth output result, so as to obtain the sample output result.
4. The real-time text recognition method according to claim 3, characterized in that performing cross convolution on the second output result so as to obtain a third output result comprises:
performing convolution with a 1*1 kernel on the second output result, so as to obtain a preliminary result;
performing convolution with a 1*3 kernel on the preliminary result, so as to obtain a secondary result;
performing convolution with a 3*1 kernel on the secondary result, so as to obtain a tertiary result; and
performing convolution with a 1*1 kernel on the tertiary result, so as to obtain the third output result.
5. The real-time text recognition method according to claim 3, characterized in that performing mean pooling on the third output result so as to obtain a fourth output result comprises:
averaging adjacent pixels in the third output result, so as to obtain the fourth output result.
6. The real-time text recognition method according to any one of claims 1 to 5, characterized by further comprising, after aligning the recognition result using the CTC loss function so as to obtain the character sequence:
outputting the character sequence.
7. A real-time text recognition device, characterized by comprising:
a data acquisition unit for acquiring image data to be recognized;
a recognition unit for inputting the image data to be recognized into a text recognition model for text recognition, so as to obtain a recognition result; and
an alignment unit for aligning the recognition result using a CTC loss function, so as to obtain a character sequence.
8. The real-time text recognition device according to claim 7, characterized in that the device further comprises:
a training unit for training a convolutional neural network with labeled image data as sample data, so as to obtain the text recognition model.
9. A computer device, characterized in that the computer device comprises a memory and a processor, a computer program is stored on the memory, and the processor, when executing the computer program, realizes the method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, can realize the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910256927.4A CN110008961B (en) | 2019-04-01 | 2019-04-01 | Text real-time identification method, text real-time identification device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110008961A true CN110008961A (en) | 2019-07-12 |
CN110008961B CN110008961B (en) | 2023-05-12 |
Family
ID=67169203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910256927.4A Active CN110008961B (en) | 2019-04-01 | 2019-04-01 | Text real-time identification method, text real-time identification device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110008961B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335754A (en) * | 2015-10-29 | 2016-02-17 | Xiaomi Technology Co., Ltd. | Character recognition method and device |
CN106354701A (en) * | 2016-08-30 | 2017-01-25 | Tencent Technology (Shenzhen) Co., Ltd. | Chinese character processing method and device |
CN106570509A (en) * | 2016-11-04 | 2017-04-19 | Tianjin University | Dictionary learning and coding method for extracting digital image features |
CN108182455A (en) * | 2018-01-18 | 2018-06-19 | Qilu University of Technology | Intelligent garbage image classification method and apparatus, and intelligent garbage bin |
CN108427953A (en) * | 2018-02-26 | 2018-08-21 | Beijing Yida Turing Technology Co., Ltd. | Character recognition method and device |
CN108875904A (en) * | 2018-04-04 | 2018-11-23 | Beijing Megvii Technology Co., Ltd. | Image processing method, image processing apparatus and computer-readable storage medium |
- 2019-04-01: CN application CN201910256927.4A granted as patent CN110008961B (status: Active)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688411A (en) * | 2019-09-25 | 2020-01-14 | Beijing Horizon Robotics Technology R&D Co., Ltd. | Text recognition method and device |
CN112668600A (en) * | 2019-10-16 | 2021-04-16 | SenseTime International Pte. Ltd. | Text recognition method and device |
CN112668600B (en) * | 2019-10-16 | 2024-05-21 | SenseTime International Pte. Ltd. | Text recognition method and device |
CN111428656A (en) * | 2020-03-27 | 2020-07-17 | Sunyard *** Engineering Co., Ltd. | Deep-learning-based ID card recognition method for mobile terminals, and mobile device |
CN112215229A (en) * | 2020-08-27 | 2021-01-12 | Beijing Yingtaizhi Technology Co., Ltd. | End-to-end license plate recognition method and device based on a lightweight network |
CN112215229B (en) * | 2020-08-27 | 2023-07-18 | Beijing Yingtaizhi Technology Co., Ltd. | End-to-end license plate recognition method and device based on a lightweight network |
CN112116001A (en) * | 2020-09-17 | 2020-12-22 | Suzhou Inspur Intelligent Technology Co., Ltd. | Image recognition method, image recognition device and computer-readable storage medium |
CN112116001B (en) * | 2020-09-17 | 2022-06-07 | Suzhou Inspur Intelligent Technology Co., Ltd. | Image recognition method, image recognition device and computer-readable storage medium |
CN113283427A (en) * | 2021-07-20 | 2021-08-20 | Beijing Century TAL Education Technology Co., Ltd. | Text recognition method, device, equipment and medium |
CN113283427B (en) * | 2021-07-20 | 2021-10-01 | Beijing Century TAL Education Technology Co., Ltd. | Text recognition method, device, equipment and medium |
WO2024088269A1 (en) * | 2022-10-26 | 2024-05-02 | Vivo Mobile Communication Co., Ltd. | Character recognition method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110008961B (en) | 2023-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110008961A (en) | Text real-time identification method, device, computer equipment and storage medium | |
WO2020199931A1 (en) | Face key point detection method and apparatus, and storage medium and electronic device | |
CN109685819B (en) | Three-dimensional medical image segmentation method based on feature enhancement | |
KR20210073569A (en) | Method, apparatus, device and storage medium for training image semantic segmentation network | |
CN110188331A (en) | Model training method, conversational system evaluation method, device, equipment and storage medium | |
CN109902546A (en) | Face identification method, device and computer-readable medium | |
CN109522967A (en) | Commodity attribute recognition method, device, equipment and storage medium | |
CN108288075A (en) | Lightweight small-object detection method based on improved SSD | |
CN110378348A (en) | Video instance segmentation method, equipment and computer-readable storage medium | |
CN106203363A (en) | Activity recognition method for human skeleton motion sequences | |
CN106599800A (en) | Face micro-expression recognition method based on deep learning | |
CN108776983A (en) | Face reconstruction method and device based on a reconstruction network, and equipment, medium and product | |
CN109766840A (en) | Facial expression recognition method, device, terminal and storage medium | |
CN109902548A (en) | Object attribute recognition method, device, computing equipment and system | |
CN111160350A (en) | Portrait segmentation method, model training method, device, medium and electronic equipment | |
CN110232373A (en) | Face clustering method, apparatus, equipment and storage medium | |
CN109145717A (en) | Face recognition method with online learning | |
CN109086768A (en) | Semantic image segmentation method based on convolutional neural networks | |
CN107491729B (en) | Handwritten digit recognition method based on cosine similarity activated convolutional neural network | |
CN109344920A (en) | Customer attribute prediction method, storage medium, system and equipment | |
CN110826462A (en) | Human behavior recognition method based on a non-local two-stream convolutional neural network model | |
CN108509833A (en) | Face recognition method, device and equipment based on a structured analysis dictionary | |
CN110097090A (en) | Fine-grained image recognition method based on multi-scale feature fusion | |
CN109086653A (en) | Handwriting model training method, handwritten character recognition method, device, equipment and medium | |
CN110378208A (en) | Activity recognition method based on deep residual network | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 518000 Room 201, Building A, No. 1, Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong (Shenzhen Qianhai Business Secretary Co., Ltd.); Applicant after: Shenzhen Huafu Technology Co., Ltd.; Address before: 518000 Room 201, Building A, No. 1, Qianwan Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong (Shenzhen Qianhai Business Secretary Co., Ltd.); Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co., Ltd. ||
GR01 | Patent grant | ||