CN108304814A - Construction method and computing device for a text type detection model - Google Patents

Construction method and computing device for a text type detection model

Info

Publication number
CN108304814A
Authority
CN
China
Prior art keywords
picture
region
character area
original image
text type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810128155.1A
Other languages
Chinese (zh)
Other versions
CN108304814B (en)
Inventor
徐行
刘辉
刘宁
张东祥
郭龙
陈李江
李启林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Avanti Technology Co ltd
Original Assignee
Hainan Cloud River Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Cloud River Technology Co Ltd filed Critical Hainan Cloud River Technology Co Ltd
Priority to CN201810128155.1A priority Critical patent/CN108304814B/en
Publication of CN108304814A publication Critical patent/CN108304814A/en
Application granted granted Critical
Publication of CN108304814B publication Critical patent/CN108304814B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a construction method for a text type detection model and a text type detection method, both suitable for execution in a computing device. The model construction method includes: collecting training pictures; expanding each training picture into a square picture; obtaining the results of annotating the printed-text regions and handwritten-text regions of each square picture; and training a convolutional neural network on each training picture and its annotations to obtain the text type detection model. The detection method includes: obtaining an original picture to be recognized and cutting it into multiple sub-images; detecting the printed-text and handwritten-text regions in each sub-image with the text type detection model to obtain the coordinate information and text type of each text region; and merging adjacent same-type text regions that were cut across different sub-images to obtain the printed-text and handwritten-text regions of the original picture. The invention also discloses a corresponding computing device.

Description

Construction method and computing device for a text type detection model
Technical field
The present invention relates to the field of image data processing, and in particular to a construction method for a text type detection model, a text type detection method, and a computing device.
Background technology
With the development of computer and Internet technology, automated equipment is increasingly used to grade students' examination papers. In test paper analysis, it is often necessary to determine whether the text in each recognition region is handwritten or printed. Current character recognition methods are typically based on character color or simple character features. Such methods place very high demands on image quality: if the image contains shadows, ink bleed-through, or blur, detection accuracy drops sharply. Moreover, these methods can usually only segment and detect text along horizontal lines, so they cannot handle rotated images well. In addition, text has many intrinsic features; distinguishing handwriting by color features alone fails to fully exploit the characteristics of handwriting, which limits detection performance to a certain extent.
Accordingly, it is desirable to provide a more effective method for detecting handwritten and printed text.
Summary of the invention
In view of the above problems, the present invention proposes a construction method for a text type detection model, a text type detection method, and a computing device that seek to solve, or at least alleviate, the problems above.
According to one aspect of the present invention, a construction method for a text type detection model is provided, suitable for execution in a computing device. The method includes: collecting training pictures, where each training picture contains at least one of printed text and handwritten text; expanding each training picture into a square picture according to its width and height; obtaining the results of annotating the printed-text regions and handwritten-text regions of each square picture; and training a convolutional neural network on each training picture and its annotations to obtain the text type detection model.
Optionally, in the construction method according to the present invention, the convolutional neural network includes six convolutional layers and two fully connected layers.
Optionally, in the construction method according to the present invention, the convolution kernels of the intermediate convolutional layers include 3x3, 5x5 and 7x7 kernels, and the final output layer has three classes: printed-text region, handwritten-text region, and background.
Optionally, in the construction method according to the present invention, the operation of annotating the printed-text and handwritten-text regions of a square picture includes: determining each text line in the square picture and the text regions within each line; annotating the region type of each text line, line by line, where the region types are printed-text region and handwritten-text region; and saving the coordinate information and text class of each text region in each line.
Optionally, in the construction method according to the present invention, the step of expanding a training picture into a square picture according to its width and height includes: constructing a white background picture whose side equals the larger of the width and the height, and placing the training picture at the center of the white background picture.
According to another aspect of the present invention, a text type detection method is provided, suitable for execution in a computing device in which a text type detection model built by the construction method described above is stored. The detection method includes: obtaining an original picture whose text types are to be recognized, and cutting the original picture into multiple sub-images that do not overlap and are edge-to-edge; detecting the printed-text and handwritten-text regions in each sub-image with the text type detection model, obtaining the coordinate information and text type of each text region; and merging adjacent same-type text regions that were cut across different sub-images, taking the set of printed-text regions and the set of handwritten-text regions over all sub-images as the printed-text and handwritten-text regions of the original picture.
Optionally, in the detection method according to the present invention, the step of merging adjacent same-type text regions cut across different sub-images includes: obtaining, for each sub-image, the first coordinate information of its printed-text and handwritten-text regions relative to that sub-image, and converting the first coordinate information into second coordinate information relative to the original picture; detecting, from the second coordinate information of each text region, whether two or more regions of the same type are adjacent where they were cut; and if so, merging these adjacent regions to obtain all printed-text and handwritten-text regions of the original picture.
Optionally, in the detection method according to the present invention, the step of cutting the original picture into multiple sub-images includes: expanding the original picture into a square picture according to its width and height, and cutting the square picture into multiple sub-images.
Optionally, in the detection method according to the present invention, the coordinate information of a text region includes the coordinates of its top-left vertex and bottom-right vertex.
Optionally, in the detection method according to the present invention, if the coordinates of the top-left vertex of the original picture within its square picture are (x, y), the coordinates of the top-left vertex of some sub-image within that square picture are (x1, y1), and the coordinates of the top-left vertex of a text region within that sub-image are (x2, y2), then the coordinates of that text region within the original picture are (x1 + x2 - x, y1 + y2 - y).
According to another aspect of the present invention, a computing device is provided, including: at least one processor; and a memory storing program instructions, where the program instructions are configured to be executed by the at least one processor and include instructions for performing the construction method of the text type detection model and/or the text type detection method described above.
According to another aspect of the present invention, a readable storage medium storing program instructions is provided; when the program instructions are read and executed by a computing device, the computing device performs the construction method of the text type detection model and/or the text type detection method described above.
According to the technical solution of the present invention, during model training, a large number of text pictures containing printed and handwritten text are collected and expanded into squares, and their printed-text and handwritten-text regions are manually annotated before being fed into a convolutional neural network for learning, yielding the text type detection model. Square expansion effectively reduces the degradation of subsequent training caused by annotation regions that are too small or irregularly sized. Line-by-line manual annotation in the horizontal direction lets the trained model recognize text regions of a single line, avoiding the coarseness of whole-image detection and improving detection granularity and precision.
During model use, an original picture to be recognized is cut into multiple sub-images according to its actual size, and the printed-text and handwritten-text regions of each sub-image are detected separately. Finally, the printed-text and handwritten-text regions of the sub-images are merged to obtain those of the original picture. Cutting the original picture into sub-images suits the region detection model better than recognizing the full picture directly, improving recognition granularity and precision. Merging the results of all sub-images recovers more realistic printed-text and handwritten-text regions, reducing the region fragments formed during sub-image detection and yielding regions that better match the text distribution of the original picture.
Description of the drawings
To achieve the above and related objects, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features and advantages of the disclosure will become apparent from the following detailed description read in conjunction with the drawings. Throughout the disclosure, identical reference numerals generally denote identical components or elements.
Fig. 1 shows a block diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a flow chart of a construction method 200 of a text type detection model according to an embodiment of the invention;
Fig. 3 shows a flow chart of a text type detection method 300 according to an embodiment of the invention;
Fig. 4A and Fig. 4B show sample pictures that meet the model training requirements;
Fig. 4C and Fig. 4D show sample pictures that do not meet the model training requirements;
Fig. 5A and Fig. 5B show schematic diagrams of expanding a picture into a square;
Fig. 6 shows a schematic diagram of annotating the text regions of each line, line by line, according to an embodiment of the invention;
Fig. 7 shows a schematic diagram of the structure of a convolutional neural network according to an embodiment of the invention;
Fig. 8 shows a schematic diagram of adaptively cutting an original picture into multiple sub-images according to an embodiment of the invention; and
Fig. 9 shows a schematic diagram of a change of coordinate base according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-one cache 110 and a level-two cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In a computing device 100 according to the present invention, the program data 124 includes instructions for performing the construction method 200 of the text type detection model and/or the text type detection method 300.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to communicate with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to communicate, via one or more I/O ports 158, with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or touch input device) or other peripherals (for example, a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be an example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery media. A "modulated data signal" is a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable media as used herein may include both storage media and communication media.
The computing device 100 may be implemented as a server, such as a file server, database server, application server, or web server, or as part of a small portable (or mobile) electronic device, such as a cellular phone, personal digital assistant (PDA), personal media player, wireless web-browsing device, personal head-mounted device, application-specific device, or hybrid device including any of the above functions. The computing device 100 may also be implemented as a personal computer, including desktop and notebook configurations. In some embodiments, the computing device 100 is configured to perform the construction method 200 of the text type detection model and/or the text type detection method 300 according to the present invention.
Fig. 2 shows a construction method 200 of a text type detection model according to an embodiment of the invention, which may be performed in a computing device, for example the computing device 100. As shown in Fig. 2, the method starts at step S220.
In step S220, training pictures are collected, where each training picture contains at least one of printed text and handwritten text.
For a specific application scenario, text pictures containing printed and/or handwritten text in that scenario can be collected. Note that the pictures should not contain too many or too densely packed text lines, so as to reduce the labor cost of subsequent manual annotation. Fig. 4A and Fig. 4B show sample pictures that meet the model training requirements: their number of text lines and line spacing are appropriate. Fig. 4C and Fig. 4D show sample pictures that do not: their text lines are too many and too dense.
Then, in step S240, each training picture is expanded into a square picture according to its width and height.
The collected training pictures do not necessarily meet the training requirements of the subsequent detection model, so each picture is expanded into a square. This reduces the degradation of subsequent training caused by annotation regions that are too small or irregularly sized. Square expansion can proceed from the original size of the picture (say width w and height h): construct a white background image whose side is the larger of w and h, and place the picture at its center, so that the original picture is expanded into a square picture of w*w or h*h. Fig. 5A and Fig. 5B show two examples: in Fig. 5A the picture width w exceeds the height h, so the picture is expanded into a square of side w; in Fig. 5B the width w is less than the height h, so the picture is expanded into a square of side h. Of course, if the picture is already square, no expansion is needed.
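The square-expansion step can be sketched as follows. This is a minimal illustration using NumPy under the stated rule (white canvas of side max(w, h), picture centered); the function name `expand_to_square` is ours, not the patent's.

```python
import numpy as np

def expand_to_square(img: np.ndarray) -> np.ndarray:
    """Pad an H x W x 3 picture onto a white square canvas, centered."""
    h, w = img.shape[:2]
    side = max(w, h)
    # white background: RGB(255, 255, 255), as the patent specifies
    canvas = np.full((side, side, 3), 255, dtype=np.uint8)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

# A 300-wide, 120-high picture becomes a 300 x 300 square (the Fig. 5A case).
square = expand_to_square(np.zeros((120, 300, 3), dtype=np.uint8))
print(square.shape)  # (300, 300, 3)
```

A picture that is already square passes through unchanged, matching the "no expansion needed" remark above.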
Then, in step S260, the results of annotating the printed-text regions and handwritten-text regions of each square picture are obtained.
Here, the operation of annotating the printed-text and handwritten-text regions of a square picture includes: determining each text line in the square picture and the text regions within each line; annotating the region type of each text line, line by line, where the region types are printed-text region and handwritten-text region; and saving the coordinate information and text class of each text region in each line. The coordinate information of a text region usually includes the coordinates of its top-left and bottom-right vertices, but other representations can be chosen, such as the bottom-left and top-right vertices, or the top-left vertex plus the region's width and height, as long as the position of the text region is represented accurately; the invention does not limit this. In addition, text regions may be identified by any existing region recognition method, such as OCR; the invention does not limit this either.
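One possible shape for a saved annotation record, per the description above (top-left vertex, bottom-right vertex, and region class), is sketched below. The field names, the class labels, and the JSON layout are our own assumptions for illustration; the patent does not prescribe a storage format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TextRegion:
    x1: int     # top-left vertex x
    y1: int     # top-left vertex y
    x2: int     # bottom-right vertex x
    y2: int     # bottom-right vertex y
    label: str  # "printed" or "handwritten"

# Two annotated regions from one square picture, saved line by line.
annotations = [
    TextRegion(40, 30, 380, 62, "printed"),
    TextRegion(40, 70, 200, 104, "handwritten"),
]
serialized = json.dumps([asdict(r) for r in annotations])
print(serialized)
```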
Fig. 6 shows a schematic diagram of annotating the text regions of each line, line by line, according to an embodiment of the invention. It contains four lines of printed text; each of the first three lines contains one text region, and the fourth line contains four. This line-by-line annotation lets the trained model recognize text regions of a single line, avoids coarse whole-image detection results, and improves detection granularity and precision.
Then, in step S280, a convolutional neural network is trained on each training picture and its annotations to obtain the text type detection model.
The present invention trains the model on an annotated picture set of a certain scale. Specifically, training uses the square-expanded picture set and the annotation of each picture, with a detection model based on an improved fast region-based convolutional neural network. The training model is adapted from a detection model based on a fast region-based convolutional neural network (ZF network). Those skilled in the art can set the structure of the convolutional neural network and the content of each layer as needed; the invention does not limit this.
According to an embodiment of the present invention, the convolutional neural network includes six convolutional layers and two fully connected layers; Fig. 7 shows its structure. Considering that a deep neural network needs a fixed input picture size (pictures of different sizes must be cropped to a specified size), the invention cuts the input w*w or h*h original images to a uniform size, such as 224*224, through multi-scale processing, which ensures the model supports multi-scale image input. In addition, the intermediate convolutional layers can add kernels of several sizes, such as 3x3, 5x5 and 7x7 kernels, with an appropriate parameter-reduction strategy after the convolutional layers. The number of classes of the final output layer is set to 3: printed text, handwritten text, and background. Here, background refers to plain white background with pixel value RGB(255, 255, 255), which neither interferes with nor influences the original picture regions in the network computation. Of course, each layer of the convolutional neural network can also be set to other values as required; the invention does not limit this.
As shown in Fig. 7, the convolutional neural network contains a 12-layer structure, where the layer names are input (input data layer), conv (convolutional layer), pool (pooling layer), fc (fully connected layer), and output (output layer). In Fig. 7, some convolutional layers are paired with pooling layers, such as conv2+pool2, conv3+pool3 and conv5+pool5, while others stand alone without pooling, such as conv1, conv4 and conv6. That is, the complete structure of the network is: input layer → first convolutional layer → second convolutional layer + second pooling layer → third convolutional layer + third pooling layer → fourth convolutional layer → fifth convolutional layer + fifth pooling layer → sixth convolutional layer → first fully connected layer → second fully connected layer → output layer. The parameters of each layer are as shown in the table:
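The layer ordering above can be sanity-checked by walking a 224x224 input through the standard convolution output-size formula. The per-layer table is not reproduced in this text, so the kernel sizes, strides, and paddings below are our assumptions; only the layer order (six conv layers, pooling after conv2, conv3 and conv5, then two fully connected layers and a 3-way output) follows the description.

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output side for a square conv/pool layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 224  # fixed input side after multi-scale cropping
# (name, kernel, stride, padding): illustrative values, not the patent's table
layers = [
    ("conv1", 7, 2, 3), ("conv2", 5, 1, 2), ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1), ("pool3", 2, 2, 0), ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 2, 2, 0), ("conv6", 3, 1, 1),
]
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
    print(f"{name}: {size}x{size}")
print("fc1 -> fc2 -> 3-way output (printed / handwritten / background)")
```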
In addition, cross-validation may be used for model selection during training: the entire picture set is divided into three parts, a training set, a validation set, and a test set; training is performed on the training-set pictures; training models at appropriate epochs are selected according to the decrease of the loss function over the iteration cycles and their detection performance is tested on the validation set; and the training model that performs best on the validation set is chosen as the candidate optimal model.
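A minimal sketch of the three-way split described above. The 8:1:1 ratio and the shuffling seed are our assumptions; the patent only names the three parts.

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle, then carve off test and validation parts; the rest trains."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return (items[n_test + n_val:],          # training set
            items[n_test:n_test + n_val],    # validation set
            items[:n_test])                  # test set

train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```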
Fig. 3 shows a text type detection method 300 according to an embodiment of the invention, which may be performed in a computing device, for example the computing device 100. The computing device stores a text type detection model built by the model construction method described above. As shown in Fig. 3, the method starts at step S320.
In step S320, an original picture whose text types are to be recognized is obtained and cut into multiple sub-images, where the sub-images do not overlap and are edge-to-edge.
As noted above, prior-art methods for detecting printed and handwritten text place high demands on the image, typically requiring high-definition scans. The present invention provides a text type detection model that effectively reduces the requirement on image definition. Therefore, the original picture to be recognized may be a high-definition text image obtained by a scanner, or an image captured by a mobile phone or camera. Moreover, picture capture has no strict environmental requirements (such as lighting, angle, or paper texture); normally photographing plain paper under natural lighting suffices, which effectively improves the universality of text image recognition and reduces the workload and cost of image recognition.
The original picture can be cut adaptively, that is, divided into regions according to its width and height, with the regions non-overlapping and edge-to-edge; each region serves as a sub-image (as shown by the picture cutting in Fig. 8). Usually, a sub-image can be limited to at most 480*320, so an original picture of 1920*1280 can be cut into 16-20 sub-images. Detection after cutting into sub-images suits the region detection model better than recognizing the full picture directly, improving recognition granularity and precision. Further, the original picture can first be expanded into a square picture according to its width and height, and the square picture then cut into multiple sub-images; the square-expansion method is described above and not repeated here.
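The adaptive cutting above can be sketched as a tiling computation: choose the smallest grid whose tiles fit under the size limit, then emit non-overlapping, edge-to-edge boxes. The even-grid strategy is our interpretation of "adaptive"; the 480x320 cap comes from the text.

```python
import math

def tile_boxes(width, height, max_w=480, max_h=320):
    """Non-overlapping, edge-to-edge (x1, y1, x2, y2) tiles covering the picture."""
    nx, ny = math.ceil(width / max_w), math.ceil(height / max_h)
    tw, th = math.ceil(width / nx), math.ceil(height / ny)
    boxes = []
    for j in range(ny):
        for i in range(nx):
            x0, y0 = i * tw, j * th
            boxes.append((x0, y0, min(x0 + tw, width), min(y0 + th, height)))
    return boxes

boxes = tile_boxes(1920, 1280)
print(len(boxes))  # 16 sub-images for a 1920 x 1280 original
```

The 1920x1280 example lands at the low end of the 16-20 range quoted above because both sides divide evenly by the tile cap.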
Then, in step S340, using literal type detection model respectively to the print hand writing area in each subgraph Domain and handwritten text region are detected, and obtain the coordinate information of wherein each character area and its affiliated literal type. Block letter and handwritten text region detection are exactly carried out one by one to each subgraph that step S320 cuttings obtain, obtain every height The coordinate information of multiple block letter and handwritten text region in figure, and the type of each detection zone (belong to block letter Or hand-written body region).Similarly, the coordinate information of character area includes top left corner apex coordinate and the bottom right of the character area Angular vertex coordinate, but not limited to this, as long as the regional location of the character area can be indicated accurately.
Then, in step S360, text regions of the same type that belong to different sub-images and are adjacently cut are merged, and the set of printed-text regions and the set of handwritten-text regions over all sub-images are taken as the printed-text regions and handwritten-text regions of the original image.
Merging the printed regions and the handwritten regions of all sub-images separately recovers the actual printed-text and handwritten-text regions, reducing the region fragments formed during per-sub-image detection and yielding regions that better match the text distribution in the original image. The rules for merging sub-images include: 1) regions of the same type from different sub-images are collected together as the regions of that type for the original image; 2) since the region information detected in each sub-image (printed or handwritten) is first coordinate information based on that sub-image, it must first be mapped to second coordinate information based on the original image (a change of base coordinate system); 3) after conversion to the second coordinate information based on the original image, it is detected whether two or more regions are adjacently cut, and any overlapping or adjoining regions are merged; 4) finally, all non-overlapping printed and handwritten regions of the original image are collated.
According to one embodiment of the present invention, if the coordinate of the top-left vertex of the original image within its square picture is (x, y), the coordinate of the top-left vertex of a sub-image within that square picture is (x1, y1), and the coordinate of the top-left vertex of a text region within that sub-image is (x2, y2), then the coordinate of that text region in the original image is (x1 + x2 - x, y1 + y2 - y).
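The coordinate mapping of this embodiment can be written directly from the formula (the function name and tuple layout are illustrative only):

```python
def to_original_coords(region_xy, sub_xy, orig_xy):
    """Map a text region's top-left vertex from sub-image coordinates to
    original-image coordinates, per (x1 + x2 - x, y1 + y2 - y)."""
    x2, y2 = region_xy   # region vertex within its sub-image
    x1, y1 = sub_xy      # sub-image vertex within the square picture
    x, y = orig_xy       # original-image vertex within the square picture
    return (x1 + x2 - x, y1 + y2 - y)
```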
Fig. 9 is a schematic diagram of the base-coordinate-system conversion principle according to an embodiment of the invention, showing mainly how the coordinates of a text region detected in a sub-image are converted into coordinates within the square-expanded w*w or h*h picture. As shown in Fig. 9, for a square picture obtained by expansion (white background included), the text-picture region occupies only the central part, and the top-left vertex of that region (the five-pointed star on the left border) has coordinate (x, y). Since the present invention performs printed/handwritten text detection on sub-images 1-4 (the square-expanded picture is cut into 4 pieces in the example figure, though it could of course be cut into some other number, such as 8, 12 or 16), the coordinates of the detected printed or handwritten text are likewise relative to a sub-image, i.e. first coordinate information. For example, the rectangular handwriting region in sub-image 2 has top-left vertex coordinate (x2, y2), a value relative to the vertex of sub-image 2 (the five-pointed star on the upper border in the figure). The aim of the present invention is to convert the coordinate (x2, y2) into a coordinate (x2', y2') relative to the vertex (x, y) of the original image within the square picture, i.e. second coordinate information relative to the original-image vertex. By calculation, x2' = x1 + x2 - x and y2' = y1 + y2 - y.
According to another embodiment of the invention, once the second coordinate information of each text region relative to the original image is available, it can be detected whether two or more regions are adjacently cut. Here, "adjacently cut" means that printed or handwritten regions adjoin across the edges of different sub-images, chiefly the case where a single text region has been split apart by different sub-images. Text split in this way must be merged to recover a complete line of text. Whether two text regions are adjacently cut can generally be determined from their top-left and bottom-right vertex coordinates; adjacently cut regions usually share an identical abscissa or ordinate value. For example, the rectangular boxes of sub-image 1 and sub-image 3 in Fig. 9 are adjacently cut: they form one whole region in the original image and therefore need to be merged.
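One possible adjacency test under the rule that adjacently cut regions share an abscissa or ordinate value. The box layout and function name are assumptions; the patent leaves the exact test open:

```python
def adjacently_cut(a, b):
    """a, b: (x1, y1, x2, y2) boxes in original-image coordinates
    (top-left and bottom-right vertices). Two same-type regions are
    treated as adjacently cut when they touch along a shared edge and
    overlap in the perpendicular direction."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    touch_x = ax2 == bx1 or bx2 == ax1      # shared vertical edge
    touch_y = ay2 == by1 or by2 == ay1      # shared horizontal edge
    overlap_y = ay1 < by2 and by1 < ay2     # vertical ranges intersect
    overlap_x = ax1 < bx2 and bx1 < ax2     # horizontal ranges intersect
    return (touch_x and overlap_y) or (touch_y and overlap_x)
```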
Specifically, adjacently cut text regions of the same type may be merged as follows: obtain, for each sub-image, the first coordinate information of its printed-text regions and handwritten-text regions within that sub-image, and convert this first coordinate information into second coordinate information based on the original image; then, from the second coordinate information of each text region, detect whether two or more text regions of the same type are adjacently cut, and if so merge those adjacently cut regions, thereby obtaining all printed-text regions and handwritten-text regions in the original image. Merging here may mean taking the maximal union region of the two or more text regions.
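Taking the "maximal union region" can be sketched as the axis-aligned hull of the boxes. This is one reading of the text, not a definitive implementation:

```python
def merge_regions(regions):
    """Merge a list of adjacently cut (x1, y1, x2, y2) boxes of the same
    type into their maximal union region (the axis-aligned hull)."""
    xs1, ys1, xs2, ys2 = zip(*regions)
    return (min(xs1), min(ys1), max(xs2), max(ys2))
```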
According to the technical solution of the present invention, square-expanding each picture reduces the degradation of subsequent model training caused by label regions that are too small or irregular in size. Annotating the training pictures line by line in the horizontal direction allows the model to identify individual text lines, avoiding coarse whole-image detection results and improving the granularity and precision of detection. Tailoring the network to the characteristics of the image data set of the present invention, model training uses an improved fast region convolutional neural network, yielding higher model performance. Cutting into sub-images suits the region detection model better than recognizing directly on the full image, improving the granularity and precision of recognition. Merging the sub-image results yields printed-text and handwritten-text regions closer to reality, reducing the region fragments formed during sub-image detection and producing regions that better match the text distribution in the original image.
B9. The method of any one of B6-B8, wherein the coordinate information of a text region comprises the top-left vertex coordinate and the bottom-right vertex coordinate of the text region.
B10. The method of B7, wherein if the coordinate of the top-left vertex of the original image within its square picture is (x, y), the coordinate of the top-left vertex of a sub-image within that square picture is (x1, y1), and the coordinate of the top-left vertex of a text region within that sub-image is (x2, y2), then the coordinate of that text region in the original image is (x1 + x2 - x, y1 + y2 - y).
Numerous specific details are set forth in the description provided herein. It will be appreciated, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure an understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the foregoing description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e. instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.
Where program code is executed on programmable computers, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute, according to the instructions in the program code stored in the memory, the method for constructing a text-type detection model and/or the text-type detection method of the present invention.
In addition, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor having the necessary instructions for implementing such a method or method element forms a means for implementing the method or method element. Furthermore, an element of an apparatus embodiment described herein is an example of a means for carrying out the function performed by that element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Moreover, it should be noted that the language used in this specification has been principally selected for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made herein is illustrative and not restrictive, the scope of the invention being defined by the appended claims.

Claims (10)

1. A method for constructing a text-type detection model, adapted to be executed in a computing device, the method comprising:
acquiring training pictures, wherein each training picture contains at least one of printed text and handwritten text;
expanding each training picture into a square picture according to its length and width values;
obtaining the result of annotating the printed-text regions and handwritten-text regions of each square picture; and
training a convolutional neural network according to each training picture and its annotation result, to obtain the text-type detection model.
2. The method of claim 1, wherein the convolutional neural network comprises 6 convolutional layers and 2 fully connected layers.
3. The method of claim 2, wherein the convolution kernels of the intermediate convolutional layers of the convolutional neural network include 3*3, 5*5 and 7*7 convolution kernels, and the final output layer comprises 3 classes: printed-text region, handwritten-text region and background region.
4. The method of claim 1, wherein the operation of annotating the printed-text regions and handwritten-text regions of a square picture comprises:
determining each text line in the square picture and the text regions in each text line;
annotating the text-region type of each text line, line by line, the text-region types comprising printed-text region and handwritten-text region; and
saving the coordinate information of each text region in each text line and its text class.
5. The method of claim 1, wherein the step of expanding a training picture into a square picture according to its length and width values comprises:
selecting the larger of the length and the width as the side of a white background picture, and placing the training picture at the center of the white background picture.
6. A text-type detection method, adapted to be executed in a computing device storing a text-type detection model constructed by the method of any one of claims 1-5, the text-type detection method comprising:
acquiring an original image whose text types are to be recognized, and cutting the original image into multiple sub-images, wherein the sub-images are non-overlapping and contiguous;
using the text-type detection model to detect the printed-text regions and handwritten-text regions in each sub-image, obtaining the coordinate information of each text region and its text type; and
merging text regions of the same type that belong to different sub-images and are adjacently cut, and taking the set of printed-text regions and the set of handwritten-text regions of all sub-images as the printed-text regions and handwritten-text regions of the original image.
7. The method of claim 6, wherein the step of merging text regions of the same type that belong to different sub-images and are adjacently cut comprises:
obtaining, for each sub-image, the first coordinate information of its printed-text regions and handwritten-text regions within that sub-image, and converting the first coordinate information into second coordinate information based on the original image;
detecting, from the second coordinate information of each text region, whether two or more text regions of the same type are adjacently cut, and if so merging those adjacently cut regions, to obtain all printed-text regions and handwritten-text regions in the original image.
8. The method of claim 6, wherein the step of cutting the original image into multiple sub-images comprises:
expanding the original image into a square picture according to its length and width values, and cutting the square picture into multiple sub-images.
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods of claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a computing device, cause the computing device to perform any one of the methods of claims 1-8.
CN201810128155.1A 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment Active CN108304814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810128155.1A CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810128155.1A CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Publications (2)

Publication Number Publication Date
CN108304814A true CN108304814A (en) 2018-07-20
CN108304814B CN108304814B (en) 2020-07-14

Family

ID=62864779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810128155.1A Active CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Country Status (1)

Country Link
CN (1) CN108304814B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109263271A (en) * 2018-08-15 2019-01-25 同济大学 A kind of printing equipment determination method based on big data
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
CN109740473A (en) * 2018-12-25 2019-05-10 东莞市七宝树教育科技有限公司 A kind of image content automark method and system based on marking system
CN109766879A (en) * 2019-01-11 2019-05-17 北京字节跳动网络技术有限公司 Generation, character detection method, device, equipment and the medium of character machining model
CN109919037A (en) * 2019-02-01 2019-06-21 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN109977762A (en) * 2019-02-01 2019-07-05 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN110059559A (en) * 2019-03-15 2019-07-26 深圳壹账通智能科技有限公司 The processing method and its electronic equipment of OCR identification file
CN110321788A (en) * 2019-05-17 2019-10-11 平安科技(深圳)有限公司 Training data processing method, device, equipment and computer readable storage medium
CN110490232A (en) * 2019-07-18 2019-11-22 北京捷通华声科技股份有限公司 Method, apparatus, the equipment, medium of training literal line direction prediction model
CN111144191A (en) * 2019-08-14 2020-05-12 广东小天才科技有限公司 Font identification method and device, electronic equipment and storage medium
CN111191668A (en) * 2018-11-15 2020-05-22 零氪科技(北京)有限公司 Method for identifying disease content in medical record text
CN111275139A (en) * 2020-01-21 2020-06-12 杭州大拿科技股份有限公司 Handwritten content removal method, handwritten content removal device, and storage medium
CN111582267A (en) * 2020-04-08 2020-08-25 北京皮尔布莱尼软件有限公司 Text detection method, computing device and readable storage medium
CN114120305A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Training method of text classification model, and recognition method and device of text content

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050102135A1 (en) * 2003-11-12 2005-05-12 Silke Goronzy Apparatus and method for automatic extraction of important events in audio signals
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN105956626A (en) * 2016-05-12 2016-09-21 成都新舟锐视科技有限公司 Deep learning based vehicle license plate position insensitive vehicle license plate recognition method
CN106874902A (en) * 2017-01-19 2017-06-20 博康智能信息技术有限公司北京海淀分公司 A kind of license board information recognition methods and device
CN107346629A (en) * 2017-08-22 2017-11-14 贵州大学 A kind of intelligent blind reading method and intelligent blind reader system
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109263271A (en) * 2018-08-15 2019-01-25 同济大学 A kind of printing equipment determination method based on big data
CN109263271B (en) * 2018-08-15 2020-06-12 同济大学 Printing equipment detection and analysis method based on big data
CN111191668A (en) * 2018-11-15 2020-05-22 零氪科技(北京)有限公司 Method for identifying disease content in medical record text
CN111191668B (en) * 2018-11-15 2023-04-28 零氪科技(北京)有限公司 Method for identifying disease content in medical record text
CN109740473A (en) * 2018-12-25 2019-05-10 东莞市七宝树教育科技有限公司 A kind of image content automark method and system based on marking system
CN109740473B (en) * 2018-12-25 2020-10-16 东莞市七宝树教育科技有限公司 Picture content automatic marking method and system based on paper marking system
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
CN109685055B (en) * 2018-12-26 2021-11-12 北京金山数字娱乐科技有限公司 Method and device for detecting text area in image
CN109766879A (en) * 2019-01-11 2019-05-17 北京字节跳动网络技术有限公司 Generation, character detection method, device, equipment and the medium of character machining model
CN109977762A (en) * 2019-02-01 2019-07-05 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN109919037A (en) * 2019-02-01 2019-06-21 汉王科技股份有限公司 A kind of text positioning method and device, text recognition method and device
CN109919037B (en) * 2019-02-01 2021-09-07 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN110059559A (en) * 2019-03-15 2019-07-26 深圳壹账通智能科技有限公司 The processing method and its electronic equipment of OCR identification file
CN110321788A (en) * 2019-05-17 2019-10-11 平安科技(深圳)有限公司 Training data processing method, device, equipment and computer readable storage medium
CN110490232B (en) * 2019-07-18 2021-08-13 北京捷通华声科技股份有限公司 Method, device, equipment and medium for training character row direction prediction model
CN110490232A (en) * 2019-07-18 2019-11-22 北京捷通华声科技股份有限公司 Method, apparatus, the equipment, medium of training literal line direction prediction model
CN111144191A (en) * 2019-08-14 2020-05-12 广东小天才科技有限公司 Font identification method and device, electronic equipment and storage medium
CN111144191B (en) * 2019-08-14 2024-03-22 广东小天才科技有限公司 Font identification method, font identification device, electronic equipment and storage medium
CN111275139A (en) * 2020-01-21 2020-06-12 杭州大拿科技股份有限公司 Handwritten content removal method, handwritten content removal device, and storage medium
CN111275139B (en) * 2020-01-21 2024-02-23 杭州大拿科技股份有限公司 Handwritten content removal method, handwritten content removal device, and storage medium
CN111582267A (en) * 2020-04-08 2020-08-25 北京皮尔布莱尼软件有限公司 Text detection method, computing device and readable storage medium
CN111582267B (en) * 2020-04-08 2023-06-02 北京皮尔布莱尼软件有限公司 Text detection method, computing device and readable storage medium
CN114120305A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Training method of text classification model, and recognition method and device of text content

Also Published As

Publication number Publication date
CN108304814B (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN108304814A (en) A kind of construction method and computing device of literal type detection model
CN110443250B (en) Method and device for identifying category of contract seal and computing equipment
CN106780512B (en) Method, application and computing device for segmenting image
CN106447721B (en) Image shadow detection method and device
Fang et al. Bottom-up saliency detection model based on human visual sensitivity and amplitude spectrum
CN110674804A (en) Text image detection method and device, computer equipment and storage medium
Bovik The essential guide to image processing
CN107977665A (en) The recognition methods of key message and computing device in a kind of invoice
CN108416345A (en) A kind of answering card area recognizing method and computing device
CN108898142B (en) Recognition method of handwritten formula and computing device
CN109978063B (en) Method for generating alignment model of target object
CN103975342A (en) Systems and methods for mobile image capture and processing
CN111626295B (en) Training method and device for license plate detection model
US8042039B2 (en) Populating a dynamic page template with digital content objects according to constraints specified in the dynamic page template
CN110427946B (en) Document image binarization method and device and computing equipment
CN108762740B (en) Page data generation method and device and electronic equipment
CN103914802A (en) Image selection and masking using imported depth information
CN109684980A (en) Automatic marking method and device
CN110097059B (en) Document image binarization method, system and device based on generation countermeasure network
CN104657709B (en) Facial image recognition method, device and server
CN108537208A (en) A kind of multiple dimensioned method for detecting human face and computing device
US20150301711A1 (en) Computerized processing of pictorial responses in evaluations
CN106650743B (en) Image strong reflection detection method and device
CN104881843A (en) Image interpolation method and image interpolation apparatus
Joshi OpenCV with Python by example

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 571924 Hainan Ecological Software Park, Laocheng High tech Industrial Demonstration Zone, Haikou City, Hainan Province

Patentee after: Hainan Avanti Technology Co.,Ltd.

Address before: 571924 Hainan old city high tech industrial demonstration area Hainan eco Software Park

Patentee before: HAINAN YUNJIANG TECHNOLOGY CO.,LTD.

CP03 Change of name, title or address