CN110533079A - Method, apparatus, medium, and electronic device for forming image samples - Google Patents

Method, apparatus, medium, and electronic device for forming image samples

Info

Publication number
CN110533079A
CN110533079A (application CN201910717086.2A; granted publication CN110533079B)
Authority
CN
China
Prior art keywords
text
image sample
text box
image
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910717086.2A
Other languages
Chinese (zh)
Other versions
CN110533079B (en)
Inventor
Li Zhuang (李壮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beike Technology Co Ltd
Original Assignee
Beike Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beike Technology Co Ltd filed Critical Beike Technology Co Ltd
Priority to CN201910717086.2A priority Critical patent/CN110533079B/en
Publication of CN110533079A publication Critical patent/CN110533079A/en
Application granted granted Critical
Publication of CN110533079B publication Critical patent/CN110533079B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A method, apparatus, medium, and electronic device for forming image samples are disclosed. The method for forming image samples includes: obtaining a first image sample, wherein the first image sample is provided with at least one piece of text annotation information; providing the first image sample to a text box detection model, and performing text box detection on the first image sample via the text box detection model to obtain detected text box position information; determining the text annotation information corresponding to the detected text box position information; and setting new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information, thereby forming a second image sample. The present disclosure helps to improve the efficiency of setting text annotation information for image samples and enriches the image samples, while also helping to improve the recognition accuracy of a text content recognition model.

Description

Method, apparatus, medium, and electronic device for forming image samples
Technical field
The present disclosure relates to computer vision technology, and in particular to a method for forming image samples, an apparatus for forming image samples, a storage medium, and an electronic device.
Background
OCR (Optical Character Recognition) is a technology that can recognize characters (such as text and symbols) on paper documents.
Currently, some OCR techniques are implemented by means of deep learning. Specifically, an image to be recognized is first provided to a text box detection model based on deep learning, and the text box detection model performs text box detection on the input image to obtain the position information of the text boxes in the image. The image to be recognized is then cropped according to the text box position information to obtain image blocks, and the image blocks are provided to a text content recognition model, so that the text content in each image block can be obtained from the output of the text content recognition model.
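The two-stage pipeline just described (detect text boxes, crop, then recognize) can be sketched as follows. The model functions here are hypothetical stubs standing in for the deep-learning models, so only the control flow, not the recognition itself, is real.

```python
# Minimal sketch of the two-stage OCR pipeline: detection, cropping,
# recognition. detect_text_boxes and recognize_text are hypothetical
# stand-ins for the deep-learning models, not real implementations.

def detect_text_boxes(image):
    # A real detector predicts (x, y, width, height) boxes; this stub
    # pretends the whole image is a single text region.
    height, width = len(image), len(image[0])
    return [(0, 0, width, height)]

def crop(image, box):
    # Cut out the image block covered by one detected box.
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def recognize_text(image_block):
    # A real recognizer decodes characters from the block's pixels.
    return "text"

def ocr(image):
    """Run detection, crop each detected box, recognize each block."""
    return [recognize_text(crop(image, box)) for box in detect_text_boxes(image)]

print(ocr([[0, 0, 0], [0, 0, 0]]))  # -> ['text']
```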
How to improve the recognition accuracy of the text content recognition model is a technical issue that deserves attention.
Summary of the invention
To solve the above technical problem, the present disclosure is proposed. Embodiments of the present disclosure provide a method for forming image samples, an apparatus for forming image samples, a storage medium, and an electronic device.
According to one aspect of the embodiments of the present disclosure, a method for forming image samples is provided, including: obtaining a first image sample, wherein the first image sample is provided with at least one piece of text annotation information; providing the first image sample to a text box detection model, and performing text box detection on the first image sample via the text box detection model to obtain detected text box position information; determining the text annotation information corresponding to the detected text box position information; and setting new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information, thereby forming a second image sample.
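Under assumed data structures (dicts with `box` and `text` keys, and a `match` callback, none of which are fixed by the disclosure), the claimed steps could be sketched like this:

```python
# Sketch of the claimed method: detected box positions replace the annotated
# ones, while the text content annotation is inherited from whichever
# original annotation the detected box corresponds to. The dict layout and
# the match() signature are illustrative assumptions.

def form_second_sample_annotations(annotations, detected_boxes, match):
    """annotations: list of {'box': ..., 'text': ...} for the first sample.
    detected_boxes: positions output by a text box detection model.
    match: maps (detected_box, annotations) -> corresponding annotation or None."""
    new_annotations = []
    for box in detected_boxes:
        ann = match(box, annotations)
        if ann is not None:
            # New annotation: detected position + inherited text content.
            new_annotations.append({"box": box, "text": ann["text"]})
    return new_annotations
```

For example, with a trivial matcher that always returns the first annotation, a detected box `(1, 1, 9, 4)` inherits the text content of annotation `{'box': (0, 0, 10, 5), 'text': 'hello'}`.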
In one embodiment of the present disclosure, providing the first image sample to a text box detection model and performing text box detection on the first image sample via the text box detection model to obtain the detected text box position information includes: providing the first image sample to multiple text box detection models based on different detection algorithms, and performing text box detection on the first image sample via each of the multiple text box detection models respectively, to obtain the text box position information detected by each of the multiple text box detection models.
In another embodiment of the present disclosure, the hyperparameters of the multiple text box detection models differ during training.
In a further embodiment of the present disclosure, determining the text annotation information corresponding to the detected text box position information includes: determining overlap information between text box regions according to the text box position annotation information in each piece of text annotation information of the first image sample and the detected text box position information; and determining the text annotation information corresponding to the detected text box position information according to the overlap information and a preset condition.
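One common way to realize the overlap information and the preset condition is intersection-over-union with a threshold. The disclosure does not fix a particular overlap measure, so the following is a hedged sketch assuming `(x1, y1, x2, y2)` corner boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def matching_annotation(detected_box, annotations, threshold=0.5):
    """Return the annotation whose box best overlaps the detected box,
    provided the overlap satisfies the preset condition (here assumed
    to be IoU >= threshold); otherwise return None."""
    best = max(annotations, key=lambda a: iou(detected_box, a["box"]), default=None)
    if best is not None and iou(detected_box, best["box"]) >= threshold:
        return best
    return None
```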
In a further embodiment of the present disclosure, setting new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information to form the second image sample includes: using the text box position information, together with the text content annotation information in the corresponding text annotation information, as the new text annotation information of the first image sample, thereby forming the second image sample.
In a further embodiment of the present disclosure, the method further includes: training a text content recognition model to be trained using the second image sample.
In a further embodiment of the present disclosure, training the text content recognition model to be trained using the second image sample includes: cropping image block samples containing text content from the first image sample and the second image samples according to the text annotation information of the first image sample and the text annotation information of the second image samples obtained respectively by the multiple text box detection models based on different detection algorithms; obtaining, according to a preset mixing proportion, image block samples from the first image sample and image block samples from the second image samples corresponding to the different detection algorithms; providing the obtained image block samples to the text content recognition model to be trained, and performing text content recognition on each image block sample via the text content recognition model to be trained to obtain multiple pieces of recognized text content; and adjusting the model parameters of the text content recognition model to be trained according to the differences between the multiple pieces of recognized text content and the text content annotation information in the text annotation information.
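The mixing step could be sketched as below. Only the preset proportion between the two block sources comes from the description; the concrete sampling scheme (random sampling without replacement, then a shuffle) is an assumption:

```python
import random

def mixed_batch(first_blocks, second_blocks, first_ratio, batch_size, seed=0):
    """Draw one batch of image block samples: a first_ratio share comes
    from blocks cut out of the first image samples, the remainder from
    blocks cut out of the detector-derived second image samples."""
    rng = random.Random(seed)
    n_first = round(batch_size * first_ratio)
    batch = rng.sample(first_blocks, n_first)
    batch += rng.sample(second_blocks, batch_size - n_first)
    rng.shuffle(batch)  # mix the two sources within the batch
    return batch
```

For example, `mixed_batch(first, second, 0.75, 8)` yields 6 blocks from the first samples and 2 from the second samples.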
According to another aspect of the embodiments of the present disclosure, an apparatus for forming image samples is provided, including: an obtaining module, configured to obtain a first image sample, wherein the first image sample is provided with at least one piece of text annotation information; a detection module, configured to provide the first image sample obtained by the obtaining module to a text box detection model, and to perform text box detection on the first image sample via the text box detection model to obtain detected text box position information; a determining module, configured to determine the text annotation information corresponding to the text box position information detected by the detection module; and a setting module, configured to set new text annotation information for the first image sample according to the text box position information detected by the detection module and the corresponding text annotation information determined by the determining module, thereby forming a second image sample.
In one embodiment of the present disclosure, the detection module is further configured to: provide the first image sample to multiple text box detection models based on different detection algorithms, and perform text box detection on the first image sample via each of the multiple text box detection models respectively, to obtain the text box position information detected by each of the multiple text box detection models.
In another embodiment of the present disclosure, the hyperparameters of the multiple text box detection models differ during training.
In a further embodiment of the present disclosure, the determining module includes: a first submodule, configured to determine overlap information between text box regions according to the text box position annotation information in each piece of text annotation information of the first image sample and the detected text box position information; and a second submodule, configured to determine the text annotation information corresponding to the detected text box position information according to the overlap information determined by the first submodule and a preset condition.
In a further embodiment of the present disclosure, the setting module is further configured to: use the text box position information, together with the text content annotation information in the corresponding text annotation information, as the new text annotation information of the first image sample, thereby forming the second image sample.
In a further embodiment of the present disclosure, the apparatus further includes: a training module, configured to train a text content recognition model to be trained using the second image sample.
In a further embodiment of the present disclosure, the training module includes: a third submodule, configured to crop image block samples containing text content from the first image sample and the second image samples according to the text annotation information of the first image sample and the text annotation information of the second image samples obtained respectively by the multiple text box detection models based on different detection algorithms; a fourth submodule, configured to obtain, according to a preset mixing proportion, image block samples from the first image sample and image block samples from the second image samples corresponding to the different detection algorithms; a fifth submodule, configured to provide the obtained image block samples to the text content recognition model to be trained, and to perform text content recognition on each image block sample via the text content recognition model to be trained to obtain multiple pieces of recognized text content; and a sixth submodule, configured to adjust the model parameters of the text content recognition model to be trained according to the differences between the multiple pieces of recognized text content and the text content annotation information in the text annotation information.
According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the above method for forming image samples.
According to still another aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the above method for forming image samples.
Based on the method and apparatus for forming image samples provided by the above embodiments of the present disclosure, the detected text box position information can be obtained by performing text box detection on the first image sample using a text box detection model. In this way, the present disclosure can form new text annotation information from the text annotation information of the first image sample and the detected text box position information, so that a new image sample can be formed conveniently. Since the text box position annotation information of such an image sample is detected by a text box detection model, training a text content recognition model with such image samples makes the successfully trained model fit practical application scenarios more closely. It follows that the technical solution provided by the present disclosure helps to improve the efficiency of setting text annotation information for image samples and enriches the image samples, while also helping to improve the recognition accuracy of the text content recognition model.
The technical solutions of the present disclosure are described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The drawings, which constitute a part of the specification, describe embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the drawings, in which:
Fig. 1 is a schematic diagram of one embodiment of an applicable scenario of the present disclosure;
Fig. 2 is a flowchart of one embodiment of the method for forming image samples of the present disclosure;
Fig. 3 is a schematic diagram of one embodiment of a first region and a second region of the present disclosure;
Fig. 4 is a schematic diagram of one embodiment of transferring text content annotation information in the present disclosure;
Fig. 5 is a flowchart of one embodiment of training a text content recognition model to be trained in the present disclosure;
Fig. 6 is a flowchart of another embodiment of training a text content recognition model to be trained in the present disclosure;
Fig. 7 is a structural schematic diagram of one embodiment of the apparatus for forming image samples of the present disclosure;
Fig. 8 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
Detailed description of embodiments
Example embodiments according to the present disclosure are described in detail below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all of them, and it should be understood that the present disclosure is not limited by the example embodiments described herein.
It should also be noted that, unless otherwise specified, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, or modules; they neither carry any particular technical meaning nor indicate a necessary logical order between them.
It should also be understood that, in the embodiments of the present disclosure, "multiple" may refer to two or more, and "at least one" may refer to one, two, or more.
It should also be understood that any component, data, or structure mentioned in the embodiments of the present disclosure may generally be understood as one or more, unless it is explicitly defined otherwise or the context suggests the contrary.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of each embodiment in the present disclosure emphasizes the differences between the embodiments; for the same or similar parts, the embodiments may refer to each other, and for the sake of brevity, they are not repeated one by one.
Meanwhile, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is in fact merely illustrative and in no way serves as any limitation on the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. Computer systems/servers can be implemented in distributed cloud computing environments, where tasks are performed by remote processing devices linked through a communication network. In distributed cloud computing environments, program modules may be located on local or remote computing system storage media including storage devices.
Overview of the disclosure
In the course of implementing the present disclosure, the inventors found that training a text content recognition model generally requires a large number of image samples. If the image samples are annotated manually, high labor costs and time costs are usually incurred. And if existing image samples are augmented by means such as scaling, position translation, contrast adjustment, color adjustment, and noise addition, the augmented image samples often differ in distribution from real images; therefore, training a text content recognition model with such image samples may not help to guarantee the recognition accuracy of the text content recognition model in practical applications.
In addition, in practical applications, the image blocks provided to the text content recognition model are often cut out from the image to be recognized according to text box position information detected by a text box detection model. Training the text content recognition model with image samples formed by the above augmentation methods effectively breaks the connection between the text box detection model and the text content recognition model, which does not help to guarantee the recognition accuracy of the text content recognition model in practice.
The present disclosure can obtain the detected text box position information by performing text box detection on the first image sample using a text box detection model, and sets new text annotation information for the first image sample using the text box position information and the text annotation information corresponding to the text box position information, thereby forming a second image sample, so that part of the content in the text annotation information of the first image sample can be transferred into the text annotation information of the second image sample. Thus, not only can new image samples be formed efficiently and the image samples be enriched; moreover, since the text box position information in the text annotation information of the newly formed image samples is detected by the text box detection model, the present disclosure can maintain the connection between the text content recognition model and the text box detection model during training, which helps to guarantee the recognition accuracy of the text content recognition model in practical applications.
Exemplary overview
The technique for forming image samples provided by the present disclosure is usually applied in scenarios where a text content recognition model is trained. One application scenario of the technique for forming image samples of the present disclosure is described below with reference to Fig. 1.
In Fig. 1, an image sample set 100 includes multiple image samples; for example, the image sample set 100 includes image sample 1, image sample 2, ..., and image sample N. At least some of the image samples in the image sample set 100 are formed using the image-sample forming technique of the present disclosure. Each image sample is provided with text annotation information, which usually includes: text box position annotation information and text content annotation information.
Multiple image samples are obtained from the image sample set 100, and each image sample is cropped according to its text box position annotation information, so that multiple image block samples are obtained, such as image block sample 1, image block sample 2, ..., and image block sample M. The text content annotation information of each image block sample can be determined from the text content annotation information of the image samples.
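The cropping-and-label-pairing step above can be sketched like this, assuming (x, y, width, height) box annotations and row-major nested-list images (both assumptions for illustration):

```python
def make_block_samples(image, annotations):
    """Cut one image block sample per annotation and pair it with that
    annotation's text content, as in the cropping step described above."""
    blocks = []
    for ann in annotations:
        x, y, w, h = ann["box"]
        # Slice out the annotated box region from the row-major image.
        pixels = [row[x:x + w] for row in image[y:y + h]]
        blocks.append({"pixels": pixels, "label": ann["text"]})
    return blocks
```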
According to a preset batch size, a certain number of image block samples are obtained from the multiple image block samples above, and each obtained image block sample is provided as an input to the text content recognition model 101 to be trained. Text content recognition is performed on each input image block sample via the text content recognition model 101 to be trained, so that the text content recognized by the model for each image block sample can be obtained. The model parameters of the text content recognition model 101 to be trained, such as convolution kernel weights and/or matrix weights, are adjusted according to the text content recognized for each image block sample and the text content annotation information of each image block sample.
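As an illustration of this adjust-from-discrepancy loop (not the actual recognizer, whose architecture and loss the disclosure leaves open), a toy model with a single scalar parameter shows how the error between prediction and annotation drives the parameter update:

```python
# Toy illustration of the training loop: a single scalar parameter stands in
# for convolution-kernel / matrix weights, and a numeric label stands in for
# the text content annotation. Real text recognition uses sequence losses
# such as CTC; this sketch only shows the error-driven update.

class ToyRecognizer:
    def __init__(self):
        self.weight = 0.0

    def predict(self, block):
        return block["feature"] + self.weight

    def train_step(self, blocks, lr=0.5):
        # Difference between prediction and annotation drives the update.
        errors = [b["label"] - self.predict(b) for b in blocks]
        self.weight += lr * sum(errors) / len(errors)
        return sum(e * e for e in errors) / len(errors)  # mean squared error

model = ToyRecognizer()
data = [{"feature": 1.0, "label": 3.0}, {"feature": 2.0, "label": 4.0}]
losses = [model.train_step(data) for _ in range(5)]
# losses shrink step by step as the parameter approaches the annotations
```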
Through the above training process, a successfully trained text content recognition model 101 can finally be obtained.
Exemplary method
Fig. 2 is a schematic flowchart of one embodiment of the method for forming image samples of the present disclosure.
As shown in Fig. 2, the method of this embodiment includes steps S200, S201, S202, and S203, which are described below.
S200: Obtain a first image sample.
The first image sample in the present disclosure refers to an image sample used for training a neural network; it is called the first image sample to distinguish it from the new image sample generated subsequently. The first image sample may be an image sample in a preset image sample set. The first image sample generally contains characters, which may include text and symbols. The text may be characters and words in Chinese, or words in other languages (such as English). The first image sample in the present disclosure may be an image obtained by scanning or in a similar way.
Each first image sample in the present disclosure is provided with at least one piece of text annotation information. The text annotation information may be set by manual annotation, or may be set in other ways. One piece of text annotation information in the present disclosure generally includes: text box position annotation information and text content annotation information. The text box position annotation information characterizes the position of a text box in the first image sample. The text content annotation information indicates the text content in the text box, such as the text and symbols in the text box.
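A piece of text annotation information as just described could be represented like this; the field names and box layout are illustrative assumptions, not part of the disclosure:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TextAnnotation:
    # Position of the text box within the image sample, e.g. (x, y, w, h).
    box: Tuple[int, int, int, int]
    # Text content in the box (text and symbols).
    content: str

@dataclass
class ImageSample:
    image_id: str
    # A first image sample carries at least one piece of annotation info.
    annotations: List[TextAnnotation] = field(default_factory=list)
```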
S201: Provide the first image sample to a text box detection model, and perform text box detection on the first image sample via the text box detection model to obtain detected text box position information.
The text box detection model in the present disclosure may refer to a neural network used to predict the specific positions of text boxes in an image sample. The neural network may include a convolutional neural network or the like. The neural network in the present disclosure may include, but is not limited to: convolutional layers, ReLU (Rectified Linear Unit) layers (also called activation layers), pooling layers, fully connected layers, and so on. The more layers the neural network contains, the deeper the network. The present disclosure places no restriction on the specific structure of the neural network.
The text box detection model in the present disclosure may be a model obtained by training with image samples in advance. The text box detection process performed by the text box detection model on the first image sample may include: generating multiple candidate boxes, performing confidence prediction on the candidate boxes, and performing regression on the corresponding candidate boxes according to the confidence prediction results, among other operations. The text box detection model in the present disclosure generally performs text box detection using a corresponding detection algorithm; the present disclosure places no restriction on this.
The text box position information in the present disclosure is used to represent the position of a text box in the first image sample. The text box position information may include: the coordinates of any vertex of the text box together with the length and height of the text box. The text box position information may also include: the coordinates of the center point of the text box together with the length and width of the text box. The present disclosure places no restriction on the specific form of the text box position information.
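The two representations mentioned (vertex plus width/height, and center plus width/height) convert into each other directly; a small sketch, assuming the vertex is the top-left corner:

```python
def corner_to_center(x, y, w, h):
    """(top-left vertex, width, height) -> (center x, center y, width, height)."""
    return (x + w / 2.0, y + h / 2.0, w, h)

def center_to_corner(cx, cy, w, h):
    """(center x, center y, width, height) -> (top-left vertex, width, height)."""
    return (cx - w / 2.0, cy - h / 2.0, w, h)
```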
S202: Determine the text annotation information corresponding to the detected text box position information.
Since each piece of text annotation information of the first image sample includes text box position annotation information, the present disclosure can determine, from the pieces of text annotation information, the text annotation information corresponding to the detected text box position information according to the regions represented by the text box position annotation information of the first image sample and the region represented by the detected text box position information. That is, the detected text box position information can represent a first region, each piece of text box position annotation information of the first image sample can represent a second region, and the present disclosure can determine the text annotation information corresponding to the detected text box position information according to a condition preset for the first region and the second regions.
S203: Set new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information, thereby forming a second image sample.
The second image pattern in the disclosure can be equally used for training neural network.Second image pattern and the first image The picture material of sample is usually identical, and only text marking information is not fully identical.The text marking of second image pattern is believed Text box location information in breath is to carry out the acquisition of text box detection processing to the first image pattern using text box detection model , and other markup informations in the text marking information of the second image pattern are the text marking information from the first image pattern What middle succession obtained.That is, the part markup information in the text marking information of the first image pattern is passed to second In the text marking information of image pattern.
The disclosure can be obtained by carrying out text box detection processing to the first image pattern using text box detection model The text box location information detected, by utilizing text box location information and text corresponding with text frame location information This markup information is arranged new text marking information for the first image pattern, forms the second image pattern, make the first image pattern Text marking information in partial content can be passed in the text marking information of the second image pattern, thus the disclosure New image pattern not only can be easily formed, rich image sample improves the setting efficiency of text marking information, avoids people Work mark bring human cost and time cost consume larger phenomenon;Moreover, because the text of the new image pattern formed Text box location information in this markup information is that text box detection model is detected, therefore, new using what is formed In the case that image pattern is trained content identification model, the disclosure can make the training process of content of text identification model Mutually it is connected with the detection process of text box detection model, i.e. the linking of the two is maintained, to be conducive to improve content of text The identification accuracy of identification model in practical applications.
In an optional example, there may be a plurality of text box detection models in the present disclosure, and the plurality of text box detection models are typically based on different text box detection algorithms; that is, the plurality of text box detection models detect text box positions in different ways. For example, the plurality of text box detection models include, but are not limited to: CTPN (Connectionist Text Proposal Network), EAST (An Efficient and Accurate Scene Text Detector), a text box detection model based on the SegLink oriented scene text detection algorithm, and the like.

Optionally, the present disclosure can provide each first image sample in the training data set to the plurality of text box detection models, and perform text box detection on the input first image sample via each of the plurality of text box detection models, so as to obtain the text box location information detected by each of the plurality of text box detection models for the first image sample.

By performing text box detection on the first image sample with each of a plurality of text box detection models, and since the detection results output by different text box detection models usually differ, the present disclosure can use the different detection results to pass on part of the content of the text annotation information of the first image sample, generating a plurality of second image samples with different text annotation information. This not only helps to improve the efficiency of forming image samples and to enrich the image samples; moreover, since the text box location information in the text annotation information of the newly formed image samples is detected by multiple text box detection models, when the new image samples are used to train the text content recognition model, the training process of the text content recognition model is linked with the detection processes of the multiple text box detection models, which further helps to improve the recognition accuracy of the text content recognition model in practical applications.
In an optional example, the hyperparameters used by the plurality of text box detection models during training differ; that is, the hyperparameters used by the plurality of text box detection models during training are not exactly the same. The hyperparameters in the present disclosure include, but are not limited to: batch size (batch_size), pixel threshold (pixel_threshold), side vertex pixel threshold (side_vertex_pixel_threshold), and truncation threshold (trunc_threshold).
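One way the differing hyperparameters named above might be recorded per detector is sketched below. The concrete values and the helper are illustrative assumptions, not values from the disclosure; only the hyperparameter names come from the text.

```python
# Illustrative per-model hyperparameter records; values are assumptions.
detector_configs = {
    "ctpn": {
        "batch_size": 32,
        "pixel_threshold": 0.9,
        "side_vertex_pixel_threshold": 0.8,
        "trunc_threshold": 0.1,
    },
    "east": {
        "batch_size": 16,
        "pixel_threshold": 0.8,
        "side_vertex_pixel_threshold": 0.9,
        "trunc_threshold": 0.2,
    },
}

def configs_differ(configs):
    """Return True if at least two models use different hyperparameter settings."""
    values = [tuple(sorted(c.items())) for c in configs.values()]
    return len(set(values)) > 1
```

Training each detector under a distinct configuration is what introduces the randomness and output diversity described in the surrounding paragraphs.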
Optionally, the numbers of image samples used by the plurality of text box detection models during training may differ. For example, some text box detection models are trained on the full set of image samples, while other text box detection models are trained on a subset of the image samples. As another example, different text box detection models may have different requirements on the spatial resolution of the input image samples.

By having the plurality of text box detection models use different hyperparameters during training, the present disclosure helps to increase the randomness of the training process of the text box detection models, which in turn helps to increase the diversity of the text box detection results output by the text box detection models, and thus the diversity of the second image samples that are formed.
In an optional example, the process by which the present disclosure determines the text annotation information corresponding to the detected text box location information may include the following two steps:

Step 1: determine text box region overlap information according to the text box position annotation information in each piece of text annotation information of the first image sample and the detected text box location information.

Optionally, the text box region overlap information in the present disclosure refers to information that can reflect whether two text boxes correspond to the same piece of text content in the first image sample. The text box region overlap information in the present disclosure may include, but is not limited to, the intersection over union (IoU) of two text boxes.

Assume that, according to the detection result of one text box detection model for an input first image sample, the present disclosure detects N1 (N1 is an integer greater than zero) pieces of text box location information, each piece of text box location information corresponding to a region in the first image sample, referred to below as a first region. That is, the present disclosure can obtain N1 first regions according to the detection result of one text box detection model for the input first image sample. For example, in Fig. 3, the region formed by the dashed box enclosing "******" is a first region.

Assume that the first image sample in the present disclosure has N2 (N2 is an integer greater than zero; N2 may or may not be equal to N1) pieces of text annotation information, the text box position annotation information in each piece of text annotation information corresponding to a region in the first image sample, referred to below as a second region. That is, the present disclosure can obtain N2 second regions according to the text annotation information of the first image sample. For example, in Fig. 3, the region formed by the solid box enclosing "******" is a second region.
For each first region, the present disclosure can separately calculate the intersection over union between that first region and every second region, so that N1 × N2 IoU values are obtained in total. For example, the IoU between the first region and the second region in Fig. 3 is calculated: the intersection area of the first region and the second region in Fig. 3 is shown as the dot-filled region in Fig. 4, and the union area of the first region and the second region in Fig. 3 is shown as the region filled with vertical lines in Fig. 4. The IoU between the first region and the second region in Fig. 3 may be the ratio of the area of the dot-filled region in Fig. 4 to the area of the region filled with vertical lines in Fig. 4.
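The IoU computation described above can be sketched as follows, assuming axis-aligned boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2; the box format is an assumption, since the disclosure leaves the representation open.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (clamped to zero when the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Computing this value for every (first region, second region) pair yields the N1 × N2 values referred to in the text.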
Step 2: determine the text annotation information corresponding to the detected text box location information according to the above overlap information and a preset condition.

The preset condition in the present disclosure may include, but is not limited to: the IoU of a first region and a second region exceeds a preset threshold. Continuing the above example, the present disclosure can compare each of the N1 × N2 IoU values with the preset threshold, so as to find the IoU values exceeding the preset threshold, and can take the text annotation information to which the second region of an IoU exceeding the preset threshold belongs as the text annotation information corresponding to the detected text box location information.
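The matching in Step 2 can be sketched as below. Boxes are (x1, y1, x2, y2); both the box format and the greedy best-match strategy (keeping only the highest-IoU annotation per detected box) are assumptions layered on the disclosure, which only requires the IoU to exceed a preset threshold.

```python
def _iou(a, b):
    """IoU of two axis-aligned (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_detections(detected_boxes, annotations, threshold=0.5):
    """Map each detected-box index to the annotation index whose IoU exceeds the threshold."""
    matches = {}
    for i, det in enumerate(detected_boxes):
        best_j, best_iou = None, threshold
        for j, ann in enumerate(annotations):
            score = _iou(det, ann["box"])
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[i] = best_j
    return matches
```

Detected boxes that overlap no annotated region above the threshold simply receive no corresponding text annotation information.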
By using the overlap information and the preset condition, the present disclosure can conveniently and accurately determine the text annotation information corresponding to the detected text box location information, which helps to ensure the accuracy with which part of the content of the text annotation information of the first image sample is passed on, and in turn helps to ensure the accuracy of the text annotation information of the second image sample.

In an optional example, the process of setting new text annotation information for the first image sample according to the text box location information and the above corresponding text annotation information, to form the second image sample, may be: taking the text box location information detected by the text box detection model, together with the text content annotation information in the corresponding text annotation information determined above, as the new text annotation information of the first image sample, to form the second image sample. For example, in Fig. 4, when the IoU of the first region and the second region exceeds the preset threshold T, the text content annotation information "******" of the second region can be passed to the detected first region; that is, the text box location information detected by the text box detection model, together with the text content annotation information "******", serves as one piece of text annotation information of the second image sample.
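Forming the second image sample's annotations can be sketched as follows: each detected box that matched an annotated box inherits that annotation's text content. `matches` is a mapping from detected-box indices to annotation indices, as produced in Step 2; the names and dictionary layout are illustrative assumptions.

```python
def build_second_sample_annotations(detected_boxes, annotations, matches):
    """Pair detected box locations with the text content labels they inherit."""
    new_annotations = []
    for i, j in matches.items():
        new_annotations.append({
            "box": detected_boxes[i],        # from the text box detection model
            "text": annotations[j]["text"],  # inherited from the first image sample
        })
    return new_annotations
```

The second image sample then consists of the original image content plus these new annotations, which is exactly the "partial transfer" of annotation information described above.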
By taking the text content annotation information in the text annotation information of the first image sample, together with the corresponding text box location information detected by the text box detection model, as the text annotation information of the second image sample, the present disclosure can conveniently and accurately set the text annotation information of the second image sample automatically, thereby avoiding the labor and time costs of forming image samples by manual annotation, and helping to improve the efficiency of forming image samples and to enrich the image samples.

In an optional example, the second image samples generated by the present disclosure are used to train a text content recognition model to be trained. An example of training the text content recognition model to be trained is shown in Fig. 5.

In Fig. 5, the second image samples obtained by the present disclosure may form an image sample set, or the second image samples may form an image sample set together with the first image samples. The first image samples and second image samples in the image sample set may both be referred to as image samples.

S500: obtain a plurality of image samples from the image sample set; the plurality of image samples may include at least one second image sample, and may also include at least one first image sample.

S501: crop each obtained image sample according to its text box position annotation information, to obtain a plurality of image block samples.
The present disclosure can determine the text content annotation information of each image block sample according to the text content annotation information of the image sample.
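Step S501 can be sketched as below, representing the image sample as a NumPy array and the boxes as (x1, y1, x2, y2) pixel coordinates; both are assumptions, since the disclosure does not fix a data format.

```python
import numpy as np

def crop_image_blocks(image, annotations):
    """Cut one image block per annotated text box, keeping its content label."""
    blocks = []
    for ann in annotations:
        x1, y1, x2, y2 = ann["box"]
        blocks.append({
            "pixels": image[y1:y2, x1:x2],  # NumPy indexes rows (y) first
            "text": ann["text"],            # inherited content annotation
        })
    return blocks
```

Each image block sample thus carries both its pixel content and the text content annotation determined from the parent image sample.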
S502: according to a preset batch size, obtain a certain number of image block samples from the plurality of image block samples obtained above, and provide the obtained image block samples, each as an input, to the text content recognition model to be trained.

S503: the text content recognition model to be trained performs text content recognition on each input image block sample.

S504: according to the recognition results output by the text content recognition model to be trained, obtain the text content recognized by the text content recognition model to be trained for each image block sample.

S505: perform loss computation using a corresponding loss function, according to the text content recognized by the text content recognition model to be trained for each image block sample and the text content annotation information of each image block sample.

S506: perform backpropagation according to the computed loss, to adjust the model parameters of the text content recognition model to be trained, for example, the convolution kernel weights and/or matrix weights of the text content recognition model to be trained.

S507: judge whether a predetermined iteration condition is reached; if the judgment result is that the predetermined iteration condition is reached, proceed to S508; if the judgment result is that the predetermined iteration condition is not reached, return to S502.
Optionally, the predetermined iteration condition in the present disclosure may include: the difference between the text content output by the text content recognition model to be trained for the image block samples and the text content annotation information of the image block samples meets a predetermined difference requirement. When the difference meets the predetermined difference requirement, this training of the text content recognition model is successfully completed. The predetermined iteration condition in the present disclosure may also include: the number of image block samples used to train the text content recognition model to be trained reaches a predetermined quantity requirement, and the like. When the number of image block samples used reaches the predetermined quantity requirement but the difference does not meet the predetermined difference requirement, this training of the text content recognition model to be trained is not successful. A successfully trained text content recognition model can be used for text content recognition processing.
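The stopping logic of S507/S508 described above can be sketched as a single check: stop successfully when the difference (here stood in for by the loss) meets the requirement, and stop unsuccessfully when the sample budget is exhausted first. Threshold names and values are illustrative assumptions.

```python
def check_iteration(loss, samples_used, max_diff=0.05, max_samples=100_000):
    """Return (stop, success) for the predetermined iteration condition."""
    if loss <= max_diff:
        return True, True    # difference requirement met: training completed
    if samples_used >= max_samples:
        return True, False   # quantity requirement reached without converging
    return False, False      # neither condition reached: return to S502
```

This makes explicit that reaching the quantity requirement alone ends training without counting as success, as stated in the text.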
S508: this training process ends.

By training the text content recognition model to be trained with an image sample set that includes second image samples, and since the text box location information in the text annotation information of the second image samples is obtained by detection by the text box detection model, the present disclosure links the training process of the text content recognition model with the detection process of the text box detection model, which helps to improve the recognition accuracy of the text content recognition model in practical applications.
In an optional example, when a plurality of second image samples are obtained using text box detection models of multiple algorithms, the present disclosure may, in the course of training the text content recognition model to be trained using the first image samples and the second image samples, use image block samples mixed in proportion. Another example of training the text content recognition model to be trained is shown in Fig. 6.

In Fig. 6, the second image samples obtained by the present disclosure using text box detection models of multiple text box detection algorithms may form a plurality of image sample sets, the second image samples obtained by one text box detection model corresponding to one image sample set. In addition, the first image samples form one image sample set.
S600: obtain a plurality of image samples from each image sample set.

S601: crop each image sample according to the text box position annotation information of the image samples obtained from the different image sample sets, to obtain a plurality of image block samples corresponding to each image sample set.

The present disclosure can determine the text content annotation information of each image block sample according to the text content annotation information of the image sample.

S602: according to a preset batch size and a preset mixing proportion, obtain a certain number of image block samples from the plurality of image block samples corresponding to each image sample set, and provide all obtained image block samples, each as an input, to the text content recognition model to be trained.

The preset mixing proportion in the present disclosure may be a ratio set for the number of image block samples from the first image samples and the numbers of image block samples from the second image samples corresponding to the different algorithms. An example is as follows:

Assume that the present disclosure has text box detection models with three different detection algorithms, and that the second image samples obtained by the text box detection model of each detection algorithm form one image sample set, so that three image sample sets are obtained, namely second image sample set 1, second image sample set 2, and second image sample set 3. The set corresponding to the first image samples may be referred to as the first image sample set.
The present disclosure can obtain a certain number of first image samples from the first image sample set, crop each currently obtained first image sample, and form a first image block sample set from the plurality of obtained image block samples.

The present disclosure can obtain a certain number of second image samples from second image sample set 1, crop each currently obtained second image sample, and form a second image block sample set from the plurality of obtained image block samples.

The present disclosure can obtain a certain number of second image samples from second image sample set 2, crop each currently obtained second image sample, and form a third image block sample set from the plurality of obtained image block samples.

The present disclosure can obtain a certain number of second image samples from second image sample set 3, crop each currently obtained second image sample, and form a fourth image block sample set from the plurality of obtained image block samples.
According to the preset batch size, the present disclosure can obtain a certain number of image block samples from each of the first, second, third, and fourth image block sample sets according to a preset mixing proportion of a1:a2:a3:a4, where the sum of a1, a2, a3, and a4 may be 1. The specific values of a1, a2, a3, and a4 can be set according to actual needs. For example, if the text box detection model of a certain detection algorithm has a relatively broad range of application, the proportion value corresponding to that text box detection model can be set somewhat larger. The present disclosure does not limit this.
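Drawing one batch in the a1:a2:a3:a4 proportion can be sketched as below. Rounding the per-set counts so they sum exactly to the batch size, and taking a deterministic slice instead of a random sample, are implementation choices of this sketch, not specified by the disclosure.

```python
def mixed_batch(sample_sets, proportions, batch_size):
    """Take round(p * batch_size) samples from each set; the last set absorbs rounding drift."""
    counts = [round(p * batch_size) for p in proportions]
    counts[-1] += batch_size - sum(counts)
    batch = []
    for samples, n in zip(sample_sets, counts):
        batch.extend(samples[:n])  # deterministic slice for clarity
    return batch

# Illustrative: four image block sample sets mixed in proportion 0.4:0.2:0.2:0.2.
sets = [[f"set{i}_blk{j}" for j in range(100)] for i in range(4)]
batch = mixed_batch(sets, [0.4, 0.2, 0.2, 0.2], 10)
```

In practice, a random shuffle of each set before slicing would give a properly stochastic batch; the proportion logic is unchanged.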
S603: the text content recognition model to be trained performs text content recognition on each input image block sample.

S604: according to the recognition results output by the text content recognition model to be trained, obtain the text content recognized by the text content recognition model to be trained for each image block sample.

S605: perform loss computation using a corresponding loss function, according to the text content recognized by the text content recognition model to be trained for each image block sample and the text content annotation information of each image block sample.

S606: perform backpropagation according to the computed loss, to adjust the model parameters of the text content recognition model to be trained, for example, the convolution kernel weights and/or matrix weights of the text content recognition model to be trained.

S607: judge whether a predetermined iteration condition is reached; if the judgment result is that the predetermined iteration condition is reached, proceed to S608; if the judgment result is that the predetermined iteration condition is not reached, return to S602.
Optionally, the predetermined iteration condition in the present disclosure may include: the difference between the text content output by the text content recognition model to be trained for the image block samples and the text content annotation information of the image block samples meets a predetermined difference requirement. When the difference meets the predetermined difference requirement, this training of the text content recognition model is successfully completed. The predetermined iteration condition in the present disclosure may also include: the number of image block samples used to train the text content recognition model to be trained reaches a predetermined quantity requirement, and the like. When the number of image block samples used reaches the predetermined quantity requirement but the difference does not meet the predetermined difference requirement, this training of the text content recognition model to be trained is not successful. A successfully trained text content recognition model can be used for text content recognition processing. In addition, if, while obtaining image block samples from the plurality of image block samples corresponding to each image sample set according to the preset mixing proportion, the number of image block samples corresponding to some image sample set is insufficient, the present disclosure can obtain the shortfall from the image block samples corresponding to the other image sample sets, so that the number of obtained image block samples meets the preset batch size requirement.
S608: this training process ends.

By obtaining image block samples from different image samples according to the preset mixing proportion, the present disclosure further helps to link the training process of the text content recognition model with the detection processes of the text box detection models, which further helps to improve the recognition accuracy of the text content recognition model in practical applications.
Exemplary Apparatus
Fig. 7 is a schematic structural diagram of an embodiment of the apparatus for forming image samples of the present disclosure. The apparatus of this embodiment can be used to implement the above method embodiments of the present disclosure. As shown in Fig. 7, the apparatus of this embodiment mainly includes: an acquisition module 700, a detection module 701, a determination module 702, and a setting module 703. Optionally, the apparatus may further include a training module 704.

The acquisition module 700 is configured to obtain a first image sample, where the first image sample is provided with at least one piece of text annotation information. The acquisition module 700 can obtain the first image sample from an image sample set. For the operations specifically performed by the acquisition module 700, reference may be made to the description of S200 in the above method embodiments, which is not detailed here.
The detection module 701 is configured to provide the first image sample obtained by the acquisition module 700 to a text box detection model, and to perform text box detection on the first image sample via the text box detection model, to obtain the detected text box location information.

Optionally, the detection module 701 may be further configured to provide the first image sample to a plurality of text box detection models based on different detection algorithms, and to perform text box detection on the first image sample via each of the plurality of text box detection models, to obtain the text box location information detected by each of the plurality of text box detection models. Optionally, the hyperparameters used by the plurality of text box detection models during training differ. For the operations specifically performed by the detection module 701, reference may be made to the description of S201 in the above method embodiments, which is not detailed here.
The determination module 702 is configured to determine the text annotation information corresponding to the text box location information detected by the detection module 701.

Optionally, the determination module 702 may include a first submodule and a second submodule. The first submodule is configured to determine text box region overlap information according to the text box position annotation information in each piece of text annotation information of the first image sample and the detected text box location information. The second submodule is configured to determine the text annotation information corresponding to the detected text box location information according to the overlap information determined by the first submodule and a preset condition. For the operations specifically performed by the determination module 702, reference may be made to the description of S202 in the above method embodiments, which is not detailed here.
The setting module 703 is configured to set new text annotation information for the first image sample according to the text box location information detected by the detection module 701 and the corresponding text annotation information determined by the determination module 702, to form a second image sample.

Optionally, the setting module 703 may be further configured to take the text box location information, together with the text content annotation information in the corresponding text annotation information, as the new text annotation information of the first image sample, to form the second image sample. For the operations specifically performed by the setting module 703, reference may be made to the description of S203 in the above method embodiments, which is not detailed here.

The training module 704 is configured to train the text content recognition model to be trained using the second image samples. Optionally, the training module 704 may include a third submodule, a fourth submodule, a fifth submodule, and a sixth submodule. The third submodule is configured to crop image block samples containing text content from the first image samples and the second image samples, according to the text annotation information of the first image samples and the text annotation information of the second image samples obtained respectively by the plurality of text box detection models based on different detection algorithms. The fourth submodule is configured to obtain, according to the preset mixing proportion, image block samples from the first image samples and image block samples from the second image samples corresponding to the different detection algorithms. The fifth submodule is configured to provide the obtained image block samples to the text content recognition model to be trained, and to perform text content recognition on each image block sample via the text content recognition model to be trained, to obtain a plurality of recognized pieces of text content. The sixth submodule is configured to adjust the model parameters of the text content recognition model to be trained according to the differences between the plurality of recognized pieces of text content and the text content annotation information in the text annotation information. For the operations specifically performed by the training module 704, reference may be made to the descriptions of Fig. 5 and Fig. 6 in the above method embodiments, which are not detailed here.
Exemplary Electronic Device
An electronic device according to an embodiment of the present disclosure is described below with reference to Fig. 8. Fig. 8 shows a block diagram of the electronic device according to the embodiment of the present disclosure. As shown in Fig. 8, the electronic device 81 includes one or more processors 811 and a memory 812.

The processor 811 may be a central processing unit (CPU) or another form of processing unit having the capability to form image samples and/or instruction execution capability, and can control the other components in the electronic device 81 to perform desired functions.

The memory 812 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 811 may run the program instructions to implement the methods of forming image samples of the embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.

In one example, the electronic device 81 may further include an input device 813 and an output device 814, these components being interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 813 may include, for example, a keyboard and a mouse. The output device 814 can output various information to the outside, and may include, for example, a display, a loudspeaker, a printer, a communication network, and the remote output devices connected thereto.

Of course, for simplicity, Fig. 8 shows only some of the components of the electronic device 81 that are related to the present disclosure; components such as buses and input/output interfaces are omitted. In addition, depending on the specific application, the electronic device 81 may also include any other appropriate components.
Illustrative Computer Program Product and Computer-Readable Storage Medium
Other than the above method and equipment, embodiment of the disclosure can also be computer program product comprising meter Calculation machine program instruction, it is above-mentioned that the computer program instructions make the processor execute this specification when being run by processor According to the step in the method for the formation image pattern of the various embodiments of the disclosure described in " illustrative methods " part.
The computer program product can be write with any combination of one or more programming languages for holding The program code of row embodiment of the present disclosure operation, described program design language includes object oriented program language, such as Java, C++ etc. further include conventional procedural programming language, such as " C " language or similar programming language.Journey Sequence code can be executed fully on the user computing device, partly execute on a user device, be independent soft as one Part packet executes, part executes on a remote computing or completely in remote computing device on the user computing device for part Or it is executed on server.
In addition, embodiments of the present disclosure may also take the form of a computer-readable storage medium having stored thereon computer program instructions which, when run by a processor, cause the processor to execute the steps of the method for forming an image sample according to the various embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.
The computer-readable storage medium may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be noted that the merits, advantages, effects, and the like mentioned in the present disclosure are merely exemplary rather than limiting, and it must not be assumed that they are indispensable to each embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for the purposes of illustration and ease of understanding, rather than limitation; they are not intended to restrict the present disclosure to being implemented by means of those specific details.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to mutually. Since the system embodiments substantially correspond to the method embodiments, their description is relatively brief; for related details, reference may be made to the description of the method embodiments.
The block diagrams of devices, apparatuses, equipment, and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the blocks. As those skilled in the art will appreciate, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "comprising", "including", and "having" are open-ended terms that mean "including but not limited to" and may be used interchangeably therewith. The words "or" and "and" as used herein refer to "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" as used herein refers to the phrase "such as, but not limited to" and may be used interchangeably therewith.
The methods and apparatuses of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order of the steps of the methods is merely illustrative, and the steps of the methods of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs comprising machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers the recording medium storing the programs for executing the methods according to the present disclosure.
It should also be noted that, in the apparatuses, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations shall be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

1. A method for forming an image sample, comprising:
obtaining a first image sample, wherein the first image sample is provided with at least one piece of text annotation information;
providing the first image sample to a text box detection model, and performing, via the text box detection model, text box detection processing on the first image sample to obtain detected text box position information;
determining text annotation information corresponding to the detected text box position information; and
setting new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information, so as to form a second image sample.
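The re-annotation flow of claim 1 can be sketched as follows. This is an illustrative reading of the claim, not the patented implementation: the `detect` and `match` callables, the `Annotation` data shape, and the box tuple layout are all assumptions introduced for the example.

```python
# Hypothetical sketch of claim 1: replace the original annotated boxes with
# detector-produced boxes while keeping the matched text content, yielding
# the annotations of a "second image sample".
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), an assumed layout

@dataclass
class Annotation:
    box: Box   # text box position annotation
    text: str  # text content annotation

def form_second_sample(image,
                       annotations: List[Annotation],
                       detect: Callable[[object], List[Box]],
                       match: Callable[[Box, List[Annotation]], Optional[Annotation]]
                       ) -> List[Annotation]:
    """For each detected text box, find the corresponding original annotation
    and pair the detected position with the annotated text content."""
    new_annotations = []
    for det_box in detect(image):           # text box detection processing
        ann = match(det_box, annotations)   # corresponding annotation, if any
        if ann is not None:
            new_annotations.append(Annotation(det_box, ann.text))
    return new_annotations
```

Detections that match no annotation are simply dropped here; the claim leaves that policy open.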
2. The method for forming an image sample according to claim 1, wherein the providing the first image sample to a text box detection model, and performing, via the text box detection model, text box detection processing on the first image sample to obtain detected text box position information, comprises:
providing the first image sample to a plurality of text box detection models based on different detection algorithms, and performing, via the plurality of text box detection models, text box detection processing on the first image sample respectively, to obtain the text box position information respectively detected by the plurality of text box detection models.
3. The method for forming an image sample according to claim 2, wherein the hyperparameters used by the plurality of text box detection models during training differ from one another.
4. The method for forming an image sample according to any one of claims 1 to 3, wherein the determining text annotation information corresponding to the detected text box position information comprises:
determining overlap information between text boxes according to the text box position annotation information in each piece of text annotation information of the first image sample and the detected text box position information; and
determining, according to the overlap information and a preset condition, the text annotation information corresponding to the detected text box position information.
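Claim 4 leaves the overlap metric and the preset condition unspecified. A common realization, shown here purely as an assumption, is intersection over union (IoU) with a threshold acting as the preset condition; the function names and the annotation tuple shape are illustrative.

```python
# Illustrative overlap matching for claim 4, assuming IoU as the overlap
# information and an IoU threshold as the preset condition.
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of an annotated box and a detected box."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def best_match(detected: Box,
               annotations: List[Tuple[Box, str]],
               threshold: float = 0.5) -> Optional[Tuple[Box, str]]:
    """Pick the (box, text) annotation overlapping the detection the most,
    subject to the preset condition that the overlap exceeds a threshold."""
    best = max(annotations, key=lambda ann: iou(detected, ann[0]), default=None)
    if best is not None and iou(detected, best[0]) >= threshold:
        return best
    return None
```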
5. The method for forming an image sample according to any one of claims 1 to 4, wherein the setting new text annotation information for the first image sample according to the text box position information and the corresponding text annotation information, so as to form a second image sample, comprises:
taking the text box position information, together with the text content annotation information in the corresponding text annotation information, as the new text annotation information of the first image sample, to form the second image sample.
6. The method for forming an image sample according to any one of claims 1 to 5, wherein the method further comprises:
performing training processing on a to-be-trained text content recognition model by using the second image sample.
7. The method for forming an image sample according to claim 6, wherein the performing training processing on the to-be-trained text content recognition model by using the second image sample comprises:
cutting out image block samples containing text content from the first image sample and the second image samples, according to the text annotation information of the first image sample and the text annotation information of the second image samples respectively obtained by the plurality of text box detection models based on different detection algorithms;
obtaining, according to a preset mixing proportion, image block samples from the first image sample together with image block samples from the second image samples corresponding to the different detection algorithms;
providing the obtained image block samples to the to-be-trained text content recognition model, and performing, via the to-be-trained text content recognition model, text content recognition processing on each image block sample, to obtain a plurality of pieces of recognized text content; and
adjusting model parameters of the to-be-trained text content recognition model according to differences between the plurality of pieces of recognized text content and the text content annotation information in the text annotation information.
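The mixing step of claim 7 can be sketched as a simple batch-assembly routine. The patent does not fix the sampling strategy or the proportion value; the function name, the with-replacement sampling, and the 50/50 default in the usage below are assumptions for illustration only.

```python
# Illustrative batch mixing for claim 7: draw roughly mix_ratio of each
# training batch from patches cut out of the first (hand-annotated) sample,
# and the remainder from patches of the second (detector-annotated) samples.
import random
from typing import List, Sequence

def mix_patches(first_patches: Sequence, second_patches: Sequence,
                mix_ratio: float, batch_size: int,
                rng: random.Random) -> List:
    """Assemble one training batch under a preset mixing proportion."""
    n_first = round(batch_size * mix_ratio)   # share from the first sample
    n_second = batch_size - n_first           # share from the second samples
    batch = [rng.choice(first_patches) for _ in range(n_first)]
    batch += [rng.choice(second_patches) for _ in range(n_second)]
    rng.shuffle(batch)                        # avoid ordering by source
    return batch
```

For example, `mix_patches(first, second, 0.5, 10, random.Random(0))` would yield a ten-patch batch with five patches from each source.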
8. An apparatus for forming an image sample, wherein the apparatus comprises:
an obtaining module, configured to obtain a first image sample, wherein the first image sample is provided with at least one piece of text annotation information;
a detection module, configured to provide the first image sample obtained by the obtaining module to a text box detection model, and perform, via the text box detection model, text box detection processing on the first image sample to obtain detected text box position information;
a determining module, configured to determine text annotation information corresponding to the text box position information detected by the detection module; and
a setting module, configured to set new text annotation information for the first image sample according to the text box position information detected by the detection module and the corresponding text annotation information determined by the determining module, to form a second image sample.
9. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is used for executing the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory, and execute the instructions to implement the method according to any one of claims 1 to 7.
CN201910717086.2A 2019-08-05 2019-08-05 Method, apparatus, medium, and electronic device for forming image sample Active CN110533079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910717086.2A CN110533079B (en) 2019-08-05 2019-08-05 Method, apparatus, medium, and electronic device for forming image sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910717086.2A CN110533079B (en) 2019-08-05 2019-08-05 Method, apparatus, medium, and electronic device for forming image sample

Publications (2)

Publication Number Publication Date
CN110533079A true CN110533079A (en) 2019-12-03
CN110533079B CN110533079B (en) 2022-05-24

Family

ID=68661690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910717086.2A Active CN110533079B (en) 2019-08-05 2019-08-05 Method, apparatus, medium, and electronic device for forming image sample

Country Status (1)

Country Link
CN (1) CN110533079B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090400A (en) * 2016-11-23 2018-05-29 中移(杭州)信息技术有限公司 A kind of method and apparatus of image text identification
CN108229397A (en) * 2018-01-04 2018-06-29 华南理工大学 Method for text detection in image based on Faster R-CNN
CN108495185A (en) * 2018-03-14 2018-09-04 北京奇艺世纪科技有限公司 A kind of video title generation method and device
CN108920707A (en) * 2018-07-20 2018-11-30 百度在线网络技术(北京)有限公司 Method and device for markup information
CN109871915A (en) * 2018-12-19 2019-06-11 深圳市欧珀软件科技有限公司 Material information mask method and labeling system
CN109919106A (en) * 2019-03-11 2019-06-21 同济大学 Gradual target finely identifies and description method
CN109934227A (en) * 2019-03-12 2019-06-25 上海兑观信息科技技术有限公司 System for recognizing characters from image and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG ZHE: "Research on Scene Text Detection Combining Edge Detection and CNN Classification", Modern Computer (《现代计算机》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340023A (en) * 2020-02-24 2020-06-26 创新奇智(上海)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN111340023B (en) * 2020-02-24 2022-09-09 创新奇智(上海)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN111353458A (en) * 2020-03-10 2020-06-30 腾讯科技(深圳)有限公司 Text box marking method and device and storage medium
CN111353458B (en) * 2020-03-10 2023-08-18 腾讯科技(深圳)有限公司 Text box labeling method, device and storage medium
CN113762292A (en) * 2020-06-03 2021-12-07 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN113762292B (en) * 2020-06-03 2024-02-02 杭州海康威视数字技术股份有限公司 Training data acquisition method and device and model training method and device
CN112580637A (en) * 2020-12-31 2021-03-30 苏宁金融科技(南京)有限公司 Text information identification method, text information extraction method, text information identification device, text information extraction device and text information identification system
CN112580637B (en) * 2020-12-31 2023-05-12 苏宁金融科技(南京)有限公司 Text information identification method, text information extraction method, text information identification device, text information extraction device and text information extraction system

Also Published As

Publication number Publication date
CN110533079B (en) 2022-05-24

Similar Documents

Publication Publication Date Title
CN110533079A (en) Form method, apparatus, medium and the electronic equipment of image pattern
WO2018054326A1 (en) Character detection method and device, and character detection training method and device
RU2723293C1 (en) Identification of fields and tables in documents using neural networks using global document context
US20210064861A1 (en) Identification of table partitions in documents with neural networks using global document context
CN109086357A (en) Sensibility classification method, device, equipment and medium based on variation autocoder
CN107615269A (en) The automatic translation of digital figure novel
CN109416731A (en) Document optical character identification
CN109657221A (en) A kind of document segment sort method, collator, electronic equipment and storage medium
US10963739B2 (en) Learning device, learning method, and learning program
CN109635844A (en) The method and device and method of detecting watermarks and device of training classifier
CN106537390A (en) Identifying presentation styles of educational videos
Song et al. Pixel-level crack detection in images using SegNet
Le et al. Deep learning approach for receipt recognition
CN112395995A (en) Method and system for automatically filling and checking bill according to mobile financial bill
CN110647931A (en) Object detection method, electronic device, system, and medium
CN112241727A (en) Multi-ticket identification method and system and readable storage medium
KR20230068989A (en) Method and electronic device for performing learning of multi-task model
Rai et al. Pho (SC) Net: an approach towards zero-shot word image recognition in historical documents
Ahmed et al. A real-time car towing management system using ml-powered automatic number plate recognition
CN110348328A (en) Appraisal procedure, device, storage medium and the electronic equipment of quality of instruction
CN110334620A (en) Appraisal procedure, device, storage medium and the electronic equipment of quality of instruction
Vivian et al. Face recognition service model for student identity verification using deep neural network and support vector machine (SVM)
CN110135218A (en) The method, apparatus, equipment and computer storage medium of image for identification
Wang et al. Affganwriting: a handwriting image generation method based on multi-feature fusion
Bogdan et al. Intelligent assistant for people with low vision abilities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant