CN106815561A - Business license printed page analysis method and device - Google Patents

Publication number
CN106815561A
Authority
CN
China
Prior art keywords
boundary rectangle
business license
character
connected domain
locating shaft
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611200465.7A
Other languages
Chinese (zh)
Inventor
杨羿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing 58 Information Technology Co Ltd
Original Assignee
Beijing 58 Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing 58 Information Technology Co Ltd
Priority to CN201611200465.7A
Publication of CN106815561A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The application provides a business license printed page (layout) analysis method and device. At least one connected domain is determined in a business license, and the boundary rectangle of each connected domain is determined, yielding at least one boundary rectangle. A locating shaft of preset characters is then determined according to the at least one boundary rectangle, where the characters in the business license include the preset characters. Finally, the content of the business license is segmented according to the locating shaft into at least one character, and printed page analysis is performed on the business license according to the characters obtained by segmentation. In this process, printed page analysis of the business license is achieved by extracting its connected domains.

Description

Business license printed page analysis method and device
Technical field
The application relates to image analysis technology, and in particular to a business license printed page (layout) analysis method and device.
Background
A business license is the certificate of the right of an enterprise or organization to operate lawfully, and is used to prove its economic strength, qualifications, and reputation. Generally, the enterprise or organization applies to the administration for industry and commerce, and after a series of complicated formalities the license is issued by that administration. Because the application process is complicated and applicants must meet certain conditions, many lawbreakers operate illegally by forging business licenses.
To prevent lawbreakers from forging business licenses, printed page analysis must be performed on a business license so that the license can be audited. However, the character recognition software currently on the market only performs structural analysis of conventional Portable Document Format (PDF) files or ordinary documents; there is no printed page analysis method specific to business licenses. If character recognition software designed for PDFs or documents is used to analyze a business license, the accuracy of the analysis is low.
Summary of the invention
The embodiments of the present application provide a business license printed page analysis method and device that achieve printed page analysis of a business license by extracting its connected domains.
In a first aspect, an embodiment of the present application provides a business license printed page analysis method, including:
determining at least one connected domain in a business license;
determining the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle;
determining a locating shaft of preset characters according to the at least one boundary rectangle, where the characters in the business license include the preset characters;
segmenting the content of the business license according to the locating shaft, so that the content of the business license is divided into at least one character; and
performing printed page analysis on the business license according to the at least one character.
In one feasible implementation, before the determining of at least one connected domain in the business license, the method further includes:
obtaining a binary image of the business license;
and the determining of at least one connected domain in the business license includes:
determining at least one connected domain in the binary image.
In one feasible implementation, before the determining of the locating shaft of the preset characters according to the at least one boundary rectangle, the method further includes:
filtering the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
In one feasible implementation, the qualified boundary rectangles include: boundary rectangles whose pixel count is less than a first threshold, boundary rectangles whose aspect ratio (width to height) is less than a second threshold, and boundary rectangles that contain fewer connected domains than a third threshold.
In one feasible implementation, the determining of the locating shaft of the preset characters according to the at least one boundary rectangle includes: extracting image features from each boundary rectangle in the at least one boundary rectangle, to obtain an image feature set; extracting the image features of the preset characters; and determining, from the image feature set, the image feature closest to the image features of the preset characters, and using the boundary rectangle corresponding to that closest image feature as the locating shaft of the preset characters.
In one feasible implementation, the segmenting of the content of the business license according to the locating shaft, so that the content is divided into at least one character, includes: dividing the content of the business license into at least one character string according to the locating shaft; mapping each character string in the at least one character string onto the business license, to obtain the business license with the character strings mapped; extracting at least one line character string, line by line, from the business license with the character strings mapped; and segmenting each line character string in the at least one line character string, to obtain the at least one character.
In a second aspect, an embodiment of the present application provides a business license printed page analysis device, including:
a processing module, configured to determine at least one connected domain in a business license;
a boundary rectangle determining module, configured to determine the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle;
a locating shaft determining module, configured to determine a locating shaft of preset characters according to the at least one boundary rectangle, where the characters in the business license include the preset characters;
a segmentation module, configured to segment the content of the business license according to the locating shaft, so that the content of the business license is divided into at least one character; and
an analysis module, configured to perform printed page analysis on the business license according to the at least one character.
In one feasible implementation, the processing module is configured to obtain a binary image of the business license and to determine at least one connected domain in the binary image.
In one feasible implementation, the boundary rectangle determining module is further configured to filter the at least one boundary rectangle before the locating shaft determining module determines the locating shaft of the preset characters according to the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
In one feasible implementation, the qualified boundary rectangles include: boundary rectangles whose pixel count is less than a first threshold, boundary rectangles whose aspect ratio (width to height) is less than a second threshold, and boundary rectangles that contain fewer connected domains than a third threshold.
In one feasible implementation, the locating shaft determining module is specifically configured to: extract image features from each boundary rectangle in the at least one boundary rectangle, to obtain an image feature set; extract the image features of the preset characters; and determine, from the image feature set, the image feature closest to the image features of the preset characters, and use the boundary rectangle corresponding to that closest image feature as the locating shaft of the preset characters.
In one feasible implementation, the segmentation module is specifically configured to: divide the content of the business license into at least one character string according to the locating shaft; map each character string in the at least one character string onto the business license, to obtain the business license with the character strings mapped; extract at least one line character string, line by line, from the business license with the character strings mapped; and segment each line character string in the at least one line character string, to obtain the at least one character.
With the business license printed page analysis method and device provided by the embodiments of the present application, at least one connected domain is determined in a business license, and the boundary rectangle of each connected domain is determined, yielding at least one boundary rectangle. A locating shaft of preset characters is then determined according to the at least one boundary rectangle, where the characters in the business license include the preset characters. Finally, the content of the business license is segmented according to the locating shaft into at least one character, and printed page analysis is performed on the business license according to the characters obtained by segmentation. In this process, printed page analysis of the business license is achieved by extracting its connected domains.
Brief description of the drawings
Fig. 1 is a flowchart of embodiment one of the business license printed page analysis method of the present application;
Fig. 2 is a flowchart of the preprocessing process in the business license printed page analysis method of the present application;
Fig. 3 is a schematic diagram of a binary image in the business license printed page analysis method of the present application;
Fig. 4 is a schematic diagram of the boundary rectangles of connected domains in the business license printed page analysis method of the present application;
Fig. 5 is a flowchart of the segmentation process in the business license printed page analysis method of the present application;
Fig. 6 is a structural diagram of the business license printed page analysis device of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments in the present application without creative work fall within the scope of protection of the present application. The specific implementations, structures, features, and effects according to the present application are described in detail below with reference to the accompanying drawings and preferred embodiments.
Fig. 1 is a flowchart of embodiment one of the business license printed page analysis method of the present application, including:
101. Determine at least one connected domain in the business license.
The printed page (layout) of a business license carries information such as the national emblem, titles, content, a seal, and a QR code. The titles include the registration number, name, type, residence, legal representative, registered capital, date of establishment, and so on; correspondingly, the content includes the specific registration number, the specific name, the specific type (for example, partnership or limited liability), the specific address, the name of the legal representative, the amount of registered capital, the specific date of establishment, and so on. Within this information, a set of mutually connected points forms a region, and that region is called a connected domain (connected component). For example, the title character "name" forms one connected domain, and the four characters of "business license" form four connected domains, one each. In this step, the connected domains are determined from the business license printed page by a software algorithm. While the connected domains are being determined, adjacent feature points are assigned to the same region, thereby forming a connected domain.
102. Determine the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle.
After the connected domains on the business license printed page have been determined, the boundary rectangle (bounding box) of each connected domain is computed, giving one boundary rectangle per connected domain.
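Steps 101 and 102 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the license has already been reduced to a grid of 0/1 values with text as 1, it assumes 8-connectivity (the text only says that adjacent points are grouped), and the names connected_domains and boundary_rectangle are hypothetical.

```python
from collections import deque

def connected_domains(binary):
    """Label 8-connected foreground regions in a 0/1 grid.

    Returns one list of (row, col) pixels per connected domain.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    domains = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            domains.append(pixels)
    return domains

def boundary_rectangle(pixels):
    """Smallest axis-aligned rectangle (left, top, right, bottom), inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

# Toy binary image with two separate strokes.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
rects = [boundary_rectangle(d) for d in connected_domains(image)]
```

In practice an image library would normally do the labeling; the pure-Python version is only meant to make the grouping rule explicit.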
103. Determine the locating shaft of the preset characters according to the at least one boundary rectangle, where the characters in the business license include the preset characters.
In this step, the preset characters are set in advance. The image features of the preset characters are compared with the image features of each boundary rectangle, the boundary rectangle whose features are closest to those of the preset characters is determined from the at least one boundary rectangle, and that boundary rectangle is used as the locating shaft of the preset characters.
104. Segment the content of the business license according to the locating shaft, so that the content of the business license is divided into at least one character.
After the locating shaft has been determined, a series of segmentation operations is applied to the content of the business license, so that the content is divided into individual characters.
105. Perform printed page analysis on the business license according to the at least one character.
After the content on the business license printed page has been divided into individual characters, the business license reviewer analyzes the printed page of the business license according to those characters. During the analysis, the reviewer checks the entire content of the business license against the at least one character obtained by segmentation, then classifies the content by item (name, type, residence, and so on) and audits each item.
With the business license printed page analysis method provided by the embodiments of the present application, at least one connected domain is determined in a business license, and the boundary rectangle of each connected domain is determined, yielding at least one boundary rectangle. A locating shaft of preset characters is then determined according to the at least one boundary rectangle, where the characters in the business license include the preset characters. Finally, the content of the business license is segmented according to the locating shaft into at least one character, and printed page analysis is performed on the business license according to the characters obtained by segmentation. In this process, printed page analysis of the business license is achieved by extracting its connected domains.
In the embodiments of the present application, business license printed page analysis is roughly divided into three processes, each of which is described in detail below.
First, the first process: the preprocessing process.
Specifically, see Fig. 2, which is a flowchart of the preprocessing process in the business license printed page analysis method of the present application, including:
201. Filter out the edge information.
In this step, a given business license is normalized while preserving its original proportions, and the edge information of the normalized business license is filtered out by an image processing algorithm. The image processing algorithm is, for example, vertical projection and horizontal projection; the edge information is, for example, the blank areas on the left, right, top, and bottom of the business license.
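The projection-based margin removal of step 201 might look like the sketch below. It assumes a grayscale grid in which blank paper is exactly the value 255, which a real scan would not guarantee (a tolerance or a prior binarization would be needed); the function names are hypothetical and normalization is left out.

```python
def projection(gray, axis, background=255):
    """Count non-background pixels per row (axis=0) or per column (axis=1)."""
    rows, cols = len(gray), len(gray[0])
    if axis == 0:
        return [sum(1 for v in row if v != background) for row in gray]
    return [sum(1 for r in range(rows) if gray[r][c] != background)
            for c in range(cols)]

def content_span(profile):
    """Indices of the first and last non-empty entries of a projection profile."""
    filled = [i for i, n in enumerate(profile) if n > 0]
    return (filled[0], filled[-1]) if filled else (0, len(profile) - 1)

def crop_margins(gray, background=255):
    """Drop blank rows and columns around the content."""
    top, bottom = content_span(projection(gray, 0, background))
    left, right = content_span(projection(gray, 1, background))
    return [row[left:right + 1] for row in gray[top:bottom + 1]]

# Toy page: a 2x2 block of content surrounded by a one-pixel blank border.
page = [
    [255, 255, 255, 255],
    [255,  10, 200, 255],
    [255,  30, 255, 255],
    [255, 255, 255, 255],
]
cropped = crop_margins(page)
```

Only the outer blank bands are removed; blank pixels inside the content span are kept, since they carry the gaps used later for segmentation.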
202. Obtain the binary image of the business license.
In this step, the business license from which the edge information has been filtered out is binarized using a method such as LMM binarization, to obtain the binary image of the business license. Specifically, see Fig. 3, which is a schematic diagram of a binary image in the business license printed page analysis method of the present application.
Referring to Fig. 3, the image on the left is the original business license and the image on the right is the binary image obtained by binarization, from which the edge information has already been filtered out.
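As an illustration of step 202, the sketch below binarizes a grayscale grid with Otsu's global threshold. This is a deliberately swapped-in, well-known technique: the text names LMM binarization but does not describe it, so nothing here should be read as that method. Gray values are assumed to be integers in 0..255, dark text becomes 1, and the function names are hypothetical.

```python
def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_var, best_t = -1.0, 0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no pixels at or below t yet
        if w_light == 0:
            break     # every pixel is at or below t
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Map dark pixels (at or below the threshold) to 1, the rest to 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]

# Toy patch: dark ink (20) on light paper (200).
gray = [
    [200, 200, 20],
    [200, 20, 20],
]
binary = binarize(gray)
```

A global threshold is sensitive to uneven lighting and to the colored background pattern of a license, which is presumably why the text reaches for a different method.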
203. Determine at least one connected domain in the binary image, and determine the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle.
In this step, the connected domains in the binary image are found and the boundary rectangles of all connected domains are drawn. The boundary rectangles include those of text connected domains and those of non-text connected domains. Specifically, see Fig. 4, which is a schematic diagram of the boundary rectangles of connected domains in the business license printed page analysis method of the present application.
Referring to Fig. 4, the binary image contains multiple connected domains and, correspondingly, multiple boundary rectangles.
204. Filter the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
In this step, based on extensive text-processing experience, boundary rectangles that contain too few pixels, whose aspect ratio is too large, that contain too many connected domains, or that otherwise fail the conditions are filtered out, leaving only the qualified boundary rectangles. Here, too few pixels means that the number of pixels in the boundary rectangle is less than the first threshold, which is, for example, 6; an aspect ratio that is too large means that the width-to-height ratio of the boundary rectangle is greater than the second threshold, which is, for example, 4; too many connected domains means that the number of connected domains contained in the boundary rectangle is greater than the third threshold, which is, for example, 4. In addition, a boundary rectangle may also be unqualified because its aspect ratio is less than a fourth threshold, for example 0.3; the embodiments of the present application are not limited in this respect.
In this step, the boundary rectangles are obtained from the connected domains, and the angle of a rectangle is taken as the angle of the text line.
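The filter of step 204 can be sketched with the example thresholds given above (6, 4, 4, and 0.3). The Rect type and the qualified function are hypothetical, and one point is an assumption rather than something the text states: "connected domains contained in the boundary rectangle" is interpreted as the number of other boundary rectangles lying entirely inside it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int    # inclusive
    bottom: int   # inclusive
    pixels: int   # foreground pixels of the connected domain

    @property
    def aspect(self):
        """Width-to-height ratio of the rectangle."""
        return (self.right - self.left + 1) / (self.bottom - self.top + 1)

    def contains(self, other):
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)

def qualified(rects, min_pixels=6, max_aspect=4.0, max_nested=4, min_aspect=0.3):
    """Keep rectangles that pass all four example conditions of step 204."""
    kept = []
    for r in rects:
        if r.pixels < min_pixels:
            continue  # too few pixels: likely noise
        if r.aspect > max_aspect or r.aspect < min_aspect:
            continue  # too wide or too narrow to be a character
        nested = sum(1 for o in rects if o is not r and r.contains(o))
        if nested > max_nested:
            continue  # encloses too many other domains (e.g. a frame or seal)
        kept.append(r)
    return kept

rects = [
    Rect(0, 0, 9, 9, 80),       # character-sized block, aspect 1.0
    Rect(20, 0, 20, 1, 2),      # speck with 2 pixels
    Rect(0, 20, 99, 22, 250),   # long horizontal rule, aspect about 33
    Rect(30, 0, 31, 19, 30),    # thin vertical stroke, aspect 0.1
]
kept = qualified(rects)
```

With these inputs only the first rectangle survives; the other three fail the pixel-count, upper aspect-ratio, and lower aspect-ratio conditions respectively.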
Second, the second process: the segmentation process.
Specifically, see Fig. 5, which is a flowchart of the segmentation process in the business license printed page analysis method of the present application, including:
301. Determine the locating shaft of the preset characters according to the at least one boundary rectangle, where the characters in the business license include the preset characters.
In this step, the locating shaft of the preset characters is found from the boundary rectangles of the connected domains, using, for example, the k-nearest neighbors (kNN) algorithm. Normally, the preset characters are characters that appear on the business license. Specifically, several characters that appear on the business license are used as the preset characters, and image features, such as histogram of oriented gradients (HOG) features, are extracted from them. Image features, such as HOG features, are likewise extracted from each boundary rectangle in the at least one boundary rectangle, to obtain an image feature set. Then the image features of the preset characters are compared with each image feature in the image feature set, the image feature closest to the image features of the preset characters is determined from the set, and the boundary rectangle corresponding to that image feature is used as the locating shaft of the preset characters.
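The matching in step 301 amounts to a nearest-neighbour search over feature vectors, which the sketch below shows for k = 1 with Euclidean distance. The feature vectors are made-up four-element stand-ins for HOG descriptors, which are not computed here; the distance metric and the name nearest_rectangle are assumptions, since the text does not specify them.

```python
import math

def nearest_rectangle(preset_feature, rect_features):
    """1-nearest-neighbour search by Euclidean distance.

    rect_features maps a boundary rectangle (left, top, right, bottom)
    to the feature vector extracted from it.
    """
    best_rect, best_dist = None, math.inf
    for rect, feature in rect_features.items():
        dist = math.dist(preset_feature, feature)
        if dist < best_dist:
            best_rect, best_dist = rect, dist
    return best_rect, best_dist

# Hypothetical features for three boundary rectangles.
rect_features = {
    (10, 5, 29, 24): [0.9, 0.1, 0.0, 0.3],
    (40, 5, 59, 24): [0.2, 0.8, 0.5, 0.1],
    (70, 5, 89, 24): [0.1, 0.2, 0.9, 0.7],
}
# Hypothetical feature of one preset character.
preset = [0.25, 0.75, 0.5, 0.0]
locating_shaft, distance = nearest_rectangle(preset, rect_features)
```

A practical version would probably also reject the match when the best distance is too large, so that a license lacking the preset character does not yield a spurious locating shaft.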
302. Divide the content of the business license into at least one character string according to the locating shaft.
In this step, based on the gaps between characters in the business license, statistics are collected on the widths, heights, and so on of the boundary rectangles that meet the requirements for text, and these are combined with the position of the locating shaft to separate the two major blocks of the business license, titles and content. Then, within each block, statistics are collected on the vertical and horizontal gaps between characters, and these are combined with the business license normalized in the first process to divide the content of the block into character strings, line by line.
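One simple way to realize the line-by-line division in step 302 is to look for empty bands in the row projection of a binary block. The sketch below does only that; the gap statistics and the use of the locating shaft described above are not modeled, the min_gap parameter stands in for a statistically derived line spacing, and the function names are hypothetical.

```python
def runs(profile, min_gap=1):
    """Return (start, end) index pairs of consecutive non-zero entries.

    Runs separated by fewer than min_gap empty entries are merged.
    """
    raw, start = [], None
    for i, n in enumerate(profile):
        if n > 0 and start is None:
            start = i
        elif n == 0 and start is not None:
            raw.append((start, i - 1))
            start = None
    if start is not None:
        raw.append((start, len(profile) - 1))
    merged = []
    for span in raw:
        if merged and span[0] - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], span[1])
        else:
            merged.append(span)
    return merged

def text_lines(binary, min_gap=1):
    """Split a 0/1 block into text lines using its row projection."""
    row_profile = [sum(row) for row in binary]
    return runs(row_profile, min_gap)

# Toy block: two text lines separated by two blank rows.
block = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
lines = text_lines(block)
```

Applying the same runs function to a column projection of a single line gives a first cut at separating characters.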
303. Map each character string in the at least one character string onto the business license, to obtain the business license with the character strings mapped.
In this step, the character strings obtained by segmentation are mapped onto the business license as it was before normalization, that is, the original business license.
304. Extract at least one line character string, line by line, from the business license with the character strings mapped, and segment each line character string in the at least one line character string, to obtain at least one character.
In this step, the title and the content of each item in the business license are separated according to the relative positions of the text blocks. Specifically, background extraction is performed again line by line, and horizontal segmentation is performed with reference to the global text size, so that each character string is finally divided into individual Chinese characters.
305. Process the Chinese characters obtained by segmentation.
In this step, the Chinese characters obtained by segmentation are processed to filter out non-text content that does not satisfy the prior conditions. For example, the aspect ratio (width to height) of a Chinese character is normally 1:1 and that of a digit is 1:2. Characters among the at least one character whose aspect ratio is neither 1:1 nor 1:2 are filtered out. Here, an aspect ratio of 1:1 or 1:2 is the prior condition.
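The prior-condition filter of step 305 can be sketched as an aspect-ratio check. Exact ratios of 1:1 and 1:2 rarely occur in segmented images, so the sketch accepts a relative tolerance of 20%; that tolerance, like the function names, is an assumption and does not come from the text.

```python
def matches_prior(width, height, priors=(1.0, 0.5), tolerance=0.2):
    """True if width/height is within the tolerance of any prior ratio."""
    ratio = width / height
    return any(abs(ratio - p) <= tolerance * p for p in priors)

def filter_characters(boxes, priors=(1.0, 0.5), tolerance=0.2):
    """Keep (width, height) boxes that look like a Chinese character or a digit."""
    return [(w, h) for (w, h) in boxes if matches_prior(w, h, priors, tolerance)]

boxes = [
    (20, 20),  # square: Chinese character prior 1:1
    (10, 20),  # half width: digit prior 1:2
    (60, 20),  # three characters stuck together
    (19, 21),  # nearly square, within tolerance
    (3, 20),   # thin sliver such as a table line
]
kept = filter_characters(boxes)
```

Here the third and fifth boxes are dropped and the rest are kept.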
Finally, the third process: the printed page analysis process.
In this process, the printed page of the business license is analyzed according to the processed Chinese characters and other characters. During the analysis, the business license reviewer checks the entire content of the business license against the at least one character obtained by segmentation, then classifies the content by item (name, type, residence, and so on) and audits each item. If optical character recognition (OCR) is needed in this process, the characters obtained by segmentation can be used directly.
Fig. 6 is a structural diagram of the business license printed page analysis device of the present application, including:
a processing module 11, configured to determine at least one connected domain in a business license;
a boundary rectangle determining module 12, configured to determine the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle;
a locating shaft determining module 13, configured to determine a locating shaft of preset characters according to the at least one boundary rectangle, where the characters in the business license include the preset characters;
a segmentation module 14, configured to segment the content of the business license according to the locating shaft, so that the content of the business license is divided into at least one character; and
an analysis module 15, configured to perform printed page analysis on the business license according to the at least one character.
With the business license printed page analysis device provided by the embodiments of the present application, at least one connected domain is determined in a business license, and the boundary rectangle of each connected domain is determined, yielding at least one boundary rectangle. A locating shaft of preset characters is then determined according to the at least one boundary rectangle, where the characters in the business license include the preset characters. Finally, the content of the business license is segmented according to the locating shaft into at least one character, and printed page analysis is performed on the business license according to the characters obtained by segmentation. In this process, printed page analysis of the business license is achieved by extracting its connected domains.
Optionally, in an embodiment of the present application, the processing module 11 is configured to obtain a binary image of the business license and to determine at least one connected domain in the binary image.
Optionally, in an embodiment of the present application, the boundary rectangle determining module 12 is further configured to filter the at least one boundary rectangle before the locating shaft determining module 13 determines the locating shaft of the preset characters according to the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
Optionally, in an embodiment of the present application, the qualified boundary rectangles include: boundary rectangles whose pixel count is less than a first threshold, boundary rectangles whose aspect ratio (width to height) is less than a second threshold, and boundary rectangles that contain fewer connected domains than a third threshold.
Optionally, in an embodiment of the present application, the locating shaft determining module 13 is specifically configured to: extract image features from each boundary rectangle in the at least one boundary rectangle, to obtain an image feature set; extract the image features of the preset characters; and determine, from the image feature set, the image feature closest to the image features of the preset characters, and use the boundary rectangle corresponding to that closest image feature as the locating shaft of the preset characters.
Optionally, in an embodiment of the present application, the segmentation module 14 is specifically configured to: divide the content of the business license into at least one character string according to the locating shaft; map each character string in the at least one character string onto the business license, to obtain the business license with the character strings mapped; extract at least one line character string, line by line, from the business license with the character strings mapped; and segment each line character string in the at least one line character string, to obtain the at least one character.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A business license printed page analysis method, characterized by comprising:
determining at least one connected domain in a business license;
determining the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle;
determining a locating shaft of preset characters according to the at least one boundary rectangle, wherein the characters in the business license include the preset characters;
segmenting the content of the business license according to the locating shaft, so that the content of the business license is divided into at least one character; and
performing printed page analysis on the business license according to the at least one character.
2. The method according to claim 1, characterized in that, before the determining of at least one connected domain in the business license, the method further comprises:
obtaining a binary image of the business license;
and the determining of at least one connected domain in the business license comprises:
determining at least one connected domain in the binary image.
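The binarization, connected-domain and boundary-rectangle steps recited in claims 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the fixed threshold of 128, the choice of 4-connectivity, and the names `binarize`, `connected_domains` and `boundary_rectangle` are all hypothetical, since the claims leave these details open.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y)


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels as ink (1). The fixed threshold is an assumption."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_domains(binary: List[List[int]]) -> List[List[Pixel]]:
    """Group ink pixels into 4-connected domains using breadth-first search."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    domains = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                # Visit the four direct neighbours (4-connectivity is assumed).
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            domains.append(pixels)
    return domains


def boundary_rectangle(pixels: List[Pixel]) -> Tuple[int, int, int, int]:
    """Return the inclusive (left, top, right, bottom) rectangle around a domain."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))
```

With 8-connectivity instead, diagonally touching strokes would merge into one domain, which tends to yield fewer, larger rectangles.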
3. The method according to claim 1, characterized in that, before the determining of the locating shaft of the preset character according to the at least one boundary rectangle, the method further comprises:
filtering the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
4. The method according to claim 3, characterized in that the qualified boundary rectangles include: a boundary rectangle whose pixel quantity is less than a first threshold, a boundary rectangle whose width-to-height ratio is less than a second threshold, and a boundary rectangle that contains a number of connected domains less than a third threshold.
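A minimal sketch of such a filter is given below. The claim does not say whether the three criteria must hold together or whether any one suffices; the sketch assumes all three must hold, and replacing `all` with `any` gives the other reading. The `Rect` class, its fields and the threshold parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """A boundary rectangle with inclusive pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int
    pixel_count: int        # number of ink pixels inside the rectangle
    contained_domains: int  # number of connected domains inside the rectangle

    @property
    def aspect_ratio(self) -> float:
        """Width divided by height."""
        return (self.right - self.left + 1) / (self.bottom - self.top + 1)


def filter_rects(rects: List[Rect], first_threshold: int,
                 second_threshold: float, third_threshold: int) -> List[Rect]:
    """Keep rectangles that satisfy all three criteria (conjunction is assumed)."""
    return [
        r for r in rects
        if all((
            r.pixel_count < first_threshold,
            r.aspect_ratio < second_threshold,
            r.contained_domains < third_threshold,
        ))
    ]
```

For example, `filter_rects(rects, 1000, 3.0, 5)` would discard very large blobs, long horizontal rules and rectangles enclosing many separate components; these threshold values are illustrative only.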
5. The method according to any one of claims 1 to 4, characterized in that the determining of the locating shaft of the preset character according to the at least one boundary rectangle comprises:
extracting an image feature from each boundary rectangle in the at least one boundary rectangle, to obtain an image feature set;
extracting the image feature of the preset character;
determining, from the image feature set, the image feature closest to the image feature of the preset character, and using the boundary rectangle corresponding to that closest image feature as the locating shaft of the preset character.
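The closest-feature selection in claim 5 amounts to a nearest-neighbour search over the image feature set. The sketch below assumes that each feature is a fixed-length numeric vector and that closeness is measured by Euclidean distance; the claim specifies neither the feature type nor the distance measure, and the function name `locate_preset_character` is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Rectangle = Tuple[int, int, int, int]  # (left, top, right, bottom)


def locate_preset_character(
    rect_features: List[Tuple[Rectangle, Sequence[float]]],
    preset_feature: Sequence[float],
) -> Optional[Rectangle]:
    """Return the rectangle whose feature is nearest to the preset feature.

    Euclidean distance is an assumption; returns None for an empty feature set.
    """
    best_rect = None
    best_distance = math.inf
    for rect, feature in rect_features:
        distance = math.dist(feature, preset_feature)
        if distance < best_distance:
            best_rect, best_distance = rect, distance
    return best_rect
```

In practice one would likely also reject the best match when its distance exceeds some bound, so that a license image lacking the preset character does not yield a spurious locating shaft; the source does not mention such a check.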
6. The method according to any one of claims 1 to 4, characterized in that the segmenting of the content of the business license according to the locating shaft, so as to divide the content of the business license into at least one character, comprises:
segmenting the content of the business license into at least one character string according to the locating shaft;
mapping each character string in the at least one character string onto the business license, to obtain a business license with mapped character strings;
extracting at least one line character string, row by row, from the business license with mapped character strings;
segmenting each line character string in the at least one line character string, to obtain at least one character.
7. A business license printed page analysis device, characterized by comprising:
a processing module, configured to determine at least one connected domain in a business license;
a boundary rectangle determining module, configured to determine the boundary rectangle of each connected domain in the at least one connected domain, to obtain at least one boundary rectangle;
a locating shaft determining module, configured to determine the locating shaft of a preset character according to the at least one boundary rectangle, wherein the characters in the business license include the preset character;
a segmentation module, configured to segment the content of the business license according to the locating shaft, so as to divide the content of the business license into at least one character; and
an analysis module, configured to perform printed page analysis on the business license according to the at least one character.
8. The device according to claim 7, characterized in that
the processing module is configured to obtain a binary image of the business license and to determine at least one connected domain in the binary image.
9. The device according to claim 7, characterized in that
the boundary rectangle determining module is further configured to, before the locating shaft determining module determines the locating shaft of the preset character according to the at least one boundary rectangle, filter the at least one boundary rectangle, to obtain the qualified boundary rectangles among the at least one boundary rectangle.
10. The device according to claim 9, characterized in that the qualified boundary rectangles include: a boundary rectangle whose pixel quantity is less than a first threshold, a boundary rectangle whose width-to-height ratio is less than a second threshold, and a boundary rectangle that contains a number of connected domains less than a third threshold.
CN201611200465.7A 2016-12-22 2016-12-22 Business license printed page analysis method and device Pending CN106815561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611200465.7A CN106815561A (en) 2016-12-22 2016-12-22 Business license printed page analysis method and device

Publications (1)

Publication Number Publication Date
CN106815561A true CN106815561A (en) 2017-06-09

Family

ID=59110398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611200465.7A Pending CN106815561A (en) 2016-12-22 2016-12-22 Business license printed page analysis method and device

Country Status (1)

Country Link
CN (1) CN106815561A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339618A (en) * 2007-07-06 2009-01-07 上海思必得通讯技术有限公司 Mobile phones name card recognition device
CN101408937A (en) * 2008-11-07 2009-04-15 东莞市微模式软件有限公司 Method and apparatus for locating character row
CN102955941A (en) * 2011-08-31 2013-03-06 汉王科技股份有限公司 Identity information recording method and device
CN103839058A (en) * 2012-11-21 2014-06-04 方正国际软件(北京)有限公司 Information locating method for document image based on standard template
CN104200209A (en) * 2014-08-29 2014-12-10 南京烽火星空通信发展有限公司 Image text detecting method
CN105117706A (en) * 2015-08-28 2015-12-02 小米科技有限责任公司 Image processing method and apparatus and character recognition method and apparatus
CN105261110A (en) * 2015-10-26 2016-01-20 江苏国光信息产业股份有限公司 Efficient DSP banknote serial number recognizing method
CN105701488A (en) * 2016-01-01 2016-06-22 广州恒巨信息科技有限公司 Identity card identification method
CN106056114A (en) * 2016-05-24 2016-10-26 腾讯科技(深圳)有限公司 Business card content identification method and business card content identification device
CN106156767A (en) * 2016-03-02 2016-11-23 平安科技(深圳)有限公司 Driving license effect duration extraction method, server and terminal

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460385A (en) * 2018-03-02 2018-08-28 山东超越数控电子股份有限公司 A kind of Document Segmentation method and apparatus
CN110135431A (en) * 2019-05-16 2019-08-16 深圳市信联征信有限公司 The automatic identifying method and system of business license
CN111507813A (en) * 2020-04-21 2020-08-07 江西省机电设备招标有限公司 Bidder identity confirming method and bidding method
CN111507813B (en) * 2020-04-21 2023-05-12 江西省机电设备招标有限公司 Bidder identity identification method and bidding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170609