CN106056114A

CN106056114A - Business card content identification method and business card content identification device

Info

Publication number: CN106056114A
Application number: CN201610347295.9A
Authority: CN
Inventors: 叶浩; 张睿欣; 郭晓威; 黄飞跃
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2016-05-24
Filing date: 2016-05-24
Publication date: 2016-10-26
Anticipated expiration: 2036-05-24
Also published as: CN106056114B; WO2017202232A1

Abstract

The invention relates to a business card content identification method and device. The method comprises the following steps: acquiring a business card image; detecting a text sequence image in the business card image; performing text recognition on a partial image starting from the head of the text sequence image to obtain a corresponding head text fragment; determining the text sequence content type according to the head text fragment; and, if the text sequence content type is a specified text sequence content type, fully recognizing the text sequence image to obtain the corresponding text sequence. The business card content identification method and device provided by the invention adapt well to different business cards and help improve the efficiency of business card content identification.

Description

Business card content recognition method and device

Technical field

The present invention relates to the technical field of image processing, and in particular to a business card recognition method and device.

Background

A business card is an important item of etiquette: by exchanging business cards, strangers can quickly get to know each other and establish social connections. The physical business card is the traditional form and is still the mainstream form today; its content can be printed on a paper or plastic card. In the traditional way of using business cards, the recipient files the physical cards away and must search through them by hand when one is needed, which is time-consuming and laborious.

A more convenient way to handle business cards is to photograph a card, recognize and save the content in the photo, and later retrieve that content quickly with information retrieval techniques. To recognize the content, the photo is uploaded to a server, which searches a database for a business card template matching the photo and uses the annotated content of that template to assist recognition.

However, current business card content recognition methods depend on a manually constructed database of business card templates, and each template must be manually annotated; both building the database and annotating the templates require human involvement. When the database contains no matching template, the recognition rate drops markedly, so these methods adapt poorly.

Summary of the invention

In view of this, it is necessary to provide a business card content recognition method and device that address the poor adaptability of current business card content recognition methods.

A business card content recognition method, comprising:

acquiring a business card image;

detecting a text sequence image in the business card image;

performing text recognition on a partial image starting from the head of the text sequence image to obtain a corresponding head text fragment;

determining a text sequence content type according to the head text fragment; and

when the text sequence content type is a specified text sequence content type, fully recognizing the text sequence image to obtain the corresponding text sequence.

A business card content recognition device, comprising:

a text sequence detection module, configured to acquire a business card image and detect a text sequence image in the business card image;

a text sequence pre-recognition module, configured to perform text recognition on a partial image starting from the head of the text sequence image to obtain a corresponding head text fragment; and

a text sequence recognition module, configured to determine a text sequence content type according to the head text fragment and, when the text sequence content type is a specified text sequence content type, fully recognize the text sequence image to obtain the corresponding text sequence.

With the above business card content recognition method and device, text sequence images are detected after the business card image is acquired. Recognizing the text of a partial image of each text sequence image determines its text sequence content type, and only the text sequence images of the required content types are then fully recognized to obtain the corresponding text sequences. Because business card content is recognized by text recognition, there is no need to manually build and annotate a business card template database; the method can recognize the content of many kinds of business cards and adapts well. Moreover, because a text sequence image is fully recognized only when its content type is a specified type, recognition efficiency is improved.

Brief description of the drawings

Fig. 1 is a diagram of the application environment of a business card processing system in an embodiment;

Fig. 2 is a schematic diagram of the internal structure of an electronic device in an embodiment;

Fig. 3 is a schematic flowchart of a business card content recognition method in an embodiment;

Fig. 4 is a schematic flowchart of the step of detecting text sequence images in the business card image in an embodiment;

Fig. 5 shows, in an embodiment, a business card image, the binarized business card image, and the connected domains extracted from the binarized business card image;

Fig. 6 is a schematic flowchart, in an embodiment, of the step of performing text recognition on a partial image starting from the head of the text sequence image to obtain the corresponding head text fragment;

Fig. 7 is a schematic flowchart of the step of segmenting a sequence of single-character images from the text sequence image in an embodiment;

Fig. 8 is a schematic flowchart of the business card content recognition method in a specific application scenario;

Fig. 9 is a structural block diagram of a business card content recognition device in an embodiment;

Fig. 10 is a structural block diagram of the text sequence detection module in an embodiment;

Fig. 11 is a structural block diagram of the text sequence pre-recognition module in an embodiment;

Fig. 12 is a structural block diagram of a business card content recognition device in another embodiment.

Detailed description

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

As shown in Fig. 1, in one embodiment a business card processing system is provided, comprising a terminal 110 and a server 120. The terminal 110 may be a personal computer, a mobile terminal or a wearable device; the mobile terminal may be, for example, a mobile phone, a tablet computer or a personal digital assistant. The server 120 may be a standalone server or a server cluster. The terminal 110 may be used to acquire a business card image and send it to the server 120. The server 120 may be used to receive the business card image sent by the terminal 110; detect text sequence images in the business card image; perform text recognition on a partial image starting from the head of each text sequence image to obtain a corresponding head text fragment; determine the text sequence content type according to the head text fragment; and, when the text sequence content type is a specified text sequence content type, fully recognize the text sequence image to obtain the corresponding text sequence. The server 120 may further send the recognized text sequences and their text sequence content types to the terminal 110 as business card content. The terminal 110 may be used to receive the business card content returned by the server and may also share the received business card content.

As shown in Fig. 2, in one embodiment an electronic device is provided, which may be the terminal 110 or the server 120 shown in Fig. 1. The electronic device includes a processor, a non-volatile storage medium, an internal memory and a network interface connected by a system bus. When the electronic device is the terminal 110, it may further include a display screen and an input device. The non-volatile storage medium of the electronic device stores an operating system and a business card content recognition device, which implements a business card content recognition method. The processor provides computing and control capabilities and supports the operation of the electronic device. The internal memory provides an environment for running the business card content recognition device stored in the non-volatile storage medium; it may store computer-readable instructions that, when executed by the processor, cause the processor to perform a business card content recognition method. The network interface connects to a network for communication. The display screen may be a liquid crystal display, an electronic ink display or the like; the input device may be a touch layer covering the display screen, a button, trackball or touchpad on the housing of the electronic device, or an external keyboard, touchpad or mouse. Those skilled in the art will understand that the structure shown in Fig. 2 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the electronic devices to which the solution applies; a specific electronic device may include more or fewer components than shown, combine some components, or arrange components differently.

As shown in Fig. 3, in one embodiment a business card content recognition method is provided. The method may be applied solely on the terminal 110 side of the business card processing system in Fig. 1, or solely on the server 120 side; alternatively, part of the method may run on the terminal 110 side and the rest on the server 120 side, with the method implemented through interaction between the terminal 110 and the server 120. This embodiment is described using application on the server 120 as an example. The method specifically includes the following steps:

Step 302: acquire a business card image.

A business card image is an image containing business card content; it may be a photo of a business card, a scan of a business card, or an electronic business card picture. The terminal may obtain the business card image by photographing a physical business card with its camera, by scanning a physical business card with a scanner, or by receiving a business card image sent by another terminal. The terminal may send the business card image to the server, which receives it. In one embodiment, the server may analyze the degree of blur of the business card image and exclude images that are too blurry; the degree of blur can be estimated from the gradient power. The server may also exclude images that do not match the basic features of a business card, so as to weed out images that are not business cards.
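The patent does not define gradient power precisely. The following pure-Python sketch assumes it is the mean squared forward-difference gradient of a grayscale image given as a list of pixel rows; the names `gradient_power` and `is_too_blurry` and the threshold parameter are illustrative assumptions. A blurrier image has weaker edges and therefore a lower score.

```python
def gradient_power(gray):
    """Mean squared forward-difference gradient of a grayscale image (list of rows)."""
    total, count = 0.0, 0
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x]  # horizontal difference
            gy = gray[y + 1][x] - gray[y][x]  # vertical difference
            total += gx * gx + gy * gy
            count += 1
    return total / count if count else 0.0


def is_too_blurry(gray, threshold):
    """Hypothetical filter: reject an image whose gradient power is below threshold."""
    return gradient_power(gray) < threshold
```

A flat image scores 0, while a high-contrast pattern scores high; in practice the threshold would have to be tuned on real card photos.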

Step 304: detect text sequence images in the business card image.

A text sequence is a text string formed by characters arranged in order. A text sequence may be a text line or a text column, and the corresponding text sequence image is then a text line image or a text column image. A text line is a text sequence whose characters are arranged approximately horizontally, and a text column is a text sequence whose characters are arranged approximately vertically.

Specifically, the server may detect text sequence images in the business card image according to prior features of text sequences, such as the spacing between characters within a text line or text column, or the fact that the centers of the characters within a text line or text column lie approximately on one straight line. The spacing between characters within a text line or text column is small, usually less than the width or height of one or a few characters. When the length of a detected text sequence image exceeds a preset length, the text sequence image may be divided into multiple text sequence images for further processing.
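To illustrate the spacing prior and the splitting of overlong sequences, the sketch below works on one-dimensional character extents along a single baseline. It assumes each box is a `(start, end)` pixel range, and that `max_gap` (for example, about one character height) and `max_len` are tuning parameters; both helper names are hypothetical.

```python
def group_by_gap(boxes, max_gap):
    """Merge character extents on one baseline whose gap is at most max_gap."""
    groups = []
    for start, end in sorted(boxes):
        if groups and start - groups[-1][1] <= max_gap:
            groups[-1][1] = max(groups[-1][1], end)  # close enough: same text sequence
        else:
            groups.append([start, end])  # gap too wide: start a new text sequence
    return [tuple(g) for g in groups]


def split_long(span, max_len):
    """Cut one text sequence extent into consecutive pieces no longer than max_len."""
    start, end = span
    pieces = []
    while start < end:
        stop = min(start + max_len, end)
        pieces.append((start, stop))
        start = stop
    return pieces
```

For example, `group_by_gap([(0, 10), (12, 20), (50, 60)], 5)` joins the first two boxes into one sequence and leaves the third on its own.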

Step 306: perform text recognition on a partial image starting from the head of the text sequence image to obtain the corresponding head text fragment.

The head of a text sequence image is the starting position in the reading order of the text sequence: for example, the head of a text line image may be its leftmost end, and the head of a text column image may be its topmost end. The partial image starting from the head may be a part of the text sequence image with a fixed length, or with a fixed length ratio, where the length ratio is the proportion of the partial image's length along the text sequence direction to the length of the whole text sequence image. The server performs text recognition on this partial image to obtain the corresponding head text fragment, which is a part of the corresponding text sequence.
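A minimal sketch of choosing the head partial image, assuming the text sequence image is a list of pixel rows and that either a fixed length or a length ratio is configured. The helper names and the example values (120 pixels, ratio 0.25) are placeholders, not values given in the patent.

```python
def head_length(seq_length, fixed_length=None, ratio=None):
    """Length of the head partial image along the text direction."""
    if fixed_length is not None:
        return min(fixed_length, seq_length)  # never longer than the sequence itself
    if ratio is not None:
        return min(seq_length, max(1, round(seq_length * ratio)))
    raise ValueError("specify fixed_length or ratio")


def crop_head(image, n, horizontal=True):
    """Head of a text line is its left end; head of a text column is its top end."""
    if horizontal:
        return [row[:n] for row in image]
    return image[:n]
```

For a 400-pixel text line, `head_length(400, ratio=0.25)` gives 100 and `head_length(400, fixed_length=120)` gives 120.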

The server may use a neural network model for text recognition, specifically a CNN (Convolutional Neural Network) model or an FCNN (Fully Convolutional Neural Network) model. CNN models have strong classification ability on visual input and can recognize individual characters accurately.

Step 308: determine the text sequence content type according to the head text fragment.

The text sequence content type is the type of the content of the text sequence in a text sequence image, such as telephone number, name, e-mail address, company name or postal address.

In one embodiment, step 308 includes: performing keyword matching and/or format matching on the head text fragment to determine the corresponding text sequence content type.

Specifically, the server may collect in advance, from the head text fragments of sample text sequences, keywords that identify text sequence content types to form a keyword set, and record the text sequence content type corresponding to each keyword. When performing step 308, the server may traverse the keyword set to find a keyword matching the current head text fragment; if a matching keyword is found, the text sequence content type is set to the type corresponding to that keyword.

A keyword may be a field name identifying a text sequence content type, such as "Phone", "Name", "Position", "Mailbox", "Company" or "Postal address". A keyword may also be one or more characters, found by statistics over the beginnings of text sequences, whose textual features distinguish a text sequence content type: for example, single characters that are common surnames, such as Li, Wang or Nie, or telephone prefixes such as "+86", "136" or "139".

A format is a structural constraint on how characters combine in a string of at least two characters. The server may prepare in advance a format pattern for each text sequence content type. When performing step 308, it compares the head text fragment with each format pattern; if a matching pattern exists, the text sequence content type is set to the type corresponding to that pattern. Format patterns may be expressed as regular expressions.

In one embodiment, keyword matching and format matching may be used separately or in combination. When they are combined, for example, if a keyword matching the head text fragment is found, a format pattern matching the head text fragment also exists, and the matched keyword and the matched format pattern correspond to the same text sequence content type, then the text sequence content type is determined to be that type.
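The keyword and format matching described above can be sketched with Python's standard `re` module. The keyword table, the content type names and both regular expressions are hypothetical examples, not the patent's actual data; prefix matching with `startswith` is one possible reading of "matching" a keyword.

```python
import re
from typing import Optional

# Hypothetical keyword table (prefix -> content type); a real table would be
# mined from the head fragments of sample text sequences.
KEYWORDS = {
    "phone": "telephone", "tel": "telephone", "+86": "telephone",
    "136": "telephone", "139": "telephone",
    "email": "email", "e-mail": "email", "company": "company",
}

# Hypothetical format patterns, one per content type.
FORMATS = {
    "telephone": re.compile(r"^(?:\+86)?\s?1[3-9]\d"),  # mobile-number prefix
    "email": re.compile(r"[\w.+-]+@"),                 # local part followed by '@'
}


def match_keyword(fragment: str) -> Optional[str]:
    text = fragment.strip().lower()
    for keyword, ctype in KEYWORDS.items():
        if text.startswith(keyword):
            return ctype
    return None


def match_format(fragment: str) -> Optional[str]:
    for ctype, pattern in FORMATS.items():
        if pattern.search(fragment.strip()):
            return ctype
    return None


def classify(fragment: str, mode: str = "combined") -> Optional[str]:
    kw, fmt = match_keyword(fragment), match_format(fragment)
    if mode == "keyword":
        return kw
    if mode == "format":
        return fmt
    # Combined: keyword and format must both match and agree on one type.
    return kw if kw is not None and kw == fmt else None
```

With these tables, `classify("+86 1391")` returns "telephone", whereas `classify("Tel: 0755")` returns None in combined mode: the keyword "tel" matches, but the landline digits do not satisfy the mobile-number pattern.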

Step 310: when the text sequence content type is a specified text sequence content type, fully recognize the text sequence image to obtain the corresponding text sequence.

A specified text sequence content type is a type that needs to be recognized, specified either in advance or when the business card content recognition is performed; there may be one or more specified types. When the text sequence content type is a specified type, the corresponding text sequence is business card content that the recognition requires, and fully recognizing the text sequence image yields the required text sequence. If the text sequence content type cannot be determined, the text sequence image may be fully recognized to obtain the text sequence, which is then checked to confirm whether it is a required text sequence; alternatively, the text sequence image may be discarded. If the determined text sequence content type is not a specified type, the corresponding text sequence image may be discarded directly without further recognition.

In one embodiment, the server may also check whether the text sequence content type of the fully recognized text sequence is consistent with the type determined from the head text fragment. If they are consistent, the check passes, and the recognized text sequence and its content type are retained; if not, the type determined from the head text fragment may be replaced by the type of the recognized text sequence. This helps ensure the accuracy of the business card content recognition result.
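The decision and verification logic of step 310 can be sketched as follows. The recognizer and classifier are passed in as callables (`recognize_full` and `classify_full` are hypothetical names), and keeping the head type when the full text cannot be classified is an assumption the patent does not address.

```python
from typing import Callable, Optional, Set, Tuple


def process_sequence(head_type: Optional[str], specified: Set[str],
                     recognize_full: Callable[[], str],
                     classify_full: Callable[[str], Optional[str]],
                     recognize_unknown: bool = True) -> Optional[Tuple[str, str]]:
    """Return (text, content type) for a wanted sequence, or None if discarded."""
    if head_type is None:
        # Type unknown from the head: optionally recognize fully, then check.
        if not recognize_unknown:
            return None
        text = recognize_full()
        full_type = classify_full(text)
        return (text, full_type) if full_type in specified else None
    if head_type not in specified:
        return None  # discard without running full recognition
    text = recognize_full()
    full_type = classify_full(text)
    # Verification: on a mismatch, trust the type of the fully recognized text;
    # if the full text cannot be classified, keep the head type (assumption).
    return text, (full_type if full_type is not None else head_type)
```

Because the full recognizer is only invoked inside the function, a sequence whose head type is not specified costs nothing beyond the head recognition.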

With the above business card content recognition method, text sequence images are detected after the business card image is acquired. Recognizing the text of a partial image of each text sequence image determines its text sequence content type, and only the text sequence images corresponding to the required content types are then fully recognized to obtain the corresponding text sequences. Because the method relies on text recognition, there is no need to manually build and annotate a business card template database; it can recognize the content of many kinds of business cards and adapts well. And because a text sequence image is fully recognized only when its content type is a specified type, recognition efficiency is improved.

As shown in Fig. 4, in one embodiment, step 304 specifically includes the following steps:

Step 402: extract the connected domains in the business card image.

Specifically, the server may binarize the business card image, perform connected domain analysis on the binarized image to extract connected domains, and may also merge adjacent connected domains. The server may specifically use the Run Length Smoothing Algorithm (RLSA) for connected domain analysis and merging. RLSA joins the pixels of adjacent connected domains into a single region; because the connected domains within the same text sequence are close to one another, those belonging to one text sequence merge into one complete connected domain.
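A minimal sketch of binarization followed by horizontal run-length smoothing. It assumes dark ink becomes foreground value 1 under a fixed threshold of 128 (a placeholder; Fig. 5 displays the foreground as white, which is only a display convention), and it fills only background runs lying between two foreground pixels whose length is at most `max_gap`.

```python
def binarize(gray, threshold=128):
    """Dark ink -> 1 (foreground), light paper -> 0 (background)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def rlsa_row(row, max_gap):
    """Fill background runs of length <= max_gap lying between foreground pixels."""
    out = list(row)
    last_fg = None
    for i, v in enumerate(row):
        if v:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for j in range(last_fg + 1, i):
                    out[j] = 1  # bridge the short gap
            last_fg = i
    return out


def rlsa_horizontal(binary, max_gap):
    """Apply run-length smoothing to every row of a binary image."""
    return [rlsa_row(r, max_gap) for r in binary]
```

Short gaps between strokes of neighbouring characters are bridged while the wider gaps between text sequences stay open, so each text line becomes one blob.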

As shown in Fig. 5, Fig. 5(a) is a business card image in which some sensitive information has been masked to protect privacy. Binarizing the business card image in Fig. 5(a) gives the image in Fig. 5(b), and connected domain analysis and merging then yield the white connected domains shown in Fig. 5(c).

Step 404: determine the corresponding text sequence images according to the connected domains.

Specifically, the server may take the outer contour of multiple connected domains lying approximately on the same straight line as a text sequence image and record its position, thereby determining the corresponding text sequence image. When a text sequence image is represented by a rectangle, its position can be given by one vertex of the rectangle together with the rectangle's width and height. The server may also treat each connected domain as a separate text sequence image.
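Representing each connected domain by a rectangle can be sketched with a breadth-first search over a binary image. The sketch assumes 4-connectivity and returns each rectangle as `(x, y, width, height)`, with the top-left vertex as the recorded vertex; a production system would more likely call an image library's connected-component routine.

```python
from collections import deque


def connected_boxes(binary):
    """Bounding rectangles (x, y, w, h) of 4-connected foreground regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

Run on an RLSA-smoothed image, each returned rectangle approximates one text sequence image.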

Step 406: determine the inclination angle of each connected domain.

The inclination angle is the angle of deviation from a reference direction, which may coincide with the direction of the text sequence: for a text line, the inclination angle may be the angle of deviation from the horizontal; for a text column, it may be the angle of deviation from the vertical. Specifically, each connected domain may be represented by its rectangular contour, and the server may compute the inclination angle of that rectangular contour as the inclination angle of the connected domain.

In one embodiment, the server may project the pixels of a connected domain onto a straight line chosen so that the variance of the projections on that line is maximal, and take the inclination angle of that line as the inclination angle of the connected domain. The server may specifically use principal component analysis (PCA), least-squares regression or a similar algorithm to obtain the angle of the line with maximal projection variance.
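For two-dimensional points, the PCA direction has a closed form: the principal axis of the scatter matrix makes the angle θ = ½·atan2(2·Sxy, Sxx − Syy) with the x-axis. The sketch below applies it to the foreground pixels of a connected domain; it measures angles in image coordinates (y grows downward), so a positive angle means the line descends to the right on screen. Helper names are illustrative.

```python
import math


def foreground_points(binary):
    """(x, y) coordinates of foreground pixels; y grows downward as in images."""
    return [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]


def pca_angle_deg(points):
    """Angle (degrees) of the line with maximal projection variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the principal eigenvector of the 2x2 scatter matrix.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
```

Pixels on a diagonal give 45 degrees and pixels on a horizontal line give 0 degrees.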

Step 408: determine the inclination angle of the business card image according to the inclination angles of the connected domains.

Specifically, the server may take the arithmetic mean or a weighted mean of the inclination angles of the connected domains as the inclination angle of the business card image.
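Both options in a few lines. The patent does not say what the weights are; using, for example, each connected domain's pixel count as its weight is an assumption, so the weights are left as a caller-supplied list.

```python
def image_skew(angles, weights=None):
    """Arithmetic mean of the angles, or a weighted mean if weights are given."""
    if weights is None:
        return sum(angles) / len(angles)
    return sum(a * w for a, w in zip(angles, weights)) / sum(weights)
```

With weights, larger connected domains (such as long text lines) would dominate the estimate, which is one plausible reason to prefer the weighted mean over the plain average.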

Step 410: correct the direction of the business card image according to its inclination angle, obtaining direction-corrected text sequence images.

Specifically, the server may rotate the business card image by an angle equal to its inclination angle, in the direction that reduces the inclination, thereby correcting the direction of the business card image. Once the business card image as a whole has been corrected, each text sequence image in it is corrected accordingly.
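Correcting the direction amounts to rotating by the negative of the estimated angle. The sketch rotates point coordinates about a center rather than resampling pixels; a real implementation would typically resample the whole image with an image library, and the helper name is illustrative.

```python
import math


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points by -angle_deg about center to undo a skew of angle_deg."""
    t = math.radians(-angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]
```

A point on a line inclined at 45 degrees, such as (1, 1), is mapped onto the x-axis at (√2, 0), so after correction the text lies along the reference direction.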

In one embodiment, step 404 may be omitted and step 410 replaced by: correcting the direction of the business card image according to its inclination angle, and determining the corresponding text sequence images according to the connected domains in the corrected business card image.

In this embodiment, the connected domains extracted from the business card image serve not only to determine the corresponding text sequence images but also, by correcting the direction of the business card image as a whole, to correct the direction of every text sequence image. When text sequence images are detected from connected domains, the same connected domains are used for both detection and direction correction, so each text sequence image need not be corrected separately, which improves computational efficiency.

As shown in Fig. 6, in one embodiment, step 306 specifically includes the following steps:

Step 602: segment a sequence of single-character images from the text sequence image.

A single-character image is a rectangular image containing a single character. The server segments single-character images one by one from the text sequence image; ordered as in the text sequence image, they form the sequence of single-character images. Specifically, the server may segment the sequence of single-character images from the text sequence image using prior knowledge such as character spacing features, character length features and the consistency of character aspect ratios. Before segmentation, the text sequence image may undergo image enhancement, for example contrast enhancement.

In one embodiment, the server may binarize the text sequence image, project the pixel values onto the long side of the text sequence image to obtain accumulated values, and use the positions of local maximum or local minimum accumulated values as cut points, thereby obtaining the sequence of single-character images. If character pixels are white after binarization, local minima of the accumulated values are sought; if character pixels are black, local maxima are sought.
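A projection-profile sketch for a horizontal text line in which characters are foreground value 1 (the "white characters" case, so minima are sought; for black characters one would invert the binary image first). Treating a flat run of equal values as a single minimum and cutting at its middle, and ignoring runs at the image edges, are assumptions made for this illustration.

```python
def column_profile(binary):
    """Sum of foreground pixels in each column (projection onto the long side)."""
    return [sum(col) for col in zip(*binary)]


def local_minima_cuts(profile):
    """Cut at the middle of each flat run that is lower than both of its neighbours."""
    cuts = []
    n = len(profile)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and profile[j + 1] == profile[i]:
            j += 1  # extend the run of equal values
        left_higher = i > 0 and profile[i - 1] > profile[i]
        right_higher = j < n - 1 and profile[j + 1] > profile[i]
        if left_higher and right_higher:
            cuts.append((i + j) // 2)
        i = j + 1
    return cuts
```

For two three-pixel-high strokes separated by a two-column gap, the profile is `[3, 3, 0, 0, 3, 3]` and the cut lands at column 2.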

Step 604: perform text recognition on some consecutive single-character images starting from the head of the sequence of single-character images, obtaining the corresponding head text fragment.

Specifically, from all the single-character images in the sequence, the server selects consecutive single-character images starting from the head of the sequence and performs text recognition on them to obtain the corresponding head text fragment. These consecutive single-character images may be a fixed number of images starting from the head, or images making up a preset proportion, where the preset proportion is the ratio of the number of selected consecutive single-character images to the total number of single-character images in the sequence.

In this embodiment, the text sequence image is segmented into a sequence of single-character images, and recognizing part of this sequence yields the head text fragment conveniently and efficiently.

In one embodiment, fully recognizing the text sequence image in step 310 to obtain the corresponding text sequence includes: determining the single-character images that remain in the sequence after the consecutive single-character images starting from the head are removed; performing text recognition on the remaining single-character images to obtain a corresponding remaining fragment; and obtaining the text sequence corresponding to the text sequence image from the remaining fragment and the head text fragment.

Specifically, the server first recognizes part of the sequence of single-character images to obtain the head text fragment. When it determines from the head text fragment that the text sequence in the text sequence image is of a specified content type, it goes on to recognize the remaining single-character images in the sequence to obtain the remaining fragment, and combines the head text fragment with the remaining fragment to obtain the complete text sequence.
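A sketch of head selection and completion, assuming a per-character recognizer `recognize_char` and a predicate `is_wanted` that applies the content-type rules; both are hypothetical callables standing in for the CNN recognizer and step 308. Rounding the preset proportion up with `math.ceil` is also an assumption.

```python
import math
from typing import Callable, List, Optional


def head_count(total: int, fixed: Optional[int] = None, ratio: Optional[float] = None) -> int:
    """Number of leading single-character images to pre-recognize."""
    if fixed is not None:
        return min(fixed, total)
    if ratio is not None:
        return min(total, max(1, math.ceil(total * ratio)))
    raise ValueError("specify fixed or ratio")


def recognize_sequence(char_images: List, recognize_char: Callable,
                       is_wanted: Callable[[str], bool],
                       fixed: Optional[int] = None,
                       ratio: Optional[float] = None) -> Optional[str]:
    k = head_count(len(char_images), fixed, ratio)
    head = "".join(recognize_char(img) for img in char_images[:k])  # head text fragment
    if not is_wanted(head):
        return None  # not a specified content type: the rest is never recognized
    rest = "".join(recognize_char(img) for img in char_images[k:])  # remaining fragment
    return head + rest
```

Each character image is recognized at most once: the head fragment is reused rather than recognized again during full recognition.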

In this embodiment, once the server has determined that a text sequence is required business card content, it can fully recognize the text sequence image efficiently, because the head characters already recognized need not be recognized again; this improves business card content recognition efficiency.

As shown in Fig. 7, in one embodiment, step 602 specifically includes the following steps:

Step 702: take candidate cut points along the long side of the text sequence image at a spacing shorter than the short side of the text sequence image.

Specifically, the text sequence image is a rectangle whose short side roughly equals the width or height of a character in the text sequence, while its long side roughly equals the length of the text sequence. Because the server chooses candidate cut points at a spacing shorter than the short side, there are more candidate cut points than actual cut points. The spacing may be less than or equal to one half, one third or one quarter of the short side of the text sequence image. A candidate cut point is a candidate cutting position and may be represented by a coordinate or by its distance from the starting point at the head of the text sequence image.

In one embodiment, the server may normalize the short side of all text sequence images while preserving their aspect ratios, so that every normalized text sequence image has the same short-side length, and then take candidate cut points along the long side of each normalized image at a spacing shorter than its short side. For example, all text line images may be scaled, with aspect ratio preserved, to a height of 120 pixels, and candidate cut points then taken every 30 pixels along each scaled text line image.
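The worked example above (height normalized to 120 pixels, candidates every 30 pixels, i.e. a quarter of the short side) can be sketched as follows; only the arithmetic is shown, not the image resampling, and the helper names are illustrative.

```python
def normalized_width(width: int, height: int, target_height: int = 120) -> int:
    """Width after scaling a text line image to target_height, aspect ratio kept."""
    return round(width * target_height / height)


def candidate_cuts(length: int, spacing: int = 30) -> list:
    """Candidate cut positions every `spacing` pixels, excluding both ends."""
    return list(range(spacing, length, spacing))
```

A 600 × 60 text line becomes 1200 × 120 after normalization and yields 39 candidate cut points (30, 60, ..., 1170).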

Step 704, obtains the cutting confidence level of each candidate's cut-off.

Here cutting problems is converted into two classification problems, namely judges whether candidate's cut-off is actual cutting Point, cutting confidence level be corresponding candidate's cut-off be the quantized value of the probability of actual cut-off.Server specifically can be by It is syncopated as corresponding picture according to candidate's cut-off, is sequentially inputted to trained after the picture being syncopated as is extracted characteristics of image In grader, the cutting confidence level of output corresponding candidate cut-off.Grader can use random forest grader.

The extracted image features may be HOG (Histogram of Oriented Gradients) features. When the business card image is relatively blurry, characters may stick together with no obvious gap between them. Characters also include symbols such as "(", whose aspect ratios differ considerably from those of Chinese characters and digits. HOG features are used here because the region at a cut point between characters differs greatly in appearance from a region inside a character, and HOG features express this difference well, which makes segmentation more robust. Other features, such as LBP (Local Binary Patterns) features, may also be extracted.
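The core idea of a HOG descriptor can be sketched as follows. The sketch computes a magnitude-weighted histogram of unsigned gradient orientations over a single cell using central differences. It is deliberately simplified: full HOG, for example `skimage.feature.hog`, also divides the patch into cells, normalizes over overlapping blocks, and interpolates votes between bins, and none of that is reproduced here. The 9-bin default is an assumption.

```python
import math
from typing import List


def orientation_histogram(img: List[List[float]], bins: int = 9) -> List[float]:
    """Simplified HOG cell: L2-normalized histogram of unsigned gradient orientations (0-180 deg)."""
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradients.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // (180.0 / bins)) % bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

A vertical ink boundary, typical of the edge of a character, puts all its weight in the 0-degree bin. A horizontal stroke edge lands in the 90-degree bin. A blank gap between characters yields an all-zero histogram. These differences are what the classifier can learn from.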

Step 706: determine the cut points according to the segmentation confidences.

Specifically, the server may compare each segmentation confidence with a preset threshold and judge a candidate cut point to be an actual cut point if its confidence is above the threshold. In one embodiment, the server first excludes every candidate cut point that is adjacent to a candidate whose segmentation confidence is a local maximum, and then determines the cut points from the remaining candidates. A candidate whose segmentation confidence is a local maximum is one whose confidence is higher than that of each of its adjacent candidates. Because there are fewer actual cut points than candidate cut points, at most one of two adjacent candidates can be an actual cut point, even if both have high confidences. Once these impossible candidates are excluded, the remaining candidates can all be taken as actual cut points, or they can be further filtered with the preset threshold above. Cut points selected in this way are more accurate.
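The exclusion rule and the optional threshold can be sketched as follows. In this minimal reading, "local maximum" means strictly greater than each existing neighbour, and an end candidate is compared only with its single neighbour. The function name and the optional `threshold` argument are illustrative.

```python
from typing import List, Optional, Sequence


def select_cut_points(
    positions: Sequence[int],
    confidences: Sequence[float],
    threshold: Optional[float] = None,
) -> List[int]:
    """Drop neighbours of local-maximum candidates, then optionally keep only those above threshold."""
    n = len(confidences)
    excluded = set()
    for i in range(n):
        left = confidences[i - 1] if i > 0 else float("-inf")
        right = confidences[i + 1] if i < n - 1 else float("-inf")
        if confidences[i] > left and confidences[i] > right:
            # A neighbour of a local maximum cannot also be a real cut point.
            if i > 0:
                excluded.add(i - 1)
            if i < n - 1:
                excluded.add(i + 1)
    keep = [i for i in range(n) if i not in excluded]
    if threshold is not None:
        keep = [i for i in keep if confidences[i] > threshold]
    return [positions[i] for i in keep]
```

With confidences `[0.2, 0.9, 0.8, 0.3, 0.7]` at positions 30 to 150, the candidate at 90 is discarded despite its high score of 0.8 because it sits next to the stronger peak at 60. Two strict local maxima can never be adjacent, so no peak is ever excluded.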

Step 708: segment the text sequence image into a sequence of single-character images according to the determined cut points. Specifically, the server cuts the text sequence image at each determined cut point. This yields the single-character images one by one, which together form the sequence of single-character images.
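Cutting the line at the chosen positions can be sketched as follows. The example again assumes a horizontal line stored as rows of pixels. Cut positions outside the image and duplicate positions are ignored, so no empty "character" is produced.

```python
from typing import List, Sequence

Image = List[List[int]]


def split_at_cuts(img: Image, cuts: Sequence[int]) -> List[Image]:
    """Slice a horizontal text-line image into single-character images at the given x positions."""
    width = len(img[0])
    bounds = [0] + sorted(c for c in set(cuts) if 0 < c < width) + [width]
    # Consecutive bounds delimit one character image each, in head-to-tail order.
    return [[row[a:b] for row in img] for a, b in zip(bounds, bounds[1:])]
```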

In this embodiment, candidate cut points are selected densely in the text sequence image, and the segmentation confidence of each candidate is used to segment the text sequence image into a sequence of single-character images. This segments the text sequence image accurately and improves the recognition rate of business card content.

In one embodiment, after obtaining the recognized text sequences and their text sequence content types, an electronic device (for example, a terminal) may display them grouped by content type at specified positions in a specified interface. For example, the electronic device may display the field name of each text sequence content type in the interface, with the corresponding text sequence shown next to each field name.

In one embodiment, the electronic device (for example, a terminal) may also receive an entry instruction, obtain the business card content entered according to that instruction, and save the entered content together with the text sequences and text sequence content types. In this embodiment, the user can have business card content recognized automatically. The user can also enter new content that does not appear among the recognized text sequences and save it together with the recognized content. This further enriches the business card content and makes business cards easier to use. The terminal may save the content locally or to the server.
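One way to keep recognized and manually entered content together is a record that holds both. The class below is a hypothetical illustration, not a data structure defined in the text. Fields are stored as `(content_type, text)` pairs in lists rather than dicts, so that a user-entered field never silently overwrites a recognized one, for example a second phone number.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Field = Tuple[str, str]  # (text sequence content type, text sequence)


@dataclass
class CardRecord:
    """Hypothetical container for one card: recognized fields plus user-entered fields."""
    recognized: List[Field] = field(default_factory=list)
    entered: List[Field] = field(default_factory=list)

    def enter(self, content_type: str, text: str) -> None:
        """Handle an entry instruction by storing content the user typed in."""
        self.entered.append((content_type, text))

    def all_fields(self) -> List[Field]:
        """All content to be saved together: recognized fields first, then entered ones."""
        return self.recognized + self.entered
```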

In one embodiment, the electronic device (for example, a terminal) may also obtain a business card sharing instruction, determine a recipient identifier according to that instruction, and share the text sequences and the corresponding text sequence content types with the terminal corresponding to the recipient identifier. The electronic device may also share newly entered business card content together with the text sequences and their content types. The recipient identifier may be the user ID of a social-network friend. A user ID uniquely identifies a user; a user account is one example.
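The text does not specify a message format for sharing. The sketch below assumes a JSON payload with hypothetical keys `recipient`, `fields`, `type`, and `text`. It shows only that the content types travel with the text sequences, so the receiving terminal can display the fields without recognizing or re-entering anything.

```python
import json
from typing import Iterable, Tuple


def build_share_message(recipient_id: str, fields: Iterable[Tuple[str, str]]) -> str:
    """Serialize (content_type, text) pairs for delivery to the terminal of recipient_id."""
    payload = {
        "recipient": recipient_id,  # e.g. a friend's user account (hypothetical key name)
        "fields": [{"type": t, "text": s} for t, s in fields],
    }
    return json.dumps(payload, ensure_ascii=False)  # keep Chinese text readable
```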

In this embodiment, once the recognized text sequences and their content types are obtained, they can be shared with a specified recipient. This makes it easy to share the content of a physical business card after it has been digitized. The recipient does not need to enter the business card again, which makes the operation more convenient.

As shown in Figure 8, in a specific application scenario, the server first performs text line detection, then text line pre-recognition, and finally text line content recognition and extraction. For text line detection, the server first binarizes the business card image. It then extracts and merges connected domains to obtain the text line images and estimates the number of text lines and their tilt angles. From the number of text lines and their tilt angles, the server calculates the tilt angle of the business card image. It then corrects the orientation of the whole business card image according to that angle, which in turn corrects the orientation of the text line images. The server may also estimate the degree of blur; if the degree of blur exceeds a blur threshold, the image is not recognized.
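Connected-domain extraction followed by merging into text lines can be sketched as follows. The sketch assumes the image is already binarized (1 = ink, 0 = background) and uses 8-connectivity via breadth-first search. The merging step is a hypothetical greedy rule, since the text does not specify how domains are merged: a box joins an existing line box when the two overlap vertically and the horizontal gap is at most `max_gap` pixels.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel bounds


def connected_components(binary: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected foreground (value 1) regions, in scan order."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1, y0, y1 = min(x0, x), max(x1, x), min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def merge_into_lines(boxes: List[Box], max_gap: int) -> List[Box]:
    """Greedy left-to-right merge of boxes that overlap vertically and are horizontally close."""
    lines: List[Box] = []
    for b in sorted(boxes):
        for i, l in enumerate(lines):
            if b[1] <= l[3] and l[1] <= b[3] and b[0] - l[2] <= max_gap:
                lines[i] = (min(l[0], b[0]), min(l[1], b[1]), max(l[2], b[2]), max(l[3], b[3]))
                break
        else:
            lines.append(b)
    return lines
```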

Further, during text line pre-recognition, the server enhances the text line image, segments it into single characters, binarizes the segmented single-character images, and then performs pre-recognition to obtain the corresponding head text fragment. Then, during text line content recognition and extraction, the server performs keyword matching on the head text fragment. If a keyword is matched, the server fully recognizes the corresponding text line image and verifies the recognition result. Finally, it outputs the recognized business card content.
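The control flow of this scenario can be sketched as follows: pre-recognize only the head, classify the head fragment, and run full recognition only for wanted content types. The three recognizer callables are hypothetical stand-ins for the enhancement, segmentation, and OCR stages, which are not shown. The point of the sketch is that lines of unwanted types, such as slogans, never reach the more expensive full-recognition step.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple, TypeVar

Img = TypeVar("Img")  # whatever image type the recognizers accept


def recognize_card_lines(
    line_images: Iterable[Img],
    recognize_head: Callable[[Img], str],       # pre-recognition of the head only
    classify: Callable[[str], Optional[str]],   # head fragment -> content type, or None
    recognize_full: Callable[[Img], str],       # full recognition of one line
    wanted_types: Set[str],
) -> List[Tuple[str, str]]:
    """Return (content_type, text) for each line whose head marks it as a wanted type."""
    results = []
    for img in line_images:
        head = recognize_head(img)
        content_type = classify(head)
        if content_type in wanted_types:  # None is never in wanted_types
            results.append((content_type, recognize_full(img)))
    return results
```

In the usage below, plain strings stand in for line images and slicing stands in for head recognition.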

As shown in Figure 9, in one embodiment, a business card content recognition device 900 is provided. It comprises a text sequence detection module 901, a text sequence pre-recognition module 902, and a text sequence recognition module 903.

The text sequence detection module 901 is configured to obtain a business card image and detect text sequence images in the business card image.

The text sequence pre-recognition module 902 is configured to perform text recognition on a local image starting from the head of a text sequence image, to obtain a corresponding head text fragment.

The text sequence recognition module 903 is configured to determine the text sequence content type according to the head text fragment. When the text sequence content type is a specified text sequence content type, the module fully recognizes the text sequence image to obtain the corresponding text sequence.

After obtaining a business card image, the business card content recognition device 900 detects the text sequence images. It determines the content type of each text sequence by recognizing text in a local image at the head of the text sequence image. It then fully recognizes only the text sequence images of the required content types to obtain the corresponding text sequences. Because business card content is recognized by text recognition, there is no need to build a business card template database manually or to label templates by hand. The device can therefore recognize the content of all kinds of business cards and is highly adaptable. Moreover, a text sequence image is fully recognized only when its content type is a specified type, which makes business card content recognition more efficient.

As shown in Figure 10, in one embodiment, the text sequence detection module 901 includes a connected domain extraction module 901a, a text sequence image determination module 901b, and an orientation correction module 901c.

The connected domain extraction module 901a is configured to extract the connected domains in the business card image.

The text sequence image determination module 901b is configured to determine the corresponding text sequence images according to the connected domains.

The orientation correction module 901c is configured to determine the tilt angle of each connected domain. It then determines the tilt angle of the business card image according to the tilt angles of the connected domains. Finally, it corrects the orientation of the business card image according to that tilt angle, obtaining orientation-corrected text sequence images.
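The orientation-correction steps can be sketched as follows. Several choices here are assumptions, since the text does not specify them. Each connected domain's tilt is the angle of a least-squares line through its pixel coordinates. The card's tilt is the median of the per-domain tilts, which keeps a few odd components from dominating. Correction is shown as rotating coordinates by the negative angle; rotating an actual image would normally use an image library.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


def component_angle(points: Sequence[Point]) -> float:
    """Tilt in degrees of the least-squares line y = a*x + b through a component's pixels."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # atan2(sxy, sxx) equals atan(slope) when sxx > 0; it gives 0 when all x are equal.
    return math.degrees(math.atan2(sxy, sxx))


def card_angle(components: Sequence[Sequence[Point]]) -> float:
    """Card tilt as the median of per-component tilts."""
    return median(component_angle(c) for c in components)


def deskew_point(x: float, y: float, angle_deg: float) -> Point:
    """Rotate a point by -angle_deg about the origin, undoing the estimated tilt."""
    t = math.radians(-angle_deg)
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))
```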

In one embodiment, the text sequence recognition module 903 is further configured to perform keyword matching and/or format matching on the head text fragment to determine the corresponding text sequence content type.
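Keyword matching with format matching as a fallback can be sketched as follows. The keyword table and the regular expressions are hypothetical examples chosen for illustration; the text does not give the actual lists. Keywords are tried first, as prefixes of the lower-cased head fragment. Format patterns are then tried, which catches lines such as a bare e-mail address or phone number that carry no label.

```python
import re
from typing import Optional

# Hypothetical keyword table: content type -> head prefixes (English and Chinese labels).
KEYWORDS = {
    "phone": ("tel", "phone", "mobile", "电话", "手机"),
    "email": ("email", "e-mail", "邮箱"),
    "address": ("address", "地址"),
}
# Hypothetical format patterns, tried only when no keyword matches.
FORMATS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+"),
    "phone": re.compile(r"\+?\d[\d\- ]{5,}"),
}


def classify_head(fragment: str) -> Optional[str]:
    """Map a head text fragment to a content type, or None if nothing matches."""
    text = fragment.strip().lower()
    for content_type, words in KEYWORDS.items():
        if any(text.startswith(w) for w in words):
            return content_type
    for content_type, pattern in FORMATS.items():
        if pattern.match(text):
            return content_type
    return None
```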

As shown in Figure 11, in one embodiment, the text sequence pre-recognition module 902 includes a single-character segmentation module 902a and a text head pre-recognition module 902b.

The single-character segmentation module 902a is configured to segment a sequence of single-character images from the text sequence image.

The text head pre-recognition module 902b is configured to perform text recognition on a run of consecutive single-character images starting from the head of the sequence of single-character images, to obtain the corresponding head text fragment.

In one embodiment, the text sequence recognition module 903 is further configured to determine which single-character images remain in the sequence after the consecutive head images are removed. It performs text recognition on those remaining single-character images to obtain a corresponding remaining local fragment. It then obtains the text sequence corresponding to the text sequence image from the remaining local fragment and the head text fragment.
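Reusing the head fragment during full recognition can be sketched as follows. Only the characters after the head are recognized, and their text is appended to the head fragment that pre-recognition already produced. The function name and the per-character recognizer callable are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

CharImg = TypeVar("CharImg")


def complete_line(
    char_images: Sequence[CharImg],
    head_fragment: str,
    head_count: int,
    recognize_char: Callable[[CharImg], str],
) -> str:
    """Recognize only the characters after the first head_count and append them to the head fragment."""
    remaining = char_images[head_count:]
    rest = "".join(recognize_char(img) for img in remaining)
    return head_fragment + rest
```

In the usage below, the recognizer is called five times, once for each character after the four-character head, rather than nine times for the whole line.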

In one embodiment, the single-character segmentation module 902a is further configured to take candidate cut points in the text sequence image along its long side, at a spacing shorter than its short side. It then obtains the segmentation confidence of each candidate cut point and determines the cut points according to those confidences. Finally, it segments the text sequence image into a sequence of single-character images according to the determined cut points.

In one embodiment, the single-character segmentation module 902a is further configured to exclude, from all candidate cut points, those adjacent to a candidate whose segmentation confidence is a local maximum, and to determine the cut points from the remaining candidates.

As shown in Figure 12, in one embodiment, the business card content recognition device 900 further includes a business card sharing module 904. This module is configured to obtain a business card sharing instruction and determine a recipient identifier according to that instruction. It then shares the text sequences and the corresponding text sequence content types with the terminal corresponding to the recipient identifier.

A person of ordinary skill in the art will understand that all or part of the flows of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and when executed, it may include the flows of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM). It may also be a random access memory (RAM), among others.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features has been described. However, any combination of these features that contains no contradiction should be considered within the scope of this specification.

The above embodiments express only several implementations of the present invention. They are described specifically and in detail, but they should not therefore be construed as limiting the scope of the patent. A person of ordinary skill in the art can make several variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims

1. A business card content recognition method, comprising:

obtaining a business card image;

detecting a text sequence image in the business card image;

performing text recognition on a local image of the text sequence image starting from its head, to obtain a corresponding head text fragment;

determining a text sequence content type according to the head text fragment;

and, when the text sequence content type is a specified text sequence content type, fully recognizing the text sequence image to obtain a corresponding text sequence.

2. The method according to claim 1, wherein detecting the text sequence image in the business card image comprises:

extracting connected domains in the business card image;

determining corresponding text sequence images according to the connected domains;

determining the tilt angle of each connected domain;

determining the tilt angle of the business card image according to the tilt angles of the connected domains;

and correcting the orientation of the business card image according to its tilt angle, to obtain orientation-corrected text sequence images.

3. The method according to claim 1, wherein determining the text sequence content type according to the head text fragment comprises:

performing keyword matching and/or format matching on the head text fragment to determine the corresponding text sequence content type.

4. The method according to claim 1, wherein performing text recognition on the local image of the text sequence image starting from its head to obtain the corresponding head text fragment comprises:

segmenting a sequence of single-character images from the text sequence image;

and performing text recognition on a run of consecutive single-character images starting from the head of the sequence, to obtain the corresponding head text fragment.

5. The method according to claim 4, wherein fully recognizing the text sequence image to obtain the corresponding text sequence comprises:

determining the single-character images that remain in the sequence after removing the consecutive single-character images at the head;

performing text recognition on the remaining single-character images to obtain a corresponding remaining local fragment;

and obtaining the text sequence corresponding to the text sequence image from the remaining local fragment and the head text fragment.

6. The method according to claim 4, wherein segmenting the sequence of single-character images from the text sequence image comprises:

taking candidate cut points in the text sequence image along its long side at a spacing shorter than its short side;

obtaining the segmentation confidence of each candidate cut point;

determining cut points according to the segmentation confidences;

and segmenting the sequence of single-character images from the text sequence image according to the determined cut points.

7. The method according to claim 6, wherein determining the cut points according to the segmentation confidences comprises:

excluding, from the candidate cut points, those adjacent to a candidate cut point whose segmentation confidence is a local maximum, and determining the cut points from the remaining candidate cut points.

8. The method according to claim 1, further comprising:

obtaining a business card sharing instruction;

determining a recipient identifier according to the sharing instruction;

and sharing the text sequence and the corresponding text sequence content type with the terminal corresponding to the recipient identifier.

9. A business card content recognition device, comprising:

a text sequence detection module, configured to obtain a business card image and detect a text sequence image in the business card image;

a text sequence pre-recognition module, configured to perform text recognition on a local image of the text sequence image starting from its head, to obtain a corresponding head text fragment;

and a text sequence recognition module, configured to determine a text sequence content type according to the head text fragment and, when the text sequence content type is a specified text sequence content type, to fully recognize the text sequence image to obtain a corresponding text sequence.

10. The device according to claim 9, wherein the text sequence detection module comprises:

a connected domain extraction module, configured to extract connected domains in the business card image;

a text sequence image determination module, configured to determine corresponding text sequence images according to the connected domains;

and an orientation correction module, configured to determine the tilt angle of each connected domain; determine the tilt angle of the business card image according to the tilt angles of the connected domains; and correct the orientation of the business card image according to its tilt angle, to obtain orientation-corrected text sequence images.

11. The device according to claim 9, wherein the text sequence recognition module is further configured to perform keyword matching and/or format matching on the head text fragment to determine the corresponding text sequence content type.

12. The device according to claim 9, wherein the text sequence pre-recognition module comprises:

a single-character segmentation module, configured to segment a sequence of single-character images from the text sequence image;

and a text head pre-recognition module, configured to perform text recognition on a run of consecutive single-character images starting from the head of the sequence, to obtain the corresponding head text fragment.

13. The device according to claim 12, wherein the text sequence recognition module is further configured to: determine the single-character images that remain in the sequence after removing the consecutive single-character images at the head; perform text recognition on the remaining single-character images to obtain a corresponding remaining local fragment; and obtain the text sequence corresponding to the text sequence image from the remaining local fragment and the head text fragment.

14. The device according to claim 12, wherein the single-character segmentation module is further configured to: take candidate cut points in the text sequence image along its long side at a spacing shorter than its short side; obtain the segmentation confidence of each candidate cut point; determine cut points according to the segmentation confidences; and segment the sequence of single-character images from the text sequence image according to the determined cut points.

15. The device according to claim 14, wherein the single-character segmentation module is further configured to exclude, from the candidate cut points, those adjacent to a candidate cut point whose segmentation confidence is a local maximum, and to determine the cut points from the remaining candidate cut points.

16. The device according to claim 9, further comprising:

a business card sharing module, configured to: obtain a business card sharing instruction; determine a recipient identifier according to the sharing instruction; and share the text sequence and the corresponding text sequence content type with the terminal corresponding to the recipient identifier.