CN112488106A - Fuzzy, inclined and watermark-carrying identity card copy element extraction method - Google Patents
- Publication number
- CN112488106A (application number CN202011390772.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/02—Affine transformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a method for extracting elements from blurred, skewed, and watermarked identity card copies, comprising the following steps: S1, a preprocessing module obtains the coordinates of the four vertices of the certificate photo within the image and crops out the certificate photo according to those coordinates; S2, the watermark is removed, the picture is sharpened, and the key information entries are localized; S3, each entry is recognized with a convolutional recurrent neural network; S4, the recognition result is corrected. A generative adversarial network is adopted to remove the watermark; it can effectively fade or even eliminate the watermark coverage, reducing interference with the key information and improving the accuracy of text recognition. Finally, the invention corrects the extracted key information using the correlation between the certificate photo's key information and the official administrative divisions issued by the National Bureau of Statistics, further improving the accuracy of text recognition.
Description
Technical Field
The invention relates to the technical field of electronic information, and in particular to a method for extracting elements from blurred, skewed, and watermarked identity card copies.
Background
With the development of artificial intelligence, Optical Character Recognition (OCR) technology is widely applied to recognizing user-uploaded identity card photographs and extracting their key information. Current OCR systems in this field mainly combine text localization based on a target detection algorithm with text recognition based on a convolutional recurrent neural network: the detection algorithm locates the regions of the picture that contain text, the convolutional recurrent network recognizes the characters in those regions, and the picture information is thereby converted into text. This approach works well when the picture is sharp, horizontal, and free of watermark coverage, but its recognition performance degrades badly when the picture is blurred, skewed, or covered by a watermark.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a method for extracting elements from blurred, skewed, and watermarked identity card copies.
In order to solve the technical problems, the invention provides the following technical scheme:
The invention provides a method for extracting elements from blurred, skewed, and watermarked identity card copies, comprising the following steps:
S1, a preprocessing module: the input identity card copy picture is preprocessed; the coordinates of the four vertices of the certificate photo within the picture are obtained through a series of digital image processing techniques, and the certificate photo is cropped out according to those coordinates;
S2, watermark removal, sharpening, and key information localization of the certificate picture: the position of each piece of information is obtained using the relative layout of the key information on the certificate photo; the exact positions of the key fields of name, gender, ethnicity, birth date, address, and validity period are determined, and the corresponding regions are cropped to generate an entry picture for each piece of information;
S3, a text recognition module: each information entry is recognized with a convolutional recurrent neural network; the entry picture's features are first extracted by the convolutional layers, context information is then extracted by the recurrent layers, and a CTC model finally outputs the character string in the entry picture;
S4, recognition result correction: the predicted text is post-processed using a standard library of national administrative divisions and the composition rules of the identity card to obtain the final text recognition result.
As a preferred embodiment of the present invention, the step S1 includes the following steps:
S1.1, applying binarization to the input certificate picture to highlight the main contour of the certificate;
S1.2, removing particle noise from the binary image obtained in S1.1 using erosion and dilation, so that the position of the certificate becomes clearer and more complete;
S1.3, detecting rectangles in the picture obtained in S1.2, discarding rectangles with small areas, and obtaining the bounding rectangle of the certificate position together with its four vertex coordinates;
S1.4, calculating the tilt angle of the certificate photo from the relative positions of the four vertices, and applying a projective affine transformation according to that angle to obtain a horizontal rectangle;
S1.5, determining which side of the certificate photo is the national-emblem side and which is the portrait side, using the pixel mean and variance within specific regions according to the certificate photo's pixel distribution pattern.
As a preferred embodiment of the present invention, the step S2 includes the following steps:
S2.1, applying an image sharpening operation to the certificate picture to highlight the stroke texture and improve the recognition rate on blurred pictures;
S2.2, removing the watermark in the certificate picture with a conditional generative adversarial network to obtain a watermark-free certificate picture;
S2.3, determining the entry regions from which key information must be extracted according to the relative positions of the certificate photo's fields, finally dividing one certificate picture into several region pictures that each contain only key text information.
As a preferred embodiment of the present invention, the step S3 includes the following steps:
S3.1, resizing the region picture obtained in S2.3 to a height of 32 pixels, feeding it into a convolutional neural network, and extracting features to obtain a two-dimensional tensor for the picture;
S3.2, performing context analysis on the tensor obtained in S3.1 with a bidirectional recurrent neural network, predicting with a fully connected layer the probability that each fixed-width slice belongs to a given character, and converting the probabilities into output characters;
S3.3, finally aligning and de-duplicating the characters obtained in S3.2 with the CTC algorithm to obtain the model's output text.
As a preferred embodiment of the present invention, the step S4 includes the following steps:
S4.1, correcting the certificate number and the birth date: by the rule that digits 7 to 14 of the certificate number encode the birth date, the prediction with the higher probability is kept for both fields;
S4.2, correcting the 'address' and the 'issuing authority': using the official national administrative division table, the edit distance between each predicted field and every standard division is computed, and the division with the minimum edit distance is taken;
S4.3, correcting the validity period: since a certificate's validity term is 5 years, 10 years, 20 years, or long term, the expiry date is corrected accordingly.
Compared with the prior art, the invention has the following beneficial effects:
the invention utilizes the relative position address of the certificate photo to determine the position of the key information, solves the problem of inaccurate text positioning under the condition of watermark covering in the prior art, and improves the accuracy rate of text positioning; meanwhile, the countercheck generation network is adopted to remove the watermark, the countercheck generation network can effectively fade or even remove the watermark coverage, the key information problem is reduced, and the accuracy of text identification is improved; finally, the invention corrects the extracted key information by utilizing the correlation between the key information of the certificate photo and the official administrative division issued by the national statistical bureau, thereby further improving the accuracy rate of text recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart of a text recognition method for blurred, skewed, and watermarked identity card copies according to an exemplary embodiment of the present invention;
FIG. 2 is a diagram illustrating a pre-processing module in accordance with an exemplary embodiment of the present invention;
FIG. 3 is a flow chart illustrating watermark removal, sharpening, and key information location positioning according to an exemplary embodiment of the present invention;
FIG. 4 is a flow diagram of the convolutional recurrent neural network module, which performs text recognition on an input information entry picture, according to an exemplary embodiment of the present invention;
FIG. 5 is a flowchart illustrating a post-processing module according to an exemplary embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
The embodiment of the invention provides a text recognition method for blurred, skewed, and watermarked identity card copies, comprising the following steps: first, a horizontal certificate picture is obtained using the traditional digital image processing operations of binarization, erosion, dilation, rectangle detection, and affine transformation; then, watermark interference is removed with a generative adversarial network, and the key information entries are cropped out according to the relative layout of the certificate photo; next, text prediction is performed on each entry with a convolutional recurrent neural network; finally, the prediction results are corrected by rules to obtain the final recognized key information text.
FIG. 1 is a flow chart illustrating a fuzzy, skewed, watermarked identification card copy text recognition method according to an exemplary embodiment, the method comprising, as shown in FIG. 1, the steps of:
s1, a preprocessing module, which preprocesses the input ID copy picture, obtains the coordinates of the ID photo at four vertexes of the picture through a series of digital image processing techniques, and obtains the ID photo according to the coordinates;
Specifically, preprocessing the input identity card copy is an important step: the position and tilt angle of the certificate photo in the input picture are never fixed, and only the main body of the certificate photo matters for recognizing the key text information. The certificate photo has a distinct rectangular edge texture, so its four vertex coordinates in the input picture can be found via a Hough transform; the tilt angle can then be calculated from the four vertex coordinates, and a horizontal certificate photo is obtained by rotating and cropping.
FIG. 2 shows the preprocessing module according to an exemplary embodiment; referring to FIG. 2, it comprises the following steps:
S1.1, applying binarization to the input certificate picture to highlight the main contour of the certificate;
Specifically, in an input identity card copy the background pixel values are low while the pixel values of the identity card body are high, so binarizing the input image makes the contour of the card body stand out;
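A minimal NumPy sketch of this global thresholding step (the threshold value and the tiny sample image are illustrative; a real pipeline would likely use OpenCV's cv2.threshold, for example with Otsu's method, on the full grayscale scan):

```python
import numpy as np

def binarize(gray: np.ndarray, thresh: int = 128) -> np.ndarray:
    """Global thresholding: dark background pixels -> 0, bright card-body
    pixels -> 255, so the card's outline stands out."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

# Toy 2x3 "image": dark background on the left, bright card body on the right
gray = np.array([[10, 20, 200],
                 [15, 220, 230]], dtype=np.uint8)
print(binarize(gray))
```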
S1.2, removing particle noise from the binary image obtained in S1.1 using erosion and dilation, so that the position of the certificate becomes clearer and more complete;
Specifically, the digital image processing operations of erosion and dilation can effectively remove 'residue' from the binary image obtained in S1.1 without affecting the main contour of the certificate photo.
Specifically, the erosion operation finds a local minimum: mathematically, the image is convolved with a kernel, the minimum over the kernel's coverage area is computed, and that minimum is assigned to the anchor point. Erosion gradually shrinks the highlighted regions of the image. The dilation operation is the opposite: it finds local maxima and expands the highlighted regions of the picture.
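The local-minimum/local-maximum definition above can be sketched directly in NumPy (naive loops for clarity; OpenCV's cv2.erode and cv2.dilate would be used in practice). Applying erosion followed by dilation, known as an opening, removes isolated bright specks exactly as the particle-noise step S1.2 requires:

```python
import numpy as np

def erode(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Erosion: each pixel becomes the minimum of its k x k neighbourhood,
    shrinking bright regions and removing small bright noise."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

def dilate(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Dilation: local maximum, expanding bright regions back out."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

# A lone bright pixel (noise) is wiped out by an opening (erode then dilate)
noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 255
opened = dilate(erode(noisy))
print(opened.max())  # 0: the isolated speck is gone
```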
S1.3, detecting rectangles in the picture obtained in S1.2, discarding rectangles with small areas, and obtaining the bounding rectangle of the certificate position together with its four vertex coordinates;
Specifically, the contour detection module mainly uses the open-source OpenCV function findContours to detect contours in the image obtained in S1.2 and obtain the vertex coordinates of the certificate photo's contour;
S1.4, calculating the tilt angle of the certificate photo from the relative positions of the four vertices, and applying a projective affine transformation according to that angle to obtain a horizontal rectangle;
Specifically, the tilt angle of the certificate photo in the picture can be calculated from the four vertex coordinates obtained in S1.3; the input picture is rotated counterclockwise by the corresponding angle, the positions of the four rotated vertices are recomputed, and the picture at the corresponding position is cropped to obtain a horizontal certificate photo;
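A sketch of the tilt-angle computation from two adjacent vertices (the coordinates are hypothetical; a real pipeline would more likely take the angle from cv2.minAreaRect and rotate with cv2.warpAffine):

```python
import math

def tilt_angle(top_left: tuple, top_right: tuple) -> float:
    """Angle (degrees) of the card's top edge relative to horizontal,
    computed from two adjacent vertex coordinates (x, y)."""
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical vertices of a card tilted by 45 degrees
print(tilt_angle((0, 0), (10, 10)))  # approximately 45.0
print(tilt_angle((0, 0), (10, 0)))   # 0.0 for an already-horizontal card
```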
S1.5, determining which side of the certificate photo is the national-emblem side and which is the portrait side, using the pixel mean and variance within specific regions according to the certificate photo's pixel distribution pattern.
Specifically, step S1.4 yields the portrait side and the national-emblem side of the certificate photo, and the two sides can be told apart by a rule exploiting their different pixel distributions: on the portrait side, the pixel values of the upper-right quadrant are lower than those of the other three quadrants; on the national-emblem side, the upper-left quadrant is the darker one. Using this rule, the exact side of each certificate picture (portrait or national emblem) can be determined.
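A toy illustration of the quadrant-mean rule, assuming a grayscale card image where the darkest quadrant marks the photo or emblem area (the synthetic image and the mean-only comparison are simplifications; the patent also mentions using the variance):

```python
import numpy as np

def is_portrait_side(card: np.ndarray) -> bool:
    """Heuristic from the patent: on the portrait side the upper-RIGHT
    quadrant (photo area) is darker than the other three; on the
    national-emblem side the upper-LEFT quadrant is the dark one."""
    h, w = card.shape
    quads = {
        "ul": card[: h // 2, : w // 2].mean(),
        "ur": card[: h // 2, w // 2:].mean(),
        "ll": card[h // 2:, : w // 2].mean(),
        "lr": card[h // 2:, w // 2:].mean(),
    }
    darkest = min(quads, key=quads.get)
    return darkest == "ur"

# Synthetic card: bright everywhere except a dark upper-right block
card = np.full((100, 160), 220, dtype=np.uint8)
card[:50, 80:] = 60
print(is_portrait_side(card))  # True
```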
S2, watermark removal, sharpening, and key information localization of the certificate picture: the position of each piece of information is obtained using the relative layout of the key information on the certificate photo, finally yielding the exact positions of key fields such as name, gender, ethnicity, birth date, address, and validity period;
Specifically, step S1 yields a horizontal, standardized certificate picture, but the interference from blurring and watermark coverage remains. A conditional generative adversarial network can learn the watermark characteristics and effectively remove the watermark coverage, while sharpening the image strengthens the detail of the stroke texture and reduces the interference from blurring. Finally, exploiting the fixed typesetting of the certificate photo and the relatively fixed positions of the key text, the entries for key fields such as name, gender, ethnicity, birth date, address, and validity period can be cropped out.
FIG. 3 is a flow chart illustrating watermark removal, sharpening, and key information location positioning according to an exemplary embodiment, and is described with reference to FIG. 3, which includes the following steps:
S2.1, removing the watermark in the certificate picture with a conditional generative adversarial network to obtain a watermark-free certificate picture;
Specifically, the conditional generative adversarial network must be trained in advance. A U-Net is adopted as the generator, and a large number of certificate picture pairs with and without watermarks is prepared beforehand, i.e. each pair is formed by randomly adding a watermark to a watermark-free certificate picture. The network is then trained; afterwards, feeding in a new standardized certificate picture effectively removes the watermark interference;
S2.2, applying an image sharpening operation to the certificate picture to highlight the stroke texture and improve the recognition rate on blurred pictures;
Specifically, sharpening compensates the contours of the image and enhances its edges and the regions with gray-level jumps so that the image becomes clearer; sharpening methods divide into two classes, spatial domain processing and frequency domain processing. Image sharpening highlights the edges, contours, or features of certain linear target elements in the image. This filtering method improves the contrast between feature edges and the surrounding pixels, and is therefore also called edge enhancement.
S2.3, determining the entry regions from which key information must be extracted according to the relative positions of the certificate photo's fields, finally dividing one certificate picture into several region pictures that each contain only key text information.
Specifically, when information items are acquired, the items of "issuing authority" and "address" may be 4 rows at most, and in order to acquire all information to the maximum extent, 4 rows are respectively intercepted for the two items, and then a long item is spliced. Through the step, the invention obtains the item pictures of each key information.
S3, recognizing each information entry with a convolutional recurrent neural network: the entry picture's features are first extracted by the convolutional layers, context information is then extracted by the recurrent layers, and a CTC model finally outputs the character string in the entry picture;
Specifically, this embodiment uses the Convolutional Recurrent Neural Network (CRNN) model proposed in "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition", which can recognize texts of indefinite length and performs well on Chinese characters, English letters, and digits.
Specifically, the convolutional neural network in step S3 needs to be trained in advance, and the embodiment performs fine tuning on a new data set by using a pre-trained model, so that the method has better accuracy on the copy scenario of the identification card. Specifically, the new data set refers to an identification card copy picture, information in the picture needs to be manually marked, and the part of marked data is used for model fine adjustment and plays a very critical role in model training.
FIG. 4 is a flowchart of the convolutional recurrent neural network module, which performs text recognition on an input information entry picture, according to an exemplary embodiment; referring to FIG. 4, it comprises the following steps:
S3.1, resizing the region picture obtained in S2.3 to a height of 32 pixels, feeding it into a convolutional neural network, and extracting features to obtain a feature map for the picture;
Specifically, the convolutional part uses VGG as its basic module and downsamples the input picture 5 times; the final feature map has a height of 1 and 512 channels, i.e. it becomes a sequence of vectors, where every 32 pixels of width in the input picture is responsible for predicting one character in that region.
In this embodiment, as a preferred implementation, the convolutional neural network used in step S3.1 is a VGG19 network; in other embodiments, other convolutional neural networks such as ResNet may be used.
S3.2, performing context analysis on the feature vectors obtained in S3.1 with a bidirectional recurrent neural network, predicting with a fully connected layer the probability that each fixed-width slice belongs to a given character, and converting the probabilities into output characters;
Specifically, since the number of channels in step S3.1 is 512, the input vectors of this step are 512-dimensional; the character feature sequence forms a time series, and a bidirectional long short-term memory (BiLSTM) network can effectively extract the context of the text, improving the accuracy of text prediction.
Specifically, a fully connected layer is attached behind the BiLSTM to predict characters. It has 5529 output nodes, covering all Chinese characters, upper- and lower-case English letters, digits, and special symbols. The character corresponding to the node with the highest probability is taken as the predicted character.
S3.3: aligning and de-duplicating the characters obtained in step S3.2 with the CTC algorithm to obtain the model's output text.
Specifically, the prediction result of step S3.2 is not the final prediction result, and the CTC algorithm may accept a sequence of indefinite length and output a new sequence by calculating a maximum value of conditional probability, thereby solving the problem of misalignment of the output sequence of step S3.2.
S4: correcting the recognition result; the predicted text is post-processed using a standard library of national administrative divisions and the composition rules of the certificate to obtain the final text recognition result.
Specifically, step S3 may obtain the text of each entry, and the present invention performs one-step post-processing on the prediction result according to the identity card formation rule, so that the output result is more accurate.
FIG. 5 is a flowchart illustrating a post-processing module according to an exemplary embodiment, including the steps of:
S4.1, correcting the certificate number and the birth date: by the rule that digits 7 to 14 of the certificate number encode the birth date, the prediction with the higher probability is kept;
Specifically, the output text of step S3 carries a probability value. By the rule that digits 7 to 14 of the certificate number are the birth date, the higher-probability prediction wins: for example, if the probability of the birth-date field is greater than that of digits 7 to 14 of the certificate number, then digits 7 to 14 of the predicted certificate number are replaced with the predicted birth date;
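A sketch of this cross-field reconciliation, with hypothetical confidence values (a real system would take per-field probabilities from the recognition model):

```python
def reconcile_birthdate(id_number, id_conf, birth_field, birth_conf):
    """Digits 7-14 of a Chinese ID number encode the birth date (YYYYMMDD).
    Keep whichever of the two OCR predictions has the higher confidence
    and overwrite the other field so that the two agree."""
    if birth_conf > id_conf:
        id_number = id_number[:6] + birth_field + id_number[14:]
    else:
        birth_field = id_number[6:14]
    return id_number, birth_field

# Hypothetical predictions: the birth-date field was read more confidently,
# so it overwrites digits 7-14 of the ID number
num, birth = reconcile_birthdate("110101199003012837", 0.80, "19900307", 0.95)
print(num, birth)  # 110101199003072837 19900307
```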
S4.2, correcting the 'address' and the 'issuing authority': using the official national administrative division table, the edit distance between each predicted field and every standard division is computed, and the division with the minimum edit distance is taken;
Specifically, the 'address' and the 'issuing authority' belong to administrative divisions. The method takes the administrative divisions issued by the National Bureau of Statistics as the standard, computes the edit distance between the prediction and each standard division, and takes the standard division with the minimum edit distance as the prediction result;
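A minimal version of this correction: a standard Levenshtein edit distance plus a nearest-match lookup (the three-entry division table is a toy stand-in for the official National Bureau of Statistics list):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling one-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct_division(predicted: str, standard_divisions: list) -> str:
    """Snap an OCR-predicted division string to the closest official one."""
    return min(standard_divisions, key=lambda s: edit_distance(predicted, s))

# Toy standard table (illustrative, not the real NBS division list)
table = ["Beijing Dongcheng", "Beijing Xicheng", "Shanghai Huangpu"]
print(correct_division("Beijinq Dongchenq", table))  # "Beijing Dongcheng"
```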
S4.3, correcting the validity period: the expiry date is corrected using the fact that a certificate's validity term is 5 years, 10 years, 20 years, or long term;
Specifically, the validity term of an identity card is generally 5 years, 10 years, 20 years, or long term, and the validity period is corrected by this rule: the year, month, and day of the start time and of the expiry time are each taken from the prediction with the higher prediction probability.
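One way to sketch the validity-term rule, under the assumption that dates are YYYYMMDD strings and that per-term probabilities are available from the model (both assumptions are illustrative; the patent only states the 5/10/20-year or long-term rule):

```python
def correct_validity(start: str, end_pred: str, term_probs: dict) -> str:
    """The card's validity term is 5, 10, or 20 years, or 'long term'.
    If the predicted expiry date matches no legal term, replace it with
    the expiry implied by the highest-probability legal term."""
    s_year, rest = int(start[:4]), start[4:]
    legal = {str(s_year + n) + rest for n in (5, 10, 20)} | {"long term"}
    if end_pred in legal:
        return end_pred
    # Fall back to the most probable legal term (hypothetical confidences,
    # assumed to come from the recognition model)
    best = max(term_probs, key=term_probs.get)
    return "long term" if best == "long" else str(s_year + int(best)) + rest

# Predicted expiry 20230101 is not start + {5, 10, 20} years, so it is
# snapped to the most probable legal term (10 years)
print(correct_validity("20150101", "20230101",
                       {"5": 0.1, "10": 0.7, "20": 0.15, "long": 0.05}))  # 20250101
```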
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that modifications may still be made to the described technical solutions, or equivalents may be substituted for some of their features, without departing from the spirit and scope of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (5)
1. A method for extracting elements from blurred, inclined and watermarked identity card copies, characterized by comprising the following steps:
S1, a preprocessing module: the input identity card copy picture is preprocessed, the coordinates of the four vertexes of the certificate photo within the picture are obtained through a series of digital image processing techniques, and the certificate photo is cropped out according to these coordinates;
S2, watermark removal, sharpening and key-information localization of the certificate image: the position of each information entry is obtained from the relative layout of the key information on the certificate, yielding the positions of the name, gender, ethnicity, birth date, address and validity period; the corresponding regions are cropped to generate an image for each information entry;
S3, a text recognition module: a convolutional recurrent neural network recognizes each information entry; the convolutional network first extracts the features of the entry image, a recurrent neural network then extracts context information, and finally a CTC model outputs the character sequence in the entry image;
S4, recognition-result correction: the predicted text is post-processed using the standard library of national administrative divisions and the composition rules of the identity card number to obtain the final text recognition result.
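The four modules of claim 1 form a linear pipeline. A minimal sketch of how they chain together (all callable names are hypothetical placeholders, not from the patent):

```python
def extract_id_elements(copy_image, preprocess, locate_fields,
                        recognize, postprocess):
    """Chain the four stages of the extraction method end to end."""
    card = preprocess(copy_image)        # S1: find and rectify the card
    fields = locate_fields(card)         # S2: de-watermark, crop field regions
    texts = {name: recognize(img)        # S3: CRNN + CTC per field
             for name, img in fields.items()}
    return postprocess(texts)            # S4: rule-based correction
```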
2. The method for extracting blurred, inclined and watermarked identification card copy elements according to claim 1, wherein the step S1 comprises the following steps:
S1.1, binarizing the input copy image to highlight the main contour of the certificate;
S1.2, removing granular noise from the binary image obtained in S1.1 by erosion and dilation, so that the certificate region becomes clearer and more complete;
S1.3, detecting rectangles in the picture obtained in S1.2, discarding rectangles with small areas, and obtaining the rectangular frame of the certificate position together with its four vertex coordinates;
S1.4, calculating the inclination angle of the certificate photo from the relative positions of the four vertexes, and applying a projective affine transformation according to this angle to obtain a horizontal rectangle;
S1.5, judging whether the image is the national-emblem side or the portrait side of the certificate by the pixel mean and variance within specific regions, according to the pixel distribution pattern of the certificate photo.
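Steps S1.4 and S1.5 can be sketched as follows (a minimal illustration; the vertex ordering, region convention and variance threshold are assumptions, not the patent's values):

```python
import numpy as np

def tilt_angle(corners):
    """Angle (degrees) of the card's top edge relative to horizontal,
    given vertices ordered [top-left, top-right, bottom-right, bottom-left]
    as produced by the rectangle-detection step (S1.3)."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    return float(np.degrees(np.arctan2(y1 - y0, x1 - x0)))

def is_emblem_side(image, region, var_threshold=500.0):
    """S1.5 sketch: the national-emblem side carries the emblem in a fixed
    region whose pixel variance differs markedly from the flat background
    found there on the portrait side.  `region` is (row0, row1, col0, col1);
    the threshold is an assumed value."""
    r0, r1, c0, c1 = region
    patch = image[r0:r1, c0:c1].astype(np.float64)
    return patch.var() > var_threshold
```

The angle returned by `tilt_angle` would then parameterize the affine warp of S1.4.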
3. The method for extracting blurred, inclined and watermarked identification card copy elements according to claim 2, wherein the step S2 comprises the following steps:
S2.1, sharpening the certificate picture to highlight the stroke texture and improve the recognition rate on blurred pictures;
S2.2, removing the watermark from the certificate image with a conditional generative adversarial network, obtaining a watermark-free certificate image;
S2.3, determining the entry regions from which key information is to be extracted according to the relative positions of the modules of the certificate photo, finally dividing the certificate image into several region images that contain only the key character information.
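Because the ID card layout is fixed, the S2.3 cropping reduces to slicing at positions expressed as fractions of the card's size. A minimal sketch (the field names and fractional coordinates below are illustrative assumptions, not the patent's actual layout table):

```python
import numpy as np

# (x0, y0, x1, y1) as fractions of card width/height -- assumed values.
FIELD_REGIONS = {
    "name":      (0.18, 0.11, 0.40, 0.22),
    "sex":       (0.18, 0.25, 0.28, 0.36),
    "address":   (0.18, 0.54, 0.62, 0.80),
    "id_number": (0.34, 0.83, 0.95, 0.95),
}

def crop_fields(card, regions=FIELD_REGIONS):
    """Slice a rectified card image (H x W array) into per-field images."""
    h, w = card.shape[:2]
    return {name: card[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)]
            for name, (x0, y0, x1, y1) in regions.items()}
```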
4. The method of claim 3, wherein the step S3 comprises the following steps:
S3.1, resizing each region image obtained in S2.3 to a height of 32 pixels, feeding it into a convolutional neural network, and extracting features to obtain a two-dimensional tensor for the image;
S3.2, performing context analysis on the tensor obtained in S3.1 with a bidirectional recurrent neural network, and predicting, through a fully connected layer, the probability that each fixed-width slice belongs to each character;
S3.3, finally aligning and de-duplicating the characters obtained in S3.2 with the CTC algorithm to obtain the output text of the model.
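The alignment and de-duplication of S3.3 is the standard greedy (best-path) CTC decoding: collapse runs of repeated labels, then drop the blank. A minimal sketch over the per-timestep scores produced in S3.2:

```python
import numpy as np

BLANK = 0  # index reserved for the CTC blank label

def ctc_greedy_decode(logits, charset):
    """logits: (T, C) array of per-timestep class scores.
    charset: maps class index (excluding blank) to a character."""
    best_path = logits.argmax(axis=1)  # best label at each timestep
    out, prev = [], BLANK
    for label in best_path:
        if label != prev and label != BLANK:  # collapse repeats, skip blanks
            out.append(charset[label])
        prev = label
    return "".join(out)
```

For example, the per-timestep path [a, a, blank, a, b, b] decodes to "aab": the repeated a's collapse, the blank separates the two genuine a's, and the repeated b's collapse.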
5. The method for extracting blurred, inclined and watermarked identification card copy elements according to claim 1, wherein the step S4 comprises the following steps:
S4.1, correcting the certificate number and the date of birth by taking the group with the higher prediction probability, according to the rule that the 7th to 14th digits of the certificate number encode the date of birth;
S4.2, correcting the "address" and the "issuing authority" by calculating the edit distance between each predicted field and the standard administrative divisions in the officially published national table, and taking the division with the minimum edit distance;
S4.3, correcting the validity period by using the rule that an identity card is valid for 5 years, 10 years, 20 years, or long term.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011390772.2A CN112488106A (en) | 2020-12-02 | 2020-12-02 | Fuzzy, inclined and watermark-carrying identity card copy element extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011390772.2A CN112488106A (en) | 2020-12-02 | 2020-12-02 | Fuzzy, inclined and watermark-carrying identity card copy element extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112488106A true CN112488106A (en) | 2021-03-12 |
Family
ID=74939092
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011390772.2A Pending CN112488106A (en) | 2020-12-02 | 2020-12-02 | Fuzzy, inclined and watermark-carrying identity card copy element extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112488106A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239910A (en) * | 2021-07-12 | 2021-08-10 | 平安普惠企业管理有限公司 | Certificate identification method, device, equipment and storage medium |
CN116503880A (en) * | 2023-06-29 | 2023-07-28 | 武汉纺织大学 | English character recognition method and system for inclined fonts |
CN116503880B (en) * | 2023-06-29 | 2023-10-31 | 武汉纺织大学 | English character recognition method and system for inclined fonts |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110414507B (en) | License plate recognition method and device, computer equipment and storage medium | |
CN110766014B (en) | Bill information positioning method, system and computer readable storage medium | |
Pun et al. | Multi-scale noise estimation for image splicing forgery detection | |
CN108399405B (en) | Business license identification method and device | |
CN108596166A (en) | A kind of container number identification method based on convolutional neural networks classification | |
CN111353497B (en) | Identification method and device for identity card information | |
CN108197644A (en) | A kind of image-recognizing method and device | |
CN110020692B (en) | Handwriting separation and positioning method based on print template | |
Zhang et al. | A unified framework for document restoration using inpainting and shape-from-shading | |
CN111783757A (en) | OCR technology-based identification card recognition method in complex scene | |
CN111680690B (en) | Character recognition method and device | |
CN112488106A (en) | Fuzzy, inclined and watermark-carrying identity card copy element extraction method | |
CN110738216A (en) | Medicine identification method based on improved SURF algorithm | |
CN111814576A (en) | Shopping receipt picture identification method based on deep learning | |
CN115713533A (en) | Method and device for detecting surface defects of electrical equipment based on machine vision | |
CN112580383A (en) | Two-dimensional code identification method and device, electronic equipment and storage medium | |
CN108197624A (en) | The recognition methods of certificate image rectification and device, computer storage media | |
CN114694161A (en) | Text recognition method and equipment for specific format certificate and storage medium | |
CN116580410A (en) | Bill number identification method and device, electronic equipment and storage medium | |
CN114581911B (en) | Steel coil label identification method and system | |
Rajithkumar et al. | Template matching method for recognition of stone inscripted Kannada characters of different time frames based on correlation analysis | |
CN115731550A (en) | Deep learning-based automatic drug specification identification method and system and storage medium | |
CN112052859B (en) | License plate accurate positioning method and device in free scene | |
CN114820476A (en) | Identification card identification method based on compliance detection | |
Li et al. | Reliable and fast mapping of keypoints on large-size remote sensing images by use of multiresolution and global information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20210312 |