CN114677769A - Method and device for identifying copied certificate, computer equipment and storage medium - Google Patents

Method and device for identifying copied certificate, computer equipment and storage medium

Info

Publication number
CN114677769A
CN114677769A (application CN202210369155.7A)
Authority
CN
China
Prior art keywords
certificate
image
sample image
lbp
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210369155.7A
Other languages
Chinese (zh)
Inventor
易苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202210369155.7A priority Critical patent/CN114677769A/en
Publication of CN114677769A publication Critical patent/CN114677769A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 - Distances to prototypes
    • G06F18/24137 - Distances to cluster centroïds
    • G06F18/2414 - Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application belongs to the technical field of image processing in artificial intelligence, and relates to a method and a device for identifying a copied certificate, computer equipment and a storage medium. In addition, the application also relates to block chain technology, and the certificate image to be identified and the certificate identification result of the user can be stored in the block chain. According to the method and the device, when the copied certificate identification model is trained, an LBP feature map is introduced as auxiliary information for model training, and the training data are expanded around the certificate frame at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed distinguishing clues can be dynamically adapted to; the trained model can then identify a certificate image according to the texture feature information implied by its LBP feature map, which effectively enhances the robustness of the copied certificate identification model.

Description

Method and device for identifying copied certificate, computer equipment and storage medium
Technical Field
The application relates to the technical field of image processing in artificial intelligence, and in particular to a method and a device for identifying a copied certificate, computer equipment and a storage medium.
Background
Copied-certificate identification determines whether an identity card photo shows a copy or a re-shot screen rather than the physical identity card itself, so as to prevent photographed identity card photos from being used illegally by criminals; it is a research field with high practical value. Copy detection on an RGB picture can rely only on the information in the image itself, which makes it very challenging in real open scenarios.
The existing method for identifying a copied certificate follows the idea of an image classification task: an image classification model is constructed to classify and detect RGB images, thereby identifying copied certificates.
However, the applicant has found that the traditional method for identifying a copied certificate is generally not intelligent enough. Factors such as different attack media, different imaging devices and different input resolutions often have a direct influence on the classification result, so a simple image classification model is difficult to deploy and apply in every common situation and lacks a certain degree of scene robustness. The traditional method for identifying a copied certificate therefore suffers from a lack of scene robustness.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for identifying a copied certificate, computer equipment and a storage medium, so as to solve the problem that the traditional method for identifying a copied certificate lacks a certain degree of scene robustness.
In order to solve the above technical problem, an embodiment of the present application provides a method for identifying a copied document, which adopts the following technical scheme:
acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture;
preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image;
inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copied certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer;
acquiring a certificate image to be identified;
inputting the certificate image to be identified into the LBP generator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified;
inputting the LBP characteristic diagram of the certificate to be identified into the copied certificate identification model based on the LBP characteristics for texture characteristic identification operation to obtain a texture characteristic identification result;
if the texture feature identification result has obvious texture difference features, confirming that the certificate to be identified is a copied certificate;
and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is an original certificate.
In order to solve the above technical problem, an embodiment of the present application further provides a device for recognizing a copied document, which adopts the following technical scheme:
the system comprises a sample data acquisition module, a data storage module and a data processing module, wherein the sample data acquisition module is used for acquiring sample data, and the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture;
the preprocessing module is used for preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image;
the model training module is used for inputting the initial positive sample image and the initial negative sample image into an initial model to perform model training operation to obtain a copied certificate recognition model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer;
the certificate to be identified acquisition module is used for acquiring a certificate image to be identified;
the LBP feature conversion module is used for inputting the certificate image to be identified to the LBP generator module for LBP feature conversion to obtain a certificate LBP feature map to be identified;
the texture feature recognition module is used for inputting the LBP feature map of the certificate to be recognized into the copied certificate recognition model based on the LBP features to perform texture feature recognition operation so as to obtain a texture feature recognition result;
the first result module is used for confirming that the certificate to be identified is a copied certificate if the texture feature identification result has obvious texture difference features;
and the second result module is used for confirming that the certificate to be identified is the original certificate if the texture feature identification result does not have obvious texture difference features.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
the identification method comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and the processor realizes the steps of the identification method of the copied document when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the method of recognizing a copied document as described above.
The application provides a method for identifying copied certificates, which comprises the following steps: acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture; preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image; inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copied certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer; acquiring a certificate image to be identified; inputting the certificate image to be identified into the LBPGenerator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified; inputting the LBP characteristic diagram of the certificate to be identified into the copied certificate identification model based on the LBP characteristics for texture characteristic identification operation to obtain a texture characteristic identification result; if the texture feature identification result has obvious texture difference features, confirming that the certificate to be identified is a copied certificate; and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is an original certificate. Compared with the prior art, when the copied certificate identification model is trained, an LBP feature map is introduced as auxiliary information for model training, and the training data are expanded around the certificate frame at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed distinguishing clues can be dynamically adapted to; the trained model can then identify a certificate image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied certificate identification model.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flowchart illustrating an implementation of a method for recognizing a copied document according to an embodiment of the present application;
FIG. 3 is a flow chart of one embodiment of a network model and use of supervisory information provided in an embodiment of the present application;
FIG. 4 is a flowchart of one embodiment of step S202 in FIG. 2;
FIG. 5 is a flowchart of one embodiment of step S203 in FIG. 2;
FIG. 6 is a flowchart of one embodiment of acquiring a subimage of a document to be recognized according to the present application;
FIG. 7 is a flowchart of one embodiment of obtaining a target gradient map prediction model according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a device for recognizing a copied document according to an embodiment of the present application;
FIG. 9 is a schematic diagram of one embodiment of the pre-processing module 220 of FIG. 8;
FIG. 10 is a block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the method for recognizing the copied document provided in the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the copied document recognition apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, a flowchart of an implementation of a method for recognizing a copied document according to an embodiment of the present application is shown, and for convenience of description, only the portion related to the present application is shown.
The identification method of the copied certificate comprises the following steps: step S201, step S202, step S203, step S204, step S205, and step S206.
Step S201: and acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture.
Step S202: and preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image.
In this embodiment of the application, the preprocessing operation may be to use an identification card detector (Retinaface) to obtain an identification card coordinate frame in the image, then expand the identification card frame by 1 time, 1.5 times, and 3 times according to a certain ratio, and scale the identification card pictures with different background sizes to a size of 224 × 224.
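As a rough illustration of this preprocessing step (a minimal sketch only: the detector call is assumed to return an ID-card box as (x1, y1, x2, y2), and the helper name, clipping and scale handling are illustrative rather than the patent's actual code):

```python
# Expand a detected ID-card box at several magnifications and resize each
# crop to 224x224, so that crops with increasing background context are kept.
import cv2


def expand_and_crop(image, box, scales=(1.0, 1.5, 3.0), out_size=224):
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box                       # assumed output of a RetinaFace-style detector
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    bw, bh = x2 - x1, y2 - y1
    crops = []
    for s in scales:
        nx1 = int(max(0, cx - bw * s / 2))
        ny1 = int(max(0, cy - bh * s / 2))
        nx2 = int(min(w, cx + bw * s / 2))
        ny2 = int(min(h, cy + bh * s / 2))
        crop = image[ny1:ny2, nx1:nx2]
        crops.append(cv2.resize(crop, (out_size, out_size)))
    return crops                               # three 224x224 views of the same card
```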
Step S203: inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copied certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer.
In the embodiment of the application, after the copied picture and the real picture are converted into LBP feature maps, it is found that the LBP feature map of the real identity card and that of the copied identity card show an obvious texture difference. The LBP feature maps of real and copied identity cards are therefore distinguishable, so the LBP feature map is added as a supervision signal during model training; the specific use of the supervision information and the network model are shown in fig. 3.
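For reference, a minimal sketch of the classic 8-neighbour LBP map is given below; it is only an assumption about the exact variant used by the LBPGenerator module, but it illustrates why moire and halftone textures from screens and prints stand out in the LBP domain:

```python
# Textbook 3x3 local binary pattern: each pixel is encoded by comparing its
# 8 neighbours with the centre value, one bit per neighbour.
import numpy as np


def lbp_map(gray):
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                          # centre pixels (borders skipped)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    out = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        out |= (nb >= c).astype(np.int32) << bit
    return out.astype(np.uint8)                # values 0..255
```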
In the embodiment of the present application, considering that the effective discrimination information for a real/copied identity card is not necessarily distributed entirely within the identity card area and may also appear in the background area (such as moire patterns or borders), a backbone network with an attention mechanism (MobilenetV3) is used to extract the picture features, and the network is split into two branches in its latter half: one branch uses the LBPGenerator module to generate the LBP feature map, and the other branch outputs a 512-dimensional feature vector. Feature extraction is performed with this network on the identity card picture expanded at each scale, all the 512-dimensional features (three 512-dimensional features) are concatenated to obtain a 1536-dimensional feature, and finally a 2-dimensional feature vector is obtained through the full connection layer.
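A rough PyTorch sketch of this two-branch arrangement follows. The use of torchvision's MobileNetV3-Small, the lazy layer widths and the LBP-generator head are assumptions for illustration; only the overall shape (three scales, 3 × 512 = 1536-dimensional concatenated feature, a 2-way full connection layer, plus an auxiliary LBP-map branch) follows the description above.

```python
# Sketch of the two-branch network: shared backbone, one head that predicts a
# (coarse) LBP map and one head that produces a 512-d embedding per scale.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small


class CopiedCardNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = mobilenet_v3_small().features   # attention (SE) backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.LazyLinear(512)                  # 512-d branch, one per scale
        self.lbp_head = nn.Sequential(                   # LBPGenerator-style branch
            nn.LazyConv2d(64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),              # coarse 1-channel LBP map
        )
        self.classifier = nn.Linear(512 * 3, 2)          # 1536-d -> real / copied

    def forward(self, crops):                            # crops: 3 tensors of shape (B, 3, 224, 224)
        feats, lbp_maps = [], []
        for x in crops:
            f = self.backbone(x)
            lbp_maps.append(self.lbp_head(f))
            feats.append(self.embed(self.pool(f).flatten(1)))
        return self.classifier(torch.cat(feats, dim=1)), lbp_maps
```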
In the embodiment of the present application, two kinds of loss functions are adopted during model training: an LBP feature loss function and a softmax loss function.
Step S204: and acquiring an image of the certificate to be identified.
In the embodiment of the application, the certificate image to be identified may be collected in real time by an image acquisition terminal, or it may be obtained from data carrying the certificate image to be identified that is sent by a user terminal.
Step S205: and inputting the certificate image to be identified into an LBP generator module for LBP characteristic conversion to obtain an LBP characteristic diagram of the certificate to be identified.
Step S206: and inputting the LBP characteristic diagram of the certificate to be identified into a copied certificate identification model based on the LBP characteristics to carry out texture characteristic identification operation, so as to obtain a texture characteristic identification result.
Step S207: and if the texture feature identification result has obvious texture difference features, determining the certificate to be identified as a copied certificate.
Step S208: and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is the original certificate.
In the embodiment of the application, after the real identity card picture and the copied identity card picture are converted into the LBP characteristic graph, the LBP characteristic graph of the copied identity card has obvious texture characteristics, but the LBP characteristic graph of the real identity card has no obvious characteristics.
In an embodiment of the present application, a method for identifying a copied document is provided, including: acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture; preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image; inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copied certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer; acquiring a certificate image to be identified; inputting the certificate image to be identified into the LBPGenerator module for LBP characteristic conversion to obtain an LBP characteristic diagram of the certificate to be identified; inputting the LBP characteristic diagram of the certificate to be recognized into the copied certificate recognition model based on the LBP characteristics to perform a texture characteristic recognition operation, and obtaining a texture characteristic recognition result; if the texture feature identification result has obvious texture difference features, confirming that the certificate to be identified is a copied certificate; and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is the original certificate. Compared with the prior art, when the copied document recognition model is trained, an LBP feature map is introduced as auxiliary information for model training, and the training data are expanded around the certificate frame at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed distinguishing clues can be dynamically adapted to; the trained model can then recognize a document image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied document recognition model.
Continuing to refer to fig. 4, a flowchart of one embodiment of step S202 of fig. 2 is shown, and for ease of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, the step S202 specifically includes: step S401 and step S402.
Step S401: and carrying out size adjustment processing on the positive sample image and the negative sample image according to a mode of keeping the length-width ratio unchanged to obtain a standard-size positive sample image and a standard-size negative sample image.
In the embodiment of the present application, the resizing process refers to adjusting the size of a sample image while always keeping its aspect ratio unchanged. Specifically, image enlargement may use interpolation, that is, a suitable interpolation algorithm inserts new elements between pixel points based on the original image pixels; interpolation algorithms include, for example, edge-based image interpolation algorithms, region-based image interpolation algorithms, and other known or future-developed algorithms. Image reduction may be interpolated using, for example, CV_INTER_AREA (OpenCV's area-based interpolation).
Step S402: and segmenting the standard-size positive sample image and the standard-size negative sample image according to a preset size to obtain an initial positive sample image and an initial negative sample image.
In the embodiment of the present application, since the input of the quality detection model needs to satisfy a certain size requirement, when an oversized image is input to the quality detection model, the processing efficiency and the processing accuracy of the model are affected, so that the input data of the quality detection model conforms to the requirement of the model by performing the normalization operation on the positive sample image or the negative sample image through the size adjustment processing and the segmentation processing, where the preset size may be dynamically adjusted according to the actual situation, for example, the preset size may be 224 × 224, and it should be understood that the example of the preset size is only for convenience of understanding and is not used to limit the present application.
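The normalisation in steps S401 and S402 can be sketched as follows (the interpolation flags follow OpenCV, echoing the CV_INTER_AREA remark above; the tiling policy and the 224 × 224 size are assumptions):

```python
# Resize while preserving the aspect ratio, then cut into fixed 224x224 patches.
import cv2


def resize_keep_aspect(img, target=224):
    h, w = img.shape[:2]
    scale = target / min(h, w)
    interp = cv2.INTER_AREA if scale < 1 else cv2.INTER_LINEAR
    return cv2.resize(img, (int(round(w * scale)), int(round(h * scale))),
                      interpolation=interp)


def split_to_patches(img, size=224):
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]
```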
Continuing to refer to fig. 5, a flowchart of one embodiment of step S203 of fig. 2 is shown, and for ease of illustration, only the portions relevant to the present application are shown.
In some optional implementation manners of this embodiment, the step S203 specifically includes: step S501, step S502, step S503, and step S504.
Step S501: a Fourier transformation operation is carried out on the initial sample data to obtain reference characteristic data F_G.
Step S502: according to the LBPGenerator module, an LBP characteristic prediction operation is carried out on the initial sample data to obtain predicted characteristic data F_p.
Step S503: from the reference characteristic data F_G and the predicted characteristic data F_p, a characteristic loss function L_LBP is constructed, the characteristic loss function L_LBP being expressed as:

$L_{LBP} = \lVert F_p - F_G \rVert_2^2$

wherein F_p represents the predicted characteristic data and F_G represents the reference characteristic data.
Step S504: the LBPGenerator module is trained according to the characteristic loss function L_LBP until the LBPGenerator module converges, so as to obtain the trained LBPGenerator module.
In the embodiment of the present application, the loss function L2 of the LBPGenerator module performs a loss calculation on the difference between the actual LBP feature map and the generated LBP feature map, so that the trained LBPGenerator module can predict the same LBP image features as the real feature map.
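The optimiser, epoch count and data pipeline are not specified, so the loop below is only a toy sketch of fitting the LBPGenerator branch with this L2 objective until it converges (Adam and mean-squared error are assumptions):

```python
# Minimise the L2 difference between predicted LBP maps F_p and reference maps F_G.
import torch
import torch.nn.functional as F


def train_lbp_generator(module, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(module.parameters(), lr=lr)
    for _ in range(epochs):
        for images, f_g in loader:        # f_g: reference feature data F_G
            f_p = module(images)          # predicted feature data F_p
            loss = F.mse_loss(f_p, f_g)   # L_LBP = ||F_p - F_G||^2 (averaged)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return module
```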
In some alternative implementations of this embodiment, the loss function L_softmax of the full connection layer is expressed as:

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}}$

wherein y_i represents the real label of the sample, f_j represents the confidence of the j-th output class, and N represents the number of training samples.
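Putting the two objectives together gives a total loss of the following shape; the weighting factor lambda_lbp is an assumption, as the embodiment only states that both losses are used:

```python
# Joint loss: softmax cross-entropy on the 2-way output plus the LBP L2 term.
import torch.nn.functional as F


def total_loss(logits, labels, lbp_pred, lbp_ref, lambda_lbp=1.0):
    l_softmax = F.cross_entropy(logits, labels)   # softmax loss over the batch
    l_lbp = F.mse_loss(lbp_pred, lbp_ref)         # ||F_p - F_G||^2, averaged
    return l_softmax + lambda_lbp * l_lbp
```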
Continuing to refer to FIG. 6, a flow chart of one embodiment of acquiring a document sub-image to be identified is shown according to an embodiment of the present application, and for convenience of illustration, only the portion relevant to the present application is shown.
In some optional implementations of this embodiment, after step S104 and before step S105, the method further includes: step S601, step S602, and step S603, wherein step S105 specifically includes: step S604.
Step S601: and carrying out self-adaptive binarization operation on the certificate image to be identified to obtain a binarization certificate image.
In the embodiment of the application, the binarization algorithm takes a threshold value: a pixel point whose value is greater than the threshold is assigned 1 and displayed as white, and a pixel point whose value is less than the threshold is assigned 0 and displayed as black. How well this threshold is chosen directly affects the effectiveness.
In the embodiment of the present application, the adaptive binarization threshold may be an average value of the whole picture; the adaptive binarization threshold value can also be the average value of all pixel points of the row and the column where each pixel point is located; the adaptive binarization threshold may also be an average value of all pixel points in a matrix region where each pixel point is located, and it should be understood that the example of the adaptive binarization threshold herein is only for convenience of understanding and is not used to limit the present application.
In some optimized embodiments of the application, the document image to be recognized may be divided into smooth regions (the gray values of the pixel points fall within a small interval and present a distribution curve peaked in the middle) and detail regions (the gray values of the pixel points are spread over a large interval and present an extreme distribution). When performing the matrix binarization processing, a parameter β is added to the comparison between a pixel point and the mean value of all the pixel points in the matrix, i.e. if (a < mean × β), where a denotes the value of the pixel point and mean denotes the pixel mean value.
In the embodiment of the present application, the parameter β may be 0.9.
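A compact sketch of such a mean-based adaptive binarization with the β margin is shown below; the window size is an assumption, and OpenCV's adaptiveThreshold with ADAPTIVE_THRESH_MEAN_C offers a comparable built-in variant:

```python
# A pixel is set to black when it falls below beta times its local mean.
import cv2
import numpy as np


def adaptive_binarize(gray, window=31, beta=0.9):
    local_mean = cv2.blur(gray.astype(np.float32), (window, window))
    dark = gray.astype(np.float32) < local_mean * beta
    return np.where(dark, 0, 255).astype(np.uint8)
```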
Step S602: and carrying out corrosion expansion operation on the binary certificate image to obtain a certificate image sideline.
In the embodiment of the application, the erosion dilation operation refers to deleting pixels of the boundary of the binarized document image (erosion part, i.e. white area is made one circle thinner), and adding pixels to the boundary of the binarized document image (dilation part, i.e. white area is made one circle thicker), and the size of the circle is specified by a parameter, and a user can set the value of the parameter according to actual conditions.
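The edge line can then be pulled out roughly as below; taking the difference between the dilated and the eroded image (a morphological gradient) is one common way to obtain the certificate border and is an assumption rather than the patent's exact procedure, with the kernel size and iterations as the user-tunable parameters mentioned above:

```python
# Erode and dilate the binarized certificate image, then subtract to keep edges.
import cv2
import numpy as np


def certificate_edges(binary_img, ksize=3, iterations=1):
    kernel = np.ones((ksize, ksize), np.uint8)
    eroded = cv2.erode(binary_img, kernel, iterations=iterations)    # white areas shrink
    dilated = cv2.dilate(binary_img, kernel, iterations=iterations)  # white areas grow
    return cv2.subtract(dilated, eroded)                             # remaining band ~= edge line
```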
Step S603: and carrying out segmentation operation on the certificate image to be identified according to the certificate image sideline to obtain a certificate subimage to be identified.
Step S604: and carrying out LBP feature conversion on the certificate subimage to be identified to obtain an LBP feature map of the certificate to be identified.
In the embodiment of the application, because the size of a certificate image is irregular and may even be oversized, performing the text box detection operation directly on an oversized certificate image seriously affects the efficiency and accuracy of the detection. Therefore, the certificate image to be identified is divided into targeted areas by performing the adaptive binarization operation, the corrosion expansion operation and the segmentation operation on it, so that a small-size image whose main area is text is obtained, which effectively ensures the efficiency and accuracy of the subsequent text box detection operation.
Continuing to refer to fig. 7, a flowchart of a specific implementation of obtaining a target gradient map prediction model provided in an embodiment of the present application is shown, and for convenience of illustration, only the portion related to the present application is shown.
In some optional implementations of this embodiment, after step S202 and before step S203, the method further includes: step S701 and step S702, wherein step S203 specifically includes: step S703.
Step S701: and carrying out image enhancement operation on the initial positive sample image and the initial negative sample image to obtain an enhanced positive sample image and an enhanced negative sample image.
In the embodiment of the present application, the image enhancement operation is implemented as follows: randomly crop 4 pictures, namely pictures 1, 2, 3 and 4 (random_cut), then randomly splice and combine the 4 cropped pictures (random_combine) to obtain 1 mixed picture, and finally resize the mixed picture to a fixed size as the input image of the initial gradient map prediction model.
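A hedged sketch of this Mosaic-style mixing is given below; the crop size, the 2 × 2 layout and the output size are illustrative choices:

```python
# Random-crop four pictures, stitch them 2x2, and resize to a fixed input size.
import random

import cv2
import numpy as np


def mosaic(imgs, out_size=224):
    assert len(imgs) == 4
    half = out_size // 2
    tiles = []
    for img in imgs:
        h, w = img.shape[:2]
        y = random.randint(0, max(0, h - half))
        x = random.randint(0, max(0, w - half))
        crop = img[y:y + half, x:x + half]            # random_cut
        tiles.append(cv2.resize(crop, (half, half)))
    mixed = np.vstack([np.hstack(tiles[:2]),          # random_combine
                       np.hstack(tiles[2:])])
    return cv2.resize(mixed, (out_size, out_size))
```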
Step S702: and performing gradient map calculation operation on the enhanced positive sample image and the enhanced negative sample image according to a two-dimensional discrete function derivation method to obtain a positive sample gradient map and a negative sample gradient map.
In the embodiment of the present application, the image may be regarded as a two-dimensional discrete function, and the image gradient is simply the derivative of this discrete function. Specifically, the method comprises the following steps:
image gradient G(x, y) = dx(i, j) + dy(i, j);
dx(i, j) = I(i+1, j) - I(i, j);
dy(i, j) = I(i, j+1) - I(i, j);
where I is the value of an image pixel (e.g., an RGB value) and (i, j) are the pixel's coordinates.
Image gradients can also be computed with central differences:
dx(i, j) = [I(i+1, j) - I(i-1, j)] / 2;
dy(i, j) = [I(i, j+1) - I(i, j-1)] / 2;
Image edges are typically obtained by performing such gradient operations on the image.
Step S703: and inputting the positive sample gradient map and the negative sample gradient map into the initial model to perform model training operation, so as to obtain the copied certificate recognition model based on the LBP characteristics.
In the embodiment of the application, the image enhancement algorithm Mosaic is introduced, which enriches the feature content of the images fed into the model and broadens the model's training coverage. This effectively reduces misrecognition of low-quality documents and, at the same time, improves the recall of images to be detected that are shot in difficult scenes such as harsh lighting, thereby improving the overall accuracy and recall rate of document quality recognition.
In summary, the present application provides a method for recognizing a copied document, including: acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture; preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image; inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copied certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer; acquiring a certificate image to be identified; inputting the certificate image to be identified into the LBPGenerator module for LBP characteristic conversion to obtain an LBP characteristic diagram of the certificate to be identified; inputting the LBP characteristic diagram of the certificate to be identified into the copied certificate identification model based on the LBP characteristics to carry out a texture characteristic identification operation, and obtaining a texture characteristic identification result; if the texture feature identification result has obvious texture difference features, confirming that the certificate to be identified is a copied certificate; and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is the original certificate. Compared with the prior art, when the copied document recognition model is trained, an LBP feature map is introduced as auxiliary information for model training, and the training data are expanded around the certificate frame at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed distinguishing clues can be dynamically adapted to; the trained model can then recognize a document image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied document recognition model. Meanwhile, the adaptive binarization operation, the corrosion expansion operation and the segmentation operation are carried out on the certificate image to be identified, so that targeted area division is performed on it, a small-size image with text as its main area is obtained, and the efficiency and accuracy of the subsequent text box detection operation are effectively ensured. In addition, the image enhancement algorithm Mosaic is introduced, which enriches the feature content of the images fed into the model, broadens the model's training coverage, effectively reduces misrecognition of low-quality documents, improves the recall of images to be detected that are shot in difficult scenes such as harsh lighting, and improves the overall accuracy and recall rate of document quality recognition.
It is emphasized that, in order to further ensure the privacy and security of the image of the document to be recognized and the recognition result of the document, the image of the document to be recognized and the recognition result of the document may also be stored in a node of a block chain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, can include processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless otherwise indicated herein. Moreover, at least a portion of the steps in the flow chart of the figure may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed alternately or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
Example two
With further reference to fig. 8, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a device for recognizing a copied document, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied to various electronic devices.
As shown in fig. 8, the identification device 200 of the copy document of the present embodiment includes: the system comprises a sample data acquisition module 210, a preprocessing module 220, a model training module 230, a certificate to be recognized acquisition module 240, an LBP feature conversion module 250, a texture feature recognition module 260, a first result module 270 and a second result module 280. Wherein:
a sample data obtaining module 210, configured to obtain sample data, where the sample data includes a positive sample image labeled with a real picture and a negative sample image labeled with a copied picture;
the preprocessing module 220 is configured to perform preprocessing operation on the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image;
the model training module 230 is configured to input the initial positive sample image and the initial negative sample image to an initial model for model training operation, so as to obtain a copied document recognition model based on LBP features, where the initial model is composed of a MobilenetV3 network, an LBPGenerator module, a feature extraction module, and a full connection layer;
the certificate to be identified acquisition module 240 is used for acquiring a certificate image to be identified;
the LBP feature conversion module 250 is used for inputting the certificate image to be identified to the LBPGenerator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified;
the texture feature recognition module 260 is used for inputting the LBP feature map of the certificate to be recognized into a copied certificate recognition model based on the LBP features to perform texture feature recognition operation, so as to obtain a texture feature recognition result;
the first result module 270 is configured to determine that the certificate to be identified is a copied certificate if the texture feature identification result has an obvious texture difference feature;
the second result module 280 is configured to determine that the certificate to be recognized is the original certificate if the texture feature recognition result does not have obvious texture difference features.
In the embodiment of the application, after the real identity card picture and the copied identity card picture are converted into the LBP characteristic graph, the LBP characteristic graph of the copied identity card has obvious texture characteristics, but the LBP characteristic graph of the real identity card has no obvious characteristics.
In this embodiment of the application, the preprocessing operation may be to use an identification card detector (Retinaface) to obtain an identification card coordinate frame in the image, then expand the identification card frame by 1 time, 1.5 times, and 3 times according to a certain ratio, and scale the identification card pictures with different background sizes to a size of 224 × 224.
In the embodiment of the application, after the copied picture and the real picture are converted into LBP feature maps, it is found that the LBP feature map of the real identity card and that of the copied identity card show an obvious texture difference. The LBP feature maps of real and copied identity cards are therefore distinguishable, so the LBP feature map is added as a supervision signal during model training; the specific use of the supervision information and the network model are shown in fig. 3.
In the embodiment of the present application, considering that the effective discrimination information for a real/copied identity card is not necessarily distributed entirely within the identity card area and may also appear in the background area (such as moire patterns or borders), a backbone network with an attention mechanism (MobilenetV3) is used to extract the picture features, and the network is split into two branches in its latter half: one branch uses the LBPGenerator module to generate the LBP feature map, and the other branch outputs a 512-dimensional feature vector. Feature extraction is performed with this network on the identity card picture expanded at each scale, all the 512-dimensional features (three 512-dimensional features) are concatenated to obtain a 1536-dimensional feature, and finally a 2-dimensional feature vector is obtained through the full connection layer.
In the embodiment of the present application, two kinds of loss functions are adopted during model training: an LBP feature loss function and a softmax loss function.
In the embodiment of the application, the certificate image to be identified may be collected in real time by an image acquisition terminal, or it may be obtained from data carrying the certificate image to be identified that is sent by a user terminal.
In an embodiment of the present application, there is provided a device 200 for recognizing a copied document, including: the sample data obtaining module 210 is configured to obtain sample data, where the sample data includes a positive sample image labeled with a real picture and a negative sample image labeled with a copied picture; the preprocessing module 220 is configured to perform a preprocessing operation on the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image; the model training module 230 is configured to input the initial positive sample image and the initial negative sample image to an initial model for model training operation, so as to obtain a copied document recognition model based on LBP features, where the initial model is composed of a MobilenetV3 network, an LBPGenerator module, a feature extraction module, and a full connection layer; the certificate to be identified acquisition module 240 is used for acquiring a certificate image to be identified; the LBP feature conversion module 250 is used for inputting the certificate image to be identified to the LBPGenerator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified; the texture feature recognition module 260 is used for inputting the LBP feature map of the certificate to be recognized into the copied certificate recognition model based on the LBP features to perform a texture feature recognition operation, so as to obtain a texture feature recognition result; the first result module 270 is configured to determine that the certificate to be identified is a copied certificate if the texture feature identification result has an obvious texture difference feature; and the second result module 280 is configured to determine that the certificate to be recognized is the original certificate if the texture feature recognition result does not have obvious texture difference features. According to the method and the device, when the copied document identification model is trained, an LBP feature map is introduced as auxiliary information for model training, and the training data are expanded around the certificate frame at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed distinguishing clues can be dynamically adapted to; the trained model can then identify a document image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied document identification model.
Continuing to refer to FIG. 9, a schematic diagram of one embodiment of the pre-processing module 220 of FIG. 8 is shown, only relevant portions of which are shown for ease of illustration.
In some optional implementations of this embodiment, the preprocessing module 220 further includes: a size adjustment sub-module 221 and a segmentation sub-module 222, wherein:
the size adjusting sub-module 221 is configured to perform size adjustment processing on the positive sample image and the negative sample image in a manner that the length-width ratio is not changed, so as to obtain a standard-size positive sample image and a standard-size negative sample image;
the segmentation submodule 222 is configured to perform segmentation processing on the standard-size positive sample image and the standard-size negative sample image according to a preset size, so as to obtain an initial positive sample image and an initial negative sample image.
In the embodiment of the present application, the resizing process refers to adjusting the size of a sample image while always keeping its aspect ratio unchanged. Specifically, image enlargement may use interpolation, that is, a suitable interpolation algorithm inserts new elements between pixel points based on the original image pixels; interpolation algorithms include, for example, edge-based image interpolation algorithms, region-based image interpolation algorithms, and other known or future-developed algorithms. Image reduction may be interpolated using, for example, CV_INTER_AREA (OpenCV's area-based interpolation).
In the embodiment of the present application, since the input of the quality detection model needs to satisfy a certain size requirement, when an oversized image is input to the quality detection model, the processing efficiency and the processing accuracy of the model are affected, so that the input data of the quality detection model conforms to the requirement of the model by performing the normalization operation on the positive sample image or the negative sample image through the size adjustment processing and the segmentation processing, where the preset size may be dynamically adjusted according to the actual situation, for example, the preset size may be 224 × 224, and it should be understood that the example of the preset size is only for convenience of understanding and is not used to limit the present application.
In some optional implementations of this embodiment, the model training module 230 includes: a Fourier transform submodule, an LBP feature prediction submodule, a loss function construction submodule and a model training submodule, wherein:
the Fourier transform submodule is used for performing a Fourier transform operation on the initial sample data to obtain reference characteristic data F_G;
the LBP feature prediction submodule is used for performing an LBP feature prediction operation on the initial sample data according to the LBPGenerator module to obtain predicted characteristic data F_p;
the loss function construction submodule is used for constructing a characteristic loss function L_LBP from the reference characteristic data F_G and the predicted characteristic data F_p, the characteristic loss function L_LBP being expressed as:

$L_{LBP} = \lVert F_p - F_G \rVert_2^2$

wherein F_p represents the predicted characteristic data and F_G represents the reference characteristic data;
and the model training submodule is used for training the LBPGenerator module according to the characteristic loss function L_LBP until the LBPGenerator module converges, so as to obtain the trained LBPGenerator module.
In the embodiment of the present application, the loss function L2 of the LBPGenerator module performs a loss calculation on the difference between the actual LBP feature map and the generated LBP feature map, so that the trained LBPGenerator module can predict the same LBP image features as the real feature map.
In some alternative implementations of this embodiment, the loss function L_softmax of the full connection layer is expressed as:

$L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{f_{y_i}}}{\sum_{j} e^{f_j}}$

wherein y_i represents the real label of the sample, f_j represents the confidence of the j-th output class, and N represents the number of training samples.
In some optional implementations of the embodiment, the above-mentioned copied document identification apparatus 200 further includes a binarization module, a corrosion expansion module and a segmentation module, and the above-mentioned LBP feature conversion module includes an LBP feature conversion submodule, wherein:
the binarization module is used for carrying out an adaptive binarization operation on the certificate image to be identified to obtain a binarized certificate image;
the erosion-dilation module is used for carrying out an erosion-dilation operation on the binarized certificate image to obtain the certificate image edge lines;
the segmentation module is used for carrying out a segmentation operation on the certificate image to be identified according to the certificate image edge lines to obtain a certificate subimage to be identified;
and the LBP feature conversion submodule is used for carrying out LBP feature conversion on the certificate subimage to be identified to obtain the LBP feature map of the certificate to be identified.
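The application does not specify which LBP variant the conversion uses; as a hedged sketch, a certificate LBP feature map could be computed with scikit-image's local_binary_pattern, where the neighbourhood size P, the radius R, and the 'uniform' method below are illustrative assumptions.

```python
import cv2
from skimage.feature import local_binary_pattern

def lbp_feature_map(image_path, P=8, R=1):
    """Compute an LBP texture map from a grayscale certificate image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # 'uniform' LBP keeps the rotation-robust uniform patterns commonly used for texture analysis.
    return local_binary_pattern(gray, P, R, method="uniform")
```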
In the embodiment of the application, the binarization algorithm takes a threshold value: a pixel point whose value is greater than the threshold is assigned 1 and displayed as white, and a pixel point whose value is less than the threshold is assigned 0 and displayed as black. The quality of this threshold choice directly affects the binarization result.
In the embodiment of the present application, the adaptive binarization threshold may be an average value of the whole picture; the adaptive binarization threshold value can also be the average value of all pixel points of the row and the column where each pixel point is located; the adaptive binarization threshold may also be an average value of all pixel points in a matrix region where each pixel point is located, and it should be understood that the example of the adaptive binarization threshold herein is only for convenience of understanding and is not used to limit the present application.
In some optimized embodiments of the application, the certificate image to be recognized may be divided into smooth regions (the gray values of the pixel points fall within a small interval and present a distribution curve peaked in the middle) and detail regions (the gray values of the pixel points are spread over a large interval and present an extreme distribution). In the "matrix binarization processing", a parameter β is added when comparing each pixel point with the mean value of all pixel points in its matrix region, that is, if (a < mean × β), where a represents the value of the pixel point being compared and mean represents the mean value of the pixel points in the matrix region.
In the embodiment of the present application, the parameter β may be 0.9.
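A hedged Python sketch of this β-adjusted matrix binarization follows; the 31-pixel window and the cv2.boxFilter-based local mean are assumptions for illustration, with β = 0.9 taken from the paragraph above.

```python
import cv2
import numpy as np

def matrix_binarize(gray, window=31, beta=0.9):
    """Adaptive binarization: compare each pixel a with beta times the local mean.

    A pixel becomes 0 (black) when a < mean * beta, otherwise 255 (white).
    """
    local_mean = cv2.boxFilter(gray.astype(np.float32), ddepth=-1, ksize=(window, window))
    binary = np.where(gray.astype(np.float32) < local_mean * beta, 0, 255)
    return binary.astype(np.uint8)

# Usage:
# gray = cv2.imread("certificate.jpg", cv2.IMREAD_GRAYSCALE)
# binary = matrix_binarize(gray, window=31, beta=0.9)
```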
In the embodiment of the application, the erosion-dilation operation refers to deleting pixels on the boundary of the binarized certificate image (the erosion part, i.e. making the white area one circle thinner) and adding pixels to the boundary of the binarized certificate image (the dilation part, i.e. making the white area one circle thicker); the size of the circle is specified by a parameter, and a user can set the value of the parameter according to actual conditions.
In the embodiment of the application, because the size of the certificate image is irregular, and the certificate image may even be oversized, performing the certificate frame detection operation on an oversized certificate image seriously affects the efficiency and accuracy of the detection. Therefore, a targeted area division is carried out on the certificate image to be identified by performing the adaptive binarization operation, the erosion-dilation operation and the segmentation operation on it, so that a small-size image with the text as the main area is obtained, which effectively ensures the efficiency and accuracy of the subsequent certificate frame detection operation.
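As a rough OpenCV sketch of the erosion-dilation and segmentation steps (the kernel size, the iteration counts, and the use of findContours to recover the certificate edge lines are assumptions, not the application's exact method):

```python
import cv2
import numpy as np

def crop_certificate_region(binary, original):
    """Erode then dilate the binarized image, find the dominant contour, and crop it from the original."""
    kernel = np.ones((5, 5), np.uint8)
    eroded = cv2.erode(binary, kernel, iterations=1)    # thin the white area by one "circle"
    dilated = cv2.dilate(eroded, kernel, iterations=2)  # thicken it back, closing small gaps
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return original
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return original[y:y + h, x:x + w]
```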
In some optional implementations of the embodiment, the above-mentioned copied document identification apparatus 200 further includes an image enhancement module and a gradient map calculation module, and the model training module 230 includes a gradient map model training submodule, wherein:
the image enhancement module is used for carrying out image enhancement operation on the initial positive sample image and the initial negative sample image to obtain an enhanced positive sample image and an enhanced negative sample image;
the gradient map calculation module is used for performing gradient map calculation operation on the enhanced positive sample image and the enhanced negative sample image according to a two-dimensional discrete function derivation method to obtain a positive sample gradient map and a negative sample gradient map;
and the gradient map model training submodule is used for inputting the positive sample gradient map and the negative sample gradient map into the initial model to perform model training operation, so that the copied certificate recognition model based on the LBP characteristics is obtained.
In the embodiment of the present application, the image enhancement operation is implemented as follows: 4 pictures, namely pictures 1, 2, 3 and 4, are each randomly cropped (random_cut), the 4 cropped pictures are then randomly spliced and combined (random_combine) into 1 mixed picture, and the mixed picture is finally resized to a fixed size as an input image of the initial gradient map prediction model.
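A minimal sketch of such a Mosaic-style enhancement is shown below, assuming a 2 × 2 splicing layout and half-size random crops; both choices are assumptions for illustration.

```python
import random
import cv2
import numpy as np

def mosaic_augment(images, out_size=448):
    """Randomly crop 4 images and splice them into one 2x2 mixed picture (random_cut + random_combine)."""
    assert len(images) == 4
    half = out_size // 2
    tiles = []
    for img in images:
        h, w = img.shape[:2]
        # random_cut: take a random half-size crop from each source picture.
        y = random.randint(0, max(h - half, 0))
        x = random.randint(0, max(w - half, 0))
        crop = img[y:y + half, x:x + half]
        tiles.append(cv2.resize(crop, (half, half)))
    random.shuffle(tiles)  # random_combine: shuffle the tile order before splicing.
    top = np.hstack(tiles[:2])
    bottom = np.hstack(tiles[2:])
    return np.vstack([top, bottom])  # the mixed picture already has the fixed out_size x out_size shape
```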
In the embodiment of the present application, the image may be regarded as a two-dimensional discrete function, and the image gradient is the derivative of that two-dimensional discrete function. Specifically:
image gradient G(x, y) = dx(i, j) + dy(i, j);
dx(i,j)=I(i+1,j)-I(i,j);
dy(i,j)=I(i,j+1)-I(i,j);
where I is the value of an image pixel (e.g., its RGB value) and (i, j) are the pixel's coordinates.
Image gradients can also commonly be computed by the central difference:
dx(i,j)=[I(i+1,j)-I(i-1,j)]/2;
dy(i,j)=[I(i,j+1)-I(i,j-1)]/2;
Image edge detection is typically realized by performing gradient operations on the image.
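As a hedged NumPy illustration of these forward-difference and central-difference gradients (the function names are assumptions):

```python
import numpy as np

def image_gradient(I):
    """Forward differences: dx(i,j) = I(i+1,j) - I(i,j), dy(i,j) = I(i,j+1) - I(i,j)."""
    I = I.astype(np.float32)
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[:-1, :] = I[1:, :] - I[:-1, :]
    dy[:, :-1] = I[:, 1:] - I[:, :-1]
    return dx + dy  # G(x, y) = dx(i, j) + dy(i, j)

def central_gradient(I):
    """Central differences: dx(i,j) = [I(i+1,j) - I(i-1,j)] / 2, dy(i,j) = [I(i,j+1) - I(i,j-1)] / 2."""
    I = I.astype(np.float32)
    dx = np.zeros_like(I)
    dy = np.zeros_like(I)
    dx[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0
    dy[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0
    return dx, dy
```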
In the embodiment of the application, the image enhancement algorithm Mosaic is introduced, which increases the feature content of the images input to the model and broadens the model's training; this effectively reduces misrecognition caused by low document quality, increases the recall of images to be detected that are shot in difficult scenes such as harsh lighting, and thereby improves the overall accuracy and recall rate of document quality recognition.
In summary, the present application provides a device 200 for recognizing copied documents, comprising: the sample data obtaining module 210, configured to obtain sample data, where the sample data includes a positive sample image labeled as a real picture and a negative sample image labeled as a copied picture; the preprocessing module 220, configured to perform a preprocessing operation on the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image; the model training module 230, configured to input the initial positive sample image and the initial negative sample image to an initial model for a model training operation to obtain a copied document recognition model based on LBP features, where the initial model consists of a MobilenetV3 network, an LBPGenerator module, a feature extraction module and a full connection layer; the certificate-to-be-identified acquisition module 240, configured to acquire a certificate image to be identified; the LBP feature conversion module 250, configured to input the certificate image to be identified to the LBPGenerator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified; the texture feature recognition module 260, configured to input the LBP feature map of the certificate to be recognized into the copied certificate recognition model based on LBP features for a texture feature recognition operation to obtain a texture feature recognition result; the first result module 270, configured to determine that the certificate to be identified is a copied certificate if the texture feature recognition result shows obvious texture difference features; and the second result module 280, configured to determine that the certificate to be identified is an original certificate if the texture feature recognition result does not show obvious texture difference features. In the present application, when the copied certificate recognition model is trained, the LBP feature map is introduced as auxiliary information for model training, and the training data is subjected to margin-expansion training at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed discrimination clues can be dynamically adapted to; the trained model can therefore recognize the certificate image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied certificate recognition model.
Meanwhile, the adaptive binarization operation, the erosion-dilation operation and the segmentation operation are performed on the certificate image to be identified, so that a targeted area division is carried out on the certificate image to be identified and a small-size image with the text as the main area is obtained, which effectively ensures the efficiency and accuracy of the subsequent text box detection operation; in addition, the image enhancement algorithm Mosaic is introduced, which increases the feature content of the images input to the model and broadens the model's training, effectively reducing misrecognition caused by low document quality while increasing the recall of images to be detected that are shot in difficult scenes such as harsh lighting, thereby improving the overall accuracy and recall rate of document quality recognition.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 10, fig. 10 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 300 includes a memory 310, a processor 320, and a network interface 330 communicatively coupled to each other via a system bus. It is noted that only the computer device 300 having components 310-330 is shown, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 310 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 310 may be an internal storage unit of the computer device 300, such as a hard disk or a memory of the computer device 300. In other embodiments, the memory 310 may also be an external storage device of the computer device 300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 300. Of course, the memory 310 may also include both internal and external storage devices of the computer device 300. In this embodiment, the memory 310 is generally used for storing an operating system and various application software installed on the computer device 300, such as computer readable instructions of a copy identification method. In addition, the memory 310 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 320 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 320 is generally operative to control overall operation of the computer device 300. In this embodiment, the processor 320 is configured to execute computer readable instructions or process data stored in the memory 310, for example, execute computer readable instructions of the identification method of the copied document.
The network interface 330 may include a wireless network interface or a wired network interface, and the network interface 330 is generally used to establish a communication connection between the computer device 300 and other electronic devices.
The application provides a computer device. When the copied certificate recognition model is trained, the LBP feature map is introduced as auxiliary information for model training, and the training data is subjected to margin-expansion training at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed discrimination clues can be dynamically adapted to; the trained model can therefore recognize the certificate image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied certificate recognition model.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the method for recognizing copied documents as described above.
When the copied certificate recognition model is trained, the computer-readable storage medium introduces the LBP feature map as auxiliary information for model training and subjects the training data to margin-expansion training at different magnifications such as 1, 1.5 and 3 times, so that part of the background information outside the identity card area can be utilized and dispersed discrimination clues can be dynamically adapted to; the trained model can therefore recognize the certificate image according to the texture feature information implied by the LBP feature map, which effectively enhances the robustness of the copied certificate recognition model.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely exemplary of some, and not all, embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the claims appended hereto. This application is capable of embodiments in many different forms and is provided for the purpose of enabling a thorough understanding of the disclosure of the application. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the present application may be practiced without modification or with equivalents of some of the features described in the foregoing embodiments. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. A method for identifying a copied certificate is characterized by comprising the following steps:
acquiring sample data, wherein the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture;
preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image;
inputting the initial positive sample image and the initial negative sample image into an initial model for model training operation to obtain a copying certificate identification model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer;
acquiring a certificate image to be identified;
inputting the certificate image to be identified into the LBP generator module for LBP feature conversion to obtain an LBP feature map of the certificate to be identified;
inputting the LBP characteristic diagram of the certificate to be identified into the copied certificate identification model based on the LBP characteristics for texture characteristic identification operation to obtain a texture characteristic identification result;
if the texture feature identification result has obvious texture difference features, confirming that the certificate to be identified is a copied certificate;
and if the texture feature identification result does not have obvious texture difference features, determining that the certificate to be identified is an original certificate.
2. The method for identifying the copied document according to claim 1, wherein the step of preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image includes the following steps:
carrying out size adjustment processing on the positive sample image and the negative sample image according to a mode that the length-width ratio is not changed to obtain a standard-size positive sample image and a standard-size negative sample image;
and segmenting the standard-size positive sample image and the standard-size negative sample image according to a preset size to obtain the initial positive sample image and the initial negative sample image.
3. The method for recognizing the copied document according to claim 1, wherein the step of inputting the initial sample data to an initial model for model training to obtain the copied document recognition model based on the LBP feature includes the following steps:
carrying out a Fourier transform operation on the initial sample data to obtain reference characteristic data F_G;
carrying out an LBP characteristic prediction operation on the initial sample data according to the LBP generator module to obtain predicted characteristic data F_p;
constructing a characteristic loss function L_LBP according to the reference characteristic data F_G and the predicted characteristic data F_p, the characteristic loss function L_LBP being expressed as:
L_LBP = ||F_p - F_G||_2^2
wherein F_p represents the predicted characteristic data and F_G represents the reference characteristic data;
according to the characteristic loss function LLBPAnd training the LBPGenerator module until the LBPGenerator module is converged to obtain the trained LBPGenerator module.
4. The method for identifying a copied certificate according to claim 1, wherein the loss function L_softmax of the full connection layer is expressed as:
L_softmax = -(1/N) · Σ_i log( exp(f_{y_i}) / Σ_j exp(f_j) )
wherein y_i represents the real label of the i-th sample, f_j represents the confidence of the j-th output class, and N represents the number of training samples.
5. The identification method of the copied document as claimed in claim 1, wherein after the step of obtaining the image of the document to be identified and before the step of performing the LBP feature conversion on the image of the document to be identified to obtain the LBP feature map of the document to be identified, the method further comprises the following steps:
carrying out an adaptive binarization operation on the certificate image to be identified to obtain a binarized certificate image;
carrying out an erosion-dilation operation on the binarized certificate image to obtain the certificate image edge lines;
carrying out a segmentation operation on the certificate image to be identified according to the certificate image edge lines to obtain a certificate subimage to be identified;
the step of performing the LBP feature conversion on the certificate image to be identified to obtain the certificate LBP feature map to be identified specifically comprises the following steps:
and carrying out the LBP characteristic conversion on the certificate subimage to be identified to obtain the LBP characteristic diagram of the certificate to be identified.
6. The method for recognizing the copied document according to claim 1, wherein after the step of preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image, the method further comprises the following steps before the step of inputting the initial positive sample image and the initial negative sample image to an initial model for model training operation to obtain a copied document recognition model based on LBP features:
performing image enhancement operation on the initial positive sample image and the initial negative sample image to obtain an enhanced positive sample image and an enhanced negative sample image;
performing gradient map calculation operation on the enhanced positive sample image and the enhanced negative sample image according to a two-dimensional discrete function derivation method to obtain a positive sample gradient map and a negative sample gradient map;
the method comprises the following steps of inputting the initial positive sample image and the initial negative sample image into an initial model to perform model training operation, and obtaining a copied document identification model based on LBP characteristics, and specifically comprises the following steps:
and inputting the positive sample gradient map and the negative sample gradient map into an initial model to perform the model training operation, so as to obtain the copied certificate recognition model based on the LBP characteristics.
7. The method for recognizing the copied document according to claim 1, wherein after the step of inputting the LBP feature map of the document to be recognized into the copied document recognition model based on the LBP features to obtain the document recognition result, the method further comprises the following steps:
and storing the certificate image to be identified and the certificate identification result into a block chain.
8. A device for recognizing a copied document, comprising:
the system comprises a sample data acquisition module, a data storage module and a data processing module, wherein the sample data acquisition module is used for acquiring sample data, and the sample data comprises a positive sample image marked with a real picture and a negative sample image marked with a copied picture;
the preprocessing module is used for preprocessing the positive sample image and the negative sample image to obtain an initial positive sample image and an initial negative sample image;
the model training module is used for inputting the initial positive sample image and the initial negative sample image into an initial model to perform model training operation to obtain a copied certificate recognition model based on LBP characteristics, wherein the initial model consists of a MobilenetV3 network, an LBPGenerator module, a characteristic extraction module and a full connection layer;
the certificate to be identified acquisition module is used for acquiring a certificate image to be identified;
the LBP feature conversion module is used for inputting the certificate image to be identified to the LBP generator module for LBP feature conversion to obtain a certificate LBP feature map to be identified;
the texture feature recognition module is used for inputting the LBP feature map of the certificate to be recognized into the copied certificate recognition model based on the LBP features to perform texture feature recognition operation so as to obtain a texture feature recognition result;
the first result module is used for confirming that the certificate to be identified is a copied certificate if the texture feature identification result has obvious texture difference features;
and the second result module is used for confirming that the certificate to be identified is the original certificate if the texture feature identification result does not have obvious texture difference features.
9. Computer device, characterized in that it comprises a memory in which computer-readable instructions are stored and a processor which, when executing said computer-readable instructions, carries out the steps of the method for identification of a copy document as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the method of identification of a copy document as claimed in any one of claims 1 to 7.
CN202210369155.7A 2022-04-08 2022-04-08 Method and device for identifying copied certificate, computer equipment and storage medium Pending CN114677769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210369155.7A CN114677769A (en) 2022-04-08 2022-04-08 Method and device for identifying copied certificate, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114677769A true CN114677769A (en) 2022-06-28

Family

ID=82077209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210369155.7A Pending CN114677769A (en) 2022-04-08 2022-04-08 Method and device for identifying copied certificate, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114677769A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105118048A (en) * 2015-07-17 2015-12-02 北京旷视科技有限公司 Method and device for identifying copying certificate image
CN109558794A (en) * 2018-10-17 2019-04-02 平安科技(深圳)有限公司 Image-recognizing method, device, equipment and storage medium based on moire fringes
CN110717450A (en) * 2019-10-09 2020-01-21 深圳大学 Training method and detection method for automatically identifying copied image of original document
CN112329786A (en) * 2020-12-02 2021-02-05 深圳大学 Method, device and equipment for detecting copied image and storage medium
CN112560683A (en) * 2020-12-16 2021-03-26 平安科技(深圳)有限公司 Method and device for identifying copied image, computer equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Zhenjie (汪振杰): "Deep-Learning-Based Object Detection of Shelf Goods", China Master's Theses Full-text Database, Information Science and Technology, 15 August 2020 (2020-08-15), pages 138-640 *

Similar Documents

Publication Publication Date Title
US9183452B2 (en) Text recognition for textually sparse images
WO2022134584A1 (en) Real estate picture verification method and apparatus, computer device and storage medium
CN110942061A (en) Character recognition method, device, equipment and computer readable medium
CN112330331A (en) Identity verification method, device and equipment based on face recognition and storage medium
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
CN112651953B (en) Picture similarity calculation method and device, computer equipment and storage medium
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN110895811B (en) Image tampering detection method and device
CN113012075A (en) Image correction method and device, computer equipment and storage medium
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
WO2020082731A1 (en) Electronic device, credential recognition method and storage medium
Meena et al. Image splicing forgery detection using noise level estimation
CN103955713A (en) Icon recognition method and device
CN112651399B (en) Method for detecting same-line characters in inclined image and related equipment thereof
CN109697722B (en) Method and device for generating trisection graph
CN116432210B (en) File management method and system based on security protection
CN112633200A (en) Human face image comparison method, device, equipment and medium based on artificial intelligence
CN112419257A (en) Method and device for detecting definition of text recorded video, computer equipment and storage medium
CN114283431B (en) Text detection method based on differentiable binarization
CN114511862B (en) Form identification method and device and electronic equipment
CN114677769A (en) Method and device for identifying copied certificate, computer equipment and storage medium
JP4967045B2 (en) Background discriminating apparatus, method and program
CN114359352A (en) Image processing method, apparatus, device, storage medium, and computer program product
CN114461833A (en) Picture evidence obtaining method and device, computer equipment and storage medium
CN112036501A (en) Image similarity detection method based on convolutional neural network and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination