CN110163193B - Image processing method, image processing device, computer-readable storage medium and computer equipment


Info

Publication number
CN110163193B
Authority
CN
China
Prior art keywords
image
certificate
corner
training
image processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910228327.7A
Other languages
Chinese (zh)
Other versions
CN110163193A (en)
Inventor
姜媚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910228327.7A priority Critical patent/CN110163193B/en
Publication of CN110163193A publication Critical patent/CN110163193A/en
Application granted granted Critical
Publication of CN110163193B publication Critical patent/CN110163193B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/24: Aligning, centring, orientation detection or correction of the image
    • G06V 10/245: Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to an image processing method, an apparatus, a computer-readable storage medium and a computer device, wherein the method comprises the following steps: acquiring an image to be processed; inputting the image to be processed into an image processing model for certificate corner feature extraction; processing the extracted certificate corner features through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed, in which pixel points have pixel values representing the probability of belonging to a certificate corner and correspond to pixel points in the image to be processed; determining the corner positions in the image to be processed according to the corner position prediction feature map; and locating a certificate image area based on the corner positions in the image to be processed. The scheme provided by the application can improve the accuracy of certificate area division.

Description

Image processing method, image processing device, computer-readable storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer-readable storage medium, and a computer device.
Background
With the development of society, more and more industries, such as communications, travel, and accommodation, need to audit certificate information. In many current scenarios, certificate recognition and verification are performed after the certificate area has been located in an image containing the certificate.
However, conventional edge-based certificate detection algorithms are only applicable to images with relatively simple backgrounds. In scenes with complex backgrounds or blurred edges, they often produce many falsely detected areas, so the accuracy of certificate area division is low.
Disclosure of Invention
In view of the above, it is necessary to provide an image processing method, an image processing apparatus, a computer-readable storage medium, and a computer device that address the technical problem that conventional certificate detection divides the certificate region with low accuracy.
An image processing method comprising:
acquiring an image to be processed;
inputting the image to be processed into an image processing model for certificate corner feature extraction;
processing the extracted certificate corner features through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed; pixel points in the corner position prediction feature map have pixel values representing the probability of belonging to the certificate corner and correspond to the pixel points in the image to be processed;
determining the corner positions in the image to be processed according to the corner position prediction feature map;
and positioning a certificate image area based on the corner position in the image to be processed.
An image processing apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the extraction module is used for inputting the image to be processed into an image processing model to carry out certificate corner feature extraction;
the generating module is used for processing the extracted certificate corner features through the image processing model and generating a corner position prediction feature map corresponding to the image to be processed; pixel points in the corner position prediction feature map have pixel values representing the probability of belonging to the certificate corner and correspond to the pixel points in the image to be processed;
the determining module is used for determining the corner positions in the image to be processed according to the corner position prediction feature map;
and the positioning module is used for positioning the certificate image area based on the corner position in the image to be processed.
A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the steps of the above-mentioned image processing method.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the image processing method described above.
According to the image processing method, the image processing apparatus, the computer-readable storage medium and the computer device, after the image to be processed is obtained, it can be input into the image processing model for certificate corner feature extraction; the extracted certificate corner features are then processed through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed. Because the pixel points in the resulting corner position prediction feature map have pixel values representing the probability of belonging to a certificate corner, and correspond to pixel points in the image to be processed, whether a pixel point is a certificate corner can be judged from its pixel value in the feature map. The corner positions in the image to be processed can thus be determined and the certificate image area located, which improves the accuracy of locating the certificate area in the image.
Drawings
FIG. 1 is a diagram of an application environment of an image processing method in one embodiment;
FIG. 2 is a schematic diagram of a certificate corner included in an image in one embodiment;
FIG. 3 is a flow diagram illustrating a method for image processing according to one embodiment;
FIG. 4 is a schematic flow chart illustrating processing of an image to be processed using an image processing model according to an embodiment;
FIG. 5 is a diagram illustrating a correspondence relationship between a feature map for corner position prediction and pixel points of an image to be processed in an embodiment;
FIG. 6 is a diagram illustrating location prediction features corresponding to each corner of a certificate in an image to be processed according to an embodiment;
FIG. 7 is a schematic diagram of locating a document image area based on corner location in an image to be processed in one embodiment;
FIG. 8 is a data flow diagram of a densely connected network in one embodiment;
FIG. 9 is a flowchart illustrating processing of an image to be processed using a cascaded image processing model according to an embodiment;
FIG. 10 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 11 is a block diagram showing a configuration of an image processing apparatus according to another embodiment;
FIG. 12 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an embodiment of an application environment of an image processing method. Referring to fig. 1, the image processing method is applied to an image processing system. The image processing system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers. Both the terminal 110 and the server 120 can be independently used to perform the image processing method provided in the embodiment of the present application. The terminal 110 and the server 120 may also be cooperatively used to execute the image processing method provided in the embodiment of the present application.
The image processing model according to the embodiment of the present application is a machine learning model that has acquired corner prediction capability through sample learning. Corner prediction means predicting the positions of certificate corner points in an image. The number of corner points may be one or more than one. As an example, FIG. 2 shows a schematic diagram of certificate corner points included in an image in one embodiment. When the certificate included in the image is an identity card, performing corner prediction on the image means predicting the positions of the four corner points of the identity card, i.e. the four corner points 201, 202, 203 and 204 marked in the figure.
The image processing model may adopt a neural network model, such as a CNN (Convolutional Neural Network) model. The network structure in the CNN model may be a DenseNet (Densely Connected Network) structure, a ResNet (Residual Neural Network) structure, a ShuffleNet structure, or the like. Of course, other types of models may also be used as the image processing model; the embodiment of the present application is not limited herein.
In the embodiment of the application, a computer device (such as the terminal 110 or the server 120 shown in fig. 1) acquires an image to be processed; inputs the image to be processed into an image processing model for certificate corner feature extraction; processes the extracted certificate corner features through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed, in which pixel points have pixel values representing the probability of belonging to a certificate corner and correspond to pixel points in the image to be processed; determines the corner positions in the image to be processed according to the corner position prediction feature map; and locates the certificate image area based on the corner positions in the image to be processed. The image processing model may be trained on the terminal 110 or the server 120. The server 120 may also send the trained image processing model to the terminal 110 for use.
In one embodiment, as shown in FIG. 3, an image processing method is provided. The embodiment is mainly illustrated by applying the method to the terminal 110 (or the server 120) in fig. 1. Referring to fig. 3, the image processing method specifically includes the steps of:
s302, acquiring an image to be processed.
The image to be processed is an image on which certificate image area locating is to be performed. Certificate image area locating refers to dividing out the image area where a certificate, such as an identification card or a driver's license, is located in an image.
Specifically, the computer device may acquire an image generated locally, and take the image as an image to be processed. The computer device may also crawl images from the network as images to be processed. The computer equipment can also acquire images transmitted by other computer equipment, and the images are taken as images to be processed.
In a specific embodiment, in a scenario requiring identity authentication, a user can place a certificate within the field of view of a camera built into or connected to the terminal and capture a photo including the certificate through the camera, thereby obtaining the image to be processed. For example, in a real-name authentication scenario of an application program, the user places a personal identity card within the field of view of a camera built into the terminal, captures a picture including the identity card through the camera, and uploads the captured picture through the terminal to the server corresponding to the application program for real-name authentication. The server obtains the image to be processed and locates the certificate image area from the image to identify the identity.
In one embodiment, the image to be processed may be an image file having a visual form, such as an image file in JPG, JPEG, or PNG format. The image to be processed may also be image data without a visual form, for example, a set of numerical values comprising the pixel value of each pixel.
It should be noted that the image to be processed in the embodiment of the present application is not limited to an image including a certificate image area. When the image to be processed includes a certificate image area, the certificate image area can be located in the image to be processed through the embodiments provided by the application. When the image to be processed does not include a certificate image area, processing it through the embodiments provided by the application yields the result that the image to be processed does not include a certificate image area.
In one embodiment, the image to be processed may be a two-dimensional image or a three-dimensional image.
And S304, inputting the image to be processed into the image processing model for certificate corner feature extraction.
A corner refers to a place where the edges of an object meet. For example, an identification card includes four sides that meet to form four corners. A corner point is the position of a corner. The certificate corner feature is an inherent feature of the area where a corner of the certificate image is located. The area where the corner point is located may refer only to the pixel position of the corner of the certificate image; for example, the pixel position (20, 30) is the area where a corner point is located. The area where the corner point is located may also be a range of pixel positions extending outward from the pixel position of the corner of the certificate image as a reference point; for example, a circular area with a radius of 20 pixels centered on the pixel position of the corner of the certificate image.
It can be understood that the image processing model includes a neural network with feature extraction capability. When the image processing model is trained, the neural network is trained with specific samples to learn to extract the certificate corner features of certificate images. In subsequent use of the image processing model, certificate corner features can therefore be extracted from the input image to be processed, and the certificate corner positions can be predicted from the extracted features.
Specifically, the computer device can input the feature map of the image to be processed into the image processing model for processing. The feature map may specifically be a color channel feature map. In this way, the feature extraction layer (the neural network with the feature extraction capability) included in the image processing model can extract the certificate corner features based on the input color channel feature map. When the image to be processed is a color image, the color channel feature map of the image to be processed may be an RGB three-color channel feature map; when the image to be processed is a grayscale image, the color channel feature map of the image to be processed may be a grayscale channel feature map.
For example, fig. 4 shows a schematic flow chart of processing an image to be processed by using an image processing model in an embodiment. Referring to fig. 4, it is assumed that the image to be processed is a color image and has a size of H1 × W1. The image to be processed includes three color channel feature maps, which are an R channel feature map, a G channel feature map, and a B channel feature map. And the pixel value of the pixel point in each color channel characteristic graph is the channel value of the pixel point in the color channel. The image processing model takes the three color channel feature maps as input for processing, that is, the feature map size of the model input is 3 × H1 × W1. It is understood that one color here reflects one feature and one color channel is one feature channel. The number of the characteristic channels is the dimension of the characteristic diagram, and at the moment, the number of the characteristic channels input by the model, namely the characteristic dimension is 3.
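As a concrete illustration of this input format, the following is a minimal sketch, assuming PyTorch/torchvision and an illustrative file name (neither is specified by the patent), of assembling the 3 × H1 × W1 color channel input:

    import torchvision.transforms.functional as TF
    from PIL import Image

    image = Image.open("id_card_photo.jpg").convert("RGB")  # hypothetical file name
    x = TF.to_tensor(image)   # 3 x H1 x W1: one feature map per color channel
    x = x.unsqueeze(0)        # add a batch dimension: 1 x 3 x H1 x W1
    # model(x) would then produce the corner position prediction feature map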
S306, processing the extracted certificate corner features through an image processing model to generate a corner position prediction feature map corresponding to the image to be processed; and pixel points in the corner position prediction characteristic graph have pixel values representing the probability of belonging to the certificate corners and correspond to the pixel points in the image to be processed.
The corner position prediction feature map is an image used to predict, at the pixel level, whether pixel points are certificate corners. The pixel values of the pixel points in the corner position prediction feature map reflect the probability that the corresponding pixel points belong to a certificate corner: the larger the pixel value of a pixel point in the corner position prediction feature map, the more likely the corresponding pixel point is a certificate corner. Of course, in another embodiment, the pixel points in the corner position prediction feature map may instead have pixel values representing the probability of not belonging to a certificate corner, again corresponding to pixel points in the image to be processed. That is, the pixel values then reflect the probability that the corresponding pixel points do not belong to a certificate corner, and the larger the pixel value, the less likely the pixel point is a certificate corner. In other words, any feature map whose pixel values can distinguish certificate corner points from non-corner points may be used.
The correspondence between the pixel points in the corner position prediction feature map and the pixel points in the image to be processed may be one-to-one or one-to-many. That is to say, one pixel point in the corner position prediction feature map may correspond to one pixel point in the image to be processed, or to a plurality of pixel points in the image to be processed. When the correspondence is one-to-one, it may specifically be a one-to-one correspondence by pixel position. When the correspondence is one-to-many, it may specifically be a correspondence by the relative positions of the pixel points in the image. Intuitively, when the image processing model derives the corner position prediction feature map from the color channel feature maps of the image to be processed, the image resolution may be reduced in order to shrink the network model and improve prediction efficiency, which reduces the number of pixel points per feature channel of the feature map. For example, the image resolution of the image to be processed may be 8 times that of the corner position prediction feature map.
When the correspondence between the pixel points in the corner position prediction feature map and the pixel points in the image to be processed is one-to-one, the image processing model predicts the image to be processed pixel by pixel, and whether each pixel point in the image to be processed is a certificate corner can be read off directly from the pixel value of the corresponding pixel point in the corner position prediction feature map.
For example, fig. 5 shows a diagram of the correspondence between a corner position prediction feature map and the pixel points of an image to be processed in an embodiment. In the upper diagram of fig. 5, the image size of the image to be processed is 8 × 8 and the image size of the corner position prediction feature map is 8 × 8; the correspondence between their pixel points is one-to-one, matched by pixel position. In the lower diagram of fig. 5, the image size of the image to be processed is 8 × 8 and the image size of the corner position prediction feature map is 4 × 4; the correspondence between their pixel points is one-to-many, established according to the relative positions of the pixel points in the image. It is understood that the present embodiment may process a two-dimensional image or a three-dimensional image, but for ease of viewing, two-dimensional images are used in the drawings for illustration.
In one embodiment, the corner location prediction feature map may be a frame of image. Then the pixel points in the frame of image have pixel values that represent the probability of belonging to any one of the certificate corners. The corner location prediction feature map may also be a set of images. Then the pixel points in each frame of the set of images have pixel values that represent the probability of belonging to a particular certificate corner.
In one embodiment, the certificate corner features include certificate corner content features. Processing the extracted certificate corner features through the image processing model to generate the corner position prediction feature map corresponding to the image to be processed comprises: processing the extracted certificate corner content features through the image processing model to generate a position prediction feature map corresponding to each certificate corner in the image to be processed. The position prediction feature maps corresponding to the certificate corners are arranged according to a preset certificate corner order, and the pixel points in the position prediction feature map corresponding to each certificate corner have pixel values representing the probability of belonging to that corner and correspond to pixel points in the image to be processed.
It can be appreciated that, owing to its strongly standardized layout, a certificate typically has a uniform layout format, a fixed body structure, and fixed content features in the neighborhood of each of its corners. Based on this characteristic, the embodiment of the present application converts the problem of locating the certificate image area into the problem of predicting the position of each corner point of the certificate. Thus, once the position and order of each corner point of the certificate are determined, the image area where the certificate is located can be positioned. The order of the corner points of the certificate can be obtained by taking one of the corner points as a starting corner point and ordering the remaining corner points clockwise or counterclockwise. With continued reference to fig. 2, in this embodiment the corner point at the upper left of the identity card is taken as the starting corner point, and the remaining corner points are ordered clockwise, giving corner points 201, 202, 203 and 204 arranged in sequence.
Specifically, the certificate corner content features reflect the content included in the area where a certificate corner is located. The computer device can train the image processing model with specific samples to learn to extract certificate corner content features. When the image processing model is used, the certificate corner content features in the image to be processed can be extracted and then further processed to obtain the position prediction feature map corresponding to each certificate corner in the image to be processed. The position prediction feature maps corresponding to the certificate corners are arranged according to a preset certificate corner order, and the pixel points in the position prediction feature map corresponding to each certificate corner have pixel values representing the probability of belonging to that corner and correspond to pixel points in the image to be processed.
For example, assume the certificate is an identity card, which includes 4 corner points, and that the corner point at the upper left of the certificate is taken as the starting corner point with the remaining corner points ordered clockwise. The extracted certificate corner features are processed through the image processing model to obtain the position prediction feature map corresponding to each certificate corner included in the image to be processed: the position prediction feature maps corresponding to the upper-left, upper-right, lower-right and lower-left corner points of the certificate are shown in turn as (1), (2), (3) and (4) in fig. 6. The pixel value of each pixel point on each map represents the probability that the pixel point belongs to that certificate corner. For example, the pixel value of a pixel point on the position prediction feature map shown in fig. 6(1) represents the probability that the pixel point belongs to the upper-left corner point of the identity card.
It can be understood that, since the pixel points in the position prediction feature map corresponding to each certificate corner have pixel values representing the probability of belonging to that corner and correspond to pixel points in the image to be processed, the map conveys information in a manner similar to a heatmap, and the position prediction feature map may be colloquially referred to as a position heatmap.
In a specific embodiment, the pixel values of the pixel points in the position prediction feature map of a corner point range over (0, 1). The pixel point with the largest pixel value is the predicted corner point, and its pixel position is the predicted corner position.
In this embodiment, each corner point is predicted to obtain a position prediction feature map corresponding to each corner point, so that interference between corner points is avoided, and accuracy of corner point position prediction is further improved.
In an embodiment, when the image processing model obtains the position prediction feature maps corresponding to the corner points, the image processing model can also obtain the background channel feature map of the image to be processed. The background channel feature map is used to enhance the constraint between the certificate corners when training the image processing model.
With continued reference to fig. 4, assuming that the document is an identification card, the image processing model processes three color channel feature maps (3 × H1 × W1) of the image to be processed, and obtains a feature map with a size of 5 × H2 × W2, i.e., a corner position prediction feature map. The feature maps of the first four channels 4 × H2 × W2 are position prediction feature maps corresponding to the four corner points of the upper left corner, the upper right corner, the lower right corner and the lower left corner of the identity card in sequence. The point with the maximum value on each feature map is the predicted corner position. And the feature map of the last channel is the feature map of the background channel.
And S308, determining the corner position in the image to be processed according to the corner position prediction feature map.
Specifically, the computer device may compare the pixel values of the pixel points in the corner position prediction feature map and take the pixel points whose pixel values satisfy a corner selection condition as the corner points, thereby obtaining the corner positions. The corner selection condition may be the preset number of pixel points with the highest pixel values, where the preset number is the number of corner points; this condition suits the scenario where all corner points are predicted in one feature map. The corner selection condition may also be the pixel point with the maximum pixel value; this condition suits the scenario where one corner point is predicted per feature map.
In one embodiment, the corner position may be a position where one pixel is located, or may be a position where a plurality of pixels are located. For example, if the pixel value of the pixel point Q is the largest in the feature map, the point may be determined as an angular point, and then the position of the pixel point may be used as an angular point position, or the position of the pixel point may be used as a reference point and extended outward to obtain the positions of a plurality of pixel points as angular point positions.
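As an illustration of this step, the following is a minimal sketch, assuming a PyTorch-style output tensor of shape 5 × H2 × W2 (four per-corner heatmaps followed by a background channel, as in fig. 4; the patent does not prescribe this code), in which the predicted position of each corner is the pixel with the maximum value in its heatmap:

    import torch

    def corner_positions(pred: torch.Tensor):
        """pred: 5 x H2 x W2; returns [(row, col), ...] for the 4 corner channels."""
        corners = []
        for heatmap in pred[:4]:           # skip the background channel
            idx = torch.argmax(heatmap)    # flat index of the maximum pixel value
            i, j = divmod(idx.item(), heatmap.shape[1])
            corners.append((i, j))         # position in feature-map coordinates
        return corners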
S310, positioning a certificate image area based on the corner position in the image to be processed.
Specifically, the computer device may connect the determined positions of the corner points in sequence according to the arrangement order of the corresponding corner points to form a closed polygon. The image area where the closed polygon is located is the certificate image area.
FIG. 7 shows a schematic diagram of locating a certificate image area based on corner positions in an image to be processed in one embodiment. Referring to fig. 7, the computer device determines 4 corner positions in the image to be processed, arranged in the order 701, 702, 703 and 704. The computer device can then connect position 701 with position 702, position 702 with position 703, position 703 with position 704, and position 704 back with position 701 in turn, obtaining a closed quadrilateral 710, which is the certificate image area.
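This locating and cropping step can be sketched as follows, assuming OpenCV (a library choice not specified by the patent) and corner positions ordered top-left, top-right, bottom-right, bottom-left as in fig. 7; the output size is illustrative and the perspective warp stands in for the rectification described later in this document:

    import cv2
    import numpy as np

    def crop_certificate(image, corners, out_w=856, out_h=540):
        # corners: four (x, y) positions in the order 701, 702, 703, 704 of FIG. 7
        src = np.array(corners, dtype=np.float32)
        dst = np.array([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]],
                       dtype=np.float32)
        # the closed quadrilateral enclosed by the corners is the certificate
        # image area; warp it to an axis-aligned rectangle
        M = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, M, (out_w, out_h))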
According to the image processing method, after the image to be processed is obtained, it is input into the image processing model for certificate corner feature extraction, and the extracted certificate corner features are then processed through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed. Because the pixel points in the resulting feature map have pixel values representing the probability of belonging to a certificate corner and correspond to pixel points in the image to be processed, whether a pixel point is a certificate corner can be judged from its pixel value in the feature map. The corner positions in the image to be processed can thus be determined and the certificate image area located, which improves the accuracy of locating the certificate area in the image.
In one embodiment, S304 includes: inputting an image to be processed into an image processing model, and extracting the certificate corner feature of the image to be processed layer by layer through a multilayer neural network of a dense connection network in the image processing model; wherein, the output of the dense connection network fuses the output of each layer of neural network included in the dense connection network.
A dense connection network (DenseNet, Densely Connected Network) is a network structure that achieves feature reuse by adding the outputs of preceding layers to each layer's input. Specifically, the dense connection network includes multiple layers of neural networks, where the input of each layer includes not only the output of the adjacent preceding layer but also the outputs of earlier layers and/or the input of the first layer. The output of the dense connection network is the output of its last layer of neural network; that is to say, the output of the dense connection network fuses the outputs of the multiple layers of neural networks it contains. The dense connection network can therefore better fuse the feature information extracted at multiple layers, and this rich feature information is more favorable for predicting the positions of the certificate corners. At the same time, the output feature map of each layer of the dense connection network can be kept at a low dimension, improving the network's representational capability while reducing the number of network parameters and the forward computation time.
In a specific embodiment, the input of each layer of the neural network in the dense connection network not only comprises the output of the adjacent upper layer of the neural network, but also comprises the output of all layers of the neural networks before the neural network and the input of the first layer of the neural network; that is to say, the output of the dense connection network fuses the outputs of all the layers of neural networks included in the dense connection network, and fuses the feature information extracted by all the layers.
In one embodiment, the certificate corner feature extraction is performed on the image to be processed layer by layer through a multilayer neural network of a dense connection network in the image processing model by inputting the image to be processed into the image processing model, and comprises the following steps: inputting an image to be processed into an image processing model; taking each layer of neural network of the dense connection network as a current layer of neural network in sequence; splicing the output of each layer of neural network in the dense connection network before the current layer of neural network with the input of the first layer of neural network in the dense connection network to obtain the comprehensive input of the current layer of neural network; processing the comprehensive input through the current layer neural network to obtain the output of the current layer neural network until the output of the last layer neural network in the dense connection network is obtained; and taking the output of the last layer of neural network as the output of the dense connection network.
It will be appreciated that the image processing model includes a plurality of layers of neural networks arranged in sequence. After the computer equipment inputs the image to be processed into the image processing model, the multilayer neural network in the image processing model sequentially processes the output of the previous layer of neural network layer by layer, and transmits the processing result to the next layer of neural network for continuous processing. The image processing model may include one or more dense connection networks, one dense connection network may include one or more layers of neural networks, and one layer of neural networks may include one or more layers of networks, among others. The network layer specifically includes a convolutional layer, a normalization layer, or a pooling layer.
Specifically, when the data of the image processing model is transferred to one of the dense connection networks (denoted as D1) of the image processing model, the input of the first layer neural network (denoted as S1) of the dense connection network D1 is the data (denoted as x1) transferred to the image processing model; the data x1 is processed by the first-layer neural network S1 to obtain a processing result, i.e., an output (denoted as y1) of the first-layer neural network S1. The second-layer neural network (denoted as S2) of the dense connection network D1 uses the data x1 and the output y1 of the first-layer neural network S1 as input, and obtains a processing result after data processing, that is, the output (denoted as y2) of the second-layer neural network S2. The third layer of neural network (denoted as S3) of the dense connection network D1 is processed by using the data x1, the output y1 of the first layer of neural network S1, and the output y2 of the second layer of neural network S2 as inputs, and so on until the last layer of neural network (denoted as Sn) of the dense connection network D1 is processed by using the outputs of the n-1 layers of neural networks before the data x1 as inputs, and the output (denoted as yn) of the dense connection network D1 is obtained. The specific data flow direction can be seen with reference to fig. 8. The input of the first layer of neural network in the dense connection network and the output of other neural networks are used as the input of a certain layer of neural network together, and the input and the output can be spliced according to characteristic channels.
It should be noted that, assuming the image size of the image to be processed is H × W, when the image to be processed is input into the image processing model as color channel feature maps, the size of the input is 3 × H × W (since the feature channels are color channels, the number of feature channels is 3). As the color channel feature maps are processed by the network layers of the image processing model, the extracted image features change continuously; that is, the number of feature channels N of a network layer's output feature map (N × H × W) changes continuously. Of course, the image resolution (H × W) of the feature map may also change.
For example, assuming that the input of the first layer of the neural network has 32 feature channels and its output has 32 feature channels, the input of the second layer is the concatenation of the first layer's input and output, i.e. 32 + 32 = 64 channels. Assuming the output of the second layer has 32 channels, the input of the third layer is the concatenation of the first layer's input and output and the second layer's output, i.e. 32 + 32 + 32 = 96 channels, and so on. Because the input of each later layer is the accumulation, along the feature channels, of the outputs of the earlier layers, the number of feature channels output by each layer of the neural network can be set small, reducing the network parameters and improving the network computation speed.
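A minimal sketch, assuming PyTorch, of the dense connection pattern described above: each layer receives the concatenation of the block input and all previous layers' outputs, and each layer contributes a fixed, small number of channels (32 here, matching the example; the two-convolution shape follows the Bottleneck unit of fig. 8):

    import torch
    import torch.nn as nn

    class DenseLayer(nn.Module):
        def __init__(self, in_ch, growth=32):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(in_ch, 128, kernel_size=1, bias=False),  # 1x1 bottleneck
                nn.BatchNorm2d(128), nn.ReLU(inplace=True),
                nn.Conv2d(128, growth, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(growth), nn.ReLU(inplace=True),
            )

        def forward(self, x):
            # splice input and output along the feature channels, so each later
            # layer sees every earlier layer's output
            return torch.cat([x, self.conv(x)], dim=1)

    class DenseBlock(nn.Module):
        def __init__(self, in_ch=32, growth=32, n_layers=3):
            super().__init__()
            layers, ch = [], in_ch
            for _ in range(n_layers):
                layers.append(DenseLayer(ch, growth))
                ch += growth                # 32 -> 64 -> 96, as in the example
            self.block = nn.Sequential(*layers)

        def forward(self, x):
            return self.block(x)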
In one embodiment, the number of dense connection networks is more than one, and a dense connection network includes more than one convolutional layer. The image processing method further includes: for a preset number of convolutional layers early in the layer order within a preset number of dense connection networks early in the network order, inputting the outputs of those convolutional layers into a batch normalization layer and an instance normalization layer arranged in parallel, obtaining a batch-normalized output and an instance-normalized output; and splicing the batch-normalized output and the instance-normalized output as the input of the next layer adjacent to the parallel batch normalization and instance normalization layers.
The convolutional layer is a network layer that includes a plurality of convolutional kernels and performs a convolutional operation on input data. The normalization layer is used for limiting data needing to be processed to a certain range of network layers. Normalization can make subsequent data processing more convenient and can also ensure that convergence is accelerated when the model runs.
In one embodiment, the certificate corner features include certificate corner content features and certificate corner appearance features. The certificate angular point content features are the features of content included in the area where the certificate angular point is located, and the certificate angular point apparent features are the shape features and the shape distribution features of the certificate angular point. The Normalization layers include a Batch Normalization (BN) layer and an Instance Normalization (IN) layer. The Batch Normalization (BN) layer focuses on normalizing a Batch of samples (inputs) to make it easier to extract image appearance features. The example Normalization (IN) layer focuses on the Normalization process on a single sample (input), making it easier to extract image content features.
It will be appreciated that a normalization layer is typically connected after a convolutional layer, and that the shallow features of a CNN tend to capture image appearance information while the high-level features tend to capture image content information. In the preset number of dense connection networks early in the network order of the image processing model, batch normalization and instance normalization layers are connected in parallel after the preset number of convolutional layers early in the layer order, so that the model retains shallow image appearance features without affecting the capture of image content features at high levels. Here, shallow refers to network layers earlier in the model and high-level to network layers later in the model: the earlier the order, the shallower the layer; the later the order, the higher the layer.
Specifically, for the preset number of convolutional layers early in the layer order within the preset number of dense connection networks early in the network order, the computer device can input the output of each such convolutional layer into the parallel batch normalization and instance normalization layers to obtain a batch-normalized output and an instance-normalized output, and splice the two outputs along the feature channels as the input of the next adjacent layer. For the later convolutional layers within those dense connection networks, or for the convolutional layers in the dense connection networks later in the network order, the computer device can input the output of the convolutional layer directly into a batch normalization layer and use the batch-normalized output as the input of the next adjacent layer. Supposing the image processing model includes four dense connection networks, the preset number of dense connection networks early in the network order may be the first two, and the networks later in the order are the last two. Within the preset number of dense connection networks early in the network order, the preset number of convolutional layers early in the layer order may be the first convolutional layer.
For example, one of the dense connection networks early in the network order includes more than one convolutional layer. The first convolutional layer is followed by the batch normalization layer and the instance normalization layer in parallel, whose outputs are then fed together into the second convolutional layer; the second convolutional layer is followed only by a batch normalization layer, as are the third and all subsequent convolutional layers.
In a specific embodiment, the image processing model includes 4 dense connection networks, the first consisting of a single convolutional layer followed by a batch normalization layer and an instance normalization layer connected in parallel. The second to fourth dense connection networks are all Denseblock structures: the first Denseblock structure comprises 2 Bottleneck units, the second comprises 4 Bottleneck units, and the third comprises 4 Bottleneck units. With continued reference to fig. 8, the Bottleneck unit network structure includes two convolutional layers: the first includes 128 1 × 1 convolution kernels and the second includes 32 3 × 3 convolution kernels. The input and output of a Bottleneck unit are spliced along the feature channels and together serve as the input of the next Bottleneck unit. It will be appreciated that a normalization layer is typically connected after a convolutional layer; the normalization layers are not shown in the Bottleneck unit, but specifically, the first convolutional layer of the Bottleneck unit is followed by the parallel batch normalization and instance normalization layers, whose outputs are spliced along the feature channels and together serve as the input of the second convolutional layer, which is followed by a batch normalization layer. In general, it is sufficient to connect the parallel batch normalization and instance normalization layers after the first convolutional layer of the first two Denseblock structures of the image processing network; the present application does not limit the number of convolutional layers connected to parallel batch normalization and instance normalization layers, as long as shallow appearance information is retained without affecting the capture of high-level content information.
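A minimal sketch, assuming PyTorch, of the parallel BN/IN arrangement described above: a convolution's output passes through batch normalization and instance normalization in parallel, and the two results are spliced along the feature channel as the input of the next layer (channel sizes are illustrative):

    import torch
    import torch.nn as nn

    class ConvBNIN(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1,
                                  bias=False)
            self.bn = nn.BatchNorm2d(out_ch)                     # batch branch
            self.inorm = nn.InstanceNorm2d(out_ch, affine=True)  # instance branch

        def forward(self, x):
            y = self.conv(x)
            # concatenate the two normalized outputs along the channel
            # dimension; the next layer receives 2 * out_ch channels
            return torch.cat([self.bn(y), self.inorm(y)], dim=1)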
In this embodiment, instance normalization layers are added after the first few convolutional layers of the image processing model, so that the model retains shallow appearance information without affecting the capture of high-level content information. This improves the feature characterization capability of the image processing model, the accuracy and effectiveness of its feature extraction, and consequently the accuracy of corner prediction.
In the above embodiment, the image processing model adopts a dense connection network, which can better fuse the feature information extracted by all layers in the network and can therefore improve the accuracy and effectiveness of the model's feature extraction. Moreover, the number of output channels of each network layer can be kept small, which reduces the number of network parameters and the forward computation time while improving the network's representational capability.
In one embodiment, S308 comprises: positioning the predicted corner positions in the corner position prediction feature map; selecting a reference point position in a preset neighborhood of the position of the predicted corner point; and when the difference of the pixel values of the predicted corner position and the reference point position is smaller than the preset difference, the predicted corner position is shifted towards the direction of the reference point position to obtain the target corner position.
It can be understood that the computer device may generate an error in the process of processing the image to be processed through the image processing model to obtain the corner position prediction feature map and then predicting the corner position. In this embodiment, the predicted corner positions are corrected, so that the prediction error can be reduced.
The predicted corner position is the corner position determined from the pixel values of the pixel points in the corner position prediction feature map. The reference point position is a reference position selected within a preset neighborhood of the predicted corner position for judging the deviation of the predicted corner. The computer device may determine whether the predicted corner position needs to be corrected based on how the difference in pixel values between the predicted corner position and the reference point position compares with the preset difference.
Specifically, after locating the predicted corner position and selecting the reference point position, the computer device calculates the difference between their pixel values and compares it with the preset difference. When the calculated difference reaches the preset difference, the prediction error is considered to be within an acceptable range and the predicted corner position does not need to be corrected. When the calculated difference is smaller than the preset difference, the prediction error is considered to be beyond the acceptable range and the predicted corner position needs to be corrected. Further, when the computer device judges that the predicted corner position needs to be corrected, it shifts the predicted corner position by a preset distance toward the reference point position to obtain the target corner position.
The predicted corner position may specifically be the position of the pixel point with the maximum pixel value in the corner position prediction feature map. The reference point position may specifically be the position of the pixel point with the next-largest pixel value within the four-neighborhood of the predicted corner position, i.e. among the four pixel points above, below, to the left of, and to the right of the predicted corner position. The preset offset distance is an empirical value obtained through multiple experiments.
In a specific embodiment, the predicted corner position is corrected to obtain the target corner position according to the following formulas:
(i, j) = arg max(F)
(i', j') = arg max_Ω(F)
(i'', j'') = (i, j) shifted by (Δx, Δy) toward (i', j'), if F(i, j) - F(i', j') < thr; otherwise (i'', j'') = (i, j)
x = start + j''*stride
y = start + i''*stride
start = stride/2 - 0.5 (1)
Here, F is the position prediction feature map (H × W) output by the image processing model for a certain corner; (i, j) is the pixel position of the maximum pixel value on the position prediction feature map, and F(i, j) is that maximum pixel value; (i', j') is the pixel position of the next-largest pixel value within the four-neighborhood Ω of the maximum, and F(i', j') is that next-largest value; thr is the preset difference. stride is the multiple by which the image resolution of the input image to be processed exceeds that of the feature map, and start is the corresponding offset. (Δx, Δy) is the preset offset distance, and (i'', j'') is the pixel position of the maximum pixel value after correction. (x, y) is the pixel position in the input image to be processed, i.e. the corner position located in the image to be processed. thr, the offset distances Δx and Δy, and the relationship between stride and start are all empirical values obtained through multiple experiments; for example, thr = 0.23, Δx = Δy = 0.5, and start = stride/2 - 0.5.
It will be appreciated that, since the four-neighborhood of the maximum pixel value consists of the four pixels above, below, to the left of and to the right of that pixel, the correction offsets along exactly one axis, by Δx or by Δy, depending on which neighbor is the reference point. Namely:

$$(i'', j'') \in \{(i + \Delta x,\, j),\ (i - \Delta x,\, j),\ (i,\, j + \Delta y),\ (i,\, j - \Delta y)\} \tag{2}$$
It can be understood that, when the computer device processes the image to be processed through the image processing model to obtain the corner position prediction feature map, any reduction of the image resolution during processing increases the prediction error. In such a scenario, correcting the predicted corner position is all the more necessary.
In the above embodiment, the predicted corner position is located in the corner position prediction feature map and then corrected, which reduces the prediction error and thereby further improves the accuracy of corner prediction.
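As an illustration of the correction described by formulas (1) and (2), the following minimal sketch applies the max / second-max rule to one channel of a corner position prediction feature map. The function name `correct_corner` and the default values of `thr`, `delta` and `stride` are assumptions for this example, not values fixed by the embodiment.

```python
import numpy as np

def correct_corner(F, thr=0.23, delta=0.5, stride=4):
    # (i, j): predicted corner, the position of the maximum pixel value (formula (1)).
    i, j = np.unravel_index(np.argmax(F), F.shape)
    # Reference point: the largest value among the four neighbors (up/down/left/right).
    neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    neighbors = [(r, c) for r, c in neighbors
                 if 0 <= r < F.shape[0] and 0 <= c < F.shape[1]]
    i2, j2 = max(neighbors, key=lambda rc: F[rc])
    fi, fj = float(i), float(j)
    # Shift toward the reference point only when the pixel-value difference is
    # below the preset threshold; the shift is along one axis only (formula (2)).
    if F[i, j] - F[i2, j2] < thr:
        fi += delta * np.sign(i2 - i)
        fj += delta * np.sign(j2 - j)
    # Map feature-map coordinates back to input-image coordinates.
    start = stride / 2 - 0.5
    return start + fj * stride, start + fi * stride  # (x, y)
```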
In one embodiment, processing the extracted certificate corner features through an image processing model to generate a corner position prediction feature map corresponding to an image to be processed includes: processing the extracted certificate corner features through the image processing model, and generating, through parallel output branches, the certificate image type and the corner position prediction feature map corresponding to the image to be processed respectively. The image processing method further includes: intercepting a certificate image in the image to be processed according to the certificate image area; and performing certificate identification by combining the certificate image type and the certificate image.
It can be understood that when different types of processing results can be obtained based on the certificate corner features extracted by the image processing model, a plurality of parallel output branches can be set for the image processing model to respectively output different types of processing results. In this embodiment, an output branch for certificate image type prediction based on certificate corner features and an output branch for corner position prediction based on certificate corner features may be provided in parallel for the image processing model.
Specifically, the image processing model may be trained from image samples of multiple credential image types, such that a multi-class output branch of the image processing model is trained. The output of the multi-class output branch may specifically be a probability vector, in which the values of the vector elements represent the probability that a certificate included in the input image belongs to each certificate image type.
For example, assume that when the image processing model is trained, the sample images include three kinds: images containing the front side of an identity card, images containing the back side of an identity card, and images containing neither side of an identity card. The certificate image types then include three kinds: the identity card front side, the identity card back side and the non-target image. Since the model can recognize these three certificate image types after training, the output of the multi-class output branch is a probability vector of size 1 × 3, as shown in fig. 4, for example (0.1, 0.2, 0.7).
Further, after obtaining the certificate image type to which the certificate included in the image to be processed belongs and positioning the certificate image area in the image to be processed, the computer device can intercept the certificate image from the image to be processed according to the certificate image area, perform affine transformation on the certificate image, and then perform subsequent character recognition process by combining the certificate image type to perform identity recognition.
In one embodiment, the image processing model may include more than one output branch for corner position prediction feature maps, that is, a respective corner position prediction feature map output branch for each certificate image type. Although the certificate image type prediction includes the non-target image type among its results, no corner position prediction branch corresponds to the non-target image. That is, if the vector output by the branch predicting the certificate image type has N + 1 elements, the number of branches outputting corner position prediction feature maps is N.
By way of example, with continued reference to FIG. 4, assume that the types of credential images that can be identified after image processing model training include: the identity card front side, the identity card back side and the non-target image. The image processing model comprises three output branches, wherein one output branch is used for outputting the certificate image type prediction result, one output branch is used for outputting the corner position prediction characteristic diagram on the front side of the identity card, and the other output branch is used for outputting the corner position prediction characteristic diagram on the back side of the identity card. For example, the input image to be processed is an image including the front side of the identity card, and since the back side of the identity card does not exist, in the output corner position prediction feature map of the back side of the identity card, the pixel values of the first four channels are all 0, and the pixel value of the last background channel is all 1. For another example, the input image to be processed is an image including both the front side and the back side of the identification card, and since both the front side and the back side of the identification card exist, one output branch outputs a corner position prediction feature map of the front side of the identification card to predict corners of the front side of the identification card, and the other output branch outputs a corner position prediction feature map of the back side of the identification card to predict corners of the back side of the identification card.
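The branch layout just described can be sketched as below: one classification branch over N + 1 = 3 certificate image types and N = 2 heatmap branches, each with 5 channels (four corner maps plus one background channel). This is a minimal, hypothetical head; the name `MultiBranchHead`, the layer shapes and the sigmoid used to make the maps probability-like are assumptions, not the patented network.

```python
import torch
import torch.nn as nn

class MultiBranchHead(nn.Module):
    def __init__(self, feat_channels=64, num_types=3):
        super().__init__()
        # Classification branch: probability vector over N + 1 certificate image types.
        self.cls_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_channels, num_types))
        # N = num_types - 1 heatmap branches, one per certificate image type
        # (no branch for the non-target type); 5 channels each.
        self.heatmap_branches = nn.ModuleList(
            nn.Conv2d(feat_channels, 5, kernel_size=1)
            for _ in range(num_types - 1))

    def forward(self, feats):
        cls_logits = self.cls_branch(feats)                    # (B, N + 1)
        heatmaps = [torch.sigmoid(b(feats)) for b in self.heatmap_branches]
        return cls_logits, heatmaps

feats = torch.randn(1, 64, 56, 56)
logits, maps = MultiBranchHead()(feats)  # logits: (1, 3); maps: two (1, 5, 56, 56) tensors
```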
In the above embodiment, the image processing model may further include an output branch for predicting the type of the certificate image, so that the certificate image can be identified by combining the type of the certificate image after the certificate image area is located, thereby improving the certificate identification efficiency.
In one embodiment, the image processing method further comprises: collecting a target image comprising a certificate image as a training sample of an image processing model; generating a corner position feature map of each training sample as a corresponding training label; inputting the training sample into an image processing model to obtain training output; and constructing a loss function training image processing model according to the training output and the training label.
It should be noted that the image processing model in the embodiments of the present application is obtained through supervised training: by designing the training input of the model, that is, the training samples and their corresponding training labels, supervised training enables the image processing model to learn the intended capability.
In particular, a computer device may collect target images including credential images as training samples for an image processing model. The document images can be document images of various document types, and can include a document front image and a document back image. Each target image may include one document image or may include more than one document image. The more than one document image included in the target image may be document images belonging to the same document type or document images belonging to different document types, respectively.
For example, the target image may include only the front image of the identification card, the target image may include both the front image and the back image of the identification card, the target image may include both the front image and the front image of the driver's license, and so on.
In one embodiment, generating the corner position feature map of each training sample as a corresponding training label includes: for each certificate angular point sample in each training sample, respectively taking the position of the certificate angular point sample in the training sample as a center, and generating a position characteristic map of the certificate angular point sample according to a preset distribution mode; and generating the corner position characteristic diagram of each training sample according to the position characteristic diagram of each certificate corner sample in each training sample, and using the corner position characteristic diagram as a corresponding training label of each training sample.
Specifically, after the computer device collects training samples, the computer device can respectively determine certificate angular point samples included in the training samples for each training sample, and then generates a position feature map of the certificate angular point samples in a preset distribution mode by taking the positions of the certificate angular point samples in the training samples as the center; and generating the angular point position characteristic diagram of each training sample according to the position characteristic diagram of each certificate angular point sample in each training sample, and using the angular point position characteristic diagram as a corresponding training label of each training sample. The preset distribution mode may be a gaussian distribution mode.
The computer device may generate a single position feature map, namely the corner position feature map, covering all certificate corner samples included in a training sample; in that case, the region around any corner position in the map radiates outward centered on that corner. Alternatively, the computer device may generate a separate position feature map for each certificate corner sample in the training sample and then splice these maps to obtain the corner position feature map; in that case, each position feature map contains only one corner point, and its outward-radiating distribution is centered on that corner point.
For example, assuming that the certificate is an identity card, for each of its four corner points, namely the upper-left, upper-right, lower-right and lower-left corner points, the computer device may generate the position feature map of the corresponding corner point as a Gaussian distribution centered on the position of that corner point with radius sigma. The calculation formula is as follows:
$$g(x, y) = \exp\left(-\frac{(x - c_x)^2 + (y - c_y)^2}{2\sigma^2}\right) \tag{3}$$

wherein (x, y) is the position of any point in the position feature map, g(x, y) is the value at point (x, y), and (c_x, c_y) is the position of the certificate corner point. The resulting position feature map is a Gaussian distribution radiating outward from the position of the corner point.
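The Gaussian label of formula (3) can be generated as in the following minimal sketch; the function name and the default `sigma` are assumptions.

```python
import numpy as np

def corner_position_map(h, w, cx, cy, sigma=2.0):
    # Position feature map: a Gaussian centered on the corner position (c_x, c_y).
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```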
In this embodiment, a way of designing the training label is provided in which an image is designed as the training label, so that the image processing model learns a mapping from an image space to an image space without a qualitative leap across feature spaces (for example, from an image space to a numeric value space). Model learning is therefore easier and the core features are easier to learn, giving better model performance and stronger robustness and generalization.
In one embodiment, generating the corner position feature map of each training sample as a training label corresponding to each training sample according to the position feature map of each certificate corner sample in each training sample includes: generating a background channel characteristic diagram of each training sample according to the position characteristic diagram of each certificate angular point sample in each training sample; and splicing the position characteristic diagrams of the certificate angular point samples in the training samples according to a preset certificate angular point sequence, and then continuously splicing the position characteristic diagrams with the background channel characteristic diagrams to obtain the angular point position characteristic diagrams of the training samples as corresponding training labels of the training samples.
Specifically, when the computer device generates a separate position feature map for each corner point, the constraint relationship among the corner points belonging to one certificate is not reflected. The computer device may therefore generate a background channel feature map of each training sample from the position feature maps of the certificate corner samples in that training sample, and strengthen the constraint among the corner points of one certificate through this background channel feature map.
The value of each pixel in the background channel feature map is inversely related to the pixel values in the position feature maps of the certificate corner samples: the pixel value is smallest at the position of each certificate corner sample and gradually increases with distance from the corner. Specifically, the pixel value of a pixel point in the background channel feature map may be the difference between N and the maximum pixel value at the corresponding pixel position across the position feature maps of the certificate corner samples, where N is the maximum pixel value in the position feature maps of the certificate corner samples. For example, for a pixel position (X, Y) at which the position feature maps of the corner samples take the values p1, p2, p3 and p4, the value at that position in the background channel feature map is N − max(p1, p2, p3, p4).
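The following sketch assembles one full training label as described: four Gaussian corner maps spliced in the preset corner order, followed by the background channel N − max(...). The names `corner_label` and `n_max` (playing the role of N) are illustrative assumptions.

```python
import numpy as np

def corner_label(h, w, corners, sigma=2.0, n_max=1.0):
    ys, xs = np.mgrid[0:h, 0:w]
    # One Gaussian position map per corner, in the preset corner order.
    maps = [np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in corners]
    # Background channel: N minus the per-pixel maximum over the corner maps,
    # smallest at the corners and growing with distance from them.
    background = n_max - np.max(np.stack(maps), axis=0)
    return np.stack(maps + [background])  # shape (5, h, w) for four corners

# Preset order: top-left, top-right, bottom-right, bottom-left.
label = corner_label(56, 56, [(5.0, 5.0), (50.0, 5.0), (50.0, 50.0), (5.0, 50.0)])
```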
In the embodiment, the constraint performance between the angular points is enhanced by adding the background channel characteristic diagram, so that the design of the training label is more reasonable, and the image processing model with high prediction accuracy can be trained.
Further, after the training samples and the training labels corresponding to the training samples are obtained by the computer device, the image processing model can be trained in a supervised mode by using the training samples.
Specifically, the computer device may set the network structure of an output branch of the image processing model so that the branch outputs a corner position prediction feature map for the input training samples, construct a loss function based on the output corner position prediction feature map and the training labels, and train the image processing model with the goal of minimizing the loss function.
In one embodiment, the loss function may be defined using MSE (mean squared error) loss, as shown in the following formula:
$$L_{reg} = \frac{1}{n}\sum_{i=1}^{n}\left(g_i - \hat{g}_i\right)^2 \tag{4}$$

wherein n is the number of pixel points included in the corner position feature map, g_i is the pixel value of a pixel point in the corner position prediction feature map, and ĝ_i is the pixel value of the corresponding pixel point in the corner position feature map. It can be understood that the corner position feature map and the corner position prediction feature map are the expected output and the actual output respectively, and have the same size.
For example, assume that the certificate is an identity card and that the certificate image types recognizable by the image processing model are the identity card front side, the identity card back side and the non-target image. If the image resolution of the feature map is H × W, then each corner position feature map has 5 feature channels and n = H × W × 5 × 2 over the two corner-prediction branches.
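A one-line sketch of L_reg from formula (4), assuming the prediction and label feature maps are stacked into arrays of identical shape:

```python
import numpy as np

def reg_loss(pred_maps, label_maps):
    # Mean squared error over all n pixels of the stacked feature maps.
    return np.mean((pred_maps - label_maps) ** 2)
```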
In further embodiments, the loss function may also employ SmoothL1 loss or Focal loss, etc.
In the above embodiment, the image processing model learns a mapping from an image space to an image space without a qualitative leap across feature spaces (for example, from an image space to a numeric value space), so model learning is easier, the core features are easier to learn, and the model achieves better performance with stronger robustness and generalization.
In one embodiment, generating the corner position feature map of each training sample as a corresponding training label includes: generating a corner position feature map of each training sample as a corresponding first training label; and taking the certificate image type to which the certificate image belongs in each training sample as a corresponding second training label. Inputting the training samples into an image processing model to obtain training output, wherein the training output comprises: inputting the training samples into an image processing model, and respectively obtaining the predicted certificate image types and the corner point sample position prediction characteristic maps corresponding to the training samples through parallel output branches. Constructing a loss function training image processing model according to the training output and the training labels, wherein the method comprises the following steps: constructing a first loss function according to the corner sample position prediction feature map and the first training label, and constructing a second loss function according to the predicted certificate image type and the second training label; an image processing model is trained in conjunction with the first loss function and the second loss function.
It will be appreciated that the image processing model may include more than one output branch, and that a training sample may have more than one training label. The computer device may construct a respective training sample and training label pair for each output branch to train the output branch.
Specifically, the computer device may generate a feature map of corner positions of each training sample as a corresponding first training label, and train branches predicting the corner positions according to the training samples and the corresponding first training labels; and taking the certificate image type of the certificate image in each training sample as a corresponding second training label, and training and predicting the branch of the certificate image type according to the training sample and the corresponding second training label. Wherein the loss functions corresponding to different output branches are different.
In a specific embodiment, the loss function of the branch that predicts the certificate image type can be defined using softmax loss, as shown in the following formula:
$$L_{cls} = -\log a_i,\qquad a_j = \frac{e^{z_j}}{\sum_{k=1}^{m} e^{z_k}} \tag{5}$$

wherein a_j is the softmax probability that the image input to the image processing model belongs to the j-th certificate image type, a_i is the softmax probability of the certificate image type to which the input image actually belongs, m is the number of certificate image types (including the non-target image type), and z_j is the feature value input to the loss layer.
Then, the loss function of the overall training of the image processing model is shown as follows:
$$L = L_{cls} + \lambda L_{reg} \tag{6}$$
because the output branch for predicting the certificate image type is simple and is mainly used for assisting the output branch for predicting the corner position to perform corner prediction, the value of lambda can be specifically 10.
In the above embodiment, the image processing model is trained to learn two types of capability, so that the certificate-image-type output branch assists the corner-position output branch in corner prediction. A more accurate corner prediction result can therefore be obtained when the image processing model is used.
In one embodiment, the image processing method further comprises: determining the position of an actual corner point sample in a training sample; determining a predicted corner sample position in the corner sample position prediction feature map; and constructing a third loss function according to the actual corner sample position and the predicted corner sample position. Training an image processing model in conjunction with the first loss function and the second loss function, comprising: an image processing model is trained in conjunction with the first loss function, the second loss function, and the third loss function.
It can be understood that, since the prediction of the image processing model inevitably has errors, when the image processing model is trained, the loss function part of the prediction error can be added to the loss function of the image processing model, so that the image processing model can be trained to reduce the prediction error as much as possible.
In a specific embodiment, the loss function of the prediction error can be defined by using SmoothL1 loss, as shown in the following formula:
$$L_{offset} = |x_p - x_g| + |y_p - y_g| \tag{7}$$

wherein x_p is the x-coordinate of the predicted corner point, x_g is the x-coordinate of the actual corner point, that is, the x-coordinate of the corner point in the training label; y_p is the y-coordinate of the predicted corner point, and y_g is the y-coordinate of the actual corner point, that is, the y-coordinate of the corner point in the training label.
Then, the loss function of the overall training of the image processing model is shown as follows:
$$L = L_{cls} + \lambda L_{reg} + \beta L_{offset} \tag{8}$$
In this embodiment, the prediction-error term is added to the loss function used for training the image processing model, so that training can reduce the prediction error as much as possible; thus, a more accurate corner prediction result can be obtained when the image processing model is used.
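A sketch of the overall objective of formulas (4) to (8), using the standard softmax cross-entropy for L_cls; the function name, the tensor shapes and the default `beta` are assumptions (the text fixes only λ = 10).

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, type_label, pred_maps, label_maps,
               pred_xy, true_xy, lam=10.0, beta=1.0):
    l_cls = F.cross_entropy(cls_logits, type_label)        # formula (5)
    l_reg = F.mse_loss(pred_maps, label_maps)              # formula (4)
    # Formula (7): L1 distance between predicted and actual corner coordinates.
    l_offset = (pred_xy - true_xy).abs().sum(dim=-1).mean()
    return l_cls + lam * l_reg + beta * l_offset           # formula (8)
```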
In a specific embodiment, the embodiments provided herein are applicable to certificates with a fixed layout format, such as identity cards, driving licenses or driver's licenses. Because the neighborhood of a certificate corner has fixed characteristics for a certificate with a fixed layout, the image processing model can learn these corner-neighborhood features from the samples and thereby predict the certificate corners, locating the certificate image area in the image to be processed.

Secondly, the certificate corner position feature map is used as the training label when training the image processing model, so the model is trained through mapping learning from an image space (the feature space of the image to be processed) to an image space (the feature space of the corner position feature map). Compared with mapping learning from an image space to a numeric value space or the like, the learning process is easier, and the core features can be learned without a large number of training samples.

Thirdly, the model structure of the image processing model adopts dense connection networks, in which the input of each neural network layer is the concatenation, along the feature channel, of the outputs of all preceding layers. The image processing model can thus better fuse the features extracted by all the layers, while the number of output feature channels of each layer can be kept small, which improves the representation capability of the model and at the same time reduces the number of network parameters and the forward inference time.

Finally, for a preset number of front-most convolutional layers in the first preset number of dense connection networks in network order, a parallel batch normalization layer and instance normalization layer are connected after the convolutional layer, so that the extracted image appearance features can be captured through the batch normalization layer and the extracted image content features through the instance normalization layer. In this way, shallow appearance information is better retained without affecting high-level content information, which enhances the generalization and robustness of the image processing model.

In summary, the embodiments provided in the present application can locate an arbitrarily placed certificate image area in the image to be processed in real time through a lightweight and fast image processing model, after which the certificate image can be extracted for subsequent recognition operations.
For example, in an automatic document verification scenario, a user or a merchant uploads or captures an image including a certificate, and the certificate is usually placed at random against a background. In this case, the embodiments provided in the present application can quickly and accurately locate the certificate image area in the image, so that the certificate image can be intercepted according to the located certificate image area, which effectively shortens the subsequent processing flow, reduces the interference of background information, and improves the certificate recognition precision.
In another embodiment, consider a scene where the image resolution of the image to be processed is high while the certificate image area occupies only a small proportion of it. Down-sampling the high-resolution image to a small size before inputting it into the image processing model loses many image details and greatly reduces the precision of certificate corner prediction. In this case, cascaded image processing models can be adopted: first, one image processing model predicts the certificate corner positions in the original image to be processed; because the certificate image area occupies a small proportion of the original image, the predicted corner positions may deviate considerably. The certificate image is then intercepted according to the prediction of the first image processing model, and the intercepted certificate image is input into a second image processing model to predict the certificate corner positions within it, thereby correcting the positioning result. The specific flow is shown in fig. 9. The first and second image processing models may have the same model structure but different model parameters, since their training samples differ: the training samples of the first image processing model may be complex images including arbitrary backgrounds, while the training samples of the second image processing model are images with a smaller background area intercepted from the training samples of the first.
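A minimal sketch of the cascaded inference flow, assuming `model_a` and `model_b` are callables that map an image to a list of (x, y) corner positions; the `margin` used to pad the crop is an illustrative assumption.

```python
def cascade_locate(image, model_a, model_b, margin=10):
    # Stage 1: coarse corner prediction on the full image to be processed.
    coarse = model_a(image)
    xs = [x for x, _ in coarse]
    ys = [y for _, y in coarse]
    x0 = max(int(min(xs)) - margin, 0)
    y0 = max(int(min(ys)) - margin, 0)
    x1 = min(int(max(xs)) + margin, image.shape[1])
    y1 = min(int(max(ys)) + margin, image.shape[0])
    # Stage 2: refined corner prediction on the intercepted certificate image.
    crop = image[y0:y1, x0:x1]
    refined = model_b(crop)
    # Map the refined corners back to full-image coordinates.
    return [(x + x0, y + y0) for x, y in refined]
```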
In general, the embodiments provided in the present application can quickly and efficiently locate certificates of arbitrary angle and arbitrary size in an image, reduce the complexity of the subsequent text recognition module, and improve the overall certificate text recognition precision. In the embodiments provided in the present application, the average positioning error of the corner positions on a self-built test set can reach below 10⁻⁴ to 10⁻⁵, a significant positioning effect, where the positioning error is the error in normalized pixel position.
The specific test results are shown in Table 1 below:

| Model | Positioning error | Classification accuracy (%) |
| --- | --- | --- |
| Image processing model (identity card positioning) | 0.0000665 | 99.86 |
| Image processing model (single-model driving license positioning) | 0.000284 | 100 |
| Cascade image processing model (driving license positioning) | 0.000276 | 100 |
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the above embodiments may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in fig. 10, in one embodiment, an image processing apparatus 1000 is provided. Referring to fig. 10, the image processing apparatus 1000 includes: an acquisition module 1001, an extraction module 1002, a generation module 1003, a determination module 1004, and a positioning module 1005.
An obtaining module 1001 is configured to obtain an image to be processed.
The extraction module 1002 is configured to input the image to be processed into the image processing model to perform certificate corner feature extraction.
A generating module 1003, configured to process the extracted certificate corner features through an image processing model, and generate a corner position prediction feature map corresponding to the image to be processed; and pixel points in the corner position prediction characteristic graph have pixel values representing the probability of belonging to the certificate corners and correspond to the pixel points in the image to be processed.
A determining module 1004, configured to determine a corner position in the image to be processed according to the corner position prediction feature map.
And a positioning module 1005, configured to position, in the image to be processed, an image region of the certificate based on the corner position.
In one embodiment, the extracting module 1002 is further configured to input the image to be processed into an image processing model, and perform certificate corner feature extraction on the image to be processed layer by layer through a multilayer neural network densely connected to a network in the image processing model; the output of the dense connection network fuses the outputs of the neural networks of the layers comprised by the dense connection network.
In one embodiment, the extraction module 1002 is further configured to input the image to be processed into an image processing model; taking each layer of neural network of the dense connection network as a current layer of neural network in sequence; splicing the output of each layer of neural network in the dense connection network before the current layer of neural network with the input of the first layer of neural network in the dense connection network to obtain the comprehensive input of the current layer of neural network; processing the comprehensive input through the current layer neural network to obtain the output of the current layer neural network until the output of the last layer neural network in the dense connection network is obtained; and taking the output of the last layer of neural network as the output of the dense connection network.
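A minimal sketch of the dense connection pattern this module describes: each layer consumes the concatenation, along the feature channel, of the block input and all earlier layers' outputs, and the last layer's output is the block output. Channel counts and the name `DenseBlock` are assumptions.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch=32, growth=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            ch += growth  # the next layer also sees this layer's output

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # Comprehensive input: concatenation of the block input and the
            # outputs of all preceding layers.
            feats.append(layer(torch.cat(feats, dim=1)))
        return feats[-1]  # output of the last layer is the block output
```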
In one embodiment, the number of dense connection networks is more than one, and each dense connection network includes more than one convolutional layer. The extraction module 1002 is further configured to: for a preset number of front-most convolutional layers in the first preset number of dense connection networks in network order, input the outputs of those convolutional layers into a parallel batch normalization layer and instance normalization layer respectively, obtaining a batch normalization output and an instance normalization output; and splice the batch normalization output and the instance normalization output as the input of the next layer adjacent to the parallel batch normalization layer and instance normalization layer.
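The parallel batch / instance normalization arrangement can be sketched as follows; the block name and channel sizes are assumptions. The two normalization outputs are spliced along the channel dimension as the input of the next layer.

```python
import torch
import torch.nn as nn

class BNINBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn = nn.BatchNorm2d(channels)        # captures image appearance features
        self.inorm = nn.InstanceNorm2d(channels)  # captures image content features

    def forward(self, x):
        y = self.conv(x)
        # Splice the batch-normalized and instance-normalized outputs; the
        # next layer therefore receives 2 * channels feature channels.
        return torch.cat([self.bn(y), self.inorm(y)], dim=1)
```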
In an embodiment, the generating module 1003 is further configured to process the extracted content features of the certificate corners through an image processing model, and generate a position prediction feature map corresponding to each certificate corner included in the image to be processed; the position prediction characteristic diagrams corresponding to the certificate angular points are arranged according to a preset certificate angular point sequence; and pixel points in the position prediction characteristic graph corresponding to each certificate angular point have pixel values representing the probability of belonging to the corresponding certificate angular point and correspond to the pixel points in the image to be processed.
In one embodiment, the determining module 1004 is further configured to locate the predicted corner locations in the corner location prediction feature map; selecting a reference point position in a preset neighborhood of the position of the predicted corner point; and when the difference of the pixel values of the predicted corner position and the reference point position is smaller than the preset difference, the predicted corner position is shifted towards the direction of the reference point position to obtain the target corner position.
In an embodiment, the generating module 1003 is further configured to process the extracted certificate corner features through an image processing model, and generate a certificate image type and a corner position prediction feature map corresponding to the image to be processed through parallel output branches, respectively. The image processing apparatus 1000 further includes: an identification module 1006, configured to intercept a certificate image from the image to be processed according to the certificate image area, and perform certificate identification by combining the certificate image type and the certificate image.
As shown in fig. 11, in one embodiment, the image processing apparatus 1000 further includes: a recognition module 1006 and a training module 1007.
A training module 1007 for collecting a target image including a certificate image as a training sample of the image processing model; generating a corner position feature map of each training sample as a corresponding training label; inputting the training sample into an image processing model to obtain training output; and constructing a loss function training image processing model according to the training output and the training label.
In one embodiment, the training module 1007 is further configured to generate a position feature map of each certificate angular point sample in each training sample according to a preset distribution mode, with a position of the certificate angular point sample in the training sample as a center; and generating the corner position characteristic diagram of each training sample according to the position characteristic diagram of each certificate corner sample in each training sample, and using the corner position characteristic diagram as a corresponding training label of each training sample.
In one embodiment, the training module 1007 is further configured to generate a background channel feature map of each training sample according to the position feature map of each certificate corner point sample in each training sample; and splicing the position characteristic diagrams of the certificate angular point samples in the training samples according to a preset certificate angular point sequence, and then continuously splicing the position characteristic diagrams with the background channel characteristic diagrams to obtain the angular point position characteristic diagrams of the training samples as corresponding training labels of the training samples.
In one embodiment, the training module 1007 is further configured to generate a corner position feature map of each training sample as a corresponding first training label; taking the certificate image type to which the certificate image belongs in each training sample as a corresponding second training label; inputting the training samples into an image processing model, and respectively obtaining the predicted certificate image types and the corner point sample position prediction characteristic maps corresponding to the training samples through parallel output branches; constructing a first loss function according to the corner sample position prediction feature map and the first training label, and constructing a second loss function according to the predicted certificate image type and the second training label; an image processing model is trained in conjunction with the first loss function and the second loss function.
In one embodiment, the training module 1007 is further configured to determine actual corner sample positions in the training samples; determining a predicted corner sample position in the corner sample position prediction feature map; constructing a third loss function according to the actual angular point sample position and the predicted angular point sample position; an image processing model is trained in conjunction with the first loss function, the second loss function, and the third loss function.
After the image to be processed is obtained, the image to be processed may be input to the image processing model for certificate corner feature extraction, and then the extracted certificate corner feature is processed by the image processing model to generate a corner position prediction feature map corresponding to the image to be processed. Because the pixel points in the obtained corner point prediction characteristic image have pixel values representing the probability of belonging to the certificate corner points and correspond to the pixel points in the image to be processed, whether the pixel points are certificate corner points or not can be judged according to the pixel values of the pixel points in the corner point prediction characteristic image, so that the corner point position in the image to be processed can be determined, the certificate image area can be positioned in the image to be processed, and the accuracy of positioning the certificate area from the image is improved.
FIG. 12 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 or the server 120 in fig. 1. As shown in fig. 12, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the image processing method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the image processing method. Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; particular computing devices may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, the image processing apparatus 1000 provided in the present application may be implemented in a form of a computer program, and the computer program may be run on a computer device as shown in fig. 12. The memory of the computer device may store therein various program modules constituting the image processing apparatus, such as an acquisition module 1001, an extraction module 1002, a generation module 1003, a determination module 1004, and a positioning module 1005 shown in fig. 10. The computer program constituted by the respective program modules causes the processor to execute the steps in the image processing method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 12 may acquire the image to be processed by executing the steps by the acquisition module 1001 in the image processing apparatus 1000 shown in fig. 10. The extraction module 1002 executes the steps to input the image to be processed into the image processing model for certificate corner feature extraction. Processing the extracted certificate corner features through an image processing model by executing the steps through a generating module 1003, and generating a corner position prediction feature map corresponding to the image to be processed; and pixel points in the corner position prediction characteristic graph have pixel values representing the probability of belonging to the certificate corners and correspond to the pixel points in the image to be processed. The determination module 1004 executes the steps to determine the corner locations in the image to be processed from the corner location prediction feature map. The steps performed by the positioning module 1005 locate the document image area based on the corner position in the image to be processed.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the image processing method described above. Here, the steps of the image processing method may be steps in the image processing methods of the respective embodiments described above.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the above-mentioned image processing method. Here, the steps of the image processing method may be steps in the image processing methods of the respective embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (20)

1. An image processing method, comprising:
acquiring an image to be processed;
inputting the image to be processed into an image processing model for certificate angular point feature extraction;
processing the extracted certificate corner features through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed; pixel points in the corner position prediction feature map have pixel values representing the probability of belonging to the certificate corner and correspond to the pixel points in the image to be processed;
determining the corner positions in the image to be processed according to the corner position prediction feature map;
positioning a certificate image area based on the corner position in the image to be processed;
the training step of the image processing model comprises the following steps:
collecting a target image including a certificate image as a training sample of the image processing model;
for each certificate angular point sample in each training sample, respectively taking the position of the certificate angular point sample in the training sample as a center, and generating a position feature map of the certificate angular point sample according to a preset distribution mode;
generating a background channel feature map of each training sample according to the position feature map of each certificate corner point sample in each training sample; the value of each pixel in the background channel feature map is inversely related to the pixel values in the position feature maps of the certificate corner point samples;
splicing the position feature maps of the certificate angular point samples in the training samples according to a preset certificate angular point sequence, and then continuously splicing the position feature maps with the background channel feature maps to obtain angular point position feature maps of the training samples as training labels corresponding to the training samples;
inputting the training sample into the image processing model to obtain training output;
and constructing a loss function according to the training output and the training label to train the image processing model.
2. The method of claim 1, wherein the inputting the image to be processed into an image processing model for certificate corner feature extraction comprises:
inputting the image to be processed into an image processing model, and performing certificate angular point feature extraction on the image to be processed layer by layer through a multilayer neural network of a dense connection network in the image processing model; the output of the dense connection network fuses the outputs of the neural networks of the layers included in the dense connection network.
3. The method of claim 2, wherein the step of performing certificate corner feature extraction on the image to be processed layer by layer through a multilayer neural network of densely connected networks in the image processing model by inputting the image to be processed into an image processing model comprises:
inputting the image to be processed into an image processing model;
taking each layer of neural network of the dense connection network as a current layer of neural network in sequence;
splicing the output of each layer of neural network in the dense connection network before the current layer of neural network with the input of the first layer of neural network in the dense connection network to obtain the comprehensive input of the current layer of neural network;
processing the comprehensive input through a current layer neural network to obtain the output of the current layer neural network until the output of the last layer neural network in the dense connection network is obtained;
and taking the output of the last layer of neural network as the output of the dense connection network.
4. The method of claim 2, wherein the number of dense connection networks is more than one; the dense connection network comprises more than one convolutional layer; the method further comprises the following steps:
for a preset number of front-most convolutional layers in the first preset number of dense connection networks in network order, inputting the outputs of the convolutional layers into a parallel batch normalization layer and instance normalization layer respectively, to obtain a batch normalization output and an instance normalization output;
and splicing the batch normalization output and the example normalization output to serve as the input of the next layer adjacent to the parallel batch normalization layer and the example normalization layer.
5. The method of claim 1, wherein the certificate corner features comprise certificate corner content features; processing the extracted certificate corner features through the image processing model to generate a corner position prediction feature map corresponding to the image to be processed, wherein the corner position prediction feature map comprises the following steps:
processing the content features of the extracted certificate corner points through the image processing model to generate position prediction feature maps corresponding to the certificate corner points in the image to be processed;
the position prediction characteristic graphs corresponding to the certificate angular points are arranged according to a preset certificate angular point sequence; and pixel points in the position prediction characteristic graph corresponding to each certificate angular point have pixel values representing the probability of belonging to the corresponding certificate angular point and correspond to the pixel points in the image to be processed.
6. The method according to claim 1, wherein said determining the corner positions in the image to be processed according to the corner position prediction feature map comprises:
positioning a predicted corner position in the corner position prediction feature map;
selecting a reference point position in a preset neighborhood of the position of the prediction angular point;
and when the difference between the pixel values of the predicted corner position and the reference point position is smaller than a preset difference, shifting the predicted corner position towards the direction of the reference point position to obtain a target corner position.
7. The method of claim 1, wherein the processing the extracted certificate corner features by the image processing model to generate a corner position prediction feature map corresponding to the image to be processed comprises:
processing the extracted certificate angular point features through the image processing model, and respectively generating certificate image types and angular point position prediction feature maps corresponding to the to-be-processed image through parallel output branches;
the method further comprises the following steps:
intercepting a certificate image in the image to be processed according to the certificate image area;
and combining the certificate image type and the certificate image to perform certificate identification.
8. The method according to claim 1, wherein the generating of the corner position feature map of each of the training samples as a corresponding training label comprises:
generating a corner position feature map of each training sample as a corresponding first training label;
taking the certificate image type of the certificate image in each training sample as a corresponding second training label;
inputting the training sample into the image processing model to obtain a training output, including:
inputting the training sample into the image processing model, and respectively obtaining a predicted certificate image type and an angular point sample position prediction characteristic diagram corresponding to the training sample through parallel output branches;
the training the image processing model according to the training output and the training label construction loss function comprises:
constructing a first loss function according to the corner sample position prediction feature map and the first training label, and constructing a second loss function according to the predicted certificate image type and the second training label;
training the image processing model in conjunction with the first loss function and the second loss function.
9. The method of claim 8, further comprising:
determining actual corner sample positions in the training samples;
determining a predicted corner sample position in the corner sample position prediction feature map;
constructing a third loss function according to the actual corner sample position and the predicted corner sample position;
the training the image processing model in conjunction with the first loss function and the second loss function includes:
training the image processing model in conjunction with the first loss function, the second loss function, and the third loss function.
10. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an image to be processed;
the extraction module is used for inputting the image to be processed into an image processing model to carry out certificate angular point feature extraction;
the generating module is used for processing the extracted certificate corner features through the image processing model and generating a corner position prediction feature map corresponding to the image to be processed; pixel points in the corner position prediction feature map have pixel values representing the probability of belonging to the certificate corner and correspond to the pixel points in the image to be processed;
the determining module is used for determining the corner positions in the image to be processed according to the corner position prediction feature map;
the positioning module is used for positioning a certificate image area based on the corner position in the image to be processed;
the training module is used for collecting a target image comprising a certificate image as a training sample of the image processing model; for each certificate corner point sample in each training sample, respectively taking the position of the certificate corner point sample in the training sample as a center, and generating a position feature map of the certificate corner point sample according to a preset distribution mode; generating a background channel feature map of each training sample according to the position feature map of each certificate corner point sample in each training sample, wherein the value of each pixel in the background channel feature map is inversely related to the pixel values in the position feature maps of the certificate corner point samples; splicing the position feature maps of the certificate corner point samples in the training samples according to a preset certificate corner point sequence, and then continuing to splice them with the background channel feature maps to obtain the corner point position feature maps of the training samples as the training labels corresponding to the training samples; inputting the training samples into the image processing model to obtain training output; and constructing a loss function according to the training output and the training labels to train the image processing model.
11. The apparatus of claim 10, wherein the extraction module is further configured to:
inputting the image to be processed into an image processing model, and performing certificate corner feature extraction on the image to be processed layer by layer through the multilayer neural network of a dense connection network in the image processing model; the output of the dense connection network fuses the outputs of the layers of neural network included in the dense connection network.
12. The apparatus of claim 11, wherein the extraction module is further configured to:
inputting the image to be processed into an image processing model;
taking each layer of neural network of the dense connection network in turn as the current layer of neural network;
concatenating the outputs of all layers preceding the current layer in the dense connection network with the input of the first layer of neural network in the dense connection network to obtain the combined input of the current layer;
processing the combined input through the current layer of neural network to obtain the output of the current layer, until the output of the last layer of neural network in the dense connection network is obtained;
and taking the output of the last layer of neural network as the output of the dense connection network.
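Claims 11 and 12 describe a densely connected block in the style of DenseNet (cited among the non-patent references below). A minimal PyTorch sketch of that forward pass follows; the per-layer composition (BN-ReLU-Conv) and the growth rate are assumptions, not taken from the claims.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        # The combined input of each layer is the first layer's input
        # concatenated with the outputs of every preceding layer.
        features = [x]
        out = x
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        # The last layer's output serves as the block output; it fuses the
        # earlier layers' outputs through its concatenated input.
        return out

For a 64-channel input, DenseBlock(64)(torch.randn(1, 64, 56, 56)) yields a (1, 32, 56, 56) tensor, the growth rate setting the output width.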
13. The apparatus of claim 11, wherein the number of dense connection networks is more than one; each dense connection network comprises more than one convolutional layer; the extraction module is further configured to:
for a preset number of convolutional layers that are located in the first preset number of dense connection networks and occupy preset positions early in the layer order, inputting the output of each such convolutional layer into a parallel batch normalization layer and instance normalization layer, obtaining a batch normalization output and an instance normalization output;
and concatenating the batch normalization output and the instance normalization output as the input of the next layer adjacent to the parallel batch normalization layer and instance normalization layer.
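This parallel arrangement echoes the cited Batch-Instance Normalization work. A minimal sketch of one such convolutional layer follows; which layers receive the treatment is governed by the preset numbers in the claim, and the channel doubling caused by concatenation is spelled out so the next layer can be sized accordingly.

import torch
import torch.nn as nn

class ConvBNIN(nn.Module):
    # Convolution whose output feeds parallel batch and instance
    # normalization layers; their outputs are concatenated on channels.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.inorm = nn.InstanceNorm2d(out_channels, affine=True)

    def forward(self, x):
        y = self.conv(x)
        # The next adjacent layer must expect 2 * out_channels inputs.
        return torch.cat([self.bn(y), self.inorm(y)], dim=1)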
14. The apparatus of claim 10, wherein the generating module is further configured to:
processing the extracted certificate corner features through the image processing model to generate a position prediction feature map corresponding to each certificate corner in the image to be processed;
the position prediction feature maps corresponding to the certificate corners are arranged according to a preset certificate corner order; and pixel points in the position prediction feature map corresponding to each certificate corner have pixel values representing the probability of belonging to that certificate corner and correspond to the pixel points in the image to be processed.
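A plausible form of this output stage is a 1x1 convolution emitting one channel per certificate corner plus a background channel, with a channel-wise softmax so that each pixel carries a probability of belonging to each corner; the softmax and the channel count are assumptions made for illustration.

import torch
import torch.nn as nn

class CornerHead(nn.Module):
    def __init__(self, in_channels, num_corners=4):
        super().__init__()
        # num_corners corner channels plus one background channel,
        # arranged in the preset certificate corner order.
        self.proj = nn.Conv2d(in_channels, num_corners + 1, kernel_size=1)

    def forward(self, features):
        # Each pixel becomes a probability distribution over the corners
        # and the background.
        return torch.softmax(self.proj(features), dim=1)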
15. The apparatus of claim 10, wherein the determining module is further configured to:
positioning a predicted corner position in the corner position prediction feature map;
selecting a reference point position in a preset neighborhood of the predicted corner position;
and when the difference between the pixel values of the predicted corner position and the reference point position is smaller than a preset difference, shifting the predicted corner position towards the direction of the reference point position to obtain a target corner position.
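A minimal sketch of this refinement, assuming the integer peak of the prediction map as the predicted corner position, its 4-neighbourhood as the candidate reference points, and an illustrative quarter-pixel shift:

import numpy as np

def refine_corner(heatmap, diff_threshold=0.05, step=0.25):
    h, w = heatmap.shape
    # Predicted corner position: the strongest response in the map.
    py, px = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    best_val, best_pos = None, None
    # Reference points: the 4-neighbourhood of the predicted position.
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = py + dy, px + dx
        if 0 <= ny < h and 0 <= nx < w:
            if best_val is None or heatmap[ny, nx] > best_val:
                best_val, best_pos = heatmap[ny, nx], (ny, nx)
    y, x = float(py), float(px)
    if best_val is not None and heatmap[py, px] - best_val < diff_threshold:
        # Near-equal responses suggest the true corner lies between the two
        # pixels, so shift the prediction toward the reference point.
        y += step * (best_pos[0] - py)
        x += step * (best_pos[1] - px)
    return x, y  # target corner position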
16. The apparatus of claim 10, wherein the generating module is further configured to:
processing the extracted certificate corner features through the image processing model, and generating, through parallel output branches, a certificate image type and a corner position prediction feature map corresponding to the image to be processed;
the image processing apparatus further comprises an identification module configured to:
cropping the certificate image from the image to be processed according to the certificate image area;
and combining the certificate image type and the certificate image to perform certificate identification.
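Cropping the certificate image from the four located corner positions is commonly done with a perspective warp; the OpenCV sketch below assumes that approach and a fixed output size, neither of which the claim prescribes.

import cv2
import numpy as np

def crop_certificate(image, corners, out_w=640, out_h=400):
    # corners: four (x, y) points in the order top-left, top-right,
    # bottom-right, bottom-left, as produced by the determining module.
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w - 1, 0],
                      [out_w - 1, out_h - 1], [0, out_h - 1]])
    # Homography mapping the located certificate image area to a rectangle.
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (out_w, out_h))

The warped image can then be passed, together with the predicted certificate image type, to the recognition step.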
17. The apparatus of claim 10, wherein the training module is further configured to:
generating a corner position feature map of each training sample as a corresponding first training label;
taking the certificate image type of the certificate image in each training sample as a corresponding second training label;
inputting the training sample into the image processing model, and obtaining, through parallel output branches, a predicted certificate image type and a corner sample position prediction feature map corresponding to the training sample;
constructing a first loss function according to the corner sample position prediction feature map and the first training label, and constructing a second loss function according to the predicted certificate image type and the second training label;
training the image processing model in conjunction with the first loss function and the second loss function.
18. The apparatus of claim 17, wherein the training module is further configured to:
determining actual corner sample positions in the training samples;
determining a predicted corner sample position in the corner sample position prediction feature map;
constructing a third loss function according to the actual corner sample position and the predicted corner sample position;
training the image processing model in conjunction with the first loss function, the second loss function, and the third loss function.
19. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 9.
20. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 9.
CN201910228327.7A 2019-03-25 2019-03-25 Image processing method, image processing device, computer-readable storage medium and computer equipment Active CN110163193B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910228327.7A CN110163193B (en) 2019-03-25 2019-03-25 Image processing method, image processing device, computer-readable storage medium and computer equipment


Publications (2)

Publication Number Publication Date
CN110163193A CN110163193A (en) 2019-08-23
CN110163193B true CN110163193B (en) 2021-08-06

Family

ID=67638999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910228327.7A Active CN110163193B (en) 2019-03-25 2019-03-25 Image processing method, image processing device, computer-readable storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN110163193B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816764B (en) 2019-02-02 2021-06-25 深圳市商汤科技有限公司 Image generation method and device, electronic equipment and storage medium
CN110322002B (en) * 2019-04-30 2022-01-04 深圳市商汤科技有限公司 Training method and device for image generation network, image processing method and device, and electronic equipment
CN110738602B (en) * 2019-09-12 2021-01-01 北京三快在线科技有限公司 Image processing method and device, electronic equipment and readable storage medium
CN110765795B (en) * 2019-09-24 2023-12-12 北京迈格威科技有限公司 Two-dimensional code identification method and device and electronic equipment
CN110659726B (en) * 2019-09-24 2022-05-06 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium
CN110738164B (en) * 2019-10-12 2022-08-12 北京猎户星空科技有限公司 Part abnormity detection method, model training method and device
CN111080338B (en) * 2019-11-11 2024-05-24 建信金融科技有限责任公司 User data processing method and device, electronic equipment and storage medium
CN110929732A (en) * 2019-11-27 2020-03-27 中国建设银行股份有限公司 Certificate image intercepting method, storage medium and certificate image intercepting device
CN111611947B (en) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN112651395A (en) * 2021-01-11 2021-04-13 上海优扬新媒信息技术有限公司 Image processing method and device
CN113177885A (en) * 2021-03-30 2021-07-27 新东方教育科技集团有限公司 Method, device, storage medium and electronic equipment for correcting image
CN113269197B (en) * 2021-04-25 2024-03-08 南京三百云信息科技有限公司 Certificate image vertex coordinate regression system and identification method based on semantic segmentation
CN116994002B (en) * 2023-09-25 2023-12-19 杭州安脉盛智能技术有限公司 Image feature extraction method, device, equipment and storage medium


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506763A (en) * 2017-09-05 2017-12-22 武汉大学 A kind of multiple dimensioned car plate precise positioning method based on convolutional neural networks
CN108564120A (en) * 2018-04-04 2018-09-21 中山大学 Feature Points Extraction based on deep neural network
CN109035184A (en) * 2018-06-08 2018-12-18 西北工业大学 A kind of intensive connection method based on the deformable convolution of unit
CN108960115A (en) * 2018-06-27 2018-12-07 电子科技大学 Multi-direction Method for text detection based on angle point
CN109118473A * 2018-07-03 2019-01-01 深圳大学 Neural-network-based corner detection method, storage medium and image processing system
CN109034050A (en) * 2018-07-23 2018-12-18 顺丰科技有限公司 ID Card Image text recognition method and device based on deep learning
CN109101963A * 2018-08-10 2018-12-28 深圳市碧海扬帆科技有限公司 Automatic certificate image rectification method, image processing apparatus and readable storage medium
CN109241894A (en) * 2018-08-28 2019-01-18 南京安链数据科技有限公司 A kind of specific aim ticket contents identifying system and method based on form locating and deep learning
CN109344727A * 2018-09-07 2019-02-15 苏州创旅天下信息技术有限公司 Identity card text information detection method and device, readable storage medium and terminal
CN109376589A (en) * 2018-09-07 2019-02-22 中国海洋大学 ROV deformation target and Small object recognition methods based on convolution kernel screening SSD network
CN109446900A (en) * 2018-09-21 2019-03-08 平安科技(深圳)有限公司 Certificate authenticity verification method, apparatus, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gao Huang et al., "Densely Connected Convolutional Networks," arXiv:1608.06993v4 [cs.CV], 2017, pp. 1-9. *
Hyeonseob Nam et al., "Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks," NIPS 2018, pp. 1-10. *

Also Published As

Publication number Publication date
CN110163193A (en) 2019-08-23

Similar Documents

Publication Publication Date Title
CN110163193B (en) Image processing method, image processing device, computer-readable storage medium and computer equipment
CN110751134B (en) Target detection method, target detection device, storage medium and computer equipment
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN108960211B (en) Multi-target human body posture detection method and system
WO2019201035A1 (en) Method and device for identifying object node in image, terminal and computer readable storage medium
US20200356818A1 (en) Logo detection
CN109886077B (en) Image recognition method and device, computer equipment and storage medium
WO2019100724A1 (en) Method and device for training multi-label classification model
US8280196B2 (en) Image retrieval apparatus, control method for the same, and storage medium
WO2019042426A1 (en) Augmented reality scene processing method and apparatus, and computer storage medium
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN109766805B (en) Deep learning-based double-layer license plate character recognition method
CN110717366A (en) Text information identification method, device, equipment and storage medium
CN111310746B (en) Text line detection method, model training method, device, server and medium
CN108875505B (en) Pedestrian re-identification method and device based on neural network
CN112036400B (en) Method for constructing network for target detection and target detection method and system
CN111246098B (en) Robot photographing method and device, computer equipment and storage medium
CN114155365A (en) Model training method, image processing method and related device
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN111582155A (en) Living body detection method, living body detection device, computer equipment and storage medium
CN109741380B (en) Textile picture fast matching method and device
CN110766077A (en) Method, device and equipment for screening sketch in evidence chain image
RU2361273C2 (en) Method and device for identifying object images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant