CN109426814B - Method, system and equipment for positioning and identifying specific plate of invoice picture

Info

Publication number
CN109426814B
Authority
CN
China
Prior art keywords
picture
invoice
identified
information
invoice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710724450.9A
Other languages
Chinese (zh)
Other versions
CN109426814A (en)
Inventor
武晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201710724450.9A
Publication of CN109426814A
Application granted
Publication of CN109426814B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/273 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
    • G06V10/40 Extraction of image or video features
    • G06V10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a method, a system and equipment for positioning and identifying specific plates of an invoice picture. The method comprises the following steps: acquiring an invoice picture; determining the invoice information to be identified; according to the invoice information to be identified, locating and segmenting the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified; and recognizing the text content of that picture to obtain the invoice information to be identified. Because each target is located and cropped before recognition, it is clear which item of invoice information each recognition result corresponds to; the invention thus fills a gap in the technical field and has broad application prospects. It overcomes the defect that a conventional image recognition algorithm can only recognize every character in the invoice picture indiscriminately and cannot single out the required invoice information, so that invoice information acquisition is automated.

Description

Method, system and equipment for positioning and identifying specific plate of invoice picture
Technical Field
The invention relates to invoices, in particular to a method, a system and equipment for positioning and identifying specific plates of an invoice picture.
Background
Processing invoice reimbursement is an important task of a company's finance department. Staff must extract information such as the ticket number sequence, company name, invoicing date, amount and tax amount from each invoice before carrying out subsequent verification, which is a heavy workload.
A traditional image recognition algorithm can recognize every character and symbol in the invoice picture line by line, indiscriminately, but cannot tell which field is the invoice number, which is the date, which is the amount, and so on.
Therefore, invoice information still has to be collected manually.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a method, a system and equipment for positioning and identifying specific plates of an invoice picture.
According to one aspect of the invention, a method for positioning and identifying specific plates of an invoice picture is provided, which comprises the following steps:
acquiring an invoice picture;
determining invoice information to be identified;
according to the invoice information to be identified, positioning and dividing the region of the invoice picture where the invoice information is located to obtain the invoice information picture to be identified;
and identifying the text content of the invoice information picture to be identified to obtain the invoice information to be identified.
The invention obtains pictures of the invoices to be processed by any simple method, then uses image processing to crop and align the effective area of the invoice in every picture, whatever its position, rotation angle or brightness. Combined with logic processing, it then accurately locates each area to be identified in a picture using Canny contour detection, image text detection, color detection and row-and-column extraction, crops each area into a fragment, and recognizes the fragments separately. Automatic invoice information acquisition replaces the original manual procedure, with high accuracy and greatly improved efficiency.
Further, the invoice information to be identified comprises a ticket number sequence, a company name, an invoicing date, an amount and a tax amount.
Further, after the invoice picture is acquired, the effective area of the invoice is cropped.
Further, cropping the effective area of the invoice comprises: straightening the invoice picture.
Further, straightening the invoice picture comprises:
detecting all lines in the invoice picture by using the Hough line detection algorithm of OpenCV;
calculating the deflection angle of each line;
taking the angle that occurs most often as the deflection angle of the picture;
and rotating the picture according to the deflection angle to straighten the invoice picture.
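The angle-voting step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the detected lines are already available as `(x1, y1, x2, y2)` segments (for example from a probabilistic Hough transform), and the function names and the 1-degree bin size are hypothetical choices.

```python
import math
from collections import Counter

def segment_angle(x1, y1, x2, y2):
    """Angle of a line segment in degrees, folded into [-90, 90)."""
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # A line has no direction, so angles 180 degrees apart describe the same line.
    if angle >= 90:
        angle -= 180
    elif angle < -90:
        angle += 180
    return angle

def dominant_skew(segments, bin_size=1.0):
    """Return the most frequent segment angle, rounded to bin_size degrees.

    bin_size is an assumed tolerance; the source only says that the angle
    occurring most often is taken as the deflection angle.
    """
    bins = Counter(round(segment_angle(*s) / bin_size) for s in segments)
    most_common_bin, _ = bins.most_common(1)[0]
    return most_common_bin * bin_size
```

The picture would then be rotated by the negative of the returned angle. On a real invoice the vertical table lines also vote, so a practical variant might restrict the vote to near-horizontal segments.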
Further, intercepting the invoice valid area comprises:
extracting the outline of the effective invoice area in the invoice picture;
and obtaining a picture of the invoice effective area based on the contour segmentation.
Further, after the picture of the effective area of the invoice is obtained, the SIFT algorithm is used to judge whether the invoice header is at the top of the picture, and if not, the picture is turned upside down. SIFT is a computer vision algorithm used to detect and describe local features in an image.
Further, extracting the outline of the invoice effective area in the invoice picture comprises:
respectively calculating the sum of pixel values of each pixel column and the sum of pixel values of each pixel row of the invoice picture to obtain two vectors;
multiplying the two vectors to obtain a two-dimensional array;
converting the two-dimensional array into a corresponding gray picture;
and filtering and removing transverse lines and vertical lines in the invoice picture to obtain the outline of the effective area of the invoice.
Specifically:
respectively calculating the sum of pixel values of each pixel column and of each pixel row of the whole invoice picture, background included, to obtain two vectors (the sums are taken over the whole picture because the program does not yet know where the outline of the effective area of the invoice is; it finds the outline through the row and column sums, the multiplication, and the removal of horizontal and vertical lines);
multiplying the two vectors to obtain a two-dimensional array;
converting the two-dimensional array into a corresponding grayscale picture (all values are scaled proportionally to the range 0 to 255, where 0 is black, 255 is white, and the values in between are different levels of gray);
directionally filtering the horizontal and vertical lines in the picture with the cv2.morphologyEx function of the OpenCV toolkit (which directionally filters a given shape in the picture; the shape to be filtered is selected through its parameters), and removing them to obtain the outline of the effective area of the invoice.
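The first three steps (row sums, column sums, multiplication, scaling to 0 to 255) can be sketched in plain Python as below. This is an illustrative sketch under assumptions: the picture is a list of rows of non-negative gray values, the name `projection_map` is hypothetical, and scaling by the maximum value is one plausible reading of "scaled proportionally". The morphological line removal is not shown.

```python
def projection_map(gray):
    """Build an 8-bit map from the outer product of row sums and column sums.

    gray: list of rows, each a list of non-negative pixel values.
    """
    row_sums = [sum(row) for row in gray]
    col_sums = [sum(col) for col in zip(*gray)]
    # Outer product: entry (y, x) is large only where both row y and column x are bright.
    product = [[r * c for c in col_sums] for r in row_sums]
    peak = max(max(row) for row in product)
    if peak == 0:
        return [[0] * len(col_sums) for _ in row_sums]
    # Scale proportionally so the largest entry maps to 255 (assumed scaling rule).
    return [[round(255 * v / peak) for v in row] for row in product]
```

Rows and columns that cross the invoice have large sums, so their product is high inside the invoice area and low in the background, which is what makes the outline easier to separate.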
This method ensures that extraction of the effective area of the invoice is not affected by interference such as the position and angle of the effective area in the picture, its brightness, or light and shadow, so that the effective area can be cropped accurately.
Further, the positioning and segmenting of the region where the invoice information to be identified is located comprises:
obtaining a region matched with a preset positioning template and vertex coordinates of the matched region by using a Canny algorithm;
acquiring a picture of an area where invoice information to be identified is located according to the vertex coordinates of the matching area;
extracting the pixels of the characters of the invoice information to be identified in the region picture, and performing inverse binarization;
and calculating the sum of pixel values of the rows and columns of the area picture, comparing the sum with a preset threshold value for distinguishing characters and gaps, and judging and segmenting the invoice information picture to be identified.
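The matching step in the first item can be illustrated with a brute-force search. This sketch assumes the edge maps of the picture and of the positioning template have already been computed (for example with a Canny edge detector) and are given as lists of rows of 0/1 flags. The pixel-agreement score and the name `match_template` are hypothetical stand-ins, not the matching measure used in the patent.

```python
def match_template(edges, template):
    """Slide template over edges and return (score, vertices) of the best match.

    edges, template: lists of rows holding 0/1 edge flags.
    score is the fraction of template pixels that agree with the picture.
    vertices are the four corners (x, y) of the matched area, clockwise
    from the top-left corner.
    """
    H, W = len(edges), len(edges[0])
    h, w = len(template), len(template[0])
    best_score, best_xy = -1, (0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            agree = sum(
                1
                for dy in range(h)
                for dx in range(w)
                if edges[y + dy][x + dx] == template[dy][dx]
            )
            if agree > best_score:
                best_score, best_xy = agree, (x, y)
    x, y = best_xy
    vertices = [(x, y), (x + w - 1, y), (x + w - 1, y + h - 1), (x, y + h - 1)]
    return best_score / (h * w), vertices
```

The returned vertex coordinates play the role of the "vertex coordinates of the matched region", from which the region containing the invoice information would be cropped using its known offset from the template.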
Further, obtaining the region matched with the preset positioning template and the vertex coordinates of the matched region by using the Canny algorithm comprises:
cropping a partial picture containing the region where the invoice information to be identified is located;
scaling the partial picture to obtain a plurality of scaled pictures;
and inputting the multiple scaled pictures and the preset positioning template into the Canny algorithm to obtain the area matched with the preset positioning template and the vertex coordinates of the matched area.
Contour matching between the multiple scaled pictures and the positioning template locates the matching area with high precision.
After the matching area is located and its vertex coordinates are obtained, the region picture of the invoice information to be identified is located and obtained from the positional relation between the matching area and that region.
The scaling parameter ranges from 0.8 to 1.2.
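Generating the scaled pictures can be sketched as follows. The text gives only the range 0.8 to 1.2, so the even spacing, the inclusive endpoints, the default of 10 steps and the nearest-neighbour resampling are all assumptions made for illustration, and both function names are hypothetical.

```python
def scale_factors(low=0.8, high=1.2, steps=10):
    """Evenly spaced scale factors from low to high, both inclusive."""
    if steps == 1:
        return [low]
    step = (high - low) / (steps - 1)
    return [round(low + i * step, 4) for i in range(steps)]

def resize_nearest(img, factor):
    """Nearest-neighbour resize of a picture given as a list of rows."""
    src_h, src_w = len(img), len(img[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [img[min(src_h - 1, int(y / factor))][min(src_w - 1, int(x / factor))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]
```

Each scaled picture would then be matched against the positioning template, and the scale with the highest degree of matching kept.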
Further, calculating the sum of pixel values of the rows and columns of the area picture, comparing the sum with a threshold value, and judging and segmenting the invoice information picture to be identified comprises the following steps:
calculating the sum of pixel values of each pixel row of the area picture, and judging and segmenting the line area picture of the invoice information to be identified according to the threshold value of the pixel row;
and calculating the sum of the pixel values of each pixel column of the line region picture, and judging and segmenting the invoice information picture to be identified according to the threshold value of the pixel column.
The threshold is determined by the number of characters in each row or column of the picture and by the noise in the picture. In practice, the sum of pixel values of every pixel row or column is output, and a developer then inspects the sums to find the approximate critical value separating rows or columns without characters from those with characters; that value is the threshold.
The threshold value of the pixel row is 14000, and the threshold value of the pixel column is 4000.
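The threshold comparison amounts to finding runs of consecutive rows (or columns) whose pixel sum exceeds the threshold. A minimal sketch, with the hypothetical helper name `runs_above` and made-up sums:

```python
def runs_above(sums, threshold):
    """Return (start, end) index pairs, end exclusive, of consecutive
    entries whose value exceeds threshold."""
    runs, start = [], None
    for i, value in enumerate(sums):
        if value > threshold and start is None:
            start = i                      # a character run begins
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(sums)))    # run reaches the last entry
    return runs

# Illustrative row sums only; real values depend on the picture.
row_sums = [0, 500, 15000, 20000, 300, 0, 16000, 14500]
text_rows = runs_above(row_sums, 14000)
```

Here `text_rows` holds two runs, rows 2 to 3 and rows 6 to 7, which would be treated as two lines of characters separated by a gap. The same helper applies to column sums with the column threshold.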
According to another aspect of the present invention, there is provided a system for locating and identifying specific plates of an invoice picture, comprising:
an acquisition unit for acquiring an invoice picture;
a positioning unit for locating and segmenting, according to the invoice information to be identified, the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified;
and an identification unit for recognizing the text content of the picture of the invoice information to be identified to obtain the invoice information to be identified.
Further, the invoice information to be identified comprises a ticket number sequence, a company name, an invoicing date, an amount and a tax amount.
The system is based on the above method for positioning and identifying specific plates of an invoice picture. The steps of cropping the effective area of the invoice, locating and segmenting the region of the invoice information to be identified, and recognizing the text content of the picture to obtain the invoice information are as described in the method part.
According to another aspect of the present invention, there is provided an invoice picture specific plate location and identification device, comprising a computer readable medium storing a computer program, the program being executed to perform:
acquiring an invoice picture;
determining invoice information to be identified;
according to the invoice information to be identified, positioning and dividing the region of the invoice picture where the invoice information is located to obtain the invoice information picture to be identified;
and identifying the text content of the invoice information picture to be identified to obtain the invoice information to be identified.
The equipment is based on the above method for positioning and identifying specific plates of an invoice picture. The steps of cropping the effective area of the invoice, locating and segmenting the region of the invoice information to be identified, and recognizing the text content of the picture to obtain the invoice information are as described in the method part.
Compared with the prior art, the invention has the following beneficial effects:
1. In the method for positioning and identifying specific plates of an invoice picture, the region of the invoice picture in which the invoice information to be identified lies is located and segmented to obtain a picture of that information, and the text content of the picture is then recognized to obtain the invoice information. Because each target is located and cropped before recognition, it is clear which item of invoice information each recognition result corresponds to; the invention fills a gap in the technical field and has broad application prospects.
2. In the method, extraction of the effective area of the invoice is not affected by interference such as the position and angle of the effective area in the picture, its brightness, or light and shadow, so the effective area is cropped accurately; the deflection angle of the invoice is further detected and the picture is straightened. Combined with logic processing, each area to be identified in a picture is then accurately located by Canny contour detection, image text detection, color detection and row-and-column extraction, and cropped into fragments that are recognized separately. This ensures both positioning accuracy and recognition accuracy.
3. The system for positioning and identifying specific plates of an invoice picture obtains the invoice picture through the acquisition unit; through the positioning unit, it locates and segments, according to the invoice information to be identified, the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified; and through the identification unit, it recognizes the text content of that picture to obtain the invoice information to be identified. Because the invoice information to be identified is determined first and its position on the invoice is known, the region where it lies is located and cropped to obtain a picture containing only that information, and the picture is then recognized to obtain the corresponding invoice information. It is therefore clear which information each recognition pass yields, which overcomes the defect that an existing image recognition algorithm can only recognize every character in the invoice picture indiscriminately and cannot single out the required invoice information. Invoice information acquisition is thereby automated.
4. The equipment for positioning and identifying specific plates of an invoice picture stores and runs a program that performs the following: acquiring an invoice picture; determining the invoice information to be identified; according to the invoice information to be identified, locating and segmenting the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified; and recognizing the text content of that picture to obtain the invoice information to be identified. Given only the invoice picture, running this program identifies the invoice information automatically, without manual recognition, which both ensures accuracy and greatly improves efficiency.
Drawings
FIG. 1 is a schematic diagram of the initially acquired invoice picture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the straightened invoice picture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the effective area of the straightened invoice picture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the upright, non-inverted effective area of the invoice according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the invoice information to be identified, here the company name, according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a positioning template according to an embodiment of the present invention;
FIG. 7 is the scaled picture with the highest degree of matching to the positioning template according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the area in which the company name to be identified is located according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the row in which the company name to be identified is located according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the picture of the company name to be identified according to an embodiment of the present invention.
Detailed Description
In order to better understand the technical solution of the present invention, the following embodiments are provided to further explain the present invention.
Embodiment one:
This embodiment provides a method for positioning and identifying specific plates of an invoice picture, explained by taking a value-added tax invoice as an example, and comprising the following steps:
S1. Extracting and straightening the effective area of the invoice from a value-added tax invoice picture:
Invoice pictures are usually photographed under varied, uncontrolled conditions, so the position of the effective area of the invoice in the picture, the background pattern, the lighting, the sharpness and the rotation angle of the invoice are not fixed, and the picture cannot be recognized by directly calling an image recognition program. The exact position of every target must therefore be located first, and each target cropped into an independent fragment for recognition.
S1.1. Acquiring an invoice picture, as shown in FIG. 1;
S1.2. Reading the invoice picture and detecting all lines in it by using the Hough line detection algorithm of OpenCV; calculating the deflection angle of each line; finding the angle that occurs most often, which is the deflection angle of the picture; and rotating the picture in the opposite direction by the deflection angle to obtain a straightened picture, as shown in FIG. 2;
S1.3. For the straightened invoice picture, first extracting the outlines of all details, calculating the sum of pixel values of each pixel column and then of each pixel row, and multiplying the two vectors into a two-dimensional array (which can be processed as a two-dimensional picture); removing the horizontal and vertical lines in the picture so that the invoice area is completely separated from the background, obtaining the outline of the effective area of the straightened invoice picture, and cropping to obtain the picture of the effective area of the invoice, as shown in FIG. 3. Specifically:
extracting the outline of the effective area of the invoice in the straightened invoice picture:
respectively calculating the sum of pixel values of each pixel column and of each pixel row of the whole invoice picture, background included, to obtain two vectors (the sums are taken over the whole picture because the program does not yet know where the outline of the effective area of the invoice is; it finds the outline through the row and column sums, the multiplication, and the removal of horizontal and vertical lines);
multiplying the two vectors to obtain a two-dimensional array;
converting the two-dimensional array into a corresponding grayscale picture (all values are scaled proportionally to the range 0 to 255, where 0 is black, 255 is white, and the values in between are different levels of gray);
directionally filtering the horizontal and vertical lines in the picture with the cv2.morphologyEx function of the OpenCV toolkit (which directionally filters a given shape in the picture; the shape to be filtered is selected through its parameters), and removing them to obtain the outline of the effective area of the invoice;
S1.4. Using the SIFT algorithm to determine whether the invoice header is at the top of the picture and, if not, turning the picture upside down, finally obtaining an invoice picture that has no deflection angle, is not inverted, and contains only the effective area of the invoice, as shown in FIG. 4.
S2. According to the invoice information to be identified, locating and segmenting the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified:
S2.1. Determining the invoice information to be identified: the company name, as shown in FIG. 5;
S2.2. Determining a positioning template according to the invoice information to be identified, as shown in FIG. 6; because each item of invoice information has a fixed position on the invoice, a different positioning template can be determined for each item to be identified;
S2.3. Performing template matching on the whole picture with the positioning template to determine the approximate position of the template in the invoice picture, and cropping a partial picture containing the region where the invoice information to be identified is located;
S2.4. Scaling the partial picture of the region where the company name is located in 10 steps, with the scale parameter running from 0.8 to 1.2, to obtain 10 scaled pictures;
S2.5. Computing the Canny contour maps of the 10 pictures and of the positioning template, then finding the region similar to the template in each of the 10 pictures by template matching while calculating the degree of matching, and keeping the scaled picture with the highest degree of matching, as shown in FIG. 7, together with the four vertex coordinates of the matched region;
S2.6. Framing the four-row content area according to the four coordinates obtained, as shown in FIG. 8;
S2.7. Extracting the pixels of all company name characters from the picture of the framed area, then performing inverse binarization (converting pixels with a gray value greater than zero to white, 255) to obtain a picture with a black background and gray characters. The sum of pixel values of each pixel row is calculated, and character rows are distinguished from gaps by a threshold of 14000: a pixel row whose sum is greater than 14000 belongs to a character row, and a pixel row whose sum is smaller than 14000 belongs to a gap. The three rows with characters in the picture are thereby identified and segmented, and the first row is cropped as the next target, as shown in FIG. 9;
S2.8. Calculating the sum of pixel values of each pixel column of the preceding picture, identifying and segmenting the character region according to a threshold of 4000, and thereby obtaining the picture of the company name region, as shown in FIG. 10;
S3. Recognizing the text content of the picture of the invoice information to be identified. A tool for recognizing printed Chinese character strings and digit strings can be used, such as the open-source tesseract toolkit of ***; calling the recognition function of the toolkit in a program directly outputs the content of the picture, which is the invoice information to be identified.
This method accurately locates the specific target areas of a value-added tax invoice and crops the target plates one by one into fragments for recognition, which effectively helps to improve the accuracy of invoice recognition.
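Steps S2.7 and S2.8 can be sketched end to end on a small picture. This is a simplified illustration under assumptions: the picture is a list of rows of gray values with a zero background, the function names are hypothetical, and the column step keeps everything between the first and last column above the threshold (the source does not spell out how gaps between characters are handled). The thresholds in the embodiment are 14000 and 4000; a toy picture would use much smaller values.

```python
def binarize(gray):
    """Map every pixel with a gray value above zero to 255, others to 0."""
    return [[255 if p > 0 else 0 for p in row] for row in gray]

def first_run(sums, threshold):
    """Index range (start, end), end exclusive, of the first run above threshold."""
    start = None
    for i, s in enumerate(sums):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            return start, i
    return (start, len(sums)) if start is not None else None

def crop_first_text_line(gray, row_threshold, col_threshold):
    """Crop the first text line, then trim it to the columns containing characters."""
    binary = binarize(gray)
    rows = first_run([sum(r) for r in binary], row_threshold)
    if rows is None:
        return []
    line = binary[rows[0]:rows[1]]
    col_sums = [sum(c) for c in zip(*line)]
    above = [i for i, s in enumerate(col_sums) if s > col_threshold]
    if not above:
        return []
    return [r[above[0]:above[-1] + 1] for r in line]
```

The cropped result corresponds to the picture that would be handed to the text recognition step S3.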
The system for positioning and identifying specific plates of an invoice picture in this embodiment comprises:
an acquisition unit for acquiring an invoice picture;
a positioning unit for locating and segmenting, according to the invoice information to be identified, the region of the invoice picture in which that information lies, to obtain a picture of the invoice information to be identified;
and an identification unit for recognizing the text content of the picture of the invoice information to be identified to obtain the invoice information to be identified.
The equipment for positioning and identifying specific plates of an invoice picture in this embodiment comprises a computer readable medium storing a computer program, the program being executed to perform the following steps:
acquiring an invoice picture;
determining invoice information to be identified;
according to the invoice information to be identified, positioning and dividing the region of the invoice picture where the invoice information is located to obtain the invoice information picture to be identified;
and identifying the text content of the invoice information picture to be identified to obtain the invoice information to be identified.
Example two:
Features shared with the first embodiment are not described again; this embodiment differs from the first embodiment as follows:
In the method for locating and identifying the specific plate of the invoice picture of this embodiment,
S2, locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified, comprises:
S2.1, determining the invoice information to be identified: a ticket number sequence;
S2.2, determining a positioning template according to the invoice information to be identified;
S2.3, performing template matching on the whole picture using the positioning template, determining the approximate position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
S2.4, scaling the partial picture of the region where the invoice information to be identified is located in 10 steps, with the scale factor ranging from 0.8 to 1.2, to obtain 10 scaled pictures;
S2.5, computing Canny contour maps of the 10 scaled pictures and of the positioning template, finding the region similar to the template in each of the 10 pictures by template matching while calculating the matching degree, and keeping, for the scaled picture with the highest matching degree, the matched position and the four vertex coordinates of the matching region;
S2.6, framing the four-row content area according to the four obtained coordinates;
S2.7, extracting the pixels of all characters of the invoice information to be identified from the image of the framed area, and then performing reverse binarization (converting pixels with gray values greater than zero to white, 255) to obtain an image with white characters on a black background; then calculating the sum of pixel values of each pixel row and distinguishing character rows from gaps with a threshold, where a pixel row whose sum is greater than the threshold belongs to a character row and a pixel row whose sum is less than the threshold belongs to a gap; the rows containing characters are thereby identified and segmented, and the picture of the row where the target is located is cropped;
S2.8, calculating the sum of pixel values of each pixel column of the row picture obtained in the previous step, identifying and segmenting the character region according to a threshold, and thereby obtaining a picture of the ticket number sequence region;
S3, recognizing the text content of the invoice information picture to be identified, and obtaining the invoice information to be identified.
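The row segmentation described in S2.7 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation of the patent: the function names, the toy image, and the threshold value of 0 are hypothetical, and a row whose sum equals the threshold is treated as a gap because the text does not say how ties are handled.

```python
def binarize(gray):
    """Map every pixel with gray value > 0 to 255 (white); keep 0 as black."""
    return [[255 if px > 0 else 0 for px in row] for row in gray]


def row_sums(image):
    """Sum of pixel values of each pixel row."""
    return [sum(row) for row in image]


def segment_runs(sums, threshold):
    """Return (start, end) pairs, end exclusive, of consecutive indices
    whose sum is greater than the threshold (character rows)."""
    runs, start = [], None
    for i, s in enumerate(sums):
        if s > threshold and start is None:
            start = i                      # a character row begins
        elif s <= threshold and start is not None:
            runs.append((start, i))        # a gap ends the character row
            start = None
    if start is not None:                  # text touches the bottom edge
        runs.append((start, len(sums)))
    return runs


# Toy 7x6 grayscale region: two text lines separated by blank rows.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 7, 0],
    [0, 9, 0, 0, 7, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 5, 5, 5, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
binary = binarize(toy)
text_rows = segment_runs(row_sums(binary), threshold=0)
target_row_picture = binary[text_rows[0][0]:text_rows[0][1]]
```

In practice the threshold would be tuned to the scan quality so that speckle noise in a gap is not mistaken for a character row.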
Example three:
Features shared with the first embodiment are not described again; this embodiment differs from the first embodiment as follows:
In the method for locating and identifying the specific plate of the invoice picture of this embodiment,
S2, locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified, comprises:
S2.1, determining the invoice information to be identified: an invoicing date;
S2.2, determining a positioning template according to the invoice information to be identified;
S2.3, performing template matching on the whole picture using the positioning template, determining the approximate position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
S2.4, scaling the partial picture of the region where the invoice information to be identified is located in 15 steps, with the scale factor ranging from 0.8 to 1.2, to obtain 15 scaled pictures;
S2.5, computing Canny contour maps of the 15 scaled pictures and of the positioning template, finding the region similar to the template in each of the 15 pictures by template matching while calculating the matching degree, and keeping, for the scaled picture with the highest matching degree, the matched position and the four vertex coordinates of the matching region;
S2.6, framing the four-row content area according to the four obtained coordinates;
S2.7, extracting the pixels of all characters of the invoice information to be identified from the image of the framed area, and then performing reverse binarization to obtain an image with white characters on a black background; then calculating the sum of pixel values of each pixel row and distinguishing character rows from gaps with a threshold, where a pixel row whose sum is greater than the threshold belongs to a character row and a pixel row whose sum is less than the threshold belongs to a gap; the rows containing characters are thereby identified and segmented, and the picture of the row where the target is located is cropped;
S2.8, calculating the sum of pixel values of each pixel column of the row picture obtained in the previous step, identifying and segmenting the character region according to a threshold, and thereby obtaining a picture of the invoicing date region;
S3, recognizing the text content of the invoice information picture to be identified, and obtaining the invoice information to be identified.
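The scale sweep of S2.4 and the selection of the best match in S2.5 can be sketched as below. This is only an illustration under assumptions: the text does not say whether both endpoints 0.8 and 1.2 are included or whether the steps are evenly spaced, so the sketch assumes evenly spaced factors including both ends. The `match_score` callable is a hypothetical stand-in for the matching degree that template matching on the Canny contour maps would return.

```python
def scale_factors(start=0.8, stop=1.2, steps=15):
    """Evenly spaced scale factors from start to stop, both included
    (the even spacing and inclusive ends are assumptions)."""
    if steps < 2:
        return [start]
    step = (stop - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def best_scale(factors, match_score):
    """Return the (factor, score) pair with the highest matching degree."""
    return max(((f, match_score(f)) for f in factors), key=lambda pair: pair[1])


factors = scale_factors()

# Hypothetical score that peaks when the picture is at its true size (1.0).
best_factor, best_score = best_scale(factors, lambda f: -abs(f - 1.0))
```

Sweeping several scales makes the match tolerant of invoices photographed or scanned at slightly different resolutions than the template.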
Example four:
Features shared with the first embodiment are not described again; this embodiment differs from the first embodiment as follows:
In the method for locating and identifying the specific plate of the invoice picture of this embodiment,
S2, locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified, comprises:
S2.1, determining the invoice information to be identified: an amount;
S2.2, determining a positioning template according to the invoice information to be identified;
S2.3, performing template matching on the whole picture using the positioning template, determining the approximate position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
S2.4, scaling the partial picture of the region where the invoice information to be identified is located in 10 steps, with the scale factor ranging from 0.8 to 1.2, to obtain 10 scaled pictures;
S2.5, computing Canny contour maps of the 10 scaled pictures and of the positioning template, finding the region similar to the template in each of the 10 pictures by template matching while calculating the matching degree, and keeping, for the scaled picture with the highest matching degree, the matched position and the four vertex coordinates of the matching region;
S2.6, framing the four-row content area according to the four obtained coordinates;
S2.7, extracting the pixels of all characters of the invoice information to be identified from the image of the framed area, and then performing reverse binarization to obtain an image with white characters on a black background; then calculating the sum of pixel values of each pixel row and distinguishing character rows from gaps with a threshold, where a pixel row whose sum is greater than the threshold belongs to a character row and a pixel row whose sum is less than the threshold belongs to a gap; the rows containing characters are thereby identified and segmented, and the picture of the row where the target is located is cropped;
S2.8, calculating the sum of pixel values of each pixel column of the row picture obtained in the previous step, identifying and segmenting the character region according to a threshold, and thereby obtaining a picture of the amount region;
S3, recognizing the text content of the invoice information picture to be identified, and obtaining the invoice information to be identified.
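S2.5 and S2.6 keep the four vertex coordinates of the matching region and use them to frame the content area. The sketch below shows one way to derive those vertices from a match position and to crop with them. It rests on assumptions that the text does not state: that the match is reported as a top-left corner plus the template size, and that the coordinates are mapped back from the scaled picture to the unscaled partial picture by dividing by the scale factor. All names are hypothetical.

```python
def match_vertices(top_left, template_size, scale):
    """Four vertices (top-left, top-right, bottom-right, bottom-left) of the
    matching region, mapped from the scaled picture back to the unscaled one."""
    x, y = top_left
    w, h = template_size
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return [(round(cx / scale), round(cy / scale)) for cx, cy in corners]


def crop(image, vertices):
    """Cut the axis-aligned rectangle spanned by the top-left and
    bottom-right vertices out of a row-major image."""
    (x0, y0), _, (x1, y1), _ = vertices
    return [row[x0:x1] for row in image[y0:y1]]


# Toy picture, 50 rows by 70 columns; each pixel stores its column index.
picture = [[col for col in range(70)] for _ in range(50)]

# Hypothetical best match: found at (12, 24) in the picture scaled by 1.2,
# with a 60x30 template.
vertices = match_vertices((12, 24), (60, 30), scale=1.2)
framed = crop(picture, vertices)
```

If the framing is instead done directly on the scaled picture, the division by the scale factor is simply omitted.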
Example five:
Features shared with the first embodiment are not described again; this embodiment differs from the first embodiment as follows:
In the method for locating and identifying the specific plate of the invoice picture of this embodiment,
S2, locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified, comprises:
S2.1, determining the invoice information to be identified: a tax amount;
S2.2, determining a positioning template according to the invoice information to be identified;
S2.3, performing template matching on the whole picture using the positioning template, determining the approximate position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
S2.4, scaling the partial picture of the region where the invoice information to be identified is located in 10 steps, with the scale factor ranging from 0.8 to 1.2, to obtain 10 scaled pictures;
S2.5, computing Canny contour maps of the 10 scaled pictures and of the positioning template, finding the region similar to the template in each of the 10 pictures by template matching while calculating the matching degree, and keeping, for the scaled picture with the highest matching degree, the matched position and the four vertex coordinates of the matching region;
S2.6, framing the four-row content area according to the four obtained coordinates;
S2.7, extracting the pixels of all characters of the invoice information to be identified from the image of the framed area, and then performing reverse binarization to obtain an image with white characters on a black background; then calculating the sum of pixel values of each pixel row and distinguishing character rows from gaps with a threshold, where a pixel row whose sum is greater than the threshold belongs to a character row and a pixel row whose sum is less than the threshold belongs to a gap; the rows containing characters are thereby identified and segmented, and the picture of the row where the target is located is cropped;
S2.8, calculating the sum of pixel values of each pixel column of the row picture obtained in the previous step, identifying and segmenting the character region according to a threshold, and thereby obtaining a picture of the tax amount region;
S3, recognizing the text content of the invoice information picture to be identified, and obtaining the invoice information to be identified.
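Claim 9 describes the final cut as a sum over the pixel columns of the row picture. Under that reading, S2.8 can be sketched as below. The function names, the toy strip, and the threshold are assumptions for illustration, and taking the span from the first to the last text column (so that gaps between digits stay inside the field) is a design choice of this sketch rather than something the text specifies.

```python
def column_sums(image):
    """Sum of pixel values of each pixel column of a row-major image."""
    return [sum(column) for column in zip(*image)]


def text_span(sums, threshold):
    """Return (start, end), end exclusive, covering every column whose sum
    is greater than the threshold, or None if no column qualifies."""
    hits = [i for i, s in enumerate(sums) if s > threshold]
    return (hits[0], hits[-1] + 1) if hits else None


# Toy binarized row picture: white (255) strokes on a black (0) background.
strip = [
    [0, 0, 255, 0, 255, 255, 0, 0],
    [0, 0, 255, 0,   0, 255, 0, 0],
]
span = text_span(column_sums(strip), threshold=0)
field = [row[span[0]:span[1]] for row in strip]
```

The cropped `field` would then be handed to the recognition step S3.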
The foregoing description covers only the preferred embodiments of the application and illustrates the principles of the technology employed. A person skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to embodiments with the specific combination of the above features, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) technical features of similar function disclosed in this application.

Claims (11)

1. A method for locating and identifying a specific plate of an invoice picture, characterized by comprising the following steps:
acquiring an invoice picture;
determining invoice information to be identified;
determining a positioning template according to the invoice information to be identified, wherein different information to be identified corresponds to different positioning templates, and the invoice information to be identified comprises an invoice number sequence, a company name, an invoicing date, an amount and a tax amount;
performing template matching on the invoice picture using the positioning template, determining the position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
scaling the partial picture of the region where the information to be identified is located to obtain a scaled picture;
obtaining, by using a Canny algorithm, the region where the scaled picture matches the determined positioning template, and the vertex coordinates of the matching region;
acquiring a picture of the region where the invoice information to be identified is located according to the vertex coordinates of the matching region;
extracting the pixels of the characters of the invoice information to be identified in the region picture, and performing reverse binarization processing;
calculating the sums of pixel values of the rows and columns of the region picture, comparing the sums with a preset threshold for distinguishing characters from gaps, and thereby identifying and segmenting the invoice information picture to be identified;
and recognizing the text content of the invoice information picture to be identified to obtain the invoice information to be identified.
2. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 1, wherein a step performed after acquiring the invoice picture comprises cropping the valid area of the invoice.
3. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 2, wherein, before cropping the valid area of the invoice, the method comprises: rectifying the invoice picture.
4. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 3, wherein rectifying the invoice picture comprises the following steps:
detecting all lines in the invoice picture by using the Hough line detection algorithm of OpenCV;
calculating the deflection angle of each line;
taking the angle that occurs most often as the deflection angle of the picture;
and rotating the picture according to the deflection angle to straighten the invoice picture.
5. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 3, wherein cropping the valid area of the invoice comprises:
extracting the outline of the valid area of the invoice in the invoice picture;
and obtaining the picture of the valid area of the invoice by segmentation based on the outline.
6. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 5, wherein extracting the outline of the valid area of the invoice in the invoice picture comprises:
calculating the sum of pixel values of each pixel column and the sum of pixel values of each pixel row of the invoice picture, respectively, to obtain two vectors;
multiplying the two vectors to obtain a two-dimensional array;
converting the two-dimensional array into a corresponding grayscale picture;
and filtering out the horizontal lines and vertical lines in the invoice picture to obtain the outline of the valid area of the invoice.
7. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 6, wherein a step performed after obtaining the picture of the valid area of the invoice comprises: judging, by a SIFT algorithm, whether the invoice header is at the top of the picture, and if not, turning the picture upside down.
8. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 1, wherein obtaining, by using the Canny algorithm, the region matching the preset positioning template and the vertex coordinates of the matching region comprises:
cropping a partial picture of the region where the invoice information to be identified is located;
scaling the partial picture to obtain a plurality of scaled pictures;
and inputting the plurality of scaled pictures and the preset positioning template into the Canny algorithm to obtain the region matching the preset positioning template and the vertex coordinates of the matching region.
9. The method for locating and identifying the specific plate of the invoice picture as claimed in claim 1, wherein calculating the sums of pixel values of the rows and columns of the region picture, comparing the sums with the threshold, and identifying and segmenting the invoice information picture to be identified comprises the following steps:
calculating the sum of pixel values of each pixel row of the region picture, and identifying and segmenting, according to the threshold for pixel rows, the picture of the row where the invoice information to be identified is located;
and calculating the sum of pixel values of each pixel column of that row picture, and identifying and segmenting the invoice information picture to be identified according to the threshold for pixel columns.
10. A system for locating and identifying a specific plate of an invoice picture, characterized by comprising:
an acquisition unit for acquiring an invoice picture;
a positioning unit for locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified;
and an identification unit for recognizing the text content of the invoice information picture to be identified and obtaining the invoice information to be identified;
wherein, for locating and segmenting the region where the invoice information to be identified is located, the positioning unit is specifically configured for:
determining a positioning template according to the invoice information to be identified, wherein different information to be identified corresponds to different positioning templates, and the invoice information to be identified comprises an invoice number sequence, a company name, an invoicing date, an amount and a tax amount;
performing template matching on the invoice picture using the positioning template, determining the position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
scaling the partial picture of the region where the information to be identified is located to obtain a scaled picture;
obtaining, by using a Canny algorithm, the region where the scaled picture matches the determined positioning template, and the vertex coordinates of the matching region;
acquiring a picture of the region where the invoice information to be identified is located according to the vertex coordinates of the matching region;
extracting the pixels of the characters of the invoice information to be identified in the region picture, and performing reverse binarization processing;
and calculating the sums of pixel values of the rows and columns of the region picture, comparing the sums with a preset threshold for distinguishing characters from gaps, and thereby identifying and segmenting the invoice information picture to be identified.
11. An apparatus for locating and identifying a specific plate of an invoice picture, comprising a computer-readable medium having a computer program stored thereon, the program, when run, performing:
acquiring an invoice picture;
determining invoice information to be identified;
locating and segmenting, according to the invoice information to be identified, the region of the invoice picture where that information is located, to obtain the invoice information picture to be identified;
recognizing the text content of the invoice information picture to be identified to obtain the invoice information to be identified;
wherein locating and segmenting the region where the invoice information to be identified is located comprises the following steps:
determining a positioning template according to the invoice information to be identified, different positioning templates being determined for different information to be identified, wherein the invoice information to be identified comprises an invoice number sequence, a company name, an invoicing date, an amount and a tax amount;
performing template matching on the invoice picture using the positioning template, determining the position of the template in the invoice picture, and cropping a partial picture of the region where the invoice information to be identified is located;
scaling the partial picture of the region where the information to be identified is located to obtain a scaled picture;
obtaining, by using a Canny algorithm, the region where the scaled picture matches the determined positioning template, and the vertex coordinates of the matching region;
acquiring a picture of the region where the invoice information to be identified is located according to the vertex coordinates of the matching region;
extracting the pixels of the characters of the invoice information to be identified in the region picture, and performing reverse binarization processing;
and calculating the sums of pixel values of the rows and columns of the region picture, comparing the sums with a preset threshold for distinguishing characters from gaps, and thereby identifying and segmenting the invoice information picture to be identified.
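Two of the preprocessing steps recited in claims 4 and 6 can be sketched in pure Python as below. Both functions are illustrations under assumptions, not the claimed implementation. For claim 4, the line segments are assumed to be given as `(x1, y1, x2, y2)` tuples, as a Hough line detector might return them, and the angles are grouped into 1-degree bins before voting. For claim 6, "multiplying the two vectors" is read as an outer product of the row sums and column sums, and the result is scaled to the 0 to 255 range. All names are hypothetical.

```python
import math
from collections import Counter


def line_angle_deg(x1, y1, x2, y2):
    """Angle of a segment to the horizontal axis, folded into (-90, 90]
    so that a segment and its reverse give the same angle."""
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle


def dominant_angle(segments, bin_deg=1.0):
    """Deflection angle of the picture: the most frequent angle bin."""
    bins = Counter(round(line_angle_deg(*s) / bin_deg) for s in segments)
    return bins.most_common(1)[0][0] * bin_deg


def projection_product(image):
    """Outer product of row sums and column sums, scaled to 0..255."""
    rows = [sum(r) for r in image]
    cols = [sum(c) for c in zip(*image)]
    product = [[r * c for c in cols] for r in rows]
    peak = max(max(line) for line in product) or 1   # avoid division by zero
    return [[p * 255 // peak for p in line] for line in product]


# Three segments tilted by about 2 degrees and one vertical outlier.
segments = [(0, 0, 1000, 35), (0, 10, 1000, 45), (1000, 35, 0, 0), (0, 0, 0, 100)]
skew = dominant_angle(segments)

# Toy picture whose content occupies the lower-right 2x2 block.
gray = projection_product([[0, 0, 0], [0, 1, 1], [0, 1, 1]])
```

The picture would then be rotated by the negative of `skew` to straighten it, and the bright block in `gray` marks where the rows and columns with content intersect.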
CN201710724450.9A 2017-08-22 2017-08-22 Method, system and equipment for positioning and identifying specific plate of invoice picture Active CN109426814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710724450.9A CN109426814B (en) 2017-08-22 2017-08-22 Method, system and equipment for positioning and identifying specific plate of invoice picture

Publications (2)

Publication Number Publication Date
CN109426814A CN109426814A (en) 2019-03-05
CN109426814B true CN109426814B (en) 2023-02-24

Family

ID=65498246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710724450.9A Active CN109426814B (en) 2017-08-22 2017-08-22 Method, system and equipment for positioning and identifying specific plate of invoice picture

Country Status (1)

Country Link
CN (1) CN109426814B (en)

Also Published As

Publication number Publication date
CN109426814A (en) 2019-03-05

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant