CN111210455B - Method and device for extracting preprinted information in image, medium and electronic equipment - Google Patents


Info

Publication number: CN111210455B
Application number: CN201911268302.6A
Authority: CN (China)
Legal status: Active (granted)
Other versions: CN111210455A (Chinese-language publication of the application)
Inventors: 马文伟, 王亚领, 刘设伟
Assignees: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Application filed by Taikang Insurance Group Co Ltd and Taikang Online Property Insurance Co Ltd


Classifications

    • G06T7/194: Image analysis; segmentation; edge detection involving foreground-background segmentation
    • G06T7/136: Segmentation; edge detection involving thresholding
    • G06T7/90: Determination of colour characteristics
    • G06T2207/10004: Image acquisition modality; still image; photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of image processing, and in particular to a method and an apparatus for extracting preprinted information from an image, together with a medium and an electronic device. The extraction method comprises the following steps: converting the image to be processed into a grayscale image and segmenting the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; converting the image to be processed into the Lab color space and segmenting it according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; determining, from the binary image G0 and the binary image Sy, the number of categories of information present in the image to be processed; and selecting a corresponding preset method according to that number to extract the preprinted information in the image to be processed. With this technical scheme, the number of information categories in the image to be processed can be judged accurately, different extraction methods can be selected according to that number, and the preprinted information can be extracted precisely, avoiding over-extraction or under-extraction of the preprinted information.

Description

Method and device for extracting preprinted information in image, medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method for extracting preprinted information in an image, an apparatus for extracting preprinted information in an image, a computer readable storage medium, and an electronic device.
Background
An optical character recognition (Optical Character Recognition, OCR) system is a system capable of converting an image file into a text format, and is widely used in aspects of document information collection, document information entry, and the like.
When entering voucher information, in order to distinguish preprinted information from printed information in a voucher image, a conventional OCR system usually relies on a manually set empirical color value for the preprinted information of that voucher and extracts the preprinted information according to that value. However, because preprinted information is produced by different printing processes, the empirical color value often deviates from the actual voucher, so extraction based on it frequently suffers from over-extraction or under-extraction.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure aims to provide a method for extracting preprinted information in an image, an apparatus for extracting preprinted information in an image, a computer-readable storage medium, and an electronic device, so as to overcome, at least to some extent, the problems of over-extraction or under-extraction of preprinted information.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a method for extracting preprinted information in an image, including:
converting an image to be processed into a grayscale image, and segmenting the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; wherein the preset two-level segmentation threshold is a gray-level threshold that includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold;
converting the image to be processed into the Lab color space, and segmenting it according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy;
determining, from the binary image G0 and the binary image Sy, the number of information categories corresponding to the image to be processed;
and selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, determining the number of information categories corresponding to the image to be processed from the binary image G0 and the binary image Sy comprises the following steps:
calculating the coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0;
and determining the number of information categories corresponding to the image to be processed according to the coincidence rate.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, determining the number of information categories corresponding to the image to be processed according to the coincidence rate comprises:
if the coincidence rate is greater than a preset discrimination threshold, judging that the number of information categories corresponding to the image to be processed is 2;
and if the coincidence rate is less than or equal to the preset discrimination threshold, judging that the number of information categories corresponding to the image to be processed is 3.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed comprises:
if the number of information categories corresponding to the image to be processed is 2, selecting a first preset method to extract the preprinted information in the image to be processed;
and if the number of information categories corresponding to the image to be processed is 3, selecting a second preset method to extract the preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the first preset method includes:
segmenting the grayscale image according to a preset first-level segmentation threshold to extract the preprinted information in the image to be processed; the preset first-level segmentation threshold is a gray-level threshold.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the preset two-level segmentation threshold includes a preset second segmentation threshold;
the binary image G includes a binary image G1 corresponding to the preset second segmentation threshold;
the second preset method comprises the following step:
calculating the overlapping area A of the binary image Sy and the binary image G1 to extract the preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, after the overlapping area A of the binary image Sy and the binary image G1 is calculated, the method further comprises:
obtaining, respectively, the position of the preset information mark M0 in the binary image G0 and the position of the corresponding preset information mark MA in the overlapping area A;
and translating the binary image G0 according to the positions of M0 and MA so that the position of the preset information mark M0 matches the position of the mark MA.
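The translation step above can be sketched as follows. This is a minimal NumPy sketch under two illustrative assumptions: the position of each mark is taken as the centroid of its binary mask, and the shift uses wrap-around np.roll (a real implementation would pad with background instead of wrapping):

```python
import numpy as np

def centroid(mask):
    """Row/column centroid of the nonzero pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    return int(round(ys.mean())), int(round(xs.mean()))

def align_by_mark(g0, m0_mask, ma_mask):
    """Translate G0 so that the centroid of its mark M0 lands on the
    centroid of the corresponding mark MA in the overlapping area A.
    np.roll wraps around the image edges (a simplification)."""
    dy = centroid(ma_mask)[0] - centroid(m0_mask)[0]
    dx = centroid(ma_mask)[1] - centroid(m0_mask)[1]
    return np.roll(np.roll(g0, dy, axis=0), dx, axis=1)

g0 = np.zeros((4, 4), dtype=np.uint8); g0[0, 0] = 1  # mark M0 at (0, 0)
m0 = g0.copy()
ma = np.zeros((4, 4), dtype=np.uint8); ma[1, 2] = 1  # mark MA at (1, 2)
aligned = align_by_mark(g0, m0, ma)                  # mark now at (1, 2)
```

After the translation, the pixels of G0 line up with the overlapping area A, so the two can be compared or merged pixel by pixel.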
According to a second aspect of the present disclosure, there is provided an extraction apparatus of preprinted information in an image, comprising:
the first segmentation module is configured to convert an image to be processed into a grayscale image and segment the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; wherein the preset two-level segmentation threshold includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold;
the second segmentation module is configured to convert the image to be processed into the Lab color space and segment it according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy;
the quantity determining module is configured to determine, from the binary image G0 and the binary image Sy, the number of information categories corresponding to the image to be processed;
And the information extraction module is used for selecting a corresponding preset method according to the quantity to extract preprinted information in the image to be processed.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of extracting preprinted information in an image as described in the first aspect of the above embodiments.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for extracting preprinted information in an image described in the first aspect of the above embodiments.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
in the method for extracting preprinted information provided by an embodiment of the present disclosure, the image to be processed is converted into a grayscale image, which is segmented according to a preset two-level segmentation threshold to obtain a corresponding binary image G0; at the same time, the image to be processed is converted into the Lab color space and segmented according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; the number of information categories corresponding to the image to be processed is then determined from the binary image G0 and the binary image Sy, and a corresponding preset method is selected according to that number to extract the preprinted information. By combining the gray values of the image to be processed with the sum S of its a and b components to determine the number of information categories, the technical scheme of this embodiment can judge that number accurately, select different extraction methods accordingly, extract the preprinted information precisely, and avoid over-extraction or under-extraction of the preprinted information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
Fig. 1 schematically illustrates a flowchart of a method for extracting preprinted information in an image in an exemplary embodiment of the present disclosure;
fig. 2 schematically illustrates a flowchart of a method for determining, from the binary image G0 and the binary image Sy, the number of information categories corresponding to the image to be processed in an exemplary embodiment of the present disclosure;
fig. 3 schematically illustrates a flowchart of a method for determining the number of information categories corresponding to the image to be processed according to the coincidence ratio in an exemplary embodiment of the present disclosure;
fig. 4 schematically illustrates a flowchart of a method for selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method of processing an image in an exemplary embodiment of the present disclosure;
fig. 6 schematically illustrates a result diagram obtained when preprinting information extraction is performed on a to-be-processed image having 2 kinds of information in an exemplary embodiment of the present disclosure;
fig. 7 schematically illustrates a result diagram obtained when preprinting information extraction is performed on an image to be processed having 3 kinds of information in an exemplary embodiment of the present disclosure;
fig. 8 schematically illustrates a composition diagram of an extraction apparatus of preprinted information in an image in an exemplary embodiment of the present disclosure;
Fig. 9 schematically illustrates a composition diagram of an extraction apparatus of preprinted information in another image in an exemplary embodiment of the present disclosure;
FIG. 10 schematically illustrates a structural schematic diagram of a computer system suitable for use in implementing the electronic device of the exemplary embodiments of the present disclosure;
fig. 11 schematically illustrates a schematic diagram of a computer-readable storage medium according to some embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, a method for extracting preprinted information in an image is provided first. It can be applied wherever specific information must be extracted from an image to be processed; for example, in an OCR system it can be used to extract the preprinted information of a ticket from an image of that ticket. Referring to fig. 1, the method for extracting preprinted information in an image may include the following steps:
s110, converting an image to be processed into a gray image, and dividing the gray image according to a preset two-stage dividing threshold value to obtain a corresponding binary image G; the preset two-stage segmentation threshold is a gray level threshold, including a preset first segmentation threshold, and the binary image G includes a binary image G corresponding to the preset first segmentation threshold 0
S120, converting the image to be processed into a Lab color space, and dividing the image to be processed according to the sum S of the a component and the b component of the image to be processed and a preset S threshold value to obtain a binary image S y
S130, according to the binary image G 0 And the binary image S y Determining the number of the information types corresponding to the image to be processed;
and S140, selecting a corresponding preset method according to the quantity to extract preprinting information in the image to be processed.
According to the extraction method provided in this exemplary embodiment, the number of information categories in the image to be processed is determined by combining its gray values with the sum S of its a and b components. This allows the number of categories to be judged accurately, so that different extraction methods can be selected according to that number, the preprinted information can be extracted precisely, and over-extraction or under-extraction of the preprinted information is avoided.
Hereinafter, each step of the extraction method of preprinted information in an image in the present exemplary embodiment will be described in more detail with reference to the drawings and embodiments.
A document-type image to be processed, such as a bill, can present two situations: either the image contains only background information and preprinted information, or it contains background information, preprinted information, and printed information simultaneously. Because the appropriate extraction method differs with the categories of information present, the categories of information in the image to be processed are judged first, and the preprinted information is then extracted according to that judgment.
Step S110: converting the image to be processed into a grayscale image, and segmenting the grayscale image according to the preset two-level segmentation threshold to obtain the corresponding binary image G.
In an example embodiment of the present disclosure, the preset two-level segmentation threshold is a gray-level threshold that specifically includes a preset first segmentation threshold. The preset first segmentation threshold can be customized according to the differences between the colors of the different categories of information in a given image to be processed; it corresponds to the gray-value range of the category of information with the lowest gray value in the image, and the specific range is not limited by this disclosure. After the image to be processed is converted into a grayscale image according to the usual conversion rule, the gray value of each pixel is available, and the pixels belonging to the category with the lowest gray value can be extracted with the preset first segmentation threshold to form the corresponding binary image G0.
For example, in the images to be processed corresponding to various bills, the background, the preprinted information, and the printed information have different colors, so after conversion to a grayscale image their gray values differ to some extent. Since the preprinted information or the printed information usually has the lowest gray value in a bill, the preset first segmentation threshold can be set so that the pixels with the lowest gray values are extracted, heuristically segmenting the image into the binary image G0; whether G0 contains preprinted information or printed information is confirmed afterwards. Grouping the pixels of the grayscale image by gray value in this way extracts exactly the pixels that fall within the preset two-level segmentation threshold, realizing a heuristic extraction of the preprinted information in the image to be processed.
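This step can be sketched as follows. The sketch is NumPy-only; the BT.601 luma weights for the grayscale conversion and the threshold value 100 are illustrative assumptions, not values fixed by the disclosure:

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB image (uint8) to grayscale using the
    common ITU-R BT.601 luma weights."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.rint(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

def segment_low_gray(gray, first_threshold):
    """Binary image G0: the pixels whose gray value falls below the
    preset first segmentation threshold (the darkest category)."""
    return (gray < first_threshold).astype(np.uint8)

# Toy 2x2 image: one dark "ink" pixel, three light "paper" pixels.
img = np.array([[[10, 10, 10], [200, 200, 200]],
                [[210, 210, 210], [220, 220, 220]]], dtype=np.uint8)
g0 = segment_low_gray(to_gray(img), first_threshold=100)
```

The resulting G0 marks only the single dark pixel, i.e. the darkest category of information in the toy image.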
Step S120: converting the image to be processed into the Lab color space, and segmenting it according to the sum S of its a and b components and the preset S threshold to obtain the binary image Sy.
In an example embodiment of the present disclosure, the preset S threshold is customized according to the difference between the S values of the background and of the information in the image to be processed, that is, according to the range into which the S values of information-bearing pixels fall, so that those pixels can be extracted to obtain the binary image Sy. After the image to be processed is converted into the Lab color space, the Lab value of every pixel is available; following the characteristics of the Lab color space, the sum S of the a and b components is taken as the grouping criterion, the pixels are grouped against the preset S threshold, and the binary image formed by the pixels whose a and b components sum to a value within the preset S threshold range is the binary image Sy.
For example, in the images to be processed of bills, the background is generally the gray of paper and the printed information is generally black ink, so the preprinted information can be distinguished from the background and printed information according to the preset S threshold on the sum S of the a and b components in the Lab color space, yielding the binary image Sy corresponding to the preprinted information. The Lab color space makes it possible to separate the near-gray pixels of the image to be processed from pixels of other colors, thereby separating the preprinted information from the background information and the printed information.
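A minimal sketch of the Sy segmentation, assuming the image has already been converted to 8-bit Lab (for example with OpenCV's cv2.cvtColor and COLOR_BGR2LAB, where the a and b channels are offset by 128 so that neutral pixels have a + b near 256). The threshold bounds 280 and 512 are illustrative assumptions:

```python
import numpy as np

def segment_by_ab_sum(lab, s_low, s_high):
    """Binary image Sy: pixels whose a + b sum falls inside the preset
    S-threshold range. In 8-bit Lab encodings, neutral (gray/black)
    pixels have a + b close to 256; colored preprinted ink deviates."""
    s = lab[..., 1].astype(np.int32) + lab[..., 2].astype(np.int32)
    return ((s >= s_low) & (s <= s_high)).astype(np.uint8)

# Toy Lab image: one reddish pixel (a well above 128) among neutral ones.
lab = np.array([[[140, 180, 150], [128, 128, 128]],
                [[130, 127, 129], [200, 128, 127]]], dtype=np.uint8)
sy = segment_by_ab_sum(lab, s_low=280, s_high=512)
```

Only the colored pixel survives the threshold, which is exactly the separation of preprinted ink from gray paper and black print described above.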
Step S130: determining, from the binary image G0 and the binary image Sy, the number of information categories corresponding to the image to be processed.
In an example embodiment of the present disclosure, determining the number of information categories corresponding to the image to be processed from the binary image G0 and the binary image Sy includes the following steps S210 to S220, as shown in fig. 2:
Step S210, calculating the coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0.
Step S220, determining the number of information categories corresponding to the image to be processed according to the coincidence rate.
In an example embodiment of the present disclosure, the binary image G0 obtained in step S110 is composed of the pixels with the lowest gray values, and may therefore correspond either to the preprinted information or to the printed information in the image to be processed. The number of information categories can accordingly be judged from the coincidence rate of the part where Sy and G0 overlap with respect to G0. Specifically, when the image to be processed contains 2 categories of information (background information and preprinted information), both G0 and Sy correspond to the preprinted information, so the coincidence rate of their overlap with respect to G0 is large. When the image to be processed contains 3 categories of information (background information, preprinted information, and printed information), G0 corresponds to the printed information, so the coincidence rate of the overlap of Sy and G0 with respect to G0 is small. From this analysis it follows that the number of information categories contained in the image to be processed can be judged from the magnitude of that coincidence rate.
By analyzing the binary images obtained by segmenting the image according to its gray values and according to the sum of the a and b components in the Lab color space, the categories of information contained in the image to be processed can be judged accurately, and an accurate segmentation can then be performed to obtain the precise preprinted information. This avoids the over-extraction or under-extraction that occurs when information is extracted according to empirical color characteristics and other information interferes with the preprinted information.
Preferably, different preset discrimination thresholds can be customized for different images to be processed. At this time, the number of information categories corresponding to the image to be processed is determined according to the coincidence ratio, as shown in fig. 3, including the following steps S310 to S320:
step S310, if the coincidence ratio is greater than a preset discrimination threshold, determining that the number of information types corresponding to the image to be processed is 2.
Step S320, if the coincidence ratio is less than or equal to a preset discrimination threshold, determining that the number of information types corresponding to the image to be processed is 3.
In an example embodiment of the present disclosure, different preset discrimination thresholds are defined for different types of to-be-processed images, and the number of information types corresponding to the to-be-processed images may be determined according to the magnitude relation between the coincidence rate calculated in step S210 and the preset discrimination thresholds. When the coincidence rate is larger than a preset judging threshold value, judging that the number of the information types corresponding to the image to be processed is 2, wherein the information types are preprinted information and background information respectively; when the coincidence ratio is smaller than or equal to a preset judging threshold value, the number of the information types corresponding to the image to be processed is 3, and the information types are respectively preprinted information, printing information and background information.
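Steps S210 to S320 can be sketched together as follows. This is a minimal NumPy sketch; the discrimination-threshold value 0.5 is an illustrative assumption, since the disclosure leaves the preset discrimination threshold to be customized per image type:

```python
import numpy as np

def coincidence_rate(g0, sy):
    """Ratio of the overlap of G0 and Sy to the area of G0 itself."""
    g0_area = int(g0.sum())
    if g0_area == 0:
        return 0.0
    overlap = int((g0 & sy).sum())  # pixel-wise intersection
    return overlap / g0_area

def count_information_categories(g0, sy, discrimination_threshold=0.5):
    """2 categories (background + preprint) when the rate is high,
    3 categories (background + preprint + print) otherwise."""
    rate = coincidence_rate(g0, sy)
    return 2 if rate > discrimination_threshold else 3

g0 = np.array([[1, 1], [0, 0]], dtype=np.uint8)
sy = np.array([[1, 1], [1, 0]], dtype=np.uint8)
n = count_information_categories(g0, sy)  # G0 fully covered by Sy
```

In this toy case every G0 pixel is also an Sy pixel, so the coincidence rate is 1.0 and the image is judged to contain 2 categories of information.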
Step S140: selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed.
In an example embodiment of the present disclosure, the way the preprinted information is extracted differs with the number of information categories in the image to be processed; selecting a corresponding preset method according to the number therefore includes the following steps S410 to S420, as shown in fig. 4:
in step S410, if the number of information types corresponding to the image to be processed is 2, a first preset method is selected to extract preprinted information in the image to be processed.
In an example embodiment of the present disclosure, when the number of information types corresponding to the image to be processed is 2, that is, when the image to be processed includes only preprinted information and background information, a first preset method may be selected to extract the preprinted information in the image to be processed. Specifically, the first preset method may include: segmenting the gray image according to a preset first-level segmentation threshold to extract the preprinted information in the image to be processed.
In an example embodiment of the present disclosure, the preset first-level segmentation threshold is a gray-level threshold. When the image to be processed includes only preprinted information and background information, the corresponding gray image contains only two ranges of gray values, so the pixel points corresponding to the two ranges can be grouped according to the preset first-level segmentation threshold into binary images corresponding to the preprinted information and the background information respectively, and the preprinted information is then extracted from its binary image. Once it is determined that the image to be processed includes only preprinted information and background information, the pixel points belonging to each can be accurately distinguished by gray value according to the preset first-level segmentation threshold, thereby extracting the preprinted information in the image to be processed. In addition, since the pixel points containing the background information are distinguished at the same time, the background information in the image to be processed can also be extracted.
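Under the assumption that preprinted pixels are darker than the background (consistent with the gray-value ordering described below for the three-type case), the first preset method reduces to a single threshold comparison. The threshold value 128 here is a placeholder, not a value given by the disclosure:

```python
import numpy as np

def first_preset_method(gray, first_threshold=128):
    """Split a two-type image into preprint and background masks.

    gray: 2-D uint8 gray image. Pixels at or below the assumed
    first-level segmentation threshold are taken as preprinted
    information, the rest as background.
    """
    preprint = gray <= first_threshold   # darker pixels: preprint
    background = ~preprint               # lighter pixels: background
    return preprint, background
```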
Step S420, if the number of information types corresponding to the image to be processed is 3, selecting a second preset method to extract preprinted information in the image to be processed.
In an example embodiment of the present disclosure, when the preset two-stage segmentation threshold includes a preset second segmentation threshold, the binary image G includes a binary image G1 corresponding to the preset second segmentation threshold. In this case, when the number of information types corresponding to the image to be processed is 3, that is, when the image to be processed includes preprinted information, printing information and background information, the second preset method includes: calculating the overlapping area A of the binary image Sy and the binary image G1 to extract the preprinted information in the image to be processed.
In an example embodiment of the present disclosure, the gray threshold corresponding to the preset second segmentation threshold is greater than the gray threshold corresponding to the preset first segmentation threshold. When the image to be processed includes three kinds of information, their gray values from low to high are those of the printing information, the preprinted information and the background information, so the preprinted information can be extracted by calculating the overlapping area A of the binary image G1 corresponding to the preset second segmentation threshold and the binary image Sy extracted in the Lab color space. Extracting the preprinted information by computing an overlapping area avoids the under-extraction or over-extraction that the printing information would cause if the preprinted information were extracted by a single segmentation. In addition, since the overlapping area A is determined to be the image area composed of pixel points containing the preprinted information, the remaining area of the image to be processed includes only the printing information and the background information, so these two can be distinguished according to the first preset method, and the printing information and the background information in the image to be processed can be extracted respectively.
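A sketch of the second preset method, assuming G1 (the second-level gray mask, covering both printing and preprinted pixels since its threshold is higher) and Sy (the Lab-space mask, covering the preprint color region) are boolean arrays of equal shape:

```python
import numpy as np

def second_preset_method(g1, s_y):
    """Extract the preprint mask as the overlapping area A of G1 and Sy.

    The intersection isolates preprinted pixels: printing pixels fall
    in G1 but not Sy, while the preprint falls in both masks.
    """
    return np.logical_and(g1, s_y)  # overlapping area A
```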
Taking as examples a to-be-processed image corresponding to a bill that contains 2 kinds of information and one that contains 3 kinds of information, the implementation details of the technical scheme of the embodiments of the disclosure are explained in detail below:
1. There are 2 kinds of information in the image to be processed: preprinted information and background information.
Referring to fig. 6, 6(a) is the image to be processed. Converting 6(a) into a gray image and grouping its pixel points by gray value yields, referring to 6(b), the region composed of pixels with the same color as at 610, which is the binary image G0. Segmenting the image to be processed in the Lab color space yields, referring to 6(c), the region composed of pixels with the same color as at 620, which is the binary image Sy. The coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0 is then calculated; it is greater than the preset discrimination threshold of 0.6, so the image to be processed is determined to include only preprinted information and background information. The preprinted information in the image to be processed can therefore be extracted by segmenting the gray image converted from 6(a); see the region composed of pixels with the same color as the white at 630 in fig. 6(d).
2. There are 3 kinds of information in the image to be processed: preprinted information, printing information and background information.
Referring to fig. 7, 7(a) is the image to be processed. Converting 7(a) into a gray image and grouping its pixel points by gray value yields, referring to 7(b), the region composed of pixels with the same color as at 710, which is the binary image G0, and the region composed of pixels with the same color as at 720, which is the binary image G1. Segmenting the image to be processed in the Lab color space yields, referring to 7(c), the region composed of pixels with the same color as at 730, which is the binary image Sy. The coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0 is then calculated; it is smaller than the preset discrimination threshold of 0.6, so the image to be processed is determined to include preprinted information, printing information and background information. By obtaining the overlapping area A of the binary image G1 and the binary image Sy, the preprinted information in the image to be processed can be extracted; see the region composed of pixels with the same color as the white at 740 in fig. 7(d).
Further, after calculating the overlapping area A of the binary image Sy and the binary image G1, and on the basis of having extracted the printing information and the preprinted information respectively, the two can be further processed so that the positions of mutually corresponding content in the printing information and the preprinted information match each other, which facilitates structured output of the data on the image to be processed. Referring to fig. 5, the method includes the following steps S510 to S520:
Step S510: obtain the position of a preset information representative M0 in the binary image G0 and the position of the corresponding preset information representative MA in the overlapping area A.
Step S520: translate the binary image G0 according to the position of the preset information representative M0 and the position of the preset information representative MA, so that the position of the preset information representative M0 matches the position of the preset information representative MA.
In an example embodiment of the present disclosure, the binary image G0 and the overlapping area A are the binary image G0 and the overlapping area A obtained in the preceding steps. When the information in the image to be processed is extracted by the second preset method, the binary image G0 and the overlapping area A contain the printing information and the preprinted information respectively. Any one piece of preset information in the binary image G0 may be selected as the representative M0, and its position in the image to be processed acquired; at the same time, the position in the image to be processed of the preset information representative MA whose content matches that of M0 is acquired. Since the positions of a bill's printing information and preprinted information are relatively fixed and their contents correspond to each other, translating the binary image G0 so that the preset information representative M0 coincides with its matching preset information representative MA brings the positions of all the mutually matched printing information and preset information into agreement.
On the basis of extracting the printing information and the preprinted information respectively, translating the binary image G0 according to the positional relationship of the mutually matched preset information representatives corrects the misalignment between the printing information and the preprinted information in the image to be processed, so that the positions of their corresponding information match, which facilitates structured output of the information in the image to be processed.
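The translation of steps S510-S520 can be sketched as shifting G0 by the offset between the two representatives' positions. Here `np.roll` stands in for a proper translation (wrap-around at the borders is ignored for illustration), and the (row, col) position convention is an assumption:

```python
import numpy as np

def align_by_representatives(g0, m0_pos, ma_pos):
    """Translate binary image G0 so that the chosen representative M0
    lands on the position of its matched counterpart MA.

    m0_pos, ma_pos: (row, col) positions of the matched representatives.
    """
    dy = ma_pos[0] - m0_pos[0]
    dx = ma_pos[1] - m0_pos[1]
    return np.roll(g0, shift=(dy, dx), axis=(0, 1))
```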
It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
In addition, in an exemplary embodiment of the present disclosure, an apparatus for extracting preprinted information in an image is also provided. Referring to fig. 8, the device 800 for extracting preprinted information in an image includes: a first segmentation module 810, a second segmentation module 820, a quantity determination module 830, and an information extraction module 840.
The first segmentation module 810 may be configured to convert an image to be processed into a gray image, and to segment the gray image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G; wherein the preset two-stage segmentation threshold includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold.
The second segmentation module 820 may be configured to convert the image to be processed into the Lab color space, and to segment the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy.
The number determination module 830 may be configured to determine the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy.
The information extraction module 840 may be configured to select a corresponding preset method according to the number to extract the preprinted information in the image to be processed.
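The segmentation performed by the second segmentation module 820 can be sketched as follows, assuming the a and b channels are already available as signed arrays (for example from a prior Lab conversion; note that 8-bit OpenCV Lab stores a and b offset by 128, which would have to be subtracted first) and that the S threshold value of 0 is purely illustrative:

```python
import numpy as np

def second_segmentation(a_channel, b_channel, s_threshold=0):
    """Compute binary image Sy by thresholding S = a + b.

    Positive a+b loosely corresponds to warm (reddish) preprint ink
    under the signed-channel assumption stated above.
    """
    s = a_channel.astype(np.int32) + b_channel.astype(np.int32)
    return s > s_threshold  # binary image Sy
```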
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the number determination module 830 may be configured to calculate the coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0, and to determine the number of information types corresponding to the image to be processed according to the coincidence rate.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the number determination module 830 may be configured to determine that the number of information types corresponding to the image to be processed is 2 when the coincidence rate is greater than a preset discrimination threshold, and that the number is 3 when the coincidence rate is less than or equal to the preset discrimination threshold.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the information extracting module 840 may be configured to select a first preset method to extract preprinted information in the image to be processed when the number of information types corresponding to the image to be processed is 2; and when the number of the information types corresponding to the image to be processed is 3, selecting a second preset method to extract preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the information extraction module 840 may be configured to segment the gray-scale image according to a preset first-level segmentation threshold to extract preprinted information in the image to be processed; the preset first-level segmentation threshold is a gray level threshold.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the information extraction module 840 may be configured to calculate the overlapping area A of the binary image Sy and the binary image G1 to extract the preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, referring to fig. 9, the apparatus 800 for extracting preprinted information in an image further includes an image processing module 850, which may be configured to obtain the position of the preset information representative M0 in the binary image G0 and the position of the preset information representative MA in the overlapping area A, and to translate the binary image G0 according to the position of the preset information representative M0 and the position of the preset information representative MA, so that the position of the preset information representative M0 matches the position of the preset information representative MA.
Since each functional module of the device for extracting preprinted information in an image according to the exemplary embodiment of the present disclosure corresponds to a step of the exemplary embodiment of the method for extracting preprinted information in an image, for details not disclosed in the embodiment of the device of the present disclosure, please refer to the embodiment of the method for extracting preprinted information in an image according to the present disclosure.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the method for extracting preprinted information in an image is provided.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. Components of electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting the various system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
Wherein the storage unit stores program code executable by the processing unit 1010, such that the processing unit 1010 performs the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section of this specification. For example, the processing unit 1010 may perform step S110 as shown in fig. 1: converting an image to be processed into a gray image, and segmenting the gray image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G, wherein the preset two-stage segmentation threshold is a gray-level threshold including a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold; step S120: converting the image to be processed into the Lab color space, and segmenting the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; step S130: determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy; and step S140: selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed.
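Steps S110 through S140 can be combined into one end-to-end sketch. All threshold values, the mask semantics (G0 at the lower threshold, G1 at the higher one), and the signed a/b channel convention are assumptions for illustration only:

```python
import numpy as np

def extract_preprint(gray, a_channel, b_channel,
                     t1=100, t2=160, s_threshold=0, judge=0.6):
    """End-to-end sketch: gray-level masks G0/G1 (S110), Lab mask Sy
    (S120), coincidence-rate decision (S130), then the first or second
    preset extraction method (S140)."""
    g0 = gray <= t1                                   # binary image G0
    g1 = gray <= t2                                   # binary image G1
    s = a_channel.astype(np.int32) + b_channel.astype(np.int32)
    s_y = s > s_threshold                             # binary image Sy
    rate = np.logical_and(g0, s_y).sum() / max(int(g0.sum()), 1)
    if rate > judge:          # 2 types: single gray threshold suffices
        return g0             # first preset method
    return np.logical_and(g1, s_y)  # 3 types: overlapping area A
```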
As another example, the electronic device may implement the steps shown in fig. 2-5.
The memory unit 1020 may include readable media in the form of volatile memory units such as Random Access Memory (RAM) 1021 and/or cache memory unit 1022, and may further include Read Only Memory (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1030 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1000 can also communicate with one or more external devices 1070 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1050. Also, electronic device 1000 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1060. As shown, the network adapter 1060 communicates with other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 1000, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 11, a program product 1100 for implementing the above-described method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (8)

1. A method for extracting preprinted information in an image, comprising:
converting an image to be processed into a gray image, and segmenting the gray image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G; wherein the preset two-stage segmentation threshold is a gray-level threshold including a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold; and
converting the image to be processed into the Lab color space, and segmenting the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy;
determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy;
selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed;
wherein the preset method is a first preset method or a second preset method, the first preset method being the method selected when the number of information types corresponding to the image to be processed is 2, and the second preset method being the method selected when the number is 3;
the first preset method being to segment the gray image according to a preset first-level segmentation threshold to extract the preprinted information in the image to be processed, and the second preset method being to obtain a binary image G1 corresponding to a preset second segmentation threshold and to calculate the overlapping area A of the binary image Sy and the binary image G1 to extract the preprinted information in the image to be processed.
2. The method according to claim 1, wherein determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy comprises:
calculating the coincidence rate of the overlapping portion of the binary image G0 and the binary image Sy with respect to the binary image G0; and
determining the number of information types corresponding to the image to be processed according to the coincidence rate.
3. The method according to claim 2, wherein determining the number of information types corresponding to the image to be processed according to the coincidence rate comprises:
if the coincidence rate is larger than a preset judging threshold value, judging that the number of information types corresponding to the image to be processed is 2;
and if the coincidence rate is smaller than or equal to a preset judging threshold value, judging that the number of the information types corresponding to the image to be processed is 3.
4. The method of claim 1, wherein the preset two-stage segmentation threshold comprises a preset second segmentation threshold;
the binary image G comprises a binary image G corresponding to a preset second segmentation threshold value 1
5. The method according to claim 4, wherein after calculating the overlapping area A of the binary image Sy and the binary image G1, the method further comprises:
obtaining the position of a preset information representative M0 in the binary image G0 and the position of the corresponding preset information representative MA in the overlapping area A; and
translating the binary image G0 according to the position of the preset information representative M0 and the position of the preset information representative MA, so that the position of the preset information representative M0 matches the position of the preset information representative MA.
6. An apparatus for extracting preprinted information in an image, comprising:
the first segmentation module is used for converting an image to be processed into a gray image, and segmenting the gray image according to a preset two-stage segmentation threshold value to obtain a corresponding binary image G; wherein the preset two-stage segmentation threshold includes a preset first segmentation threshold, and the binary image G includes a binary image G corresponding to the preset first segmentation threshold 0
A second segmentation module for converting the image to be processed into Lab color space, and segmenting the image to be processed according to the sum S of the a and b components of the image to be processed and a preset S threshold to obtain a binary image S y
A quantity determining module for determining the quantity of the binary images G 0 And the binary image S y Determining the number of the information types corresponding to the image to be processed;
the information extraction module is used for selecting a corresponding preset method according to the quantity to extract preprinted information in the image to be processed, wherein the preset method is a first preset method or a second preset method, the first preset method is a method selected when the quantity of the corresponding information types of the image to be processed is 2, and the second preset method is a method selected when the quantity of the corresponding information types of the image to be processed is 3;
the first preset method is to preset a first-level segmentation threshold value to the gray imageSegmenting to extract preprinting information in the image to be processed; the second preset method is to obtain a binary image G corresponding to a preset second segmentation threshold 1 And calculates the binary image S y And the binary image G 1 To extract preprinting information in the image to be processed.
7. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a method for extracting preprinted information in an image according to any one of claims 1 to 5.
8. An electronic device, comprising:
a processor; and
a memory for storing one or more programs, which, when executed by the processor, cause the processor to implement the method for extracting preprinted information in an image according to any one of claims 1 to 5.
CN201911268302.6A 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment Active CN111210455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911268302.6A CN111210455B (en) 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN111210455A CN111210455A (en) 2020-05-29
CN111210455B true CN111210455B (en) 2023-08-01

Family

ID=70789259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911268302.6A Active CN111210455B (en) 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111210455B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574405A (en) * 2015-01-15 2015-04-29 北京天航华创科技股份有限公司 Color image threshold segmentation method based on Lab space
CN105120167A (en) * 2015-08-31 2015-12-02 广州市幸福网络技术有限公司 Certificate picture camera and certificate picture photographing method
CN108596916A (en) * 2018-04-16 2018-09-28 深圳市联软科技股份有限公司 Watermark recognition methods, system, terminal and medium similar in a kind of color
CN109255355A (en) * 2018-05-28 2019-01-22 北京京东尚科信息技术有限公司 Image processing method, device, terminal, electronic equipment and computer-readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7113615B2 (en) * 1993-11-18 2006-09-26 Digimarc Corporation Watermark embedder and reader
US9684941B2 (en) * 2012-10-29 2017-06-20 Digimarc Corporation Determining pose for use with digital watermarking, fingerprinting and augmented reality


Similar Documents

Publication Publication Date Title
CN110942074B (en) Character segmentation recognition method and device, electronic equipment and storage medium
US20190065894A1 (en) Determining a document type of a digital document
EP2669847B1 (en) Document processing apparatus, document processing method and scanner
CN110222694B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113313111B (en) Text recognition method, device, equipment and medium
US11983910B2 (en) Image processing system, image processing method, and storage medium each for obtaining pixels of object using neural network
US7796817B2 (en) Character recognition method, character recognition device, and computer product
CN111724396B (en) Image segmentation method and device, computer readable storage medium and electronic equipment
US11935314B2 (en) Apparatus for generating a binary image into a white pixel, storage medium, and method
US11941903B2 (en) Image processing apparatus, image processing method, and non-transitory storage medium
CN111210455B (en) Method and device for extracting preprinted information in image, medium and electronic equipment
CN115376137B (en) Optical character recognition processing and text recognition model training method and device
KR101498546B1 (en) System and method for restoring digital documents
CN107330470B (en) Method and device for identifying picture
CN113780294B (en) Text character segmentation method and device
JP2011248415A (en) Image processing apparatus and image processing program
CN112287653B (en) Method of generating electronic contract, computing apparatus, and computer storage medium
CN115083024A (en) Signature identification method, device, medium and equipment based on region division
CN113807343A (en) Character recognition method and device, computer equipment and storage medium
CN112801960A (en) Image processing method and device, storage medium and electronic equipment
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN115273113B (en) Table text semantic recognition method and device
CN111104936A (en) Text image recognition method, device, equipment and storage medium
CN111476090A (en) Watermark identification method and device
JP4793429B2 (en) Image processing apparatus and image processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant