CN115273115A - Document element labeling method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115273115A
Authority
CN
China
Prior art keywords
target
image
document
region
pixel
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210876852.1A
Other languages
Chinese (zh)
Inventor
徐支勇
李长亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/20: Image enhancement or restoration using local operators
    • G06T5/30: Erosion or dilatation, e.g. thinning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/15: Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20024: Filtering details
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20036: Morphological image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of document processing and provides a document element labeling method and device, electronic equipment, and a storage medium. The method comprises: converting a document to be annotated into a target image; performing morphological processing on the target image to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image; determining the pixel points in each preliminary feature region whose pixel values belong to the same connected region as one target region; and acquiring the element content in the target region and labeling the target region based on that content. The method reduces the manual effort required to label document elements. Because traditional image processing methods such as morphological processing are well suited to determining regions, the accuracy of the determined target regions, and therefore of the document element labeling, can be ensured.

Description

Document element labeling method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of document processing technologies, and in particular, to a method and an apparatus for labeling document elements, an electronic device, and a storage medium.
Background
With the rapid development of deep learning, document processing techniques based on deep learning are widely applied. For example, a document element classification model trained with deep learning can classify the elements in a document, such as text elements, picture elements, and table elements. To train a deep learning model for document processing, document elements are generally labeled first, and the labeled document elements are then used as the training set for the model.
At present, the common ways of labeling a training set of document elements for deep learning include:
Labeling mode one: using labeling software, manually determine the region of each element in the document and manually label the different elements;
Labeling mode two: combine a deep learning model with manual labeling, that is, train a document element labeling model on manually labeled document elements and then use that model to label document elements in bulk.
However, in labeling mode one, the region of each element is determined by hand, which incurs high labor cost and also lowers labeling accuracy because manually determined regions contain errors. Labeling mode two requires a large, high-quality training set that is itself obtained by manual labeling, so it has the same problems as labeling mode one.
Disclosure of Invention
The embodiment of the invention aims to provide a document element labeling method, a document element labeling device, electronic equipment and a storage medium, so that the labor cost of document element labeling is reduced while the accuracy of document element labeling is not influenced.
In one aspect of the present invention, a document element labeling method is provided, including:
converting a document to be annotated into a target image;
performing morphological processing on the target image to obtain a preliminary characteristic region corresponding to each element of the document to be annotated in the target image;
determining pixel points of which the pixel values in the preliminary characteristic regions belong to the same connected region as the same target region;
and acquiring element content in the target area, and labeling the target area based on the element content.
Optionally, the performing morphological processing on the target image to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image includes:
converting the target image into a grayscale image;
and performing binarization processing on the grayscale image based on a preset filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image.
Optionally, the performing binarization processing on the grayscale image based on a preset filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image includes:
for each pixel point in the grayscale image, taking the value obtained by subtracting the original pixel value of the pixel point from 255 as the new pixel value of the pixel point, to obtain a target grayscale image;
performing erosion and dilation processing on the target grayscale image based on a preset line detection filter kernel to obtain a target morphological image;
and performing erosion and dilation processing on the target morphological image based on a preset region detection filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated.
Optionally, the performing erosion and dilation processing on the target grayscale image based on a preset line detection filter kernel to obtain a target morphological image includes:
performing erosion and dilation processing on the target grayscale image based on a preset vertical line detection filter kernel to obtain a preliminary morphological image;
and performing erosion and dilation processing on the preliminary morphological image based on a preset horizontal line detection filter kernel to obtain a target morphological image.
Optionally, the converting the document to be annotated into the target image includes:
and converting the document to be annotated into a target image based on a document processing tool PyMuPDF.
Optionally, the determining, as the same target region, pixel points of the same connected region to which the pixel values in each of the preliminary feature regions belong includes:
determining the pixel points in each preliminary feature region whose pixel values belong to the same 4-connected region as one target region, or determining the pixel points in each preliminary feature region whose pixel values belong to the same 8-connected region as one target region.
Optionally, the obtaining of the element content in the target region and labeling the target region based on the element content include:
determining the position coordinates of the target area;
acquiring element content at a position corresponding to the position coordinates in the target image;
and determining the type of the element content, and labeling the target area based on the type.
Optionally, the determining a type of the element content and labeling the target area based on the type includes:
determining the type of the element content as a document header, a document footer, a text paragraph, a picture, a table or a formula, and marking the target area as the corresponding type of the element content.
In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of any one of the document element labeling methods when the processor executes the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above-mentioned document element labeling methods.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above-described document element labeling methods.
With the document element labeling method provided by the embodiment of the invention, the document to be annotated is converted into a target image; morphological processing is performed on the target image to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image; the pixel points in each preliminary feature region whose pixel values belong to the same connected region are determined as one target region; and the element content in the target region is acquired and the target region is labeled based on that content. Morphological processing of the target image yields a region division that matches human visual judgment, connected region analysis determines the target regions where the different document elements are located, and each target region is then labeled directly according to the element content it contains. The method therefore reduces the manual effort required to label document elements. In addition, because traditional image processing methods such as morphological processing are well suited to determining regions, the accuracy of the determined target regions, and hence of the document element labeling, can be ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of a document element labeling method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating morphological processing of a target image according to an embodiment of the invention;
FIG. 3 is a diagram illustrating morphological processing of an image according to an embodiment of the present invention;
FIG. 4 is another schematic diagram of morphological processing of an image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a morphological processing of an image according to an embodiment of the present invention;
FIG. 6 is a schematic view of a criss-cross structural element;
FIG. 7 is a schematic diagram of an overlay of criss-crossing structural elements with an image pixel matrix;
FIG. 8 is a flowchart of document element tagging provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
Existing document labeling approaches suffer from high labor cost and from low labeling accuracy caused by errors in manually determined regions. To reduce the labor cost of document element labeling without affecting its accuracy, the embodiment of the invention provides a document element labeling method, a document element labeling device, electronic equipment, a computer-readable storage medium, and a computer program product.
The following first introduces a document element labeling method provided by the embodiment of the present invention. The document element labeling method provided by the embodiment of the present invention may be applied to any electronic device having image processing and document processing functions, and is not specifically limited herein.
FIG. 1 is a flowchart of a document element labeling method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 101, converting a document to be annotated into a target image.
In the embodiment of the present invention, the document to be annotated includes, but is not limited to: a ppt-formatted document, a pptx-formatted document, a txt-formatted document, a doc-formatted document, a docx-formatted document, an xls-formatted document, an xlsx-formatted document, and a pdf-formatted document.
In this step, a document to be annotated may be converted into a target image by using a document processing tool PyMuPDF. Of course, any other tool capable of performing document-image conversion may also be used in this step to convert the document to be annotated into the target image, which is not specifically limited herein.
Step 102, performing morphological processing on the target image to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image.
In the embodiment of the present invention, the elements of the document to be annotated may include document headers, document footers, text paragraphs, pictures, tables, formulas, and the like. If the document to be annotated contains annotations, its elements may further include annotation text.
A preliminary feature region is a region with image features in the image obtained by performing morphological processing on the target image. The regions occupied by elements such as text paragraphs, tables, and document headers have image features, so they can be determined by morphological processing of the target image, whereas regions such as a large blank background do not belong to any preliminary feature region.
Step 103, determining the pixel points in each preliminary feature region whose pixel values belong to the same connected region as one target region.
If the pixel values of some pixel points belong to the same connected region, the pixel points are probably the pixel points corresponding to the same element. For example, for a text segment, the pixel values of the corresponding pixels are the same and belong to the same connected region. Therefore, the pixel points of which the pixel values in the preliminary feature regions belong to the same connected region can be determined as the same target region, and the pixel points included in the same target region belong to the same element.
In the embodiment of the invention, the pixel points in each preliminary feature region whose pixel values belong to the same 4-connected region can be determined as one target region, or the pixel points in each preliminary feature region whose pixel values belong to the same 8-connected region can be determined as one target region.
For example, if the pixel values of the respective pixel points in the same 4-connected region are all within the first preset pixel value range, the pixel points in the 4-connected region may be determined as the same target region, and if the pixel values of the respective pixel points in the same 8-connected region are all within the second preset pixel value range, the pixel points in the 8-connected region may be determined as the same target region. The first preset pixel value range and the second preset pixel value range may be set according to an actual application scenario, for example, the first preset pixel value range may be set to [245,255] or [250,255], and the second preset pixel value range may be set to [252,255] or [253,255].
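The connected-region step above can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the function name `label_regions`, the breadth-first traversal, and the tiny sample image are assumptions made for illustration, and a production system would more likely rely on an image processing library. Pixels whose values fall inside the preset range are grouped by 4- or 8-connectivity.

```python
from collections import deque


def label_regions(image, low, high, connectivity=4):
    """Group pixels whose value lies in [low, high] into connected regions.

    `image` is a list of rows of gray values. Returns (labels, count), where
    labels[r][c] is 0 for pixels outside the range and 1..count otherwise.
    """
    if connectivity == 4:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if labels[r][c] or not (low <= image[r][c] <= high):
                continue
            # Start a new region and flood it breadth-first.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < height and 0 <= nc < width
                            and not labels[nr][nc]
                            and low <= image[nr][nc] <= high):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


# Three bright pixels that touch only diagonally (hypothetical sample data).
page = [
    [255,   0,   0],
    [  0, 250,   0],
    [  0,   0, 255],
]
_, count4 = label_regions(page, 245, 255, connectivity=4)
_, count8 = label_regions(page, 245, 255, connectivity=8)
print(count4, count8)  # prints: 3 1
```

With 4-connectivity the diagonal neighbours stay separate, while 8-connectivity merges them into one region, which is why the choice of connectivity changes how many target regions are produced.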
Step 104, acquiring the element content in the target area, and labeling the target area based on the element content.
In the embodiment of the present invention, the element content is the content at the position corresponding to the position coordinates of the target area, and may include a document header, a document footer, a text paragraph, a picture, a table, a formula, or the like. After the element content in the target area is obtained, the target area is labeled based on the position coordinates of the target area where the element content is located, yielding the position information corresponding to the target area.
The target area may be a rectangular area, and labeling the target area specifically may be to record a position coordinate, element content, and an element type corresponding to the rectangular area, where the position coordinate may be four corner coordinates of the rectangular area, may also be one corner coordinate of the rectangular area, and a height and a width of the rectangular area, as long as the position of the rectangular area can be identified, and no specific limitation is made here.
For example, for a target area A that contains a piece of text, the labeling may record the following information: position coordinates: the position coordinates of target area A; element content: the text content; element type: text.
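A labeling record of this kind can be sketched as a small data structure. The class name `Annotation`, its field names, and the helper `bounding_box` are hypothetical; the sketch uses the "one corner plus width and height" form of the position coordinates, and it takes the element content and type as inputs because the way they are determined is not detailed at this point in the text.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One labeled rectangular target area (field names are illustrative)."""
    x: int            # column of the top-left corner
    y: int            # row of the top-left corner
    width: int
    height: int
    content: str      # element content found inside the rectangle
    element_type: str  # e.g. "text", "picture", "table"


def bounding_box(pixels):
    """Return (x, y, width, height) of the rectangle enclosing the pixels.

    `pixels` is a sequence of (row, column) pairs belonging to one region.
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    x, y = min(cols), min(rows)
    return x, y, max(cols) - x + 1, max(rows) - y + 1


# Hypothetical region made of five pixels.
region_pixels = [(2, 3), (2, 4), (3, 3), (3, 4), (4, 3)]
annotation = Annotation(*bounding_box(region_pixels),
                        content="Sample paragraph text",
                        element_type="text")
print(annotation)
```

Storing one corner with the width and height carries the same information as storing four corner coordinates, so either form identifies the rectangle.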
With the document element labeling method provided by the embodiment of the invention, the document to be annotated is converted into a target image; morphological processing is performed on the target image to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image; the pixel points in each preliminary feature region whose pixel values belong to the same connected region are determined as one target region; and the element content in the target region is acquired and the target region is labeled based on that content. Morphological processing of the target image yields a region division that matches human visual judgment, connected region analysis determines the target regions where the different document elements are located, and each target region is then labeled directly according to the element content it contains. The method therefore reduces the manual effort required to label document elements. In addition, because traditional image processing methods such as morphological processing are well suited to determining regions, the accuracy of the determined target regions, and hence of the document element labeling, can be ensured.
In a possible implementation, FIG. 2 is a flowchart of performing morphological processing on a target image according to an embodiment of the present invention. As shown in FIG. 2, performing morphological processing on the target image to obtain the preliminary feature region corresponding to each element of the document to be annotated in the target image may include:
Step 201, converting the target image into a grayscale image.
A grayscale image can show more image detail, improve image contrast, selectively highlight features of interest or suppress unwanted features, and make the pixel distribution more uniform, so the target image can be converted into a grayscale image for further processing.
Specifically, for each pixel point in the target image, the pixel value may be transformed based on the formula Gray = R × 0.299 + G × 0.587 + B × 0.114, so that the transformed pixel value lies in [0, 255], where R, G, and B are the values of the red, green, and blue channels of the pixel point, and Gray is the resulting gray level.
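The weighted-sum conversion can be written out directly. The following is a minimal sketch that assumes the image is held as rows of (R, G, B) tuples with 8-bit channel values; the function name `to_gray` and the rounding to the nearest integer are illustrative choices, not details given in the text.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of integer gray levels."""
    gray_image = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            value = r * 0.299 + g * 0.587 + b * 0.114
            # Round and clamp so the result always lies in [0, 255].
            gray_row.append(min(255, max(0, round(value))))
        gray_image.append(gray_row)
    return gray_image


# Hypothetical 2 x 2 image: white, black, pure red, pure blue.
sample = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
print(to_gray(sample))  # prints: [[255, 0], [76, 29]]
```

Because the three weights sum to 1, white stays at 255 and black at 0, while saturated colours map to intermediate gray levels according to their weight.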
Step 202, performing binarization processing on the grayscale image based on a preset filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image.
In this embodiment, binarizing the grayscale image based on the preset filter kernel yields a processed image containing only the two pixel values 0 and 255. The contrast is more pronounced, which makes the preliminary feature region corresponding to each element easier to identify. In one case, if the document to be annotated includes dividing lines, the preliminary feature regions corresponding to the elements may be the regions separated by the dividing lines. In another case, if the document to be annotated does not include dividing lines, the preliminary feature regions may be the regions with image features, such as a document header, a document footer, a text paragraph, a picture, a table, or a formula.
Specifically, the step of performing binarization processing on the grayscale image based on a preset filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated in the target image may include the following steps A1 to A3:
Step A1, for each pixel point in the grayscale image, taking the value obtained by subtracting the original pixel value of the pixel point from 255 as the new pixel value of the pixel point, to obtain a target grayscale image.
In a typical grayscale image, the pixel points corresponding to the elements are black and the background is white. Because white pixels have high brightness and are convenient for subsequent processing such as identification, the grayscale image can be inverted, that is, the value obtained by subtracting the original pixel value of each pixel point from 255 is used as its new pixel value, to obtain the target grayscale image. In the target grayscale image, the pixel points corresponding to the elements are white and the background is black.
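Step A1 amounts to a per-pixel subtraction from 255. A minimal sketch, assuming the grayscale image is stored as rows of integers in [0, 255] (the function name `invert` is illustrative):

```python
def invert(gray_image):
    """Return the negative of a grayscale image: new value = 255 - old value."""
    return [[255 - value for value in row] for row in gray_image]


# Dark text pixels (0) on a white background (255) become white on black.
gray = [
    [255, 0, 255],
    [255, 0, 200],
]
print(invert(gray))  # prints: [[0, 255, 0], [0, 255, 55]]
```

Applying the function twice returns the original image, since 255 - (255 - v) = v.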
Step A2, performing erosion and dilation processing on the target grayscale image based on a preset line detection filter kernel to obtain a target morphological image.
Step A3, performing erosion and dilation processing on the target morphological image based on a preset region detection filter kernel to obtain a preliminary feature region corresponding to each element of the document to be annotated.
Erosion takes the minimum gray value in the neighborhood of the pixel point at each position as the output gray value of that pixel point. Dilation (also called expansion) takes the maximum gray value in the neighborhood of the pixel point at each position as the output gray value of that pixel point. In steps A2 and A3, filter kernels of different sizes and different numbers of processing passes can be selected according to actual requirements. Examples are given below for clarity.
In an embodiment, the step of performing erosion and dilation processing on the target grayscale image based on the preset line detection filter kernel to obtain a target morphological image may include the following steps B1 and B2:
Step B1, performing erosion and dilation processing on the target grayscale image based on a preset vertical line detection filter kernel to obtain a preliminary morphological image.
In the embodiment of the present invention, the preset vertical line detection filter kernel may be used to detect whether a column vertical line exists in the document to be annotated represented by the target grayscale image, and specifically may be used for layout analysis of that document. Depending on the application, the preset vertical line detection filter kernel can be a rectangular filter kernel of size m × 1, where m is the height and 1 is the width of the kernel; m can take any reasonable value such as 2, 3, or 4, and is not limited herein. Specifically, the m × 1 rectangular filter kernel may first be used to erode the target grayscale image to remove noise, and then be used to dilate the eroded image so that the image forms a plurality of closed regions, yielding a preliminary morphological image with a plurality of closed regions.
As described above, erosion takes the minimum gray value in the neighborhood of each pixel point as its output gray value. When the m × 1 rectangular filter kernel is used to erode the target grayscale image, the minimum gray value in the m × 1 neighborhood of each pixel is taken as the new gray value of that pixel. Thus, as long as the m × 1 rectangle around a pixel contains a black pixel, the new gray value of that pixel is 0.
Since the width of the rectangular filter kernel is 1, the kernel operates on a single pixel column. When a column vertical line exists in the target grayscale image, the m × 1 rectangle around each pixel point on the line contains only white pixel points, so after erosion the pixel points on the column vertical line are still white, and the column vertical line in the target grayscale image is thereby detected.
Dilation takes the maximum gray value in the neighborhood of each pixel point as its output gray value. When the m × 1 rectangular filter kernel is used for dilation, the maximum gray value in the m × 1 neighborhood of each pixel is taken as the new gray value of that pixel. Thus, as long as the m × 1 rectangle around a pixel contains a white pixel, the new gray value of that pixel is 255.
Again because the width of the rectangular filter kernel is 1, the kernel operates on a single pixel column. When a column vertical line exists in the target grayscale image, factors such as image noise may in some cases leave black pixel points within the m × 1 rectangle of a pixel point on the line. After dilation, the pixel points on the column vertical line are all white, so the column vertical line in the target grayscale image can be detected more accurately.
For example, FIG. 3 is a schematic diagram of performing morphological processing on an image according to an embodiment of the present invention. As shown in FIG. 3, a black line in the target grayscale image 301 represents a column vertical line in the document corresponding to the target grayscale image 301. Processing the target grayscale image 301 with the preset vertical line detection filter kernel detects the line shown in image 302, which is the column vertical line in the target grayscale image 301.
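The erosion-then-dilation pass of step B1 can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the patent's implementation: the function names, the kernel anchored at its centre, the choice to ignore neighbours that fall outside the image, and the sample image are all assumptions, and a real system would more likely call an image processing library. The sketch uses m = 3, that is, a 3 × 1 kernel.

```python
def _window_filter(image, kernel_h, kernel_w, pick):
    """Apply `pick` (min or max) over a kernel_h x kernel_w window per pixel.

    The window is anchored at its centre; neighbours outside the image are
    ignored, which is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    top, left = kernel_h // 2, kernel_w // 2
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(r - top, r - top + kernel_h)
                for cc in range(c - left, c - left + kernel_w)
                if 0 <= rr < height and 0 <= cc < width
            ]
            out[r][c] = pick(window)
    return out


def erode(image, kernel_h, kernel_w):
    """Erosion: each pixel becomes the minimum of its neighbourhood."""
    return _window_filter(image, kernel_h, kernel_w, min)


def dilate(image, kernel_h, kernel_w):
    """Dilation (expansion): each pixel becomes the maximum of its neighbourhood."""
    return _window_filter(image, kernel_h, kernel_w, max)


# Inverted image: a white vertical line in column 1 plus one white noise pixel.
target_gray = [
    [0, 255, 0,   0, 0],
    [0, 255, 0,   0, 0],
    [0, 255, 0, 255, 0],
    [0, 255, 0,   0, 0],
    [0, 255, 0,   0, 0],
]
m = 3  # kernel height; the kernel width is 1
preliminary = dilate(erode(target_gray, m, 1), m, 1)
for row in preliminary:
    print(row)
```

In this sample the isolated noise pixel is removed by the erosion because its 3 × 1 neighbourhood contains black pixels, while every pixel on the vertical line has only white pixels above and below it and therefore survives both passes.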
Step B2, performing erosion and dilation on the preliminary morphological image based on a preset horizontal line detection filter kernel to obtain a target morphological image.
In the embodiment of the invention, the preset horizontal line detection filter kernel can be used to detect whether the document to be labeled represented by the target image contains a column-dividing horizontal line or an open table. The preset horizontal line detection filter kernel can be set, according to the practical application, to a rectangular filter kernel of size 1 × n, where 1 is the height of the kernel and n is its width; n can take any reasonable value such as 1, 2 or 3, which is not limited herein. Specifically, the preliminary morphological image may first be eroded with the 1 × n rectangular filter kernel to remove noise, and the eroded image may then be dilated with the same 1 × n rectangular filter kernel to further enhance the closed regions in the image, so as to obtain the target morphological image.
The erosion processing is to take the minimum gray value in the neighborhood of the pixel point at each position as the output gray value of the pixel point at the position, that is, when the rectangular filtering kernel with the size of 1 × n is adopted to conduct erosion processing on the preliminary morphological image, the minimum gray value in the neighborhood with the size of 1 × n is taken as the new gray value of each pixel in the preliminary morphological image. Thus, as long as there is a black pixel in the pixels included in the current rectangle of 1 × n size, the new gray value of the pixel is 0.
Because the rectangular filter kernel has a height of 1, that is, it operates on one pixel row at a time, when a column-dividing horizontal line exists in the preliminary morphological image, all the pixels covered by the 1 × n rectangle around a pixel on that line are white. After erosion, the pixels on the horizontal line therefore remain white, and the column-dividing horizontal line in the preliminary morphological image is detected.
Dilation takes the maximum value in the neighborhood of the pixel at each position as the output gray value at that position. That is, when the preliminary morphological image is dilated with the 1 × n rectangular filter kernel, the maximum gray value in the 1 × n neighborhood of each pixel becomes the new gray value of that pixel. Thus, as long as any pixel covered by the current 1 × n rectangle is white, the new gray value of the pixel is 255.
Because the rectangular filter kernel has a height of 1, that is, it operates on one pixel row at a time, when a column-dividing horizontal line exists in the preliminary morphological image, image noise and similar factors may in some cases leave black pixels among the pixels covered by the 1 × n rectangle on that line. After dilation, the pixels on the horizontal line are all white, so the column-dividing horizontal line in the preliminary morphological image can be detected more accurately.
For example, fig. 4 is another schematic diagram of performing morphological processing on an image according to an embodiment of the present invention. As shown in fig. 4, a preliminary morphological image 401 includes the text content of a document, column-dividing horizontal lines, an open table, and a picture; the preset horizontal line detection filter kernel is used to erode and dilate the preliminary morphological image 401 to obtain the open table 402 and the column-dividing horizontal lines 403 shown in the black rectangular frames.
In the embodiment of the invention, the target grayscale image is eroded and dilated with the preset vertical line detection filter kernel and the preset horizontal line detection filter kernel, so that the column-dividing vertical lines and column-dividing horizontal lines in the document to be labeled represented by the target image can be obtained. The detected vertical lines and horizontal lines are then added together to obtain the closed table structure in the document to be labeled represented by the target image.
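One possible reading of the combination step is sketched below: vertical and horizontal lines are each extracted by erosion followed by dilation, and the two results are added pixel by pixel with saturation at 255. The helper names `morph`, `erode_then_dilate` and `combine`, the sample page, the kernel sizes, and the border handling (out-of-image pixels are ignored) are assumptions for illustration only.

```python
def morph(img, kh, kw, reduce_fn):
    """Apply a kh x kw rectangular window with min (erosion) or max (dilation)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        rows = range(max(0, y - kh // 2), min(h, y - kh // 2 + kh))
        for x in range(w):
            cols = range(max(0, x - kw // 2), min(w, x - kw // 2 + kw))
            out[y][x] = reduce_fn(img[r][c] for r in rows for c in cols)
    return out


def erode_then_dilate(img, kh, kw):
    """Erosion followed by dilation with the same rectangular kernel."""
    return morph(morph(img, kh, kw, min), kh, kw, max)


def combine(a, b):
    """Add two masks pixel by pixel, clamping the sum to 255."""
    return [[min(255, pa + pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


W, B = 255, 0
# Hypothetical inverted page: a horizontal line in row 0, a vertical line in
# column 0, and one isolated noise pixel at row 2, column 2.
page = [
    [W, W, W, W, W],
    [W, B, B, B, B],
    [W, B, W, B, B],
    [W, B, B, B, B],
    [W, B, B, B, B],
]
vertical = erode_then_dilate(page, 3, 1)    # m x 1 kernel with m = 3
horizontal = erode_then_dilate(page, 1, 3)  # 1 x n kernel with n = 3
table = combine(vertical, horizontal)
for row in table:
    print(row)
```

Here each kernel keeps only the line running in its own direction, and the saturating sum restores both lines in a single mask while the noise pixel is dropped.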
When a line structure exists in the document to be labeled represented by the target image, performing line detection on the target image with the preset line detection filter kernel can effectively assist the analysis of the document layout.
When the document to be labeled represented by the target image has no line structure and contains only a combination of ordinary pictures, tables, and title or paragraph text regions, the embodiment of the invention can use the preset region detection filter kernel to erode and dilate the target morphological image, obtaining the preliminary feature region corresponding to each element of the document to be labeled. The preset region detection filter kernel can be an m × n rectangular filter kernel selected according to the actual application scenario, and the values of m and n can be adjusted to suit the page size of the document to be labeled. That is, the target morphological image may first be eroded with the m × n rectangular filter kernel to further remove noise, and the eroded image may then be dilated with the same m × n rectangular filter kernel to further enhance the closed regions in the image, so as to obtain the preliminary feature region corresponding to each element of the document to be labeled.
Fig. 5 is a further schematic diagram of morphological processing performed on an image according to an embodiment of the present invention, and as shown in fig. 5, a target morphological image 501 is subjected to erosion and dilation processing by using a preset region detection filter kernel, and the obtained preliminary feature regions include a plurality of picture regions 502, a header region 503, a footer region 504, and the like.
The morphological processing of the image can be used for simplifying the page and extracting the main characteristic area of the page to obtain different areas of the page which accord with the human visual standard. In the embodiment of the invention, the gray image can be subjected to corrosion treatment and expansion treatment by selecting the adaptive preset horizontal line detection filter kernel, the adaptive preset vertical line detection filter kernel and the adaptive preset area detection filter kernel according to different document element characteristics of the to-be-labeled document, such as tables, pictures, text paragraphs and the like, so as to obtain the masks corresponding to different document element characteristics in the gray image as the primary characteristic areas.
In the embodiment of the present invention, the morphological erosion operation on the image may specifically take the minimum gray value in the neighborhood of the pixel at each position as the output gray value at that position. The neighborhood of a pixel can have a rectangular, elliptical, cross-shaped or similar structure; this structure is called a structural element and is in practice a binary matrix of 0s and 1s.
For example, fig. 6 is a schematic diagram of a cross-shaped structural element, and it is assumed that the pixel matrix of the target image is:
Figure BDA0003762704280000121
The cross-shaped structural element shown in fig. 6 may be superimposed on the matrix to obtain the superimposed diagram shown in fig. 7, where fig. 7 is a schematic diagram of the cross-shaped structural element superimposed on the image pixel matrix. The cross formed by the shaded portion of fig. 7 is the cross-shaped structural element: the point (0, 2) of the cross-shaped structural element in fig. 6 corresponds to the gray value "11" in the target image shown in fig. 7, the point (1, 1) corresponds to the gray value "234", the point (1, 2) corresponds to the gray value "21", the point (1, 3) corresponds to the gray value "67", and the point (2, 2) corresponds to the gray value "31". Processing the image gray value corresponding to a point of the cross-shaped structural element in fig. 6 means processing the gray value of the corresponding pixel in the target image. Therefore, when processing the image gray value corresponding to the point (1, 2) of the cross-shaped structural element in fig. 6, that is, the pixel where the gray value "21" is located in the target image shown in fig. 7, the minimum value in its cross-shaped neighborhood is found and assigned to the point (1, 2). As can be seen from fig. 7, the minimum value in the cross-shaped neighborhood of the gray value "21" in the target image is 11, so 11 is assigned to the pixel where the gray value "21" is located, yielding a new pixel matrix corresponding to the target image:
Figure BDA0003762704280000122
According to the obtained new pixel matrix, the gray value of the target image is reduced by the erosion operation, namely the overall brightness of the output image after the erosion processing is lower than that of the original target image, the area of a brighter area in the original target image is reduced, and the area of a darker area is increased.
The morphological dilation operation on the image may be regarded as the reverse of erosion: the maximum value in the neighborhood of the pixel at each position is taken as the output gray value at that position. Compared with the original image, brighter objects in the dilated image become larger and darker objects become smaller. Again taking fig. 7 as an example, when processing the image gray value corresponding to the point (1, 2) of the cross-shaped structural element in fig. 6, that is, the pixel where the gray value "21" is located in the target image shown in fig. 7, the maximum value in its cross-shaped neighborhood is found and assigned to the point (1, 2). As shown in fig. 7, the maximum value in the cross-shaped neighborhood of the gray value "21" in the target image is 234, so 234 is assigned to the pixel where the gray value "21" is located, yielding a new pixel matrix corresponding to the target image:
Figure BDA0003762704280000131
As can be seen from the obtained new pixel matrix, the expansion operation increases the gray scale value of the target image, i.e., the overall brightness of the expanded output image is higher than that of the original target image, so that the area of the darker region in the original target image is smaller and the area of the brighter region is larger.
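The worked example with the cross-shaped structural element can be checked with a short sketch. Only the five gray values on the cross (11, 234, 21, 67, 31) come from the description of fig. 7; the four corner values of the 3 × 3 patch below are hypothetical and were chosen so that a full rectangular neighborhood would give a different answer. The function name `apply_at` is also an assumption.

```python
# Offsets (row, column) of a cross-shaped structural element around its center.
CROSS = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]


def apply_at(img, y, x, offsets, reduce_fn):
    """Reduce the gray values under the structural element centered at (y, x)."""
    h, w = len(img), len(img[0])
    values = [
        img[y + dy][x + dx]
        for dy, dx in offsets
        if 0 <= y + dy < h and 0 <= x + dx < w
    ]
    return reduce_fn(values)


# 3 x 3 patch around the pixel with gray value 21.
# Cross values are from the description of fig. 7; corners (5, 250) are made up.
patch = [
    [5, 11, 250],
    [234, 21, 67],
    [5, 31, 250],
]

eroded_center = apply_at(patch, 1, 1, CROSS, min)   # minimum over the cross
dilated_center = apply_at(patch, 1, 1, CROSS, max)  # maximum over the cross
print(eroded_center, dilated_center)
```

The corner values 5 and 250 do not influence either result, which shows that only the pixels covered by the structural element take part in the minimum or maximum.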
In the embodiment of the invention, the target image may first be eroded and the eroded image then dilated to obtain the target morphological image. Alternatively, the target image may first be dilated and the dilated image then eroded to obtain the target morphological image.
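The two orderings are not equivalent. In standard morphology terminology, erosion followed by dilation is called opening and dilation followed by erosion is called closing. The sketch below contrasts them on a single pixel row with a 1 × 3 kernel; the sample row, the kernel size, and the border handling (out-of-image pixels are ignored) are assumptions for illustration.

```python
def filter_row(row, n, reduce_fn):
    """Apply a 1 x n window to every pixel of a single row."""
    half = n // 2
    return [
        reduce_fn(row[max(0, i - half):min(len(row), i - half + n)])
        for i in range(len(row))
    ]


def erode(row, n):
    return filter_row(row, n, min)


def dilate(row, n):
    return filter_row(row, n, max)


def opening(row, n):
    """Erode first, then dilate: removes bright specks narrower than n."""
    return dilate(erode(row, n), n)


def closing(row, n):
    """Dilate first, then erode: fills dark gaps narrower than n."""
    return erode(dilate(row, n), n)


W = 255
# Hypothetical row: one isolated bright pixel, then two bright runs
# separated by a one-pixel dark gap.
row = [0, 0, W, 0, 0, 0, W, W, W, 0, W, W, W, 0, 0, 0]
opened = opening(row, 3)
closed = closing(row, 3)
print(opened)
print(closed)
```

On this row, opening deletes the isolated bright pixel but leaves the gap between the two runs, whereas closing keeps the isolated pixel and merges the two runs into one.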
In a possible implementation manner, the step of determining, as the same target region, pixel points whose pixel values in each of the preliminary feature regions belong to the same connected region may include: determining pixel points whose pixel values in each preliminary feature region belong to the same 4-connected region as the same target region, or determining pixel points whose pixel values in each preliminary feature region belong to the same 8-connected region as the same target region.
That is to say, in the embodiment of the present invention, after the grayscale image of the target image is subjected to morphological erosion and dilation, a page division in mask form, that is, the preliminary feature regions, is obtained. Regions with a pixel value of 255, which appear white to the human eye, may be taken as the image foreground, as distinguished from the background regions with a pixel value of 0.
Specifically, image foreground pixel points that are 4-connected or 8-connected in the preliminary feature region image can be grouped into the same connected domain, and the rectangular outer contour of each connected domain is marked, yielding one or more closed connected domains, which are used as the target regions.
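The connected-domain step can be sketched with a breadth-first flood fill that groups foreground pixels (value 255) and records the rectangular outer contour of each group. This is a minimal sketch under stated assumptions: the function name `connected_boxes`, the `(x0, y0, x1, y1)` box format with inclusive corners, and the sample mask are illustrative and not taken from the embodiment.

```python
from collections import deque


def connected_boxes(mask, connectivity=4):
    """Return bounding boxes (x0, y0, x1, y1) of connected foreground regions."""
    if connectivity == 4:
        neighbors = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        neighbors = [
            (dy, dx)
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)
        ]
    else:
        raise ValueError("connectivity must be 4 or 8")

    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 255 or seen[y][x]:
                continue
            # Start a new region and grow it with a breadth-first search.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy, dx in neighbors:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 255 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


W = 255
# Hypothetical mask: a 2 x 2 block and one pixel touching it only diagonally.
mask = [
    [W, W, 0, 0],
    [W, W, 0, 0],
    [0, 0, W, 0],
    [0, 0, 0, 0],
]
boxes4 = connected_boxes(mask, connectivity=4)
boxes8 = connected_boxes(mask, connectivity=8)
print(boxes4)
print(boxes8)
```

The diagonal contact is what separates the two modes: under 4-connectivity the block and the single pixel are two target regions, while under 8-connectivity they merge into one.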
In a possible implementation manner, fig. 8 is a flowchart of document element labeling according to an embodiment of the present invention, and as shown in fig. 8, the obtaining element content in the target area and labeling the target area based on the element content may include:
step 801, determining the position coordinates of the target area.
Specifically, after 4-connected or 8-connected image foreground pixel points in the preliminary feature region image are divided into the same connected domain, one or more closed connected domains are obtained to serve as a target region, a rectangular outer contour of the target region can be marked, and coordinate information of the rectangular outer contour is obtained.
Step 802, acquiring element content of a position corresponding to the position coordinates in the target image.
In this step, the content at the coordinate in the target image may be determined according to the coordinate information of the rectangular outer contour of the target region, and then the element content at the position in the target image may be extracted by using an OCR (Optical Character Recognition) technique.
Or, in this step, the element content at the corresponding position may also be directly searched and obtained from the document to be labeled according to the coordinate information of the rectangular outer contour of the target area.
Step 803, determine the type of the element content, and label the target area based on the type.
Specifically, the type of the element content may be determined as a document header, a document footer, a text paragraph, a picture, a table, or a formula, and the target area is labeled as a corresponding type of the element content. For example, if the type of element content is a document header, the target region is labeled as a document header, and if the type of element content is a text paragraph, the target region is labeled as a text paragraph.
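Steps 801 to 803 can be outlined as follows. The description does not specify how the type of the element content is decided, so the sketch takes the classifier as a caller-supplied function; `label_regions`, `LabeledRegion`, `crop` and `toy_classifier` are hypothetical names, and the toy rule (a region starting on the top row is a header) is a placeholder for illustration only, not the method of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners

# Element types listed in the description.
ELEMENT_TYPES = (
    "document header", "document footer", "text paragraph",
    "picture", "table", "formula",
)


@dataclass
class LabeledRegion:
    box: Box
    label: str


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Step 802: take the pixels of the target image inside the box."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1 + 1]) for row in image[y0:y1 + 1]]


def label_regions(
    image: Sequence[Sequence[int]],
    boxes: Sequence[Box],
    classify: Callable[[List[List[int]], Box], str],
) -> List[LabeledRegion]:
    """Steps 801 to 803: locate each region, fetch its content, label it."""
    labeled = []
    for box in boxes:                   # step 801: position coordinates
        content = crop(image, box)      # step 802: element content
        label = classify(content, box)  # step 803: type of the content
        if label not in ELEMENT_TYPES:
            raise ValueError(f"unknown element type: {label}")
        labeled.append(LabeledRegion(box, label))
    return labeled


def toy_classifier(content: List[List[int]], box: Box) -> str:
    """Placeholder rule for the example only."""
    return "document header" if box[1] == 0 else "text paragraph"


image = [[0] * 4 for _ in range(4)]  # hypothetical 4 x 4 target image
regions = label_regions(image, [(0, 0, 3, 0), (0, 2, 3, 3)], toy_classifier)
for region in regions:
    print(region)
```

Passing the classifier in as a parameter keeps the sketch neutral about whether the type comes from OCR output, from the source document, or from some other rule.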
With the method provided by the embodiment of the invention, the regions of the various elements in the document to be labeled can be divided using mathematical methods such as morphological detection and connected-domain analysis, so that the document element labeling task can be completed with a small amount of CPU computing resources and the manual effort consumed in labeling document elements is reduced. In addition, because traditional image processing methods such as morphological processing are well suited to region determination, using them ensures the accuracy of the determined target regions and, in turn, the accuracy of the document element labels.
An embodiment of the present invention further provides an electronic device, as shown in fig. 9, which includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 complete mutual communication through the communication bus 904,
a memory 903 for storing computer programs;
the processor 901 is configured to implement the following steps when executing the program stored in the memory 903:
converting a document to be annotated into a target image;
performing morphological processing on the target image to obtain a preliminary characteristic region corresponding to each element of the document to be annotated in the target image;
determining pixel points of which the pixel values in the preliminary characteristic regions belong to the same connected region as the same target region;
and acquiring element content in the target area, and labeling the target area based on the element content.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the document element labeling method described in any of the above embodiments.
In yet another embodiment, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the document element labeling method described in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to them, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A document element labeling method is characterized by comprising the following steps:
converting a document to be annotated into a target image;
performing morphological processing on the target image to obtain a preliminary characteristic region corresponding to each element of the document to be annotated in the target image;
determining pixel points of which the pixel values in the preliminary characteristic regions belong to the same connected region as the same target region;
and acquiring element content in the target area, and labeling the target area based on the element content.
2. The method according to claim 1, wherein the performing morphological processing on the target image to obtain preliminary feature regions corresponding to each element of the document to be annotated in the target image comprises:
converting the target image into a grayscale image;
and performing binarization processing on the gray level image based on a preset filter core to obtain a preliminary characteristic region corresponding to each element of the document to be marked in the target image.
3. The method according to claim 2, wherein the binarizing processing is performed on the grayscale image based on a preset filter kernel to obtain a preliminary feature region corresponding to each element of the document to be labeled in the target image, and the method comprises:
for each pixel point in the gray level image, using a value obtained by subtracting an original pixel value of the pixel point from 255 as a new pixel value of the pixel point to obtain a target gray level image;
carrying out corrosion and expansion processing on the target gray level image based on a preset line detection filter core to obtain a target morphological image;
and carrying out corrosion and expansion treatment on the target morphological image based on a preset region detection filtering core to obtain a preliminary characteristic region corresponding to each element of the document to be marked.
4. The method according to claim 3, wherein the erosion and expansion processing is performed on the target gray-scale image based on a preset line detection filter kernel to obtain a morphological image, and comprises:
corroding and expanding the target gray level image based on a preset vertical line detection filter core to obtain a primary morphological image;
and carrying out corrosion and expansion treatment on the preliminary morphological image based on a preset transverse line detection filter core to obtain a target morphological image.
5. The method according to any one of claims 1 to 4, wherein the converting the document to be annotated into the target image comprises:
and converting the document to be annotated into a target image based on a document processing tool PyMuPDF.
6. The method according to any one of claims 1 to 4, wherein determining pixel points of which pixel values in each of the preliminary feature regions belong to the same connected region as the same target region comprises:
and determining pixel points of which the pixel values in the preliminary characteristic regions belong to the same 4-connected region as the same target region, or determining pixel points of which the pixel values in the preliminary characteristic regions belong to the same 8-connected region as the same target region.
7. The method according to any one of claims 1 to 4, wherein the obtaining of the element content in the target region and labeling of the target region based on the element content comprises:
determining the position coordinates of the target area;
acquiring element content at a position corresponding to the position coordinates in the target image;
and determining the type of the element content, and labeling the target area based on the type.
8. The method of claim 7, wherein determining a type of the element content and labeling the target region based on the type comprises:
determining the type of the element content as a document header, a document footer, a text paragraph, a picture, a table or a formula, and marking the target area as the corresponding type of the element content.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 8 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-8.
CN202210876852.1A 2022-07-25 2022-07-25 Document element labeling method and device, electronic equipment and storage medium Pending CN115273115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210876852.1A CN115273115A (en) 2022-07-25 2022-07-25 Document element labeling method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210876852.1A CN115273115A (en) 2022-07-25 2022-07-25 Document element labeling method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115273115A true CN115273115A (en) 2022-11-01

Family

ID=83769317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210876852.1A Pending CN115273115A (en) 2022-07-25 2022-07-25 Document element labeling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115273115A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410191A (en) * 2022-11-03 2022-11-29 平安银行股份有限公司 Text image recognition method, device, equipment and storage medium
CN115861043A (en) * 2023-02-16 2023-03-28 深圳市旗云智能科技有限公司 Image data processing method and system based on artificial intelligence
CN116306575A (en) * 2023-05-10 2023-06-23 杭州恒生聚源信息技术有限公司 Document analysis method, document analysis model training method and device and electronic equipment
CN117746437A (en) * 2024-02-20 2024-03-22 沈阳哲航信息科技有限公司 Document data extraction system and method thereof
WO2024139298A1 (en) * 2022-12-29 2024-07-04 青岛云天励飞科技有限公司 Image labeling method and apparatus, and electronic device and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410191A (en) * 2022-11-03 2022-11-29 平安银行股份有限公司 Text image recognition method, device, equipment and storage medium
WO2024139298A1 (en) * 2022-12-29 2024-07-04 青岛云天励飞科技有限公司 Image labeling method and apparatus, and electronic device and storage medium
CN115861043A (en) * 2023-02-16 2023-03-28 深圳市旗云智能科技有限公司 Image data processing method and system based on artificial intelligence
CN116306575A (en) * 2023-05-10 2023-06-23 杭州恒生聚源信息技术有限公司 Document analysis method, document analysis model training method and device and electronic equipment
CN116306575B (en) * 2023-05-10 2023-08-29 杭州恒生聚源信息技术有限公司 Document analysis method, document analysis model training method and device and electronic equipment
CN117746437A (en) * 2024-02-20 2024-03-22 沈阳哲航信息科技有限公司 Document data extraction system and method thereof
CN117746437B (en) * 2024-02-20 2024-04-30 沈阳哲航信息科技有限公司 Document data extraction system and method thereof


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination