CN114140778A - Page turning abnormality detection method - Google Patents

Info

Publication number
CN114140778A
Authority
CN
China
Prior art keywords
page
page number
candidate
pages
identification result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110046802.6A
Other languages
Chinese (zh)
Other versions
CN114140778B (en
Inventor
豆浩斌
陈博
朱风云
庞在虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lingbanjishi Intelligent Technology Co ltd
Original Assignee
Beijing Lingbanjishi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lingbanjishi Intelligent Technology Co ltd
Priority to CN202110046802.6A
Publication of CN114140778A
Application granted
Publication of CN114140778B
Legal status: Active
Anticipated expiration

Landscapes

  • Character Input (AREA)

Abstract

The invention discloses a page-turning abnormality detection method comprising the following steps: setting and selecting a page number region type; cropping the page number region image; optical character recognition; extracting candidate page numbers; filtering the candidate page numbers; generating a preliminary recognition result; smoothing and confirmation; and judging and marking the page-turning abnormality type. The method copes with complex conditions such as changes in the position of the page number within a document, confusion between page numbers and numbers in the body text, and failures or errors of the optical character recognition algorithm. It automatically finds and marks abnormal conditions in document scanning, which facilitates subsequent manual verification and supplementary scanning, improves the practical performance of an automatic page-turning scanner, reduces manpower, improves efficiency, and helps ensure the quality and integrity of the scanning result.

Description

Page turning abnormality detection method
Technical Field
The invention relates to the technical field of page-turning scanning, and in particular to a page-turning abnormality detection method.
Background
Digitizing large numbers of books and documents is an important but labor-intensive undertaking. To save manpower and improve efficiency, automatic page-turning scanners have been developed and put into use. Such a scanner is an automated device that turns the pages of a bound book or document and photographs it page by page into electronic images; with its page-turning and photographing mechanisms it can digitize paper documents efficiently with little or no manual intervention.
In the prior art, automatic page-turning scanners often suffer abnormal conditions such as missing pages and duplicate pages during operation, which leaves the scanning result incomplete and increases the workload of subsequent manual verification. To address this, prior-art scanners are additionally fitted with a page-turning abnormality detection device. A commonly adopted scheme installs an ultrasonic transmitter and receiver and judges whether a missing or duplicate page has occurred by detecting the change in the signal after the ultrasonic wave passes through the lifted sheet, thereby determining whether the page turn failed and should be attempted again. However, the inventors found that this scheme not only increases hardware cost but also has detection parameters that are hard to set because paper conditions vary, so missing or duplicate pages still cannot be fully avoided, the integrity of the scanning result cannot be guaranteed, and manual checking is still required.
Disclosure of Invention
In view of this, to solve the technical problems in the prior art so that an automatic page-turning scanner can promptly find abnormal conditions that occur while the pages of a document are turned, and to reduce the workload of subsequent manual verification, the invention provides a page-turning abnormality detection method comprising the following steps:
step 1, dividing a page image into a plurality of page number regions according to the set page number region types, and limiting the region used for page number recognition by selecting a page number region type;
step 2, cropping the corresponding page number region image from the page image according to the selected page number region type;
step 3, performing optical character recognition on the cropped page number region image and outputting the text information in the page number region image;
step 4, searching for and extracting all numeric information appearing in the text information obtained by optical character recognition and taking it as candidate page numbers, the candidate page numbers forming a candidate page number set;
step 5, examining the context of each candidate page number in the candidate page number set, and filtering out the candidate page numbers whose preceding or following quantifier is a non-page quantifier;
step 6, sorting the candidate page numbers in the filtered candidate page number set by their coordinate positions, and selecting the candidate page number closest to the page edge as the preliminary recognition result;
step 7, smoothing and confirming the preliminary recognition result of the current page using the page number recognition results of adjacent pages, based on the fact that the page numbers of a document increase consecutively, to generate a page number recognition result;
step 8, judging and marking the page-turning abnormality type of the page according to the page number recognition result to obtain a page-turning abnormality detection result.
In an embodiment, dividing a page image into a plurality of page number regions according to the set page number region types specifically includes:
the page image is equally divided into 9 regions; since every region except the central one is a region where a page number may appear, the page number region types of these regions are set as upper left, upper middle, upper right, middle left, middle right, lower left, lower middle and lower right respectively.
In an embodiment, selecting the candidate page number closest to the page edge as the preliminary recognition result specifically includes:
for page number region images whose region type is upper left, upper middle or upper right, selecting the candidate page number closest to the upper edge of the page number region image as the preliminary recognition result;
for a page number region image whose region type is middle left, selecting the candidate page number closest to the left edge of the page number region image as the preliminary recognition result;
for a page number region image whose region type is middle right, selecting the candidate page number closest to the right edge of the page number region image as the preliminary recognition result;
and for page number region images whose region type is lower left, lower middle or lower right, selecting the candidate page number closest to the lower edge of the page number region image as the preliminary recognition result.
In one embodiment, the text information includes a text line position in the page area image, a character position in the text line, and a text content of each character.
In one embodiment, in step 5, when the candidate page number set contains no candidate page number that meets the requirements, the candidate page number set is marked as a recognition failure;
containing no candidate page number that meets the requirements means that, after filtering, the candidate page number set contains neither a candidate page number whose preceding or following quantifier is a page quantifier nor a candidate page number that has no preceding or following quantifier.
In one embodiment, the smoothing and confirmation processing includes correction, completion and confirmation, and multi-level smoothing and confirmation is applied according to the scale of the object being processed.
In one embodiment, applying multi-level smoothing and confirmation according to the scale of the object being processed specifically includes:
smoothing and confirmation between adjacent pages: the page number of the current page is inferred from the page number recognition results of the pages on either side of it, the inferred results and the preliminary recognition result of the current page are voted on together, and the page number with the most votes is taken as the page number recognition result of the current page;
smoothing and confirmation between adjacent page segments: confirmed page segments with consecutive page numbers and unconfirmed page segments are located among the pages; when an unconfirmed page segment is bounded on both sides by confirmed page segments, the length of the unconfirmed segment is compared with the page number interval between those confirmed segments, and when the two are equal, the page number of each page in the unconfirmed segment is obtained directly and a page number recognition result is generated; a page segment is a sequence of several adjacent pages, and its length is the number of pages it contains;
smoothing and confirmation within a page segment: for a page segment whose page numbers still cannot be confirmed after smoothing and confirmation between adjacent pages and then between adjacent page segments, internal voting is performed within the segment, and the page numbers are obtained in combination with the page number range bounded by the confirmed, consecutively numbered page segments on either side, generating a page number recognition result.
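The first two levels can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, unrecognized pages are represented as `None`, the neighbors' preliminary results stand in for their recognition results in a single pass, a tie in the vote is assumed to leave the page unconfirmed, and the "page number interval" is assumed to mean the count of page numbers strictly between the two confirmed neighbors.

```python
from collections import Counter
from typing import List, Optional

Pages = List[Optional[int]]  # one entry per scanned image; None = not recognized


def smooth_adjacent_pages(prelim: Pages) -> Pages:
    """Level 1 (sketch): vote among the page's own result and neighbor-derived values."""
    result = list(prelim)
    for i in range(len(prelim)):
        votes = []
        if prelim[i] is not None:
            votes.append(prelim[i])              # the page's own preliminary result
        if i > 0 and prelim[i - 1] is not None:
            votes.append(prelim[i - 1] + 1)      # inferred from the previous page
        if i + 1 < len(prelim) and prelim[i + 1] is not None:
            votes.append(prelim[i + 1] - 1)      # inferred from the next page
        if not votes:
            continue
        ranked = Counter(votes).most_common(2)
        # Assumption: a tie for first place leaves the page unconfirmed.
        unique_winner = len(ranked) == 1 or ranked[0][1] > ranked[1][1]
        result[i] = ranked[0][0] if unique_winner else None
    return result


def fill_between_confirmed(pages: Pages) -> Pages:
    """Level 2 (sketch): fill an unconfirmed run whose length matches the gap."""
    result = list(pages)
    i, n = 0, len(result)
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i] is None:
            i += 1
        end = i  # exclusive end of the unconfirmed run
        if start > 0 and end < n:
            left, right = result[start - 1], result[end]
            if right - left - 1 == end - start:  # run length equals the interval
                for k in range(start, end):
                    result[k] = left + 1 + (k - start)
    return result
```

For example, `smooth_adjacent_pages([10, 11, 72, 13, 14])` corrects the misread 72 to 12, and `fill_between_confirmed([10, None, 13])` leaves the gap unfilled because one unrecognized image cannot account for two missing page numbers.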
In one embodiment, smoothing and confirmation within a page segment includes:
when the page numbers of the pages in the page segment are consecutive, first determining the page number of the starting page of the segment, that is, the page number recognition result of the starting page, and then calculating the page numbers of the other pages in the segment from the page number of the starting page to generate a page number recognition result;
when the page number of the starting page cannot be determined but one or more pages in the segment have a preliminary recognition result, calculating the page number of the starting page backwards from those preliminary recognition results and performing internal voting within the segment to determine the page number of the starting page, and then calculating the page numbers of the other pages in the segment from the page number of the starting page to generate a page number recognition result.
In one embodiment, the preliminary recognition result of the $i$-th page in the page segment is $f_i$. From $f_i$, the vote contributed by the $i$-th page for the page number of the starting page of the segment is $v_{i1} = f_i - (i-1)$. When the page segment contains $n$ pages that have a preliminary recognition result, $n$ votes for the starting page number are obtained. These $n$ votes are tallied in an internal vote within the page segment, and the value that receives the most votes, provided its vote count exceeds the set internal voting threshold, is taken as the page number recognition result $S_1$ of the starting page.
The page number recognition result of the $i$-th page in the page segment is $P_i$; from the recognition result $S_1$ of the starting page, the results for the other pages in the segment are obtained as $P_i = S_1 + (i-1)$.
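As a worked illustration with assumed values (not taken from the patent), consider a segment of five pages whose preliminary results are 21, 22, none, 29 and 25 for pages 1 to 5, so that $n = 4$, and assume an internal voting threshold of $n/2 = 2$:

```latex
\begin{align*}
v_{11} &= f_1 - (1-1) = 21 - 0 = 21 \\
v_{21} &= f_2 - (2-1) = 22 - 1 = 21 \\
v_{41} &= f_4 - (4-1) = 29 - 3 = 26 \\
v_{51} &= f_5 - (5-1) = 25 - 4 = 21 \\
S_1 &= 21 \quad (\text{3 of } n = 4 \text{ votes, above the assumed threshold } n/2 = 2) \\
P_i &= S_1 + (i-1): \quad P_1, \dots, P_5 = 21, 22, 23, 24, 25
\end{align*}
```

The misread fourth page (29) is outvoted, and the unrecognized third page receives page number 23.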
In an embodiment, judging the page-turning abnormality type of the page according to the page number recognition result specifically includes:
comparing the page number recognition results of adjacent pages; when the difference between the page numbers of two consecutive pages (the later minus the earlier) is less than 1, judging that a duplicate page has occurred and marking the page-turning abnormality type as duplicate page; when the difference is greater than 1, judging that a missing page has occurred and marking the page-turning abnormality type as missing page; for a page whose page number cannot be confirmed, marking the page-turning abnormality type as recognition failure; pages marked as duplicate page, missing page or recognition failure are then checked manually.
The embodiment of the invention has the following beneficial effects:
the page-turning abnormality detection method provided by the invention takes the complexity of practical applications into account and formulates several processing strategies to cope with situations such as changes in the position of the page number within a document, confusion between page numbers and numbers in the body text, and failures or errors of the optical character recognition algorithm; the technical scheme recognizes the page numbers of a document more reliably, automatically finds and marks the positions where the page numbering is abnormal in the scan, facilitates later manual checking and supplementary scanning, improves the performance of the automatic page-turning scanner in practical use, and helps ensure the quality and integrity of the scanning result while reducing manpower and improving efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a schematic view of an automatic page turning scanner of the present invention;
FIG. 2 is a schematic flow chart of the page-turning abnormality detection method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention discloses a page-turning abnormality detection method applied to an automatic page-turning scanner, an automated device that turns the pages of a bound book or document and photographs it page by page into electronic images. As shown in FIG. 1, the automatic page-turning scanner comprises an automatic page-turning device, a photographing device, an information processing device and a storage device;
the automatic page-turning device holds the book or document to be scanned and automatically performs continuous page-turning operations; the photographing device is a high-definition camera which, after each page turn, photographs the current page, generates an electronic image and transmits it to the information processing device; the information processing device preprocesses and recognizes the received electronic images, then collates and exports them as an electronic document; the storage device stores the scanned electronic images and the exported electronic documents, as well as the processing programs and algorithm models.
As shown in FIG. 2, the invention discloses a page-turning abnormality detection method based on page number recognition, which specifically comprises the following steps:
step 1, setting and selecting a page number region type: dividing the page image into a plurality of page number regions according to the set page number region types, and limiting the region used for page number recognition by selecting a page number region type;
in particular, dividing the page image into a plurality of page number regions according to the set page number region types specifically includes:
equally dividing the page image into 9 regions; since every region except the central one is a region where a page number may appear, the page number region types of these regions are set as upper left, upper middle, upper right, middle left, middle right, lower left, lower middle and lower right respectively;
in practical applications, the division into regions and the setting of the corresponding page number region types, including the number of regions and the manner of division, can follow application requirements; for example, the page image may be divided into 4, 9, 16 or another number of regions, and the division may be equal or unequal;
when a scanning task for a book or document starts, simply selecting the page number region type limits the part of the page image used for page number recognition, which excludes a large amount of interference and greatly reduces the computation of the subsequent recognition algorithm;
step 2, cropping the page number region image: cropping the corresponding page number region image from the page image according to the selected page number region type;
in particular, for the set page number region types upper left, upper middle, upper right, middle left, middle right, lower left, lower middle and lower right, the upper-left, upper-middle, upper-right, middle-left, middle-right, lower-left, lower-middle and lower-right page number region images are cropped correspondingly;
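Steps 1 and 2 can be sketched as follows for the equal 3 x 3 division. The function name `region_box`, the `REGION_GRID` table and the pixel-box convention are illustrative assumptions, not part of the patent; only the eight region type names come from the text.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# (row, column) of each region type in a 3 x 3 grid; the central cell is unused.
REGION_GRID: Dict[str, Tuple[int, int]] = {
    "upper left": (0, 0), "upper middle": (0, 1), "upper right": (0, 2),
    "middle left": (1, 0), "middle right": (1, 2),
    "lower left": (2, 0), "lower middle": (2, 1), "lower right": (2, 2),
}


def region_box(width: int, height: int, region_type: str) -> Box:
    """Return the crop box of one page number region for an equal 3 x 3 split."""
    row, col = REGION_GRID[region_type]
    left = col * width // 3
    right = (col + 1) * width // 3
    top = row * height // 3
    bottom = (row + 1) * height // 3
    return (left, top, right, bottom)
```

The returned tuple uses the (left, upper, right, lower) ordering that image libraries such as Pillow accept for cropping, so it could be passed to a crop call on the page image; an unequal division would only need a different table of boundaries.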
step 3, optical character recognition: performing optical character recognition on the cropped page number region image and outputting the text information in it;
specifically, the text information includes the position of each text line in the page number region image, the position of each character within its text line, and the text content of each character;
here, optical character recognition (OCR) recognizes the characters in a document image and converts them into text output;
step 4, extracting candidate page numbers: searching for and extracting all numeric information appearing in the text information obtained by optical character recognition and taking it as candidate page numbers, the candidate page numbers forming a candidate page number set;
the numeric information includes, but is not limited to, Arabic numerals, Roman numerals, Greek numerals and Chinese numerals;
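A minimal sketch of the extraction step, assuming the OCR output is available as plain text lines. It handles only Arabic numerals and upper-case Roman numerals; Greek and Chinese numerals are left out, and all names here are hypothetical.

```python
import re
from typing import List, Tuple

ARABIC_RE = re.compile(r"\d+")
ROMAN_WORD_RE = re.compile(r"\b[MDCLXVI]+\b")
# Strict shape of a Roman numeral, used to reject words such as "IIII".
ROMAN_STRICT_RE = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Convert a validated upper-case Roman numeral to an integer."""
    total = 0
    for ch, nxt in zip(text, text[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller symbol before a larger one is subtracted (IV = 4).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def extract_candidates(line: str) -> List[Tuple[int, int, int]]:
    """Return (value, start, end) for every number found in one OCR text line."""
    found = [(int(m.group()), m.start(), m.end()) for m in ARABIC_RE.finditer(line)]
    for m in ROMAN_WORD_RE.finditer(line):
        if ROMAN_STRICT_RE.fullmatch(m.group()):
            found.append((roman_to_int(m.group()), m.start(), m.end()))
    return sorted(found, key=lambda c: c[1])
```

The character offsets are kept so that a later step can look at the words around each candidate. One known limitation of this sketch is that a standalone capital "I" in running text is read as the Roman numeral 1.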
step 5, filtering candidate page numbers: examining the context of each candidate page number in the candidate page number set, and filtering out the candidate page numbers whose preceding or following quantifier is a non-page quantifier;
specifically, a candidate page number whose preceding or following quantifier is a non-page quantifier is filtered out, while candidate page numbers whose preceding or following quantifier is a page quantifier and candidate page numbers with no preceding or following quantifier are retained;
when the candidate page number set contains no candidate page number that meets the requirements, the candidate page number set is marked as a recognition failure;
specifically, containing no candidate page number that meets the requirements means that, after filtering, the set contains neither a candidate page number whose preceding or following quantifier is a page quantifier nor a candidate page number that has no preceding or following quantifier;
the page quantifiers include Page, P and the like;
the non-page quantifiers include Chapter, Section, Part and the like;
generally, page numbers in books and other publications either carry no preceding or following quantifier or appear in forms such as "page 1" or "P1";
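The filtering rule can be sketched as below. The `Candidate` structure, the function name and the word list are illustrative assumptions; the sketch drops a candidate only when a neighboring word is a known non-page quantifier, which keeps both "Page 45"-style candidates and bare numbers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative word list; a real list would follow the language of the document.
NON_PAGE_WORDS = {"chapter", "section", "part"}


@dataclass
class Candidate:
    value: int
    before: Optional[str] = None  # word immediately before the number, if any
    after: Optional[str] = None   # word immediately after the number, if any


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Drop candidates whose neighboring word is a non-page quantifier."""
    kept = []
    for cand in candidates:
        context = {w.lower() for w in (cand.before, cand.after) if w}
        if context & NON_PAGE_WORDS:
            continue  # e.g. "Chapter 3" is not a page number
        kept.append(cand)
    return kept
```

An empty return value corresponds to the recognition-failure case described above, so the caller can mark the page accordingly.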
step 6, generating a preliminary recognition result: sorting the filtered candidate page number set by coordinate position, and selecting the candidate page number closest to the page edge as the preliminary recognition result;
in particular, selecting the candidate page number closest to the page edge as the preliminary recognition result specifically includes:
for page number region images whose region type is upper left, upper middle or upper right, selecting the candidate page number closest to the upper edge of the page number region image as the preliminary recognition result;
for a page number region image whose region type is middle left, selecting the candidate page number closest to the left edge of the page number region image as the preliminary recognition result;
for a page number region image whose region type is middle right, selecting the candidate page number closest to the right edge of the page number region image as the preliminary recognition result;
for page number region images whose region type is lower left, lower middle or lower right, selecting the candidate page number closest to the lower edge of the page number region image as the preliminary recognition result;
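The edge rule above can be sketched as a single selection function. The tuple layout, the function name and the use of the bounding-box center as the candidate position are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (value, x, y): candidate page number and the center of its bounding box, in
# pixel coordinates of the cropped region image (origin at the top-left corner).
Located = Tuple[int, float, float]


def pick_nearest_edge(cands: List[Located], region_type: str,
                      width: int, height: int) -> Optional[int]:
    """Return the candidate closest to the edge that matches the region type."""
    if not cands:
        return None
    vertical, horizontal = region_type.split()
    if vertical == "upper":
        distance = lambda c: c[2]            # distance to the upper edge
    elif vertical == "lower":
        distance = lambda c: height - c[2]   # distance to the lower edge
    elif horizontal == "left":
        distance = lambda c: c[1]            # middle left: distance to the left edge
    else:
        distance = lambda c: width - c[1]    # middle right: distance to the right edge
    return min(cands, key=distance)[0]
```

Taking the minimum directly is equivalent to sorting the candidates by distance and keeping the first one.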
step 7, smoothing and confirmation: based on the fact that the page numbers of a document increase consecutively, smoothing and confirming the preliminary recognition result of the current page using the page number recognition results of adjacent pages;
since the preliminary recognition result may contain errors and omissions, smoothing and confirmation of it is an essential step;
in particular, the smoothing and confirmation processing includes correction, completion and confirmation, and multi-level smoothing and confirmation is applied according to the scale of the object being processed;
in particular, applying multi-level smoothing and confirmation according to the scale of the object being processed specifically includes:
the first level, smoothing and confirmation between adjacent pages: the page number of the current page is inferred from the page number recognition results of the pages on either side of it, the inferred results and the preliminary recognition result of the current page are voted on together, and the page number with the most votes is taken as the page number recognition result of the current page;
the second level, smoothing and confirmation between adjacent page segments: confirmed page segments with consecutive page numbers and unconfirmed page segments are located among the pages; when an unconfirmed page segment is bounded on both sides by confirmed page segments, the length of the unconfirmed segment is compared with the page number interval between those confirmed segments, and when the two are equal, the page number of each page in the unconfirmed segment is obtained directly and a page number recognition result is generated; a page segment is a sequence of several adjacent pages, and its length is the number of pages it contains;
the third level, smoothing and confirmation within a page segment: for a page segment whose page numbers still cannot be confirmed after smoothing and confirmation between adjacent pages and then between adjacent page segments, internal voting is performed within the segment, and the page numbers are obtained in combination with the page number range bounded by the confirmed, consecutively numbered page segments on either side, generating a page number recognition result;
specifically, smoothing and confirmation within a page segment includes:
when the page numbers of the pages in the page segment are consecutive, first determining the page number of the starting page of the segment, that is, the page number recognition result of the starting page, and then calculating the page numbers of the other pages in the segment from the page number of the starting page to generate a page number recognition result;
specifically, the page number recognition result of the $i$-th page in the page segment is $P_i$; from the recognition result $S_1$ of the starting page, the results for the other pages in the segment are obtained as $P_i = S_1 + (i-1)$;
when the page number of the starting page cannot be determined but one or more pages in the segment have a preliminary recognition result, calculating the page number of the starting page backwards from those preliminary recognition results and performing internal voting within the segment to determine the page number of the starting page, and then calculating the page numbers of the other pages in the segment from the page number of the starting page to generate a page number recognition result;
specifically, the preliminary recognition result of the $i$-th page in the page segment is $f_i$; from $f_i$, the vote contributed by the $i$-th page for the page number of the starting page, that is, the 1st page of the segment, is $v_{i1} = f_i - (i-1)$; when the page segment contains $n$ pages that have a preliminary recognition result, $n$ votes for the starting page number are obtained; these $n$ votes are tallied in an internal vote within the page segment, and the value that receives the most votes, provided its vote count exceeds the set internal voting threshold, is taken as the page number recognition result $S_1$ of the starting page;
the page number recognition results of the other pages in the segment then follow from $S_1$ as $P_i = S_1 + (i-1)$;
the internal voting threshold can be set to $n/2$, to $(n+1)/2$, or to another value chosen for the practical application;
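The internal vote within a page segment can be sketched as follows. The function name and parameters are hypothetical, the default threshold of n/2 is one of the options named above, and the optional `low` and `high` bounds are an assumed way of expressing the page number range limited by the confirmed segments on either side.

```python
from collections import Counter
from typing import List, Optional


def vote_segment(prelim: List[Optional[int]],
                 threshold: Optional[float] = None,
                 low: Optional[int] = None,
                 high: Optional[int] = None) -> Optional[List[int]]:
    """Return page numbers for a whole segment, or None if the vote fails.

    prelim[k] is the preliminary result f_i of page i = k + 1 (None if absent);
    low/high are the confirmed page numbers just before/after the segment.
    """
    # Each recognized page votes for the starting page: v_i1 = f_i - (i - 1).
    votes = [f - k for k, f in enumerate(prelim) if f is not None]
    if not votes:
        return None
    if threshold is None:
        threshold = len(votes) / 2           # n/2, with n = number of votes
    start, count = Counter(votes).most_common(1)[0]
    if count <= threshold:
        return None                          # no value exceeds the threshold
    end = start + len(prelim) - 1
    if (low is not None and start <= low) or (high is not None and end >= high):
        return None                          # outside the range allowed by neighbors
    return [start + k for k in range(len(prelim))]  # P_i = S_1 + (i - 1)
```

A segment for which this returns `None` stays unconfirmed, and its pages would go on to be marked as recognition failures in step 8.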
step 8, judging and marking the page-turning abnormality type: judging and marking the page-turning abnormality type of the page according to the page number recognition result to obtain a page-turning abnormality detection result;
in particular, judging and marking the page-turning abnormality type of the page according to the page number recognition result specifically includes:
the page-turning abnormality types comprise duplicate page, missing page and recognition failure;
comparing the page number recognition results of adjacent pages; when the difference between the page numbers of two consecutive pages (the later minus the earlier) is less than 1, judging that a duplicate page has occurred and marking the page-turning abnormality type as duplicate page; when the difference is greater than 1, judging that a missing page has occurred and marking the page-turning abnormality type as missing page; for a page whose page number cannot be confirmed, marking the page-turning abnormality type as recognition failure; pages marked as duplicate page, missing page or recognition failure are then checked manually.
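The judgment in step 8 can be sketched as a single pass over the confirmed page numbers. The function name and the label strings are illustrative; as an assumption, a page is compared only with an immediately preceding page whose number was confirmed.

```python
from typing import List, Optional, Tuple


def classify_anomalies(pages: List[Optional[int]]) -> List[Tuple[int, str]]:
    """Return (image index, anomaly type) for every image needing manual review."""
    flagged = []
    previous = None  # page number of the immediately preceding image, if confirmed
    for idx, number in enumerate(pages):
        if number is None:
            flagged.append((idx, "recognition failure"))
            previous = None
            continue
        if previous is not None:
            step = number - previous
            if step < 1:
                flagged.append((idx, "duplicate page"))
            elif step > 1:
                flagged.append((idx, "missing page"))
        previous = number
    return flagged
```

For the sequence `[1, 2, 2, 3, 5, None, 7]` the sketch flags the repeated 2, the jump from 3 to 5, and the unrecognized image, which is the list an operator would then check by hand.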
The embodiment of the invention has the following beneficial effects:
in the page-turning abnormality detection method disclosed by the invention, several coping strategies are formulated in detail to handle the complex conditions that arise in actual page-turning scanning, including changes in the position of the page number within a document, confusion between page numbers and numbers in the body text, and failures or errors of the recognition algorithm;
the method classifies the positions where a page number may appear into page number region types; setting the page number region type simply and effectively excludes interference from other content on the page and greatly reduces the computation of the subsequent optical character recognition (OCR) processing; interference from other numbers on the page is excluded by examining the context of every candidate page number in the page number region, and the preliminary recognition result of the page number is determined by comparing the coordinate positions of the candidate page numbers;
the method exploits the fact that the page numbers of a document increase consecutively and applies, in sequence, smoothing between adjacent pages, smoothing between adjacent page segments and smoothing within a page segment, thereby performing smoothing and confirmation (confirmation, correction and completion) on the preliminary recognition results to obtain the final page number recognition result.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A page turning abnormality detection method, characterized by comprising the following steps:
step 1, dividing a page image into a plurality of page number areas according to set page number area types, and limiting the area used for page number identification by selecting a page number area type;
step 2, intercepting the corresponding page number area image from the page image according to the selected page number area type;
step 3, performing optical character recognition on the intercepted page number area image and outputting the text information in the page number area image;
step 4, searching for and extracting all numeric information appearing in the text information obtained by optical character recognition and taking it as candidate page numbers, the candidate page numbers forming a candidate page number set;
step 5, examining the context of each candidate page number in the candidate page number set, and filtering out any candidate page number whose preceding or following quantifier is a non-page quantifier;
step 6, sorting the candidate page numbers in the filtered candidate page number set according to their coordinate positions, and selecting the candidate page number closest to the page edge as the preliminary identification result;
step 7, smoothing and confirming the preliminary identification result of the current page by using the page number identification results of adjacent pages, according to the characteristic that the page numbers of a document increase continuously, to generate a page number identification result;
step 8, judging and marking the page turning abnormality type of the page according to the page number identification result to obtain a page turning abnormality detection result.
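Steps 4 and 5 can be illustrated with a minimal sketch. It assumes the OCR text is English, and the word lists `PAGE_WORDS` and `NON_PAGE_WORDS` as well as the function names are hypothetical; the claim does not enumerate the quantifiers it has in mind.

```python
import re

# Hypothetical quantifier lists; the claim does not enumerate them.
PAGE_WORDS = {"page", "p", "pp"}
NON_PAGE_WORDS = {"chapter", "figure", "table", "section", "year"}


def extract_candidates(text):
    """Step 4: every run of digits becomes a candidate page number.

    Each candidate is returned as (number, word_before, word_after) so that
    step 5 can inspect its context. A missing neighbour is None.
    """
    tokens = re.findall(r"\d+|[A-Za-z]+", text)
    candidates = []
    for k, token in enumerate(tokens):
        if token.isdigit():
            before = tokens[k - 1].lower() if k > 0 else None
            after = tokens[k + 1].lower() if k + 1 < len(tokens) else None
            candidates.append((int(token), before, after))
    return candidates


def filter_candidates(candidates):
    """Step 5: drop candidates whose neighbouring word is a non-page quantifier.

    An empty result corresponds to marking the set as identification failure.
    """
    return [
        c for c in candidates
        if c[1] not in NON_PAGE_WORDS and c[2] not in NON_PAGE_WORDS
    ]
```

For the text "Chapter 3 Page 12" the sketch keeps only 12, and for "Figure 4" it keeps nothing, which the caller would treat as identification failure.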
2. The page-turning abnormality detection method according to claim 1,
wherein dividing the page image into a plurality of page number areas according to the set page number area types specifically comprises:
dividing the page image equally into 9 areas; since every area other than the central area is an area where a page number is likely to appear, the page number area types of the 8 areas other than the central area are set as upper left, upper middle, upper right, middle left, middle right, lower left, lower middle and lower right, respectively.
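The equal division into 9 areas can be sketched as a 3 x 3 grid over pixel coordinates. The function name `page_number_regions` and the `(left, top, right, bottom)` box convention are assumptions made for illustration only.

```python
# Region names laid out as they appear on the page; the centre cell is
# excluded because it is not treated as a page number area.
REGION_TYPES = [
    ["upper left", "upper middle", "upper right"],
    ["middle left", None, "middle right"],
    ["lower left", "lower middle", "lower right"],
]


def page_number_regions(width, height):
    """Return {region type: (left, top, right, bottom)} for a page image."""
    xs = [width * k // 3 for k in range(4)]   # column boundaries
    ys = [height * k // 3 for k in range(4)]  # row boundaries
    regions = {}
    for row in range(3):
        for col in range(3):
            name = REGION_TYPES[row][col]
            if name is not None:
                regions[name] = (xs[col], ys[row], xs[col + 1], ys[row + 1])
    return regions
```

Cropping the page image to the box of the selected region type then yields the page number area image of step 2.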
3. The page-turning abnormality detection method according to claim 2,
wherein selecting the candidate page number closest to the page edge as the preliminary identification result specifically comprises:
for page number area images whose page number area type is upper left, upper middle or upper right, selecting the candidate page number closest to the upper edge of the page number area image as the preliminary identification result;
for a page number area image whose page number area type is middle left, selecting the candidate page number closest to the left edge of the page number area image as the preliminary identification result;
for a page number area image whose page number area type is middle right, selecting the candidate page number closest to the right edge of the page number area image as the preliminary identification result;
for page number area images whose page number area type is lower left, lower middle or lower right, selecting the candidate page number closest to the lower edge of the page number area image as the preliminary identification result.
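The edge rule above can be sketched as follows, assuming each candidate carries the centre coordinates of its digits in the page number area image, with the origin at the top left. The tuple layout and the name `pick_preliminary` are illustrative assumptions.

```python
# Which edge of the page number area image is used for each area type.
EDGE_FOR_REGION = {
    "upper left": "top", "upper middle": "top", "upper right": "top",
    "middle left": "left",
    "middle right": "right",
    "lower left": "bottom", "lower middle": "bottom", "lower right": "bottom",
}


def pick_preliminary(candidates, region_type, region_width, region_height):
    """Pick the candidate closest to the relevant edge.

    candidates: list of (number, x, y), with (x, y) in region-image pixels.
    Returns the chosen number, or None when there are no candidates.
    """
    if not candidates:
        return None
    distance = {
        "top": lambda c: c[2],
        "bottom": lambda c: region_height - c[2],
        "left": lambda c: c[1],
        "right": lambda c: region_width - c[1],
    }[EDGE_FOR_REGION[region_type]]
    return min(candidates, key=distance)[0]
```

With a year such as 2021 near the top of a lower-middle area and 37 near its bottom, the sketch returns 37.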
4. The page-turning abnormality detection method according to claim 1,
wherein the text information includes a text line position in the page number region image, a character position in the text line, and a text content of each character.
5. The page-turning abnormality detection method according to claim 1,
wherein in step 5, when no candidate page number meeting the requirement exists in the candidate page number set, the candidate page number set is marked as identification failure;
no candidate page number meeting the requirement exists when, after filtering, the candidate page number set contains neither a candidate page number whose preceding or following quantifier is a page quantifier nor a candidate page number that has no preceding or following quantifier.
6. The page-turning abnormality detection method according to claim 1,
wherein the smoothing and confirming processing comprises correction, completion and confirmation, and multilevel smoothing and confirming processing is adopted according to the scale of the object being processed.
7. The page-turning abnormality detection method according to claim 6,
wherein adopting multilevel smoothing and confirming processing according to the scale of the object being processed specifically comprises:
smoothing and confirming between adjacent pages, namely inferring the page number of the current page from the page number identification results of the pages on its two adjacent sides, voting on the inferred results together with the preliminary identification result of the current page, and taking the page number with the most votes as the page number identification result of the current page;
smoothing and confirming between adjacent page segments, namely searching the pages for confirmed page segments with continuous page numbers and for page segments of consecutive pages whose page numbers are not yet confirmed; when both adjacent sides of an unconfirmed page segment are confirmed pages, comparing the length value of the page segment with the page number interval value between the confirmed pages on its two sides, and when the length value equals the page number interval value, directly obtaining the page number of each page in the unconfirmed page segment and generating a page number identification result; a page segment is a page sequence formed by a plurality of adjacent pages; the length value of a page segment is the number of pages it contains;
smoothing and confirming inside a page segment, namely, for a page segment whose page numbers still cannot be confirmed after the smoothing and confirming between adjacent pages and the smoothing and confirming between adjacent page segments have been applied in sequence, performing voting inside the page segment and, subject to the constraints imposed by the confirmed page numbers on the two adjacent sides of the page segment and by the continuity of page numbers within the page segment, obtaining the page numbers to generate a page number identification result.
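The first two levels can be sketched as below. Several details are assumptions rather than statements of the claim: a page number is accepted only when at least two votes agree, unconfirmed values are represented by `None`, and the page number interval value is read as the count of page numbers strictly between the two confirmed neighbours.

```python
from collections import Counter


def smooth_adjacent(prev_page, preliminary, next_page):
    """Adjacent-page level: vote among prev + 1, next - 1 and the page's own
    preliminary result. Returns the winning number, or None if no two votes
    agree (assumed rule; the claim only says the most-voted number wins)."""
    votes = []
    if prev_page is not None:
        votes.append(prev_page + 1)
    if next_page is not None:
        votes.append(next_page - 1)
    if preliminary is not None:
        votes.append(preliminary)
    if not votes:
        return None
    best, count = Counter(votes).most_common(1)[0]
    return best if count >= 2 else None


def fill_gap(left_page, gap_length, right_page):
    """Adjacent-segment level: fill a run of gap_length unconfirmed pages that
    lies between confirmed page numbers left_page and right_page."""
    if right_page - left_page - 1 == gap_length:
        return [left_page + k for k in range(1, gap_length + 1)]
    return None  # lengths disagree: leave the segment for internal voting
```

For example, a page read as 72 between confirmed pages 11 and 13 is corrected to 12, and three unconfirmed pages between 10 and 14 are completed as 11, 12 and 13.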
8. The page-turning abnormality detection method according to claim 7,
wherein smoothing and confirming inside the page segment specifically comprises:
when the page numbers of the pages in the page segment are continuous, first determining the page number of the starting page of the page segment, that is, the page number identification result of the starting page, and then calculating the page numbers of the other pages in the page segment from the page number of the starting page to generate a page number identification result;
when the page number of the starting page of the page segment cannot be determined and one or more pages in the page segment have a corresponding preliminary identification result, back-calculating the page number of the starting page of the page segment from those preliminary identification results and performing voting inside the page segment to determine the page number of the starting page; and calculating the page numbers of the other pages in the page segment from the page number of the starting page to generate a page number identification result.
9. The page-turning abnormality detection method according to claim 8,
wherein the preliminary identification result of the page number of the i-th page in the page segment is $f_i$; from the preliminary identification result $f_i$, the page number voting result for the starting page of the page segment corresponding to the i-th page is calculated as $v_{i1} = f_i - (i - 1)$; when the page segment contains n pages having a preliminary identification result, n page number voting results for the starting page are obtained correspondingly; the n page number voting results are counted in a vote inside the page segment, and the page number voting result that has the most accumulated votes, and whose vote count exceeds the set internal voting threshold, is taken as the page number identification result $S_1$ of the starting page;
the page number identification result of the i-th page in the page segment is $P_i$; from the page number identification result $S_1$ of the starting page, the page number identification results of the other pages in the page segment are obtained as $P_i = S_1 + (i - 1)$.
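The two formulas can be exercised with a short sketch. Pages without a preliminary identification result are represented by `None`; the function names and the choice to return `None` when the threshold is not exceeded are assumptions. Ties between equally voted starting pages are not addressed by the claim and are resolved here by first occurrence.

```python
from collections import Counter


def vote_start_page(preliminary, threshold):
    """Compute S_1 from the votes v_i1 = f_i - (i - 1).

    preliminary[i - 1] holds f_i, or None when page i has no preliminary
    result. Returns S_1, or None when the best vote count does not exceed
    the internal voting threshold.
    """
    votes = Counter(
        f - (i - 1)
        for i, f in enumerate(preliminary, start=1)
        if f is not None
    )
    if not votes:
        return None
    start, count = votes.most_common(1)[0]
    return start if count > threshold else None


def fill_segment(preliminary, threshold):
    """Return [P_1, ..., P_n] with P_i = S_1 + (i - 1), or None if S_1 is unknown."""
    s1 = vote_start_page(preliminary, threshold)
    if s1 is None:
        return None
    return [s1 + (i - 1) for i in range(1, len(preliminary) + 1)]
```

With preliminary results `[None, 21, 22, 28, None, 25]`, the votes for the starting page are 20, 20, 25 and 20, so a threshold of 2 gives a starting page of 20 and the segment is filled as 20 through 25; the misread 28 is corrected to 23.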
10. The page-turning abnormality detection method according to claim 1,
wherein judging the page turning abnormality type of the page according to the page number identification result specifically comprises:
comparing the page number identification results of adjacent pages; when the difference between the page number of the following page and that of the preceding page is less than 1, determining that a duplicate page has occurred and marking the page turning abnormality type as duplicate page; when the difference between the page number of the following page and that of the preceding page is greater than 1, determining that a missing page has occurred and marking the page turning abnormality type as missing page; when the page number of a page cannot be confirmed, marking the page turning abnormality type of that page as identification failure; and submitting the pages whose page turning abnormality type is marked as duplicate page, missing page or identification failure for manual checking.
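The comparison rule can be sketched as a single pass over the final page number identification results. The label strings, the use of `None` for an unconfirmed page, and the decision to skip the comparison when the preceding page is unconfirmed are assumptions made for this illustration.

```python
def classify(page_numbers):
    """Label each scanned page from the final page number results.

    page_numbers: list of ints, with None for pages that could not be confirmed.
    """
    labels = []
    prev = None
    for number in page_numbers:
        if number is None:
            labels.append("identification failure")
        elif prev is None or number - prev == 1:
            # First page, page after an unconfirmed page, or a normal turn.
            labels.append("normal")
        elif number - prev < 1:
            labels.append("duplicate page")
        else:
            labels.append("missing page")
        prev = number
    return labels


def needs_manual_check(labels):
    """Indices of pages that should be checked manually."""
    return [k for k, label in enumerate(labels) if label != "normal"]
```

For the sequence 1, 2, 2, 3, 5, unconfirmed, 7 the sketch flags the third page as a duplicate, the fifth as following a missing page, and the sixth as an identification failure.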
CN202110046802.6A 2021-01-14 2021-01-14 Page turning abnormality detection method Active CN114140778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110046802.6A CN114140778B (en) 2021-01-14 2021-01-14 Page turning abnormality detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110046802.6A CN114140778B (en) 2021-01-14 2021-01-14 Page turning abnormality detection method

Publications (2)

Publication Number Publication Date
CN114140778A true CN114140778A (en) 2022-03-04
CN114140778B CN114140778B (en) 2022-05-06

Family

ID=80438835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110046802.6A Active CN114140778B (en) 2021-01-14 2021-01-14 Page turning abnormality detection method

Country Status (1)

Country Link
CN (1) CN114140778B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7619784B1 (en) * 2003-06-30 2009-11-17 Google Inc. Pacing and error monitoring of manual page turning operator
CN104094278A (en) * 2012-01-23 2014-10-08 微软公司 Pattern matching engine
US20140375802A1 (en) * 2013-06-25 2014-12-25 Casio Computer Co., Ltd. Document camera system and method for reading image
CN104639791A (en) * 2013-11-12 2015-05-20 国家电网公司 Scanner capable of recognizing page numbers and application method of scanner
CN110059559A (en) * 2019-03-15 2019-07-26 深圳壹账通智能科技有限公司 The processing method and its electronic equipment of OCR identification file
CN111556251A (en) * 2020-05-20 2020-08-18 深圳前海微众银行股份有限公司 Electronic book generation method, device and medium

Also Published As

Publication number Publication date
CN114140778B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant