CN104463153B - Method and system for improving character recognition rate in layout document - Google Patents

Info

Publication number
CN104463153B
Authority
CN
China
Prior art keywords
character
code
characters
original
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310450972.6A
Other languages
Chinese (zh)
Other versions
CN104463153A (en
Inventor
董宁
耿蕾蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Founder Apabi Technology Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Apabi Technology Co Ltd
Priority to CN201310450972.6A
Publication of CN104463153A
Application granted
Publication of CN104463153B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention provides a method and a system for improving the character recognition rate in a layout document. The original character code of each preset character in the layout document is compared with the universal standard character code of the same character to obtain a code comparison result. Probability statistics over the multiple code comparison results yield a probability value, which is compared with a threshold value. If the probability value exceeds the threshold value, the layout document displays the characters obtained by looking up the original character codes in the universal standard character code library; otherwise, the layout document displays the OCR-recognized characters. Because the invention uses probability statistics to choose between these two sources of characters, it effectively improves the accuracy of character recognition.

Description

Method and system for improving character recognition rate in layout document
Technical Field
The invention relates to a method for improving character recognition rate, in particular to a method and a system for improving character recognition rate in a layout document.
Background
To preserve the reading experience, the typeset files that book and periodical publishers produce before printing are generally layout documents. A layout document is a file that explicitly records the position, font bitmap, font, size, color, and similar information of each character, and may also record each character's code. Because it records the font bitmaps and the relative positions of the characters, a layout document is stable: it looks the same in any computer environment as it does in the printed book or periodical. PDF is the most common layout document format.
Although some layout documents record character codes, they are generally displayed from the font bitmaps rather than from the codes. The codes recorded in a layout document may follow either a universal standard encoding or a custom encoding, so when text is extracted from a particular layout document the encoding scheme is not known in advance, and the characters cannot be reliably obtained from the codes alone.
Therefore, the prior art usually extracts the characters in a layout document by OCR (Optical Character Recognition). Because OCR itself has a limited recognition rate, the characters it produces often have a high error rate, which impairs reading.
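To make the encoding ambiguity concrete, the following is a minimal Python sketch. It assumes two-byte codes and uses Python's built-in gb2312 codec as a stand-in for a universal standard encoding; the helper name decode_as_gb2312 and the custom code 0xB0A1 are hypothetical and are not taken from the patent.

```python
def decode_as_gb2312(code: int) -> str:
    """Interpret a two-byte code as GB2312; return U+FFFD if it is not valid."""
    try:
        return code.to_bytes(2, "big").decode("gb2312")
    except UnicodeDecodeError:
        return "\ufffd"


# The glyph drawn on the page is the same in both hypothetical documents.
standard_doc_code = 0xD6D0  # GB2312 code of the character "中"
custom_doc_code = 0xB0A1    # arbitrary code assigned by a custom encoding (assumed)

# Standard-encoded document: the code yields the character that is drawn.
print(decode_as_gb2312(standard_doc_code))

# Custom-encoded document: the code yields a different character,
# so trusting the code alone would produce wrong text.
print(decode_as_gb2312(custom_doc_code))
```

Since nothing in the code itself says which case applies, a decision rule based on evidence from the whole document is needed.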
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the problem of high error rate in recognizing characters by using an OCR technology in the prior art, and to provide a method and a system for improving the character recognition rate in a layout document.
To solve the above technical problem, the present invention provides a method for improving the character recognition rate in a layout document, comprising the following steps:
comparing, for each preset character in the layout document, the original character code with the universal standard character code of the same character, to obtain a code comparison result indicating that the two codes are the same or different;
carrying out probability statistics on the code comparison results of the preset characters to obtain a probability value that the preset characters are encoded with the universal standard character encoding;
comparing the probability value with a threshold value; if the probability value exceeds the threshold value, displaying for each preset character the character obtained by looking up its original character code in the universal standard character code library; otherwise, directly displaying the character recognized for the preset character by OCR.
Before the step of obtaining the code comparison result, the method further comprises the following steps:
extracting a font bitmap of each preset character in the layout document;
extracting an original character code of each preset character in the layout document;
performing OCR recognition on the font bitmap to obtain a recognized character;
looking up the recognized character in a universal standard character code library to obtain its universal standard character code.
Before the step of extracting the original character code, the method further comprises the following step:
screening out the characters in the layout document that have an original character code as the preset characters.
After the step of screening out the preset characters, the method further comprises the following step:
assigning an ID number to each preset character.
After the step of extracting the original character code of each preset character in the layout document, the method further comprises the following step:
establishing an original character code table, and storing in it the ID of each preset character together with the corresponding original character code.
After the step of obtaining the universal standard character code, the method further comprises the following step:
establishing a standard character code table, and storing in it the ID of each preset character together with the corresponding universal standard character code.
Before the step of comparing the probability value with the threshold value and acting on the result, the method further comprises the following step:
establishing an editable interface for displaying, modifying and confirming the characters.
A system for improving the character recognition rate in a layout document comprises a code comparison device, a probability statistics device, and a probability-threshold comparison device, wherein:
the code comparison device is used for comparing, for each preset character in the layout document, the original character code with the universal standard character code of the same character, to obtain a code comparison result indicating that the two codes are the same or different;
the probability statistics device is used for carrying out probability statistics on the code comparison results of the preset characters to obtain a probability value that the preset characters are encoded with the universal standard character encoding;
the probability-threshold comparison device is used for comparing the probability value with a threshold value; if the probability value exceeds the threshold value, the character obtained by looking up the original character code of each preset character in the universal standard character code library is displayed; otherwise, the character recognized for the preset character by OCR is displayed directly.
The system further comprises a font bitmap extraction device, an original character code extraction device, an OCR recognition device, and a universal standard character code correspondence device, wherein:
the font bitmap extraction device is used for extracting the font bitmap of each preset character in the layout document;
the original character code extraction device is used for extracting the original character code of each preset character in the layout document;
the OCR recognition device is used for performing OCR recognition on the extracted font bitmap to obtain a recognized character;
the universal standard character code correspondence device is used for looking up the recognized character in a universal standard character code library to obtain its universal standard character code.
The system further comprises a preset character screening device, which is used for screening out the characters in the layout document that have an original character code as the preset characters.
The system further comprises an ID numbering device, which is used for assigning an ID number to each preset character.
The system further comprises an original character code table establishing device, which is used for establishing an original character code table and storing in it the ID of each preset character together with the corresponding original character code.
The system further comprises a standard character code table establishing device, which is used for establishing a standard character code table and storing in it the ID of each preset character together with the corresponding universal standard character code.
The system further comprises an editable interface establishing device, which is used for establishing an editable interface for displaying, modifying and confirming the characters.
Compared with the prior art, the technical scheme of the invention has the following advantages:
1. In the method and system for improving the character recognition rate in a layout document, the original character code of each preset character in the layout document is compared with the universal standard character code of the same character to obtain a code comparison result (same or different). Probability statistics over the code comparison results yield a probability value, which is compared with a threshold value. If the probability value exceeds the threshold value, the characters obtained by looking up the original character codes in the universal standard character code library are displayed; otherwise, the OCR-recognized characters are displayed. Because the invention uses probability statistics to choose between these two sources of characters, it effectively improves the accuracy of character recognition.
2. Before the step of obtaining the code comparison result, the method and system extract a font bitmap and an original character code of each preset character in the layout document, perform OCR recognition on the font bitmap to obtain a recognized character, and look up the recognized character in a universal standard character code library to obtain its universal standard character code. OCR recognition thus supplies the recognized character from which the universal standard character code is obtained. The OCR recognition device is a commercially available general-purpose module and has the advantage of low cost.
3. Before the step of extracting the original character codes, the method and system screen out the characters in the layout document that have an original character code as the preset characters. Screening reduces the number of characters whose font bitmaps must be extracted, which shortens the running time and improves efficiency. The invention also assigns an ID number to each preset character, so that each preset character can be matched one-to-one with its original character code and its recognized character conveniently and accurately. The invention further establishes an original character code table and a standard character code table, which manage the original character codes and the standard character codes respectively and can reduce the running time.
4. The method and system further establish an editable interface in which the displayed characters can be viewed, modified and confirmed, so that wrongly displayed characters can be corrected manually.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow diagram of a method for improving the character recognition rate in a layout document according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system for improving the character recognition rate in a layout document according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Example 1
As an embodiment of the present invention, as shown in FIG. 1, a method for improving the character recognition rate in a layout document includes the following steps:
Comparing, for each preset character in the layout document, the original character code with the universal standard character code of the same character, to obtain a code comparison result indicating that the two codes are the same or different.
Carrying out probability statistics on the code comparison results of the preset characters to obtain a probability value that the preset characters are encoded with the universal standard character encoding.
Comparing the probability value with a threshold value. If the probability value exceeds the threshold value, the character obtained by looking up the original character code of each preset character in the universal standard character code library is displayed. Otherwise, the character recognized for the preset character by OCR is displayed directly.
The invention uses probability statistics to choose between displaying the characters obtained by looking up the original character codes in the universal standard character code library and displaying the OCR-recognized characters. When the preset characters are encoded with the universal standard character encoding, the characters obtained from the original character codes replace the OCR-recognized characters. Because lookup by code is more accurate than OCR, the invention improves the accuracy of character recognition overall.
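The decision rule of this embodiment can be sketched in a few lines of Python. The patent does not spell out how the probability value is computed; the sketch assumes it is the fraction of preset characters whose original code equals the standard code. The function names and the string labels returned are hypothetical, and the default threshold of 0.9 follows the 90% value given in a later embodiment.

```python
from typing import List, Tuple

# One (original_code, standard_code) pair per preset character.
CodePair = Tuple[int, int]


def standard_encoding_probability(pairs: List[CodePair]) -> float:
    """Fraction of preset characters whose two codes are identical (assumed statistic)."""
    if not pairs:
        return 0.0
    same = sum(1 for original, standard in pairs if original == standard)
    return same / len(pairs)


def choose_source(pairs: List[CodePair], threshold: float = 0.9) -> str:
    """Return which source of characters should be displayed."""
    probability = standard_encoding_probability(pairs)
    # "Exceeds the threshold" is read as a strict comparison.
    return "original_code" if probability > threshold else "ocr"


# 19 of 20 codes agree: the single mismatch is most plausibly an OCR error,
# so the document is treated as standard-encoded.
mostly_same = [(0xD6D0, 0xD6D0)] * 19 + [(0xCEC4, 0xD6D0)]
print(choose_source(mostly_same))

# No codes agree: the document most plausibly uses a custom encoding.
all_different = [(0xB0A1, 0xD6D0)] * 20
print(choose_source(all_different))
```

Under this reading, a few OCR mistakes in a standard-encoded document do not prevent the codes from being used, which is where the accuracy gain comes from.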
Example 2
As an embodiment of the present invention, on the basis of embodiment 1, the method further includes the following steps before the step of obtaining the code comparison result:
Extracting a font bitmap of each preset character in the layout document.
Performing OCR recognition on the extracted font bitmap to obtain a recognized character.
Looking up the recognized character in a universal standard character code library to obtain its universal standard character code. The universal standard character encoding is the national standard GB 2312.
Extracting an original character code of each preset character in the layout document.
The steps of obtaining the universal standard character code and the original character code may be executed in parallel or in either order: the universal standard character code may be obtained first and the original character code second, or the reverse. The purpose of the invention is achieved as long as both codes are available before the comparison.
OCR recognition supplies the recognized character, from which the universal standard character code is then readily obtained.
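As a rough illustration of the lookup step, the sketch below maps an OCR-recognized character to its GB 2312 code. It assumes Python's built-in gb2312 codec can serve as the universal standard character code library; the function name gb2312_code is hypothetical, and the OCR step itself is not shown.

```python
from typing import Optional


def gb2312_code(recognized: str) -> Optional[int]:
    """Return the GB 2312 code of a recognized character, or None if it has none."""
    try:
        raw = recognized.encode("gb2312")
    except UnicodeEncodeError:
        # The character is outside GB 2312, so no standard code is available.
        return None
    return int.from_bytes(raw, "big")


print(hex(gb2312_code("中")))      # two-byte code of a Chinese character
print(hex(gb2312_code("A")))       # single-byte code of an ASCII character
print(gb2312_code("\U0001F600"))   # None: not representable in GB 2312
```

Returning None for characters outside the standard keeps the later comparison simple: such a character can never count as "same code".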
Example 3
As an embodiment of the present invention, on the basis of embodiment 2, before the step of extracting the original code of the character, the method further includes the following steps:
screening out the characters in the layout document that have an original character code as the preset characters. Screening reduces the number of characters whose font bitmaps must be extracted, which effectively shortens the running time and improves efficiency.
Example 4
As an embodiment of the present invention, on the basis of embodiment 3, after the step of screening out the character having the original character encoding in the layout document as the predetermined character, the method further includes the following steps:
assigning an ID number to each preset character. With ID numbers, each preset character can be matched one-to-one with its original character code and its recognized character conveniently and accurately.
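Screening and ID numbering can be sketched together as follows. The Glyph and PresetCharacter types and the function screen_and_number are hypothetical names introduced for illustration; the sketch assumes a character without an original code is represented by None and that IDs are consecutive integers starting at 1, neither of which the patent specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    bitmap: bytes                  # font bitmap recorded in the layout document
    original_code: Optional[int]   # None when the document records no code


@dataclass
class PresetCharacter:
    char_id: int
    glyph: Glyph


def screen_and_number(glyphs: List[Glyph]) -> List[PresetCharacter]:
    """Keep only glyphs that carry an original code and give each one an ID."""
    coded = [g for g in glyphs if g.original_code is not None]
    return [PresetCharacter(char_id=i, glyph=g) for i, g in enumerate(coded, start=1)]


page = [
    Glyph(bitmap=b"\x01", original_code=0xD6D0),
    Glyph(bitmap=b"\x02", original_code=None),   # skipped: nothing to compare
    Glyph(bitmap=b"\x03", original_code=0xCEC4),
]
preset = screen_and_number(page)
print([(p.char_id, hex(p.glyph.original_code)) for p in preset])
```

Only the screened characters go on to bitmap extraction and OCR, which is the source of the running-time saving described above.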
Example 5
As an embodiment of the present invention, on the basis of embodiment 4, after the step of extracting the character original code of each predetermined character in the layout document, the method further includes the following steps:
establishing an original character code table, and storing in it the ID of each preset character together with the corresponding original character code. The original character code table manages the original character codes effectively and can reduce the running time of the invention.
Example 6
As an embodiment of the present invention, on the basis of embodiment 4 or embodiment 5, after the step of obtaining the universal standard code for the character, the method further includes the following steps:
establishing a standard character code table, and storing in it the ID of each preset character together with the corresponding universal standard character code. The standard character code table manages the standard character codes effectively and can reduce the running time of the invention.
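With both tables keyed by the character ID, the code comparison becomes a join on that ID. The sketch below assumes each table is a plain dictionary from ID to code; the function name compare_tables and the sample codes are hypothetical.

```python
from typing import Dict


def compare_tables(
    original_table: Dict[int, int],
    standard_table: Dict[int, int],
) -> Dict[int, bool]:
    """For every ID present in both tables, report whether the two codes are the same."""
    shared_ids = original_table.keys() & standard_table.keys()
    return {cid: original_table[cid] == standard_table[cid] for cid in sorted(shared_ids)}


# ID -> original code extracted from the layout document.
original_table = {1: 0xD6D0, 2: 0xCEC4, 3: 0xB0A1}
# ID -> standard code derived from the OCR-recognized character.
standard_table = {1: 0xD6D0, 2: 0xCEC4, 3: 0xD6D0}

results = compare_tables(original_table, standard_table)
print(results)
```

Because the two tables are filled independently, the extraction and OCR steps can run in either order and still be matched up afterwards through the ID.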
Example 7
As an embodiment of the present invention, on the basis of the above embodiment, before comparing the probability value with a threshold and performing corresponding operations, the method further includes the following steps:
an editable interface for displaying, modifying and confirming the characters is established.
The editable interface can display, modify and confirm the displayed characters, can manually intervene in the displayed wrong characters, and is convenient for correcting errors.
As an embodiment of the present invention, on the basis of the above embodiment, the threshold is 90%.
Example 8
Referring to FIG. 2, a system for improving the character recognition rate in a layout document according to an embodiment of the present invention includes a code comparison device, a probability statistics device, and a probability-threshold comparison device, wherein:
The code comparison device is used for comparing, for each preset character in the layout document, the original character code with the universal standard character code of the same character, to obtain a code comparison result indicating that the two codes are the same or different.
The probability statistics device is used for carrying out probability statistics on the code comparison results of the preset characters to obtain a probability value that the preset characters are encoded with the universal standard character encoding.
The probability-threshold comparison device is used for comparing the probability value with a threshold value. If the probability value exceeds the threshold value, the character obtained by looking up the original character code of each preset character in the universal standard character code library is displayed. Otherwise, the character recognized for the preset character by OCR is displayed directly.
The invention uses probability statistics to choose between displaying the characters obtained by looking up the original character codes in the universal standard character code library and displaying the OCR-recognized characters, thereby effectively improving the accuracy of character recognition.
Example 9
As an embodiment of the present invention, on the basis of embodiment 8, the system further includes a font bitmap extraction device, an original character code extraction device, an OCR recognition device, and a universal standard character code correspondence device, wherein:
The font bitmap extraction device is used for extracting the font bitmap of each preset character in the layout document.
The original character code extraction device is used for extracting the original character code of each preset character in the layout document.
The OCR recognition device is used for performing OCR recognition on the extracted font bitmap to obtain a recognized character.
The universal standard character code correspondence device is used for looking up the recognized character in a universal standard character code library to obtain its universal standard character code.
OCR recognition supplies the recognized character, from which the universal standard character code is then readily obtained. The OCR recognition device is a commercially available general-purpose module and has the advantage of low cost.
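The way the devices fit together can be sketched as one small Python class in which the OCR recognition device is injected as a callable, reflecting that it is an off-the-shelf module. Everything named here (RecognitionSystem, render, fake_ocr, the helper functions) is hypothetical; the sketch assumes GB 2312 via Python's built-in codec, assumes the probability value is the fraction of matching codes, and uses a fake OCR function in place of a real engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Glyph = Tuple[bytes, int]  # (font bitmap, original character code)


def to_standard_code(ch: str) -> Optional[int]:
    """Standard code correspondence device: character -> GB 2312 code."""
    try:
        return int.from_bytes(ch.encode("gb2312"), "big")
    except UnicodeEncodeError:
        return None


def lookup_standard(code: int) -> str:
    """Standard character code library lookup: GB 2312 code -> character."""
    length = 1 if code < 0x100 else 2
    return code.to_bytes(length, "big").decode("gb2312", errors="replace")


@dataclass
class RecognitionSystem:
    ocr: Callable[[bytes], str]  # OCR recognition device supplied by the caller
    threshold: float = 0.9

    def render(self, glyphs: List[Glyph]) -> str:
        recognized = [self.ocr(bitmap) for bitmap, _ in glyphs]
        # Code comparison device: count characters whose two codes are the same.
        same = sum(
            1 for (_, code), ch in zip(glyphs, recognized)
            if to_standard_code(ch) == code
        )
        # Probability statistics device (assumed: fraction of matching codes).
        probability = same / len(glyphs) if glyphs else 0.0
        # Probability-threshold comparison device.
        if probability > self.threshold:
            return "".join(lookup_standard(code) for _, code in glyphs)
        return "".join(recognized)


def fake_ocr(bitmap: bytes) -> str:
    """Stand-in OCR: the 'bitmap' is UTF-8 text, and b'BAD' is misread."""
    return "又" if bitmap == b"BAD" else bitmap.decode("utf-8")


system = RecognitionSystem(ocr=fake_ocr)
# 20 standard-encoded characters; OCR misreads the last one.
glyphs = [("中".encode("utf-8"), 0xD6D0)] * 19 + [(b"BAD", 0xCEC4)]
print(system.render(glyphs))
```

In this run 19 of 20 codes agree, so the text is taken from the codes and the single OCR misreading does not reach the displayed result.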
Example 10
As an embodiment of the present invention, on the basis of embodiment 9, the system further includes a preset character screening device, which is configured to screen out the characters in the layout document that have an original character code as the preset characters. The preset character screening device reduces the number of characters whose font bitmaps must be extracted, which effectively shortens the running time and improves efficiency.
Example 11
As an embodiment of the present invention, on the basis of embodiment 10, the system further includes an ID numbering device, which is configured to assign an ID number to each preset character. With ID numbers, each preset character can be matched one-to-one with its original character code and its recognized character conveniently and accurately.
Example 12
As an embodiment of the present invention, on the basis of embodiment 11, the present invention further includes a character original encoding table creating device, where the character original encoding table creating device is configured to create a character original encoding table, and store the ID of the predetermined character and the character original encoding corresponding to the ID in the character original encoding table. The character original coding table establishing device can effectively manage the character original coding and can reduce the running time of the invention.
Example 13
As an embodiment of the present invention, on the basis of embodiment 11 or embodiment 12, the present invention further includes a character standard code table creating device, wherein the character standard code table creating device is configured to create a character standard code table, and store the ID of the predetermined character and the corresponding character standard code in the character standard code table. The character standard code table establishing device can effectively manage character standard codes and can reduce the running time of the invention.
Example 14
As an embodiment of the present invention, on the basis of any one of embodiments 8 to 13, the present invention further includes an editable interface creating device, where the editable interface creating device is configured to create an editable interface for displaying, modifying, and confirming the character. The editable interface can display, modify and confirm the displayed characters, can manually intervene in the displayed wrong characters, and has the function of correcting errors.
As an embodiment of the present invention, on the basis of the above embodiment, the threshold is 90%.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

Claims (8)

1. A method for improving the character recognition rate in a layout document is characterized by comprising the following steps:
extracting a font bitmap of each preset character in the layout document;
extracting a character original code of each preset character in the layout document;
performing OCR recognition on the font bitmap to obtain recognized characters;
looking up the recognized characters in a universal standard character code library to obtain the universal standard codes of the characters, wherein the universal standard code is the national standard GB 2312;
comparing, for the same preset character in the layout document, the character original code with the character universal standard code to obtain a code comparison result indicating that the codes are the same or different;
performing probability statistics on the code comparison results corresponding to the preset characters to obtain a probability value that the preset characters are encoded with the character universal standard code;
comparing the probability value with a threshold value; if the probability value exceeds the threshold value, looking up the preset character in the universal standard character code library according to its character original code and displaying the character so obtained; otherwise, directly displaying the character recognized for the preset character through OCR.
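The comparison, probability statistics, and threshold steps of claim 1 can be illustrated with a minimal Python sketch. It is an interpretation under stated assumptions, not the patented implementation: OCR is stubbed out (the recognized character is supplied directly), the probability value is assumed to be the fraction of preset characters whose original code equals the GB 2312 code of the recognized character, and the names `PresetChar`, `standard_code`, `match_probability`, and `display_text` are hypothetical. The fallback to the OCR result when an original code cannot be decoded is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PresetChar:
    original_code: bytes  # code stored for the character in the layout document
    ocr_char: str         # character recognized from its font bitmap (OCR is stubbed out)


def standard_code(ch: str) -> Optional[bytes]:
    """Return the GB 2312 code of a recognized character, or None if it has none."""
    try:
        return ch.encode("gb2312")
    except UnicodeEncodeError:
        return None


def match_probability(chars: List[PresetChar]) -> float:
    """Fraction of preset characters whose original code equals the GB 2312 code."""
    if not chars:
        return 0.0
    same = sum(1 for c in chars if standard_code(c.ocr_char) == c.original_code)
    return same / len(chars)


def display_text(chars: List[PresetChar], threshold: float) -> str:
    """Trust the original codes above the threshold; otherwise show the OCR result."""
    if match_probability(chars) <= threshold:
        return "".join(c.ocr_char for c in chars)
    shown = []
    for c in chars:
        try:
            shown.append(c.original_code.decode("gb2312"))
        except UnicodeDecodeError:
            shown.append(c.ocr_char)  # assumed fallback, not stated in the claims
    return "".join(shown)


sample = [
    PresetChar("中".encode("gb2312"), "中"),
    PresetChar("文".encode("gb2312"), "文"),
    PresetChar("字".encode("gb2312"), "宇"),  # simulated OCR misread
]
```

In this sample two of three codes agree, so a threshold below 2/3 causes the original codes to be decoded and displayed, while a higher threshold causes the OCR output to be displayed unchanged.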
2. The method of claim 1, wherein before the step of extracting the original code of the character, the method further comprises the following steps:
screening out the characters in the layout document that have character original codes as the preset characters.
3. The method of claim 2, wherein after the step of screening out the characters with original codes of the characters in the layout document as the predetermined characters, the method further comprises the following steps:
performing ID numbering for each of the predetermined characters.
4. The method of claim 3, wherein after the step of extracting the original encoding of each of the predetermined characters in the layout document, the method further comprises the following steps:
establishing a character original code table, and storing the ID of each predetermined character and the character original code corresponding to that character in the character original code table.
5. The method for improving the character recognition rate of the layout document according to claim 3 or 4, further comprising the following steps after the step of obtaining the universal standard encoding of the characters:
establishing a character standard code table, and storing the ID of each predetermined character and the corresponding character standard code in the character standard code table.
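The screening, ID numbering, and code tables of claims 2 to 5 can be sketched as two dictionaries keyed by character ID. This is a hedged illustration only: the function names `build_tables` and `compare_by_id` are hypothetical, a missing original code is represented by `None`, sequential integers are assumed as IDs, and OCR output is passed in rather than computed.

```python
from typing import Dict, List, Optional, Tuple


def build_tables(
    original_codes: List[Optional[bytes]],
    ocr_chars: List[str],
) -> Tuple[Dict[int, bytes], Dict[int, Optional[bytes]]]:
    """Build the character original code table and character standard code table."""
    original_table: Dict[int, bytes] = {}
    standard_table: Dict[int, Optional[bytes]] = {}
    next_id = 0
    for code, ch in zip(original_codes, ocr_chars):
        if code is None:
            continue  # claim 2: only characters with an original code are preset characters
        original_table[next_id] = code  # claim 4: ID -> original code
        try:
            standard_table[next_id] = ch.encode("gb2312")  # claim 5: ID -> GB 2312 code
        except UnicodeEncodeError:
            standard_table[next_id] = None  # recognized character has no GB 2312 code
        next_id += 1  # claim 3: sequential ID numbering (assumed scheme)
    return original_table, standard_table


def compare_by_id(
    original_table: Dict[int, bytes],
    standard_table: Dict[int, Optional[bytes]],
) -> Dict[int, bool]:
    """Per-ID comparison result: True if the codes are the same, False otherwise."""
    return {cid: code == standard_table.get(cid) for cid, code in original_table.items()}
```

Keying both tables by the same ID lets the later comparison step join the original and standard codes for one character without depending on the order in which they were extracted.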
6. The method of claim 1, wherein before comparing the probability value with a threshold and performing corresponding operations, the method further comprises the following steps:
establishing an editable interface for displaying, modifying, and confirming the characters.
7. A system for improving the character recognition rate in a layout document, characterized by comprising a font bitmap extraction device, a character original code extraction device, an OCR recognition device, a character universal standard code corresponding device, a code comparison device, a probability statistics device, and a probability value and threshold value comparison device, wherein,
the font bitmap extracting device is used for extracting the font bitmap of each preset character in the layout document;
the character original code extracting device is used for extracting a character original code of each preset character in the layout document;
the OCR recognition device is used for performing OCR recognition on the extracted font bitmap to obtain recognized characters;
the character universal standard code corresponding device is used for looking up the recognized character in a universal standard character code library to obtain a character universal standard code, wherein the character universal standard code is the national standard GB 2312;
the code comparison device is used for comparing, for the same preset character in the layout document, the character original code with the character universal standard code to obtain a code comparison result indicating that the codes are the same or different;
the probability statistics device is used for performing probability statistics on the code comparison results corresponding to the preset characters to obtain a probability value that the preset characters are encoded with the character universal standard code;
the probability value and threshold value comparison device is used for comparing the probability value with a threshold value; if the probability value exceeds the threshold value, the preset character is looked up in the universal standard character code library according to its character original code and the character so obtained is displayed; otherwise, the character recognized for the preset character through OCR is displayed directly.
8. The system for improving the character recognition rate of the layout document as claimed in claim 7, further comprising a predetermined character screening device for screening out characters having original codes of the characters in the layout document as predetermined characters.
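One way to picture how the devices of claim 7 fit together is a small Python sketch in which each extraction or recognition device is an injected callable. This is only an illustrative arrangement: the class name `RecognitionSystem`, its field names, and the default threshold of 0.9 are assumptions (the claims specify no threshold value), and real bitmap extraction and OCR are left to whatever functions the caller supplies.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class RecognitionSystem:
    extract_bitmaps: Callable[[Any], List[Any]]  # font bitmap extraction device
    extract_codes: Callable[[Any], List[bytes]]  # character original code extraction device
    ocr: Callable[[Any], str]                    # OCR recognition device
    threshold: float = 0.9                       # assumed value; the claims give none

    def run(self, document: Any) -> str:
        recognized = [self.ocr(b) for b in self.extract_bitmaps(document)]
        codes = self.extract_codes(document)
        # Character universal standard code corresponding device (GB 2312 lookup);
        # characters outside GB 2312 encode to b"" and therefore never match.
        standard = [ch.encode("gb2312", errors="ignore") for ch in recognized]
        # Code comparison device: same or different for each character.
        results = [o == s for o, s in zip(codes, standard)]
        # Probability statistics device.
        p = sum(results) / len(results) if results else 0.0
        # Probability value and threshold value comparison device.
        if p > self.threshold:
            return b"".join(codes).decode("gb2312", errors="replace")
        return "".join(recognized)
```

Passing the devices in as callables keeps the decision logic testable with stand-in functions, for example an `ocr` stub that returns a known character for each fake bitmap.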
CN201310450972.6A 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document Expired - Fee Related CN104463153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310450972.6A CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310450972.6A CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Publications (2)

Publication Number Publication Date
CN104463153A CN104463153A (en) 2015-03-25
CN104463153B true CN104463153B (en) 2018-09-04

Family

ID=52909169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310450972.6A Expired - Fee Related CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Country Status (1)

Country Link
CN (1) CN104463153B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038093B (en) * 2017-11-10 2021-06-15 深圳市亿图软件有限公司 PDF character extraction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782896A (en) * 2009-01-21 2010-07-21 汉王科技股份有限公司 PDF character extraction method combined with OCR technology
CN102194503A (en) * 2010-03-12 2011-09-21 腾讯科技(深圳)有限公司 Player and character code detection method and device for subtitle file
JP5955579B2 (en) * 2011-07-21 2016-07-20 日東電工株式会社 Protection sheet for glass etching

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5955579A (en) * 1982-09-24 1984-03-30 Fujitsu Ltd Character recognizer
JPH06187505A (en) * 1992-12-21 1994-07-08 Hitachi Ltd Data entry system/method

Also Published As

Publication number Publication date
CN104463153A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN108073913B (en) Handwriting datamation data acquisition method
CN111401371B (en) Text detection and identification method and system and computer equipment
WO2014092979A1 (en) Method of perspective correction for devanagari text
JP6000992B2 (en) Document file generation apparatus and document file generation method
US20150003746A1 (en) Computing device and file verifying method
CN108319578B (en) Method for generating medium for data recording
JP2019079347A (en) Character estimation system, character estimation method, and character estimation program
CN111368744A (en) Method and device for identifying unstructured table in picture
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN111338733A (en) User interface adaptation method and system
CN102467664B (en) Method and device for assisting with optical character recognition
EP3093851B1 (en) Method and device for use when reassembling a fragmented jpeg image
CN104268545A (en) Method for table area recognition and content rasterization in electronic document layout files
CN104463153B (en) The method and system of character identification rate in a kind of raising format document
CN114049540A (en) Method, device, equipment and medium for detecting marked image based on artificial intelligence
CN104346616A (en) Character recognition device and character recognition method
US9639970B2 (en) Character recognition system, character recognition program and character recognition method
CN109145879B (en) Method, equipment and storage medium for identifying printing font
CN111476090A (en) Watermark identification method and device
CN107169517B (en) Method for judging repeated strokes, terminal equipment and computer readable storage medium
CN104850819B (en) Information processing method and electronic equipment
CN107016317B (en) Bar code decoding method and device
CN112541505B (en) Text recognition method, text recognition device and computer-readable storage medium
CN107943760B (en) Method and device for optimizing fonts of PDF document editing, terminal equipment and storage medium
CN112434700A (en) License plate recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220620

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: FOUNDER APABI TECHNOLOGY Ltd.

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: FOUNDER APABI TECHNOLOGY Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180904