CN102306294A - Method and system for extracting image from portable document format (PDF) file page - Google Patents

Info

Publication number
CN102306294A
Authority
CN
China
Prior art keywords
pictorial element
divided
row
image
ranks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110243119A
Other languages
Chinese (zh)
Inventor
晏检平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wondershare Software Co Ltd
Original Assignee
Shenzhen Wondershare Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wondershare Software Co Ltd filed Critical Shenzhen Wondershare Software Co Ltd
Priority to CN201110243119A priority Critical patent/CN102306294A/en
Priority to PCT/CN2011/084305 priority patent/WO2013026245A1/en
Publication of CN102306294A publication Critical patent/CN102306294A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a method for extracting an image from a portable document format (PDF) file page. The method comprises the following steps of: acquiring position information of each image element in the PDF file page; dividing all image elements in the page into different sets according to the position information; and taking all image elements in each set as a whole for image extraction. The invention also discloses a system for extracting the image from the PDF file page. By adopting the method or the system disclosed by the invention, the extracted image can be easily edited, and high extraction efficiency is realized.

Description

Method and system for extracting an image from a PDF file page
Technical field
The present invention relates to the field of document processing, and in particular to a method and system for extracting an image from a PDF file page.
Background
PDF is the abbreviation of Portable Document Format, an electronic document format. Owing to its outstanding characteristics, the PDF format has become an ideal file format for distributing electronic documents and disseminating formatted information on the Internet. At present, most technical papers published on the Internet are provided as PDF files. However, a PDF file focuses on describing the print layout of a document rather than the data structure of the original document, and it is not easy to edit. Converting a PDF file into a file of another format is therefore relatively difficult, and handling the images in the PDF file is the hardest problem in such a conversion.
In the prior art, when a PDF file is converted into a file of another format, images are extracted in one of two main ways:
One way is to extract every image element in the PDF file unchanged (a single picture may be composed of a large number of image elements). This often yields thousands of image elements. Because it extracts a large number of image elements without identifying which elements make up which picture, the result can be edited only element by element; the picture cannot be edited as a whole.
The other way is to extract the entire page of the PDF file directly as a single picture. An image extracted this way is likewise difficult to edit.
Summary of the invention
The object of the present invention is to provide a method and system for extracting an image from a PDF file page, so that the extracted image is easy to edit and the extraction is efficient.
To achieve this object, the present invention provides the following solutions:
A method for extracting an image from a PDF file page, comprising:
obtaining position information of each image element in the PDF file page;
dividing all image elements in the page into different sets according to the position information;
extracting all image elements in each set as a whole.
Preferably, obtaining the position information of each image element in the PDF file page comprises:
obtaining the coordinates of the top-left vertex of each image element in the PDF file page, and recording those coordinates as the reference point of that image element.
Preferably, dividing all image elements in the page into different sets according to the position information comprises:
dividing the image elements in the horizontal direction to obtain one or more row sets;
dividing the image elements in each row set in the vertical direction to obtain row-column sets.
Preferably, dividing the image elements in the horizontal direction to obtain one or more row sets comprises:
A. sorting all image elements according to the ordinate of the reference point of each image element;
B. according to the result of sorting by ordinate, assigning the first image element to a first row set;
C. determining whether the ordinate range of the next image element intersects the ordinate range of the image element just assigned;
D. if so, assigning the next image element to the row set containing the image element just assigned; otherwise, assigning the next image element to a new row set; and returning to step C.
Preferably, dividing the image elements in each row set in the vertical direction to obtain row-column sets comprises:
E. for each row set, sorting the image elements in the row set according to the abscissa of the reference point of each image element;
F. according to the result of sorting by abscissa, assigning the first image element in the row set to a first column set, the column set being a row-column set with respect to the whole page;
G. determining whether the next image element intersects the image element just assigned in the abscissa direction;
H. if so, assigning the next image element to the column set containing the image element just assigned; otherwise, assigning the next image element to a new column set; and returning to step G.
Preferably, extracting all image elements in each row-column set as a whole comprises:
obtaining the outer contour of each row-column set;
extracting, according to the outer contour, all image elements in the row-column set as one picture.
Preferably, obtaining the outer contour of each row-column set and extracting, according to the outer contour, all image elements in the row-column set as one picture comprises:
obtaining the bounding rectangle of each row-column set;
capturing all image elements in the row-column set as a whole according to that bounding rectangle.
A system for extracting an image from a PDF file page, comprising:
a position information acquisition module, configured to obtain position information of each image element in the PDF file page;
a set division module, configured to divide all image elements in the page into different sets according to the position information;
an extraction module, configured to extract all image elements in each set as a whole.
Preferably, the position information acquisition module comprises:
a coordinate information acquisition unit, configured to obtain the coordinates of the top-left vertex of each image element in the PDF file page and to record those coordinates as the reference point of that image element.
Preferably, the set division module comprises:
a row set division unit, configured to divide the image elements in the horizontal direction to obtain one or more row sets;
a row-column set division unit, configured to divide the image elements in each row set in the vertical direction to obtain row-column sets.
According to the specific embodiments provided by the present invention, the invention achieves the following technical effect:
The disclosed method for extracting an image from a PDF file page divides the image elements into rows and columns according to their position information in the page and extracts each resulting row-column set as a whole, so that the extracted image is easy to edit and the extraction is efficient.
Description of drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for extracting an image from a PDF file page according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the system for extracting an image from a PDF file page according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The object of the present invention is to provide a method and system for extracting an image from a PDF file page that divide the image elements into a small number of meaningful sets and extract them according to the original picture information in the PDF file.
To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of the method for extracting an image from a PDF file page according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
S101: obtaining position information of each image element in the PDF file page.
Image elements can be of various types. Specifically, the position information of each image element can be recorded as coordinates. Different image elements occupy regions of different sizes. In the present invention, the position of an element can be recorded using its planar coordinates (x, y), where x denotes the abscissa and y denotes the ordinate. The larger the region an element occupies, the larger the coordinate region it covers.
Step S101 may therefore comprise:
obtaining the coordinate information of each image element in the PDF file page.
Specifically, the coordinates of the top-left vertex of each image element in the PDF file page can be obtained and recorded as the reference point of that image element.
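The position record described in step S101 can be sketched as a small data structure. This is only an illustration: the class name `ImageElement`, the `width` and `height` fields, and the choice of an ordinate axis that points down the page are assumptions made for the example, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageElement:
    """Hypothetical record of one image element's position on a page.

    Assumption: the ordinate axis points down the page, so the top-left
    vertex has the smallest x and the smallest y of the element.
    """
    x: float        # abscissa of the top-left vertex
    y: float        # ordinate of the top-left vertex
    width: float    # horizontal extent of the covered region
    height: float   # vertical extent of the covered region

    @property
    def reference_point(self) -> Tuple[float, float]:
        # The top-left vertex is recorded as the element's reference point.
        return (self.x, self.y)

    @property
    def x_range(self) -> Tuple[float, float]:
        # Abscissa range covered by the element.
        return (self.x, self.x + self.width)

    @property
    def y_range(self) -> Tuple[float, float]:
        # Ordinate range covered by the element.
        return (self.y, self.y + self.height)


element = ImageElement(x=10, y=20, width=30, height=40)
print(element.reference_point, element.x_range, element.y_range)
```

Keeping the covered ranges next to the reference point is convenient because the later division steps compare coordinate ranges, not just single points.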
S102: dividing all image elements in the page into different sets according to the position information.
Usually, one picture may comprise multiple image elements (for example, pixels). Because these image elements belong to the same picture, they lie very close together. The purpose of step S102 is to follow the way pictures were originally composed in the PDF file page and to place the image elements that belong to one picture in the same set as far as possible, so that they can be extracted as a whole.
In practice, step S102 may comprise:
dividing the image elements in the horizontal direction to obtain one or more row sets;
dividing the image elements in each row set in the vertical direction to obtain row-column sets.
Specifically, when the position information of each image element is expressed as coordinates, dividing the elements into row sets may comprise the following steps:
A. Sorting all image elements according to the ordinate of the reference point of each image element.
The image elements must be sorted by the coordinate of a point at the same relative position on each element. Specifically, all image elements can be sorted by the ordinate of the top-left point of each element; the ordinate of the top-right, bottom-left or bottom-right point can be used instead. Any of these points can serve as the reference point of the image element.
The purpose of sorting is to place image elements that lie at a similar horizontal level in the same row set. Therefore, if the ordinate axis of the coordinate system points from top to bottom, elements near the top of the page have smaller ordinates than elements near the bottom, and the elements can be sorted in ascending order of ordinate. If the ordinate axis points from bottom to top, elements near the top of the page have larger ordinates than elements near the bottom, and the elements can be sorted in descending order of ordinate.
B. According to the result of sorting by ordinate, assigning the first image element to a first row set.
C. Determining whether the ordinate range of the next image element intersects the ordinate range of the image element just assigned.
D. If so, assigning the next image element to the row set containing the image element just assigned; otherwise, assigning the next image element to a new row set; and returning to step C.
For example, suppose the ordinate range of the image element just assigned is 10 to 100 and the ordinate range of the next image element is 20 to 50. The two ranges clearly intersect, so the next image element is assigned to the row set containing the image element just assigned; that is, the two are considered to lie essentially in the same row.
If the ordinate range of the image element just assigned is 10 to 100 and the ordinate range of the next image element is 200 to 260, the two ranges do not intersect, so the next image element is assigned to a new row set; that is, the two are considered not to belong to the same row. Steps C and D are repeated until all image elements have been assigned.
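Steps A to D can be sketched as follows, using the ordinate ranges from the two examples above. The sketch makes several assumptions that the text does not fix: each element is reduced to a hypothetical `(label, y_min, y_max)` tuple, the ordinate axis points down the page so sorting is ascending, and ranges that merely touch at an endpoint are treated as intersecting.

```python
from typing import List, Tuple

# Hypothetical minimal element: (label, y_min, y_max).
Element = Tuple[str, float, float]


def ranges_intersect(lo1: float, hi1: float, lo2: float, hi2: float) -> bool:
    """Closed-interval overlap test (touching ranges count as intersecting)."""
    return lo1 <= hi2 and lo2 <= hi1


def divide_into_rows(elements: List[Element]) -> List[List[Element]]:
    # Step A: sort by the ordinate of the reference point (ascending,
    # assuming the ordinate axis points down the page).
    ordered = sorted(elements, key=lambda e: e[1])
    rows: List[List[Element]] = []
    previous = None  # the image element that was assigned most recently
    for element in ordered:
        # Step C: compare with the element just assigned.
        if previous is not None and ranges_intersect(
            previous[1], previous[2], element[1], element[2]
        ):
            rows[-1].append(element)   # step D, "yes" branch: same row set
        else:
            rows.append([element])     # step B, or step D "otherwise" branch
        previous = element
    return rows


rows = divide_into_rows([("c", 200, 260), ("a", 10, 100), ("b", 20, 50)])
row_labels = [[label for label, _, _ in row] for row in rows]
print(row_labels)  # [['a', 'b'], ['c']]
```

The comparison is made against the element just assigned, as steps C and D state, rather than against the combined range of the whole row set; comparing against the accumulated range of the row would be a different design and could group elements differently.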
Dividing the image elements in each row set in the vertical direction to obtain row-column sets may specifically comprise the following steps:
E. For each row set, sorting the image elements in the row set according to the abscissa of the reference point of each image element.
The purpose of sorting is to place image elements that lie in a similar vertical column in the same column set. Therefore, if the abscissa axis of the coordinate system points from left to right, elements on the left of the page have smaller abscissas than elements on the right, and the elements can be sorted in ascending order of abscissa. If the abscissa axis points from right to left, elements on the left of the page have larger abscissas than elements on the right, and the elements can be sorted in descending order of abscissa.
F. According to the result of sorting by abscissa, assigning the first image element in the row set to a first column set; the column set is a row-column set with respect to the whole page.
G. Determining whether the next image element intersects the image element just assigned in the abscissa direction.
H. If so, assigning the next image element to the column set containing the image element just assigned; otherwise, assigning the next image element to a new column set; and returning to step G.
For example, suppose the abscissa range of the image element just assigned is 10 to 100 and the abscissa range of the next image element is 20 to 150. The two ranges clearly intersect, so the next image element is assigned to the column set containing the image element just assigned; that is, the two are considered to lie essentially in the same column.
If the abscissa range of the image element just assigned is 10 to 100 and the abscissa range of the next image element is 200 to 260, the two ranges do not intersect, so the next image element is assigned to a new column set; that is, the two are considered not to belong to the same column. Steps G and H are repeated until all image elements in one row set have been assigned; the next row set is then processed, until every row set has been divided.
Note that steps E to H are performed for each row set. Each column set produced within a row set can be regarded as a row-column set with respect to the whole page.
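The two-pass division (rows first, then columns inside each row) can be sketched as one routine. The names `Box`, `chain_groups` and `divide_into_row_column_sets` are hypothetical, and the sketch assumes a top-left reference point, axes pointing right and down, and closed-interval intersection.

```python
from typing import Callable, List, NamedTuple


class Box(NamedTuple):
    """Hypothetical image element: top-left reference point plus extent."""
    label: str
    x: float
    y: float
    width: float
    height: float


def chain_groups(
    boxes: List[Box],
    start: Callable[[Box], float],
    end: Callable[[Box], float],
) -> List[List[Box]]:
    """Sort along one axis, then group each box with the box just assigned
    if their ranges on that axis intersect; otherwise open a new group."""
    groups: List[List[Box]] = []
    for box in sorted(boxes, key=start):
        if groups:
            previous = groups[-1][-1]  # the element assigned most recently
            if start(box) <= end(previous) and start(previous) <= end(box):
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups


def divide_into_row_column_sets(boxes: List[Box]) -> List[List[Box]]:
    result: List[List[Box]] = []
    # Steps A-D: row sets, compared along the ordinate.
    for row in chain_groups(boxes, lambda b: b.y, lambda b: b.y + b.height):
        # Steps E-H: column sets inside each row, compared along the abscissa.
        result.extend(
            chain_groups(row, lambda b: b.x, lambda b: b.x + b.width)
        )
    return result


page = [
    Box("A", 10, 10, 90, 90),     # x 10-100, y 10-100
    Box("B", 20, 20, 130, 30),    # x 20-150, y 20-50
    Box("C", 200, 30, 60, 40),    # x 200-260, y 30-70
    Box("D", 10, 200, 50, 60),    # x 10-60,  y 200-260
]
sets = divide_into_row_column_sets(page)
set_labels = [[b.label for b in s] for s in sets]
print(set_labels)  # [['A', 'B'], ['C'], ['D']]
```

In this sample, A, B and C share a row because their ordinate ranges chain together, but C lands in its own column set because its abscissa range does not intersect that of B.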
S103: extracting all image elements in each set as a whole.
After the division into rows and columns, the elements in each row-column set lie very close together both horizontally and vertically, so they very probably make up the same picture. All image elements in each row-column set can therefore be extracted as a whole.
Specifically, the extraction can be performed as follows:
obtaining the outer contour of each row-column set;
extracting, according to the outer contour, all image elements in the row-column set as one picture.
More specifically, for ease of understanding and operation, the outer contour of each row-column set can be obtained as the bounding rectangle of that set; all image elements in the row-column set are then captured as a whole according to that bounding rectangle.
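Computing the bounding rectangle of one row-column set is simple arithmetic, sketched below. The `(x, y, width, height)` tuple layout and the function name `bounding_rectangle` are assumptions for illustration; the actual capture of the region depends on the PDF rendering library in use and is not shown.

```python
from typing import Iterable, Tuple

# Hypothetical element layout: (x, y, width, height), top-left reference point.
Rect = Tuple[float, float, float, float]


def bounding_rectangle(elements: Iterable[Rect]) -> Rect:
    """Smallest axis-aligned rectangle that covers every element in the set."""
    elements = list(elements)
    if not elements:
        raise ValueError("a row-column set must contain at least one element")
    left = min(x for x, _, _, _ in elements)
    top = min(y for _, y, _, _ in elements)
    right = max(x + w for x, _, w, _ in elements)
    bottom = max(y + h for _, y, _, h in elements)
    return (left, top, right - left, bottom - top)


# Two elements covering x 10-100 / y 10-100 and x 20-150 / y 20-50.
rect = bounding_rectangle([(10, 10, 90, 90), (20, 20, 130, 30)])
print(rect)  # (10, 10, 140, 90)
```

The resulting rectangle is the region that would be captured from the rendered page as a single picture.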
In summary, the disclosed method for extracting an image from a PDF file page divides the image elements into rows and columns according to their position information in the page and extracts each resulting row-column set as a whole, so that the extracted image is easy to edit and the extraction is efficient.
Corresponding to the disclosed method, the present invention also discloses a system for extracting an image from a PDF file page.
Fig. 2 is a structural diagram of the system for extracting an image from a PDF file page according to an embodiment of the present invention. As shown in Fig. 2, the system comprises:
a position information acquisition module 201, configured to obtain position information of each image element in the PDF file page;
a set division module 202, configured to divide all image elements in the page into different sets according to the position information;
an extraction module 203, configured to extract all image elements in each set as a whole.
In practice, the position information acquisition module 201 may comprise:
a coordinate information acquisition unit, configured to obtain the coordinates of the top-left vertex of each image element in the PDF file page and to record those coordinates as the reference point of that image element.
The set division module 202 may comprise:
a row set division unit, configured to divide the image elements in the horizontal direction to obtain one or more row sets;
a row-column set division unit, configured to divide the image elements in each row set in the vertical direction to obtain row-column sets.
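The relationship between modules 201, 202 and 203 can be sketched as three small classes wired into a pipeline. All class and method names are hypothetical, the page is stood in for by a list of dictionaries rather than a real parsed PDF page, and the extraction module returns only bounding rectangles, since the rendering step is outside the scope of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left point


class PositionInfoModule:
    """Stands in for module 201: reads the position of each image element."""

    def acquire(self, page: Sequence[Dict[str, float]]) -> List[Box]:
        return [(e["x"], e["y"], e["width"], e["height"]) for e in page]


class SetDivisionModule:
    """Stands in for module 202: row sets first, then row-column sets."""

    @staticmethod
    def _chain(boxes: List[Box], axis: int) -> List[List[Box]]:
        # axis 0 compares abscissa ranges, axis 1 compares ordinate ranges.
        groups: List[List[Box]] = []
        for box in sorted(boxes, key=lambda b: b[axis]):
            if groups:
                prev = groups[-1][-1]  # element assigned most recently
                if (box[axis] <= prev[axis] + prev[axis + 2]
                        and prev[axis] <= box[axis] + box[axis + 2]):
                    groups[-1].append(box)
                    continue
            groups.append([box])
        return groups

    def divide(self, boxes: List[Box]) -> List[List[Box]]:
        sets: List[List[Box]] = []
        for row in self._chain(boxes, axis=1):       # row set division unit
            sets.extend(self._chain(row, axis=0))    # row-column division unit
        return sets


class ExtractionModule:
    """Stands in for module 203: one bounding rectangle per row-column set."""

    def extract(self, sets: List[List[Box]]) -> List[Box]:
        rects = []
        for s in sets:
            left, top = min(b[0] for b in s), min(b[1] for b in s)
            right = max(b[0] + b[2] for b in s)
            bottom = max(b[1] + b[3] for b in s)
            rects.append((left, top, right - left, bottom - top))
        return rects


def run_pipeline(page: Sequence[Dict[str, float]]) -> List[Box]:
    boxes = PositionInfoModule().acquire(page)
    return ExtractionModule().extract(SetDivisionModule().divide(boxes))


sample_page = [
    {"x": 10, "y": 10, "width": 90, "height": 90},
    {"x": 20, "y": 20, "width": 130, "height": 30},
    {"x": 200, "y": 200, "width": 60, "height": 60},
]
regions = run_pipeline(sample_page)
print(regions)  # [(10, 10, 140, 90), (200, 200, 60, 60)]
```

Keeping the three responsibilities in separate classes mirrors the module split in Fig. 2 and lets the position source or the extraction backend be replaced independently.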
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for parts that are the same or similar the embodiments can be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Specific examples are used herein to explain the principle and embodiments of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for extracting an image from a PDF file page, characterized by comprising:
obtaining position information of each image element in the PDF file page;
dividing all image elements in the page into different sets according to the position information;
extracting all image elements in each set as a whole.
2. The method according to claim 1, characterized in that obtaining the position information of each image element in the PDF file page comprises:
obtaining the coordinates of the top-left vertex of each image element in the PDF file page, and recording those coordinates as the reference point of that image element.
3. The method according to claim 1, characterized in that dividing all image elements in the page into different sets according to the position information comprises:
dividing the image elements in the horizontal direction to obtain one or more row sets;
dividing the image elements in each row set in the vertical direction to obtain row-column sets.
4. The method according to claim 3, characterized in that dividing the image elements in the horizontal direction to obtain one or more row sets comprises:
A. sorting all image elements according to the ordinate of the reference point of each image element;
B. according to the result of sorting by ordinate, assigning the first image element to a first row set;
C. determining whether the ordinate range of the next image element intersects the ordinate range of the image element just assigned;
D. if so, assigning the next image element to the row set containing the image element just assigned; otherwise, assigning the next image element to a new row set; and returning to step C.
5. The method according to claim 3, characterized in that dividing the image elements in each row set in the vertical direction to obtain row-column sets comprises:
E. for each row set, sorting the image elements in the row set according to the abscissa of the reference point of each image element;
F. according to the result of sorting by abscissa, assigning the first image element in the row set to a first column set, the column set being a row-column set with respect to the whole page;
G. determining whether the next image element intersects the image element just assigned in the abscissa direction;
H. if so, assigning the next image element to the column set containing the image element just assigned; otherwise, assigning the next image element to a new column set; and returning to step G.
6. The method according to any one of claims 3 to 5, characterized in that extracting all image elements in each row-column set as a whole comprises:
obtaining the outer contour of each row-column set;
extracting, according to the outer contour, all image elements in the row-column set as one picture.
7. The method according to claim 6, characterized in that obtaining the outer contour of each row-column set and extracting, according to the outer contour, all image elements in the row-column set as one picture comprises:
obtaining the bounding rectangle of each row-column set;
capturing all image elements in the row-column set as a whole according to that bounding rectangle.
8. A system for extracting an image from a PDF file page, characterized by comprising:
a position information acquisition module, configured to obtain position information of each image element in the PDF file page;
a set division module, configured to divide all image elements in the page into different sets according to the position information;
an extraction module, configured to extract all image elements in each set as a whole.
9. The system according to claim 8, characterized in that the position information acquisition module comprises:
a coordinate information acquisition unit, configured to obtain the coordinates of the top-left vertex of each image element in the PDF file page and to record those coordinates as the reference point of that image element.
10. The system according to claim 8, characterized in that the set division module comprises:
a row set division unit, configured to divide the image elements in the horizontal direction to obtain one or more row sets;
a row-column set division unit, configured to divide the image elements in each row set in the vertical direction to obtain row-column sets.
CN201110243119A 2011-08-23 2011-08-23 Method and system for extracting image from portable document format (PDF) file page Pending CN102306294A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201110243119A CN102306294A (en) 2011-08-23 2011-08-23 Method and system for extracting image from portable document format (PDF) file page
PCT/CN2011/084305 WO2013026245A1 (en) 2011-08-23 2011-12-20 Method and system for extracting image from page of pdf file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110243119A CN102306294A (en) 2011-08-23 2011-08-23 Method and system for extracting image from portable document format (PDF) file page

Publications (1)

Publication Number Publication Date
CN102306294A true CN102306294A (en) 2012-01-04

Family

ID=45380154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110243119A Pending CN102306294A (en) 2011-08-23 2011-08-23 Method and system for extracting image from portable document format (PDF) file page

Country Status (2)

Country Link
CN (1) CN102306294A (en)
WO (1) WO2013026245A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302626A (en) * 2015-11-09 2016-02-03 深圳市依伴数字科技有限公司 Analytic method of XPS (XML Paper Specification) structural data
CN105843783A (en) * 2016-03-21 2016-08-10 哈尔滨工程大学 Chinese PDF file text content extraction method oriented to network flow transmission
CN106951400A (en) * 2017-02-06 2017-07-14 北京因果树网络科技有限公司 The information extraction method and device of a kind of pdf document
CN109670461A (en) * 2018-12-24 2019-04-23 广东亿迅科技有限公司 PDF text extraction method, device, computer equipment and storage medium
CN112100978A (en) * 2020-09-16 2020-12-18 掌阅科技股份有限公司 Typesetting processing method based on electronic book, electronic equipment and storage medium
CN112100979A (en) * 2020-09-16 2020-12-18 掌阅科技股份有限公司 Typesetting processing method based on electronic book, electronic equipment and storage medium
CN113011131A (en) * 2021-03-22 2021-06-22 掌阅科技股份有限公司 Typesetting method based on picture electronic book, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10445615B2 (en) * 2017-05-24 2019-10-15 Wipro Limited Method and device for extracting images from portable document format (PDF) documents

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6801673B2 (en) * 2001-10-09 2004-10-05 Hewlett-Packard Development Company, L.P. Section extraction tool for PDF documents
US7162084B2 (en) * 2003-01-29 2007-01-09 Microsoft Corporation System and method for automatically detecting and extracting objects in digital image data

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101937477A (en) * 2009-06-29 2011-01-05 鸿富锦精密工业(深圳)有限公司 Data processing equipment, system and method for realizing figure file fitting
CN101876967A (en) * 2010-03-25 2010-11-03 深圳市万兴软件有限公司 Method for generating PDF text paragraphs
CN101866335A (en) * 2010-06-14 2010-10-20 深圳市万兴软件有限公司 Form processing method and device in document conversion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张伯: "基于PDF文字流的表格识别技术的研究", 《中国优秀硕士学位论文全文数据库》, 1 May 2010 (2010-05-01) *
王津涛等: "PDF文件中可识别图像的提取", 《计算机工程与设计》, vol. 27, no. 9, 16 May 2006 (2006-05-16), pages 1539 - 1541 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302626A (en) * 2015-11-09 2016-02-03 深圳市依伴数字科技有限公司 Analytic method of XPS (XML Paper Specification) structural data
CN105302626B (en) * 2015-11-09 2021-07-23 深圳市巨鼎医疗股份有限公司 Analytic method of XPS (XPS) structured data
CN105843783A (en) * 2016-03-21 2016-08-10 哈尔滨工程大学 Chinese PDF file text content extraction method oriented to network flow transmission
CN106951400A (en) * 2017-02-06 2017-07-14 北京因果树网络科技有限公司 The information extraction method and device of a kind of pdf document
CN109670461A (en) * 2018-12-24 2019-04-23 广东亿迅科技有限公司 PDF text extraction method, device, computer equipment and storage medium
CN112100978A (en) * 2020-09-16 2020-12-18 掌阅科技股份有限公司 Typesetting processing method based on electronic book, electronic equipment and storage medium
CN112100979A (en) * 2020-09-16 2020-12-18 掌阅科技股份有限公司 Typesetting processing method based on electronic book, electronic equipment and storage medium
CN113011131A (en) * 2021-03-22 2021-06-22 掌阅科技股份有限公司 Typesetting method based on picture electronic book, electronic equipment and storage medium
CN113011131B (en) * 2021-03-22 2022-02-22 掌阅科技股份有限公司 Typesetting method based on picture electronic book, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2013026245A1 (en) 2013-02-28

Similar Documents

Publication Publication Date Title
CN102306294A (en) Method and system for extracting image from portable document format (PDF) file page
CN108470021A (en) The localization method and device of table in PDF document
CN101719335B (en) Grid picture electronic map for geographic information system
CN103929599B (en) Digital video image real-time zooming method based on FPGA
WO2016150052A1 (en) Method and system for utilizing image to generate link
CN101859322B (en) Webpage display method for mobile terminal
CN1853157A (en) Improved presentation of large objects on small displays
CN101976114B (en) System and method for realizing information interaction between computer and pen and paper based on camera
CN107092684A (en) Image processing method and device, storage medium
CN104516891A (en) Layout analyzing method and system
CN102156865A (en) Handwritten text line character segmentation method and identification method
CN102902535A (en) Picture self-adaption method, system and terminal equipment
CN104281864A (en) Method and equipment for generating two-dimensional codes
CN102157003A (en) Automatic configuration method for annotation label of map under digital environment
CN102254312A (en) Method for splicing geographical tile graphs
US10395155B2 (en) Billboard containing encoded information
CN112668289A (en) Extraction method and device of nested table and storage medium
CN103440239A (en) Functional region recognition-based webpage segmentation method and device
CN109543525B (en) Table extraction method for general table image
CN101847165A (en) Layout drawing method and device of memory part
CN102662962B (en) Dynamic display method based on webpage elements
CN103915035B (en) A kind of data transfer device and DTU
CN100517299C (en) Typesetting method for implementing multiple alignment in word rows
CN103207875A (en) Map data processing method and device
US20180189952A1 (en) Method for processing the lef diagram of a layout

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120104