CN102841941A - Index-based format returnable file establishing and drawing method - Google Patents

Info

Publication number
CN102841941A
Authority
CN
China
Prior art keywords
reflux
enclosing region
index
file
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102990882A
Other languages
Chinese (zh)
Other versions
CN102841941B (en)
Inventor
龚如宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201210299088.2A
Publication of CN102841941A
Application granted
Publication of CN102841941B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention relates to an index-based method for establishing and drawing reflowable fixed-layout files. The logical structure and reading order of a fixed-layout file are described with indexes, and each index is itself described. The pixels contained in each reflowable object need not be recorded; only the enclosing boundary of each reflowable object in the fixed-layout file is recorded, which substantially improves the compression rate and the network transmission rate. In addition, by recording the row or column alignment parameter to which each reflowable text object belongs, the coordinate value of the next enclosing region in a row can be predicted from the coordinate value of the preceding one, and compressing the computed difference values further improves compression. As for display style, the method offers a richer set of styles: it can dynamically re-lay out each reflowable text object according to the indentation mode of the original layout, and it can dynamically rearrange the floating alignment of text and illustrations in the original layout.

Description

Index-based method for establishing and drawing reflowable fixed-layout files
Technical field
The present invention relates to computer document information structure technology, and in particular to an index-based method for establishing and drawing reflowable fixed-layout files.
Background art
Many digital books currently exist as fixed-layout files and image files, in formats such as PDF, TIFF, PNG and CEB. Such electronic books are suited mainly to reading on large screens and to printing. Fixed-layout digital books are, however, poorly suited to reading on terminals or media with different screen or window sizes. For example, when an A4-layout book is read on a small-screen terminal, the lack of line/column reflow means the page must be shrunk to the screen size before a complete line or column can be shown, and at that size the text becomes too small to read. Alternatively, the reader must keep scrolling the page to read each line or column in full.
To support reading fixed-layout digital books on terminals or media of different sizes, the logical structure of the page layout must be analysed and understood, and the reading priority order among the logical regions must be determined. The reading program displays the logical regions in reading priority order and processes each region according to the screen size and the region type. For example, picture regions and line-drawing regions are enlarged or reduced, whereas text paragraph regions are reflowed, so as to give the user a better reading experience.
Current screen-adaptation techniques apply mainly to digital books that consist chiefly of text. Text paragraphs in HTML, TXT and EPUB files, for example, wrap automatically according to the width of the reading window and are read in reflowable form in reading software. Fixed-layout digital books, and scanned ones in particular, first require layout logical-structure analysis to separate text regions, picture regions and line-drawing regions. The text regions are then cut into characters or words, producing a sub-image for each. Finally, the logical layout of the book is expressed in HTML, XHTML or XML, so that reading it in a browser gives a reflow effect similar to that of text formats such as TXT, HTML and Word files. The drawback of this approach is that a great deal of storage is needed to hold the many sub-images, and that reading over a network suffers because large numbers of character or word images are embedded directly in the HTML or XHTML files.
The relevant technical literature mainly comprises the following five documents:
1. Non-patent document 1: T. M. Breuel, W. C. Janssen, K. Popat and H. S. Baird, "Paper to PDA", International Conference on Pattern Recognition (ICPR), 2002;
2. Patent document 2: Chinese patent application No. 200710123338.6, "Method and system for representing layout file logical structure information";
3. Patent document 3: Japanese Patent Laid-Open No. 2006-350867, "Document processing device, document processing method, program and recording medium";
4. Patent document 4: US Patent Publication No. 2007/0234203, "GENERATING IMAGE-BASED REFLOWABLE FILES FOR RENDERING ON VARIOUS SIZED DISPLAYS";
5. Patent document 5: US Patent Publication No. 2007/0237428 A1, "EFFICIENT PROCESSING OF NON-REFLOW CONTENT IN A DIGITAL IMAGE".
In non-patent document 1, Thomas M. Breuel et al. of the Xerox Palo Alto Research Center proposed dynamic page layout in which the reflowable image regions are rearranged according to the size of the display screen or window. A reflowable image region here may be the image region corresponding to a single character, a word, a picture, a table, or a figure.
These small image elements are associated with layout control information so that each element can be displayed according to the logical structure and reading order of the layout. The advantage is that, because standard HTML and the like are used to represent the digital book images, cross-platform reading is possible. The drawback is that the large number of sub-images required wastes storage space and hampers smooth reading over a network.
Non-patent document 1 also mentions indexing the corresponding image regions of the original fixed-layout image by recording the rectangular frames of the reflowable regions, thereby avoiding storing many small reflowable image regions directly. The present technique differs from non-patent document 1 in that it does not simply record the rectangular frame of each reflowable region; instead it indexes the reflowable regions of the original fixed-layout image by recording the baseline or mean line of the text line to which each reflowable text region belongs, together with the bounding box of the region. This further raises the compression rate of the description information for text image regions.
Patent document 2 assigns a number to each content reference subsequence. The logical block description uses this number to index each content reference subsequence, and the starting position and length of the subsequence within the content reference sequence are used to obtain the basic drawing information. The layout rearrangement system reads in the logical block description file and the content reference subsequence file to carry out screen-adaptive rearrangement.
The advantage of this method is that, by indexing content reference subsequences by number and then looking up the starting position and sequence length in the content reference sequence to obtain the basic drawing information, the logical structure information of a layout file can be represented effectively and flexibly without modifying the original layout file. Nor does each basic drawing element sequence need to be stored separately, which saves storage space.
The drawback of this method is that, because a content reference sequence must be generated and then indexed by offset and length, it is better suited to text-format layout files such as PDF and CEB files. For image-format layout files, such as scanned TIFF, PNG and scanned PDF files, the baseline parameters differ from line to line, so generating a content reference sequence and dividing it into subsequences by offset is unsuitable. The present invention instead uses the boundary information of the reflowable regions directly to index sub-regions of the original page image, so it can handle not only text-format layout files such as PDF but also image-format layout files such as PNG and TIFF.
Patent document 3 proposes using XML to represent the logical structure and reading priority order of a document. By recording the position and size of each logical region in the fixed-layout document, each logical region of the original document can be indexed. That invention does not, however, support reflowable reading of text paragraph regions, and each logical region is recorded and represented as a single image. Building on patent document 3, the present invention processes text paragraph regions and uses a hierarchical index technique to support reflowable display of the text within them.
Patent document 4 proposes converting fixed-layout digital books by recording the position, size and shape of the bounding box of each reflowable object in the book image, the baseline of the reflowable object, and so on. The present invention differs from patent document 4 in that it does not record a baseline for each reflowable object; instead, for each text line of the original book image, it accurately extracts the baseline of the line and records the baseline parameters line by line. For curved text lines, the invention further proposes approximating the baseline with a parametric curve or a multi-segment polyline and recording only the parameters of the polyline or the parametric curve. This reduces the amount of recorded data and improves the compression rate.
Patent document 5 introduces the notions of reflowable and non-reflowable objects, proposes how to obtain them through layout analysis and layout understanding, and processes the two kinds of object differently so that they can be displayed on media of different sizes. On this basis, the present invention proposes an index-based layout logical-structure description and compression method which makes the representation more convenient while also improving compression performance.
Summary of the invention
The present invention addresses the large storage and data volume of current screen-adaptation techniques for layout files, and proposes an index-based method for establishing and drawing reflowable fixed-layout files. The logical structure of the digital book image is described by means of indexes, and each index is itself described. It is not necessary to record every pixel contained in a reflowable object; only the coordinates of the boundary of each reflowable object in the digital book image need be recorded, which significantly improves the compression rate and the network transmission rate.
The technical scheme of the present invention is an index-based method for establishing reflowable fixed-layout files, comprising the following steps:
1) an index-based reflowable fixed-layout file storage device is set up, and the reflowable-file conversion server reads in a layout file;
2) the reflowable-file conversion server calculates an enclosing region for each reflowable object and each non-reflowable object in the layout file;
3) for reflowable objects in the same row or the same column of the layout file, the alignment line of those objects is calculated;
4) the coordinate position of each enclosing region in the layout file is calculated, together with the position of each reflowable region relative to the alignment line of the row or column to which it belongs;
5) the size of each enclosing region is calculated;
6) an index is established for each enclosing region; for each index, the position and size of the corresponding enclosing region and the position of each reflowable enclosing region relative to its alignment line are recorded, and indexes are used to represent the page logical structure of the layout file, the reading priority order and arrangement relations among the enclosing regions, and the indentation information of reflowable text enclosing regions;
7) the reflowable-file conversion server generates the index-based reflowable fixed-layout file and stores it in the index-based reflowable fixed-layout file storage device; the conversion server and the storage device exchange data with each other, and a reading system reads the index-based reflowable fixed-layout file from the storage device or from the conversion server for display.
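The record kept for each index in step 6) can be pictured with a small data structure. The following is only a sketch under stated assumptions: the class name `RegionIndex`, its field names and the helper `crop_box` are hypothetical and are not defined by the patent, which leaves the concrete record layout open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionIndex:
    """One index entry for an enclosing region (hypothetical field names)."""
    region_id: int
    reflowable: bool
    x: int                # left edge in the original page image
    y: int                # top edge in the original page image
    width: int
    height: int
    reading_rank: int     # position in the reading priority order
    line_id: Optional[int] = None          # row/column alignment line, if reflowable
    baseline_offset: Optional[int] = None  # offset relative to that alignment line


def crop_box(entry: RegionIndex) -> tuple:
    """Return (left, top, right, bottom) of the region in the original page image.

    The index stores only this boundary, not the pixels themselves; a reader
    would use the box to cut the region out of the original page on demand.
    """
    return (entry.x, entry.y, entry.x + entry.width, entry.y + entry.height)


word = RegionIndex(1, True, 10, 20, 30, 12, reading_rank=0,
                   line_id=3, baseline_offset=-2)
print(crop_box(word))  # (10, 20, 40, 32)
```

Because each entry refers back to the original page image by coordinates, no per-word sub-image has to be stored or transmitted.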
In the index-based reflowable fixed-layout file establishing method, the layout file of step 1) may be generated by an imaging device such as a scanner, generated by conversion through a program, or obtained or generated by calling a module.
In the index-based reflowable fixed-layout file establishing method, the enclosing region of step 2) may be a rectangle, circle, curve, ellipse, triangle, or any other definable geometric shape.
In the index-based reflowable fixed-layout file establishing method, the alignment line of the reflowable objects in the same row or column in step 3) may be the baseline of the text-type reflowable objects, their mean line, or the upper or lower edge line of the text-type reflowable objects in a row; it may be the left or right edge line of the text-type reflowable objects in a column; and it may also be a parametric curve.
In the index-based reflowable fixed-layout file establishing method, the size of each enclosing region calculated in step 5) may be defined by a mathematical model.
In the index-based reflowable fixed-layout file establishing method, the coordinate position of each enclosing region in the layout file calculated in step 4) may be expressed in absolute coordinates; alternatively, based on the linguistic properties or the size statistics of the enclosing regions within a row or column, the difference between a predicted coordinate value and the actual coordinate value is calculated, and a coding method with a high compression rate is selected to compress the resulting sequence of difference values, thereby defining the positions of the enclosing regions.
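The prediction-and-difference idea can be illustrated with a short sketch. It assumes a very simple predictor (the next region starts where the previous one ends, plus a typical gap); the predictor, the `gap` parameter and the function names are hypothetical, since the patent does not fix a particular prediction rule or entropy coder.

```python
def predict_next_x(prev_x, prev_width, gap):
    """Predict the left edge of the next region from the previous region."""
    return prev_x + prev_width + gap


def encode_row(boxes, gap):
    """Turn (x, width) pairs of one row into small residuals.

    The first x is kept as an absolute value; every later x is replaced by
    (actual x - predicted x), which is usually close to zero and therefore
    cheap to compress with any entropy coder.
    """
    residuals = []
    prev = None
    for x, width in boxes:
        if prev is None:
            residuals.append(x)
        else:
            residuals.append(x - predict_next_x(prev[0], prev[1], gap))
        prev = (x, width)
    return residuals


def decode_row(residuals, widths, gap):
    """Rebuild the absolute x coordinates from residuals and region widths."""
    xs = []
    for i, r in enumerate(residuals):
        if i == 0:
            xs.append(r)
        else:
            xs.append(predict_next_x(xs[-1], widths[i - 1], gap) + r)
    return xs


row = [(10, 20), (34, 18), (57, 22)]
codes = encode_row(row, gap=4)
print(codes)                                             # [10, 0, 1]
print(decode_row(codes, [w for _, w in row], gap=4))     # [10, 34, 57]
```

The better the predictor matches the statistics of the row (for example, typical word spacing in the language of the text), the smaller the residuals and the higher the achievable compression.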
In the index-based reflowable fixed-layout file establishing method, the reading priority order among the enclosing regions in step 6) means the inter-region reading priority determined according to the reading order conventional for the original layout, and the arrangement relation among the enclosing regions means the floating alignment relation between reflowable regions and non-reflowable regions.
An index-based method for drawing reflowable fixed-layout files, which draws an indexed reflowable fixed-layout file, comprises the following steps:
1) the indexed reflowable file is read; the indexes in the file comprise the size and position of each enclosing region, the position of each reflowable enclosing region relative to its alignment line, the layout logical structure, and the reading priority order and arrangement relations among the enclosing regions;
2) the size, shape and zoom parameters of the output medium on which the client system lays out the page are obtained;
3) the initial display position is determined, and a drawing mode is selected according to the type of the enclosing region to be drawn; non-reflowable objects are drawn enlarged or reduced;
4) for reflowable objects laid out in rows, the line spacing is determined, and the number of enclosing regions shown in each row is determined from the row width and the sizes of the enclosing regions to be displayed; if a row must be indented, the indentation width is subtracted from the display width to obtain the actual width, and the number of enclosing regions in the row is determined from that actual width; if an illustration is to be shown within the row, the width of the illustration is subtracted from the display width to obtain the actual width, and the number of enclosing regions in the row is determined from that actual width. For reflowable objects laid out in columns, the column spacing is determined, and the number of enclosing regions shown in each column is determined from the column height and the sizes of the enclosing regions to be displayed; if a column must be indented, the indentation is subtracted from the display height to obtain the actual height, and the number of enclosing regions in the column is determined from that actual height; if an illustration is to be shown within the column, the height of the illustration is subtracted from the display height to obtain the actual height, and the number of enclosing regions in the column is determined from that actual height;
5) the horizontal coordinate of each enclosing region within a row, or the vertical coordinate of each enclosing region within a column, is determined;
6) the rotation angle and offset of each enclosing region relative to its reference line are determined;
7) the reflowable file is drawn according to the calculated coordinate position, rotation angle, offset and scaling of each enclosing region.
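Step 4) of the drawing method amounts to a line-filling computation. The sketch below shows one possible greedy version for rows only; it assumes that indentation applies to the first row of a paragraph and that an illustration floats beside every row, neither of which is prescribed by the patent. The function and parameter names are hypothetical.

```python
def fill_rows(widths, display_width, first_indent=0, illustration_width=0, gap=0):
    """Greedily assign enclosing regions (given by their widths) to rows.

    The usable width of a row is the display width minus any indentation
    and minus the width of an illustration shown beside the text.
    A region wider than the usable width still gets a row of its own.
    """
    rows, current, used = [], [], 0
    for w in widths:
        indent = first_indent if not rows else 0       # assumed: first row only
        available = display_width - indent - illustration_width
        needed = w if not current else used + gap + w
        if current and needed > available:
            rows.append(current)                       # close the full row
            current, used = [w], w                     # start the next row
        else:
            current.append(w)
            used = needed
    if current:
        rows.append(current)
    return rows


rows = fill_rows([30, 30, 30, 30, 30], display_width=100, first_indent=20, gap=5)
print([len(r) for r in rows])  # [2, 3]
```

Column layout works the same way with heights in place of widths. Once the rows are known, step 5) follows by accumulating widths and gaps from the row start to obtain each region's horizontal coordinate.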
The beneficial effects of the present invention are as follows. In the index-based method for establishing and drawing reflowable fixed-layout files, indexes are used to represent the logical structure of the layout file and the reading priority order among the enclosing regions, and the indexes themselves are described, which yields a layout logical-structure description method with a high compression rate. By recording the baseline, mean line and so on of each text row and calculating the coordinate differences between the boundaries of adjacent text-type reflowable regions, a recording method with a high compression rate is obtained. By selecting the floating-alignment display style between text-type reflowable regions and image/figure regions, the adaptive drawing method can realise different floating alignment effects; by selecting the indentation mode of text-type reflowable regions, it can provide different indentation effects.
Brief description of the drawings
Fig. 1 is a typical schematic diagram of an implementation of the index-based reflowable fixed-layout file establishing and drawing method of the present invention;
Fig. 2 is a schematic block diagram of the computer architecture of the reflowable-file conversion server of the present invention;
Fig. 3 is a schematic block diagram of the client system architecture of the present invention;
Fig. 4 is a flowchart of converting a layout file page into an index-based reflowable fixed-layout file according to the present invention;
Fig. 5 is an example of the description of an index-based reflowable fixed-layout file according to the present invention;
Fig. 6 is an example of reflowable objects according to the present invention;
Fig. 7 is a flowchart of displaying an index-based reflowable fixed-layout file according to the present invention;
Fig. 8 is a schematic diagram of a row baseline and enclosing regions containing character images according to the present invention;
Fig. 9 is a schematic diagram of an English row baseline and enclosing regions according to the present invention;
Fig. 10 is a schematic diagram of the row baseline and enclosing regions of Chinese character bounding rectangles according to the present invention;
Fig. 11 is an example of approximating the baseline and mean line of a curved row with a parametric curve according to the present invention;
Fig. 12 is an example of piecewise approximation of a row baseline with a polyline made up of several straight segments according to the present invention;
Fig. 13 is an example of the column alignment lines of a vertically typeset document according to the present invention;
Fig. 14 is an example of mixed floating alignment of text-type reflowable objects and image-type non-reflowable objects according to the present invention;
Fig. 15 is an example of the row alignment line of bounding rectangles and the alignment reference line according to the present invention;
Fig. 16 is an example of fitting the alignment line of enclosing regions with a straight line according to the present invention;
Fig. 17 is a schematic block diagram of the internal structure of a device with a display according to the present invention.
Embodiments
A layout file contains both reflowable objects, such as text regions, and non-reflowable object regions, such as charts. Enclosing regions such as characters or words within text paragraphs are treated as reflowable object regions, whereas regions such as tables, figures and dividing lines are non-reflowable object regions. An index-based reflowable fixed-layout file allows the reflowable objects in a text paragraph to be drawn with line breaks chosen according to the size of the display screen, the display window or the print medium. Non-reflowable object regions such as charts are reduced or enlarged proportionally according to the screen size and displayed according to their reading order in the page. The technical scheme of the invention is described below with reference to specific embodiments.
Fig. 1 shows a typical computer connection diagram for implementing the index-based reflowable fixed-layout file establishing and drawing method, providing a typical computing environment in which the invention can be realised. It comprises a reflowable-file conversion server 100 and a client system 102, which communicate with each other over a network 106. The network here includes any network over which data can be exchanged, such as a local area network or a wide area network. The reflowable-file conversion server 100 also exchanges data with the index-based reflowable fixed-layout file library 109. In this form of implementation, the client system 102 can receive one or more reflowable files from the conversion server 100 over the network 106 and draw them on an output medium, for example drawing the index-based reflowable fixed-layout file on a display device or on a print medium through a reading program (such as a web browser) running in the client system.
Fig. 2 is a schematic block diagram of the computer architecture of the reflowable-file conversion server. The conversion server 100 is connected to the network 106 through a network interface 200, through which it can transmit data, control signals, data requests and so on; for example, it can send index-based reflowable fixed-layout file data to the network 106 through the network interface 200. The conversion server 100 further comprises a processor 201, a memory 202, a media drive 205 (for reading and writing discs) and an input/output interface 206, all interconnected by a bus 208. Input devices 207 include video cameras, scanners, cameras, copiers, scanning wands and the like. The input device 207 is connected to the I/O interface 206, which is also connected to a display adapter 203 so that data in the conversion server can be shown on a display device 204. The I/O interface 206 may additionally be connected to a print adapter for drawing reflowable files on print media, and to external devices such as a keyboard, mouse, pen, touch screen or other equipment for receiving user input. The processor 201 executes the programs in the memory 202; program execution may also be carried out by hardware such as an FPGA, ASIC or DSP. The memory 202 may also hold digital book image files and index-based reflowable fixed-layout files that have already been analysed.
The memory 202 generally comprises RAM, ROM and permanent storage. It stores an operating system 209 that controls the operation of the conversion server; the operating system 209 may be UNIX, Linux, Windows or a similar system. The memory 202 also contains optical character recognition (OCR) related software 211 for layout analysis, layout understanding and the like, which may be commercial or non-commercial. The reflowable-document generator 210 in the memory 202 comprises the programs and data that process digital book layout files received from the network interface and elsewhere, generate reflowable files, and send them over the bus 208 to the index-based reflowable fixed-layout file library 109. The reflowable-document generator 210 can also transmit the text logical-structure information and the description information of the enclosing regions to other computing devices or networks through the network interface 200.
Fig. 3 is a schematic block diagram of the client system architecture. The client system 102 comprises a processor 302, a memory 303, a display adapter 304 connected to a display device 305, a computer-readable media drive 306, an input/output interface 307, an input device 308 and a network interface 309.
The memory 303 stores an operating system 311 and a reading program 312, such as a web browser or a dedicated reading program. The processor 302 works with the display adapter 304 and the reading program 312: according to the size of the display device 305, it obtains the index-based reflowable fixed-layout file from the network interface 309 over the bus 310, or from the reflowable file library 109, and draws it adaptively on the display or in the display window.
Fig. 4 is a flowchart of converting a layout file page into an index-based reflowable fixed-layout file. It is a typical implementation of the reflowable-document generator 210, used to convert a fixed-layout digital book document into an image-based reflowable file.
In step 401 the method first reads in the fixed-layout digital book document. There is no particular restriction on its format, which may be, for example, JPEG, TIFF, GIF, BMP, PDF or CEB. The document may be generated by an imaging device such as a scanner, generated by conversion through a program, or generated or obtained by calling a third-party module.
In step 402 the reflowable-document generator 210 obtains the reflowable and non-reflowable regions of the fixed-layout digital book page through methods such as layout analysis and layout understanding, and calculates an enclosing region for each region object in the document. The shape of the enclosing region surrounding each reflowable object is likewise unrestricted: it may be a rectangle, circle, curve, ellipse, triangle, a more complex polygon, and so on.
In step 403, for the text-type reflowable objects in the same row of the original digital book document, the reflowable-document generator 210 calculates the common alignment line of those objects. The alignment line of each row of text may be the baseline of the row (as shown in Fig. 8, Fig. 9 and Fig. 10), the mean line of the row, the upper edge line of the row, and so on. If the text-type reflowable objects of a row in the original document do not lie on one straight line, the alignment line may instead be a parametric curve (as shown in Fig. 11) or a polyline made up of several line segments (as shown in Fig. 12). Rows are used here as the example; for text arranged in columns in the original document, a column alignment line must be calculated, and the line aligned with the left side or the right side of the characters in the column (see Fig. 13) can serve as the column alignment line.
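One simple way to obtain a straight alignment line for a row is an ordinary least-squares fit through one reference point per enclosing region, for example the bottom-centre of each bounding rectangle. The sketch below assumes exactly that; the choice of reference point, the fitting method and the function names are illustrative assumptions rather than the procedure specified by the patent.

```python
def fit_baseline(points):
    """Least-squares line y = slope * x + intercept through (x, y) points.

    `points` holds one reference point per enclosing region in a row,
    e.g. the bottom-centre of each bounding rectangle.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:                      # single point or vertical stack: flat line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def offsets_from_baseline(points, slope, intercept):
    """Signed vertical offset of each reference point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in points]


row_points = [(0, 100), (10, 101), (20, 102)]   # a slightly skewed scanned row
slope, intercept = fit_baseline(row_points)
offsets = offsets_from_baseline(row_points, slope, intercept)
```

Only the two line parameters are stored per row, plus a small offset per region. Characters with descenders would pull a naive fit downwards, so a practical implementation would likely need a more robust estimator or per-character handling.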
In step 404 the reflowable-document generator 210 calculates the coordinate position of each enclosing region in the original digital book document; if the enclosing region is reflowable, its position relative to its alignment line is calculated as well.
In step 405 the size of each enclosing region is calculated. The reflowable-document generator 210 can use different mathematical models to do so, for example defining the size by the length and width of the bounding rectangle, or expressing it through the geometric distances of some other circumscribed polygon. The position of an enclosing region can be expressed in absolute coordinates, or in coordinates relative to the preceding enclosing region in the same row or column. See the explanation of Fig. 8, Fig. 9, Fig. 10, Fig. 11 and Fig. 12 below for details.
In step 406, the reflowable document generator 210 records the logical structure of the fixed-layout document and creates an index for each enclosing region. For each index, the position and size of the corresponding enclosing region are recorded; for a reflowable enclosing region, its position relative to its alignment line is recorded as well. The precedence relationships that follow from the reading order of the enclosing regions are also recorded; they can be obtained by methods such as layout analysis and layout understanding. Arrangement relationships between enclosing regions, such as the float alignment between a reflowable text region and a non-reflowable image/graphic region, can likewise be recorded in an index-based manner; a concrete example is given in the description of Fig. 5. Indentation of reflowable regions can also be described by index, in the same way that text indentation is normally described, except that the text is replaced by the corresponding indexes. The syntax and description language of the record are not limited here. For example, the logical structure of the fixed-layout document and the associated index information may be recorded in XML, or in the description language of PDF. Concrete examples are given in Fig. 5 and Fig. 6.
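As a non-limiting sketch of the record produced in step 406, the following Python fragment serializes one index per enclosing region to XML. The element name region, the attribute names and the dictionary keys are assumptions made for this example; the original document does not prescribe a schema.

```python
import xml.etree.ElementTree as ET

def build_index_xml(regions):
    """Serialize enclosing-region records to XML, one <region> per index.

    Element and attribute names are hypothetical; the description only
    states that XML (or the description language of PDF) may be used.
    """
    root = ET.Element("layout")
    for index, region in enumerate(regions):
        attrs = {
            "index": str(index),
            "reflowable": "true" if region["reflowable"] else "false",
            "x": str(region["x"]),
            "y": str(region["y"]),
            "width": str(region["width"]),
            "height": str(region["height"]),
        }
        # Only reflowable regions carry a position relative to their alignment line.
        if region["reflowable"]:
            attrs["lineOffset"] = str(region["line_offset"])
        ET.SubElement(root, "region", attrs)
    return ET.tostring(root, encoding="unicode")

regions = [
    {"reflowable": True, "x": 40, "y": 100, "width": 18, "height": 24, "line_offset": -2},
    {"reflowable": False, "x": 0, "y": 140, "width": 600, "height": 3},
]
xml_text = build_index_xml(regions)
```

In this sketch the document order of the region elements doubles as an implicit reading order, which is one of the options the description allows.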
Fig. 5 shows an embodiment of the index-based format reflowable file. 501 is the original layout file, and 502 points to the title region. 504 describes the logical structure of the original layout file in XML; within each part, such as the title part, the indexes are arranged in reading order, and in 503 the title is represented by an index. The content of an index may itself be described in a language such as XML. Because the reading order of every reflowable and non-reflowable region must be specified, layout analysis and layout understanding are performed using knowledge of the layout, the regions are topologically sorted according to the specified reading order, and the reading precedence between regions is represented with a structure such as a directed acyclic graph (DAG); the reading order may also be recorded in a file format such as XML. In Fig. 5, 505 is a separator line, which is a non-reflowable object; depending on the size of the output medium, the separator may be displayed or omitted. 506, 507, 508, 509 and 510 indicate the reading order between the regions. The directed acyclic graph records the reading order between regions, and the indexes further specify the reading order of the reflowable character symbols within a text paragraph, for example left to right or right to left, so that the drawing order of every symbol is uniquely determined on output media of different sizes. The drawing order may be recorded explicitly, or implied by the traversal order in the markup language (such as XML).
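The topological ordering of regions described above could, for example, be obtained with the graphlib module of the Python standard library (Python 3.9 or later). This is a sketch under assumptions: the region names and the precedence graph are invented for the example and do not correspond to the reference numerals of Fig. 5.

```python
from graphlib import TopologicalSorter

def reading_order(precedes):
    """Return one region order consistent with the reading-precedence DAG.

    `precedes` maps each region name to the regions that must be read
    before it. A cycle in the graph raises graphlib.CycleError.
    """
    return list(TopologicalSorter(precedes).static_order())

# Hypothetical page: a title, then two columns with a figure between them.
precedes = {
    "title": [],
    "left_column": ["title"],
    "figure": ["title"],
    "right_column": ["left_column", "figure"],
}
order = reading_order(precedes)
```

A DAG only fixes a partial order, so regions that are not related by precedence (here left_column and figure) may come out in either order; a complete file would add further edges or a tie-breaking rule if a unique drawing order is required.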
In addition to specifying the reading order of the enclosing regions, particular layout formats between enclosing regions must also be recorded. For example, the reflowable primitives of an indented text paragraph must also be laid out with that indentation on output media of different sizes, so the indentation style of the text region is recorded. Left-aligned, center-aligned and right-aligned styles are recorded as well.
For a layout in which a text paragraph floats alongside an image/graphic, as in the embodiment of Fig. 14 that mixes text-type reflowable objects with an image-type non-reflowable object, the float alignment between the reflowable text regions of the paragraph and the non-reflowable image/graphic region must be specified so that the corresponding float effect can be reproduced on output media of different sizes. In practice this can be recorded with something like the align attribute of HTML:
<p>
<img src="rainbow.jpg" alt="photo" align="right">
The image region floats to the right and the reflowable text region is aligned to the left.
<br>
The reflowable text region flows around the image region from the left.
</p>
In the present invention, however, the text itself is not embedded in the markup; instead, the indexes of the reflowable text regions are used to represent the text-type reflowable regions of the corresponding paragraph.
The layout style may be recorded directly within the expression of the layout's logical structure, or recorded separately using methods such as Cascading Style Sheets (CSS) or XSLT (Extensible Stylesheet Language Transformations). When XSLT or CSS is used to record the style, the layout style is of course still described in terms of indexes.
Fig. 6 shows an embodiment of a reflowable object. In 601, each index refers to the coordinate data of an enclosing region, and through this data the pixels contained in the reflowable object can be obtained. 602 records the boundary contour of each enclosing region; if the enclosing region is a quadrilateral, it can be represented by its four vertices, as shown in 602. 502 is one part of the original digital book image, namely the title region. Through the index-based description of the document's logical structure together with the description of the index contents, the original digital book image can be cut into many small reflowable object images. The pixels of the corresponding reflowable region in the original page image can be obtained through the index alone, so that the reflowable objects can be re-typeset adaptively on terminals with different screen sizes. The representation in Fig. 6 may also be expressed in XML or in the description language defined by PDF. Fig. 6 does not explicitly specify the reading order of the indexes; it is implied by the order of traversal.
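The idea that an index yields the pixels of its region without storing them separately can be illustrated with a small Python sketch. It assumes, for simplicity, that the page image is a list of pixel rows and that each enclosing region is an axis-aligned rectangle (x, y, width, height) rather than a general quadrilateral; crop_region and index_table are hypothetical names.

```python
def crop_region(page, rect):
    """Return the pixels of `page` (a list of rows) inside rect = (x, y, width, height)."""
    x, y, width, height = rect
    return [row[x:x + width] for row in page[y:y + height]]

# A tiny 4-row, 6-column "page" whose pixel values encode their own
# coordinates as 10*y + x, so the crop is easy to check by eye.
page = [[10 * y + x for x in range(6)] for y in range(4)]

# The index table stores only region boundaries, never pixel data.
index_table = {0: (1, 0, 3, 2), 1: (0, 2, 6, 2)}
title_pixels = crop_region(page, index_table[0])
```

Only the four numbers per index need to be stored or transmitted in addition to the page image, which is the source of the compression benefit described in the abstract.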
Fig. 7 shows the flow chart for displaying the index-based format reflowable document. To typeset the digital book dynamically for different screen sizes, the index-based logical structure of the layout file must be converted into a layout that is easy to read. If the logical structure is described in a language such as XML, style descriptions such as XSLT or CSS can be used to convert the XML description into a suitable new layout format. For example, XSLT can convert the XML into a format such as HTML that is convenient to read and use.
Concepts such as the alignment line and the reference line are explained below with reference to Fig. 8, Fig. 9, Fig. 10, Fig. 11, Fig. 12, Fig. 13, Fig. 15 and Fig. 16.
Fig. 8 is a schematic diagram of a row baseline and the enclosing regions that contain character images. For a received fixed-layout digital book document, the layout analysis and layout understanding programs in 211 can calculate the row alignment line, for example the row baseline 801 in the figure. The rectangular reflowable enclosing regions 802 and 803, which differ in size, can also be calculated in relation to the row baseline 801.
Fig. 9 is a schematic diagram of the baseline and enclosing regions of a row of English text, in which 900 is the baseline and 901, 902, 903, 904 and 905 are enclosing rectangles. To reduce the cost of recording the positions and sizes of the enclosing regions, differential pulse-code modulation (DPCM) is used to reduce the amount of data needed to record the enclosing regions of one row. DPCM-based differential coding does not record the coordinate values of the enclosing regions directly; it records the differences between the coordinate values of adjacent enclosing regions, which improves the compression ratio.
y = m*x + b
y3 = m*(x3 - x1) + y1 + Δy3
y4 = m*(x4 - x2) + y2 + Δy4
y5 = m*(x5 - x3) + y3 + Δy5
y6 = m*(x6 - x4) + y4 + Δy6
In other words, the differences between the Y coordinates of adjacent enclosing regions are calculated and encoded.
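A minimal Python sketch of this differential coding is given below. It follows the formulas above, in which each y value is predicted from the point two positions earlier along a line of slope m, and stores only the residual Δy. The function names, the lag parameter and the sample coordinates are assumptions made for the example.

```python
def dpcm_encode(xs, ys, m, lag=2):
    """Return (seed values, residuals) for y predicted along a line of slope m.

    The prediction for point i is m*(x[i] - x[i-lag]) + y[i-lag], matching
    the formulas above; the residual is the recorded difference value.
    """
    residuals = []
    for i in range(lag, len(ys)):
        predicted = m * (xs[i] - xs[i - lag]) + ys[i - lag]
        residuals.append(ys[i] - predicted)
    return ys[:lag], residuals

def dpcm_decode(xs, seeds, residuals, m, lag=2):
    """Rebuild the y values from the seeds and the residuals."""
    ys = list(seeds)
    for i, residual in enumerate(residuals, start=lag):
        ys.append(m * (xs[i] - xs[i - lag]) + ys[i - lag] + residual)
    return ys

# Hypothetical vertex coordinates along a horizontal baseline (m = 0).
xs = [0, 10, 12, 20, 22, 30]
ys = [100, 100, 103, 100, 98, 101]
seeds, residuals = dpcm_encode(xs, ys, m=0)
```

The residuals are small integers even though the absolute coordinates are around 100, which is what makes the subsequent entropy coding effective.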
The main reason for using differential pulse-code modulation is that the differences of the Y coordinates are mostly smaller than the original values, so encoding a difference requires far fewer bits than encoding the original value. For example, a difference of 5 is 101 in binary. If the difference is -5, it is first converted to the positive integer 5 and its binary representation is then replaced by its ones' complement, in which every bit that is 0 becomes 1 and every bit that is 1 becomes 0. The difference 5 therefore needs 3 bits, and on this basis an entropy coding algorithm such as Huffman coding is used for compression. For instance, a precomputed table can give the number of bits to be kept for each difference and the corresponding code value. If the difference 5 (101) needs 3 bits and the Huffman code for that bit count is 100, the two are concatenated and recorded as 100101. Rows are used as the example here, but the method proposed by the present invention can equally be applied to reflowable objects arranged in columns in the original document to reduce the amount of recorded data.
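The worked example above can be reproduced with the following Python sketch. Only the entry that maps a bit count of 3 to the code 100 is taken from the text; the remaining entries of SIZE_CODES are an illustrative prefix-free table assumed for this example, and in practice the table would be derived from the statistics of the differences.

```python
# Hypothetical prefix-free code for the bit count of a difference.
# Only 3 -> "100" comes from the description; the rest is assumed.
SIZE_CODES = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110"}

def magnitude_bits(diff):
    """Binary digits of |diff|, bitwise inverted (ones' complement) if diff < 0."""
    if diff == 0:
        return ""
    bits = format(abs(diff), "b")
    if diff < 0:
        bits = "".join("1" if bit == "0" else "0" for bit in bits)
    return bits

def encode_difference(diff):
    """Concatenate the code for the bit count with the magnitude bits."""
    bits = magnitude_bits(diff)
    return SIZE_CODES[len(bits)] + bits

code_for_plus_5 = encode_difference(5)
code_for_minus_5 = encode_difference(-5)
```

Because a positive magnitude always starts with 1 and an inverted one always starts with 0, a decoder can recover the sign from the first magnitude bit once the bit count is known.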
Fig. 10 is a schematic diagram of the row baseline and enclosing regions of Chinese character enclosing rectangles. For cases in which the enclosing regions of the reflowable objects are mostly of the same size, a method suited to run-length coding is proposed to reduce the amount of recorded information. In fixed-layout documents that consist largely of square characters such as Chinese, the characters have a square structure and do not show the height variation of English glyphs, so the corresponding vertices of the Chinese character enclosing rectangles can be approximated by the straight line 1000. For example, from the equation of the line, (x3, y3) and (x5, y5) can be calculated from (x1, y1) with the following formulas, in which Δy3 and Δy5 are approximately 0.
y = m*x + b
y3 = m*(x3 - x1) + y1 + Δy3
y5 = m*(x5 - x3) + y3 + Δy5
Similarly, (x4, y4) and (x6, y6) can be calculated from (x2, y2). The condition that these points lie on the same straight line 1000 allows the vertex coordinates of the adjacent Chinese character enclosing rectangles 1001, 1002 and 1003 to be predicted.
y4 = m*(x4 - x2) + y2 + Δy4
y6 = m*(x6 - x4) + y4 + Δy6
Since Δy3, Δy5, Δy4 and Δy6 are approximately 0, run-length coding can be used to reduce the redundancy of the raw data. The coordinate values of the enclosing regions are not recorded directly. Instead, the difference between the predicted and the actual coordinate value is calculated, and if several consecutive differences are all close to 0, the length of that run of near-zero differences is recorded and encoded to improve the compression ratio.
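One possible form of this run-length coding is sketched below in Python. The output format, a list of pairs (length of the zero run, following non-zero value), and the tolerance parameter are assumptions made for the example; with a tolerance greater than 0 the coding becomes lossy, because near-zero differences are then treated as exactly 0.

```python
def run_length_encode(residuals, tolerance=0):
    """Encode residuals as (zero_run_length, next_nonzero_value) pairs.

    A trailing run of zeros is emitted as (run_length, None).
    """
    pairs = []
    run = 0
    for residual in residuals:
        if abs(residual) <= tolerance:
            run += 1
        else:
            pairs.append((run, residual))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs

def run_length_decode(pairs):
    """Expand the pairs back into a residual sequence (zeros for each run)."""
    residuals = []
    for run, value in pairs:
        residuals.extend([0] * run)
        if value is not None:
            residuals.append(value)
    return residuals

# Residuals of a row of near-uniform square characters: mostly zero.
encoded = run_length_encode([0, 0, 0, 2, 0, 0, -1, 0])
```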
The compression algorithm can be chosen according to the language, or according to the statistical properties of the sizes of the text-type reflowable enclosing regions. For reflowable enclosing regions of highly uniform height within a row or column, methods such as run-length coding improve the compression ratio; for reflowable enclosing regions whose heights vary considerably within a row or column, methods such as differential pulse-code modulation can be selected.
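A selection rule of this kind might look as follows in Python. The description does not say which statistic or threshold to use, so the relative spread of the heights (standard deviation divided by mean) and the limit of 5 percent are purely illustrative assumptions.

```python
from statistics import mean, pstdev

def choose_coding(heights, max_relative_spread=0.05):
    """Pick a coding method from the height statistics of one row or column.

    The statistic and the threshold are hypothetical choices for this sketch.
    """
    spread = pstdev(heights) / mean(heights)
    return "run-length" if spread <= max_relative_spread else "dpcm"

chinese_row = choose_coding([40, 40, 40, 41])      # near-uniform square characters
english_row = choose_coding([20, 35, 28, 40, 22])  # ascenders and descenders vary
```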
Fig. 11 shows an embodiment in which parametric curves approximate the baseline and mean line of a curved row. Because the row alignment line is not straight, it is approximated by a parametric curve: parametric curve 1101 represents the mean line and parametric curve 1102 represents the baseline. Using a parametric curve reduces the amount of data needed to record the offsets of the enclosing region coordinates from the alignment line.
Fig. 12 shows an embodiment in which a polyline built from several straight segments approximates the baseline of a row piecewise. The straight segments 1201, 1202 and 1203 approximate the row alignment line piecewise, again reducing the amount of data needed to record the offsets of the enclosing region coordinates from the alignment line.
Patent document 4, by contrast, records a baseline for every reflowable object, which requires a large amount of data and differs from the technique of the present invention. Because the present invention only needs to record the row or column alignment line and not a baseline for each reflowable object, the amount of recorded data is significantly reduced. In addition, the enclosing regions of the reflowable objects in a row or column are recorded as coordinate differences calculated from the row/column alignment line and the preceding enclosing region, which reduces the data volume further.
Fig. 13 shows an embodiment of the alignment lines of a vertically typeset document, in which 1301 is the left alignment line formed by fitting a line to the left edges of the reflowable text objects and 1302 is the right alignment line formed by fitting a line to their right edges. For the concrete calculation methods, see the basic algorithms of layout analysis, layout understanding and optical character recognition.
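To illustrate how a polyline alignment line can be used, the following Python sketch evaluates the baseline height at a given x by linear interpolation within the segment that contains x. The vertex coordinates are invented for the example, and the sketch assumes that the vertices are sorted by strictly increasing x.

```python
def polyline_y(points, x):
    """Return the y value of a polyline at horizontal position x.

    `points` is a list of (x, y) vertices with strictly increasing x.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the polyline")

# Hypothetical baseline made of two segments.
baseline = [(0, 100), (50, 110), (100, 105)]
y_at_25 = polyline_y(baseline, 25)
y_at_75 = polyline_y(baseline, 75)
```

Each enclosing region then only needs its offset from polyline_y at its own x position, rather than a full baseline of its own.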
Fig. 7 of the present invention shows the flow chart for displaying the index-based format reflowable document. It is a typical implementation of the reading program 312, used to draw the index-based format reflowable file on output media of different sizes. First, in step 701, a format-based reflowable file is read. In this file, each reflowable object is defined in the form of an enclosing region that points to a sub-region of the original fixed-layout document. The size and position of each enclosing region are defined in the reflowable file. For text-type reflowable objects, the alignment line of the row or column to which they belong is also defined in the file. The reading order of the indexes and the index-based formatting information, such as text indentation or the float alignment of text objects relative to figures, are read in as well.
In step 702, information such as the size and shape of the output medium and the zoom parameters is obtained.
In step 703, the current initial display position is determined. If the current display object is a non-reflowable object, it is displayed by scaling it up or down. If the current display object is a text-type reflowable object, or belongs to a style that mixes text with an aligned illustration, the line spacing must be determined, and the number of enclosing regions to be displayed in each row is determined from the row width and the sizes of the enclosing regions to be displayed. For example, given the display width and the width of each enclosing region, the number of enclosing regions that fit in each row can be calculated.
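The calculation of how many enclosing regions fit in each row can be sketched as a simple greedy fill in Python. The description does not fix a particular line-breaking strategy, so the greedy rule, the optional gap between regions and the function name fill_rows are assumptions for this example.

```python
def fill_rows(widths, row_width, gap=0):
    """Greedily assign enclosing regions (given in reading order) to rows.

    Returns a list of rows, each a list of region indexes. A region wider
    than row_width is placed alone on its own row.
    """
    rows, current, used = [], [], 0
    for index, width in enumerate(widths):
        needed = width if not current else width + gap
        if current and used + needed > row_width:
            rows.append(current)
            current, used = [], 0
            needed = width
        current.append(index)
        used += needed
    if current:
        rows.append(current)
    return rows

# Five regions with scaled widths on a display that is 100 units wide.
rows = fill_rows([30, 40, 50, 20, 60], row_width=100)
```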
In step 704, the horizontal position of each enclosing region within the row is determined. For a layout in which reflowable text regions float alongside an image/graphic region, as in the embodiment of Fig. 14 that mixes text-type reflowable objects with a floating image-type object, the width occupied by the image/graphic is subtracted from the row width to obtain the actual width available to the enclosing regions of each row. The number of enclosing regions to be displayed in each row is then calculated from this actual width, and the horizontal position of each enclosing region in the row is determined. The alignment may be left, right or center. When right alignment is selected, the left edge 1401 of the outline of the image/figure serves as the right-alignment reference for the text reflowable objects.
If a row needs to be displayed with indentation, the indentation width is likewise subtracted from the display width to obtain the actual width, the number of enclosing regions to be displayed in each row is determined from this actual width, and the horizontal position of each enclosing region in the row is then determined.
In step 705, the vertical offset of each enclosing region relative to the row reference line used for drawing must also be determined. For the case in which the row alignment line is straight, Fig. 15 shows an embodiment of aligning the row alignment line of an enclosing rectangle with the row reference line: 1501 is the row alignment line and 1504 is the row reference line. The row alignment segment 1502 inside the enclosing region 1500 is aligned with the row reference line 1504; using the horizontal offset determined in step 704, the rotation and vertical offset of the enclosing region 1500 relative to the row reference line are calculated, and the enclosing region 1503 is drawn on the output medium with the calculated parameters. For the case in which the row alignment line is not straight, Fig. 16 shows an embodiment that fits a straight line to the alignment line inside an enclosing region: a straight line is first fitted to the section of the nonlinear alignment line 1602 that lies inside each enclosing region 1601, and the fitted line is then aligned with the row reference line by the method above, so as to determine the rotation and offset of the enclosing region relative to the row reference line and draw the enclosing region on the output medium.
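A possible calculation for step 705 is sketched below in Python. It assumes a horizontal row reference line, uses an ordinary least-squares fit for the curved case of Fig. 16, and takes the rotation to be about the point where the alignment segment enters the region; the sign convention of the angle depends on the orientation of the coordinate system. The function names are hypothetical.

```python
import math

def fit_line(points):
    """Least-squares fit of y = slope*x + intercept to (x, y) samples.

    Requires at least two samples with different x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def align_to_reference(slope, anchor_y, reference_y):
    """Rotation (radians) and vertical shift that bring an alignment segment
    of the given slope onto a horizontal reference line at y = reference_y.

    anchor_y is the y value of the segment at the pivot of the rotation.
    """
    rotation = -math.atan(slope)
    return rotation, reference_y - anchor_y

# Samples of the curved alignment line inside one enclosing region.
slope, intercept = fit_line([(0, 10), (5, 15), (10, 20)])
rotation, shift = align_to_reference(slope, anchor_y=intercept, reference_y=100)
```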
In step 706, the enclosing regions are drawn according to the calculated horizontal and vertical coordinates, rotation angle, scaling factor and so on.
The above description of the reflowable file reading program 312 takes a horizontally written digital book document as its example. For a vertically written digital book document, the vertical offset of each enclosing region within its column is calculated, and the rotation and horizontal offset of the enclosing region relative to the column reference line are then determined. The drawing coordinates of the enclosing region are thereby uniquely determined, so that a vertically typeset fixed-layout document can also be displayed in a reflowable manner.
Fig. 17 shows another embodiment of the present invention. As shown in the schematic block diagram of the internal structure of a device with a display in Fig. 17, the memory 1702 of the device 1700 stores the operating system 1709 and the reflowable document generator 1710. The operating system 1709 may be UNIX, LINUX, WINDOWS or a similar system. The memory 1702 also contains the software 1711 for layout analysis, layout understanding and optical character recognition (OCR). The OCR-related software 1711 may be commercial or non-commercial. The reflowable document generator 1710 contains the programs and data used to process a digital book layout file that is received from the network interface 200, obtained from the input device 1707 through the I/O interface 1706, or read in through the media drive 1705, and to generate the index-based format reflowable file and store it in the index-based format reflowable file library 1709. These data are transferred over the bus 1708. The memory 1702 also stores the reading program 1712, such as a web browser or a dedicated reading program. The processor 1701 is connected to the display adapter 1703 and the reading program 1712, and the reflowable file is drawn adaptively on the display or in a display window according to the size of the display device 1704 or of the window. The reflowable document generator 1710, the layout analysis, layout understanding and OCR software 1711, and the reading program 1712 may also be implemented in hardware such as an FPGA, ASIC or DSP.
This embodiment is suitable for devices that have a display and computing capability, such as cameras, scanners, all-in-one machines and portable terminals.

Claims (8)

1. A method for establishing an index-based format reflowable file, characterized by comprising the following steps:
1) establishing a storage device for index-based format reflowable files, and reading a layout file into a reflowable file conversion server;
2) calculating, by the reflowable file conversion server, an enclosing region for each reflowable object and each non-reflowable object in the layout file;
3) for the reflowable objects in the same row or the same column of the layout file, calculating the alignment line of those reflowable objects;
4) calculating the coordinate position of each enclosing region in the layout file, and at the same time calculating the position of each reflowable region relative to the alignment line of the row or column to which it belongs;
5) calculating the size of each enclosing region;
6) creating an index for each enclosing region; for each index, recording the position and size of the corresponding enclosing region and the position of each reflowable enclosing region relative to its alignment line; and using the indexes to represent the logical page structure of the layout file, the reading order and arrangement relationships between the enclosing regions, and the indentation information of the reflowable text enclosing regions;
7) generating, by the reflowable file conversion server, the index-based format reflowable file and storing it in the storage device for index-based format reflowable files, the reflowable file conversion server and the storage device exchanging data with each other, and a reading system reading the index-based format reflowable file from the storage device or from the reflowable file conversion server for display.
2. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the layout file in step 1) may be generated by an imaging device such as a scanner, generated by program conversion, or obtained or generated by calling a module file.
3. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the enclosing region in step 2) may be any definable geometric shape, such as a rectangle, circle, curve, ellipse or triangle.
4. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the alignment line of the reflowable objects in the same row or column in step 3) may be the baseline of the text-type reflowable objects, the mean line of the text-type reflowable objects, the top edge line or bottom edge line of the text-type reflowable objects in a row, the left edge line or right edge line of the text-type reflowable objects in a column, or a parametric curve.
5. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the size of each enclosing region calculated in step 4) may be defined by a mathematical model.
6. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the coordinate position of each enclosing region in the layout file calculated in step 5) may be expressed in absolute coordinates; or the position of the enclosing region may be defined by calculating, according to the language or the statistical properties of the sizes of the enclosing regions within a row or column, the differences between predicted and actual coordinate values, and compressing the resulting sequence of differences with a coding method that gives a high compression ratio.
7. The method for establishing an index-based format reflowable file according to claim 1, characterized in that the reading order between the enclosing regions in step 6) is the reading order between regions determined according to the reading conventions of the original layout, and the arrangement relationship between the enclosing regions is the float alignment relationship between reflowable regions and non-reflowable regions.
8. A method for drawing an index-based format reflowable file, which draws a format reflowable file with indexes, characterized by comprising the following steps:
1) reading the indexed reflowable file, in which the indexes contain the size and position of each enclosing region, the position of each reflowable enclosing region relative to its alignment line, the logical structure of the layout, and the reading order and arrangement relationships between the enclosing regions;
2) obtaining the size, shape and zoom parameters of the output medium on which the client system performs typesetting;
3) determining the initial display position and selecting a drawing method according to the type of the enclosing region to be drawn, a non-reflowable object being drawn scaled up or down;
4) for reflowable objects: determining the line spacing, and determining the number of enclosing regions displayed in each row from the row width and the sizes of the enclosing regions to be displayed; if a row needs to be indented, subtracting the indentation width from the display width to obtain the actual width, and determining the number of enclosing regions displayed in each row from this actual width; if an illustration is to be displayed in a row, subtracting the width of the illustration in the row from the display width to obtain the actual width, and determining the number of enclosing regions displayed in each row from this actual width; or determining the column spacing, and determining the number of enclosing regions displayed in each column from the column height and the sizes of the enclosing regions to be displayed; if a column needs to be indented, subtracting the indentation from the display height to obtain the actual height, and determining the number of enclosing regions displayed in each column from this actual height; if an illustration is to be displayed in a column, subtracting the height of the illustration in the column from the display height to obtain the actual height, and determining the number of enclosing regions displayed in each column from this actual height;
5) determining the horizontal coordinate of each enclosing region within a row, or the vertical coordinate of each enclosing region within a column;
6) determining the rotation angle and offset of each enclosing region relative to the corresponding reference line;
7) drawing the reflowable file according to the calculated coordinate position, rotation angle, offset and scaling factor of each enclosing region.
CN201210299088.2A 2012-08-22 2012-08-22 Index-based format returnable file establishing and drawing method Expired - Fee Related CN102841941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210299088.2A CN102841941B (en) 2012-08-22 2012-08-22 Index-based format returnable file establishing and drawing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210299088.2A CN102841941B (en) 2012-08-22 2012-08-22 Index-based format returnable file establishing and drawing method

Publications (2)

Publication Number Publication Date
CN102841941A true CN102841941A (en) 2012-12-26
CN102841941B CN102841941B (en) 2015-04-29

Family

ID=47369304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210299088.2A Expired - Fee Related CN102841941B (en) 2012-08-22 2012-08-22 Index-based format returnable file establishing and drawing method

Country Status (1)

Country Link
CN (1) CN102841941B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268340A (en) * 2013-05-21 2013-08-28 龚如宾 Format reflowable file establishing and drawing method based on hierarchical index
CN105302626A (en) * 2015-11-09 2016-02-03 深圳市依伴数字科技有限公司 Analytic method of XPS (XML Paper Specification) structural data
CN103853849B (en) * 2014-03-28 2017-01-11 龚如宾 Method for establishing and drawing high-compression reflowable file
CN107885863A (en) * 2017-11-21 2018-04-06 湖北大学 Representation of Map Symbols method and system based on body
JP2019016236A (en) * 2017-07-07 2019-01-31 インターマン株式会社 Character string image display method
CN114495147A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Identification method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020057281A1 (en) * 2000-11-10 2002-05-16 Jun Moroo Image display control unit, image display control method, image displaying apparatus, and image display control program recorded computer-readable recording medium
CN101536075A (en) * 2006-03-29 2009-09-16 亚马逊科技公司 Generating image-based reflowable files for rendering on various sized displays
US20100238474A1 (en) * 2009-03-17 2010-09-23 Konica Minolta Business Technologies, Inc. Document image processing apparatus, document image processing method, and computer-readable recording medium having recorded document image processing program
CN102222059A (en) * 2011-06-14 2011-10-19 汉王科技股份有限公司 Method, device and system for realizing multi-format information display of electronic reader

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268340A (en) * 2013-05-21 2013-08-28 龚如宾 Format reflowable file establishing and drawing method based on hierarchical index
CN103268340B (en) * 2013-05-21 2016-08-10 龚如宾 Format reflowable file establishing and drawing method based on hierarchical index
CN103853849B (en) * 2014-03-28 2017-01-11 龚如宾 Method for establishing and drawing high-compression reflowable file
CN105302626A (en) * 2015-11-09 2016-02-03 深圳市依伴数字科技有限公司 Analytic method of XPS (XML Paper Specification) structural data
JP2019016236A (en) * 2017-07-07 2019-01-31 インターマン株式会社 Character string image display method
CN107885863A (en) * 2017-11-21 2018-04-06 湖北大学 Ontology-based map symbol representation method and system
CN114495147A (en) * 2022-01-25 2022-05-13 北京百度网讯科技有限公司 Identification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN102841941B (en) 2015-04-29

Similar Documents

Publication Publication Date Title
JP2006350867A (en) Document processing device, method, program, and information storage medium
JP4181892B2 (en) Image processing method
CN102841941B (en) Index-based format returnable file establishing and drawing method
CN100356372C (en) Generating method of computer format document and opening method
CN103268340B (en) Format reflowable file establishing and drawing method based on hierarchical index
CN1859541B (en) Image processing apparatus and its control method
US8520006B2 (en) Image processing apparatus and method, and program
US8584932B2 (en) Information input/output apparatus, information processing apparatus, information input/output system, printing medium, and information input/output method
WO2017040652A1 (en) Method and system for annotation and connection of electronic documents
US20050278624A1 (en) Image processing apparatus, control method therefor, and program
JP4208780B2 (en) Image processing system, control method for image processing apparatus, and program
JP2006023945A (en) Image processing system and image processing method
CN101443790A (en) Efficient processing of non-reflow content in a digital image
Ferilli Automatic digital document processing and management: Problems, algorithms and techniques
CN115659917A (en) Document format restoration method and device, electronic equipment and storage equipment
JP2018019300A (en) Image formation device, document digitization program, and document digitization method
JP6128898B2 (en) Information processing apparatus, control method for information processing apparatus, and program
CN107666550B (en) Image forming apparatus and document electronization method
JP2022092119A (en) Image processing apparatus, image processing method, and program
JP2013020477A (en) Image processing apparatus and program
JP2007129557A (en) Image processing system
US20090046322A1 (en) Information processing apparatus, image forming apparatus, print-data generation method, map-information generation method, and computer program product
JP2011118818A (en) Image processing device
JP6780380B2 (en) Image processing equipment and programs
CN103853849A (en) Method for establishing and drawing high-compression reflowable file

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150429

Termination date: 20190822
