CN1335966A - Invisible encoding of attribute data in character based documents and files - Google Patents

Publication number
CN1335966A
CN1335966A CN00801713A CN00801713A CN1335966A CN 1335966 A CN1335966 A CN 1335966A CN 00801713 A CN00801713 A CN 00801713A CN 00801713 A CN00801713 A CN 00801713A CN 1335966 A CN1335966 A CN 1335966A
Authority
CN
China
Prior art keywords
text
sequence
message
coding
character code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN00801713A
Other languages
Chinese (zh)
Inventor
K·T·埃恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV
Publication of CN1335966A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Messages that contain text elements and attributes that affect the display of the text elements are encoded as a plain-text message followed by a list of the changes to the plain-text message to effect the enhanced display of the plain-text message. By segregating the plain-text from the attributes associated with the text elements, all text applications are able to display an undisturbed copy of the text. The control and formatting attributes are appended to the plain-text, so that the direct display of the initial portion of the message is an immediately readable version of the text. Additionally, the control and formatting information may be encoded using 'invisible' sequences of characters, such as space, backspace, tab, etc., or as a sequence of visible characters and corresponding invisible characters that have the effect of erasing the visible characters from view. By invisibly encoding the tag elements, the direct display of the message will appear as a plain-text message, because the tag elements will either be self-erasing, or appended to the plain-text message as 'invisible' white space.

Description

Invisible encoding of attribute data in character-based documents and files
Background of the invention
1. Field of the invention
The present invention relates to the field of information processing, and in particular to the encoding of information in electronic versions of documents and files.
2. Description of the prior art
As methods of encoding information have evolved to allow higher performance and efficiency, the potential for incompatibility with prior-art systems has increased. Techniques and standards have been adopted to minimize this incompatibility, but a class of legacy products, created before those techniques and standards were adopted, remains.
One standard that has achieved a fairly high degree of compatibility is the MIME (Multipurpose Internet Mail Extensions) format. With the MIME format, compatibility can be provided by encoding a message twice: once as "plain-text" and once as "rich-text". As the name implies, a plain-text encoding encodes all the printable text characters of the message without any control codes or markup that affect the display of those characters, whereas the rich-text format includes control codes that express the attributes associated with the printable text characters, such as bold, italic, underline, color, font size, font type, and other attributes. A MIME-formatted file contains both encodings of the message. When an application opens a MIME-formatted file for viewing, it decides which encoding to use according to its own capabilities or those of the system on which it runs. If the application supports bold or italic fonts, for example, it uses the rich-text format to reproduce accurately the bold or italic characters of the original message. Conversely, if the application or system cannot display bold or italic characters, the plain-text encoding is displayed.
To provide compatibility between rich-text-capable and plain-text-only devices, a MIME-formatted file consists solely of printable character codes. Markup or control codes in the original message are omitted from the plain-text encoding and are encoded as sets of unique character strings in the rich-text format. Fig. 1 shows the encoding of a message 100 into a plain-text format 110 and a rich-text format 120. Fig. 2 shows a composite MIME-formatted file 200 that contains both the plain-text format 110' and the rich-text format 120', together with MIME-specific control information describing the file content, document type, and so on. The rich-text format 120' contains control information 121, 122 that determines how the text elements appear when displayed, in this example when to begin "bold" rendering 121 and when to end "bold" rendering 122. For ease of reference, the information other than the plain-text information is referred to collectively as "attributes". When an application that supports bold, italic, and underline processes the MIME-formatted file 200, it processes the rich-text format 120', and the displayed or printed message appears in a form similar to the original message 100 of Fig. 1. If the application or system does not support bold, italic, and underline, the application processes the plain-text format, and the displayed or printed message appears in a form similar to the plain-text format 110 of Fig. 1.
The proper display or printing of the MIME-formatted file 200 described above presupposes, however, that the application is MIME-compatible. That is, it presupposes that the application can recognize the MIME-specific information 201, 202, 203 and select the appropriate encoding 110', 120' to process and display. An application that is not MIME-compatible cannot recognize that the initial portion 201 of the file is a MIME header, that the middle portion 202 is a MIME separator between the plain-text encoding 110' and the rich-text encoding 120', or that the final portion 203 is a MIME footer. To such an application, the MIME-formatted file 200 is simply ordinary text, and the file as displayed or printed will look much like the image of the MIME-formatted file 200 in Fig. 2. That is, all the MIME-specific information 201, 202, 203 appears as part of the displayed document along with the plain-text format 110' and rich-text format 120' information. Such a direct display of the MIME-formatted file 200 is visually unappealing and is often unintelligible to users unfamiliar with the raw form of formatted computer files.
Summary of the invention
An object of the present invention is to provide a method of encoding a message containing text information and attributes such that the message is easy to read regardless of the capabilities of the application used to display it. Another object is to eliminate the need to encode the same text information in two different formats. A further object is to separate the text information from the attributes that affect how it is displayed or printed.
These and other objects are achieved in two ways. In the first method, a message comprising text elements and markup elements that affect the display of the text elements is encoded as a plain-text message followed by a list of changes to the plain-text message that effect its enhanced display. By segregating the plain text from the attributes associated with the text elements, every text application is able to display an undisturbed copy of the text. In a preferred embodiment, the control and formatting attributes are appended to the plain text, so that a direct display of the initial portion of the message is an immediately readable version of the text. In the second method, which may be used independently of or in combination with the first, the control and formatting information is encoded using "invisible" sequences of characters. In one embodiment, each unique markup element is encoded as a unique sequence of invisible characters, such as space, backspace, tab, and so on. In another embodiment, each markup element is encoded as a sequence of visible characters followed by corresponding invisible characters, such as backspaces, that have the effect of erasing the visible characters. With the markup elements invisibly encoded, a direct display of the message appears as a plain-text message, because the markup elements are either self-erasing or appended to the plain-text message as "invisible" white space.
Brief description of the drawings
The invention is explained in further detail, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 shows an example of prior-art plain-text and rich-text encodings of a document containing text elements and markup elements.
Fig. 2 shows an example of a prior-art MIME encoding of a document containing text elements and markup elements.
Fig. 3 shows an example of the encoding of a document containing text elements and markup elements, with the text elements grouped together, in accordance with one aspect of the invention.
Figs. 4A-4C show an example of the invisible encoding of markup elements in accordance with another aspect of the invention.
Fig. 5 shows an example of the encoding of a document containing text elements and markup elements, with the text elements grouped together, and another example of the invisible encoding of the markup elements, in accordance with the invention.
Fig. 6 shows an example of the in-line invisible encoding of the markup elements of a document in accordance with the invention.
Fig. 7 shows an example block diagram of an encoder for encoding a document in accordance with the invention.
Fig. 8 shows an example flow diagram for encoding a document in accordance with the invention.
Fig. 9 shows an example block diagram of a decoder for decoding an encoded document in accordance with the invention.
Throughout the drawings, the same reference numerals indicate similar or corresponding functions or features.
Detailed description of the invention
Fig. 3 shows an example of the encoding of the document 100 in accordance with one aspect of the invention. As shown, the encoded document 300 comprises a plain-text portion 310 and a markup portion 320. The plain-text portion 310 is an extraction, or collection, of the text content of the document 100, without the attributes associated with the text that affect its appearance. That is, all letters, numerals, symbols, punctuation, and so on are encoded directly from the input document 100. In most cases the document 100 will be in electronic form, and encoding the text items amounts to copying the text from the document 100 to the encoded document 300 using the same character codes, for example ASCII, as the electronic form of the document 100.
The markup portion 320 is an extraction of each markup element in the document 100, together with the offset, or position, associated with that markup element. The document 100 is regenerated by applying each markup element to the text 310 of the encoded document 300 at its associated offset. For example, the word "bold" 101, which occupies the 33rd to 36th character positions of the input document 100, appears in bold type. To achieve this bold formatting, a "begin bold" attribute is applied just before the 33rd character position and an "end bold" attribute just after the 36th character position. The illustrated markup portion 320 contains the number "32" 340 and the letter "B" 345, indicating that begin-bold ("B") is to be applied to the plain text 310 just after the 32nd ("32") character position. Likewise, the markup portion 320 contains the number "36" 350 and the character sequence "/B" 355, indicating that end-bold ("/B") is to be applied to the plain text 310 after the 36th ("36") character position. The "32 B 36 /B" entries in the markup portion 320 therefore provide enough information to render the word "bold" of the plain text 310 in bold. Each subsequent attribute marker ("I" for italic, "UL" for underline) is encoded in the same way, by reference to the position in the plain text 310 at which it applies. Other types of markup element, such as HTML references to hypertext links, are encoded likewise. If the content of a markup element is conventionally displayed, such as the file name or IP address of a particular reference, that content remains in the plain text 310. If the content of a markup element is not conventionally displayed, such as an internally generated reference like segment 202 of the example document of Fig. 2, it is encoded in the markup portion 320 and does not appear with the plain text 310.
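The separation of text and markup described for Fig. 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the angle-bracket tag syntax of the input (`<B>`, `</B>`) and the function name `segregate` are assumptions made for the example and are not taken from the patent.

```python
import re

# Assumed input syntax: tags such as <B>, </B>, <I>, <UL> embedded in the text.
TAG = re.compile(r"<(/?[A-Za-z]+)>")

def segregate(marked_up):
    """Split marked-up text into plain text and a list of (offset, tag) changes.

    Each offset counts the plain-text characters that precede the tag,
    so "32 B" means: apply B after the 32nd plain-text character.
    """
    plain_parts = []
    changes = []
    plain_length = 0
    position = 0
    for match in TAG.finditer(marked_up):
        chunk = marked_up[position:match.start()]
        plain_parts.append(chunk)
        plain_length += len(chunk)
        changes.append((plain_length, match.group(1)))
        position = match.end()
    plain_parts.append(marked_up[position:])
    return "".join(plain_parts), changes

def format_changes(changes):
    """Render the change list in the "32 B 36 /B" style described for Fig. 3."""
    return " ".join(f"{offset} {tag}" for offset, tag in changes)
```

For an input whose bold word begins after 32 plain-text characters, `format_changes` yields the same "32 B 36 /B" entries that the description gives for the markup portion 320.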
Note that because each markup element refers to a position in the plain text 310, the text need not be repeated. Assuming that a typical document contains far more text than markup, eliminating the repeated text gives a file formatted in accordance with the invention a substantially smaller size than the equivalent MIME-formatted file. Note also that, in a preferred embodiment, the plain text 310 appears first in the formatted file 300. A legacy application that directly displays the content of the formatted file 300 will therefore provide a meaningful and easily read rendering 310 of the text of the message 100. All the markup information appears at the end of the text, where the user can ignore it.
Decoding of the formatted file 300 is made easier by the use of a markup-portion delimiter, "{changes:" 321, which identifies the end of the plain text 310 and the beginning of the markup portion 320. An application compatible with the format of the formatted file 300 recognizes this predetermined delimiter and interprets the information that follows as pairs of markup position and markup information. The particular characters "{changes:" chosen for the markup-portion delimiter 321 are shown here for illustration only. In a preferred embodiment, a character string that is very likely to be unique is chosen, that is, a sequence with a high likelihood of not also appearing in the plain text 310, such as "qx74gh#$6^2". Alternatively, the markup-portion delimiter can be inferred from the content of the formatted file 300. For example, an application can process the formatted file 300 from the end toward the beginning, noting each occurrence of a recognizable pair of markup element and numeric position. The first absence of such a recognizable pair identifies the beginning of the markup portion 320. These and other techniques for delineating distinguishable segments or clusters of information are well known to those skilled in the art.
As noted above, the markup elements appear at the end of the plain text 310. In accordance with another aspect of the invention, the markup elements in a preferred embodiment are encoded using "invisible" character codes. That is, the codes used to encode the markup elements and their positions are chosen so that a direct display of the encoded file produces no visible effect. For the purposes of this invention, the space is considered an "invisible" character even though it produces "blank" space when displayed. Likewise, blank lines are included in the definition of "invisible".
Figs. 4A-4C show an example of generating invisible sequences corresponding to markup elements. As shown in Fig. 4A, each type of markup element 410 is uniquely defined by a markup-type identifier 420. The definition of each markup-type identifier 420 may be predefined, or a mapping of unique identifiers to markup types may be defined for each encoded document. For ease of understanding, the mapping of markup types to markup-type identifiers is assumed here to be predefined, using data-mapping techniques that are well known in the art. As shown in Fig. 4A, the "begin italic" markup type has the identifier "100" 421, the "end italic" markup type has the identifier "101" 422, and so on. As is known in the art, some markup types have associated parameters. For example, the "begin color" markup type has the identifier "106" 425, and this identifier is followed by parameters that define the magnitudes of the red 426, green 427, and blue 428 components of the color being defined.
Fig. 4A also shows the binary representation 420B of the value of each markup-type identifier 420. In accordance with one embodiment of the invention, an invisible sequence is generated for each markup element by encoding the sequence of binary (0-1) values in the binary representation 420B as a string of invisible characters. As shown in Fig. 4B, for example, a "space" (Sp) represents a logical "0" and a "carriage return" (CR) represents a logical "1". With this representation, the binary code 421B of the "begin italic" markup element, 01100100, is encoded as the sequence Sp-CR-CR-Sp-Sp-CR-Sp-Sp 431. The binary representation of the offset associated with each markup element, and of any parameters associated with it, is encoded in the same way. Because "invisible" characters are used to encode the markup elements, their offsets, and any other associated parameters, a direct display of the encoded markup elements merely produces spaces and blank lines at the end of the plain text 310.
Other encoding schemes for producing "invisible" sequences corresponding to the markup information will be evident to one of ordinary skill in the art. For example, Fig. 4C shows an encoding in which four "invisible" characters represent pairs of binary digits: a "space" (Sp) represents the pair 00, a "line feed" (LF) represents 01, a "tab" (Tb) represents 10, and a "carriage return" (CR) represents 11. With this representation, the 01100100 representation 421B of the "begin italic" markup element is encoded as the sequence LF-Tb-LF-Sp 441.
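The two mappings of Figs. 4B and 4C can be illustrated with a short Python sketch. The fixed 8-bit width follows the 01100100 example in the text but is otherwise an assumption, as are the names `SYMBOLS`, `to_invisible`, and `from_invisible`.

```python
# Symbol tables indexed by the numeric value of each bit group.
SYMBOLS = {
    1: [" ", "\r"],              # Fig. 4B: Sp = 0, CR = 1
    2: [" ", "\n", "\t", "\r"],  # Fig. 4C: Sp = 00, LF = 01, Tb = 10, CR = 11
}

def to_invisible(value, bits_per_char=1, width=8):
    """Encode an identifier as whitespace characters, most significant bits first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the chosen width")
    if width % bits_per_char:
        raise ValueError("width must be a multiple of bits_per_char")
    table = SYMBOLS[bits_per_char]
    bits = format(value, f"0{width}b")
    groups = (bits[i:i + bits_per_char] for i in range(0, width, bits_per_char))
    return "".join(table[int(group, 2)] for group in groups)

def from_invisible(sequence, bits_per_char=1):
    """Recover the identifier from a whitespace sequence made by to_invisible."""
    table = SYMBOLS[bits_per_char]
    value = 0
    for char in sequence:
        value = (value << bits_per_char) | table.index(char)
    return value
```

With identifier 100 ("begin italic"), the one-bit table gives Sp-CR-CR-Sp-Sp-CR-Sp-Sp and the two-bit table gives LF-Tb-LF-Sp, matching sequences 431 and 441. The two-bit table halves the number of characters needed per identifier.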
Fig. 5 shows another method of providing an "invisible sequence", in which potentially visible characters are combined with character codes that "erase" them. As in the preceding example, the encoded file 500 of Fig. 5 comprises plain text 510 followed by a markup portion 520. The end 519 of the plain text 510 and the beginning of the markup portion 520 are delineated by a delimiter sequence 521. In this example, the delimiter sequence 521 comprises three repetitions of a "space" character followed by a "backspace" character. Note that a directly displayed space followed by a backspace is not "visible" and leaves no "blank" on the display. That is, the conventional cursor-position pointer is incremented when the space is produced and decremented again when the backspace is produced, so that the cursor position is effectively unchanged. On a printing device, the print head first advances to produce the space and then moves back for the backspace.
Following the markup-portion delimiter sequence 521 is the encoding of the first markup element's offset and markup type. As noted above, the first markup element in the message 100, the "begin bold" marker, has the offset position 32. In this example encoding method, the invisible-sequence encoding 560 of this offset value comprises the digits "32" 561 followed by two backspace characters 562. The markup-element encoding 570 comprises the text string "<B>" 571 followed by three backspace characters 572. That is, each markup offset and markup identifier is encoded as the text that identifies the change to the plain text 310, as shown in Fig. 3, followed by the appropriate number of backspace characters to erase that text. When the encoded file 500 is displayed directly on a conventional display device, the characters "32" 561 appear briefly at the end of the plain text 510, and the two backspace characters 562 return the cursor pointer to the end of the plain text 510. The characters "<B>" 571 then appear briefly at the end of the plain text 510, and the three backspace characters 572 again return the cursor pointer to the end of the plain text 510. In the same way, each item in the markup portion 520 is displayed briefly at the end of the plain text 510 and is immediately overwritten by the next item. At the end of the markup portion 520, a final sequence 590 of five spaces and five backspaces is appended to erase any remaining visible text. The number of spaces and backspaces in the final sequence 590 should equal the length of the longest visible markup sequence. On a display device, the final appearance of overstruck characters is that of the last characters struck, which in this case are the space characters. On a printing device, depending on the degree of buffering and processing in the device, the visible characters may be printed and then overstruck repeatedly by each item in the markup portion 520 as the backspace characters return the print head to the end of the plain text. In some applications, a few overstruck characters printed at the end of the plain-text message may be preferable to spaces and blank lines printed there. Otherwise, the encoding of Fig. 4, which uses only invisible characters, is preferable. Similarly, some legacy devices do not "process" the backspace character but instead display a symbol that represents it. If maximum compatibility with legacy devices is desired, the encoding of Fig. 4 is therefore also preferable.
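To make the self-erasing behaviour concrete, the following Python sketch builds a markup portion in the style described for Fig. 5 and simulates a display that honours backspace by moving the cursor left and overwriting. The simulator is a deliberately simplified, hypothetical model of a single-line display; the function names are assumptions, and the closing run of spaces is sized from the longest visible token as the text recommends.

```python
BS = "\b"

def self_erasing(token):
    """Follow a visible token with one backspace per character."""
    return token + BS * len(token)

def encode_markup(changes):
    """Build the markup portion: delimiter, erasable items, closing eraser."""
    parts = [(" " + BS) * 3]                 # delimiter: space/backspace, three times
    tokens = []
    for offset, tag in changes:
        tokens += [str(offset), f"<{tag}>"]
    parts += [self_erasing(token) for token in tokens]
    longest = max((len(token) for token in tokens), default=0)
    parts.append(" " * longest + BS * longest)  # blank out whatever is left showing
    return "".join(parts)

def render(stream):
    """Simulate a one-line display: backspace moves left, characters overwrite."""
    cells = []
    cursor = 0
    for char in stream:
        if char == BS:
            cursor = max(0, cursor - 1)
        elif cursor < len(cells):
            cells[cursor] = char
            cursor += 1
        else:
            cells.append(char)
            cursor += 1
    return "".join(cells)
```

Rendering plain text followed by `encode_markup([(32, "B"), (36, "/B")])` leaves only the plain text plus trailing blanks on the simulated display, which is the effect the final sequence 590 is meant to achieve.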
A display application compatible with this format processes the data in the formatted file as plain text until it encounters the markup-portion delimiter 521. Thereafter it processes each pair of markup offset and markup type, ignoring the backspace characters, and enhances the display of the plain text appropriately in response to each markup element.
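A compatible reader for the Fig. 5 style of file might look like the following Python sketch. It assumes the delimiter is exactly three space/backspace repetitions and that each item has the form of digits followed by an angle-bracket tag, as in the "32" and "<B>" example; the names `DELIMITER`, `PAIR`, and `read_self_erasing` are hypothetical.

```python
import re

BS = "\b"
DELIMITER = (" " + BS) * 3                  # three space/backspace repetitions
PAIR = re.compile(r"(\d+)<(/?[A-Za-z]+)>")  # offset digits followed by a tag

def read_self_erasing(encoded):
    """Return (plain_text, [(offset, tag), ...]) from a Fig. 5 style stream."""
    plain, found, markup = encoded.partition(DELIMITER)
    if not found:
        return encoded, []                  # no markup portion: all plain text
    visible = markup.replace(BS, "")        # ignore the erasing backspaces
    changes = [(int(offset), tag) for offset, tag in PAIR.findall(visible)]
    return plain, changes
```

The reader treats everything before the delimiter as plain text, so a file without a markup portion passes through unchanged.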
Fig. 6 shows another encoding scheme, which encodes the markup-element information "in-line" and eliminates the need to encode an offset parameter for each marker. The same backspace erasure method shown in Fig. 5 is used to encode each marker "invisibly". That is, in accordance with this aspect of the invention, the encoded file 600 appears in a form similar to the conventional rich-text format, except that each markup element 650, 660 is immediately followed by the appropriate number of backspace characters 651, 661 to erase the markup element when the encoded file 600 is displayed directly. As with Fig. 5, a compatible application processes the encoded file 600 by applying the attribute indicated by each markup element while ignoring the backspace characters associated with it. As noted above, this alternative may be unsuitable for devices that display the backspace character as a symbol, or for printers that do not buffer and preprocess backspaces, because the overstruck characters will be visually distracting. In these cases, the alternative of Fig. 4 is best for maximum compatibility with legacy devices.
Fig. 7 shows an example block diagram of an encoder 700 that processes an input document 100 to produce an encoded file 780. The encoder 700 comprises a parser 710, a markup encoder 720, and a file organizer and writer 730. The parser 710 distinguishes the text elements from the markup elements in the input document 100. The text elements 712 are passed to the file organizer and writer 730, and the markup elements 714 are passed to the markup encoder 720. The markup encoder 720 encodes each markup element as a markup-type identifier, if it is not already so encoded. If the invisible-sequence feature of the invention is used, the markup encoder 720 also encodes the markup-type identifier as an invisible sequence, using one of the encodings shown in Figs. 4-6. The encoded markup sequences 721 are passed to the file organizer and writer 730. If in-line encoding is not used, the position of each markup element 711 relative to the plain-text elements 712 is likewise conveyed as an encoded offset, using the techniques described above.
The file organizer and writer 730 prepares the text 712 and markup 721 information for storage in the encoded file 780. The term "file" is used here in a general sense to mean an organized sequence of data. It includes, for example, a file on a computer system, a sequence of bytes in memory, a sequence of packets transmitted over the Internet, and so on. If in-line encoding of the invisible sequences is used, as discussed with respect to Fig. 6, the file organizer and writer 730 simply writes the text elements 712 and the encoded markup elements 721 to the encoded file 780 in the order in which they appear in the input document 100. If in-line encoding is not used, as discussed with respect to Figs. 3-5, each text element 712 is written directly to the encoded file 780, followed by each encoded markup sequence and its offset.
Fig. 8 shows an example flow diagram for encoding an input document in accordance with various aspects of the invention. At 810, the input message is opened for processing. As with the output file of Fig. 7, the input message can take a variety of forms: a computer file, an image or web page from a display screen, input from a keyboard, and so on. Block 820 parses the input message into text elements and markup elements. Block 820 may also include means for generating markup elements according to the form of the input message. For example, if the input message is a scanned image, block 820 may be a text-recognition system that identifies the text content together with its attributes, such as bold, italic, and so on.
At 830, if the next element in the input message is a markup element, the corresponding markup sequence is determined at 836. If in-line encoding is not used, block 836 includes the offset position of the markup element in the corresponding markup sequence. If the invisible encoding scheme of the invention is used, block 836 converts the markup element to an invisible sequence. If, at 840, in-line encoding is not used, the encoded markup sequence is stored temporarily, at 842, for subsequent appending to the end of the plain-text portion of the output file. If, at 840, in-line encoding is used, the invisible sequence corresponding to the markup element is passed to block 850 to be written to the output file in the order in which it appears in the input message.
If, at 830, the next element in the input message is not a markup element, the corresponding text sequence is determined at 832 and passed to block 850 to be written to the output file. Typically, block 832 simply passes the text element directly to block 850 for writing to the output file, but any reformatting of the text of the input message that is required, such as conversion to ASCII character codes, can be performed at block 832.
After the sequence corresponding to the element of the input message has been written to the output file at 850, or stored for later use at 842, the system returns via 860 to block 820 to parse the next element, and the process continues until the end of the input message.
If, at 870, in-line formatting is not used, a delimiter marking the beginning of the markup portion is written to the output file at 875, and each stored markup sequence and its offset position are written to the output file at 878. As noted above, because these sequences and offsets are placed in the output file after all the text elements, a direct display of the output file renders the text content of the input message in an easily read form. That is, if the output file is rendered for display by an application that is not "compatible" with the encoding formats discussed here, the initial portion of the output file is still rendered as a plain-text document, without visually distracting markup elements interspersed in it.
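The flow of Fig. 8 can be approximated by the Python sketch below, with comments tying each step to the block numbers used in the text. It is a simplified reading of the flow diagram, not the patented implementation: the angle-bracket input syntax, the `encode` function name, and the use of the illustrative "{changes:" delimiter with visible "offset tag" entries are assumptions.

```python
import re

TOKEN = re.compile(r"(</?[A-Za-z]+>)")    # assumed tag syntax of the input message

def encode(message, inline=False, delimiter="{changes:"):
    """Encode a marked-up message in-line or as plain text plus a change list."""
    output = []   # what is written to the output file (block 850)
    stored = []   # markup sequences held back for the end (block 842)
    offset = 0    # number of plain-text characters written so far
    for element in TOKEN.split(message):       # block 820: parse the next element
        if not element:
            continue
        if TOKEN.fullmatch(element):            # decision 830: markup element
            if inline:                          # decision 840: in-line encoding
                output.append(element + "\b" * len(element))
            else:                               # block 836/842: record with its offset
                stored.append(f"{offset} {element[1:-1]}")
        else:                                   # block 832: text element
            output.append(element)
            offset += len(element)
    if not inline:                              # decision 870
        output.append(delimiter)                # block 875: markup-portion delimiter
        output.append(" ".join(stored))         # block 878: stored sequences
    return "".join(output)
```

In the non-in-line case the plain text always comes first, so an application that ignores the format still shows readable text before the change list begins.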
Fig. 9 shows an example block diagram of a compatible decoder 900 that operates in accordance with various aspects of the invention. The decoder 900 processes an encoded file 901 to produce a rendered output 980 that includes each attribute associated with the text elements of the input document that was used to produce the encoded file 901. The decoder 900 comprises a parser 910, a markup decoder 920, and a display driver 930. As noted above, the encoded file 901 can be a computer file, a sequence of bytes in computer memory, a sequence of packets on a communications medium, and so on. Likewise, the terms display 980 and display driver 930 are used in a general sense to include conventional computer displays and printers, and will be recognized by one of ordinary skill in the art to include intermediate display devices, for example files, web pages, applets, and the like that contain information to be rendered by rendering applications such as web browsers and other viewers.
The parser 910 separates the text elements from the encoded markup. If in-line encoding of the markup is used, the parser 910 includes a markup recognition system that recognizes each encoded markup element 911 as it occurs in the encoded file 901; otherwise, the parser 910 includes a markup-portion delimiter recognizer that identifies the end of the text elements 912 and the beginning of the markup elements 911. As noted above, techniques for distinguishing the different portions of a file or types of information data are well known in the art. The text elements 912 are provided directly to the display driver 930. The encoded markup sequences 911 are decoded by the markup decoder 920, and the decoded markup elements 921 are provided to the display driver 930.
If in-line encoding of the tag elements is used, the display driver renders each text element in its appropriate form as soon as the tag elements and text elements are received. If in-line encoding is not used, a text element 912 is displayed only after any tag elements 921 that may affect that particular text element 912 have been processed. For example, the decoder 900 can use the encoded file 901 as a "buffer" for the text elements, extracting each text element 912 at the time it is to be changed by the effect of a tag element. In the Fig. 3 example of the input message 100, the parser 910 can be designed with multiple ports into the encoded file 901: one port accesses the start of the plain text 310, and another port accesses the start of the tag section 320. When the first tag offset "32" 340 is received from the second port and decoded, the display driver 930 directs the parser 910 to provide characters from the first port, up to the 32nd character, and renders them to the output 980 without modification. The display driver 930 then produces a "bold" effect in accordance with the "B" tag 345 from the second port and, as indicated by the offset "36" 350, directs the parser 910 to present the subsequent characters from the first port, up to the 36th character. Each of the 33rd through 36th characters is rendered with the bold effect. In response to the "/B" tag 355 from the second port, the display driver 930 disables the bold effect for subsequent characters from the first port. As would be evident to one skilled in the art, this dual-port process continues to the end of the encoded file 901, producing an output 980 that contains the text and the associated attributes of the input file that was used to generate the encoded file 901.
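The two-port walk through the Fig. 3 example can be illustrated with a short sketch. It assumes the tags arrive as already-decoded `(offset, name)` pairs sorted by offset, that an offset counts the plain-text characters preceding the tag, and that only "B" and "/B" are understood; the function name `render` and the span representation are hypothetical.

```python
def render(text, tags):
    """Pair each run of text with the bold state in force when it is drawn."""
    spans = []     # (run of characters, bold flag)
    bold = False
    cursor = 0     # read position on the "first port" (plain text)
    for offset, name in tags:  # the "second port" (tag section)
        if offset > cursor:
            # Draw everything up to the tag's offset with the current effect.
            spans.append((text[cursor:offset], bold))
            cursor = offset
        if name == "B":
            bold = True
        elif name == "/B":
            bold = False
        # Any other tag name is ignored in this sketch.
    if cursor < len(text):
        spans.append((text[cursor:], bold))
    return spans


plain = "x" * 32 + "bold" + " tail"
spans = render(plain, [(32, "B"), (36, "/B")])
```

With offsets 32 and 36, the first 32 characters are drawn unmodified, characters 33 through 36 are drawn bold, and the remainder is drawn unmodified again.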
The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those of ordinary skill in the art can devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are within its spirit and scope. For example, the encoded tag sequences have been presented as being placed at the end of the plain-text portion of the encoded output file, similar to "endnotes" in a document. Alternatively, the encoded tag sequences can be placed at the end of each plain-text page or section, similar to "footnotes" or "section notes" in a document. The particular structures and sequences provided in this disclosure are for illustrative purposes. For example, the encoder 700 and the decoder 900 are shown here as stand-alone devices for completeness. As would be evident to one skilled in the art, the principles of the invention can also be implemented as a pre-processor and post-processor that transform messages encoded by a conventional encoder, for example one used to generate a MIME-format file. That is, the encoder 700 can be structured to accept a MIME-format file as input and to transform only the rich-text segments of the MIME-format file using the principles of the invention. The corresponding decoder 900 accepts this encoding of the rich-text segments and regenerates the complete (plain text plus rich text) MIME-format file, for rendering by a conventional MIME-compatible display device. Likewise, the display of text by the decoder 900 has been presented as a rendering of the text with its attributes. Alternatively, for fast, immediate display of information, the decoder 900 can be structured to render the plain-text portion to the display immediately and then add the attributes corresponding to the tag elements afterwards. In this way, analogous to images that first arrive without detail and are then enhanced with detail, an encoded document 901 downloaded from an Internet site is displayed immediately as plain text and then enhanced as time and bandwidth allow. These and other system optimization techniques will be evident to one skilled in the art in view of this disclosure, and are within the intended scope of the following claims.
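The "plain text first, attributes later" display strategy can be sketched as a generator that yields two successive renderings. This is only an illustration under stated assumptions: tags are given as sorted `(offset, name)` pairs, only "B" and "/B" are handled, and bold is marked with `**` as a stand-in for a real display effect. The function name `progressive_render` is hypothetical.

```python
def progressive_render(text, tags):
    """Yield the plain text at once, then the attribute-enhanced rendering."""
    # First pass: the plain text can be shown before any tag is decoded.
    yield text
    # Second pass: apply the tags; '**' stands in for switching bold on or off.
    pieces = []
    cursor = 0
    for offset, name in tags:
        pieces.append(text[cursor:offset])
        cursor = offset
        if name in ("B", "/B"):
            pieces.append("**")
    pieces.append(text[cursor:])
    yield "".join(pieces)


frames = list(progressive_render("Hello bold world", [(6, "B"), (10, "/B")]))
```

A caller can display the first frame as soon as the text portion has arrived and replace it with the second frame once the tag section has been received and decoded.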

Claims (15)

1. A method of encoding a message (100), wherein the message (100) comprises a plurality of text elements (110) and at least one tag element (711) that specifies the appearance of the message (100) when displayed, the method comprising:
encoding (710) the plurality of text elements (110),
encoding (720) the at least one tag element (711), which associates a type with a text element (101) of the plurality of text elements (110), to form an encoded tag sequence (721); and
clustering (730) the encoded plurality of text elements (110) separately from the encoded tag sequence (721).
2. the method for claim 1, wherein
The flag sequence of described coding (721) is encoded as the stealthy sequence of a character code.
3. method as claimed in claim 2, wherein, the stealthy sequence of described character code corresponding to the corresponding stealthy character code sequence of binary representation (421B) (431) of described at least one tagged element (711).
4. A method of encoding a message (100), the message comprising a plurality of text elements (110) and at least one tag element (711) for controlling the display of at least one text element (101) of the plurality of text elements (110), the method comprising:
enabling an encoding (720) of the at least one tag element (711) of the message (100) into a corresponding invisible tag sequence (721) of character codes, and
enabling an encoding of each text element of the plurality of text elements (110) into a visible sequence (712) of character codes.
5. The method of claim 4, further comprising:
enabling a determination of an offset (340) corresponding to the at least one tag element (711) at a location in the message (100), wherein the location is where the at least one tag element (711) is located,
enabling an encoding of the offset (561) of the at least one tag element (711) into an invisible offset sequence (560),
enabling a formation of the encodings of each text element of the plurality of text elements (110) into a cluster (510), and
enabling an appending of the invisible offset sequence and the invisible tag sequence (520) to the cluster (510) of the encodings of each text element of the plurality of text elements (110).
6. An encoder (700) for encoding an input message (100), the input message comprising a plurality of text elements (110) and at least one tag element (711) for controlling the display of at least one text element (101) of the plurality of text elements (110), the encoder comprising:
a tag encoder (720) that encodes the at least one tag element (711) into an invisible tag sequence (721) of character codes, and
a text encoder (710) that encodes each text element of the plurality of text elements (110) into a visible sequence (712) of character codes.
7. The encoder (700) of claim 6, further comprising:
a tag extractor (830) that determines an offset (842) corresponding to the at least one tag element (711), at the location of the at least one tag element (711) in the input message (100),
an offset encoder that encodes the offset into an invisible offset sequence of character codes, and
a file writer (730) that
writes the visible sequences of character codes corresponding to each text element of the plurality of text elements (110) to an output file (780) as one contiguous cluster of character codes, and
writes the invisible tag sequence of character codes and the invisible offset sequence of character codes (721) to the output file (780).
8. An encoder (700) for encoding an input message (100), the input message comprising a plurality of text elements (110) and at least one tag element (711) for controlling the display of at least one text element (101) of the plurality of text elements (110), the encoder comprising:
a tag extractor (710) that determines an offset corresponding to the at least one tag element (711), at the location of the at least one tag element (711) in the input message (100),
a tag encoder (720) that encodes the offset and the at least one tag element (711) into an encoded tag sequence (721) of character codes, and
a file organizer (730) that
clusters each text element of the plurality of text elements (110) into a contiguous cluster (510) of plain-text character codes, and
appends the encoded tag sequence (721) of character codes (520) to the contiguous cluster (510) of plain-text character codes, to form an encoding (500) of the input message (100).
9. A decoder (900) for decoding an input message (901), the input message comprising at least one invisible tag sequence (911) of character codes and at least one visible sequence of character codes, the decoder comprising:
a tag decoder (920) that decodes the at least one invisible tag sequence of character codes into a tag element (921), and
a display driver (930) that renders the at least one visible sequence (912) of character codes in dependence upon the tag element (921).
10. The decoder (900) of claim 9, wherein
the tag decoder (930) further decodes the invisible tag sequence into an offset (340) corresponding to the tag element (921), and
the display driver (930) further renders the at least one visible sequence (912) of character codes in dependence upon the offset (340).
11. A decoder (900) for decoding an input message (300), the input message comprising a contiguous plain-text segment (310) and at least one tag sequence (320), the decoder comprising:
a tag decoder (921) that decodes the at least one tag sequence (320) into a tag element (345) and a tag offset (340), and
a display driver (930) that renders the contiguous plain-text segment (310) with an appearance that depends upon the tag element (345) and the tag offset (340).
12. The decoder (900) of claim 11, wherein the at least one tag sequence (320) is an invisible tag sequence (431) of character codes, and
the tag decoder (921) decodes the invisible tag sequence (431) into the tag element (345) and the tag offset (340).
13. An encoded message (500) corresponding to an original message (100), the original message (100) having a plurality of text elements (110) and at least one tag element (121) that specifies the appearance of the original message (100) when displayed, the encoded message (500) comprising:
a contiguous plain-text segment (510) corresponding to the plurality of text elements (110), and
at least one encoded tag sequence (570) corresponding to the at least one tag element (121).
14. The encoded message (500) of claim 13, wherein
the encoded tag sequence (570) is encoded as an invisible sequence of character codes.
15. An encoded message (500) corresponding to an original message (100), the original message (100) having a plurality of text elements (110) and at least one tag element (121) that specifies the appearance of the original message when displayed, the encoded message (500) comprising:
a plurality of visible sequences (510) of character codes corresponding to the plurality of text elements (110), and
at least one invisible tag sequence (570) of character codes corresponding to the at least one tag element (121).
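Several of the claims refer to an invisible sequence of character codes that corresponds to a binary representation of a tag element. One way such a mapping could work is sketched below. The choice of characters is an assumption made purely for illustration: the sketch uses the Unicode ZERO WIDTH SPACE and ZERO WIDTH NON-JOINER as the two "invisible" symbols, which this section of the text does not specify, and the function names are hypothetical.

```python
# Assumed symbol choice for this sketch only; not taken from the patent text.
ZERO = "\u200b"  # ZERO WIDTH SPACE, stands for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, stands for bit 1


def to_invisible(tag: str) -> str:
    """Map each bit of the tag's ASCII bytes to one invisible character."""
    bits = "".join(f"{byte:08b}" for byte in tag.encode("ascii"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def from_invisible(sequence: str) -> str:
    """Recover the tag from a sequence produced by to_invisible."""
    bits = "".join("1" if ch == ONE else "0" for ch in sequence)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


hidden = to_invisible("/B")
restored = from_invisible(hidden)
```

Each tag character costs eight invisible characters in this scheme, so it trades file size for the property that an unaware viewer shows nothing where the tag is stored.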
CN00801713A 1999-06-15 2000-06-07 Invisible encoding of attribute data in character based documents and files Pending CN1335966A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US33363299A 1999-06-15 1999-06-15
US09/333,632 1999-06-15

Publications (1)

Publication Number Publication Date
CN1335966A true CN1335966A (en) 2002-02-13

Family

ID=23303608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN00801713A Pending CN1335966A (en) 1999-06-15 2000-06-07 Invisible encoding of attribute data in character based documents and files

Country Status (4)

Country Link
EP (1) EP1145140A3 (en)
JP (1) JP2003502735A (en)
CN (1) CN1335966A (en)
WO (1) WO2000077677A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110979B (en) * 2006-07-19 2011-06-22 阿里巴巴集团控股有限公司 Method, device and system for message transmission
CN1530857B (en) * 2003-03-05 2011-11-16 惠普开发有限公司 Method and device for document and pattern distribution
CN107172436A (en) * 2017-06-09 2017-09-15 国政通科技股份有限公司 A kind of method and system of ID card information transmission protection
CN111144073A (en) * 2019-12-30 2020-05-12 文思海辉智科科技有限公司 Blank character visualization method and device in online text display system
CN117556782A (en) * 2024-01-11 2024-02-13 深圳市度申科技有限公司 Text formatting method, electronic equipment and computer readable storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1410619B1 (en) 2001-06-12 2006-10-25 International Business Machines Corporation Method of invisibly embedding and hiding data into soft-copy text documents
US7475429B2 (en) 2001-06-12 2009-01-06 International Business Machines Corporation Method of invisibly embedding into a text document the license identification of the generating licensed software
CN1323365C (en) * 2001-06-12 2007-06-27 国际商业机器公司 Method of authenticating plurality of files linked to text document
KR100451180B1 (en) * 2001-11-28 2004-10-02 엘지전자 주식회사 Method for transmitting message service using tag
JP2003196270A (en) * 2001-12-27 2003-07-11 Sharp Corp Document information processing method, document information processor, communication system, computer program and recording medium
JP4184155B2 (en) * 2003-05-22 2008-11-19 シャープ株式会社 Data processing apparatus, data processing method, data processing program, and computer-readable recording medium recording the data processing program
EP1628227A4 (en) * 2003-05-22 2010-07-07 Sharp Kk Data processing device, data processing method, data processing program, and computer-readable recording medium containing the data processing program
JP2008016048A (en) * 2003-09-22 2008-01-24 Fujitsu Ltd Program, information processor, and method for processing invisible character
JP4628450B2 (en) * 2008-07-01 2011-02-09 シャープ株式会社 Data processing apparatus, data processing method, data processing program, and recording medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4370645A (en) * 1981-06-16 1983-01-25 International Business Machines Corporation Ghost cursor in display all codes mode
US4749289A (en) * 1986-06-13 1988-06-07 Brother Kogyo Kabushiki Kaisha Printing device for attribute printing

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1530857B (en) * 2003-03-05 2011-11-16 惠普开发有限公司 Method and device for document and pattern distribution
CN101110979B (en) * 2006-07-19 2011-06-22 阿里巴巴集团控股有限公司 Method, device and system for message transmission
CN107172436A (en) * 2017-06-09 2017-09-15 国政通科技股份有限公司 A kind of method and system of ID card information transmission protection
CN107172436B (en) * 2017-06-09 2019-11-26 国政通科技股份有限公司 A kind of method and system of ID card information transmission protection
CN111144073A (en) * 2019-12-30 2020-05-12 文思海辉智科科技有限公司 Blank character visualization method and device in online text display system
CN111144073B (en) * 2019-12-30 2021-11-16 文思海辉智科科技有限公司 Blank character visualization method and device in online text display system
CN117556782A (en) * 2024-01-11 2024-02-13 深圳市度申科技有限公司 Text formatting method, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2000077677A2 (en) 2000-12-21
WO2000077677A3 (en) 2001-05-03
EP1145140A2 (en) 2001-10-17
JP2003502735A (en) 2003-01-21
EP1145140A3 (en) 2002-11-13


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication