CN104516859A - Character correcting method and system - Google Patents

Character correcting method and system

Info

Publication number
CN104516859A
Authority
CN
China
Prior art keywords
word
font
wide
width
mistake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310447805.6A
Other languages
Chinese (zh)
Other versions
CN104516859B (en
Inventor
孙浩鹏
丁力
董宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fangzheng Apapi Technology Co Ltd
New Founder Holdings Development Co ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd, Beijing Founder Apabi Technology Co Ltd filed Critical Peking University Founder Group Co Ltd
Priority to CN201310447805.6A priority Critical patent/CN104516859B/en
Publication of CN104516859A publication Critical patent/CN104516859A/en
Application granted granted Critical
Publication of CN104516859B publication Critical patent/CN104516859B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Controls And Circuits For Display Device (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a character correction method and system. The method comprises: acquiring at least one font used by the characters in a document; for each font, identifying, from the widths of characters in that font, the fonts whose glyph description information is erroneous; and correcting the characters in those fonts. Because character width information reflects the glyph description information well, checking and then correcting it solves typesetting problems such as uneven line spacing, excessive gaps between characters, and overlapping characters. Equal-width and non-equal-width characters are corrected separately according to their characteristics, which effectively improves the readability and appearance of the content.

Description

A character correction method and system
Technical field
The present invention relates to the field of computer text processing, and specifically to a character correction method and system.
Background art
A computer font, or font for short, is a data set containing a set of glyphs and a set of character codes. It contains the mapping from character codes to glyphs and the description information of each glyph. A glyph bitmap can be rendered from the glyph description information. Computer text is generally presented to users in the form of documents. A reflowable (flow) document is a document that has no fixed layout and whose content can be arranged according to the actual size of the medium on which it is displayed. ePub and TXT are two common reflowable formats. PDF and CEBX documents that carry structural information can also be regarded as reflowable documents.
With the spread of smartphones and the rise of the mobile Internet, more and more people read on mobile devices, and reflowable documents are the format best suited to mobile reading. Mobile devices come in many models with different screen resolutions and sizes; because reflowable content can be rearranged, it can be presented sensibly on whatever device is in use, which greatly improves the readability and appearance of the content.
However, for historical reasons some reflowable documents contain carelessly produced embedded fonts. These fonts carry erroneous glyph description information: some contain wrong line-height information, some contain wrong character-width information, and so on. This disrupts the normal typesetting process, and the typeset content shows uneven line spacing, excessive gaps between characters, or even characters overlapping one another, which greatly harms the readability and appearance of the content.
Some typesetting methods for reading documents on terminals already exist in the prior art. In one such scheme, when web page layout starts on a mobile terminal, or when the font changes during layout, the method judges whether the characters to be typeset can be treated as equal-width; if so, it typesets them using character width data stored in advance on the terminal. The scheme makes this judgment by comparing the length of a reference string with the actual length of the characters to be typeset. It can handle equal-width scripts such as Chinese, Japanese, and Korean, but it does not work for documents that contain both equal-width and non-equal-width text (such as English or German), because the width of a non-equal-width word varies greatly with the characters it contains. Moreover, these schemes cannot determine whether the glyph description information in a document is erroneous and cannot handle the two kinds of characters separately, so typesetting errors for equal-width and non-equal-width characters go undetected and the typesetting result is poor.
Summary of the invention
The technical problem to be solved by the present invention is therefore that prior-art typesetting methods cannot reliably detect, and promptly correct, errors in the glyph description information of documents that contain both equal-width and non-equal-width characters. The invention accordingly proposes a character correction method and system capable of correcting such documents.
To solve the above technical problem, the present invention provides a character correction method and system.
A character correction method, comprising:
acquiring at least one font used by the characters in a document;
for each font, identifying, according to the character widths of characters in that font, the fonts whose glyph description information is erroneous;
correcting the characters in the fonts whose glyph description information is erroneous.
In the method of the present invention, identifying the fonts whose glyph description information is erroneous according to the character widths of characters in the font comprises:
selecting at least two characters in the font, calculating for each character the ratio of its character width to the width of its glyph bounding box, and calculating the mean of these ratios;
determining, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous.
In the method of the present invention, the threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters.
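The detection step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name, the `kind` labels, and the reading of each threshold as a closed interval (a font is flagged when the mean ratio falls outside it) are assumptions made here.

```python
from statistics import mean

# Threshold ranges quoted in the text. Treating each as a closed interval
# [low, high] is an assumption of this sketch.
RATIO_RANGE = {
    "equal_width": (0.9, 1.1),
    "non_equal_width": (1.2, 1.4),
}


def font_is_erroneous(samples, kind):
    """Return True if the font's width information looks wrong.

    samples: list of (declared_width, bbox_width) pairs, one per sampled
             character, both measured at the same font size.
    kind:    "equal_width" or "non_equal_width" (hypothetical labels).
    """
    if len(samples) < 2:
        raise ValueError("at least two sample characters are required")
    # Ratio of declared character width to glyph bounding-box width.
    ratios = [declared / bbox for declared, bbox in samples]
    low, high = RATIO_RANGE[kind]
    return not (low <= mean(ratios) <= high)
```

For example, two equal-width samples with declared width 1.0 and bounding-box widths 0.95 and 0.96 give a mean ratio of about 1.05, which lies inside 0.9-1.1, so the font would not be flagged.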
In the character correction method of the present invention, correcting the characters in the fonts whose glyph description information is erroneous comprises:
correcting the character width of a character according to the width of its glyph bounding box.
In the character correction method of the present invention, the character width is corrected with the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding box, and a is a correction factor: a = 1.01-1.1 for equal-width characters and a = 1.22-1.35 for non-equal-width characters.
In the character correction method of the present invention, the correction factor is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
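A short sketch of the formula W = D × a, using the ranges and preferred factors stated above. The function name, the `kind` labels, and the range check are illustrative assumptions.

```python
# Correction-factor ranges and preferred values quoted in the text.
COEFF_RANGE = {"equal_width": (1.01, 1.1), "non_equal_width": (1.22, 1.35)}
PREFERRED_COEFF = {"equal_width": 1.05, "non_equal_width": 1.3}


def corrected_width(bbox_width, kind, a=None):
    """Compute W = D * a for one character.

    bbox_width: D, the glyph bounding-box width.
    kind:       "equal_width" or "non_equal_width" (hypothetical labels).
    a:          correction factor; defaults to the preferred value.
    """
    if a is None:
        a = PREFERRED_COEFF[kind]
    low, high = COEFF_RANGE[kind]
    if not low <= a <= high:
        raise ValueError(f"correction factor {a} outside {low}-{high}")
    return bbox_width * a
```

With D = 0.9 and the preferred equal-width factor 1.05, the corrected width is 0.9 × 1.05 = 0.945.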
In the character correction method of the present invention, the step of identifying, for each font, the fonts whose glyph description information is erroneous is carried out on embedded fonts.
In the character correction method of the present invention, correcting the characters in the fonts whose glyph description information is erroneous comprises: for each such character, first judging whether it is an equal-width character; for equal-width characters, calculating a sample mean and correcting it; for non-equal-width characters, obtaining the glyph bounding-box width of the character and correcting it.
In the character correction method of the present invention, the sample mean for equal-width characters is calculated and corrected as follows:
First, judge whether the character width for this font has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-box widths of the samples and calculate their mean, then apply the correction factor to obtain the corrected width. If it has already been calculated, use the earlier result directly.
The character correction method of the present invention further comprises a typesetting step. Characters in fonts whose glyph description information is erroneous are typeset after correction; characters in fonts without such errors are typeset using the character width obtained directly from the font. In the typesetting process, the width of the current character is added to the width of the text already laid out on the current line. If the sum exceeds the width of the layout region, the laid-out width is reset, the line baseline is advanced by one line height, and the character is then placed at the end of the new current line. If the sum does not exceed the width of the layout region, the character is placed directly at the end of the current line.
A character correction system, comprising:
an acquiring unit, which acquires at least one font used by the characters in a document;
a selection unit, which, for each font, identifies according to the character widths of characters in that font the fonts whose glyph description information is erroneous;
a correction unit, which corrects the characters in the fonts whose glyph description information is erroneous.
In the system of the present invention, the selection unit comprises:
a mean calculation module, which selects at least two characters in the font, calculates for each character the ratio of its character width to the width of its glyph bounding box, and calculates the mean of these ratios;
a judgment module, which determines, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous.
In the system of the present invention, the threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters.
In the character correction system of the present invention, the correction unit comprises a correction module, which corrects the character width of a character according to the width of its glyph bounding box.
In the character correction system of the present invention, the correction module uses the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding box, and a is a correction factor: a = 1.01-1.1 for equal-width characters and a = 1.22-1.35 for non-equal-width characters.
In the character correction system of the present invention, the correction factor is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In the character correction system of the present invention, the selection unit processes embedded fonts.
In the character correction system of the present invention, the correction unit comprises:
a judgment module, which, for each character in a font whose glyph description information is erroneous, first judges whether it is an equal-width character;
an equal-width character correction module, which calculates a sample mean for equal-width characters and corrects it;
a non-equal-width character correction module, which obtains the glyph bounding-box width of a non-equal-width character and corrects it.
In the character correction system of the present invention, the equal-width character correction module comprises:
a judgment submodule, which judges whether the character width for this font has already been calculated;
a calculation submodule, which, if the width has not been calculated, selects sample equal-width characters that use the font, obtains the glyph bounding-box widths of the samples and calculates their mean, then applies the correction factor to obtain the corrected width;
a calling submodule, which, if the width has already been calculated, uses the earlier result directly.
The character correction system of the present invention further comprises a typesetting unit. Characters in fonts whose glyph description information is erroneous are typeset after correction; characters in fonts without such errors are typeset using the character width obtained directly from the font. In the typesetting process, the width of the current character is added to the width of the text already laid out on the current line. If the sum exceeds the width of the layout region, the laid-out width is reset, the line baseline is advanced by one line height, and the character is then placed at the end of the new current line. If the sum does not exceed the width of the layout region, the character is placed directly at the end of the current line.
Compared with the prior art, the above technical solution of the present invention has the following advantages:
(1) The character correction method and system of the present invention acquire at least one font used by the characters in a document, identify for each font, from the character widths of characters in that font, the fonts whose glyph description information is erroneous, and correct the characters in those fonts. Character width information reflects the glyph description information well, so checking and then correcting it solves problems that may arise during typesetting, such as uneven line spacing, excessive gaps between characters, and overlapping characters, and effectively improves the readability and appearance of the content.
(2) In the character correction method of the present invention, fonts with erroneous glyph description information are identified as follows: at least two characters in the font are selected, the ratio of each character's width to the width of its glyph bounding box is calculated, and the mean of these ratios is calculated; whether the glyph description information of the font is erroneous is determined according to whether the mean exceeds a threshold. The threshold is 0.9-1.1 for equal-width characters and 1.2-1.4 for non-equal-width characters. This method effectively determines whether the glyph description information is erroneous and improves the error recognition rate.
(3) In the character correction method of the present invention, the character width is corrected according to the width of the character's glyph bounding box, using the formula W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding box, and a is a correction factor. For equal-width characters a = 1.01-1.1, preferably a = 1.05; for non-equal-width characters a = 1.22-1.35, preferably a = 1.3. Correcting equal-width and non-equal-width characters with separate factors makes the corrected text neater and better looking.
(4) The character correction method of the present invention processes embedded fonts only. Non-embedded fonts typeset normally without processing, and embedded fonts with correct glyph description information also need no processing. The method therefore finds the embedded fonts that contain erroneous glyph description information and processes only those, which reduces the amount of processing and increases processing speed. In addition, the use of sample statistics makes the method generally applicable and simple to implement.
(5) In the character correction method of the present invention, for each character that uses a font containing erroneous glyph description information, the method first judges whether the character is equal-width; for equal-width characters it calculates a sample mean and corrects it, and for non-equal-width characters it obtains the glyph bounding-box width of the character and corrects it. Different correction strategies are used for the two kinds of characters. Equal-width characters such as Chinese and Korean all have the same width, so the width is calculated once and reused afterwards; non-equal-width characters must be processed individually, character by character. This targeted correction improves the accuracy of the calculation.
(6) In the character correction method of the present invention, for fonts containing erroneous glyph description information the character width is obtained by correction. The width of the current character is added to the width of the text already laid out on the current line, and the sum is compared with the width of the layout region: if it is not greater, the character is inserted directly; otherwise a line break is made before the character is inserted. This ensures that the characters are inserted in order, one after another, and guarantees the typesetting result.
Brief description of the drawings
To make the content of the present invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a flowchart of the character correction method of Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the character correction method of another embodiment of the present invention;
Fig. 3 is a structural diagram of the character correction system in an embodiment of the present invention;
Fig. 4 is a flowchart of the judgment step in an embodiment of the present invention;
Fig. 5 is a flowchart of the error correction step in an embodiment of the present invention;
Fig. 6 is a flowchart of the typesetting step in an embodiment of the present invention;
Fig. 7 shows the result of typesetting with the character correction method described in an embodiment of the present invention.
Detailed description of the embodiments
Specific embodiments of the character correction method and system of the present invention are given below and described in detail with reference to the drawings, so that the way the invention applies technical means to solve the technical problem and achieve its technical effect can be fully understood and put into practice.
Embodiment 1:
This embodiment provides a character correction method, whose flowchart is shown in Fig. 1, comprising:
(1) Acquire at least one font used by the characters in a document.
(2) For each font, identify, according to the character widths of characters in that font, the fonts whose glyph description information is erroneous. The detailed process is as follows:
Select at least two characters in the font, calculate for each character the ratio of its character width to the width of its glyph bounding box, and calculate the mean of these ratios. Determine, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous. For equal-width characters the threshold is 0.9, 1, or 1.1; for non-equal-width characters the threshold is 1.2, 1.3, or 1.4.
(3) Correct the characters in the fonts whose glyph description information is erroneous. The character width is corrected according to the width of the character's glyph bounding box, using the formula W = D × a, where W is the corrected character width and D is the width of the character's glyph bounding box. The correction factor is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In other implementations, the correction factor a can be chosen as required. For equal-width characters a generally lies in the range 1.01-1.1; for non-equal-width characters a generally lies in the range 1.22-1.35.
Embodiment 2:
This embodiment provides a character correction system corresponding to Embodiment 1, comprising:
(1) An acquiring unit, which acquires at least one font used by the characters in a document.
(2) A selection unit, which, for each font, identifies according to the character widths of characters in that font the fonts whose glyph description information is erroneous. The detailed process is as follows:
Select at least two characters in the font, calculate for each character the ratio of its character width to the width of its glyph bounding box, and calculate the mean of these ratios. Determine, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous. For equal-width characters the threshold is 0.9, 1, or 1.1; for non-equal-width characters the threshold is 1.2, 1.3, or 1.4.
(3) A correction unit, which corrects the characters in the fonts whose glyph description information is erroneous. The character width is corrected according to the width of the character's glyph bounding box, using the formula W = D × a, where W is the corrected character width and D is the width of the character's glyph bounding box. The correction factor is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
In other implementations, the correction factor a in the correction unit can be chosen as required. For equal-width characters a generally lies in the range 1.01-1.1; for non-equal-width characters a generally lies in the range 1.22-1.35.
Embodiment 3:
This embodiment provides another character correction method, comprising:
(1) Acquire at least one font used by the characters in a document.
(2) Process the embedded fonts: for each embedded font, identify, according to the character widths of characters in that font, whether its glyph description information is erroneous.
Select five characters in the font as samples, calculate for each character the ratio of its character width to the width of its glyph bounding box, and calculate the mean of these ratios. Determine, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous. For equal-width characters the threshold is 1.1; for non-equal-width characters the threshold is 1.3.
(3) Correct the characters in the fonts whose glyph description information is erroneous. For each such character, first judge whether it is an equal-width character.
For equal-width characters, calculate a sample mean and correct it, as follows:
First, judge whether the character width for this font has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-box widths of the samples and calculate their mean, then apply the correction factor to obtain the corrected width, using the formula W = D × a, where W is the corrected character width, D is the sample mean of the glyph bounding-box widths, and the correction factor is a = 1.07. If it has already been calculated, use the earlier result directly.
For non-equal-width characters, obtain the glyph bounding-box width of the character and correct it with the correction factor: W = D × a, where W is the corrected character width, D is the width of the character's glyph bounding box, and a = 1.25.
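The two correction strategies of this embodiment can be sketched as below. The class and method names are hypothetical, and the per-font cache is one possible way to realise "use the earlier result directly"; the factors 1.07 and 1.25 are the ones given in this embodiment.

```python
from statistics import mean


class WidthCorrector:
    """Sketch of the Embodiment 3 correction step (names are assumptions)."""

    def __init__(self, a_equal=1.07, a_non_equal=1.25):
        self.a_equal = a_equal
        self.a_non_equal = a_non_equal
        self._equal_width_cache = {}  # font id -> corrected width

    def equal_width(self, font_id, sample_bbox_widths):
        """Corrected width shared by all equal-width characters of a font.

        Computed once per font from the mean bounding-box width of the
        samples, then served from the cache on later calls.
        """
        if font_id not in self._equal_width_cache:
            d = mean(sample_bbox_widths)
            self._equal_width_cache[font_id] = d * self.a_equal
        return self._equal_width_cache[font_id]

    def non_equal_width(self, bbox_width):
        """Corrected width of one non-equal-width character (no caching)."""
        return bbox_width * self.a_non_equal
```

Caching by font reflects the observation in the text that every equal-width character in a font has the same width, whereas each non-equal-width character needs its own bounding-box lookup.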
In other embodiments, a typesetting step follows the above correction process, as shown in Fig. 2. Characters in fonts whose glyph description information is erroneous are typeset after correction; characters in fonts without such errors are typeset using the character width obtained directly from the font. In the typesetting process, the width of the current character is added to the width of the text already laid out on the current line. If the sum exceeds the width of the layout region, the laid-out width is reset, the line baseline is advanced by one line height, and the character is then placed at the end of the new current line. If the sum does not exceed the width of the layout region, the character is placed directly at the end of the current line.
Embodiment 4:
This embodiment provides a character correction system, comprising:
(1) A parsing unit (also called the acquiring unit), which acquires at least one font used by the characters in a document.
(2) A judgment unit (also called the selection unit), which, for each font, identifies according to the character widths of characters in that font the fonts whose glyph description information is erroneous. It comprises:
A mean calculation module, which selects at least two characters in the font, calculates for each character the ratio of its character width to the width of its glyph bounding box, and calculates the mean of these ratios.
A judgment module, which determines, according to whether the mean exceeds a threshold, whether the glyph description information of the font is erroneous.
(3) An error correction unit (also called the correction unit), which corrects the characters in the fonts whose glyph description information is erroneous. It comprises:
3.1 A judgment module, which, for each character in a font whose glyph description information is erroneous, first judges whether it is an equal-width character.
3.2 An equal-width character correction module, which calculates a sample mean for equal-width characters and corrects it, comprising:
A judgment submodule, which judges whether the character width for this font has already been calculated.
A calculation submodule, which, if the width has not been calculated, selects sample equal-width characters that use the font, obtains the glyph bounding-box widths of the samples and calculates their mean, then applies the correction factor to obtain the corrected width.
A calling submodule, which, if the width has already been calculated, uses the earlier result directly.
3.3 A non-equal-width character correction module, which obtains the glyph bounding-box width of a non-equal-width character and corrects it.
(4) A typesetting unit. Characters in fonts whose glyph description information is erroneous are typeset after correction; characters in fonts without such errors are typeset using the character width obtained directly from the font. In the typesetting process, the width of the current character is added to the width of the text already laid out on the current line. If the sum exceeds the width of the layout region, the laid-out width is reset, the line baseline is advanced by one line height, and the character is then placed at the end of the new current line. If the sum does not exceed the width of the layout region, the character is placed directly at the end of the current line.
Embodiment 5:
A specific embodiment of the present invention is given below: a character correction method comprising the following steps.
(1) Parsing step: receive the text information of a document, for example a piece of reflowable text content (an ePub reflowable document in this embodiment), and parse out the font resources and text content it contains; see step 205 in Fig. 2.
The font resources are font data: a data set containing a set of glyphs and a set of character codes, which describes the mapping from character codes to glyphs and the description information of each glyph. Put simply, a glyph describes what each character looks like when drawn, together with information such as its width and height. A font can exist as a separate file or can be embedded in a document as part of the document.
The text content comprises the order of the characters, the font name or number that the system assigns to identify each font, the font size that describes how large each character is drawn, and the Unicode code value of each character. The text content can also be regarded as a set of text descriptions, where a text description mainly comprises a code value (which determines which character it is) and the font it uses. The parsing step resolves the received data into font resources and text content. In this embodiment, different font data are processed separately: because different fonts have different character widths and different correction parameters, processing each font separately improves the judgment made for each font and the effect of the correction, thereby greatly improving the typesetting result.
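One way to represent the parsed text content is sketched below. The `TextItem` record and `group_by_font` helper are hypothetical names introduced here; the fields mirror the items listed above (Unicode code value, font identifier, font size), and list order stands in for character order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TextItem:
    """One text description: which character, in which font, at what size."""
    code_point: int   # Unicode code value of the character
    font_id: str      # name or number assigned to identify the font
    font_size: float  # size at which the character is drawn


def group_by_font(items):
    """Group text items by font so each font can be judged separately."""
    groups = defaultdict(list)
    for item in items:
        groups[item.font_id].append(item)
    return dict(groups)
```

Grouping by font up front matches the embodiment's choice to judge and correct each font independently.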
(2) Judgment step: find the embedded fonts that contain erroneous glyph description information. In this step each font resource is judged in turn to identify the fonts that contain erroneous glyph description information; see step 210 in Fig. 2. First judge whether the font under analysis is an embedded font. If it is a non-embedded font, it can be regarded as correct. An embedded font is processed as follows: select a number of sample characters from the parsed text content, all of which use the font being judged and none of which is a punctuation mark; for each sample, obtain the character width and the glyph bounding-box width at the same font size; calculate the ratio of character width to glyph bounding-box width for each sample, and calculate the mean of these ratios. Then judge whether the mean falls outside the threshold range. If it does (too large or too small), mark the font as an embedded font containing erroneous glyph description information.
In this step, the mean ratio of character width to glyph bounding-box width over the samples is compared with the threshold range to find the embedded fonts containing erroneous glyph description information. Non-embedded fonts typeset normally without processing, and embedded fonts with correct glyph description information also need no processing. The method therefore finds the embedded fonts that contain erroneous glyph description information and processes only those, which reduces the amount of processing and increases processing speed. In addition, the use of sample statistics makes the method generally applicable and simple to implement.
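The sample selection described in the judgment step might look like the sketch below. The function name and the default of five samples are illustrative; using the Unicode general category to detect punctuation and additionally skipping whitespace (whose bounding box would be empty) are assumptions of this sketch, not requirements stated in the text.

```python
import unicodedata


def pick_samples(text_items, font_id, count=5):
    """Pick up to `count` distinct non-punctuation characters of one font.

    text_items: iterable of (code_point, font_id) pairs in document order.
    """
    samples = []
    for code_point, item_font in text_items:
        ch = chr(code_point)
        if item_font != font_id or ch in samples:
            continue
        # Unicode general categories starting with "P" are punctuation.
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            continue
        samples.append(ch)
        if len(samples) == count:
            break
    return samples
```

The selected characters would then be measured (character width and glyph bounding-box width at the same font size) and passed to the ratio test.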
(3) Error correction step: for each character that uses a font judged in the judgment step to contain erroneous glyph description information, first judge whether it is an equal-width character, and correct it accordingly; see step 215 in Fig. 2. The judgment is made from the character's Unicode code value: the character is either an equal-width character (such as Chinese, Japanese, or Korean) or a non-equal-width character. For equal-width characters, the character width needs to be calculated only once per font: take the mean glyph bounding-box width of a number of sample characters and correct it with the correction factor (an empirically determined value); the result is the character width of all equal-width characters in the font. Because the width is calculated only once per font, subsequent processing first judges whether the width has already been calculated. If not, select sample equal-width characters that use the font, obtain the glyph bounding-box widths of the samples, calculate their mean, and correct it with the correction factor. If it has, use the previously calculated width for all equal-width characters of the font directly. For non-equal-width characters, the width must be calculated for each character individually: first obtain the glyph bounding-box width of the character, then correct it with the correction factor (an empirical value). The corrected width is then multiplied by the font size to obtain the actual character width, which is the correct character width.
In this error correction step, for each character that uses a font containing erroneous glyph description information, the method first judges whether the character is equal-width; for equal-width characters it calculates a sample mean and corrects it, and for non-equal-width characters it obtains the glyph bounding-box width of the character and corrects it. Different correction strategies are used for the two kinds of characters. Equal-width characters such as Chinese and Korean all have the same width, so the width is calculated once and reused afterwards; non-equal-width characters must be processed individually, character by character. This targeted correction improves the accuracy of the calculation. Because the correction works on relative values, the corrected value must be multiplied by the font size to obtain the actual character width.
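The text does not say which Unicode ranges count as equal-width, so the sketch below substitutes a stand-in: the East Asian Width property from Python's `unicodedata` module, treating "W" (wide) and "F" (fullwidth) characters as equal-width. That choice, and both function names, are assumptions. The second function shows the final scaling from a relative width to an actual width.

```python
import unicodedata


def is_equal_width(code_point):
    """Classify a character from its Unicode code value.

    Stand-in rule (an assumption of this sketch): East Asian Width
    "W" or "F", which covers CJK ideographs, kana, and Hangul syllables.
    """
    return unicodedata.east_asian_width(chr(code_point)) in ("W", "F")


def actual_width(relative_width, font_size):
    """Scale a corrected relative width by the font size."""
    return relative_width * font_size
```

For instance, a corrected relative width of 1.05 drawn at font size 16 yields an actual width of about 16.8 in the same units as the font size.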
(4) Typesetting step: typeset all the characters. Typesetting places the characters of the text stream one by one into a layout region that fits the size of the display medium, arranging the reflowable text and breaking lines. The characters are arranged one after another according to the corrected character width information, which determines the exact coordinates of each character on the display medium; see step 220 in Fig. 2. Each character in the text stream is processed in turn. For embedded fonts containing erroneous glyph description information, the correct character width is obtained by the error correction step; for embedded fonts without such errors and for non-embedded fonts, the character width is obtained directly. The width of the current character is then added to the width of the text already laid out on the current line, and the sum is compared with the width of the layout region. If the sum is greater than the width of the layout region, the laid-out width of the current line is reset, the line baseline is advanced by the line height (the line height being the maximum character height in the previous line), and the character is placed at the end of the new current line. If the sum is not greater than the width of the layout region, the character is placed directly at the end of the current line. For the detailed flowchart, see Fig. 5. In this step, for embedded fonts containing erroneous glyph description information the character width is obtained by the error correction step; the width of the current character is added to the width of the text already typeset, and the sum is compared with the width of the layout region: if it is not greater, the character is inserted directly; otherwise a line break is made before the character is inserted. This ensures that the characters are inserted in order, one after another, and guarantees the typesetting result.
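The line-breaking rule of the typesetting step can be sketched as follows. The function name, the coordinate convention (x grows rightward, the first baseline is at 0), and the tuple-based input are assumptions; the rule itself (break when the accumulated width would exceed the region width, advance the baseline by the tallest character of the line just finished) follows the description above.

```python
def layout_lines(chars, region_width):
    """Assign an (x, baseline_y) position to each character.

    chars: list of (width, height) pairs using already-corrected widths.
    """
    positions = []
    used = 0.0             # width already laid out on the current line
    baseline = 0.0         # baseline of the current line
    line_max_height = 0.0  # tallest character on the current line
    for width, height in chars:
        if used + width > region_width:
            # Line break: advance by the previous line's height, reset width.
            baseline += line_max_height
            used = 0.0
            line_max_height = 0.0
        positions.append((used, baseline))
        used += width
        line_max_height = max(line_max_height, height)
    return positions
```

A character wider than the whole region is simply placed at the start of a fresh line in this sketch; a real layout engine would need a policy for that case.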
The character correction method of this embodiment comprises a parsing step, a judging step, an error correction step and a typesetting step. The font resources and text content are parsed first; the embedded fonts containing wrong glyph description information are then identified according to the judgment criterion; each such font is then checked for equal width, and equal-width and non-equal-width characters are corrected separately. This resolves problems that may arise during typesetting, such as uneven line spacing, excessive gaps between characters, or overlapping characters, and effectively improves the readability and appearance of the content.
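The separate handling of equal-width and non-equal-width characters can be sketched in Python as follows. This is only an illustration under assumptions: the class `WidthCorrector`, the helper `is_equal_width` and its Unicode ranges are hypothetical stand-ins, and the correction factors 1.05 and 1.3 are the empirical values given in the claims of this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Empirical correction factors stated in this document.
EQUAL_WIDTH_FACTOR = 1.05
NON_EQUAL_WIDTH_FACTOR = 1.3


def is_equal_width(ch: str) -> bool:
    """Rough, assumed test: CJK ideographs, Hangul syllables,
    CJK punctuation and fullwidth forms count as equal-width."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF or 0xAC00 <= cp <= 0xD7A3
            or 0x3000 <= cp <= 0x303F or 0xFF01 <= cp <= 0xFF60)


@dataclass
class WidthCorrector:
    bbox_widths: Dict[str, float]  # glyph bounding-box width at font size 1
    samples: List[str]             # equal-width sample characters of the font
    _cached_equal_width: Optional[float] = None

    def relative_width(self, ch: str) -> float:
        if is_equal_width(ch):
            # Computed once from the sample mean, then reused.
            if self._cached_equal_width is None:
                mean = sum(self.bbox_widths[c] for c in self.samples) / len(self.samples)
                self._cached_equal_width = mean * EQUAL_WIDTH_FACTOR
            return self._cached_equal_width
        # Non-equal-width characters are corrected one by one.
        return self.bbox_widths[ch] * NON_EQUAL_WIDTH_FACTOR

    def actual_width(self, ch: str, font_size: float) -> float:
        # The correction works on relative values, so scale by the font size.
        return self.relative_width(ch) * font_size
```

Caching the equal-width result reflects the reasoning above: every equal-width character of a font shares one width, so only the first lookup pays for the sample calculation.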
Embodiment 6:
This embodiment focuses on how glyph description information is used to calculate typesetting positions during typesetting; the more detailed typesetting rules involved in a concrete implementation are mentioned in this embodiment. The embodiment is described with reference to a concrete application example.
This embodiment takes an ePub document as an example; the result after typesetting is shown in Fig. 7. The document contains three fonts: font 1 for the title, font 2 for the Chinese body text, and font 3 for the English body text. Font 1 is a correct font, while font 2 and font 3 contain wrong character width information.
This embodiment uses the text typesetting system with error correction shown in Fig. 3 and its corresponding character correction method, whose flowchart is shown in Fig. 2. First, the three font resources and the text stream used by the document are parsed (step 205). The three parsed fonts are then judged individually (step 210), and font 2 and font 3 are found to be wrong fonts. Next, the width of the equal-width characters (Chinese characters and Chinese punctuation marks) in font 2 is calculated (step 215). Finally, the typesetting unit typesets the text stream (step 220).
The detailed process is as follows:
(1) The document to be parsed, shown in Fig. 7, is received, and its font resources and text content are parsed. It contains three fonts: font 1 for the title, font 2 for the Chinese body text, and font 3 for the English body text. Font 1 is a correct font, while font 2 and font 3 contain wrong character width information.
(2) The judging process follows Fig. 4. The three fonts are evaluated in turn (step 305), and each is checked to see whether it is an embedded font (step 310). Because font 1 is a non-embedded font, it is marked as a correct font (step 335).
Font 2 and font 3 are both embedded fonts. With the sample size set to 5, five character samples are chosen from the text content of each font (step 315). The samples chosen for font 2 are "ice", "and", "fire", "it" and "song"; those chosen for font 3 are "e", "p", "i", "c" and "f". The character width and the glyph bounding-box width of each sample at font size 1 are obtained; the values are listed in Table 1 and Table 2. The mean ratio of character width to glyph bounding-box width is calculated to be 0.57 for font 2 and 0.66 for font 3 (step 320). Font 2 is a Chinese font, for which the empirically preset threshold range is 0.9-1.1; font 3 is an English font, for which the empirically preset threshold range is 1.2-1.4. The mean ratios of font 2 and font 3 are compared with their respective threshold ranges (step 325). Both fall outside their ranges, so both fonts are marked as fonts containing wrong glyph description information (step 330).
Table 1. Character width values of the font 2 character samples
Sample                      ice    and    fire   it     song
Character width             0.5    0.5    0.5    0.5    0.5
Glyph bounding-box width    0.95   0.95   0.95   0.95   0.95
Table 2. Character width values of the font 3 character samples
Sample                      e      p      i      c      f
Character width             0.25   0.25   0.1    0.25   0.2
Glyph bounding-box width    0.38   0.38   0.15   0.38   0.31
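The judgment in step (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `mean_width_ratio` and `has_wrong_glyph_info` are hypothetical, and the sample values are copied from Table 1 and Table 2. With those table values the mean ratio evaluates to about 0.53 for font 2 (the text above reports 0.57) and about 0.66 for font 3; in either case both lie outside their threshold ranges.

```python
from typing import List, Tuple

# Each sample is (character width, glyph bounding-box width) at font size 1.
Sample = Tuple[float, float]

FONT2_SAMPLES: List[Sample] = [(0.5, 0.95)] * 5
FONT3_SAMPLES: List[Sample] = [(0.25, 0.38), (0.25, 0.38), (0.1, 0.15),
                               (0.25, 0.38), (0.2, 0.31)]


def mean_width_ratio(samples: List[Sample]) -> float:
    """Mean of character width divided by glyph bounding-box width."""
    return sum(width / bbox for width, bbox in samples) / len(samples)


def has_wrong_glyph_info(samples: List[Sample], low: float, high: float) -> bool:
    """A font is flagged when its mean ratio falls outside [low, high]."""
    return not (low <= mean_width_ratio(samples) <= high)


# Threshold ranges from the text: 0.9-1.1 (Chinese font), 1.2-1.4 (English font).
font2_is_wrong = has_wrong_glyph_info(FONT2_SAMPLES, 0.9, 1.1)
font3_is_wrong = has_wrong_glyph_info(FONT3_SAMPLES, 1.2, 1.4)
```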
(3) The error correction process follows Fig. 5. Characters using font 1 need no correction. For characters using font 2, take the character "ice" as an example. Its Unicode code point is first determined to lie in the equal-width character range (step 405), and it is then checked whether the equal-width character width has already been calculated for font 2, the font it uses (step 410). Since it is the first character to be corrected, the width has not yet been calculated, so the calculation is performed. Equal-width character samples of font 2 are chosen, again the five characters "ice", "and", "fire", "it" and "song" (step 420). Their glyph bounding-box widths are obtained and averaged to give 0.95, which is multiplied by the correction factor 1.05 (an empirical value) to give the corrected equal-width character width of font 2, namely 1 (step 425). Multiplying by the font size of "ice", 10, gives its actual character width, 10 (step 430). When the widths of other equal-width characters using font 2 are corrected, step 430 is performed directly. For characters using font 3, take the character "e" as an example. Its Unicode code point is first determined not to lie in the equal-width character range (step 410); the glyph bounding-box width of the character, 0.38, is obtained and corrected with the correction factor 1.3 (an empirical value) to give 0.49 (step 415). Multiplying by the font size of "e", 10, gives its actual character width, 4.9 (step 430).
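The arithmetic of step (3) can be written out as a worked calculation. The symbols W, D and a follow the formula W = D × a used in the claims; the bar over D for the sample mean and the subscripts are notation assumed here for illustration.

```latex
\begin{align*}
W_{\text{equal}} &= \bar{D} \times a = 0.95 \times 1.05 = 0.9975 \approx 1,
  & W_{\text{actual}} &= 1 \times 10 = 10, \\
W_{e} &= D_{e} \times a = 0.38 \times 1.3 = 0.494 \approx 0.49,
  & W_{\text{actual}} &= 0.49 \times 10 = 4.9.
\end{align*}
```

In both cases the relative width at font size 1 is corrected first and then scaled by the font size of 10.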
(4) The typesetting process follows Fig. 6. The first four characters of the document are the title text; the typesetting rules for titles are not elaborated here, and the typesetting of the body text is described below. The body text is processed character by character (step 505). Taking the character "ice" as an example, its font is judged to be wrong (step 510), so its correct character width, 10, is obtained from the error correction unit (step 515). The width of the current character is added to the width of the text already arranged on the current line, which increases from 30 to 40 (step 525). This is less than the layout region width of 340 (step 530), so "ice" is placed at the end of the current line (step 540). These steps are repeated until the character "j" is reached. Its character width, 2.5, is obtained (step 515) and added to the arranged text width, which rises from 340 to 342.5 (step 525). This exceeds the layout region width (step 530), so the arranged text width is reset to 0, the line baseline is advanced by the line height of 15 (step 535), and "j" is placed on the new line. The process is repeated until all characters have been arranged. The result after typesetting is shown in Fig. 6.
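The two line-break decisions in step (4) reduce to the following comparisons against the layout region width of 340. The symbols x (width already arranged on the current line) and y (line baseline) are notation assumed here for illustration.

```latex
\begin{align*}
30 + 10 &= 40 \le 340
  && \Rightarrow \text{place the character on the current line}, \\
340 + 2.5 &= 342.5 > 340
  && \Rightarrow x \leftarrow 0,\quad y \leftarrow y + 15,\quad \text{place the character on the new line}.
\end{align*}
```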
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious changes or variations derived from the above remain within the scope of protection of the invention.
Those skilled in the art will understand that embodiments of the invention may be provided as a method, a system, or a computer program product. The invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they have learned the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (15)

1. A character correction method, characterized by comprising:
obtaining at least one font of the characters in a document;
for each font, selecting the fonts with wrong glyph description information according to the character widths of the characters in the font;
correcting the characters in the fonts with wrong glyph description information.
2. The text typesetting method according to claim 1, characterized in that the process of selecting the fonts with wrong glyph description information according to the character widths of the characters in the font comprises:
choosing at least two characters in the font, calculating for each character the ratio of its character width to its glyph bounding-box width, and calculating the mean of the ratios;
determining whether the glyph description information of the font is wrong according to whether the mean exceeds a threshold.
3. The text typesetting method according to claim 2, characterized in that, for equal-width characters, the threshold is 0.9-1.1; for non-equal-width characters, the threshold is 1.2-1.4.
4. The character correction method according to claim 1, 2 or 3, characterized in that the process of correcting the characters in the fonts with wrong glyph description information comprises:
correcting the character width of a character according to the glyph bounding-box width of the character.
5. The character correction method according to claim 4, characterized in that the character width of the character is corrected using the formula W = D × a, where W is the corrected character width, D is the glyph bounding-box width of the character, and a is a correction factor; for equal-width characters a = 1.01-1.1, and for non-equal-width characters a = 1.22-1.35.
6. The character correction method according to claim 5, characterized in that the correction factor is a = 1.05 for equal-width characters and a = 1.3 for non-equal-width characters.
7. The character correction method according to any one of claims 1-6, characterized in that, in the process of selecting, for each font, the fonts with wrong glyph description information according to the character widths of the characters in the font, the processing is performed on embedded fonts.
8. The character correction method according to any one of claims 1-7, characterized in that the process of correcting the characters in the fonts with wrong glyph description information comprises: for a character in a font with wrong glyph description information, first judging whether it is an equal-width character; for an equal-width character, calculating a sample mean and correcting it; for a non-equal-width character, obtaining the glyph bounding-box width of the character and correcting it.
9. The character correction method according to claim 8, characterized in that the process of calculating the sample mean and correcting it for an equal-width character is as follows:
first, it is judged whether the character width has already been calculated; if not, equal-width character samples using the font are chosen, the glyph bounding-box widths of the samples are obtained and averaged, and the mean is corrected with the correction factor to obtain the corrected width; if it has already been calculated, the earlier result is used directly.
10. The character correction method according to any one of claims 1-8, characterized by further comprising a typesetting step, in which the characters in the fonts with wrong glyph description information are typeset after being corrected, and the characters in fonts without wrong glyph description information are typeset after their character widths are obtained directly; the typesetting process adds the width of the current character to the width of the text already arranged on the current line; if the sum is greater than the layout region width, the arranged text width is reset, the line baseline is advanced by the line height, and the character is then placed on the new line; if the sum is not greater than the layout region width, the character is placed directly at the end of the current line.
11. A character correction system, characterized by comprising:
an acquiring unit, configured to obtain at least one font of the characters in a document;
a selection unit, configured to select, for each font, the fonts with wrong glyph description information according to the character widths of the characters in the font;
a correction unit, configured to correct the characters in the fonts with wrong glyph description information.
12. The text typesetting system according to claim 11, characterized in that the selection unit comprises:
a mean calculation module, configured to choose at least two characters in the font, calculate for each character the ratio of its character width to its glyph bounding-box width, and calculate the mean of the ratios;
a judgment module, configured to determine whether the glyph description information of the font is wrong according to whether the mean exceeds a threshold.
13. The text typesetting system according to claim 12, characterized in that, for equal-width characters, the threshold is 0.9-1.1; for non-equal-width characters, the threshold is 1.2-1.4.
14. The character correction system according to claim 11, 12 or 13, characterized in that the correction unit comprises a correction module, configured to correct the character width of a character according to the glyph bounding-box width of the character.
15. The character correction system according to claim 14, characterized in that the correction module performs the correction using the formula W = D × a, where W is the corrected character width, D is the glyph bounding-box width of the character, and a is a correction factor; for equal-width characters a = 1.01-1.1, and for non-equal-width characters a = 1.22-1.35.
CN201310447805.6A 2013-09-27 2013-09-27 Character correcting method and system Active CN104516859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310447805.6A CN104516859B (en) 2013-09-27 2013-09-27 A kind of word modification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310447805.6A CN104516859B (en) 2013-09-27 2013-09-27 A kind of word modification method and system

Publications (2)

Publication Number Publication Date
CN104516859A true CN104516859A (en) 2015-04-15
CN104516859B CN104516859B (en) 2018-02-13

Family

ID=52792187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310447805.6A Active CN104516859B (en) 2013-09-27 2013-09-27 Character correcting method and system

Country Status (1)

Country Link
CN (1) CN104516859B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6288725B1 (en) * 1997-02-24 2001-09-11 Zining Fu Representation and restoration method of font information
CN101097600A (en) * 2006-06-29 2008-01-02 北大方正集团有限公司 Character recognizing method and system
CN101158940A (en) * 2007-11-21 2008-04-09 金蝶软件(中国)有限公司 Method and device for dwindling character stuffing in target region
CN101655835A (en) * 2009-08-26 2010-02-24 北大方正集团有限公司 Method for text message processing, text message output and character retrieval in electronic document and device thereof
CN102982328A (en) * 2011-08-03 2013-03-20 夏普株式会社 Character recognition apparatus and character recognition method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488471A (en) * 2015-11-30 2016-04-13 北大方正集团有限公司 Character pattern recognition method and device
CN105488471B (en) * 2015-11-30 2019-03-29 北大方正集团有限公司 A kind of font recognition methods and device
CN105718671A (en) * 2016-01-22 2016-06-29 集美大学 CAD character position adjustment method for tire mold arc typesetting
CN105718673A (en) * 2016-01-22 2016-06-29 集美大学 Spacing adjustment method for words on straight line of CAD
CN105741339A (en) * 2016-01-22 2016-07-06 集美大学 CAD linear word spacing adjusting method
CN105741339B (en) * 2016-01-22 2018-10-19 集美大学 A kind of CAD straight lines word spacing method of adjustment
CN105718671B (en) * 2016-01-22 2019-05-10 集美大学 A kind of CAD text point method of adjustment for the typesetting of tire-mold circular arc
CN105718673B (en) * 2016-01-22 2019-05-10 集美大学 A kind of CAD straight line text interval regulation method
CN113807048A (en) * 2021-09-10 2021-12-17 济南浪潮数据技术有限公司 Method, device, terminal and storage medium for self-adapting character width
CN113807048B (en) * 2021-09-10 2024-02-27 济南浪潮数据技术有限公司 Method, device, terminal and storage medium for self-adapting to character width

Also Published As

Publication number Publication date
CN104516859B (en) 2018-02-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220624

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Beijing Fangzheng apapi Technology Co., Ltd.

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Beijing Fangzheng apapi Technology Co., Ltd.
