CN1253781C - Combiner word-formation method in Chinese characters electronicalization - Google Patents


Info

Publication number
CN1253781C
Authority
CN
China
Prior art keywords
word
chinese character
chinese
font
parts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 200410015238
Other languages
Chinese (zh)
Other versions
CN1558314A (en)
Inventor
皮佑国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2004-01-20
Filing date
2004-01-20
Publication date
2006-04-26
2004-01-20 Application filed by South China University of Technology SCUT
2004-01-20 Priority to CN 200410015238
2004-12-29 Publication of CN1558314A
2006-04-26 Application granted
2006-04-26 Publication of CN1253781C
Anticipated expiration
Expired - Fee Related (current status)

Landscapes

  • Controls And Circuits For Display Device (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to a component-combination character-formation method for electronic Chinese characters. The components that make up Chinese characters are used as the basic units for information processing such as internal storage, transmission and management. The Hanzi components are the set of basic units, obtained by merging radicals, from which all Chinese characters are built. When a character is formed, the structure of the character, the position and size of each component within the character, and the shapes in various fonts are each described, and the character is formed on that basis. During human-computer interaction and computer information processing, the invention generates characters from components instead of selecting them from a character library as existing methods do. The invention gives Chinese character information systems a stable standard, high efficiency and low cost, and meets the need to make all Chinese characters electronic.

Description

Combiner word-formation method in the Chinese character electronization
Technical field
The present invention relates to a method for generating Chinese characters in a computer, and specifically to a component-combination character-formation method for the electronic processing of Chinese characters in computer Chinese information-processing software.
Background art
Informatization is an irreversible trend in today's world. Spoken and written language is the principal form and carrier of information, so the electronic processing of written language is both the foundational work and the key technology of informatization. The computerization of alphabetic scripts uses letters as the primitives of information processing; in English, for example, the primitives are the 26 letters in upper and lower case plus conventional symbols. Because Chinese is an ideographic script with complex planar character structures, all Chinese information-processing software to date, in China and abroad, uses a character library: a library of Chinese characters is first built according to some standard encoding, in which each character has a unique code, usually called the internal code. The internal codes of the characters in the library are the objects that the computer stores, transmits and manages internally. In other words, during Chinese information processing the computer treats the whole Chinese character as the smallest unit (primitive). The number of Chinese characters is enormous: the basic set of standard GB 2312-80 contains 6763 characters, the second supplementary set contains 7237, and the fourth supplementary set contains 7039, for a total of 21039 characters, referred to as the standard character library. This largely meets the basic needs of daily use, but it cannot meet every requirement, because Chinese characters form a large character set that keeps growing.
It follows that giving every Chinese character a unique code requires two bytes even if only the smallest set above is considered, namely the 6763 level-1 and level-2 national-standard characters (with the 32 control characters and 96 graphic characters specified by GB 1988 excluded, two bytes provide at most 8836 address codes). This approach causes many problems in computer Chinese information processing:
1. Storage:
1) Even if only the commonly used characters (6763) are considered, each character's address code needs two bytes. Covering more characters requires more bytes (3 or 4), which means more overhead.
2) Because the number of characters available in the computer is limited, many characters used in documents, especially historical and cultural texts, cannot be found in existing character libraries; such text can only be digitized as images, at considerably higher cost.
3) Chinese characters also continue to develop, and a character library can never keep pace with them; each new character needs at least two more bytes of storage. The character-library approach therefore makes it hard to establish a long-term, stable data standard of reasonable scale for Chinese information management, and this is the main reason China still has no internal-code standard.
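The counts and code capacities above can be checked with a short Python sketch. It assumes the 94 × 94 row/cell layout of GB 2312, in which each byte of a two-byte code contributes 94 usable values; `min_bytes` is an illustrative helper, not part of any standard. Under that assumption the 6763 characters fit in two bytes, while the full 21039-character library would need a third byte, in line with the "3 or 4 bytes" remark.

```python
# Character counts quoted in the text.
GB_SET_SIZES = {
    "GB 2312-80 basic set": 6763,
    "second supplementary set": 7237,
    "fourth supplementary set": 7039,
}

# GB 2312 arranges its characters in 94 rows x 94 cells, so each byte
# of a two-byte code contributes 94 usable values.
VALUES_PER_GB_BYTE = 94
TWO_BYTE_CAPACITY = VALUES_PER_GB_BYTE ** 2  # 8836 address codes


def min_bytes(symbol_count: int, values_per_byte: int = 256) -> int:
    """Smallest number of bytes giving every symbol a unique code,
    when each byte can take `values_per_byte` distinct values."""
    n, capacity = 1, values_per_byte
    while capacity < symbol_count:
        n += 1
        capacity *= values_per_byte
    return n


total = sum(GB_SET_SIZES.values())
print(total)                                  # 21039
print(TWO_BYTE_CAPACITY)                      # 8836
print(min_bytes(6763, VALUES_PER_GB_BYTE))    # 2
print(min_bytes(total, VALUES_PER_GB_BYTE))   # 3
```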
In data transmission, the average information entropy of alphabetic scripts such as English is below 4.5 bits, while a byte has 8 bits, so the remaining bits can be used for parity checking and error protection. The Chinese character-library approach needs two bytes, and when the library is built all the bits of both bytes are used, leaving none for checking and error protection. This is one of the basic reasons why corrupted codes (as distinct from mojibake) occur easily in Chinese network communication. In addition, because the code length is large, transmission overhead is necessarily large and efficiency is low.
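The spare-bit argument can be illustrated with a generic even-parity scheme: a 7-bit code travels in an 8-bit byte, and the eighth bit makes the count of 1 bits even. This Python sketch is a textbook example, not a scheme specified in the patent; it detects any single flipped bit but not two.

```python
def add_even_parity(code7: int) -> int:
    """Store a 7-bit code in one byte, using the spare top bit for even parity."""
    if not 0 <= code7 < 0x80:
        raise ValueError("code must fit in 7 bits")
    parity = bin(code7).count("1") % 2   # 1 if the data bits hold an odd number of 1s
    return code7 | (parity << 7)


def parity_ok(byte: int) -> bool:
    """A received byte passes the check if it has an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = add_even_parity(ord("C"))    # 'C' = 0b1000011 has three 1s, so the parity bit is set
print(hex(sent))                    # 0xc3
print(parity_ok(sent))              # True
print(parity_ok(sent ^ 0b0000001))  # False: a single flipped bit is detected
```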
In data management, Chinese characters have high information entropy and are entered by selecting from candidates, which takes considerable work, so overall efficiency is very low. The long code length also makes management inherently complex.
Analyzed from the standpoint of entropy, the most basic quantity of an information system, the average static information entropy of Chinese characters is 9.65 bits, the largest of any writing system in the world, which indicates that in theory it is the least efficient information system.
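The 9.65-bit figure is a zeroth-order (static) entropy, H = -Σ p_i log₂ p_i, taken over character frequencies. A minimal Python sketch of that formula follows; the quoted value depends on a corpus the text does not name, so the sketch does not attempt to reproduce it.

```python
import math
from collections import Counter


def static_entropy(text: str) -> float:
    """Zeroth-order ("static") Shannon entropy of the symbols in `text`,
    in bits per symbol: H = sum(p * log2(1 / p)) over symbol frequencies p."""
    counts = Counter(text)
    total = sum(counts.values())
    return float(sum((c / total) * math.log2(total / c) for c in counts.values()))


print(static_entropy("aabb"))   # 1.0
print(static_entropy("abcd"))   # 2.0
print(static_entropy("aaaa"))   # 0.0
```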
These conditions pose serious challenges to the informatization of Chinese: 1. the electronic informatization of writing is an irreversible historical trend, which raises the question of whether Chinese characters will be eliminated by natural selection as society progresses; 2. in the near term, China's information industry competes on an unequal platform with the information industries of countries that use alphabetic scripts. Reversing this passive situation is the direct motivation for the present invention.
Summary of the invention
The objective of the invention is to address the problems of the prior art by providing a component-combination character-formation method for the electronic processing of Chinese characters, in which Hanzi components serve as the primitives for information processing such as internal storage, transmission and management in the computer. During human-computer interaction, the computer forms characters from components rather than selecting characters as current methods do. This gives Chinese a stable standard in electronic information processing, raises efficiency, lowers cost, and allows Chinese characters to be made fully electronic.
In the combination method of the invention, the Hanzi components used to form characters are the primitives for information processing such as internal storage, transmission and management. The Hanzi components are the set of primitives, obtained by merging radicals with their full-character forms (for example, the 木 radical used as a side component and the character 木 itself), from which all Chinese characters, both existing ones and those created in the future, can be formed.
Modern mathematical tools are used to describe the structure of each Chinese character (left-right, top-bottom, left-center-right, top-middle-bottom, etc.), the position of each component within the formed character (e.g. top, bottom, left, right, upper-middle, middle, lower-middle), its size (as a ratio of the character's height and width), and the shapes in various fonts (e.g. aspect ratio); characters are formed on the basis of these descriptions.
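One way to make this description concrete is a small data model. This Python sketch is an assumption throughout: the class names, the top-left origin and the fraction-of-character coordinates are illustrative choices, not the patent's notation, and the aspect ratio assumes a square character box. It encodes 草 with the approximate proportions given in Embodiment 2.

```python
from dataclasses import dataclass
from enum import Enum

EPS = 1e-9  # tolerance for floating-point fractions such as 1/3


class Structure(Enum):
    """Character structures named in the text."""
    LEFT_RIGHT = "left-right"
    TOP_BOTTOM = "top-bottom"
    LEFT_CENTER_RIGHT = "left-center-right"
    TOP_MIDDLE_BOTTOM = "top-middle-bottom"


@dataclass(frozen=True)
class Placement:
    """One component's box as fractions of the character box (origin at top-left)."""
    component: str
    x: float
    y: float
    width: float
    height: float

    def aspect_ratio(self) -> float:
        """Width/height of the component box, assuming a square character box."""
        return self.width / self.height

    def inside_character(self) -> bool:
        """True if the box has positive size and lies within the unit character box."""
        return (self.x >= 0 and self.y >= 0 and self.width > 0 and self.height > 0
                and self.x + self.width <= 1 + EPS and self.y + self.height <= 1 + EPS)


@dataclass(frozen=True)
class CharacterDescription:
    character: str
    structure: Structure
    parts: tuple[Placement, ...]

    def is_valid(self) -> bool:
        return all(p.inside_character() for p in self.parts)


# 草: 艹 on top, 日 centred in the middle, 十 at the bottom.
CAO = CharacterDescription("草", Structure.TOP_MIDDLE_BOTTOM, (
    Placement("艹", 0.0, 0.0, 1.0, 0.25),
    Placement("日", 0.25, 0.25, 0.5, 0.25),
    Placement("十", 0.0, 0.5, 1.0, 0.5),
))

print(CAO.is_valid())                          # True
print([p.aspect_ratio() for p in CAO.parts])   # [4.0, 2.0, 2.0]
```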
In the method of the invention, the components that form Chinese characters are the primitives for information processing such as internal storage, transmission and management in the computer. A part library is built from these components, and during human-computer interaction the computer forms characters from components.
1. Chinese character combination without a character library
Building on a comprehensive, systematic summary of the rules by which Chinese characters are formed, and taking into account the characteristics of computer information processing, the invention provides a component-combination character-formation method that needs no character library.
The invention removes the character library entirely and uses a part library instead. Based on the formation principles of Chinese characters and the characteristics of computer information processing, a set of fewer than 200 character-forming components is determined and stored in the computer as the Hanzi component library. All Chinese characters can be formed from the components in this library, so the invention needs no character library, which eliminates at the root the problems caused by the character-library approach. The components in the library meet the needs of combining existing Chinese characters as well as the continuing development of Chinese characters.
Unlike the prior art, the method provided by the invention has only a part library of fewer than 200 components and no huge character library of thousands or tens of thousands of characters. Every Chinese character is combined from components according to defined principles. Storage, transmission and management operate on the component codes and the assembly codes of the characters.
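A minimal Python sketch of how a part library of fewer than 200 entries lets each component be addressed with one byte. The library contents, the structure codes and the byte layout (one structure byte followed by one byte per component) are all hypothetical: the text does not say how the component and assembly codes of a character are packed.

```python
# Hypothetical part library. The text only says the real library has fewer
# than 200 components; these eight entries are illustrative.
PART_LIBRARY = ["禾", "口", "壬", "艹", "日", "十", "冂", "一"]
PART_INDEX = {part: i for i, part in enumerate(PART_LIBRARY)}
assert len(PART_LIBRARY) < 200  # so every component index fits in one byte

# Hypothetical one-byte assembly (structure) codes.
STRUCTURE_CODES = {"left-right": 0, "top-bottom": 1, "left-center-right": 2,
                   "top-middle-bottom": 3, "enclosure": 4}
STRUCTURE_NAMES = {code: name for name, code in STRUCTURE_CODES.items()}


def encode(structure: str, parts: list) -> bytes:
    """Pack a character as one structure byte followed by one byte per component."""
    return bytes([STRUCTURE_CODES[structure]] + [PART_INDEX[p] for p in parts])


def decode(data: bytes) -> tuple:
    """Inverse of encode: recover the structure name and the component list."""
    return STRUCTURE_NAMES[data[0]], [PART_LIBRARY[i] for i in data[1:]]


code = encode("top-middle-bottom", ["艹", "日", "十"])
print(code.hex())     # 03030405
print(decode(code))   # ('top-middle-bottom', ['艹', '日', '十'])
```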
2. The combination character-formation method for Chinese characters
Unlike alphabetic scripts, Chinese characters have complex structures, and the rules for combining their components are planar in nature. Besides being distributed over a plane, the components of a character differ not only in position but also in size and shape (aspect ratio).
Based on the structure and semantics of Chinese characters, the invention uses modern mathematical tools to describe the structure of each character, the position and size of its components within the formed character, and the shapes in various fonts. The computer adapts the components intelligently, assembles every character according to its structure, and scales it smoothly according to the font size.
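The text does not say how the smooth scaling is performed. As one possible approach for the layout part only, this Python sketch maps fractional component boxes onto a pixel grid for a given font size; outline smoothing itself is not modelled, and the `to_pixels` helper and the box convention are assumptions. The 程 proportions come from Embodiment 1, with the 1/6 gap read as vertical.

```python
def to_pixels(layout: dict, size_px: int) -> dict:
    """Map fractional (x, y, width, height) boxes onto a size_px x size_px grid.

    Each box's edges are rounded, rather than its width and height, so two
    components whose shared edge evaluates to the same fraction still share
    that edge after scaling, with no gap or overlap between them.
    """
    pixels = {}
    for name, (x, y, w, h) in layout.items():
        left, top = round(x * size_px), round(y * size_px)
        right, bottom = round((x + w) * size_px), round((y + h) * size_px)
        pixels[name] = (left, top, right - left, bottom - top)
    return pixels


# Approximate layout of 程: 禾 on the left, 口 above 壬 on the right.
CHENG = {"禾": (0, 0, 1 / 3, 1), "口": (1 / 3, 0, 2 / 3, 1 / 3), "壬": (1 / 3, 1 / 2, 2 / 3, 1 / 2)}

for size in (16, 32):
    print(size, to_pixels(CHENG, size))
```

Rounding the edges rather than the sizes is the design choice that matters here: at 16 px 禾 gets 5 columns and the right side 11, and at 32 px they get 11 and 21, so the two halves always tile the full width.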
The method also differs from input methods such as Wubi (the Five-stroke Method) in three respects:
1) Wubi and similar input methods only solve the problem of entering Chinese characters from a keyboard; they are input methods. The present invention mainly addresses how a computer forms Chinese characters, which is core technology at the bottom layer of Chinese information processing.
2) After a Wubi code is entered, the computer retrieves the matching candidates from the character library and displays them, and the user then selects the desired character, so these methods depend on a character library. The present invention does not depend on a character library; it forms characters directly from components.
3) Although Wubi and similar methods also build characters from components, they only decompose characters into components for coding and do not address how components are combined inside the computer. That is, they do not solve the position, size and shape of components within the character plane, nor how components are assembled during processing in the computer. The present invention solves all of these problems of Chinese character components comprehensively.
Compared with the prior art, the invention has the following advantages:
1. Because the primitives are stable, a long-term, stable data standard of reasonable scale can be established for Chinese information management.
2. Storage: a character needs only one byte, which greatly saves storage space and reduces consumption and cost.
3. Because characters are assembled according to formation rules, the set of characters is no longer limited by a character library. The method can therefore fully digitize existing Chinese characters (particularly the very rarely used characters found in historical documents) while also accommodating the development of new characters.
4. Because the space occupied by a character shrinks from two bytes to one, efficiency in message transmission improves markedly and overhead decreases.
5. Because not all 8 bits of the byte representing a character are used, some bits remain for error correction and error protection, so the probability of corrupted codes is greatly reduced and transmission accuracy is higher than with the existing character-library approach.
6. Because the storage space occupied by Chinese characters decreases, the average static information entropy and the information uncertainty decrease, so the level and efficiency of information management improve.
Description of drawings
Fig. 1 is a structural diagram of the core modules of the invention;
Fig. 2 is the general flow chart of component-combination character formation according to the invention.
Embodiment
Fig. 1 shows the system built by the implementer of the invention. The part library stores all the components that form Chinese characters; all existing characters, and those created later, can be formed by combining them. The intelligent character-composition software comprises modules such as the combination supervisory program, the Chinese character knowledge base and the font and font-size handling program. The combination supervisory program interprets the character-composition requests of the information system and manages the composition process. The Chinese character knowledge base contains all the knowledge about how Chinese characters are formed, such as the position, shape and size of each component. The font and font-size handling program changes the font of the components and smoothly transforms the size of characters, to meet the needs for various fonts and sizes in Chinese character processing.
Fig. 2 is a schematic flow chart of the invention, described in detail below with reference to Figs. 1 and 2. When a Chinese character is entered through the human-machine interface (by manual input or recognition), the combination supervisory program interprets the component codes, font code and font-size code entered through the interface and determines which components make up the entered character. It then performs the following operations:
1) Fetch the corresponding components from the Hanzi component library.
2) Look up the structure of the character in the Chinese character knowledge base, including the position, shape and size of each component.
3) Send the components and the font information to the font-handling entry of the font-size program.
4) The font-size program processes the components of the character so that they meet the font and structure requirements, then assembles the character according to the structure information.
5) The font-size program smoothly scales the formed character according to the font-size information.
6) Send the formed character, conforming to the font and size specification, to the human-machine interface, which displays it, prints it or performs other output operations.
At the same time, the coded information of the formed character is sent to the combination program, and the combination supervisory program forwards it to the information-processing system interface, which stores it or performs other operations.
When the system reads in a Chinese document directly, the combination program interprets the component codes, font code and font-size code read from the information-processing interface, determines which components make up the character, and performs the same character-composition operations as for input through the human-machine interface.
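The six steps above can be sketched in Python as one function that chains the Fig. 1 modules. Everything here is hypothetical: the dictionaries stand in for the part library and knowledge base, glyphs are placeholder strings, and font handling is reduced to recording the font name, since the text does not specify how components are reshaped for a font. The document-reading path would call the same function, differing only in where the codes come from.

```python
from dataclasses import dataclass

# Hypothetical part library: component code -> glyph data (placeholder strings).
PART_LIBRARY = {"艹": "<outline 艹>", "日": "<outline 日>", "十": "<outline 十>"}

# Hypothetical knowledge base: component set -> (character, {component: (x, y, w, h)}),
# boxes given as fractions of the character box, origin at top-left.
KNOWLEDGE_BASE = {
    ("艹", "日", "十"): ("草", {
        "艹": (0.0, 0.0, 1.0, 0.25),
        "日": (0.25, 0.25, 0.5, 0.25),
        "十": (0.0, 0.5, 1.0, 0.5),
    }),
}


@dataclass
class PlacedPart:
    component: str
    glyph: str
    font: str
    box_px: tuple  # (left, top, width, height) in pixels


def compose(part_codes: tuple, font: str, size_px: int):
    """Sketch of steps 1)-6): returns the character, its placed parts, and the
    component codes that would be handed to the information-processing interface."""
    glyphs = {c: PART_LIBRARY[c] for c in part_codes}          # 1) fetch components
    character, layout = KNOWLEDGE_BASE[tuple(part_codes)]      # 2) look up the structure
    placed = []
    for comp, glyph in glyphs.items():
        x, y, w, h = layout[comp]
        # 3)-4) a real font handler would reshape the glyph for `font`;
        #        here the font name is only recorded.
        # 5) scale the fractional box to the requested size (edges rounded).
        left, top = round(x * size_px), round(y * size_px)
        right, bottom = round((x + w) * size_px), round((y + h) * size_px)
        placed.append(PlacedPart(comp, glyph, font, (left, top, right - left, bottom - top)))
    return character, placed, tuple(part_codes)                # 6) output and stored code


char, parts, stored = compose(("艹", "日", "十"), font="Song", size_px=24)
print(char)                        # 草
print([p.box_px for p in parts])   # [(0, 0, 24, 6), (6, 6, 12, 6), (0, 12, 24, 12)]
```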
Embodiment 1: To enter the character 程 ("journey") from the keyboard, we enter the three components 禾 ("standing grain"), 口 ("mouth") and 壬 (the ninth Heavenly Stem). From the knowledge base, the character-composition program learns that 程 has a left-right structure. 禾 is the radical: its height equals the character height, its width is 1/3 of the character width, and its bottom-right falling stroke should be written as a dot. The right side has a top-bottom structure: the upper 口 is nearly 2/3 of the character width and 1/3 of the character height; the lower 壬 is 1/2 of the character height, as wide as 口, aligned at the bottom with 禾, and separated from 口 by a gap of 1/6 of the character. Following this structure, the composition program fetches the three components from the part library and sends them to the font-size handling program, which scales them uniformly to the required size and deforms them uniformly for the required font, and then outputs 程 in the corresponding size and font.
Embodiment 2: To enter the character 草 ("grass") from the keyboard, we enter the three components 艹, 日 ("day") and 十 ("ten"). From the knowledge base, the character-composition program learns that 草 has a top-middle-bottom structure: the component 艹 is on top, with a height of 1/4 of the character height and the full character width; 日 is in the middle, with a height of 1/4 of the character height and a width of 1/2 of the character width; and 十 is at the bottom, with a height of 1/2 of the character height and the full character width. Following this structure, the composition program fetches the three components from the part library and sends them to the font-size handling program, which scales them uniformly to the required size and deforms them uniformly for the required font, and then outputs 草 in the corresponding size and font.
Embodiment 3: To enter the character 同 ("together") from the keyboard, we enter the three components 冂, 一 ("one") and 口 ("mouth"). From the knowledge base, the character-composition program learns that 同 has a three-sided enclosure structure: the component 冂 occupies the top, left and right, with the full character height and the full character width; 一 is centered horizontally at about 1/3 of the height from the top, with a width of 1/2 of the character width; 口 is centered below 一, with a width of 1/3 of the character width, spanning from 1/3 to 2/3 of the width (measured from left to right), and a height of 1/3 of the character height, spanning from 1/3 to 2/3 of the height (measured from top to bottom). Following this structure, the composition program fetches the three components from the part library and sends them to the font-size handling program, which scales them uniformly to the required size and deforms them uniformly for the required font, and then outputs 同 in the corresponding size and font.
Note: for ease of explanation, the figures in the three embodiments are approximate rather than precise.
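Because the figures are approximate, it helps to check that they are at least mutually consistent. This Python sketch uses exact fractions to transcribe Embodiments 1 and 3; the coordinate convention and the two interpretation choices noted in the comments are assumptions, not statements from the patent.

```python
from fractions import Fraction as F

# Boxes are (x, y, width, height) as fractions of the character box, origin at
# top-left. Interpretation choices (assumptions): the 1/6 gap in 程 is vertical,
# and 一 in 同 is a zero-height stroke whose thickness is left to the font handler.
CHENG = {
    "禾": (F(0), F(0), F(1, 3), F(1)),
    "口": (F(1, 3), F(0), F(2, 3), F(1, 3)),
    "壬": (F(1, 3), F(1, 3) + F(1, 6), F(2, 3), F(1, 2)),
}
TONG = {
    "冂": (F(0), F(0), F(1), F(1)),
    "一": (F(1, 4), F(1, 3), F(1, 2), F(0)),
    "口": (F(1, 3), F(1, 3), F(1, 3), F(1, 3)),
}


def inside(box) -> bool:
    """True if the box lies within the unit character box."""
    x, y, w, h = box
    return x >= 0 and y >= 0 and x + w <= 1 and y + h <= 1


def centred(box) -> bool:
    """True if the box is horizontally centred in the character."""
    x, _, w, _ = box
    return x + w / 2 == F(1, 2)


# 程: the right column (口, gap, 壬) fills the full height, and 壬 is bottom-aligned with 禾.
cheng_right = CHENG["口"][3] + F(1, 6) + CHENG["壬"][3]
print(cheng_right)                                                        # 1
print(CHENG["壬"][1] + CHENG["壬"][3] == CHENG["禾"][1] + CHENG["禾"][3])  # True
print(all(inside(b) for b in list(CHENG.values()) + list(TONG.values())))  # True
print(centred(TONG["一"]), centred(TONG["口"]))                           # True True
```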

Claims (2)

1. A component-combination character-formation method for the electronic processing of Chinese characters, characterized in that the Hanzi components used for forming characters serve as the primitives for information processing such as internal storage, transmission and management; the Hanzi components are the set of primitives, obtained by merging radicals, from which all Chinese characters can be formed; and the method comprises: when a Chinese character is entered through the human-machine interface, either manually or by recognition, the combination supervisory program interprets the component codes, font code and font-size code entered through the human-machine interface, determines which components make up the entered character, and then performs the following operations:
1) fetching the corresponding components from the Hanzi component library;
2) looking up the structure of the character in the Chinese character knowledge base, including the position, shape and size of each component;
3) sending the components and the font information to the font-handling entry of the font-size program;
4) processing the components of the character with the font-size program so that they meet the font and structure requirements, and then assembling the character according to the structure information;
5) smoothly scaling the formed character with the font-size program according to the font-size information;
6) sending the formed character, conforming to the font and size specification, to the human-machine interface for display and printed output; and at the same time sending the coded information of the formed character to the combination program, which forwards it through the combination supervisory program to the information-processing system interface for storage.
2. The method according to claim 1, characterized in that when a Chinese document is read in directly, the combination program interprets the component codes, font code and font-size code read from the information-processing interface, determines which components make up the character, and performs the same character-composition operations as for input through the human-machine interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410015238 CN1253781C (en) 2004-01-20 2004-01-20 Combiner word-formation method in Chinese characters electronicalization

Publications (2)

Publication Number Publication Date
CN1558314A (en) 2004-12-29
CN1253781C (en) 2006-04-26

Family

ID=34351379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410015238 Expired - Fee Related CN1253781C (en) 2004-01-20 2004-01-20 Combiner word-formation method in Chinese characters electronicalization

Country Status (1)

Country Link
CN (1) CN1253781C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012037721A1 (en) * 2010-09-21 2012-03-29 Hewlett-Packard Development Company,L.P. Handwritten character font library
CN103186511B (en) * 2011-12-31 2017-03-08 北京大学 Chinese characters word-formation method and apparatus, the method for construction fontlib
CN103197768B (en) * 2013-04-10 2017-02-08 梁秀霞 Ideogram input method and ideogram input keyboard
JP6229303B2 (en) * 2013-05-16 2017-11-15 富士通株式会社 Program, information processing apparatus, and character recognition method
TWM522398U (en) * 2015-11-11 2016-05-21 Cheah-Shen Yap Replaceable type character assembly system
CN113127127A (en) * 2021-04-30 2021-07-16 上海锐线创意设计有限公司 Method and device for generating synthetic word

Also Published As

Publication number Publication date
CN1558314A (en) 2004-12-29

Similar Documents

Publication Publication Date Title
US20060239562A1 (en) System and method for binary persistence format for a recognition result lattice
CN101799808A (en) Data processing method and system thereof
CN110597900B (en) Method for generating vector slice by GDB data in real time according to needs
CN101499065B (en) Table item compression method and device based on FA, table item matching method and device
KR100643849B1 (en) Embedded curve chinese character library described based on stroke central line
CN113139227B (en) BIM component construction code creation method based on Revit
CN115061721A (en) Report generation method and device, computer equipment and storage medium
US7370060B2 (en) System and method for user edit merging with preservation of unrepresented data
CN1253781C (en) Combiner word-formation method in Chinese characters electronicalization
CN102508919A (en) Data processing method and system
CN104750727A (en) Column type memory storage and query device and column type memory storage and query method
CN112613110A (en) Component encoding method based on road and bridge engineering building information model BIM
CN114372177A (en) Excel table data matching method
CN1101027C (en) External character management apparatus
CN1421803A (en) System and method capable of performing pinyin romanization-phonetic notation conversion of multiple-syllable word
US8051107B2 (en) Method and device for creating relation-type form database
CN1773453A (en) System constituting method based on data definition
CN100371875C (en) Universal compressed Chinese character library chip
CN2779484Y (en) A universal compressed Chinese character library chip
CN105938469A (en) Code storage method, data storage structure of texts and method for compressed storage of texts and statistics output
CN106406560A (en) Method and system for outputting vector fonts of mechanical engineering characters in desktop operation system
CN101944081A (en) Computer generation, edition method of Guqin abbreviated character notation and system thereof
CN115904240A (en) Data processing method and device, electronic equipment and storage medium
CN114385540A (en) Data unit conversion method and device
CN113971044A (en) Component document generation method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20060426

Termination date: 20110120