CN1083638A - Chinese file shelves compression processing method and device - Google Patents

Chinese file shelves compression processing method and device Download PDF

Info

Publication number
CN1083638A
CN1083638A CN 92109706 CN92109706A CN1083638A CN 1083638 A CN1083638 A CN 1083638A CN 92109706 CN92109706 CN 92109706 CN 92109706 A CN92109706 A CN 92109706A CN 1083638 A CN1083638 A CN 1083638A
Authority
CN
China
Prior art keywords
chinese
compression
probability
processing unit
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 92109706
Other languages
Chinese (zh)
Other versions
CN1031680C (en
Inventor
张俊盛
刘显仲
柯淑津
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENGBAO CO Ltd
Original Assignee
SHENGBAO CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENGBAO CO Ltd filed Critical SHENGBAO CO Ltd
Priority to CN 92109706 priority Critical patent/CN1031680C/en
Publication of CN1083638A publication Critical patent/CN1083638A/en
Application granted granted Critical
Publication of CN1031680C publication Critical patent/CN1031680C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Utilizing Chinese characteristics, is coding unit with the Chinese word, according to its appearance probability in Chinese file, Chinese file is cut into the unit of the indefinite length of suitable compression, and the file after the processing can be weaved into and comprise the various frequency codes of Huffman code ex hoc genus anne.With the Chinese file that apparatus of the present invention were handled, convert Huffman code to after, can obtain about compression ratio about in the of 35-55%.The present invention also provides a kind of method that determines the Chinese language processing unit.

Description

Chinese file shelves compression processing method and device
The invention relates to a kind of Chinese file shelves compression processing method and device, particularly about a kind of Chinese word that utilizes as processing unit, with the archives compress technique, reach the method and the device of the shared memory capacity of a large amount of compression Chinese file shelves.
After Chinese Computerization is popularized gradually, produce a large amount of computer Chinese files, for example publish every day 28 spaces of a whole page the day newspaper office after computerization, promptly produce the literal shelves of about 2,500,000 middle literal every day, other is as publishing house, government bodies, the middle literal shelves that produced every day such as company are difficult to counting especially.Under the situation of this data blast, or be the waste that reduces the storage area, or be raising data transmitting speed, the necessity with the Chinese file compression is arranged in fact.
Being commonly used at present compress in the technology of Chinese file, a kind of ARC formula is arranged, is to be the unit with the byte, that is utilize the bit group of literal in the representative, according to its frequency that in Chinese file, occurs, form Huffman code, utilize known Hoffmannshen Method, the compression Chinese file.Because this kind mode is that Chinese file is compressed in the mode of compressing English data, do not use the characteristic of Chinese, so compression factor is not high, can reach about compression ratio about in the of 75%.
General Chinese sign indicating number no matter be BIG-5 sign indicating number or TCA sign indicating number, is compiled sign indicating number with two bytes to literal in each.This kind coded system because practicality reaches and the consideration of English ASCII character compatibility, is not used operable code space fully.Therefore, another kind of possible File Compress method is with middle literal recompile, reduces the figure place of each yard, can obtain some compressions very easily.
For example,, only need sign indicating number (annotate: 14 times of 2 is 16384) recompile in addition, can obtain the compression ratio of 14/16=87.5% with 14 positions of every word to literal in more than 10,000.
Above-mentioned dual mode all is to be coding unit with " word ".But the logical block that Chinese uses is actually " speech " rather than word, and Chinese word coding should obtain preferable compression effectiveness in theory.For example.The number of Chinese word commonly used is greatly about about 60,000, if with Chinese word coding, the sign indicating number (annotate: 16 times of 2 is 65536) with 16 positions can be compiled in each speech, can apply the needs of 60,000 speech enough, this mode, average number of words with reference to each Chinese word is about 1,6, and the code length that can approximately calculate average each word is 16/1.6=10 position.Therefore compression ratio then is approximately 10/16=62.5%.More efficient far beyond the coding that with the word is the unit.
Said method all only relates to the completeness (or imperfection incompleteness) of coding to the research of compression unit, the imperfection of utilization BIC-5 sign indicating number and TCA sign indicating number, again the sign indicating number that word or speech are compiled to make full use of code space reaches compression effectiveness, and handled all be the sign indicating number of equal length.In traditional data compression method, emphasize to utilize the appearance probability of coding unit, to the high unit of probability occurring, reach the purpose of compression by this to compile with long sign indicating number, the compression efficiency of these class methods is decided by choosing of compression unit usually.Yet known analogy probability Methods for Coding all can not be utilized Chinese characteristic, provides bigger compression factor, to conform with user's needs.
Purpose of the present invention is to provide a kind of characteristic of utilizing Chinese, the Chinese file shelves compression processing method and the device of compression Chinese file shelves.
Another object of the present invention is also at compression method that a kind of Chinese file shelves are provided and device, so that the storing memory amount of Chinese file shelves is done more efficient compression.
Another object of the present invention is also at the determining method that a kind of Chinese compression processing unit is provided, so that Chinese more efficient compression effect to be provided.
Find that through the inventor past, encoded method reached the technology of compression effectiveness, the number that its key calculates according to the appearance probability of all compression units in randomness (Entropy) value of file-promptly.Though the meaning of unrest degree value and computational methods are not within discussion scope of the present invention, but assist the technological thought of File Compress to be summarized as follows: in the archives of institute's desire compression with the notion of random degree value, other probability occurs all compression units more consistent (predictability is low, the randomness height), compression effectiveness is poorer; Otherwise if probability more inconsistent (predictability height, randomness is low) appears individually in compression unit, then compression effectiveness is better.
Each file all has its max. Possible compression ratio rate, and the preposition means of File Compress are promptly being found out the coding unit that one group of randomness that makes file reduces as far as possible.Therefore, though the present invention needn't be subject to its theory, but the inventor thinks, can make the Chinese file randomness drop to the compression unit group of theoretical minimum, should be the most approaching Chinese vocabulary of definition naturally, in other words, be the Huffman code that compression unit is weaved into similar Chinese vocabulary, Chinese file can be compressed near the theoretic limit.
Comprehensive above observation and analysis, the inventor thinks, if find a compression unit group of approaching Chinese words and phrases, is the compression unit coding with Chinese with this compression unit group, the at first compression that can obtain the first step because of the completeness and the logicality of its coding, add because of each compression unit uses the predictability of probability and weave into Huffman code, further more compressed.
Following with reference to description of drawings method and apparatus of the present invention.
The 1st figure represents the compression method of Chinese file shelves of the present invention and installs the system diagram of an embodiment.
The 2nd figure represents the compression method of Chinese file shelves of the present invention and installs the moving system figure of the compression processing unit cutter sweep of an embodiment.
In theory philological, how to seek out an objectively best compression unit group (or title " coding dictionary "), and the file shelves are compressed to theoretical minimum, be a very good problem to study.As long as can provide effective compression effectiveness but on using, find out one " coding dictionary " desirable, that be easy to processing (acquisition).Below how explanation method and apparatus of the present invention utilizes this one " coding dictionary " that the sign indicating number that Chinese file effectively cuts into a low randomness is gone here and there, and uses and compresses processing.
(1) compression of Chinese file shelves
The 1st figure represents the system diagram of one embodiment of the invention.Compression proposed by the invention at first comprises a compression processing unit cutter sweep (100) and one " coding dictionary " (101).Be marked with the use probability of each compression processing unit in the coding dictionary (101).Compression processing unit cutter sweep (100) can cut into the minimum compression unit sign indicating number string of randomness with file according to the use probability of each compression unit of record in the coding dictionary (101).
Coding table of comparisons generation device (102) can be according to the use probability of each compression unit of putting down in writing in the coding dictionary (101), derive Hofman tree and produce a coding table of comparisons (103) with the method that is suitable for, put down in writing the corresponding relation of each coding unit and Huffman code.Usually, this coding table of comparisons (103) is a table of making in advance on using, and is stored in the memory in a suitable manner.(104) be a memory, for storing treated file shelves.
When utilizing Chinese file shelves compression treatment device compression of the present invention to handle the Chinese file shelves, at first by system Chinese original document is read in, it is medium pending to be placed on memory (104).When doing the compression processing, by the use probability of compression processing unit cutter sweep (100) according to each compression unit of record in the coding dictionary (101), file according to method of the present invention, is cut into the minimum compression unit sign indicating number string of randomness, be stored in once again in the memory (104).The Chinese file shelves that compression set (105) is crossed encoding process, the reference encoder table of comparisons (103), the compression unit that constitutes a row sign indicating number string in the file shelves is converted to Huffman code one by one and obtains the compression shelves, be stored in once again in the memory (104), finish compression and handle.
Contract when handling carrying out back-pressure, back-pressure contracts processing unit (106) earlier with the compression shelves in the memory (104), and the reference encoder table of comparisons (103) is inverted to the original compression unit one by one to the Huffman code of representative compression shelves, and obtains original file shelves.
(2) cutting of compression processing unit
As previously mentioned, be the use probability of complying with each compression unit of record in the coding dictionary (101) earlier with compression processing unit cutter sweep (100) doing when compression is handled, file is cut into the minimum compression unit sign indicating number string of randomness.At this moment, the quality of cutting effect will become a treatment effect big key how.
Known cutting method is to utilize the means of long compression unit, with a Chinese fragment (normally Chinese sentence), searches the longest compression unit by known cut point toward sentence is terminal.Seek after, be cut point on next with this compression unit tail end, use these means then repeatedly, cut.
For example Chinese fragment " research institute carries out moving really " will be cut into the method:
" research institute | really | carry out | moving ",
This is not a desirable patterning method.The patterning method that randomness is lower should be:
" research institute | | really | action ",
So said method also is not suitable as Chinese file cutting process means.
The present invention is the cutting means that propose " probability competition " aspect cutting process.Via means, the Chinese file shelves can be cut into the coding unit sign indicating number string of a string low randomness, convert this yard string to Huffman code after, original Chinese file shelves can obtain very big compression.The cutting means of this kind probability competition can be applicable to various coding dictionary, and it can provide the minimum cutting effect of randomness under the coding dictionary of institute's reference; The coding dictionary more is an optimization, and resulting result is better.Though theoretic forced coding component dictionary is not an emphasis of the present invention,, can reach extraordinary compression effectiveness with general Chinese dictionary and the present invention use of arranging in pairs or groups.
Following foundation the 2nd figure illustrates the motion flow of Chinese file compression processing unit cutter sweep of the present invention (100).
The present invention at first obtains a Chinese fragment after starting in the Chinese file shelves (104) in (200) from memory, prepare to handle.This Chinese fragment is sentence normally.
With headed by the lead-in of this Chinese fragment, whether the different length word string of composition lists among the coding dictionary (101) in (201) systems inspection.To each seek coding unit, set up a virtual route, and be labeled as cut point at the relative position of compression unit tail end, then index is pointed to second word.This place accent virtual route promptly is the description to the cut point position.
Whether checkpoint (204) is checked has cut point above one in present index location.As surpass, then carry out action (202), otherwise, action (203) then carried out.
Relatively all have the path of cut point, the comprehensive probability that the cutting mode from lead-in to present index location may take place in present index location in action (202).The algorithm of comprehensive probability multiplies each other for the appearance probability with all compression units that are connected.Relatively, stay the probability maximum path takes place, delete all the other paths.It must be the path that the probability maximum takes place that this comparison makes last result.Action (202) need check whether index has pointed to this Chinese fragment tail end after finishing.In this way, then finish cutting to this Chinese fragment; As not, then down carry out action (203).
Action (203) is at first moved a position forward with index, points to literal in the next one.Then all cut points are done following action with the path that index location is identical at present: whether inspection lists among the coding dictionary (101) with composition different length word string headed by the word of present index location, then each is sought coding unit, form one or several new routes after being connected on original route, and be labeled as cut point at each new route tail end.List in the dictionary if all word strings are neither.Then delete this path, because this kind patterning method generation probability is zero.After action (202) finishes, carry out flow process rebound checkpoint (204), carry out above-mentioned exercises then repeatedly and finish up to sentence.At last, the cutting result with gained is indicated in the file shelves, is stored in the container, finishes cutting process.
Following example illustrates the handling process of compression processing unit cutter sweep shown in the 2nd figure.Suppose that the partial content that stores is in coding dictionary (101):
Table one
Coding unit generation probability
Grind 0.000009
Research 0.000496
Research institute 0.000018
Institute 0.002450
0.054438
Really 0.000070
True 0.000077
Certain 0.000067
Real 0.000179
Carry out 0.000084
Action 0.000167
Moving 0.000175
The probability of above annotation is 1,000,000 times (ten quadratic powers of ten) of actual value.
Step 1:
Action (200) obtains a Chinese fragment from the Chinese file shelves of memory (104), prepare to handle.
The Chinese fragment that obtains is:
" research institute carries out moving really "
Step 2:
, form the different length word string and whether list among the coding dictionary (101) with headed by the lead-in (grinding) of this Chinese fragment in (201) systems inspection.To each seek coding unit, set up a virtual route, and be labeled as cut point at the relative position of coding unit tail end.
Index refers at " grinding " word.Set up three initial paths.
Path: grind | 9000000
Research | 496000000
Research institute | 18000000
Step 3:
Checkpoint (204) checks that whether surpassing a path has cut point in present index location.
In this example, checkpoint (204) check result finds that not surpassing a path has cut point in target index location (grinding).Index moves to " studying carefully " word.
Whether (203) are positioned at the path of present index location (studying carefully) to all cut points at this moment, check with the different length word string of forming headed by the word of present index location and list among the coding dictionary (101).Found that with the word string headed by " studying carefully " and to study carefully institute as " studying carefully " ", " studying carefully " etc. is neither in the coding dictionary, delete this path " grind |, remaining:
Path: research | 496000000
Research institute | 18000000
Step 4
Index moves to the 3rd word.Repetitive operation (203), all cut points are done following action with the path that index location is identical at present: whether inspection lists among the coding dictionary (101) with composition different length word string headed by the word of present index location (institute).If any.With each seek coding unit, be connected on and form one or several new routes after the original route, and be labeled as cut point at each new route tail end.
In this example, " institute " word becomes so meet " institute " after path " research | " " study | institute | " among coding dictionary (101).
Path: research | institute | 1215200
Research institute | 18000000
Step 5:
(204) check that whether surpassing a path has cut point in present index location in the checkpoint equally.As surpass, then carry out action (202), calculate the path that the present index location in place has cutting, comprehensive probability may take place in the cutting mode of the index location from lead-in to present index.The algorithm of comprehensive probability is taken advantage of for the appearance probability of all compression units that are connected is looked into.Compare its size.Relatively, stay the path that the probability maximum takes place, delete all the other paths.
Check result finds to surpass a path has cut point in present index location, so relatively its generation probability is big little.The path: " research | institute | " deletion, because the probability (18000000) that probability (1215200) is lower than path " research institute | " takes place in it.The result is remaining:
Path: research institute | 18000000
Step 6:
Index moves to the 4th word.Repeat (203) action.Obtain two possible compression units " " and " really " be connected on " research institute | " afterwards, each forms a new route.
Path: research institute | | 979884
Research institute | really | 1260
Step 7:
Index moves to the 5th word.Repeat (203) action.Obtaining two other possible compression unit " really " and " really " is connected on " research institute | | " afterwards, each forms a new route.
Path: research institute | | really | 65.652228
Research institute | | really | 75.451068
Research institute | really | 1260.000000
Step 8
The judgement of foundation (204) repeats (202) action, and finding to surpass a path has cut point in present index location, and relatively probability takes place for it.
Path: research institute | | really |, deletion lower because probability (75.451068) takes place for it than path " research institute | really | " (1260.000000).Obtain:
Path: research institute | | really | 65.652228
Research institute | really | 1265.000000
Step 9
Index moves to the 6th word.According to the judgement of (204), repeat the action of (203).Obtain two other possible compression unit " reality " and " implementation " and be connected on " research institute | really | " afterwards, each forms a path, that is:
Path: research institute | | really | 65.652228
Research institute | really | real | 0.225540
Research institute | really | carry out | 0.105840
Step 10:
According to the judgement of (204), repeat the action of (202), finding to surpass a path has cut point in present index location, and relatively probability takes place in it.
The path " research institute | really | real | " deletion because its take place probability (0.225540) than the path " research institute | | really | " (65.652228) low.Obtain:
Path: research institute | | really | 65.652228
Research institute | really | carry out | 0.105840
Step 11:
Index moves to the 7th word.Checkpoint (204) check result finds not surpass a path has cut point in present index location (moving).Index moves to " moving " word:
Path: research institute | | really | action | 0.010964
Research institute | really | carry out | 0.105840
Step 12
Index moves to the Eight characters:
Path: research institute | | really | action | 0.010964
Research institute | really | carry out | moving | 0.000019
Step 13:
According to the judgement of (204), repeat the action of (202), finding to surpass a path has cut point in present index location, and relatively probability takes place in it.
The path " research institute | really | carry out | moving | " deletion because its take place probability (0.000019) than the path " research institute | | really | action | " (0.010964) low.Obtain:
Path: research institute | | really | action | 0.010964
Step 14:
Reach the sentence tail end, the compression processing unit cutting action finishes.
(3) invention effect
As shown in the above description, utilize Chinese file shelves compression processing method of the present invention and device, can obtain correct cutting effect, and can reach the good compression effect.
Table 2 shows the effect of utilizing Chinese file shelves compression processing method of the present invention and device to handle 3 Chinese file shelves gained.
Figure 921097069_IMG2
By shown in this table as can be known, the present invention can obtain the compression ratio about about 35-55% when doing the compression of Chinese archives and handle, really have the raising effect.Moreover applicable scope of the present invention can comprise various data compression or other processing through frequency coding, also has purposes is extensively arranged.
It more than is embodiment of the invention explanation.Those skilled in the art are not difficult by understanding spirit of the present invention in the above explanation, and make different variations according to this and extend, in any case but, all belong to the present invention's category.

Claims (12)

1, a kind of compression treatment device of Chinese file shelves comprises
One input unit is in order to import pending Chinese file shelves;
One memory is for storing the file shelves of handling;
One coding dictionary is put down in writing the use probability of each compression processing unit;
One compression processing unit cutter sweep can cut into the file shelves compression processing unit sign indicating number string that hangs down randomness;
The one coding table of comparisons, the compressed code of each compression processing unit represent in record, and this table of comparisons system on average occurs the probability coding according to each compression processing unit in the Chinese file shelves, give and occur probability the higher person short yard, the probability junior occurs, give and long sign indicating number;
One compression set with the Chinese file shelves of this compression processing unit cutter sweep cutting process, with reference to this coding table of comparisons, converts the compression processing unit in the file shelves to compressed code one by one and obtains the compression shelves;
It is characterized in that this compression processing unit cutter sweep system utilizes in this coding dictionary and puts down in writing, the use probability of each compression unit is calculated various possible cuttings probability mark as a result, the cutting result that the probability mark is the highest, and the voltage supply compression apparatus is handled.
2, as the device of claim the first, other contains the back-pressure processing unit that contracts, and the compressed file shelves in this memory with reference to this coding table of comparisons, can be inverted to the original compression unit to the compressed code of representing this document shelves one by one, and obtain original file shelves.
3, as the device of 1 of claim the, compression processing unit cutter sweep wherein contains:
One apparatus for temporary storage, can take out a Chinese fragment from the Chinese file that desire is handled:
Whether first testing fixture is checked with the different length word string of forming headed by the specific word of this Chinese fragment and is listed among this coding dictionary, seek the word string tail end be labeled as cut point, and, set up a virtual route to word string coherent before and after all;
Second testing fixture checks that whether surpassing a path has cut point at same position, is positioned at the path of this position to all cut points, the different length word string of inspection to form headed by the word of this position, whether list among the coding dictionary, can't in dictionary, find this word string, then delete this path;
One calculation element with reference to the use probability of each word string of this coding dictionary record, calculates each path probability mark;
One comparison means, relatively the probability in each path is disperseed, with the highest path of mark, as the result of cutting.
4, as the device of 3 of claims the, wherein the apparatus for temporary storage of this compression processing unit cutter sweep can take out a Chinese sentence, as the fragment of handling from the Chinese file shelves that desire is handled.
5, as the device of 3 of claims the, the coding dictionary of this compression processing unit cutter sweep wherein, the speech of the compression processing unit system Chinese of record.
6, as the device of 3 of claims the, wherein the calculation element of this compression processing unit cutter sweep is to use probability to multiply each other in this coding dictionary record the contained compression processing unit in each path, to calculate the probability mark in each path.
7, a kind of compression processing method of Chinese file shelves comprises
One input step is imported pending Chinese file shelves, is stored in the memory;
One cutting step cuts into this document shelves the compression processing unit sign indicating number string that hangs down randomness;
One compression step with the Chinese file shelves of this cutting step cutting process, with reference to the coding table of comparisons of compressed code of each compression processing unit of record, converts the compression processing unit in the file shelves to compression graceful sign indicating number one by one and obtains the compression shelves; Wherein, this table of comparisons system on average occurs the probability coding according to each compression processing unit in the Chinese file shelves, gives and the short sign indicating number of probability the higher person occurs, the probability junior occurs, gives and long sign indicating number;
It is characterized in that this cutting step system utilizes the use probability of each compression unit of putting down in writing in the coding dictionary, calculate various possible cutting results' probability mark, the cutting result that the probability mark is the highest, the voltage supply step process that contracts.
8, as the method for 7 of claims the, other contains the back-pressure step that contracts, can be with these compressed file shelves, and the reference encoder table of comparisons is inverted to the original compression unit to the compressed code of representing this document shelves one by one, and obtains original file shelves.
9, as the method for 7 of claims the, cutting step wherein contains:
One temporary step is taken out a Chinese fragment from the Chinese file shelves that desire is handled;
First checks step, and index places the lead-in of Chinese fragment, check with the different length word string of forming headed by this word and whether list among this coding dictionary, and to each seek word string, set up a virtual route, be labeled as cut point at the word string tail end:
The second inspection step checks that whether surpassing a path has cut point in present index location, as denying the mobile position of index forward;
The 3rd checks step, to all cut points and the present identical path of index location, whether inspection lists among the coding dictionary with the word string of the different length formed headed by the word of present index location, seek each to such an extent that encode, form new route after being connected on original route, and be labeled as cut point at each new route tail end, if can't in this encodes dictionary, find this word string, then delete this path;
One calculation procedure checks that to second step checks out that surpassing one has the path of cut point in index location at present, with reference to the use probability of each word string of this coding dictionary record, calculates this path, by the probability mark of lead-in to present index location;
One comparison step compares the probability mark in each path, and the path that mark is the highest is made for compression and handles;
The 4th step is checked and whether has been handled the last word of handling Chinese fragment to this, as not, then repeats above-mentioned processing, in this way, then finishes dividing processing.
10, as the method for 9 of claims the, wherein the temporary step of this cutting step is the fragment that the Chinese sentence that takes out from the Chinese file shelves that desire is handled deals with.
11, as the method for 9 of claims the, wherein the compression processing unit of the employed coding dictionary of this cutting step record is a Chinese word.
12, as the method for 9 of claims the, wherein the calculation procedure of this cutting step is to use probability to multiply each other in this coding dictionary record the contained compression processing unit in each path, to calculate the probability mark in each path.
CN 92109706 1992-08-22 1992-08-22 Processing method and device for compressing Chinese documents Expired - Fee Related CN1031680C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 92109706 CN1031680C (en) 1992-08-22 1992-08-22 Processing method and device for compressing Chinese documents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 92109706 CN1031680C (en) 1992-08-22 1992-08-22 Processing method and device for compressing Chinese documents

Publications (2)

Publication Number Publication Date
CN1083638A true CN1083638A (en) 1994-03-09
CN1031680C CN1031680C (en) 1996-04-24

Family

ID=4944307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 92109706 Expired - Fee Related CN1031680C (en) 1992-08-22 1992-08-22 Processing method and device for compressing Chinese documents

Country Status (1)

Country Link
CN (1) CN1031680C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010108373A1 (en) * 2009-03-27 2010-09-30 广东国笔科技股份有限公司 Method and system for compressed encoding and decoding for word stock
CN101751451B (en) * 2008-12-11 2012-04-25 高德软件有限公司 Chinese data compression method and Chinese data decompression method and related devices

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100647192B1 (en) 2005-09-01 2006-11-23 삼성전자주식회사 Data coding apparatus and method for reducing decoding time of data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751451B (en) * 2008-12-11 2012-04-25 高德软件有限公司 Chinese data compression method and Chinese data decompression method and related devices
WO2010108373A1 (en) * 2009-03-27 2010-09-30 广东国笔科技股份有限公司 Method and system for compressed encoding and decoding for word stock

Also Published As

Publication number Publication date
CN1031680C (en) 1996-04-24

Similar Documents

Publication Publication Date Title
CN1205574C (en) Method of and apparatus for compressing and expanding data and data processing apparatus and network system using same
CN1183683C (en) Position adaptive coding method using prefix prediction
CN1145264C (en) Data compression and decompression system with immediate dictionary updating interleaved with string search
CN101036143A (en) Multi-stage query processing system and method for use with tokenspace repository
CN1269102C (en) Method for compressing dictionary data
CN1868127A (en) Data compression system and method
KR101049699B1 (en) Data Compression Method
CN1217259C (en) Method and apparatus for data compression, computer program and storage media
CN101036298A (en) System and method for static Huffman decoding
CN1257621A (en) Method and apparatus for lossless digital data compression
CN1894856A (en) Sequential data maintenance compression device
De Moura et al. Direct pattern matching on compressed text
CN1267963A (en) Data compression equipment and data restorer
CN1287320C (en) Method of compressing digital ink
CN101075261A (en) Method and device for compressing index
CN1949221A (en) Method and system of storing element and method and system of searching element
CN1031680C (en) Processing method and device for compressing Chinese documents
Wiseman et al. Conjugation-based compression for Hebrew texts
CN101055593A (en) Tibetan web page and its code identification method
CN1134895C (en) Method and apparatus for compression of electronic ink
CN100343851C (en) Database compression and decompression method
Bhadade et al. Lossless text compression using dictionaries
CN1282113C (en) Quick-sorting in page method based on quick sorting computation
CN101051846A (en) Quick Huffman decoding method based on context
CN1301596C (en) Compressing and decompressing method of digital image data

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee