CN105224828A - Key-value index data compression method for fast localization of gene sequence fragments - Google Patents
- Publication number
- CN105224828A
- Authority
- CN
- China
- Prior art keywords
- key
- prefix
- gene sequence
- sequence fragment
- compressed
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a key-value index data compression method for fast localization of gene sequence fragments. The steps comprise: 1) initializing a compression result set Set_comp and setting the prefix length n used for data compression; 2) taking one gene sequence fragment Key to be compressed, as the current fragment, from the gene sequence data set Set_orig to be compressed; 3) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length, and adding each of these gene sequence fragment sequences to the compression result set Set_comp in the form of its common prefix together with its own cyclic shift count and suffix; 4) determining whether the data set Set_orig to be compressed is empty; if it is not empty, taking the next gene sequence fragment Key to be compressed as the current fragment and jumping to step 2); otherwise, outputting the compression result set Set_comp. The invention can improve query efficiency for large data volumes and has the advantages of strong compression capability and a small storage footprint.
Description
Technical field
The present invention relates to bioinformatic analysis of gene sequencing data, and in particular to a key-value index data compression method for fast localization of gene sequence fragments.
Background
Sequencing read localization is the basis of current high-throughput gene sequencing data analysis. Sequence fragments are usually matched with methods such as BWA, which perform best-effort string matching that tolerates partial errors. However, practical experiments show that in most cases the majority of the sequence fragments obtained by sequencing can be broken into shorter gene sequence fragments (36 bp) and matched exactly and quickly by an exact Key-Value mapping method.
To allow short gene sequences to be matched against the reference chain quickly and accurately, a Key-Value index database must first be built from the reference chain data. It is built as follows: suppose the reference chain data are ACGTGCA and the database of key-value pairs (Key-Value pairs) for short-sequence matching is to be built in groups of 4 characters, as shown in Fig. 1. Referring to Fig. 1, starting at each character of the reference chain data in turn, from back to front, and taking 4 characters as the length, 4 Key-Value pairs are obtained as the data of the query database. If a short sequence obtained by sequencing is "GTGC", the Key-Value mapping quickly yields that GTGC is located at Offset 2 in the reference sequence. However, this method has an important problem: the reference sequence chain used to build the database is usually long and in practice may exceed 2×10^9 characters. If Key-Value data pairs are built with 36-character fragments, the index data made up of the Keys alone amounts to (2×10^9 - 36) × 36 bytes ≈ 67.05 GB. Such huge index data consumes a large amount of the computing system's memory and sharply lowers the cache hit rate of the Key-Value system; if memory is insufficient, memory page swapping also causes severe fluctuations in system performance, so that an exact match that should be very efficient performs far worse in an actual engineering implementation. The existing compressed prefix tree method can capture characters in the index data whose position and value are both identical and merge them in the index tree, thereby reducing the size of the data index. However, data compressed in this way must be queried through a tree structure, so query efficiency is closely tied to the depth of the tree and the size of the data: when the data volume is large, the tree becomes deeper and query efficiency drops significantly. In addition, the storage occupied by the large number of pointers needed to build the compressed prefix tree largely offsets the compression gain.
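The Key-Value index and the size estimate described above can be illustrated with a minimal sketch. The function names `build_index` and `key_bytes` are hypothetical and not part of the original disclosure; the sketch assumes 0-based offsets counted from the start of the reference chain (which reproduces the offset 2 given for GTGC), keeps only the first offset of a repeated fragment, and reads "GB" as 2^30 bytes.

```python
from typing import Dict


def build_index(reference: str, k: int) -> Dict[str, int]:
    """Map every k-character fragment of the reference to its offset.

    Assumption: offsets are 0-based from the start of the reference, and
    only the first offset of a repeated fragment is kept.
    """
    index: Dict[str, int] = {}
    for offset in range(len(reference) - k + 1):
        index.setdefault(reference[offset:offset + k], offset)
    return index


def key_bytes(total_length: int, fragment_length: int) -> int:
    """Bytes taken by the Keys alone at one byte per character,
    using the estimate from the text: (TL - SL) * SL."""
    return (total_length - fragment_length) * fragment_length


index = build_index("ACGTGCA", 4)
print(index)                                # the four Key-Value pairs
print(index["GTGC"])                        # offset of the short sequence GTGC
print(key_bytes(2 * 10**9, 36) / 2**30)     # size estimate in units of 2**30 bytes
```

A plain dictionary is used here only to make the mapping concrete; the memory pressure discussed above comes from storing every 36-character Key in full.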
Summary of the invention
The technical problem to be solved by the present invention: in view of the above problems of the prior art, to provide a key-value index data compression method for fast localization of gene sequence fragments that improves query efficiency for large data volumes, compresses strongly and occupies little space.
To solve the above technical problem, the present invention adopts the following technical solution:
A key-value index data compression method for fast localization of gene sequence fragments, the steps of which comprise:
1) initializing a compression result set Set_comp and setting the prefix length n used for data compression;
2) taking one gene sequence fragment Key to be compressed, as the current fragment, from the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length, and adding each of these gene sequence fragment sequences to the compression result set Set_comp in the form of its common prefix together with its own cyclic shift count and suffix;
4) determining whether the data set Set_orig to be compressed is empty; if it is not empty, taking the next gene sequence fragment Key to be compressed as the current fragment and jumping to step 2); otherwise, outputting the compression result set Set_comp.
Preferably, the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) as the current gene sequence fragment sequence;
3.3) splitting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, where the lengths of the prefix Prefix_ri and the suffix Postfix_ri sum to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) determining whether a mapping set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) determining whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping set corresponding to the prefix Prefix_ri, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share a common prefix with the current gene sequence fragment sequence Key_ri and jumping to step 4);
3.6) creating a new mapping Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) determining whether all of the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) have been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
Preferably, the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n at which the compression ratio function f(n) reaches its maximum.
Preferably, the compression ratio function constructed in step 1.1) is as shown in formula (1);
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is as shown in formula (2);
In formula (2), S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and n is the prefix length.
Preferably, the cyclic shift is a left rotation.
Preferably, the prefix length n is 32.
The key-value index data compression method for fast localization of gene sequence fragments of the present invention has the following advantages. The present invention cyclically shifts the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length, and adds each of these gene sequence fragment sequences to the compression result set Set_comp in the form of its common prefix together with its own cyclic shift count and suffix. By splitting the gene sequence fragment Key into two parts, a prefix (Prefix) and a suffix (Postfix), and applying a certain number of cyclic shifts to the fragment, as many sequences with the same prefix as possible are captured among adjacent short fragment sequences. The prefix sequences of these gene sequence fragment Keys are merged and, together with the encoded suffix sequence and cyclic shift count, uniquely represent a specific gene sequence fragment Key, which greatly reduces the storage space of these short index sequences. At the same time, because there are only two levels of sequence, prefix and suffix, the present invention does not suffer from the defect of a traditional compressed prefix tree, whose number of levels grows with the data scale. It can therefore improve query efficiency for large data volumes and has the advantages of strong compression capability and a small storage footprint.
Brief description of the drawings
Fig. 1 is a schematic diagram of how the prior art builds the key-value pair database of gene sequence fragments.
Fig. 2 is a flow chart of the method of the embodiment of the present invention.
Fig. 3 is a schematic diagram of how the embodiment of the present invention builds the key-value pair database of gene sequence fragments.
Fig. 4 is a flow chart of step 3) of the method of the embodiment of the present invention.
Detailed description of the embodiment
As shown in Fig. 2, the steps of the key-value index data compression method for fast localization of gene sequence fragments of this embodiment comprise:
1) initializing a compression result set Set_comp and setting the prefix length n used for data compression;
2) taking one gene sequence fragment Key to be compressed, as the current fragment, from the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length, and adding each of these gene sequence fragment sequences to the compression result set Set_comp in the form of its common prefix together with its own cyclic shift count and suffix;
4) determining whether the data set Set_orig to be compressed is empty; if it is not empty, taking the next gene sequence fragment Key to be compressed as the current fragment and jumping to step 2); otherwise, outputting the compression result set Set_comp.
From the way the key-value pair database is built, it can be seen that, because gene sequence fragment Key data are obtained by starting at each successive character and cutting out a sequence of a specific length, the adjacent short sequences (n characters) cut in this way actually share most of their characters. In this embodiment, each of the n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) obtained by cyclic shifting is packaged as a common prefix together with its own cyclic shift count and suffix and added to the compression result set Set_comp. The cyclic shift operator <<_r n is defined to denote cyclically shifting the sequence elements by n. As shown in Fig. 3, take as an example the short sequence strings cut from the data set Set_orig at 3 adjacent positions around T, G, C, A: cyclically shifting the gene sequence fragment Key (T, G, C, A) by 0 to (n-1) positions respectively forms the gene sequence fragment sequence T, G, C, A, the gene sequence fragment sequence T, G, C, G and the gene sequence fragment sequence T, G, C, G, so the gene sequence fragment sequences can be written as TG <<_r0 CA, TG <<_r1 CG and TG <<_r2 CG respectively. The cyclic shift operator <<_r n carries the cyclic shift count; the TG before the operator is the common prefix, and what follows the operator is the suffix. It should be noted that this is only an illustration using gene sequence fragment Keys of 4 bases; gene sequence fragments with other numbers of bases may be used as required, and since the principle is the same as in this embodiment it is not repeated here.
As can be seen from Fig. 3, when the cyclic shift count is 0, the suffix characters C and A both lie after the common prefix TG before shifting; when the cyclic shift count is 1, the suffix character C lies after the common prefix TG before shifting, while the suffix character G lies before it; when the cyclic shift count is 2, the suffix characters C and G both lie before the common prefix TG before shifting (that is, the original fragments are TGCA, GTGC and CGTG respectively). Therefore, based on this cyclic shift principle, the original data of a gene sequence fragment sequence can be quickly restored from the compressed gene sequence fragment sequence. In this embodiment the cyclic shift is a left rotation; a right rotation works on the same basic principle, so its implementation details are not repeated here.
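The left rotation and the restoration of the original fragment can be sketched as follows. This is a minimal illustration under stated assumptions: fragments are plain Python strings, and the helper names `rotate_left`, `encode` and `restore` are hypothetical, not taken from the original disclosure.

```python
from typing import Tuple


def rotate_left(fragment: str, count: int) -> str:
    """Cyclically shift the fragment to the left by `count` positions."""
    count %= len(fragment)
    return fragment[count:] + fragment[:count]


def encode(fragment: str, count: int, n: int) -> Tuple[str, int, str]:
    """Rotate left by `count`, then split into a prefix of length n and a suffix."""
    rotated = rotate_left(fragment, count)
    return rotated[:n], count, rotated[n:]


def restore(prefix: str, count: int, suffix: str) -> str:
    """Undo the left rotation: rotate prefix + suffix to the right by `count`."""
    rotated = prefix + suffix
    count %= len(rotated)
    return rotated[len(rotated) - count:] + rotated[:len(rotated) - count]


# The three adjacent fragments of the Fig. 3 example share the prefix TG.
for fragment, count in (("TGCA", 0), ("GTGC", 1), ("CGTG", 2)):
    prefix, shift, suffix = encode(fragment, count, 2)
    print(fragment, "->", prefix, shift, suffix, "->", restore(prefix, shift, suffix))
```

Restoration only needs the stored shift count, which is why the triple of prefix, shift count and suffix is enough to identify the original fragment.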
In this embodiment, the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n at which the compression ratio function f(n) reaches its maximum.
The compression ratio function constructed in step 1.1) is as shown in formula (1);
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is as shown in formula (2);
In formula (2), S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and n is the prefix length.
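Step 1.2) amounts to a search over candidate prefix lengths. The sketch below makes no assumption about the form of f(n) or S(n): the compression ratio function is supplied by the caller, and the name `best_prefix_length` as well as the toy function used in the usage lines are purely hypothetical.

```python
from typing import Callable, Iterable


def best_prefix_length(ratio: Callable[[int], float],
                       candidates: Iterable[int]) -> int:
    """Return the candidate prefix length n that maximizes ratio(n)."""
    return max(candidates, key=ratio)


def toy_ratio(n: int) -> float:
    """Placeholder only: a made-up function peaking at n = 32.
    It is NOT the compression ratio function f(n) of formula (1)."""
    return -float((n - 32) ** 2)


# Candidate prefix lengths 1 .. SL-1 for fragments of length SL = 36.
print(best_prefix_length(toy_ratio, range(1, 36)))
```

An exhaustive scan is sufficient here because the prefix length can only take values below the fragment length SL.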
As shown in Fig. 4, the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) as the current gene sequence fragment sequence;
3.3) splitting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, where the lengths of the prefix Prefix_ri and the suffix Postfix_ri sum to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) determining whether a mapping set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) determining whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping set corresponding to the prefix Prefix_ri, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share a common prefix with the current gene sequence fragment sequence Key_ri and jumping to step 4);
3.6) creating a new mapping Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) determining whether all of the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) have been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
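Steps 2) to 4) together with steps 3.1) to 3.7) can be sketched as a single loop. This is a literal reading of the steps as worded, not a definitive implementation: the function name `compress`, the use of a dictionary of sets for Set_comp, and the string representation of fragments are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

CompressedIndex = Dict[str, Set[Tuple[int, str]]]


def compress(keys: Iterable[str], n: int) -> CompressedIndex:
    """Build Set_comp: prefix -> set of (cyclic shift count, suffix)."""
    set_comp: CompressedIndex = {}                      # step 1)
    for key in keys:                                    # steps 2) and 4)
        for i in range(n):                              # steps 3.1), 3.2), 3.7)
            rotated = key[i:] + key[:i]                 # left rotation by i
            prefix, suffix = rotated[:n], rotated[n:]   # step 3.3)
            entries = set_comp.get(prefix)              # step 3.4)
            if entries is None:
                set_comp[prefix] = {(i, suffix)}        # step 3.6)
            elif (i, suffix) not in entries:
                entries.add((i, suffix))                # step 3.5), not yet present
            else:
                break                                   # step 3.5), already present
    return set_comp


# Fragments of length 4 cut from the reference chain ACGTGCA, prefix length 2.
result = compress(["ACGT", "CGTG", "GTGC", "TGCA"], 2)
for prefix, entries in sorted(result.items()):
    print(prefix, sorted(entries))
```

With these inputs the fragments GTGC (shift count 1) and TGCA (shift count 0) end up under the same prefix TG, which is the merging effect the method relies on.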
In this embodiment, the data structure of the compression result set Set_comp is as follows:
{
prefix_1 → {<rotate_1, postfix_1>, <rotate_2, postfix_2>, …},
prefix_2 → {<rotate_3, postfix_3>, …},
…}
In the above data structure, prefix_1 is the common prefix of the first gene sequence fragment, and prefix_1 → {<rotate_1, postfix_1>, <rotate_2, postfix_2>, …} is the mapping set corresponding to the prefix prefix_1; rotate_1 is the cyclic shift count of the first gene sequence fragment sequence having the prefix prefix_1, and postfix_1 is the suffix of the first gene sequence fragment sequence having the prefix prefix_1; rotate_2 is the cyclic shift count of the second gene sequence fragment sequence having the prefix prefix_1, and postfix_2 is the suffix of the second gene sequence fragment sequence having the prefix prefix_1. prefix_2 is the common prefix of the second gene sequence fragment, and prefix_2 → {<rotate_3, postfix_3>, …} is the mapping set corresponding to the prefix prefix_2; rotate_3 is the cyclic shift count of the first gene sequence fragment sequence having the prefix prefix_2, and postfix_3 is the suffix of the first gene sequence fragment sequence having the prefix prefix_2.
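The data structure above maps naturally onto a dictionary of sets. The source does not spell out how a query is run against Set_comp, so the lookup below is only one plausible reading and is an assumption throughout: the name `find_shift`, the decision to try every rotation of the query fragment, and the omission of the Value (offset) part are all illustrative choices.

```python
from typing import Dict, Optional, Set, Tuple

CompressedIndex = Dict[str, Set[Tuple[int, str]]]

# The Fig. 3 example: three adjacent fragments stored under the prefix TG.
set_comp: CompressedIndex = {"TG": {(0, "CA"), (1, "CG"), (2, "CG")}}


def find_shift(index: CompressedIndex, fragment: str, n: int) -> Optional[int]:
    """Return the shift count under which `fragment` is stored, else None.

    Assumption: a fragment stored with shift count i is found by rotating
    the query left by i and looking up <i, suffix> under the resulting prefix.
    """
    for i in range(len(fragment)):
        rotated = fragment[i:] + fragment[:i]
        if (i, rotated[n:]) in index.get(rotated[:n], set()):
            return i
    return None


for query in ("TGCA", "GTGC", "CGTG", "AAAA"):
    print(query, find_shift(set_comp, query, 2))
```

Each probe is a hash lookup on a prefix followed by a set membership test, so the lookup depth stays at two levels regardless of how many fragments are stored.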
In this embodiment, the length of the data set Set_orig to be compressed is TL = 2×10^9, the length of a gene sequence fragment Key is SL = 36, and the storage occupied by each element of a gene sequence fragment Key is b = 2 bits (because a valid reference sequence consists only of A, C, G and T). The prefix length used for data compression is chosen as n = 32, from which the byte estimation function for the index data with a prefix Prefix of length 32 evaluates to S(n) = 6,500,000,000 bytes, i.e. about 6.05 GB. Compared with the uncompressed data size (TL × SL × b / 8 = 2×10^9 × 36 × 2 / 8 bytes ≈ 16.76 GB), the key-value index data compression method for fast localization of gene sequence fragments of this embodiment reaches a compression ratio of nearly 2.8. This embodiment can therefore improve query efficiency for large data volumes and has the advantages of strong compression capability and a small storage footprint.
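The figures in this paragraph can be checked with a few lines of arithmetic. The sketch takes S(32) = 6,500,000,000 bytes directly from the text rather than computing it, and reads "GB" as 2^30 bytes; the variable names are chosen here for illustration only.

```python
TL = 2 * 10**9          # length of the data set to be compressed
SL = 36                 # length of one gene sequence fragment Key
b = 2                   # bits per element (A, C, G, T)

uncompressed_bytes = TL * SL * b / 8          # TL * SL * b / 8
compressed_bytes = 6_500_000_000              # S(32), value quoted in the text

print(uncompressed_bytes / 2**30)             # uncompressed size in units of 2**30 bytes
print(compressed_bytes / 2**30)               # compressed size in units of 2**30 bytes
print(uncompressed_bytes / compressed_bytes)  # compression ratio
```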
The above is only the preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions that fall under the concept of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A key-value index data compression method for fast localization of gene sequence fragments, characterized in that the steps comprise:
1) initializing a compression result set Set_comp and setting the prefix length n used for data compression;
2) taking one gene sequence fragment Key to be compressed, as the current fragment, from the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length, and adding each of these gene sequence fragment sequences to the compression result set Set_comp in the form of its common prefix together with its own cyclic shift count and suffix;
4) determining whether the data set Set_orig to be compressed is empty; if it is not empty, taking the next gene sequence fragment Key to be compressed as the current fragment and jumping to step 2); otherwise, outputting the compression result set Set_comp.
2. The key-value index data compression method for fast localization of gene sequence fragments according to claim 1, characterized in that the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key by 0 to (n-1) positions respectively to form n gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) having a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) as the current gene sequence fragment sequence;
3.3) splitting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, where the lengths of the prefix Prefix_ri and the suffix Postfix_ri sum to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) determining whether a mapping set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) determining whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping set corresponding to the prefix Prefix_ri, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share a common prefix with the current gene sequence fragment sequence Key_ri and jumping to step 4);
3.6) creating a new mapping Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the cyclic shift count of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) determining whether all of the gene sequence fragment sequences Key_r0, Key_r1, ..., Key_r(n-1) have been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
3. The key-value index data compression method for fast localization of gene sequence fragments according to claim 2, characterized in that the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n at which the compression ratio function f(n) reaches its maximum.
4. The key-value index data compression method for fast localization of gene sequence fragments according to claim 3, characterized in that the compression ratio function constructed in step 1.1) is as shown in formula (1);
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is as shown in formula (2);
In formula (2), S(n) is the function estimating the number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in the data set Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in the data set Set_orig, and n is the prefix length.
5. The key-value index data compression method for fast localization of gene sequence fragments according to any one of claims 1 to 4, characterized in that the cyclic shift is a left rotation.
6. The key-value index data compression method for fast localization of gene sequence fragments according to claim 5, characterized in that the prefix length n is 32.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510648867.2A CN105224828B (en) | 2015-10-09 | 2015-10-09 | Key-value index data compression method for fast localization of gene sequence fragments |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105224828A true CN105224828A (en) | 2016-01-06 |
CN105224828B CN105224828B (en) | 2017-10-27 |
Family
ID=54993793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510648867.2A Active CN105224828B (en) | Key-value index data compression method for fast localization of gene sequence fragments | 2015-10-09 | 2015-10-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105224828B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN105224828B (en) | 2017-10-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |