CN105224828A - Key-value index data compression method for rapid positioning of gene sequence fragments - Google Patents

Key-value index data compression method for rapid positioning of gene sequence fragments

Info

Publication number
CN105224828A
Authority
CN
China
Prior art keywords
key
prefix
gene sequence
sequence fragment
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510648867.2A
Other languages
Chinese (zh)
Other versions
CN105224828B (en)
Inventor
宋卓
李�根
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human And Future Biotechnology (changsha) Co Ltd
Original Assignee
Human And Future Biotechnology (changsha) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Human And Future Biotechnology (changsha) Co Ltd
Priority to CN201510648867.2A
Publication of CN105224828A
Application granted
Publication of CN105224828B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a key-value index data compression method for rapid positioning of gene sequence fragments. The steps comprise: 1) initializing the compression result set Set_comp and setting the prefix length n used for data compression; 2) taking one current gene sequence fragment Key to be compressed out of the gene sequence data set Set_orig to be compressed; 3) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length, and adding each gene sequence fragment sequence to the compression result set Set_comp as the common prefix together with its own cyclic shift count and suffix; 4) judging whether the data set Set_orig to be compressed is empty; if it is not empty, taking out the next current gene sequence fragment Key to be compressed and jumping to step 2); otherwise, outputting the compression result set Set_comp. The invention improves query efficiency on large data volumes and has the advantages of strong compression capability and a small storage footprint.

Description

Key-value index data compression method for rapid positioning of gene sequence fragments
Technical field
The present invention relates to bioinformatic analysis of gene sequencing data, and in particular to a key-value index data compression method for rapid positioning of gene sequence fragments.
Background technology
Positioning of sequencing reads is the foundation of current high-throughput gene sequencing data analysis. Sequence fragments are usually aligned with methods such as BWA, which perform best-match string matching that tolerates some errors. Experiments show, however, that in most cases the majority of sequenced fragments can be broken into shorter gene sequence fragments (36 bp) and matched exactly, quickly, and accurately with an exact Key-Value mapping method.
To match short gene sequences against the reference strand quickly and accurately, a Key-Value index database must first be built from the reference strand data. For example, if the reference strand data is ACGTGCA and the key-value pair (Key-Value pair) database for short-sequence matching is to be built in groups of 4 characters, the result is as shown in Fig. 1. Referring to Fig. 1, starting at each character of the reference strand in turn and taking 4 characters as the length yields 4 Key-Value pairs that serve as the query database. If a sequenced short read is "GTGC", the Key-Value mapping quickly gives the Offset of GTGC in the reference sequence as position 2. This method has an important problem, however: the reference sequence used to build the database is usually long, and in practice may exceed 2*10^9 characters. If Key-Value pairs are built from 36-character fragments, the index data made up of the Keys alone amounts to (2*10^9 – 36) * 36 Bytes ≈ 67.05 GB. Such a large index consumes a great deal of the computing system's memory and sharply lowers the Cache hit rate of the Key-Value system; when memory is insufficient, memory page swapping also causes severe jitter in system performance, so that exact matching, which should be very efficient, performs much worse in an engineering implementation. The existing compressed prefix tree method captures characters in the index data that have the same position and value and merges them in the index tree, which reduces the size of the index. However, data compressed this way must be queried through a tree structure, and query efficiency is closely tied to the depth of the tree and the data volume: as the data volume grows the tree deepens and query efficiency drops markedly. In addition, the space occupied by the many pointers needed to build the compressed prefix tree largely offsets the compression gain.
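The uncompressed Key-Value index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_index` and `key_bytes` are hypothetical, repeated Keys keep their first offset (the source does not say how repeats are handled), and the size estimate follows the source in charging one byte per character.

```python
def build_index(reference: str, k: int) -> dict[str, int]:
    """Map every length-k substring (Key) of the reference to its offset (Value)."""
    index: dict[str, int] = {}
    for offset in range(len(reference) - k + 1):
        # Assumption: if a Key occurs more than once, keep its first offset.
        index.setdefault(reference[offset:offset + k], offset)
    return index


def key_bytes(reference_length: int, k: int) -> int:
    """Bytes taken by the Keys alone at one byte per character, as in the estimate above."""
    return (reference_length - k) * k


index = build_index("ACGTGCA", 4)
print(index)            # {'ACGT': 0, 'CGTG': 1, 'GTGC': 2, 'TGCA': 3}
print(index["GTGC"])    # 2
print(key_bytes(2 * 10**9, 36) / 2**30)  # Key storage for a 2*10^9-character reference, in units of 2^30 bytes
```

The lookup is a single dictionary access, which is why the size of the Key data, rather than the query logic, is the bottleneck the text goes on to address.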
Summary of the invention
Technical problem to be solved by the invention: in view of the above problems of the prior art, to provide a key-value index data compression method for rapid positioning of gene sequence fragments that improves query efficiency on large data volumes, compresses strongly, and occupies little space.
To solve the above technical problem, the invention adopts the following technical solution:
A key-value index data compression method for rapid positioning of gene sequence fragments, the steps of which comprise:
1) initializing the compression result set Set_comp, and setting the prefix length n used for data compression;
2) taking one current gene sequence fragment Key to be compressed out of the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length, and adding each gene sequence fragment sequence to the compression result set Set_comp as the common prefix together with its own cyclic shift count and suffix;
4) judging whether the data set Set_orig to be compressed is empty; if it is not empty, taking out the next current gene sequence fragment Key to be compressed and jumping to step 2); otherwise, outputting the compression result set Set_comp.
Preferably, the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) as the current gene sequence fragment sequence;
3.3) cutting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, the lengths of Prefix_ri and Postfix_ri summing to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) judging whether a mapping relation set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) judging whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping relation set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping relation set corresponding to Prefix_ri, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share the common prefix with Key_ri and jumping to step 4);
3.6) creating a new mapping relation Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) judging whether the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) have all been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
Preferably, the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n that maximizes the value of the compression ratio function f(n).
Preferably, the compression ratio function constructed in step 1.1) is shown in formula (1);
$$ f(n) = \frac{TL \cdot SL \cdot b}{8 \cdot S(n)} \qquad (1) $$
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is shown in formula (2);
$$ S(n) = \frac{\left(\log_2(SL - n) + (SL - n) \cdot b\right) \cdot TL}{8} + \frac{n \cdot b \cdot TL}{8 \cdot (SL - n)} \qquad (2) $$
In formula (2), S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and n is the prefix length.
Preferably, the cyclic shift is a cyclic left shift.
Preferably, the prefix length n is 32.
The key-value index data compression method for rapid positioning of gene sequence fragments of the invention has the following advantages. The invention cyclically shifts the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length, and adds each gene sequence fragment sequence to the compression result set Set_comp as the common prefix together with its own cyclic shift count and suffix. The gene sequence fragment Key is cut into two parts, a prefix (Prefix) and a suffix (Postfix), and by cyclically shifting the gene sequence fragment a certain number of times, as many sequences with the same prefix as possible are captured among adjacent short fragment sequences. The prefix sequences of these gene sequence fragments Key are merged, and the merged prefix, together with the suffix sequence and the encoded cyclic shift count, uniquely represents one specific gene sequence fragment Key, which greatly reduces the storage space of these short index sequences. At the same time, because there are only two levels of sequence, prefix and suffix, the invention does not suffer from the defect of a traditional compressed prefix tree, whose number of levels grows with the data scale; it therefore improves query efficiency on large data volumes and has the advantages of strong compression capability and a small storage footprint.
Brief description of the drawings
Fig. 1 is a schematic diagram of how the prior art builds the key-value pair database of gene sequence fragments.
Fig. 2 is a flowchart of the method of the embodiment of the invention.
Fig. 3 is a schematic diagram of how the embodiment of the invention builds the key-value pair database of gene sequence fragments.
Fig. 4 is a flowchart of step 3) of the method of the embodiment of the invention.
Embodiment
As shown in Fig. 2, the steps of the key-value index data compression method for rapid positioning of gene sequence fragments of this embodiment comprise:
1) initializing the compression result set Set_comp, and setting the prefix length n used for data compression;
2) taking one current gene sequence fragment Key to be compressed out of the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length, and adding each gene sequence fragment sequence to the compression result set Set_comp as the common prefix together with its own cyclic shift count and suffix;
4) judging whether the data set Set_orig to be compressed is empty; if it is not empty, taking out the next current gene sequence fragment Key to be compressed and jumping to step 2); otherwise, outputting the compression result set Set_comp.
From the way the key-value pair database is built, it can be seen that because the gene sequence fragment Key data are obtained by starting at each character in turn and extracting a sequence of a specific length, the adjacent short sequences obtained by successive extractions actually share most of their characters. In this embodiment, each of the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) obtained by cyclic shifting is packaged as the common prefix together with its own cyclic shift count and suffix, and added to the compression result set Set_comp. Define the cyclic shift operator <<r n to denote cyclically shifting the sequence elements n times. As shown in Fig. 3, take as an example the short strings obtained by 3 adjacent extractions around T, G, C, A from the data set Set_orig to be compressed: cyclically shifting the gene sequence fragment Key (T, G, C, A) 0 to (n-1) times respectively yields the gene sequence fragment sequence T, G, C, A, the gene sequence fragment sequence T, G, C, G, and the gene sequence fragment sequence T, G, C, G, so the gene sequence fragment sequences can be written as TG<<r0 CA, TG<<r1 CG, and TG<<r2 CG respectively. The cyclic shift operator <<r n carries the cyclic shift count; the TG in front of the operator is the common prefix, and what follows the operator is the suffix. Note that the 4-base gene sequence fragment Key is used here only as an illustration; gene sequence fragments with other numbers of bases may be used as required, and since the principle is the same as in this embodiment it is not repeated here.
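The TG example can be reproduced with a short Python sketch. It assumes a cyclic left shift and a prefix length of 2; the helper names `rotate_left` and `encode` are hypothetical, and the unshifted fragments GTGC and CGTG are inferred from the Fig. 3 description of where the suffix characters sat before shifting, not quoted from the source.

```python
def rotate_left(key: str, i: int) -> str:
    """Cyclically shift key left by i positions."""
    i %= len(key)
    return key[i:] + key[:i]


def encode(key: str, i: int, n: int) -> tuple[str, int, str]:
    """Shift key left i times, then cut it into (prefix, shift count, suffix)."""
    rotated = rotate_left(key, i)
    return rotated[:n], i, rotated[n:]


# Three adjacent 4-base fragments with their shift counts; prefix length n = 2.
for key, i in [("TGCA", 0), ("GTGC", 1), ("CGTG", 2)]:
    prefix, count, suffix = encode(key, i, 2)
    print(f"{key} -> {prefix}<<r{count} {suffix}")
# TGCA -> TG<<r0 CA
# GTGC -> TG<<r1 CG
# CGTG -> TG<<r2 CG
```

All three fragments end up under the single prefix TG, so only the shift count and the two-character suffix differ between them.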
As can be seen from Fig. 3, when the cyclic shift count is 0, the suffix characters C and A both lie behind the common prefix TG before the shift; when the cyclic shift count is 1, the suffix character C lies behind the common prefix TG before the shift and the suffix character G lies in front of it; when the cyclic shift count is 2, the suffix characters C and G both lie in front of the common prefix TG before the shift. Based on this principle of cyclic shifting, the original gene sequence fragment can therefore be restored quickly from its compressed form. In this embodiment the cyclic shift is a cyclic left shift; a cyclic right shift works on the same basic principle, so its implementation details are not repeated here.
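Restoring an original fragment from its compressed form amounts to undoing the shift. The following Python sketch assumes the compression used a cyclic left shift, so restoration is a cyclic right shift by the same count; `rotate_right` and `restore` are hypothetical names introduced only for illustration.

```python
def rotate_right(key: str, i: int) -> str:
    """Cyclically shift key right by i positions (the inverse of a left shift by i)."""
    split = len(key) - i % len(key)
    return key[split:] + key[:split]


def restore(prefix: str, i: int, suffix: str) -> str:
    """Rebuild the original fragment from (prefix, shift count, suffix)."""
    return rotate_right(prefix + suffix, i)


print(restore("TG", 0, "CA"))  # TGCA
print(restore("TG", 1, "CG"))  # GTGC
print(restore("TG", 2, "CG"))  # CGTG
```

With shift count 1 the last suffix character returns to the front of TG, and with shift count 2 both suffix characters do, matching the positions described for Fig. 3.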
In this embodiment, the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n that maximizes the value of the compression ratio function f(n).
The compression ratio function constructed in step 1.1) is shown in formula (1);
$$ f(n) = \frac{TL \cdot SL \cdot b}{8 \cdot S(n)} \qquad (1) $$
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is shown in formula (2);
$$ S(n) = \frac{\left(\log_2(SL - n) + (SL - n) \cdot b\right) \cdot TL}{8} + \frac{n \cdot b \cdot TL}{8 \cdot (SL - n)} \qquad (2) $$
In formula (2), S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and n is the prefix length.
As shown in Fig. 4, the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) as the current gene sequence fragment sequence;
3.3) cutting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, the lengths of Prefix_ri and Postfix_ri summing to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) judging whether a mapping relation set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) judging whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping relation set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping relation set corresponding to Prefix_ri, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share the common prefix with Key_ri and jumping to step 4);
3.6) creating a new mapping relation Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) judging whether the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) have all been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
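The control flow of steps 3.2) to 3.7) can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function `add_rotations` and the alias `SetComp` are hypothetical names, Set_comp is modeled as a dictionary from prefix to a set of (shift count, suffix) pairs, and the shifted sequences Key_ri produced by step 3.1) are passed in as given.

```python
SetComp = dict[str, set[tuple[int, str]]]


def add_rotations(set_comp: SetComp, rotations: list[tuple[int, str]], n: int) -> None:
    """Apply steps 3.2)-3.7) to one Key; `rotations` holds its (i, Key_ri) pairs."""
    for i, key_ri in rotations:                     # 3.2 / 3.7: take each Key_ri in turn
        prefix, postfix = key_ri[:n], key_ri[n:]    # 3.3: cut at prefix length n
        if prefix not in set_comp:                  # 3.4 -> 3.6: create a new mapping
            set_comp[prefix] = {(i, postfix)}
        elif (i, postfix) not in set_comp[prefix]:  # 3.5: add the missing entry
            set_comp[prefix].add((i, postfix))
        else:                                       # 3.5: entry exists, skip the rest
            break


set_comp: SetComp = {}
add_rotations(set_comp, [(0, "TGCA"), (1, "TGCG"), (2, "TGCG")], n=2)
print(sorted(set_comp["TG"]))  # [(0, 'CA'), (1, 'CG'), (2, 'CG')]
```

Using the sequences from the Fig. 3 example, the three entries collapse under the one prefix TG, so the prefix is stored once while each entry keeps only its shift count and suffix.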
In this embodiment, the data structure of the compression result set Set_comp is as follows:
{
prefix_1 → {<rotate_1, postfix_1>, <rotate_2, postfix_2>, …},
prefix_2 → {<rotate_3, postfix_3>, …},
…}
In the above data structure, prefix_1 is the common prefix of the first gene sequence fragment; prefix_1 → {<rotate_1, postfix_1>, <rotate_2, postfix_2>, …} is the mapping relation set corresponding to the prefix prefix_1; rotate_1 is the cyclic shift count of the first gene sequence fragment sequence with prefix prefix_1, and postfix_1 is the suffix of that sequence; rotate_2 is the cyclic shift count of the second gene sequence fragment sequence with prefix prefix_1, and postfix_2 is the suffix of that sequence. Likewise, prefix_2 is the common prefix of the second gene sequence fragment; prefix_2 → {<rotate_3, postfix_3>, …} is the mapping relation set corresponding to the prefix prefix_2; rotate_3 is the cyclic shift count of the first gene sequence fragment sequence with prefix prefix_2, and postfix_3 is the suffix of that sequence.
In this embodiment, the length of the data set Set_orig to be compressed is TL = 2*10^9, the length of a gene sequence fragment Key is SL = 36, each element of a gene sequence fragment Key occupies b = 2 bits of storage (because a valid reference sequence consists only of A, C, G, and T), and the prefix length used for data compression is chosen as n = 32. The byte estimation function for a prefix Prefix of length 32 then evaluates to S(n) = 6500000000 Bytes, i.e. 6.05 GB. Relative to the uncompressed data size (TL*SL*b/8 = 2*10^9 × 36 × 2 / 8 Bytes = 16.76 GB), the key-value index data compression method for rapid positioning of gene sequence fragments of this embodiment achieves a compression ratio of nearly 2.8. This embodiment therefore improves query efficiency on large data volumes and has the advantages of strong compression capability and a small storage footprint.
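The figures above can be checked by evaluating formulas (1) and (2) directly. The Python sketch below transcribes the two formulas as written; the function names `index_bytes` and `compression_ratio` are hypothetical, the logarithm term is left unrounded (an assumption, since the source does not say whether it is rounded up to whole bits), and GB is taken to mean 2^30 bytes, which is the convention that reproduces the quoted values.

```python
import math


def index_bytes(n: int, tl: int, sl: int, b: int) -> float:
    """Formula (2): estimated bytes of index data for prefix length n."""
    m = sl - n  # suffix length
    return (math.log2(m) + m * b) * tl / 8 + n * b * tl / (8 * m)


def compression_ratio(n: int, tl: int, sl: int, b: int) -> float:
    """Formula (1): uncompressed bytes divided by the estimated compressed bytes."""
    return tl * sl * b / (8 * index_bytes(n, tl, sl, b))


TL, SL, B, N = 2 * 10**9, 36, 2, 32
compressed = index_bytes(N, TL, SL, B)
raw = TL * SL * B / 8
print(compressed, round(compressed / 2**30, 2))      # 6500000000.0 6.05
print(raw, round(raw / 2**30, 2))                    # 18000000000.0 16.76
print(round(compression_ratio(N, TL, SL, B), 2))     # 2.77
```

The first term of formula (2) contributes 2.5*10^9 bytes for the shift counts and suffixes, and the second contributes 4*10^9 bytes for the shared prefixes.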
The above is only a preferred embodiment of the invention. The scope of protection of the invention is not limited to the above embodiment; all technical solutions that fall under the idea of the invention belong to its scope of protection. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the invention should also be regarded as falling within the scope of protection of the invention.

Claims (6)

1. A key-value index data compression method for rapid positioning of gene sequence fragments, characterized in that the steps comprise:
1) initializing the compression result set Set_comp, and setting the prefix length n used for data compression;
2) taking one current gene sequence fragment Key to be compressed out of the gene sequence data set Set_orig to be compressed;
3) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length, and adding each gene sequence fragment sequence to the compression result set Set_comp as the common prefix together with its own cyclic shift count and suffix;
4) judging whether the data set Set_orig to be compressed is empty; if it is not empty, taking out the next current gene sequence fragment Key to be compressed and jumping to step 2); otherwise, outputting the compression result set Set_comp.
2. The key-value index data compression method for rapid positioning of gene sequence fragments according to claim 1, characterized in that the detailed steps of step 3) comprise:
3.1) cyclically shifting the current gene sequence fragment Key 0 to (n-1) times to form n gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) that share a common prefix, where n is the prefix length;
3.2) selecting a gene sequence fragment sequence Key_ri from the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) as the current gene sequence fragment sequence;
3.3) cutting the current gene sequence fragment sequence Key_ri, according to the prefix length n, into a prefix Prefix_ri and a suffix Postfix_ri, the lengths of Prefix_ri and Postfix_ri summing to the length SL of the current gene sequence fragment sequence Key_ri;
3.4) judging whether a mapping relation set corresponding to the prefix Prefix_ri exists in the compression result set Set_comp; if it exists, jumping to step 3.5); otherwise jumping to step 3.6);
3.5) judging whether the data <i, Postfix_ri> of the current gene sequence fragment sequence Key_ri exists in the mapping relation set corresponding to the prefix Prefix_ri; if it does not exist, adding the data <i, Postfix_ri> to the mapping relation set corresponding to Prefix_ri, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7); otherwise, ignoring the subsequent gene sequence fragment sequences that share the common prefix with Key_ri and jumping to step 4);
3.6) creating a new mapping relation Prefix_ri → {<i, Postfix_ri>} for the prefix Prefix_ri and adding it to the compression result set Set_comp, where i is the number of cyclic shifts of the current gene sequence fragment sequence Key_ri and Postfix_ri is the suffix of Key_ri, and jumping to step 3.7);
3.7) judging whether the gene sequence fragment sequences Key_r0, Key_r1, …, Key_r(n-1) have all been processed; if not, selecting the next current gene sequence fragment sequence Key_ri and jumping to step 3.3); otherwise jumping to step 4).
3. The key-value index data compression method for rapid positioning of gene sequence fragments according to claim 2, characterized in that the detailed steps of setting the prefix length n used for data compression in step 1) comprise:
1.1) constructing a compression ratio function f(n) of the prefix length n;
1.2) finding the prefix length n that maximizes the value of the compression ratio function f(n).
4. The key-value index data compression method for rapid positioning of gene sequence fragments according to claim 3, characterized in that the compression ratio function constructed in step 1.1) is shown in formula (1);
$$ f(n) = \frac{TL \cdot SL \cdot b}{8 \cdot S(n)} \qquad (1) $$
In formula (1), f(n) is the compression ratio function, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n; the expression for the byte estimation function S(n) is shown in formula (2);
$$ S(n) = \frac{\left(\log_2(SL - n) + (SL - n) \cdot b\right) \cdot TL}{8} + \frac{n \cdot b \cdot TL}{8 \cdot (SL - n)} \qquad (2) $$
In formula (2), S(n) is the estimated number of bytes occupied by the index data when the prefix Prefix has length n, TL is the length of the data set Set_orig to be compressed, SL is the length of a gene sequence fragment Key to be compressed in Set_orig, b is the number of bits of storage occupied by each element of a gene sequence fragment Key to be compressed in Set_orig, and n is the prefix length.
5. The key-value index data compression method for rapid positioning of gene sequence fragments according to any one of claims 1 to 4, characterized in that the cyclic shift is a cyclic left shift.
6. The key-value index data compression method for rapid positioning of gene sequence fragments according to claim 5, characterized in that the prefix length n is 32.
CN201510648867.2A 2015-10-09 2015-10-09 Key-value index data compression method for rapid positioning of gene sequence fragments Active CN105224828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510648867.2A CN105224828B (en) 2015-10-09 2015-10-09 Key-value index data compression method for rapid positioning of gene sequence fragments

Publications (2)

Publication Number Publication Date
CN105224828A true CN105224828A (en) 2016-01-06
CN105224828B CN105224828B (en) 2017-10-27

Family

ID=54993793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510648867.2A Active CN105224828B (en) 2015-10-09 2015-10-09 Key-value index data compression method for rapid positioning of gene sequence fragments

Country Status (1)

Country Link
CN (1) CN105224828B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930104A (en) * 2016-05-17 2016-09-07 百度在线网络技术(北京)有限公司 Data storing method and device
CN106484865A (en) * 2016-10-10 2017-03-08 哈尔滨工程大学 One kind is based on four word chained list dictionary tree searching algorithm of DNA k mer index problem
CN106897582A (en) * 2017-01-25 2017-06-27 人和未来生物科技(长沙)有限公司 A kind of heterogeneous platform understood towards gene data
CN110060731A (en) * 2019-04-12 2019-07-26 福建师范大学 Determine that overlapping genes are to the method for quantity between genome based on distributed computing
WO2019205963A1 (en) * 2018-04-27 2019-10-31 人和未来生物科技(长沙)有限公司 Gene sequencing quality line data compression pre-processing and decompression and restoration methods, and system
CN110782946A (en) * 2019-10-17 2020-02-11 南京医基云医疗数据研究院有限公司 Method and device for identifying repeated sequence, storage medium and electronic equipment
CN112765113A (en) * 2021-01-31 2021-05-07 云知声智能科技股份有限公司 Index compression method and device, computer readable storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101036141A (en) * 2004-03-26 2007-09-12 甲骨文国际有限公司 A database management system with persistent, user- accessible bitmap values
CN101499094A (en) * 2009-03-10 2009-08-05 焦点科技股份有限公司 Data compression storing and retrieving method and system
CN102831224A (en) * 2012-08-24 2012-12-19 北京百度网讯科技有限公司 Creating method for data index base and searching suggest generation method and device
CN103870492A (en) * 2012-12-14 2014-06-18 腾讯科技(深圳)有限公司 Data storing method and device based on key sorting
US20150006577A1 (en) * 2013-06-28 2015-01-01 Khalifa University of Science, Technology, and Research Method and system for searching and storing data

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930104A (en) * 2016-05-17 2016-09-07 百度在线网络技术(北京)有限公司 Data storing method and device
CN106484865A (en) * 2016-10-10 2017-03-08 哈尔滨工程大学 A search algorithm based on a four-word linked-list dictionary tree (trie) for the DNA k-mer index problem
CN106897582A (en) * 2017-01-25 2017-06-27 人和未来生物科技(长沙)有限公司 A heterogeneous platform for gene data interpretation
CN106897582B (en) * 2017-01-25 2018-03-09 人和未来生物科技(长沙)有限公司 A heterogeneous platform for gene data interpretation
WO2019205963A1 (en) * 2018-04-27 2019-10-31 人和未来生物科技(长沙)有限公司 Gene sequencing quality line data compression pre-processing and decompression and restoration methods, and system
CN110428868A (en) * 2018-04-27 2019-11-08 人和未来生物科技(长沙)有限公司 Gene sequencing quality line data compression preprocessing and decompression restoration method and system
CN110428868B (en) * 2018-04-27 2021-11-26 人和未来生物科技(长沙)有限公司 Method and system for compression preprocessing and decompression restoration of gene sequencing quality line data
CN110060731A (en) * 2019-04-12 2019-07-26 福建师范大学 Method for determining the number of overlapping gene pairs between genomes based on distributed computing
CN110782946A (en) * 2019-10-17 2020-02-11 南京医基云医疗数据研究院有限公司 Method and device for identifying repeated sequence, storage medium and electronic equipment
CN112765113A (en) * 2021-01-31 2021-05-07 云知声智能科技股份有限公司 Index compression method and device, computer readable storage medium and electronic equipment
CN112765113B (en) * 2021-01-31 2024-04-09 云知声智能科技股份有限公司 Index compression method, index compression device, computer readable storage medium and electronic equipment

Also Published As

Publication number Publication date
CN105224828B (en) 2017-10-27

Similar Documents

Publication Publication Date Title
CN105224828A (en) A key-value index data compression method for fast positioning of gene sequence fragments
CN110413611B (en) Data storage and query method and device
CN111046034B (en) Method and system for managing memory data and maintaining data in memory
CN107818115B (en) Method and device for processing data table
EP3072076B1 (en) A method of generating a reference index data structure and method for finding a position of a data pattern in a reference data structure
EP3435256B1 (en) Optimal sort key compression and index rebuilding
US9953058B1 (en) Systems and methods for searching large data sets
CN111177302A (en) Business document processing method and device, computer equipment and storage medium
CN111078672B (en) Data comparison method and device for database
CN105574212A (en) Image retrieval method for multi-index disk Hash structure
CN104636349A (en) Method and equipment for compression and searching of index data
CN114064984B (en) World state increment updating method and device based on sparse array linked list
CN115374129B (en) Database joint index coding method and system
Cracco et al. Extremely fast construction and querying of compacted and colored de Bruijn graphs with GGCAT
CN111723097A (en) Application program interface configuration method and device, computer equipment and storage medium
CN109344163A (en) A kind of data verification method, device and computer-readable medium
CN113468571A (en) Tracing method based on block chain
CN112948898A (en) Method for preventing application data from being tampered in block chain and security module
CN110532284B (en) Mass data storage and retrieval method and device, computer equipment and storage medium
CN110389953B (en) Data storage method, storage medium, storage device and server based on compression map
CN102456073A (en) Partial extremum inquiry method
KR20160123219A (en) A multilevel indexing method for audio fingerprint library data
US8988258B2 (en) Hardware compression using common portions of data
CN104750846A (en) Method and device for finding substring
CN114398373A (en) File data storage and reading method and device applied to database storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant