CN109243529A - Horizontal transfer gene identification method based on locality-sensitive hashing - Google Patents
Horizontal transfer gene identification method based on locality-sensitive hashing
- Publication number
- CN109243529A (application number CN201810988512.1A)
- Authority
- CN
- China
- Prior art keywords
- word
- segment
- hash
- length
- hash value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for identifying horizontally transferred genes based on locality-sensitive hashing (LSH), comprising the following steps: Step 1, cut the whole-genome data into N fragments of equal length (in bp) using a set step size; Step 2, apply K-word processing to each fragment; Step 3, count the frequency of each K-word in each fragment and standardize the frequencies; Step 4, construct m hash function families and compute m hash values for each fragment; Step 5, split the m hash values of each fragment into bands and compare the hash values in each band of the fragment with the hash values in the corresponding band of the whole genome; if all hash values in at least one band are equal, the fragment is considered similar to the whole genome; otherwise, horizontal gene transfer is predicted; Step 6, compute the Euclidean distance between the m hash values of each candidate fragment in the similar set and the m hash values of the whole genome; if the distance exceeds a set threshold, horizontal gene transfer is considered to have occurred. The invention helps reduce the use of computing resources and improves the computational efficiency of prediction.
Description
Technical field
The present invention relates to the field of bioinformatics, and in particular to a method for identifying horizontally transferred genes based on locality-sensitive hashing.
Background art
Horizontal gene transfer (HGT) is the exchange of genetic material in a lateral manner between unicellular and/or multicellular organisms, that is, the process by which an organism acquires genetic material from individuals other than its parents. It is an important process in biological evolution. HGT can occur between different species (interspecific HGT) or within the same species (intraspecific HGT). It breaks the boundaries of kinship and makes the possible routes of gene flow more complex.
As the genomes of humans and other organisms have been sequenced, large numbers of homologous genes have been found in the genomes of different species, even of distantly related organisms, which further demonstrates how widespread and far-reaching horizontal gene transfer is. Predicting horizontally transferred genes is important for understanding biological evolution and for the qualitative and quantitative estimation of genetic material between species. In recent years, large numbers of DNA molecules with transforming activity, together with competent cells that can actively take up exogenous DNA, have been found in natural environments, giving a new understanding of HGT occurring in the environment. Further study of HGT and its ecological effects will help in re-evaluating genetically engineered organisms, so that genetic engineering and genetically modified organisms can play a greater role.
Methods for detecting horizontal gene transfer (HGT) fall into two main classes: phylogenetic methods and parametric methods. Methods based on phylogenetic trees detect HGT with higher confidence, but they depend heavily on the accuracy of the input gene trees and species tree, which can be difficult to construct. Even if the input trees contain no errors, phylogenetic conflict may be the result of evolutionary processes other than HGT (such as duplication and loss), which leads these methods to infer HGT incorrectly.
Summary of the invention
The purpose of the present invention is to provide a method for identifying horizontally transferred genes based on locality-sensitive hashing.
The technical solution adopted by the invention is as follows:
A method for identifying horizontally transferred genes based on locality-sensitive hashing, comprising the following steps:
Step 1, cut the whole-genome data into N fragments of equal length (in bp) using a set step size;
Step 2, apply K-word processing to each fragment: slide a window of length K along the fragment, and the subsequence inside the window is one K-word; this gives $|\Sigma|^K$ possible K-words per fragment in total, where $|\Sigma|$ is the size of the gene alphabet and K is the window length of the K-word processing;
Step 3, count the frequency of each K-word in each fragment to obtain the frequency set $X = \{X_1, X_2, \ldots, X_t, \ldots, X_n\}$, where $X_t$ is the frequency with which the t-th K-word occurs in the current fragment and n is the total number of K-words; then standardize the frequency of each K-word to obtain $S_t$, forming the standardized frequency set $S = \{S_1, S_2, \ldots, S_t, \ldots, S_n\}$, where $S_t$ is the standardized frequency of the t-th K-word;
The standardized frequency $S_t$ of each K-word is computed as:
$$S_t = \frac{X_t - \mu}{\delta}$$
where μ is the mean of all K-word frequencies and δ is the standard deviation of the K-word frequencies;
Step 4, construct m hash function families and compute m hash values for each fragment;
Step 5, split the m hash values of each fragment into bands, and compare the hash values in each band of the fragment with the hash values in the corresponding band of the whole genome;
if all hash values in one or more bands are equal, the fragment is considered similar to the whole genome and is added to the similar set; otherwise, horizontal gene transfer is predicted to have occurred;
Step 6, compute the Euclidean distance between the m hash values of each candidate fragment in the similar set and the m hash values of the whole genome; if the Euclidean distance exceeds the set threshold, horizontal gene transfer is considered to have occurred. The threshold value is [formula not reproduced], where K is the window length of the K-word processing, i.e. the length of a K-word;
The Euclidean distance is:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where $x_i$ and $y_i$ are the i-th hash values of the candidate fragment and of the whole genome, and m is the number of hash functions, i.e. the number of hash values per sequence.
Further, in Step 1 the whole-genome data are cut with a step size of 500 into N overlapping fragments of 5000 bp each.
Further, the hash function family in Step 4 has the form:
$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{r} \right\rfloor$$
where v is the vector being hashed, a is a random vector of the same dimension, b is a random number in (0, r), and r is the length of the segments into which the line is divided; the functions in the family differ in a and b and serve as the index functions.
By adopting the above technical solution, the invention proposes an LSH-based prediction method within the alignment-free approach, the most widely used of the parametric methods. Locality-sensitive hashing not only improves the efficiency of searching for horizontally transferred genes but also increases sensitivity to ameliorated genes, which improves prediction accuracy. This overcomes the insensitivity of parametric methods to genes that were transferred long ago and have since been ameliorated, and it reduces the use of computing resources and improves computational efficiency without reducing prediction accuracy. Because the invention uses an alignment-free method with higher detection efficiency, it also overcomes the drawbacks of phylogenetic methods, namely high computational cost and excessive reliance on a reliable phylogenetic tree.
Brief description of the drawings
The invention is described in further detail below with reference to the drawing and specific embodiments.
Fig. 1 is a flow diagram of the method of the invention for identifying horizontally transferred genes based on locality-sensitive hashing.
Detailed description of the embodiments
Horizontal gene transfer occurs frequently in nature and is an important factor in the evolution of many organisms. The genetic material it introduces plays a very important role in promoting genome innovation and may give the host organism a selective advantage that improves its adaptation to new environments. Because HGT can transfer genes from phylogenetically distant lineages with entirely different genotypes, or new genes with new functions, into a genome, it is a major source of phenotypic innovation and a mechanism of niche adaptation, and it also enriches species diversity. Detecting HGT events helps us better understand the history of genomes and explain the emergence of new factors during biological evolution; it is biologically significant and enriches classical Darwinian evolutionary theory.
The invention uses the p-stable LSH algorithm, which does not need to map the original space into Hamming space and can perform locality-sensitive hashing directly in Euclidean space.
Locality sensitivity means that points that are close in the original space have a high probability of remaining adjacent in the new data space, while points that are far apart have a very low probability of being mapped to the same bucket. Locality-sensitive hashing (LSH) uses this property to solve the approximate nearest-neighbor search problem. The original LSH method maps the original space into Hamming space: the representation of points in the original space is converted into a representation of points in Hamming space, and the distance metric is likewise converted into the Hamming distance. The search time of LSH grows linearly with the dimension and sublinearly with the size of the data set, which greatly reduces search time.
p-stable LSH is applied in d-dimensional Euclidean space, with 0 < p ≤ 2. It is an advanced form of LSH that uses the concept of p-stable distributions. A p-stable distribution is not one specific distribution but a family of distributions satisfying certain conditions: p = 1 gives the standard Cauchy distribution and p = 2 gives the Gaussian distribution. The p-stable LSH function family has the form $h_{a,b}(v) = \lfloor (a \cdot v + b)/r \rfloor$, where b is a random number in (0, r) and r is the length of the segments into which the line is divided; the functions in the family differ in a and b and serve as the index functions. The dot product of the vectors a and v is used to generate the hash function family, and this family is locality-sensitive. p-stable LSH divides a line into equal segments of length r; points mapped onto the same segment are assigned the same hash value, and points mapped onto different segments are assigned different hash values.
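The following is a minimal Python sketch of a p-stable hash family for p = 2, assuming the standard form floor((a·v + b)/r) with Gaussian entries for a and a scalar offset b drawn uniformly from (0, r). The names `make_hash_family` and `lsh_signature`, the seed, and the value of r are illustrative assumptions and do not come from the patent.

```python
import math
import random


def make_hash_family(m, dim, r, seed=0):
    """Build m hash functions, each defined by a pair (a, b).

    Assumption: a has Gaussian entries (the Gaussian is 2-stable) and
    b is a scalar drawn uniformly from [0, r).
    """
    rng = random.Random(seed)
    family = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, r)
        family.append((a, b))
    return family


def lsh_signature(v, family, r):
    """Return one hash value per function: floor((a . v + b) / r)."""
    signature = []
    for a, b in family:
        projection = sum(ai * vi for ai, vi in zip(a, v))
        signature.append(math.floor((projection + b) / r))
    return signature


# Hypothetical usage: 6 hash functions over 16-dimensional vectors.
family = make_hash_family(m=6, dim=16, r=4.0)
print(lsh_signature([0.5] * 16, family, r=4.0))
```

Each hash value is the index of the length-r segment that the shifted projection falls into, so vectors with nearby projections tend to share hash values.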
As shown in Fig. 1, the invention discloses a method for identifying horizontally transferred genes based on locality-sensitive hashing, comprising the following steps:
Step 1, cut the whole-genome data into N fragments of equal length (in bp) using a set step size;
Step 2, apply K-word processing to each fragment: slide a window of length K along the fragment, and the subsequence inside the window is one K-word; this gives $|\Sigma|^K$ possible K-words per fragment in total, where $|\Sigma|$ is the size of the gene alphabet and K is the window length of the K-word processing;
Step 3, count the frequency of each K-word in each fragment to obtain the frequency set $X = \{X_1, X_2, \ldots, X_t, \ldots, X_n\}$, where $X_t$ is the frequency with which the t-th K-word occurs in the current fragment and n is the total number of K-words; then standardize the frequency of each K-word to obtain $S_t$, forming the standardized frequency set $S = \{S_1, S_2, \ldots, S_t, \ldots, S_n\}$, where $S_t$ is the standardized frequency of the t-th K-word;
The standardized frequency $S_t$ of each K-word is computed as:
$$S_t = \frac{X_t - \mu}{\delta}$$
where μ is the mean of all K-word frequencies and δ is the standard deviation of the K-word frequencies;
Step 4, construct m hash function families and compute m hash values for each fragment;
Step 5, split the m hash values of each fragment into bands, and compare the hash values in each band of the fragment with the hash values in the corresponding band of the whole genome;
if all hash values in one or more bands are equal, the fragment is considered similar to the whole genome and is added to the similar set; otherwise, horizontal gene transfer is predicted to have occurred;
Step 6, compute the Euclidean distance between the m hash values of each candidate fragment in the similar set and the m hash values of the whole genome; if the Euclidean distance exceeds the set threshold, horizontal gene transfer is considered to have occurred. The threshold value is [formula not reproduced], where K is the window length of the K-word processing, i.e. the length of a K-word;
The Euclidean distance is:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where $x_i$ and $y_i$ are the i-th hash values of the candidate fragment and of the whole genome, and m is the number of hash functions, i.e. the number of hash values per sequence.
Further, in Step 1 the whole-genome data are cut with a step size of 500 into N overlapping fragments of 5000 bp each.
Further, the hash function family in Step 4 has the form:
$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{r} \right\rfloor$$
where v is the vector being hashed, a is a random vector of the same dimension, b is a random number in (0, r), and r is the length of the segments into which the line is divided; the functions in the family differ in a and b and serve as the index functions.
To verify the prediction performance of the invention, the predicted HGT fragments can be compared with the actual HGT fragments in an experimental analysis, and the LSH-based identification model can be evaluated using the F-measure.
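As an illustration of this evaluation, the sketch below computes the F-measure from a set of predicted HGT fragments and a set of actual HGT fragments. It assumes fragments are identified by hypothetical integer indices and uses the usual definition in terms of precision and recall; the patent does not specify these details.

```python
def f_measure(predicted, actual, beta=1.0):
    """F-measure of predicted HGT fragment ids against the true ids.

    precision = TP / |predicted|, recall = TP / |actual|,
    F = (1 + beta^2) * P * R / (beta^2 * P + R).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical fragment indices: 4 predicted, 3 actual, 2 in common.
print(f_measure({1, 2, 3, 4}, {3, 4, 5}))
```

With beta = 1 this is the harmonic mean of precision and recall, so the score is high only when both are high.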
The processing procedure of the invention is described in detail below.
To describe the procedure clearly, simulated data are used: a 300 bp Escherichia coli fragment serves as the recipient, and 50 bp genome fragments taken at random from Haemophilus influenzae and Bacillus subtilis serve as donors; the donor fragments are inserted at two randomly chosen positions in the recipient genome to form simulated HGT data. The steps of the LSH-based method for identifying horizontally transferred genes are as follows.
For example, take a sequence with a total length of 400 bp, part of which consists of fragments that have undergone HGT:
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAG
CTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTATCTGTCAGAAAAATGACTAAATAGCGGC
TCCCACAATGTTCAAATGTGGGGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTAC
AGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCGCTGGCGTTTCTCTGCTCGACGGTC
ACCGGGATTTTATTTGGCTGGTTACACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAA
AAGCCCGCACCTGACAGTGCGG
Step 1, cut the sequence into overlapping fragments of 100 bp with a step size of 20, which yields 20 fragments;
For example, the first fragment is:
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAG
CTTCTGAACTGGTTACCTGCCGTGAGTAAAT
Second segment are as follows:
ACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGC
CGTGAGTAAATTAAAATTTTATTGACTTATC
and so on. The first two fragments are used as examples below:
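The fragmentation in Step 1 can be sketched in Python as a fixed-length sliding window. The function name `split_into_fragments` is hypothetical, and the text does not say how windows that run past the end of the sequence are handled; this sketch assumes only full-length windows are kept, so the fragment count depends on that choice.

```python
def split_into_fragments(sequence, length, step):
    """Cut a sequence into overlapping fragments of equal length.

    Assumption: a window is kept only if it fits entirely in the sequence.
    """
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


# Hypothetical toy input: a 100 bp repeat, 40 bp windows, step 20.
toy = "ACGT" * 25
for fragment in split_into_fragments(toy, length=40, step=20):
    print(len(fragment), fragment[:8])
```

Consecutive fragments overlap by `length - step` bases, which is why an inserted region is covered by several fragments.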
Step 2, apply K-word processing to each fragment: slide a window of length K along the fragment, and the subsequence inside the window is one K-word; this gives $|\Sigma|^K$ possible K-words per fragment in total;
For example, with a sliding-window length of K = 2, the K-words for each fragment are: AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT
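A short Python sketch of the K-word step follows: it enumerates all K-words over the alphabet and counts their occurrences with a sliding window. The function name `kword_counts` and the choice to skip windows containing characters outside the alphabet are assumptions made for illustration.

```python
from itertools import product


def kword_counts(fragment, k, alphabet="ACGT"):
    """Count every K-word of length k in the fragment.

    All |alphabet|**k words are listed (in lexicographic order), including
    those that never occur, so every fragment yields a same-length vector.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(fragment) - k + 1):
        word = fragment[i:i + k]
        if word in counts:  # assumption: ignore words with other symbols
            counts[word] += 1
    return counts


counts = kword_counts("AGCTTTTCATTCTGACTGCA", k=2)
print(list(counts)[:4])   # first K-words in enumeration order
print(counts["TT"])
```

Listing the words in a fixed order matters because the hash functions in Step 4 operate on the frequencies as a vector.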
Step 3, count the frequency $X_t$ of each K-word occurring in each fragment, where $X_t$ is the frequency of the t-th K-word in the fragment, and standardize the frequencies; the result of standardization is $S_t$, the standardized frequency of the t-th K-word;
Table 1: Frequency of each K-word in the two sequences
Table 2: Standardized K-word frequencies
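The standardization in Step 3 can be sketched as follows, assuming the usual z-score form (frequency minus the mean, divided by the standard deviation). Whether the patent intends the population or the sample standard deviation is not stated; this sketch assumes the population form, and the function name `standardize` is hypothetical.

```python
import statistics


def standardize(frequencies):
    """Z-score each K-word frequency: (X_t - mean) / standard deviation.

    Assumption: population standard deviation (statistics.pstdev).
    A constant vector has zero spread, so it maps to all zeros.
    """
    mu = statistics.mean(frequencies)
    delta = statistics.pstdev(frequencies)
    if delta == 0:
        return [0.0] * len(frequencies)
    return [(x - mu) / delta for x in frequencies]


# Hypothetical counts for four K-words.
print(standardize([4, 8, 6, 2]))
```

After standardization the values have mean 0 and unit spread, so fragments with different overall K-word counts become comparable.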
Step 4, construct m hash function families and compute m hash values for each fragment; for example, construct 6 hash function families.
a and b are 16-dimensional vectors following a Gaussian distribution.
a1 = [0.2728, -0.5261, -1.1690, -1.0743, 1.0394, 0.0461, 0.1844, 0.0007, 0.0448, -1.1816, -0.2499, -1.8294, -0.3410, 1.8652, 0.2897, 1.9430]
a2 = [-0.4519, -1.2261, -1.1323, 1.3255, -0.9256, -0.1244, -1.0290, -0.4214, 0.0356, -0.5530, 2.6491, 0.7705, -0.9915, -2.0205, -0.4313, -0.5070]
a3 = [-0.4519, -1.2261, -1.1323, 1.3255, -0.9256, -0.1244, -1.0290, -0.4214, 0.0356, -0.5530, 2.6491, 0.7705, -0.9915, -2.0205, -0.4313, -0.5070]
a4 = [-0.4519, -1.2261, -1.1323, 1.3255, -0.9256, -0.1244, -1.0290, -0.4214, 0.0356, -0.5530, 2.6491, 0.7705, -0.9915, -2.0205, -0.4313, -0.5070]
a5 = [-0.4519, -1.2261, -1.1323, 1.3255, -0.9256, -0.1244, -1.0290, -0.4214, 0.0356, -0.5530, 2.6491, 0.7705, -0.9915, -2.0205, -0.4313, -0.5070]
a6 = [-0.4519, -1.2261, -1.1323, 1.3255, -0.9256, -0.1244, -1.0290, -0.4214, 0.0356, -0.5530, 2.6491, 0.7705, -0.9915, -2.0205, -0.4313, -0.5070]
b1 = [4.5692, 1.2317, 5.4587, 0.7120, 4.6902, 3.0200, 5.4109, 2.5662, 4.5222, 2.4379, 5.7483, 1.7211, 1.7362, 4.0020, 0.2040, 5.9893]
b2 = [3.4901, 1.3752, 0.0630, 3.3493, 3.6519, 3.3700, 2.7356, 3.2697, 3.7174, 4.9290, 3.5241, 5.7950, 5.5712, 0.2950, 2.7426, 0.5739]
b3 = [2.7009, 0.8035, 4.7329, 1.3775, 4.6591, 0.6980, 4.4261, 0.7008, 0.8813, 0.8644, 2.3402, 5.8353, 1.3543, 3.7460, 2.9168, 5.3354]
b4 = [2.5758, 2.2662, 5.6072, 1.1562, 1.1456, 3.6358, 5.4605, 4.5566, 2.8090, 1.1688, 5.1190, 0.4194, 2.6726, 3.0355, 3.5281, 4.6226]
b5 = [0.9153, 2.3426, 3.6952, 4.1016, 0.0308, 1.0368, 1.8780, 3.7876, 5.7924, 2.4167, 5.8599, 0.6297, 5.6887, 4.0295, 1.6678, 1.9148]
b6 = [1.8040, 3.3165, 2.1672, 3.7913, 4.2346, 3.4207, 0.8061, 3.8269, 3.8916, 3.2757, 1.0729, 3.6728, 2.2185, 2.7205, 3.6779, 0.1046]
The two fragments are hashed using the p-stable LSH formula, and each fragment obtains 6 hash values;
The 6 hash values of the first fragment are [7, 6, 6, 6, 6, 6].
The 6 hash values of the second fragment are [7, 7, 6, 6, 6, 7].
Step 5, split the 6 hash values of each fragment into bands, and compare the hash values in each band of the fragment with the hash values in the corresponding band of the whole genome; if all hash values in one or more bands are equal, the fragment is considered similar to the whole genome; otherwise, horizontal gene transfer is predicted to have occurred;
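The banding comparison in Step 5 can be sketched as below. The number of hash values per band is not given in the text, so `rows_per_band` is an assumed parameter, and the two signatures in the usage example are used purely as sample inputs rather than as a real fragment and whole-genome pair.

```python
def split_bands(signature, rows_per_band):
    """Split a list of hash values into consecutive bands (tuples)."""
    return [tuple(signature[i:i + rows_per_band])
            for i in range(0, len(signature), rows_per_band)]


def is_similar(fragment_sig, genome_sig, rows_per_band):
    """True if at least one band matches exactly in both signatures."""
    fragment_bands = split_bands(fragment_sig, rows_per_band)
    genome_bands = split_bands(genome_sig, rows_per_band)
    return any(f == g for f, g in zip(fragment_bands, genome_bands))


# Sample signatures of 6 hash values, compared with 2 values per band.
print(is_similar([7, 6, 6, 6, 6, 6], [7, 7, 6, 6, 6, 7], rows_per_band=2))
```

Smaller bands make a match more likely and so send more fragments to the similar set; larger bands make the first-stage filter stricter.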
Step 6, compute the Euclidean distance between the hash values of each candidate fragment in the similar set and the hash values of the whole genome; if the Euclidean distance exceeds the threshold, horizontal gene transfer is considered to have occurred;
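Step 6 can be sketched as a plain Euclidean distance between two lists of hash values followed by a threshold test. The threshold formula is not reproduced in this text, so the value passed below is a hypothetical placeholder, and the function names are illustrative only.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length lists of hash values."""
    if len(x) != len(y):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def exceeds_threshold(fragment_sig, genome_sig, threshold):
    """True if the candidate fragment is flagged as horizontally transferred."""
    return euclidean_distance(fragment_sig, genome_sig) > threshold


# Sample signatures; the threshold 1.0 is a placeholder, not the patent's value.
d = euclidean_distance([7, 6, 6, 6, 6, 6], [7, 7, 6, 6, 6, 7])
print(d, exceeds_threshold([7, 6, 6, 6, 6, 6], [7, 7, 6, 6, 6, 7], 1.0))
```

This second stage refines the similar set: a fragment that shares a band with the whole genome can still be flagged if its remaining hash values differ enough.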
The invention predicts horizontal gene transfer with an alignment-free method, on the assumption that horizontally transferred genes are highly similar to genes of distantly related organisms. Alignment-free methods are generally more sensitive to recently transferred genes and less sensitive to genes that were transferred long ago and have since been ameliorated. Alignment-free methods are usually very fast and can quickly produce a candidate list of putative horizontally transferred regions. They avoid the problem of aligning large amounts of data that may be too divergent to yield a meaningful alignment. They also do not use the gene as the unit of analysis, so they can detect genomic regions of any possible lateral origin.
Using locality-sensitive hashing, the invention not only improves the efficiency of searching for horizontally transferred genes but also increases sensitivity to ameliorated genes, which improves prediction accuracy. One aim of the invention is to overcome the drawbacks of phylogenetic methods, namely high computational cost and excessive reliance on a reliable phylogenetic tree. The invention proposes an LSH-based prediction method within the alignment-free approach, the most widely used of the parametric methods. By exploiting the properties of locality-sensitive hash functions, it increases sensitivity to anciently transferred genes that have been ameliorated, overcoming the insensitivity of parametric methods to such genes, and it reduces the use of computing resources and improves computational efficiency without reducing prediction accuracy.
Claims (3)
1. A method for identifying horizontally transferred genes based on locality-sensitive hashing, characterized in that it comprises the following steps:
Step 1, cut the whole-genome data into N fragments of equal length (in bp) using a set step size;
Step 2, apply K-word processing to each fragment, which gives $|\Sigma|^K$ possible K-words per fragment in total, where $|\Sigma|$ is the size of the gene alphabet and K is the window length of the K-word processing;
Step 3, count the frequency of each K-word in each fragment to obtain the frequency set $X = \{X_1, X_2, \ldots, X_t, \ldots, X_n\}$, where $X_t$ is the frequency with which the t-th K-word occurs in the current fragment and n is the total number of K-words; then standardize the frequency of each K-word to obtain $S_t$, forming the standardized frequency set $S = \{S_1, S_2, \ldots, S_t, \ldots, S_n\}$, where $S_t$ is the standardized frequency of the t-th K-word;
The standardized frequency $S_t$ of each K-word is computed as:
$$S_t = \frac{X_t - \mu}{\delta}$$
where μ is the mean of all K-word frequencies and δ is the standard deviation of the K-word frequencies;
Step 4, construct m hash function families and compute m hash values for each fragment;
Step 5, split the m hash values of each fragment into bands, and compare the hash values in each band of the fragment with the hash values in the corresponding band of the whole genome;
if all hash values in one or more bands are equal, the fragment is considered similar to the whole genome and is added to the similar set; otherwise, horizontal gene transfer is predicted to have occurred;
Step 6, compute the Euclidean distance between the m hash values of each candidate fragment in the similar set and the m hash values of the whole genome; if the Euclidean distance exceeds the set threshold, horizontal gene transfer is considered to have occurred; the threshold value is [formula not reproduced], where K is the window length of the K-word processing, i.e. the length of a K-word;
The Euclidean distance is:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
where $x_i$ and $y_i$ are the i-th hash values of the candidate fragment and of the whole genome, and m is the number of hash functions, i.e. the number of hash values per sequence.
2. The method for identifying horizontally transferred genes based on locality-sensitive hashing according to claim 1, characterized in that: in Step 1 the whole-genome data are cut with a step size of 500 into N overlapping fragments of 5000 bp each.
3. The method for identifying horizontally transferred genes based on locality-sensitive hashing according to claim 1, characterized in that the hash function family in Step 4 has the form:
$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{r} \right\rfloor$$
where v is the vector being hashed, a is a random vector of the same dimension, b is a random number in (0, r), and r is the length of the segments into which the line is divided; the functions in the family differ in a and b and serve as the index functions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810988512.1A CN109243529B (en) | 2018-08-28 | 2018-08-28 | Horizontal transfer gene identification method based on locality sensitive hashing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810988512.1A CN109243529B (en) | 2018-08-28 | 2018-08-28 | Horizontal transfer gene identification method based on locality sensitive hashing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109243529A true CN109243529A (en) | 2019-01-18 |
CN109243529B CN109243529B (en) | 2021-09-07 |
Family
ID=65068789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810988512.1A Expired - Fee Related CN109243529B (en) | 2018-08-28 | 2018-08-28 | Horizontal transfer gene identification method based on locality sensitive hashing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109243529B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110660452A (en) * | 2019-09-20 | 2020-01-07 | 中国人民解放军军事科学院军事医学研究院 | Method for detecting bacterial gene horizontal transfer and donor strain generating horizontal transfer DNA fragment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101533484A (en) * | 2008-03-12 | 2009-09-16 | 中国科学院半导体研究所 | Method for forecasting gene transferring horizontally in genome |
CN103631928A (en) * | 2013-12-05 | 2014-03-12 | 中国科学院信息工程研究所 | LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system |
CN105183792A (en) * | 2015-08-21 | 2015-12-23 | 东南大学 | Distributed fast text classification method based on locality sensitive hashing |
CN107103206A (en) * | 2017-04-27 | 2017-08-29 | 福建师范大学 | The DNA sequence dna cluster of local sensitivity Hash based on standard entropy |
Non-Patent Citations (2)
Title |
---|
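Jie Lin et al.: "SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform", 《SPRING》 * |
Xu Pengna et al.: "Locality-sensitive hashing clustering method based on position information entropy", Computer Applications and Software * |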
Also Published As
Publication number | Publication date |
---|---|
CN109243529B (en) | 2021-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210907 |