CN109072278A - Isolated nucleic acid and application - Google Patents



Publication number
CN109072278A
Authority
CN
China
Prior art keywords
nucleic acid
acid sequence
abundance
sequence
sequences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201680083628.XA
Other languages
Chinese (zh)
Inventor
仲文迪
刘婉辉
郑智俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Realbio Technology Co ltd
Original Assignee
Shanghai Realbio Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Realbio Technology Co ltd
Publication of CN109072278A


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/02: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving viable microorganisms
    • C12Q1/04: Determining presence or kind of microorganism; Use of selective media for testing antibiotics or bacteriocides; Compositions containing a chemical indicator therefor
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids

Abstract

A set of isolated nucleic acids comprising at least one of a first, a second and a third nucleic acid sequence cluster, wherein the nucleic acid sequences in the first, second and third nucleic acid sequence clusters correspond to the sequences shown in SEQ ID NO: 1-110, SEQ ID NO: 111-299 and SEQ ID NO: 300-453, respectively, and each nucleic acid sequence in the first, second or third nucleic acid sequence cluster has a sequence similarity of not less than 90% to its corresponding sequence. Also provided are a method and a device for determining the state of an individual using the isolated nucleic acids. Compared with a healthy population, the isolated nucleic acids are significantly enriched in the ankylosing spondylitis patient group and can serve as markers that distinguish the healthy population from the ankylosing spondylitis patient group.

Description

Isolated nucleic acids and uses
Technical Field
The invention relates to the field of biomarkers, in particular to an isolated nucleic acid and application thereof, and more particularly to a group of isolated nucleic acids, application of the isolated nucleic acids, a method for determining the state of an individual by using the isolated nucleic acids, a device for determining the state of the individual by using the isolated nucleic acids, a method for classifying a plurality of individuals by using the isolated nucleic acids, a medicament for treating ankylosing spondylitis and a method for preparing the medicament for treating ankylosing spondylitis.
Background
Ankylosing spondylitis (AS) is a chronic, progressive inflammatory disease that mainly affects the spine and also involves the sacroiliac and peripheral joints. It occurs mostly in males aged 15-30, with a male-to-female ratio of 2:1 to 3:1. The cause of ankylosing spondylitis is not yet fully understood; research in recent years indicates that the disease is related to factors such as genetic predisposition, infection and immunity.
Ankylosing spondylitis usually has an insidious onset and often produces no clinical symptoms in the early stage. Some patients show mild general symptoms early on, such as fatigue, weight loss, long-term or intermittent low-grade fever, anorexia and mild anemia. Because these symptoms are mild, most patients are not diagnosed early, so treatment is delayed and the optimal treatment window is missed.
Ankylosing spondylitis is closely associated with the human leukocyte antigen HLA-B27. According to epidemiological surveys, the HLA-B27 positive rate among ankylosing spondylitis patients is as high as 90-96%, whereas that in the general population is only 4-9%. The incidence of ankylosing spondylitis among HLA-B27-positive people is about 10-20%, whereas the incidence in the general population is 1-2 per thousand, a difference of about 100-fold. Although the HLA-B27 test aids the diagnosis of ankylosing spondylitis, most patients are diagnosed from medical history, physical signs and X-ray examination. At present, early diagnosis of ankylosing spondylitis relies mainly on imaging such as CT, radionuclide scanning and magnetic resonance, and the examination process is complex.
With the completion of human genome sequencing and the rapid development of high-throughput sequencing technology, genetic screening has become a direction for ankylosing spondylitis diagnosis and is well suited to identifying people at potential risk of ankylosing spondylitis. Studies have shown that more than 70% of ankylosing spondylitis patients have intestinal inflammation, and 5-10% of these patients have severe intestinal inflammation that may develop into clinical inflammatory bowel disease (IBD) or Crohn's disease (CD) (Mielants et al, 1985). Some marker genes for Crohn's disease are also associated with ankylosing spondylitis (Parkes et al, 2013), suggesting that the two diseases may share a similar pathogenesis, possibly related to intestinal disorders. Studies have also shown that multiple genes associated with ankylosing spondylitis, such as genes of the interleukin IL-23 pathway, play an important role in gut immunity and in regulating gut health (Wellcome et al, 2007). Compared with healthy controls, ankylosing spondylitis patients and their direct relatives show increased intestinal permeability, which again points to an important role of intestinal microbes in ankylosing spondylitis (Mielants et al, 1991). To date, no intestinal microbial markers for ankylosing spondylitis patients have been reported.
Disclosure of Invention
The present invention aims to solve at least one of the above technical problems, at least to some extent, or at least to provide a useful commercial alternative.
According to a first aspect of the invention, there is provided a set of isolated nucleic acids comprising at least one of the following nucleic acid sequence clusters: a first nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 1-110, each nucleic acid sequence in the first cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 1-110; a second nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 111-299, each nucleic acid sequence in the second cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 111-299; and a third nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 300-453, each nucleic acid sequence in the third cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 300-453.
According to an embodiment of the invention, the isolated nucleic acids further comprise a fourth nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 454-524, each nucleic acid sequence in the fourth cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 454-524.
According to a second aspect of the present invention, the present invention provides the use of the above-mentioned isolated nucleic acid for detecting ankylosing spondylitis, and/or for treating ankylosing spondylitis, and/or for the preparation of a medicament for treating ankylosing spondylitis, and/or for the preparation of a functional food.
According to a third aspect of the invention, there is provided a method of obtaining the isolated nucleic acids of one aspect of the invention described above, the method comprising: (1) obtaining a first sequencing result and a second sequencing result, wherein the first sequencing result is a sequencing result of nucleic acids in stool samples of a plurality of ankylosing spondylitis patients and comprises a plurality of first reads, and the second sequencing result is a sequencing result of nucleic acids in stool samples of a plurality of healthy individuals and comprises a plurality of second reads; (2) assembling the first reads and the second reads separately to obtain a plurality of first assembled sequences and a plurality of second assembled sequences, respectively; (3) determining the abundance of the first assembled sequences and the abundance of the second assembled sequences based on the support of the first assembled sequences by the first reads and the support of the second assembled sequences by the second reads, respectively; (4) clustering the first assembled sequences and the second assembled sequences according to the abundances determined in (3) to obtain a plurality of gene clusters, each gene cluster comprising a plurality of first assembled sequences and/or second assembled sequences; and (5) performing a statistical test to determine the gene clusters that are significantly enriched in the stool samples of the plurality of ankylosing spondylitis patients and/or of the plurality of healthy individuals, thereby obtaining the nucleic acids.
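Step (3) can be illustrated with a minimal sketch. The text does not specify how read support is converted into abundance, so the length normalisation below is an assumption, and the names `abundance_from_read_support`, `read_hits` and `seq_lengths` are hypothetical. Mapping reads to assembled sequences is assumed to have been done beforehand.

```python
def abundance_from_read_support(read_hits, seq_lengths):
    """Length-normalised relative abundance of assembled sequences in one sample.

    read_hits:   iterable of assembled-sequence IDs, one entry per supporting read
                 (hypothetical input; read mapping itself is not shown here).
    seq_lengths: dict mapping assembled-sequence ID -> length in bases.
    """
    support = {seq_id: 0 for seq_id in seq_lengths}
    for seq_id in read_hits:
        if seq_id in support:  # ignore reads assigned to unknown sequences
            support[seq_id] += 1
    # Assumption: divide by length so long sequences are not favoured
    # merely because they can collect more reads.
    per_base = {s: support[s] / seq_lengths[s] for s in seq_lengths}
    total = sum(per_base.values())
    if total == 0:
        return {s: 0.0 for s in seq_lengths}
    return {s: v / total for s, v in per_base.items()}


# Made-up example: two assembled sequences with equal per-base read support.
lengths = {"contig_1": 1000, "contig_2": 500}
hits = ["contig_1"] * 4 + ["contig_2"] * 2
profile = abundance_from_read_support(hits, lengths)
```

Running the same function on every sample gives one abundance profile per assembled sequence across samples, which is the input that step (4) clusters.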
According to a fourth aspect of the present invention, there is provided a method of determining the state of an individual using the nucleic acid of one aspect of the present invention described above, the method comprising: determining the abundance of the nucleic acid sequence clusters of the nucleic acid in a stool sample of the individual and in a control group; and comparing the abundance of the nucleic acid sequence clusters in the individual's stool sample with the abundance in the control group, and determining the state of the individual according to whether the difference is statistically significant, wherein the control group consists of stool samples from one or more groups of individuals sharing the same state, the state being either having ankylosing spondylitis or not having ankylosing spondylitis.
All or part of the steps of the above method for determining the state of an individual may be performed by an apparatus or system comprising separable functional unit modules corresponding to the steps, or the method may be programmed, stored in a machine-readable medium, and executed by a machine.
According to a fifth aspect of the present invention, there is provided an apparatus for determining the state of an individual using the nucleic acid of one aspect of the present invention described above, the apparatus being adapted to perform the method for determining the state of an individual described above and comprising: an abundance determination unit for determining the abundance of the nucleic acid sequence clusters of the nucleic acid in a stool sample of the individual and in a control group; and an individual state determination unit for comparing the abundance of the nucleic acid sequence clusters in the individual's stool sample with the abundance in the control group and determining the state of the individual according to whether the difference is statistically significant, wherein the control group consists of stool samples from one or more groups of individuals sharing the same state, the state being either having ankylosing spondylitis or not having ankylosing spondylitis.
According to a sixth aspect of the present invention, there is provided a system for determining the state of an individual using the nucleic acid of one aspect of the present invention described above, the system being used to carry out all or part of the steps of the method for determining the state of an individual described above and comprising: a data input module for inputting data; a data output module for outputting data; a processor for executing an executable program, execution of which includes performing all or part of the steps of the method for determining the state of an individual described above; and a storage module connected to the data input module, the data output module and the processor and used for storing data, including the executable program.
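The comparison of one individual against a control group can be sketched as follows. The text does not name the statistical test, so the one-sided empirical p-value used here is an assumption, as are the function names and the 0.05 cut-off. The sketch covers a cluster that is expected to be higher in patients (the first to third clusters); for the fourth cluster the direction would be reversed.

```python
def empirical_p_value(value, control_values):
    """One-sided empirical p-value: share of controls at least as high as `value`.

    The +1 terms keep the estimate away from exactly zero for small control groups.
    """
    at_least = sum(1 for c in control_values if c >= value)
    return (at_least + 1) / (len(control_values) + 1)


def exceeds_healthy_controls(individual_abundance, healthy_abundances, alpha=0.05):
    """True when the individual's cluster abundance is significantly above healthy controls."""
    return empirical_p_value(individual_abundance, healthy_abundances) < alpha


# Made-up abundances of one sequence cluster in 30 healthy control stool samples.
healthy = [0.001 * i for i in range(1, 31)]

flag_high = exceeds_healthy_controls(0.050, healthy)     # above every control
flag_typical = exceeds_healthy_controls(0.015, healthy)  # inside the control range
```

With 30 controls the smallest attainable p-value is 1/31, so a control group of this size is close to the minimum that can support a 0.05 threshold under this scheme.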
According to a seventh aspect of the present invention, there is provided a method of classifying a plurality of individuals using a nucleic acid according to one aspect of the present invention as described above, the method comprising: determining the state of each individual by using the method for determining the state of the individual according to the aspect of the invention; and classifying the individuals according to the obtained states of the individuals. The method can distinguish a plurality of individuals or a plurality of unknown stool samples according to different states of the individuals, and is convenient for classification and marking management.
According to an eighth aspect of the invention, there is provided a medicament for treating ankylosing spondylitis, the medicament causing a decrease in the abundance, in the intestine of a patient, of the first, second and/or third nucleic acid sequence cluster of the nucleic acid of any one of the above embodiments, and/or causing an increase in the abundance of the fourth nucleic acid sequence cluster of a nucleic acid comprising the fourth nucleic acid sequence cluster.
According to a ninth aspect of the present invention, there is provided a method for producing or screening a medicament for treating ankylosing spondylitis according to one aspect of the present invention described above, the method comprising the step of screening a substance causing a decrease in the abundance of the first nucleic acid sequence cluster, the second nucleic acid sequence cluster and/or the third nucleic acid sequence cluster in a nucleic acid according to any one of the embodiments of the present invention described above, and/or a substance causing an increase in the abundance of the fourth nucleic acid sequence cluster in a nucleic acid including the fourth nucleic acid sequence cluster, as the medicament.
The isolated nucleic acids of one aspect of the present invention were determined by processing and analyzing sequencing data of intestinal microbial samples, comparing the differences in abundance of intestinal microbial sequences between ankylosing spondylitis patients and healthy populations, and then verifying the result by testing a large number of samples. The set of isolated nucleic acids can be used as ankylosing spondylitis markers. Any one of the first to third nucleic acid sequence clusters is significantly enriched in the ankylosing spondylitis patient population compared with the healthy population, where "significantly enriched" means that the abundance of the nucleic acid sequence cluster in the disease group is statistically significantly higher, or markedly and substantially higher, than in the healthy control group. The isolated nucleic acids may also include a fourth nucleic acid sequence cluster that is significantly enriched in the healthy population compared with the ankylosing spondylitis patient group, where "significantly enriched" means that the abundance of the sequence cluster in the healthy group is statistically significantly higher, or markedly and substantially higher, than in the disease group. The set of isolated nucleic acids can be used to determine the probability that an individual has ankylosing spondylitis or is healthy, and can be used for non-invasive early detection or auxiliary detection of ankylosing spondylitis.
Furthermore, a substance capable of decreasing the abundance of any one of the first to third nucleic acid sequence clusters in the set of isolated nucleic acids, and/or a substance capable of increasing the abundance of the fourth nucleic acid sequence cluster in isolated nucleic acids comprising the fourth nucleic acid sequence cluster, can be used to treat ankylosing spondylitis or is beneficial to ankylosing spondylitis patients. Such substances include, but are not limited to, medicaments for treating ankylosing spondylitis and functional foods beneficial to the balance of the intestinal flora. The nucleic acids determined by the method of the invention, namely the ankylosing spondylitis markers, can be used for preparing medicaments for treating ankylosing spondylitis and/or functional foods, health-care products and the like that are beneficial to the balance of the intestinal flora.
Moreover, the method, device and/or system for determining the state of an individual according to one aspect of the present invention are based on detecting the abundance of the nucleic acid sequence clusters in a stool sample of the individual, comparing the detected abundance with the abundance of the nucleic acid sequence clusters in the control group, and determining from the comparison the relative probability that the individual is an ankylosing spondylitis individual or a healthy individual. This provides a non-invasive auxiliary method for the early detection of ankylosing spondylitis.
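The notion of a cluster being statistically significantly higher in one group than in the other can be sketched with a permutation test on the difference of group means. This test is chosen only for illustration, since the passage does not name the test used. The function name, the abundance values and the number of permutations are all hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b) under label shuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = len(group_a)
    observed = sum(group_a) / k - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Made-up abundances of one sequence cluster in stool samples of each group.
disease = [0.030, 0.028, 0.035, 0.031, 0.029, 0.033]
healthy_group = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]

p_enriched_in_disease = permutation_p_value(disease, healthy_group)
p_enriched_in_healthy = permutation_p_value(healthy_group, disease)
```

A small `p_enriched_in_disease` would correspond to the behaviour described for the first to third clusters, and a small `p_enriched_in_healthy` to that described for the fourth cluster.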
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic representation of the experimental analysis procedure for screening to identify ankylosing spondylitis markers in an example of the present invention.
Fig. 2 is an abundance heatmap of clustered CAGs 521 in a discovery dataset and a validation dataset in an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. It should be noted that the terms "first" or "second", etc. are used herein for convenience of description only and are not to be construed as indicating or implying any relative importance, nor should they be construed as implying a chronological relationship between the first and second named.
In the description of the present invention, "a plurality" means two or more unless otherwise specified. In this document, unless expressly stated or limited otherwise, the term "connected" is to be construed broadly: a connection may, for example, be fixed, removable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or internal between two elements.
A biological marker is a cellular, biochemical or molecular change that can be detected from a biological medium. Biological media include various body fluids, tissues, cells, feces, hair, breath, and the like.
Abundance refers to how plentiful a microorganism or nucleic acid sequence is within a population of such microorganisms or sequences. For example, the abundance of a microorganism in a gut microbial population may be expressed as the amount of that microorganism in the population; as another example, the abundance of a nucleic acid sequence in a set of nucleic acid sequences may be expressed as the ratio of the number of copies of that sequence to the total number of sequences in the set.
The identity and similarity of sequences refer to the degree to which sequences are identical or similar, respectively.
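The ratio form of this definition can be written as a short sketch. The function name and the example data are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def relative_abundance(sequence_ids):
    """Return, for each sequence, its count divided by the total number of sequences."""
    counts = Counter(sequence_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq_id: n / total for seq_id, n in counts.items()}


# Made-up set of five observed sequences, identified by name.
observed = ["seq_A", "seq_A", "seq_B", "seq_A", "seq_C"]
abundance = relative_abundance(observed)
```

Here `seq_A` accounts for three of the five observations, so its abundance is 3/5.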
According to one embodiment of the present invention, there is provided a set of isolated nucleic acids comprising at least one of the following nucleic acid sequence clusters: a first nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 1-110, each nucleic acid sequence in the first cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 1-110; a second nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 111-299, each nucleic acid sequence in the second cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 111-299; and a third nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 300-453, each nucleic acid sequence in the third cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 300-453.
According to an embodiment of the invention, the isolated nucleic acids further comprise a fourth nucleic acid sequence cluster, the nucleic acid sequences of which correspond one-to-one to the sequences shown in SEQ ID NO: 454-524, each nucleic acid sequence in the fourth cluster having a sequence similarity of not less than 90% to its corresponding sequence in SEQ ID NO: 454-524.
The set of isolated nucleic acids was determined by processing and analyzing sequencing data of intestinal microbial samples, comparing the differences in abundance of intestinal microbial sequences between ankylosing spondylitis patients and healthy populations, and verifying the result by testing a large number of samples; the nucleic acid sequences contained in each nucleic acid sequence cluster are non-redundant. The set of isolated nucleic acids can be used as ankylosing spondylitis markers. Any one of the first to third nucleic acid sequence clusters is significantly enriched in the ankylosing spondylitis patient population compared with the healthy population, where "significantly enriched" means that the abundance of the nucleic acid sequence cluster in the disease group is statistically significantly higher, or markedly and substantially higher, than in the healthy control group. The isolated nucleic acids may also include a fourth nucleic acid sequence cluster that is significantly enriched in the healthy population compared with the ankylosing spondylitis patient group, where "significantly enriched" means that the abundance of the sequence cluster in the healthy group is statistically significantly higher, or markedly and substantially higher, than in the disease group. The set of isolated nucleic acids can be used to determine the probability that an individual has ankylosing spondylitis or is healthy, and can be used for non-invasive early detection or auxiliary detection of ankylosing spondylitis.
It should be noted that SEQ ID NO: 1-110, SEQ ID NO: 111-299, SEQ ID NO: 300-453 and SEQ ID NO: 454-524 are sequence clusters determined by the inventors through assembly and clustering analysis of the sequencing data of sample nucleic acids, and the sequences in each sequence cluster come from the same species. It is generally considered that the DNA/DNA hybridization rate within the same species is 80% or more. The inventors therefore define that any sequence contained in a nucleic acid sequence cluster can be used as an ankylosing spondylitis sequence marker as long as its sequence similarity to the corresponding sequence in SEQ ID NO: 1-110, SEQ ID NO: 111-299, SEQ ID NO: 300-453 and/or SEQ ID NO: 454-524 is not less than 90%. The inventors repeatedly altered nucleotides of the sequences in the nucleic acid sequence clusters, keeping the sequence similarity to the corresponding SEQ ID NO sequence at not less than 90%, and tested the altered sequences experimentally; the experiments support that sequences with a similarity of not less than 90% can also be used as ankylosing spondylitis markers.
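A check of the 90% threshold can be sketched as below. The text does not state how sequence similarity is computed, so this sketch assumes a simple measure, one minus the edit distance divided by the longer sequence length. In practice an alignment tool would normally be used, and the function names and example sequences here are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed similarity measure: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def meets_threshold(candidate, reference, threshold=0.90):
    """True when the candidate is at least `threshold` similar to the reference."""
    return similarity(candidate.upper(), reference.upper()) >= threshold


# Made-up 20-base reference and two variants (not taken from the sequence listing).
reference = "ACGTACGTACGTACGTACGT"
one_change = "TCGTACGTACGTACGTACGT"     # 1 substitution
three_changes = "TCGTTCGTTCGTACGTACGT"  # 3 substitutions
```

Under this measure `one_change` passes the 90% threshold and `three_changes` does not, which shows how quickly short sequences fall below the cut-off.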
According to an embodiment of the invention, the set of isolated nucleic acids consists of one, two or all three of the first, second and third nucleic acid sequence clusters.
According to one embodiment of the invention, the nucleic acid comprises the first nucleic acid sequence cluster. Each nucleic acid sequence in the first nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 1-110. According to one embodiment of the invention, the nucleic acid further comprises at least one of the second nucleic acid sequence cluster and the third nucleic acid sequence cluster. According to one embodiment of the invention, each nucleic acid sequence in the second nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 111-299. Each nucleic acid sequence in the third nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 300-453.
According to one embodiment of the invention, the nucleic acid comprises the second nucleic acid sequence cluster. According to one embodiment of the invention, each nucleic acid sequence in the second nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 111-299. According to one embodiment of the invention, the nucleic acid further comprises at least one of the first nucleic acid sequence cluster and the third nucleic acid sequence cluster. According to one embodiment of the invention, each nucleic acid sequence in the first nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 1-110. Each nucleic acid sequence in the third nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 300-453.
According to one embodiment of the invention, the nucleic acid comprises the third nucleic acid sequence cluster. According to one embodiment of the invention, each nucleic acid sequence in the third nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 300-453. According to one embodiment of the invention, the nucleic acid further comprises at least one of the first nucleic acid sequence cluster and the second nucleic acid sequence cluster. Each nucleic acid sequence in the first nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 1-110. Each nucleic acid sequence in the second nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 111-299.
According to one embodiment of the invention, the nucleic acid further comprises the fourth nucleic acid sequence cluster, and each nucleic acid sequence in the fourth nucleic acid sequence cluster has a sequence similarity of not less than 95% to its corresponding sequence in SEQ ID NO: 454-524.
According to another embodiment of the present invention, there is provided a use of the isolated nucleic acid of any one of the above embodiments in detecting ankylosing spondylitis, and/or in treating ankylosing spondylitis, and/or in the manufacture of a medicament for treating ankylosing spondylitis, and/or in the manufacture of a functional food.
The isolated nucleic acid is determined by processing and analyzing sequencing data of an intestinal microorganism sample, comparing the abundance difference of the intestinal microorganism sequences of ankylosing spondylitis patients and healthy groups, and then verifying through a large number of sample tests. The isolated nucleic acid can be used as an ankylosing spondylitis marker, wherein any one of the first to third nucleic acid sequence clusters is significantly enriched in a population of ankylosing spondylitis patients compared with a healthy individual group, wherein significantly enriched refers to that the abundance of the nucleic acid sequence cluster contained in the ankylosing spondylitis marker in a disease group is statistically significantly higher or significantly and substantially higher than that in a healthy group compared with that in a healthy control group; the isolated nucleic acid may also include a fourth cluster of nucleic acid sequences that is significantly enriched in the healthy population as compared to the group of ankylosing spondylitis patients, wherein significantly enriched is a cluster of sequences that is statistically significantly higher or significantly, substantially higher than the abundance in the disease group as compared to the abundance in the disease group. 
This group of isolated nucleic acids can be used to determine the probability that an individual has ankylosing spondylitis or is healthy, and can be used for non-invasive early detection or auxiliary detection of ankylosing spondylitis. A substance capable of reducing the abundance of any one of the first to third nucleic acid sequence clusters, and/or a substance capable of increasing the abundance of the fourth nucleic acid sequence cluster where the isolated nucleic acid comprises it, can be used for treating ankylosing spondylitis or can be beneficially administered to patients with ankylosing spondylitis. Such substances include, but are not limited to, medicaments for treating ankylosing spondylitis and functional foods that help balance the intestinal flora. The isolated nucleic acid provided by this embodiment can therefore be used for preparing a medicament for treating ankylosing spondylitis and/or for preparing functional foods, health-care medicaments and the like that balance the intestinal flora.
According to another embodiment of the present invention, there is provided a method of obtaining an isolated nucleic acid as described in any of the above embodiments of the present invention, the method comprising: (1) obtaining a first sequencing result and a second sequencing result, wherein the first sequencing result is a sequencing result of nucleic acids from stool samples of a plurality of ankylosing spondylitis patients and comprises a plurality of first reads, and the second sequencing result is a sequencing result of nucleic acids from stool samples of a plurality of healthy individuals and comprises a plurality of second reads; (2) assembling the first reads and the second reads separately to obtain a plurality of first assembly sequences and a plurality of second assembly sequences, respectively; (3) determining the abundance of the first assembly sequences and the abundance of the second assembly sequences based on the support of the first reads for the first assembly sequences and of the second reads for the second assembly sequences, respectively; (4) clustering the first assembly sequences and the second assembly sequences according to the abundances determined in (3) to obtain a plurality of gene clusters, wherein each gene cluster comprises a plurality of first assembly sequences and/or second assembly sequences; and (5) performing a statistical test to determine the gene clusters significantly enriched in the stool samples of the plurality of ankylosing spondylitis patients and/or of the plurality of healthy individuals, so as to obtain the nucleic acid. The method can efficiently determine nucleic acid sequence clusters that can be used as ankylosing spondylitis markers.
According to an embodiment of the present invention, (3) includes: aligning the first reads and the second reads to the first assembly sequences and the second assembly sequences, respectively, to obtain a first alignment result and a second alignment result; and determining the abundance of the first assembly sequences and of the second assembly sequences from the first and second alignment results, respectively, using the following formula: the abundance of assembly sequence S is $Ab(S) = Ab(U_S) + Ab(M_S)$, where $Ab(U_S) = U_S / l_S$; $U_S$ is the number of reads uniquely aligned to assembly sequence S; $l_S$ is the length of assembly sequence S; $M_S$ is the number of reads non-uniquely aligned to assembly sequence S; $i$ indexes the reads non-uniquely aligned to S; $Co_i$ is the abundance coefficient of non-uniquely aligned read $i$; N1 is the total number of assembly sequences to which a read non-uniquely aligned to S aligns; $j$ indexes those assembly sequences; and $U_j$ is the number of reads uniquely aligned to assembly sequence $j$. Because the formula accounts for the contributions of both uniquely and non-uniquely aligned reads to the abundance of an assembly sequence, it makes full use of the sequencing data while determining the abundance accurately.
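As a worked illustration of the abundance formula, the block below writes out one plausible full form and evaluates it on made-up numbers. The expressions for $Ab(M_S)$ and $Co_i$ are an assumption inferred from the symbol definitions given here and from the later statement that the sum of the unique-read abundances of the aligned sequences serves as the denominator; they are not reproduced from the original formula. The sequences S and T and all counts are hypothetical.

```latex
% Assumed full form (Ab(M_S) and Co_i are inferred, not quoted):
\begin{align*}
Ab(S)   &= Ab(U_S) + Ab(M_S), \\
Ab(U_S) &= \frac{U_S}{l_S}, \\
Ab(M_S) &= \frac{1}{l_S} \sum_{i=1}^{M_S} Co_i, \qquad
Co_i     = \frac{Ab(U_S)}{\sum_{j=1}^{N1} Ab(U_j)} .
\end{align*}

% Hypothetical example: S with l_S = 1000, U_S = 20; T with l_T = 500, U_T = 5;
% one read aligns to both S and T (M_S = M_T = 1, N1 = 2).
\begin{align*}
Ab(U_S) &= 20/1000 = 0.02, & Ab(U_T) &= 5/500 = 0.01, \\
Co^{(S)} &= \frac{0.02}{0.02 + 0.01} = \tfrac{2}{3}, & Co^{(T)} &= \tfrac{1}{3}, \\
Ab(S) &= 0.02 + \frac{2/3}{1000} \approx 0.02067, &
Ab(T) &= 0.01 + \frac{1/3}{500} \approx 0.01067 .
\end{align*}
```

Under this reading the shared read is split between S and T in proportion to their unique-read support, so the two coefficients for one read sum to 1.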
According to an embodiment of the present invention, before performing (4), first assembly sequences and second assembly sequences that are present in fewer than one-fifteenth of the stool samples are removed. The nucleic acid sequences finally obtained as markers are thus of practical significance and can be used to detect and judge unknown samples.
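The prevalence filter described above can be sketched as follows. This is a minimal illustration, assuming that a sequence counts as "present" in a sample when its abundance is greater than zero; the function name, the dictionary layout and the `denominator` parameter are hypothetical and not taken from the source.

```python
from typing import Dict, List


def filter_by_prevalence(
    abundance: Dict[str, List[float]], denominator: int = 15
) -> Dict[str, List[float]]:
    """Keep sequences present in at least 1/denominator of all samples.

    abundance maps a sequence id to its abundance in each sample (same
    sample order for every sequence). Presence is assumed to mean
    abundance > 0, which is an illustrative choice.
    """
    kept = {}
    for seq_id, values in abundance.items():
        n_present = sum(1 for v in values if v > 0)
        # Integer comparison avoids floating-point issues with 1/15:
        # n_present / n_samples >= 1 / denominator
        if n_present * denominator >= len(values):
            kept[seq_id] = values
    return kept


# Hypothetical data: 30 samples, so a sequence must occur in at least 2.
example = {
    "seq_common": [0.5] * 30,
    "seq_two": [0.1, 0.2] + [0.0] * 28,
    "seq_rare": [0.1] + [0.0] * 29,
}
filtered = filter_by_prevalence(example)
```

With 30 samples, `seq_rare` (present in one sample) is removed, while `seq_two` sits exactly at the one-fifteenth boundary and is kept.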
According to an embodiment of the present invention, (4) includes: performing first clustering on the first assembly sequences and the second assembly sequences to obtain a first clustering result; and removing the clusters that contain only one assembly sequence from the first clustering result and performing second clustering on the remainder to obtain the plurality of gene clusters. Performing the second clustering based on abundance differences, after the single-sequence clusters have been eliminated from the first clustering result, allows the finally obtained nucleic acid sequence clusters to serve effectively as ankylosing spondylitis sequence markers.
Clustering may use known methods, and the invention is not limited in this regard. According to one embodiment of the invention, sequences are clustered by abundance using the Canopy algorithm. Unlike traditional clustering algorithms such as K-means, Canopy clustering does not require the value of K, i.e. the number of clusters, to be specified in advance, which gives it great practical value. Although Canopy clustering is less accurate than other clustering algorithms, it is much faster, so it can be used first to perform a "coarse" clustering of the data, after which Canopy or K-means clustering can be applied for a further "fine" clustering.
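A minimal sketch of Canopy clustering on abundance profiles is given below to make the "no K required" property concrete. It assumes a distance of 1 minus the Pearson correlation between abundance profiles and two hypothetical thresholds `t1` (loose) and `t2` (tight); the source does not state the distance measure or threshold values, so these are illustrative choices only.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def canopy_cluster(
    profiles: Dict[str, Sequence[float]], t1: float = 0.4, t2: float = 0.1
) -> List[List[str]]:
    """Canopy clustering with loose threshold t1 and tight threshold t2.

    Sequences within t1 of a canopy center join that canopy; sequences
    within t2 are also removed from the pool of future centers.
    """
    if t1 <= t2:
        raise ValueError("t1 (loose) must be larger than t2 (tight)")
    remaining = list(profiles)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unassigned sequence becomes a center
        members = [center]
        still_remaining = []
        for name in remaining:
            dist = 1.0 - pearson(profiles[center], profiles[name])
            if dist <= t1:
                members.append(name)
            if dist > t2:
                still_remaining.append(name)
        remaining = still_remaining
        canopies.append(members)
    return canopies


# Hypothetical abundance profiles over four samples.
profiles = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.1],
    "c": [4, 3, 2, 1],
    "d": [8, 6, 4, 2.1],
    "e": [1, 5, 1, 5],
}
first_round = canopy_cluster(profiles)
# Drop single-sequence clusters before any second clustering round.
multi_member = [c for c in first_round if len(c) > 1]
```

The number of canopies falls out of the thresholds rather than being fixed in advance; the last line mirrors the removal of single-sequence clusters before a second clustering round.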
According to a further embodiment of the present invention there is provided a method of determining the status of an individual using an isolated nucleic acid of any one of the above embodiments, the method comprising the steps of:
determining the abundance of a cluster of nucleic acid sequences comprised by the isolated nucleic acid.
Determining the abundance of the nucleic acid sequence cluster in the nucleic acid in a fecal sample of the individual and in a control group.
According to an embodiment of the invention, the abundance of the nucleic acid sequence clusters in the individual's stool sample and/or in the control group is determined by: obtaining sequencing data for nucleic acids in a fecal sample of the individual and nucleic acids in a control group, the sequencing data from either source comprising a plurality of reads; determining the abundance of each nucleic acid sequence in the nucleic acid sequence cluster based on the support of the reads in the sequencing data from any source for that nucleic acid sequence cluster in a fecal sample from that source; and determining the abundance of the nucleic acid sequence cluster in the same fecal sample according to the abundance of the nucleic acid sequence in the fecal sample of the source.
The sequencing data are obtained by sequencing nucleic acids in a sample. Depending on the sequencing platform selected, the sequencing may use, without limitation, semiconductor sequencing platforms such as the PGM, Ion Proton and BGISEQ-100 platforms; sequencing-by-synthesis platforms such as the Illumina HiSeq and MiSeq platforms; or single-molecule real-time sequencing platforms such as the PacBio platform. Sequencing may be single-end or paired-end, and the raw output data are sequenced fragments, referred to as reads.
According to one embodiment of the invention, determining the abundance of each nucleic acid sequence in the nucleic acid sequence cluster in the fecal sample from a source, based on the support of the reads in the sequencing data from that source for the nucleic acid sequence, comprises: aligning the reads to the nucleic acid sequences and, based on the alignment result, determining the abundance of each nucleic acid sequence using the following formula: the abundance of nucleic acid sequence G is $Ab(G) = Ab(U_G) + Ab(M_G)$, where $Ab(U_G) = U_G / l_G$; $U_G$ is the number of reads uniquely aligned to nucleic acid sequence G; $l_G$ is the length of nucleic acid sequence G; $M_G$ is the number of reads non-uniquely aligned to G; $x$ indexes the reads non-uniquely aligned to G; $Co_x$ is the abundance coefficient of non-uniquely aligned read $x$; N2 is the total number of nucleic acid sequences to which a read non-uniquely aligned to G aligns; $y$ indexes those nucleic acid sequences; and $U_y$ is the number of reads uniquely aligned to nucleic acid sequence $y$.
The alignment can be performed with known alignment software such as SOAP, BWA or TeraMap. Alignment parameters are generally set so that one read, or one pair of reads, is allowed at most k base mismatches, for example k ≤ 2; a read with more than k mismatched bases is considered unable to align to the nucleic acid sequence. The alignment result describes how each read aligns to each nucleic acid sequence, including whether the read aligns to any nucleic acid sequence, whether it aligns to a single nucleic acid sequence or to several, the position at which it aligns, and whether it aligns at a unique position or at multiple positions. According to one embodiment of the invention, alignment is performed using SOAPalign 2.21 with the parameters -r 2 -m 100 -x 1000. When reads are aligned to the nucleic acid sequences in a nucleic acid sequence cluster, the aligned reads fall into two classes: a) reads that align to only one nucleic acid sequence, called unique reads (U); and b) reads that align to multiple nucleic acid sequences, called multiple reads (M). For a given nucleic acid sequence G, the abundance Ab(G) depends on both its unique reads and its multiple reads; Ab(U) and Ab(M) in the above formula are the abundances contributed to G by the unique reads and the multiple reads, respectively. Each multiple read has its own abundance coefficient Co. If a multiple read aligns to N nucleic acid sequences, its Co is calculated with the sum of the unique-read abundances of those N sequences as the denominator.
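The treatment of unique and multiple reads can be sketched in code as follows. This is an illustration under stated assumptions: the text fixes only the denominator of Co, so the numerator is assumed here to be the unique-read abundance of the sequence in question; a multiple read whose target sequences have no unique reads at all is simply skipped, which is also an assumption. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sequence_abundance(
    lengths: Dict[str, int], alignments: Iterable[List[str]]
) -> Dict[str, float]:
    """Compute Ab(G) = Ab(U_G) + Ab(M_G) for every sequence G.

    lengths    maps sequence id -> sequence length l_G.
    alignments holds, for each read, the ids of the sequences it aligned to.
    """
    unique_counts = defaultdict(int)
    multi_reads = []
    for hits in alignments:
        targets = sorted(set(hits))
        if len(targets) == 1:
            unique_counts[targets[0]] += 1  # unique read (U)
        elif len(targets) > 1:
            multi_reads.append(targets)     # multiple read (M)

    # Ab(U_G) = U_G / l_G
    ab_unique = {g: unique_counts[g] / lengths[g] for g in lengths}

    # Sum of abundance coefficients Co over the multiple reads of each sequence.
    co_sum = defaultdict(float)
    for targets in multi_reads:
        denom = sum(ab_unique[t] for t in targets)
        if denom == 0:
            continue  # assumption: no unique support, read left unassigned
        for t in targets:
            # Assumed numerator: unique-read abundance of the target itself.
            co_sum[t] += ab_unique[t] / denom

    # Ab(M_G) = (sum of Co) / l_G
    return {g: ab_unique[g] + co_sum[g] / lengths[g] for g in lengths}


# Hypothetical example: 20 unique reads on S, 5 on T, one read shared by both.
reads = [["S"]] * 20 + [["T"]] * 5 + [["S", "T"]]
abund = sequence_abundance({"S": 1000, "T": 500}, reads)
```

In this example the shared read contributes two thirds of a read to S and one third to T, because S has twice the unique-read abundance of T.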
The abundance of a nucleic acid sequence cluster is related to the abundance of the nucleic acid sequences it comprises, which according to one embodiment of the invention is the mean or median of the abundance of the nucleic acid sequences it comprises.
The state of the individual is determined.
The abundance of the nucleic acid sequence cluster in the individual's stool sample is compared with its abundance in a control group, and the state of the individual is determined according to whether the difference is statistically significant. The control group consists of stool samples from one or more groups of individuals sharing the same state, the states including suffering from ankylosing spondylitis and not suffering from ankylosing spondylitis.
According to an embodiment of the invention, the control group consists of stool samples of a plurality of individuals suffering from ankylosing spondylitis, and the status of an individual is determined as suffering from ankylosing spondylitis when the abundance, in the stool sample of the individual, of a nucleic acid sequence cluster contained in said nucleic acid other than the fourth nucleic acid sequence cluster is not statistically different from its abundance in the control group, and/or when the nucleic acid comprises the fourth nucleic acid sequence cluster and the abundance of the fourth nucleic acid sequence cluster is not statistically different from its abundance in the control group.
According to an embodiment of the invention, the control group consists of stool samples of a plurality of healthy individuals, and the individual is determined to be in a state of suffering from ankylosing spondylitis when the abundance of nucleic acid sequence clusters contained in the nucleic acids other than the fourth nucleic acid sequence cluster in the stool sample of the individual is statistically higher than that in the control group, and/or when the nucleic acids comprise the fourth nucleic acid sequence cluster and the abundance of the fourth nucleic acid sequence cluster is statistically lower than that in the control group.
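The decision rule against a healthy control group can be sketched as below. The source does not name the statistical test, so this sketch assumes a simple normal-approximation interval built from the control values (mean ± z·s.d. at significance level α); the function names and return labels are hypothetical, and a real implementation would substitute whatever test is actually chosen.

```python
import statistics
from typing import Sequence


def cluster_abundance(gene_abundances: Sequence[float], use_median: bool = True) -> float:
    """Cluster abundance as the median (or mean) of its sequences' abundances."""
    if use_median:
        return statistics.median(gene_abundances)
    return statistics.mean(gene_abundances)


def compare_to_controls(value: float, control_values: Sequence[float], alpha: float = 0.05) -> str:
    """Return 'higher', 'lower' or 'not different' relative to the controls.

    Assumed test: value outside mean +/- z * s.d. of the control group,
    with z taken from the standard normal distribution.
    """
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    if value > mu + z * sd:
        return "higher"
    if value < mu - z * sd:
        return "lower"
    return "not different"


def status_vs_healthy_controls(
    value: float, control_values: Sequence[float], enriched_in_patients: bool, alpha: float = 0.05
) -> str:
    """Apply the rule for a healthy control group.

    A patient-enriched cluster (first to third) signals disease when higher
    than the controls; a healthy-enriched cluster (fourth) when lower.
    """
    result = compare_to_controls(value, control_values, alpha)
    if enriched_in_patients and result == "higher":
        return "ankylosing spondylitis"
    if not enriched_in_patients and result == "lower":
        return "ankylosing spondylitis"
    return "undetermined"


# Hypothetical cluster abundances in six healthy controls.
healthy = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
call_high = status_vs_healthy_controls(2.0, healthy, enriched_in_patients=True)
call_mid = status_vs_healthy_controls(1.02, healthy, enriched_in_patients=True)
call_low = status_vs_healthy_controls(0.2, healthy, enriched_in_patients=False)
```

With a patient control group the logic would be mirrored: a result of "not different" would then point to ankylosing spondylitis.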
All or part of the steps of the method for determining the status of an individual using the isolated nucleic acid in any of the above embodiments of the present invention may be performed by an apparatus or system comprising separable functional unit modules corresponding to those steps, or the method may be programmed, stored on a machine-readable medium and executed by a machine.
According to an embodiment of the present invention, there is provided an apparatus for determining the status of an individual using the isolated nucleic acid of any one of the above embodiments, the apparatus being adapted to perform all or part of the steps of any one of the above methods for determining the status of an individual, the apparatus comprising: an abundance determining unit for determining the abundance of the nucleic acid sequence cluster of the nucleic acid in a stool sample of the individual and in a control group; and an individual status determination unit for comparing the abundance of the nucleic acid sequence cluster in the individual's stool sample with its abundance in the control group and determining the status of the individual according to whether the difference is statistically significant, the control group consisting of stool samples from one or more groups of individuals sharing the same status, the statuses including suffering from ankylosing spondylitis and not suffering from ankylosing spondylitis. The above description of the technical features and advantages of the method for determining the status of an individual using isolated nucleic acids according to any embodiment of the present invention applies equally to the apparatus according to this aspect of the present invention and is not repeated here.
According to an embodiment of the present invention, the following is performed using the abundance determining unit: obtaining sequencing data for nucleic acids in a fecal sample of the individual and nucleic acids in a control group, the sequencing data from either source comprising a plurality of reads; determining the abundance of the nucleic acid sequence in the fecal sample from any source based on the support of the reads in the sequencing data from that source for the nucleic acid sequences in the nucleic acid sequence cluster; and determining the abundance of the nucleic acid sequence cluster in the same fecal sample according to the abundance of the nucleic acid sequence in the fecal sample of the source.
According to an embodiment of the present invention, determining the abundance of a nucleic acid sequence in a fecal sample from any source, based on the support of the reads in the sequencing data from that source for the nucleic acid sequence in the nucleic acid sequence cluster, comprises: aligning the reads to the nucleic acid sequences and, based on the alignment result, determining the abundance of each nucleic acid sequence using the following formula: the abundance of nucleic acid sequence G is $Ab(G) = Ab(U_G) + Ab(M_G)$, where $Ab(U_G) = U_G / l_G$; $U_G$ is the number of reads uniquely aligned to nucleic acid sequence G; $l_G$ is the length of nucleic acid sequence G; $M_G$ is the number of reads non-uniquely aligned to G; $x$ indexes the reads non-uniquely aligned to G; $Co_x$ is the abundance coefficient of non-uniquely aligned read $x$; N2 is the total number of nucleic acid sequences to which a read non-uniquely aligned to G aligns; $y$ indexes those nucleic acid sequences; and $U_y$ is the number of reads uniquely aligned to nucleic acid sequence $y$.
According to an embodiment of the invention, the abundance of the nucleic acid sequence cluster is the mean or median of the abundances of the nucleic acid sequences it comprises.
According to an embodiment of the invention, the control group consists of stool samples of a plurality of individuals suffering from ankylosing spondylitis, and the status of an individual is determined as suffering from ankylosing spondylitis when the abundance, in the stool sample of the individual, of a nucleic acid sequence cluster contained in said nucleic acid other than the fourth nucleic acid sequence cluster is not statistically different from its abundance in the control group, and/or when the nucleic acid comprises the fourth nucleic acid sequence cluster and the abundance of the fourth nucleic acid sequence cluster is not statistically different from its abundance in the control group.
According to an embodiment of the invention, the control group consists of stool samples of a plurality of healthy individuals, and the individual is determined to be in a state of suffering from ankylosing spondylitis when the abundance of nucleic acid sequence clusters contained in the nucleic acids other than the fourth nucleic acid sequence cluster in the stool sample of the individual is statistically higher than that in the control group, and/or when the nucleic acids comprise the fourth nucleic acid sequence cluster and the abundance of the fourth nucleic acid sequence cluster is statistically lower than that in the control group.
A statistical confidence interval is an interval estimate, computed from a sample, of a parameter of the population from which the sample was drawn. It indicates the range around the measured value within which the true value of the parameter lies with a given probability, and so expresses how trustworthy the measured value is; this probability is called the confidence level. According to an embodiment of the invention, the predetermined confidence interval is the 95% confidence interval, meaning that the state of the individual determined with this embodiment is 95% reliable. Depending on the purpose or requirement, different degrees of confidence in the determined state may be required, and a person skilled in the art may select a different significance level (α), that is, a different probability that the determination is in error.
According to one embodiment of the present invention there is provided a system for determining the status of an individual using an isolated nucleic acid as described in any one of the embodiments of the present invention above, the system being arranged to perform all or part of the steps of a method for determining the status of an individual using an isolated nucleic acid as described in any one of the embodiments of the present invention above, the system comprising: the data input module is used for inputting data; the data output module is used for outputting data; a processor for executing an executable program, the executing of the executable program comprising performing the method of determining the status of an individual of any of the embodiments of the present invention described above; and the storage unit is connected with the data input module, the data output module and the processor and is used for storing data, wherein the storage unit comprises the executable program. The above description of the technical features and advantages of the method for determining the status of an individual using isolated nucleic acids according to any embodiment of the present invention applies equally to the system according to this aspect of the present invention and will not be described in further detail herein.
According to one embodiment of the present invention, there is provided a medicament for treating ankylosing spondylitis, the medicament causing a decrease in the abundance, in the intestine of a patient, of the first, second and/or third nucleic acid sequence cluster of the nucleic acid of any one of the preceding embodiments, and/or causing an increase in the abundance of the fourth nucleic acid sequence cluster where the nucleic acid comprises it. The isolated nucleic acid is determined by processing and analyzing sequencing data of intestinal microorganism samples, comparing the abundance of intestinal microbial sequences between ankylosing spondylitis patients and healthy individuals, and verifying the result on a large number of samples; each nucleic acid sequence cluster contained in the isolated nucleic acid consists of non-redundant sequences. The isolated nucleic acid can be used as an ankylosing spondylitis marker. Any one of the first to third nucleic acid sequence clusters is significantly enriched in the ankylosing spondylitis patient group compared with the healthy group; here, significantly enriched means that the abundance of the nucleic acid sequence cluster in the disease group is statistically significantly higher, or significantly and substantially higher, than its abundance in the healthy control group. The isolated nucleic acid may also include a fourth nucleic acid sequence cluster that is significantly enriched in the healthy group compared with the ankylosing spondylitis patient group; here, significantly enriched means that the abundance of the sequence cluster in the healthy group is statistically significantly higher, or significantly and substantially higher, than its abundance in the disease group.
This group of isolated nucleic acids can be used to determine the probability that an individual has ankylosing spondylitis or is healthy, and can be used for non-invasive early detection or auxiliary detection of ankylosing spondylitis. A substance capable of reducing the abundance of any one of the first to third nucleic acid sequence clusters, and/or a substance capable of increasing the abundance of the fourth nucleic acid sequence cluster where the isolated nucleic acid comprises it, can be used for treating ankylosing spondylitis or for benefiting patients with ankylosing spondylitis. Such substances include, but are not limited to, medicaments for treating ankylosing spondylitis and functional foods that help balance the intestinal flora. The isolated nucleic acid provided by this embodiment can therefore be used for preparing a medicament for treating ankylosing spondylitis and/or for preparing functional foods, health-care medicaments and the like that help balance the intestinal flora.
By utilizing the medicine or functional food of the embodiment, the determined ankylosing spondylitis sequence marker is reasonably and effectively applied to support the growth of beneficial bacteria or sequences in the intestinal tract and/or inhibit potential pathogenic bacteria or sequences in the intestinal tract, so that the defect of the intestinal barrier can be prevented, the intestinal microecological structure can be improved and restored, and the medicine or functional food has important significance for assisting in reducing the blood endotoxin level and/or relieving the clinical symptoms of ankylosing spondylitis.
According to an embodiment of the present invention, there is also provided a method for producing or screening the above-mentioned drug, the method comprising the step of screening a substance causing a decrease in the abundance of the first nucleic acid sequence cluster, the second nucleic acid sequence cluster and/or the third nucleic acid sequence cluster in the nucleic acid in any of the above-mentioned embodiments, and/or a substance causing an increase in the abundance of the fourth nucleic acid sequence cluster in the nucleic acid including the fourth nucleic acid sequence cluster, as the drug.
By using the method for producing or screening the medicament for treating the ankylosing spondylitis in the embodiment of the invention, the medicament capable of supporting the growth of beneficial bacteria in the intestinal tract and/or inhibiting potential pathogenic bacteria in the intestinal tract can be obtained by reasonably and effectively applying the determined ankylosing spondylitis biomarker for screening, the defect of the intestinal tract barrier can be prevented, the microecological structure of the intestinal tract can be improved and recovered, and the method has important significance for assisting in reducing the blood endotoxin level and/or relieving the clinical symptoms of the ankylosing spondylitis.
According to a final embodiment of the present invention there is provided a method of classifying a plurality of individuals using an isolated nucleic acid of any of the above embodiments of the present invention, the method comprising: determining the state of each individual by using the method for determining the state of the individual in any embodiment of the invention; and classifying the individuals according to the obtained states of the individuals. The method can distinguish a plurality of individuals or a plurality of unknown stool samples according to different states of the individuals, and is convenient for classification and marking management. Furthermore, the above description of the technical features and advantages of the method of determining the status of an individual using the isolated nucleic acid of any of the embodiments of the present invention applies equally to the method of this aspect of the present invention, and will not be repeated here.
The method and/or apparatus of the present invention is described in detail below with reference to specific embodiments. Unless otherwise stated, the reagents, sequences (linkers, tags and primers), software and instruments referred to in the following examples are conventional commercial products or open source, such as the transcriptome library construction kit from Illumina.
The following examples comprise a first stage and a second stage, namely a discovery stage and a validation stage. In the discovery stage, data sets from 73 AS patients and 83 healthy controls were analyzed and compared to determine changes in gut microbial composition and function and thereby identify species markers; in the validation stage, data sets from 24 AS patients and 31 healthy controls were used to verify the accuracy of the first-stage results.
Example 1
In this example, the inventors performed an association analysis of the whole intestinal flora using stool samples from 73 ankylosing spondylitis patients and 83 healthy controls, to characterize the composition and functional components of the fecal microflora. In total, the inventors used about 428.09 Gb of downloaded high-quality sequencing data (healthy individuals from the LC data set) and 293 Gb of high-quality sequencing data from their own sequencing to construct an ankylosing spondylitis reference gene set, and combined it with the downloaded LC and IGC gene sets to build a more complete gene set (set of assembled fragments). Quantitative metagenomic analysis showed that, across the patients and healthy controls, 1,708,139 genes could be clustered into 199 gene clusters (co-abundance gene groups, CAGs) representing bacterial species. Eight of these differed between the AS patient group and the healthy control group: 3 CAGs were mainly enriched in the AS patient group and 5 CAGs were mainly enriched in the healthy group, as shown in Table 1. Of these 8 CAGs, 6 could be annotated to a species and 2 could not, and may represent new, as yet unidentified species.
1. Acquisition of sequencing data
1.1 sample Collection and DNA extraction
The ankylosing spondylitis patients came from affiliated hospitals of the Chinese medicine university in Hangzhou, Zhejiang. Stool samples were collected from 73 Chinese ankylosing spondylitis patients; the fresh stool sample of each individual was divided into 5 aliquots of 200 mg each, which were immediately frozen and stored in a -80 °C freezer.
Total DNA was extracted from the stool samples of the 73 Chinese ankylosing spondylitis patients by phenol-chloroform (phenol-trichloromethane) extraction.
1.2 library construction and sequencing, and reference data download
DNA libraries were constructed according to the instructions of the instrument manufacturer (Illumina). The libraries were sequenced with paired-end 100 bp (PE100) reads; the libraries of the 73 samples were sequenced on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA). On average, each sample yielded 4.03 Gb (s.d. ± 0.64 Gb) of high-quality sequencing data, for a total of 293 Gb.
Sequencing data for 83 healthy Chinese individuals were downloaded from EBI under accession number ERP005860.
Referring to the experimental procedure of fig. 1, relevant biomarkers for ankylosing spondylitis were identified, wherein omitted steps or details are well known to those skilled in the art, and several important steps are described as follows.
2. Identification of biomarkers
2.1 basic processing of sequencing data
1) Quality control of sequencing data: after the sequencing data of the 156 first-stage samples were obtained, they were filtered according to the following criteria: a) reads in which more than 50% of the bases are of low quality (below Q20) were removed; b) reads containing more than 5 N bases were removed; c) low-quality (below Q20) and N bases at read tails were trimmed. Reads whose mate was lost were treated as single reads for assembly.
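The three quality-control criteria can be sketched as a per-read filter. This is a minimal illustration, assuming that "low quality" means a Phred score below 20, that the criteria are applied in the order a), b), c), and that reads are given as a base string plus a list of integer Phred scores; the function names and defaults are hypothetical.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)


def qc_read(
    seq: str, quals: List[int], min_q: int = 20,
    max_low_frac: float = 0.5, max_n: int = 5
) -> Optional[Read]:
    """Apply criteria a) to c); return the trimmed read or None if rejected."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return None
    # a) reject reads with more than 50% low-quality bases
    low = sum(1 for q in quals if q < min_q)
    if low > max_low_frac * len(seq):
        return None
    # b) reject reads with more than 5 N bases
    if seq.upper().count("N") > max_n:
        return None
    # c) trim low-quality and N bases from the read tail
    end = len(seq)
    while end > 0 and (quals[end - 1] < min_q or seq[end - 1] in "Nn"):
        end -= 1
    if end == 0:
        return None
    return seq[:end], quals[:end]


def qc_pair(read1: Read, read2: Read) -> Tuple[str, List[Read]]:
    """Filter a read pair; a surviving read without its mate becomes 'single'."""
    kept = [r for r in (qc_read(*read1), qc_read(*read2)) if r is not None]
    label = {2: "paired", 1: "single", 0: "discarded"}[len(kept)]
    return label, kept


# Hypothetical reads.
trimmed = qc_read("ACGTAC", [30, 30, 30, 30, 10, 10])
rejected_low = qc_read("ACGT", [10, 10, 10, 30])
rejected_n = qc_read("NNNNNNAC", [30] * 8)
pair_result = qc_pair(("ACGT", [30] * 4), ("ACGT", [10, 10, 10, 30]))
```

The last call shows the single-read case: the second mate fails criterion a), so the first mate is kept and labelled as a single read.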
2) The downloaded data of the healthy person is processed as well.
3) The LC gene set was obtained from Qin, N. et al. Alterations of the human gut microbiome in liver cirrhosis. Nature 513, 59-64 (2014). The IGC gene set was downloaded from ftp://climb.genomics.cn/pub/10.5524/100001-101000/100064/1.GeneCatalogs/IGC.fa.gz.
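The three filtering criteria of step 1) can be sketched as follows. This is a minimal illustration rather than the inventors' pipeline: it assumes each read is already available as a base string together with a list of integer Phred quality scores, and the function names are hypothetical.

```python
def passes_filters(seq, quals, q_threshold=20, max_low_fraction=0.5, max_n=5):
    """Criteria a) and b): reject reads with too many low-quality or N bases."""
    if not seq:
        return False
    low_quality = sum(1 for q in quals if q < q_threshold)
    if low_quality / len(seq) > max_low_fraction:
        return False  # a) more than 50% of bases below Q20
    if seq.count("N") > max_n:
        return False  # b) more than 5 N bases
    return True


def trim_tail(seq, quals, q_threshold=20):
    """Criterion c): drop trailing bases that are N or below Q20."""
    end = len(seq)
    while end > 0 and (seq[end - 1] == "N" or quals[end - 1] < q_threshold):
        end -= 1
    return seq[:end], quals[:end]


def quality_control(seq, quals):
    """Return the cleaned (seq, quals) pair, or None if the read is discarded."""
    if not passes_filters(seq, quals):
        return None
    seq, quals = trim_tail(seq, quals)
    return (seq, quals) if seq else None
```

In a paired-end setting, a read whose mate returns None here would be kept as a single read, as the text describes.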
2.2 obtaining the genome of ankylosing spondylitis microorganism
Metagenomic biomarkers are mainly genes and their corresponding functions, so the sequencing reads need to be assembled, genes predicted, and redundancy removed in order to construct a non-redundant reference gene set. The reads of all samples were assembled into contigs (assembled fragments) using the SOAPdenovo software. About 7.37 million contigs (minimum length 500 bp) were finally produced. The total length of the contigs is 13.38 Gb, and the N50 length ranges from 1,075 to 40,644 bp, with an average length of 7,022 bp.
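For readers unfamiliar with the N50 statistic quoted above, the following short sketch shows how it is conventionally computed from a list of contig lengths. The function name is hypothetical and the snippet is only an illustration of the standard definition, not part of the described workflow.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")
```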
To predict microbial genes for each of the 156 samples, the inventors used the methods of the human gut metagenomics project (MetaHIT). The MetaGeneMark program predicted 14,888,074 open reading frames (ORFs) longer than 100 bp. The total length of the predicted ORFs is 11,136,246,978 bp, accounting for 83.20% of the total contig length. A non-redundant "AS gene set" was then established by removing redundant ORFs: after pairwise alignment, ORFs with sequence identity above 95% and coverage of the shorter ORF above 90% were defined as identical sequences, and one of each group of identical sequences was randomly retained. The final non-redundant ankylosing spondylitis gene set contains 10.39 million ORFs with an average length of 747 bp.
The downloaded gene sets are the IGC and LC gene sets. The IGC gene set comprises 9,879,896 genes with a total length of 7,436,156,055 bp and an average length of 753 bp; the LC gene set contains 2,688,468 genes with a total length of 2,017,496,337 bp and an average length of 750 bp.
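The redundancy-removal rule can be sketched as below. This is an assumption-laden illustration: pairwise alignment results are taken as given in the hypothetical form (orf_a, orf_b, identity, coverage of the shorter ORF), and ORFs are grouped transitively with a union-find structure, which the source does not specify. One member of each group is then kept at random.

```python
import random


def remove_redundancy(orf_ids, hits, min_identity=0.95, min_coverage=0.90, seed=0):
    """Keep one randomly chosen ORF per group of 'identical' ORFs.

    hits: iterable of (orf_a, orf_b, identity, coverage_of_shorter_orf).
    Assumption: identity is applied transitively (union-find grouping).
    """
    parent = {orf: orf for orf in orf_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity > min_identity and coverage > min_coverage:
            parent[find(a)] = find(b)

    groups = {}
    for orf in orf_ids:
        groups.setdefault(find(orf), []).append(orf)

    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return sorted(rng.choice(members) for members in groups.values())
```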
2.3 Gene abundance analysis
The paired-end reads processed in step 2.1 were aligned to the non-redundant reference gene set of step 2.2 using SOAPalign 2.21 with the parameters -r 2 -m 100 -x 1000. Reads aligned to the non-redundant reference gene set fall into two classes: a) unique reads (U): reads that align to only one gene in the non-redundant gene set; b) multiple reads (M): reads that align to more than one gene in the non-redundant gene set.
For a given gene G, the abundance Ab(G) depends on both its unique reads and its multiple reads, and is calculated as follows:

$$Ab(G) = Ab(U) + Ab(M), \qquad Ab(U) = \frac{U}{l}, \qquad Ab(M) = \frac{\sum_{i=1}^{M} Co_i}{l}$$

where Ab(U) and Ab(M) are the abundances of gene G contributed by its unique reads and its multiple reads, respectively, U and M are the numbers of unique and multiple reads aligned to gene G, and l is the length of gene G. Each multiple read has a gene-specific abundance coefficient Co; if a multiple read aligns to N genes, its Co for gene G is calculated as follows:

$$Co = \frac{Ab(U)}{\sum_{j=1}^{N} Ab(U_j)}$$

That is, for a multiple read, the inventors take the sum of the unique-read abundances $Ab(U_j)$ of the N genes to which it aligns as the denominator.
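The abundance calculation can be illustrated with a small sketch. It assumes that the numerator of Co is the unique-read abundance of the gene in question, that alignments are supplied as one list of gene identifiers per read, and that a multiple read whose target genes have no unique reads at all is simply skipped; the last point is an assumption of this sketch, since the text does not cover that case. All names are hypothetical.

```python
from collections import Counter


def gene_abundance(gene_lengths, read_hits):
    """Compute Ab(G) = Ab(U) + Ab(M) for every gene.

    gene_lengths: dict mapping gene id to length l.
    read_hits: list with one entry per read, each a list of gene ids hit.
    """
    # Ab(U) = U / l, where U counts reads aligned to exactly one gene.
    unique_counts = Counter(hits[0] for hits in read_hits if len(hits) == 1)
    ab_unique = {g: unique_counts[g] / l for g, l in gene_lengths.items()}

    # Sum of Co over the multiple reads of each gene.
    co_sums = dict.fromkeys(gene_lengths, 0.0)
    for hits in read_hits:
        if len(hits) < 2:
            continue
        denominator = sum(ab_unique[g] for g in hits)
        if denominator == 0:
            continue  # assumption: no unique-read evidence, read is ignored
        for g in hits:
            co_sums[g] += ab_unique[g] / denominator

    # Ab(M) = (sum of Co) / l
    return {g: ab_unique[g] + co_sums[g] / gene_lengths[g] for g in gene_lengths}
```

With this weighting the coefficients of one multiple read sum to 1 over the genes it hits, so each read is shared out in proportion to the unique-read evidence for each gene.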
2.4 Association analysis/screening sequence markers
To study the intestinal metagenomes of healthy individuals (83) and AS patients (73), unknown species were identified in the gene set after redundancy removal. To explore the known and unknown species associated with AS, genes with differing abundance were first identified across the gene set of the 156 samples and then clustered according to the abundance table. Genes of the same species have similar abundance within one individual, while their abundance varies between individuals, so genes of the same species can be clustered by the consistency of their abundance. Each resulting gene cluster represents a metagenomic species (CAG). Known or unknown species associated with AS were then identified among the CAGs found.
Briefly, genes detected in at least 10 samples were first selected from the gene abundance table as input for clustering. The first clustering used the canopy algorithm, with threshold T1 set to a Pearson correlation coefficient > 0.95 and a Spearman correlation coefficient > 0.6, and threshold T2 set to a Pearson correlation coefficient > 0.9.
After the first clustering, canopies containing only one gene were removed to correct for fragmentation, and a second clustering was performed. The input to the second clustering was the average abundance of each canopy obtained from the first clustering, and the canopy algorithm was used again; a new element (gene sequence) was added to an existing class on the condition that more than 70% of the elements in that class satisfied the threshold of a Pearson correlation coefficient > 0.97. After the second clustering, overlaps between classes were removed: a gene present in several classes was assigned to the class with which it had the largest Pearson correlation coefficient.
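The idea of grouping genes by co-varying abundance can be sketched in a heavily simplified form. The code below is not the two-stage procedure described above: it uses a single Pearson threshold, compares each gene only with the seed gene of a canopy, and omits the Spearman criterion, the second clustering pass, and overlap resolution. Function names and the default threshold are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def canopy_cluster(profiles, threshold=0.9):
    """Greedy canopies: take a seed gene, collect all genes correlated with it.

    profiles: dict mapping gene id to its abundance across samples.
    """
    remaining = list(profiles)
    canopies = []
    while remaining:
        seed = remaining[0]
        members = [
            g for g in remaining
            if g == seed or pearson(profiles[seed], profiles[g]) > threshold
        ]
        canopies.append(members)
        taken = set(members)
        remaining = [g for g in remaining if g not in taken]
    return canopies


def drop_singletons(canopies):
    """Remove canopies with a single gene, as done before the second pass."""
    return [c for c in canopies if len(c) > 1]
```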
Actual run and results in this example: 1,708,139 genes selected from the gene abundance table were entered into the clustering. The first clustering yielded 827,811 canopies, of which 167,055 contained more than one gene and entered the second clustering. After clustering, 111,306 CAGs were obtained, of which 954 contained more than 100 genes and 199 contained more than 700 genes. Based on the average abundance of the 199 CAGs with more than 700 genes, CAGs whose median abundance was greater than 1e-7 in healthy individuals or in patients were screened, and after Benjamini-Hochberg multiple-testing correction with a threshold of FDR < 0.0005, 5 CAGs enriched in healthy individuals and 3 CAGs enriched in AS patients were found, containing 4,959 and 4,538 genes, respectively.
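The Benjamini-Hochberg correction mentioned above can be sketched as follows. The text does not say which statistical test produced the per-CAG p-values, so the sketch simply takes a list of p-values as input; the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_significant(p_values, fdr_threshold=0.0005):
    """Indices whose adjusted p-value is below the FDR threshold."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < fdr_threshold]
```

The default threshold of 0.0005 mirrors the FDR cut-off stated in the text; everything else is generic.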
To demonstrate that the genes in a gene cluster belong to one genome and are consistent with the taxonomic annotation of the CAG, BLAT analysis was performed against 6,006 genomes, comprising the valid reference genomes in NCBI (third edition, August 2012) and the intestinal genomes of the HMP DACC and MetaHIT. A CAG was annotated with a known genome when, for only one species, more than 90% of the genes in the CAG aligned to its genome, with the aligned portion covering 90% of the shorter ORF at a similarity of 95%. Of the 8 CAGs with more than 700 genes, 6 were classified at the species level and the other 2 could not be annotated to a species, indicating that they are unpublished new species, as shown in Table 1. The consistent annotation of the marker genes verifies the clustering quality and applies to all genes of the CAG.
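The annotation rule can be sketched as a small function. It assumes the BLAT results have already been parsed into tuples of (gene, species, identity, coverage of the shorter ORF), treats the genome label as the species name, and reads the 95% and 90% figures as minimum values; these are assumptions of the sketch, and all names are hypothetical.

```python
def annotate_cag(cag_genes, hits, min_gene_fraction=0.90,
                 min_identity=0.95, min_coverage=0.90):
    """Return the single species that explains the CAG, or None.

    hits: iterable of (gene, species, identity, coverage_of_shorter_orf).
    """
    genes = set(cag_genes)
    if not genes:
        return None

    # Collect, per species, the CAG genes with a sufficiently good hit.
    supported = {}
    for gene, species, identity, coverage in hits:
        if gene in genes and identity >= min_identity and coverage >= min_coverage:
            supported.setdefault(species, set()).add(gene)

    qualifying = [
        species for species, matched in supported.items()
        if len(matched) / len(genes) > min_gene_fraction
    ]
    # Annotate only when exactly one species qualifies.
    return qualifying[0] if len(qualifying) == 1 else None
```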
Table 1 species annotation of differential CAG
In total, 1,708,139 genes were clustered into 199 CAGs. The abundance of 8 CAGs differed significantly between healthy individuals and AS patients: 5 CAGs containing 4,959 genes were enriched in healthy individuals, and 3 CAGs containing 4,538 genes were enriched in AS patients. Two of these could not be annotated to a species and are regarded as unidentified new species, as shown in Table 1. Bacteroides is present in the human intestinal tract and primarily helps break down food to provide the nutrients and energy needed by the human body. Eubacterium is mainly present in the cecum and colon and can produce butyric acid, which is the main energy source of intestinal mucosal cells and plays a role in intestinal mucosal repair (Pryde et al., 2002; Augenlicht et al., 2002; Segain et al., 2000). Ruminococcus mainly degrades fiber.
Example 2
With the significance level α set to 0.05, part of the nucleic acid sequences of each marker determined in Example 1 to be significantly enriched in ankylosing spondylitis patients or in the healthy population was taken to represent the CAG to which the marker belongs. For some of these markers, the difference in abundance between the healthy group and the disease group was also significant in the validation set (P < 0.05); the validation results are shown in Table 2 and Fig. 2.
In the validation set, all 199 CAGs were found to maintain abundance consistency, so these gene clusters can be regarded as representing all or part of the genes of a particular strain. Of the 8 CAGs associated with AS in the experimental set, 4 still showed significant differences (P < 0.05).
TABLE 2
Example 3
45 stool samples were used to determine the status of the individuals from whom the samples came.
The abundance of the first 3 CAGs (strains) in Table 2 was determined in each stool sample by the method of Example 2, and it was judged whether the abundance of each of the 3 strains in the sample was significantly greater than that in the control group (healthy group). Individuals for whom the abundance of all 3 species was significantly greater than in the control group were judged to be ankylosing spondylitis patients; all others were judged to be non-ankylosing spondylitis patients.
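The decision rule of this example can be sketched as follows. The text does not state which statistical test decides whether an abundance is "significantly greater" than in the control group; the sketch assumes a one-sided empirical p-value computed against the control samples, with α = 0.05 as in Example 2. The marker names and function names are hypothetical.

```python
def empirical_p_greater(value, control_values):
    """One-sided empirical p-value with add-one smoothing.

    Assumption of this sketch: the share of control samples whose
    abundance is at least as high as the observed value.
    """
    at_least = sum(1 for c in control_values if c >= value)
    return (at_least + 1) / (len(control_values) + 1)


def classify_sample(sample_abundance, control_abundance, markers, alpha=0.05):
    """Call AS only if every marker is significantly higher than in controls.

    sample_abundance: dict marker -> abundance in the tested stool sample.
    control_abundance: dict marker -> list of abundances in healthy controls.
    """
    all_higher = all(
        empirical_p_greater(sample_abundance[m], control_abundance[m]) < alpha
        for m in markers
    )
    return "ankylosing spondylitis" if all_higher else "non-ankylosing spondylitis"
```

With this kind of test the control group must be large enough for the smallest attainable p-value, 1 / (n + 1), to fall below α.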
An individual status could be determined for 38 of the samples, and for 35 of these 38 samples the determined status was consistent with the recorded status of the individual from whom the sample came.
In addition, the inventors carried out further status-determination tests on stool samples from a large number of ankylosing spondylitis patients, additionally using the last CAG in Table 2 as a marker; the status determined by the method of this example agreed with the recorded status in more than 95% of cases.
In addition, the inventors found that combined detection of the species enriched in the disease group and in the healthy group in Table 2 is preferable: for example, when the first three markers in Table 2 are enriched and the last marker in Table 2 is not, ankylosing spondylitis patients or susceptible individuals can be identified more accurately.
In treatment schemes for ankylosing spondylitis based on these markers, the inventors found that inhibiting or eliminating the growth of the first three markers in Table 2 and enriching the last marker in Table 2 gives an excellent therapeutic effect.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (39)

  1. A set of isolated nucleic acids comprising at least one of the following nucleic acid sequence clusters:
    a first nucleic acid sequence cluster, the nucleic acid sequences in the first nucleic acid sequence cluster corresponding one-to-one to the sequences shown in SEQ ID NO: 1-110, and each nucleic acid sequence in the first nucleic acid sequence cluster having a sequence similarity of not less than 90% with its corresponding sequence in SEQ ID NO: 1-110;
    a second nucleic acid sequence cluster, the nucleic acid sequences in the second nucleic acid sequence cluster corresponding one-to-one to the sequences shown in SEQ ID NO: 111-299, and each nucleic acid sequence in the second nucleic acid sequence cluster having a sequence similarity of not less than 90% with its corresponding sequence in SEQ ID NO: 111-299;
    a third nucleic acid sequence cluster, the nucleic acid sequences in the third nucleic acid sequence cluster corresponding one-to-one to the sequences shown in SEQ ID NO: 300-453, and each nucleic acid sequence in the third nucleic acid sequence cluster having a sequence similarity of not less than 90% with its corresponding sequence in SEQ ID NO: 300-453.
  2. The nucleic acid of claim 1, characterized in that said nucleic acid comprises said first nucleic acid sequence cluster.
  3. The nucleic acid of claim 2, wherein each nucleic acid sequence in said first nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 1-110.
  4. The nucleic acid of claim 2, characterized in that said nucleic acid further comprises at least one of said second nucleic acid sequence cluster and said third nucleic acid sequence cluster.
  5. The nucleic acid of claim 4, wherein each nucleic acid sequence in said second nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 111-299.
  6. The nucleic acid of claim 4, wherein each nucleic acid sequence in said third nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 300-453.
  7. The nucleic acid of claim 1, characterized in that said nucleic acid comprises said second nucleic acid sequence cluster.
  8. The nucleic acid of claim 7, wherein each nucleic acid sequence in said second nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 111-299.
  9. The nucleic acid of claim 7, characterized in that said nucleic acid further comprises at least one of said first nucleic acid sequence cluster and said third nucleic acid sequence cluster.
  10. The nucleic acid of claim 9, wherein each nucleic acid sequence in said first nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 1-110.
  11. The nucleic acid of claim 9, wherein each nucleic acid sequence in said third nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 300-453.
  12. The nucleic acid of claim 1, characterized in that said nucleic acid comprises said third nucleic acid sequence cluster.
  13. The nucleic acid of claim 12, wherein each nucleic acid sequence in said third nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 300-453.
  14. The nucleic acid of claim 12, characterized in that said nucleic acid further comprises at least one of said first nucleic acid sequence cluster and said second nucleic acid sequence cluster.
  15. The nucleic acid of claim 14, wherein each nucleic acid sequence in said first nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 1-110.
  16. The nucleic acid of claim 14, wherein each nucleic acid sequence in said second nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 111-299.
  17. The nucleic acid of any one of claims 1 to 16, further comprising a fourth nucleic acid sequence cluster, the nucleic acid sequences in the fourth nucleic acid sequence cluster corresponding one-to-one to the sequences shown in SEQ ID NO: 454-524, and each nucleic acid sequence in the fourth nucleic acid sequence cluster having a sequence similarity of not less than 90% with its corresponding sequence in SEQ ID NO: 454-524.
  18. The nucleic acid of claim 17, wherein each nucleic acid sequence in said fourth nucleic acid sequence cluster has a sequence similarity of not less than 95% with its corresponding sequence in SEQ ID NO: 454-524.
  19. Use of the nucleic acid according to any of claims 1 to 18 for the detection of ankylosing spondylitis, and/or for the treatment of ankylosing spondylitis, and/or for the preparation of a medicament for the treatment of ankylosing spondylitis, and/or for the preparation of a functional food.
  20. A method of obtaining the nucleic acid of any one of claims 1-18, comprising:
    (1) obtaining a first sequencing result and a second sequencing result,
    the first sequencing result is a sequencing result of nucleic acids of stool samples of a plurality of ankylosing spondylitis patients, comprising a plurality of first reads,
    the second sequencing result is a sequencing result of nucleic acids of a stool sample of a plurality of healthy individuals, comprising a plurality of second reads;
    (2) assembling the first reads and the second reads separately to obtain a plurality of first assembly sequences and a plurality of second assembly sequences, respectively;
    (3) determining the abundance of the first assembly sequence and the abundance of the second assembly sequence based on the support of the first assembly sequence by the first read and the support of the second assembly sequence by the second read, respectively;
    (4) clustering the first assembly sequences and the second assembly sequences according to the abundance of the first assembly sequences and the abundance of the second assembly sequences determined in (3) to obtain a plurality of gene clusters, wherein each gene cluster comprises a plurality of first assembly sequences and/or second assembly sequences;
    (5) performing a statistical test to determine the gene clusters significantly enriched in the fecal samples of the plurality of ankylosing spondylitis patients and/or the plurality of healthy individuals, so as to obtain the nucleic acid.
  21. The method of claim 20, wherein (3) comprises:
    aligning the first reads and the second reads to the first assembly sequences and the second assembly sequences, respectively, to obtain a first alignment result and a second alignment result;
    determining the abundance of the first and second assembly sequences from the first and second alignment results, respectively, using the following formulas:
    abundance of assembly sequence S: $Ab(S) = Ab(U_S) + Ab(M_S)$, wherein
    $Ab(U_S) = U_S / l_S$,
    $Ab(M_S) = \left(\sum_{i=1}^{M_S} Co_i\right) / l_S$,
    $Co_i = Ab(U_S) / \sum_{j=1}^{N1} Ab(U_j)$,
    $U_S$ is the number of reads uniquely aligned to assembly sequence S,
    $l_S$ is the length of assembly sequence S,
    $M_S$ is the number of reads non-uniquely aligned to assembly sequence S,
    $i$ is the index of a read non-uniquely aligned to assembly sequence S,
    $Co_i$ is the abundance coefficient of the non-uniquely aligned read i for assembly sequence S,
    $N1$ is the total number of assembly sequences to which a read non-uniquely aligned to assembly sequence S aligns,
    $j$ is the index of an assembly sequence to which that read aligns,
    $U_j$ is the number of reads uniquely aligned to assembly sequence j.
  22. The method of claim 20, characterized in that, prior to performing (4), first assembly sequences and second assembly sequences that are present in fewer than one-fifteenth of the total number of stool samples are removed.
  23. The method of claim 20, wherein (4) comprises:
    performing first clustering on the first assembly sequence and the second assembly sequence to obtain a first clustering result;
    and performing second clustering on the first clustering result obtained after the clusters only comprising one assembly sequence are removed, so as to obtain the plurality of gene clusters.
  24. A method of determining the status of an individual using a nucleic acid according to any one of claims 1 to 18, comprising:
    determining the abundance of the nucleic acid sequence cluster in the nucleic acid in a fecal sample of the individual and in a control group;
    comparing the abundance of the nucleic acid sequence cluster in the individual's stool sample to the abundance in a control group, determining the status of the individual based on whether the difference is statistically significant,
    the control group consists of stool samples from one or more groups of individuals of the same condition,
    the states include suffering from ankylosing spondylitis and not suffering from ankylosing spondylitis.
  25. The method of claim 24, characterized in that the abundance of nucleic acid sequence clusters in a fecal sample of said individual and/or in a control group is determined by:
    obtaining sequencing data for nucleic acids in a fecal sample of the individual and nucleic acids in a control group, the sequencing data from either source comprising a plurality of reads;
    determining the abundance of each nucleic acid sequence in the nucleic acid sequence cluster based on the support of the reads in the sequencing data from any source for that nucleic acid sequence cluster in a fecal sample from that source;
    and determining the abundance of the nucleic acid sequence cluster in the same fecal sample according to the abundance of the nucleic acid sequence in the fecal sample of the source.
  26. The method of claim 25, wherein determining the abundance of said nucleic acid sequences based on the support of reads in the sequencing data from any one of the sources for each of the nucleic acid sequences in said cluster of nucleic acid sequences comprises:
    aligning the reads to the nucleic acid sequence,
    based on the obtained alignment results, determining the abundance of the nucleic acid sequence using the following formula:
    abundance of nucleic acid sequence G: $Ab(G) = Ab(U_G) + Ab(M_G)$, wherein
    $Ab(U_G) = U_G / l_G$,
    $Ab(M_G) = \left(\sum_{x=1}^{M_G} Co_x\right) / l_G$,
    $Co_x = Ab(U_G) / \sum_{y=1}^{N2} Ab(U_y)$,
    $U_G$ is the number of reads uniquely aligned to nucleic acid sequence G,
    $l_G$ is the length of nucleic acid sequence G,
    $M_G$ is the number of reads non-uniquely aligned to nucleic acid sequence G,
    $x$ is the index of a read non-uniquely aligned to nucleic acid sequence G,
    $Co_x$ is the abundance coefficient of the non-uniquely aligned read x for nucleic acid sequence G,
    $N2$ is the total number of nucleic acid sequences to which a read non-uniquely aligned to nucleic acid sequence G aligns,
    $y$ is the index of a nucleic acid sequence to which that read aligns,
    $U_y$ is the number of reads uniquely aligned to nucleic acid sequence y.
  27. The method of claim 25, wherein the abundance of said nucleic acid sequence cluster is the mean or median of the abundances of the nucleic acid sequences it comprises.
  28. The method of claim 24, wherein said control group consists of fecal samples from a plurality of individuals with ankylosing spondylitis,
    and the status of the individual is determined as having ankylosing spondylitis when the abundance, in the fecal sample of the individual, of every nucleic acid sequence cluster in the nucleic acid other than the fourth nucleic acid sequence cluster does not differ statistically from the abundance of that nucleic acid sequence cluster in the control group, and/or
    when the abundance of the fourth nucleic acid sequence cluster, in a nucleic acid comprising the fourth nucleic acid sequence cluster, does not differ statistically from its abundance in the control group.
  29. The method of claim 24, wherein said control group consists of stool samples from a plurality of healthy individuals,
    and the status of the individual is determined as having ankylosing spondylitis when the abundance, in the individual's stool sample, of every nucleic acid sequence cluster in the nucleic acid other than the fourth nucleic acid sequence cluster is statistically higher than that in the control group, and/or
    when the abundance of the fourth nucleic acid sequence cluster, in a nucleic acid comprising the fourth nucleic acid sequence cluster, is statistically lower than its abundance in the control group.
  30. A device for determining the status of an individual using a nucleic acid according to any one of claims 1 to 18, comprising:
    an abundance determining unit for determining the abundance of the nucleic acid sequence cluster in the nucleic acid in a stool sample of the individual and in a control group;
    an individual status determination unit for comparing the abundance of the nucleic acid sequence cluster in a fecal sample of the individual to the abundance in a control group and determining the status of the individual depending on whether the difference is statistically significant,
    the control group consists of stool samples from one or more groups of individuals of the same condition,
    the states include suffering from ankylosing spondylitis and not suffering from ankylosing spondylitis.
  31. The apparatus of claim 30, characterized in that the following is performed in the abundance determining unit:
    obtaining sequencing data for nucleic acids in a fecal sample of the individual and nucleic acids in a control group, the sequencing data from either source comprising a plurality of reads;
    determining the abundance of the nucleic acid sequence in the fecal sample from any source based on the support of the reads in the sequencing data from that source for the nucleic acid sequences in the nucleic acid sequence cluster;
    and determining the abundance of the nucleic acid sequence cluster in the same fecal sample according to the abundance of the nucleic acid sequence in the fecal sample of the source.
  32. The device of claim 31, wherein determining the abundance of said nucleic acid sequences in the fecal sample from any source based on the support of reads in the sequencing data from that source for the nucleic acid sequences in said nucleic acid sequence clusters comprises:
    aligning the reads to the nucleic acid sequence,
    based on the obtained alignment results, determining the abundance of the nucleic acid sequence using the following formula:
    abundance of nucleic acid sequence G: $Ab(G) = Ab(U_G) + Ab(M_G)$, wherein
    $Ab(U_G) = U_G / l_G$,
    $Ab(M_G) = \left(\sum_{x=1}^{M_G} Co_x\right) / l_G$,
    $Co_x = Ab(U_G) / \sum_{y=1}^{N2} Ab(U_y)$,
    $U_G$ is the number of reads uniquely aligned to nucleic acid sequence G,
    $l_G$ is the length of nucleic acid sequence G,
    $M_G$ is the number of reads non-uniquely aligned to nucleic acid sequence G,
    $x$ is the index of a read non-uniquely aligned to nucleic acid sequence G,
    $Co_x$ is the abundance coefficient of the non-uniquely aligned read x for nucleic acid sequence G,
    $N2$ is the total number of nucleic acid sequences to which a read non-uniquely aligned to nucleic acid sequence G aligns,
    $y$ is the index of a nucleic acid sequence to which that read aligns,
    $U_y$ is the number of reads uniquely aligned to nucleic acid sequence y.
  33. The device of claim 31, wherein the abundance of said nucleic acid sequence cluster is the mean or median of the abundance of the nucleic acid sequences it comprises.
  34. The device of claim 30, wherein said control group consists of fecal samples from a plurality of individuals with ankylosing spondylitis,
    and the status of the individual is determined as having ankylosing spondylitis when the abundance, in the fecal sample of the individual, of every nucleic acid sequence cluster in the nucleic acid other than the fourth nucleic acid sequence cluster does not differ statistically from the abundance of that nucleic acid sequence cluster in the control group, and/or
    when the abundance of the fourth nucleic acid sequence cluster, in a nucleic acid comprising the fourth nucleic acid sequence cluster, does not differ statistically from its abundance in the control group.
  35. The device of claim 30, wherein said control group consists of fecal samples from a plurality of healthy individuals,
    and the status of the individual is determined as having ankylosing spondylitis when the abundance, in the individual's stool sample, of every nucleic acid sequence cluster in the nucleic acid other than the fourth nucleic acid sequence cluster is statistically higher than that in the control group, and/or
    when the abundance of the fourth nucleic acid sequence cluster, in a nucleic acid comprising the fourth nucleic acid sequence cluster, is statistically lower than its abundance in the control group.
  36. A system for determining the status of an individual using a nucleic acid according to any one of claims 1 to 18, comprising:
    a data input module for inputting data;
    a data output module for outputting data;
    a processor for executing an executable program, wherein executing the executable program comprises performing the method of any one of claims 24-29;
    and a storage module connected to the data input module, the data output module and the processor and used for storing data, wherein the storage module contains the executable program.
  37. A medicament for treating ankylosing spondylitis, the medicament causing a decrease in the abundance of a first, second and/or third nucleic acid sequence cluster in a nucleic acid according to any one of claims 1 to 18 and/or causing an increase in the abundance of a fourth nucleic acid sequence cluster in a nucleic acid comprising the fourth nucleic acid sequence cluster in the intestine of a patient.
  38. A method for producing or screening a drug according to claim 37, comprising the step of screening a substance that causes a decrease in the abundance of the first, second and/or third nucleic acid sequence cluster in any one of the nucleic acids of claims 1-18, and/or a substance that causes an increase in the abundance of the fourth nucleic acid sequence cluster in a nucleic acid comprising the fourth nucleic acid sequence cluster as the drug.
  39. A method of using a nucleic acid according to any one of claims 1 to 18 to classify a plurality of individuals, comprising:
    determining the status of each individual using the method of any one of claims 24-29, respectively;
    and classifying the individuals according to the obtained states of the individuals.
CN201680083628.XA 2016-03-18 2016-03-18 Isolated nucleic acid and application Pending CN109072278A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/076708 WO2017156764A1 (en) 2016-03-18 2016-03-18 Isolated nucleic acid application thereof

Publications (1)

Publication Number Publication Date
CN109072278A true CN109072278A (en) 2018-12-21

Family

ID=59850795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680083628.XA Pending CN109072278A (en) 2016-03-18 2016-03-18 Isolated nucleic acid and application

Country Status (2)

Country Link
CN (1) CN109072278A (en)
WO (1) WO2017156764A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115992115A (en) * 2021-10-26 2023-04-21 山东舜丰生物科技有限公司 Novel CRISPR enzymes and systems and uses

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114517235A (en) * 2021-12-31 2022-05-20 杭州拓宏生物科技有限公司 Myalgic encephalomyelitis marker gene and application thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2420834A1 (en) * 2010-08-16 2012-02-22 Medizinische Hochschule Hannover Methods and means for diagnosing spondyloarthritis using autoantibody markers
WO2015167087A1 (en) * 2014-04-29 2015-11-05 가톨릭대학교 산학협력단 Method for predicting risk of ankylosing spondylitis using dna copy number variants
CN105132518A (en) * 2015-09-30 2015-12-09 上海锐翌生物科技有限公司 Colon cancer marker and application thereof
CN105296590A (en) * 2015-09-30 2016-02-03 上海锐翌生物科技有限公司 Colorectal cancer marker and application thereof

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CHEN ZHOU et al.: "Metagenomic profiling of the pro-inflammatory gut microbiota in ankylosing spondylitis", 《JOURNAL OF AUTOIMMUNITY》 *
CHENGPING WEN et al.: "Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis", 《GENOME BIOLOGY》 *
LI ZHANG et al.: "Fecal microbiota in patients with ankylosing spondylitis: Correlation with dietary factors and disease activity", 《CLINICA CHIMICA ACTA》 *
MARY-ELLEN COSTELLO et al.: "Intestinal Dysbiosis in Ankylosing Spondylitis", 《ARTHRITIS & RHEUMATOLOGY》 *
S. STEBBINGS et al.: "Comparison of the faecal microflora of patients with ankylosing spondylitis and controls using molecular methods of analysis", 《RHEUMATOLOGY》 *
ZENA CHEN et al.: "Variations in gut microbial profiles in ankylosing spondylitis: disease phenotype-related dysbiosis", 《ANN TRANSL MED》 *
XU YONGYUE (徐永跃) et al.: "Research progress on the relationship between intestinal flora and ankylosing spondylitis", 《GUANGDONG MEDICAL JOURNAL (广东医学)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115992115A (en) * 2021-10-26 2023-04-21 山东舜丰生物科技有限公司 Novel CRISPR enzymes and systems and uses
CN115992115B (en) * 2021-10-26 2023-09-01 山东舜丰生物科技有限公司 Novel CRISPR enzymes and systems and uses

Also Published As

Publication number Publication date
WO2017156764A1 (en) 2017-09-21

Similar Documents

Publication Publication Date Title
CN107217089B (en) Method and device for determining individual state
US10626471B2 (en) Gene signatures of inflammatory disorders that relate to the liver
CN105296590B (en) Large intestine carcinoma marker and its application
CN111440884A (en) Intestinal flora for diagnosing sarcopenia and application thereof
CN105132518B (en) Large intestine carcinoma marker and its application
CN107217088B (en) Ankylosing spondylitis microbial markers
WO2014019180A1 (en) Method and system for determining biomarker in abnormal state
CN110904213B (en) Ulcerative colitis biomarker based on intestinal flora and application thereof
CN109658980A (en) A kind of screening and application of excrement gene marker
CN110838365A (en) Irritable bowel syndrome related flora marker and kit thereof
CN109072306A (en) Isolated nucleic acid and application
CN114182007B (en) Behcet disease marker gene and application thereof
CN113913490A (en) Non-alcoholic fatty liver marker microorganism and application thereof
CN105671177B (en) Ankylosing spondylitis marker and application thereof
CN109072278A (en) Isolated nucleic acid and application
CN107217086B (en) Disease marker and application
CN105733988B (en) Composition and application
CN111020020A (en) Biomarker combination for schizophrenia, application thereof and MetaPhlAn2 screening method
WO2016049927A1 (en) Biomarkers for obesity related diseases
CN112063709A (en) Diagnostic kit for myasthenia gravis by taking microorganisms as diagnostic marker and application
CN111020021A (en) Intestinal flora-based small-scale schizophrenia biomarker combination, application thereof and mOTU screening method
CN116656851B (en) Biomarker and application thereof in diagnosis of chronic obstructive pulmonary disease
CN114317674B (en) Rheumatoid arthritis marker microorganism and application thereof
CN110396537B (en) Asthma biomarker and application thereof
CN114517235A (en) Myalgic encephalomyelitis marker gene and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 202-3 and 302, Building No. 138, Xinjunhuan Road, Minhang District, Shanghai, 20114

Applicant after: SHANGHAI REALBIO TECHNOLOGY Co.,Ltd.

Address before: Room 119, 1st floor, 3058 Pusan Road, Pudong New Area, Shanghai 200050

Applicant before: SHANGHAI REALBIO TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181221
