CN110021363B - Device and method for constructing user-friendly chromosome gene variation map - Google Patents

Device and method for constructing user-friendly chromosome gene variation map

Info

Publication number
CN110021363B
Authority
CN
China
Prior art keywords
gene
variation
submodule
information
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711423213.5A
Other languages
Chinese (zh)
Other versions
CN110021363A (en)
Inventor
玄兆伶
李大为
梁峻彬
陈重建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Annoroad Gene Technology Beijing Co ltd
Beijing Annoroad Medical Laboratory Co ltd
Original Assignee
Anouta Gene Technology Beijing Co ltd
Annoroad Yiwu Medical Inspection Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anouta Gene Technology Beijing Co ltd, Annoroad Yiwu Medical Inspection Co ltd
Priority to CN201711423213.5A
Publication of CN110021363A
Application granted
Publication of CN110021363B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a device and a method for constructing a user-friendly chromosome gene variation map. The device comprises a data acquisition module, a data sorting module, a gene segment map drawing module, a chromosome map drawing module and a gene variation information labeling module. The device and the method can automatically, faithfully, intuitively and attractively display the specific variation status of any gene on the whole chromosome.

Description

Device and method for constructing user-friendly chromosome gene variation map
Technical Field
The invention belongs to the field of gene detection, and particularly relates to a method and a device for constructing a user-friendly chromosome gene variation map.
Background
Gene testing is a technique that examines DNA obtained from blood, other body fluids, or cells; it can be used to diagnose disease and to predict disease risk. Typically, shed oral mucosa cells or other tissue cells are collected from the person being tested, the genetic material is amplified, and the DNA molecular information in the cells is read out with dedicated equipment. The results are used to predict the risk of disease and to analyze the various genes the cells contain, so that people can learn about their own genetic information and avoid or delay the onset of disease by improving their living environment and habits.
With the development of next-generation sequencing (NGS), NGS-based gene detection technologies have advanced rapidly. They can detect changes in the primary structure of DNA that arise when internal or external factors alter the base composition or order of a gene's DNA sequence. These changes mainly include: single-base changes (single nucleotide variations, SNV), insertions and deletions of sequence fragments of various sizes (InDel), copy number variations of sequence fragments (CNV), structural variations of sequences (SV), and the like.
Genetic testing agencies typically report test results to users in the form of gene variation maps. For whole-genome (chromosome) gene detection, existing map construction methods focus on displaying variation across the whole genome or on displaying the chromosomal position of a gene; none of them can faithfully and intuitively show, in a gene detection report, which variations occur in a particular gene. As a result, the reader of the report cannot intuitively obtain the gene-level test results; that is, the existing maps are not user-friendly.
References
1. Yang D, Khan S, et al. Association of BRCA1 and BRCA2 mutations with survival, chemotherapy sensitivity, and gene mutator phenotype in patients with ovarian cancer. JAMA. 2011;306(14):1557-1565.
2. Krzywinski M, Schein J, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639-1645.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, the present invention aims to provide an apparatus and method for constructing a user-friendly chromosomal gene variation map, which can automatically, truly, intuitively and aesthetically display the specific variation of any gene on the whole chromosome. Furthermore, the gene detection result can be represented as a colored image, so that the data can be more easily identified through visual detection, and the readability of the gene detection result is improved.
The inventors of the present invention have made intensive studies to solve the above-mentioned technical problems, and as a result, found that: by adopting the optimized information labeling rule, the information to be labeled can be reasonably arranged in the limited display space, so that the technical problem is solved.
Namely, the present invention comprises:
1. an apparatus for constructing a user-friendly chromosomal genetic variation profile, comprising:
the data acquisition module is used for acquiring gene variation detection data, gene information and chromosome G banding data; here, the gene variation detection data includes, for example, SNP or InDel variation information obtained by processing and aligning raw sequencing data, calling variants with a variant detection algorithm, and annotating the results. Gene information includes, for each gene, all transcripts, the chromosome, exon numbering, and start and stop positions, as provided by the RefSeq database. Chromosome G banding data refers to the following: after a chromosome is treated with fluorescent dye, horizontal stripes of different widths and brightness can be observed along its long axis under a fluorescence microscope; this stripe information is converted into a file that electronically records the absolute position and interval of each stripe on the chromosome.
The data sorting module is connected with the data acquisition module and is used for matching the transcript of the input gene, extracting the exons of the transcript and all gene variation detection data within 15-30 bp, preferably 20 bp, around the exons, and sorting and outputting the gene variation detection data in a specified format; here, a transcript refers to one of the mature mRNAs encoding a protein that are formed by transcription of a gene. Each gene can have multiple transcripts, i.e., multiple transcript numbers, so the input data must provide the specific transcript number of the gene to be mapped. The specified format means that the extracted information is arranged in the form in which it is to be displayed in the figure, e.g., "exon 9, heterozygous: c.6513G>C: p.V2171V", and written to a temporary file, e.g., one named gene_mutpos.txt; all temporary files can be deleted once the algorithm has finished running.
The gene segment drawing module is connected with the data sorting module and is used for converting the lengths of all exons into proportions according to the gene segment information, while an intron segment of equal length is cumulatively added after each exon; here, "drawing" means that the script draws the graphic into a graphics device, saves it to a file (e.g., in PNG format) once drawing is complete, and the graphic is shown on a display screen when the file is opened. The gene segment information includes the gene transcript, exon ID, chromosome ID, start position and end position. Lengths are converted into proportions for the following reason: because the introns of a gene are several times longer than its exons, drawing to true scale would leave the exons invisible to the naked eye. If only the exons and the variations within 20 bp around them are to be displayed, all intron regions can be replaced with segments of equal length. If the whole canvas is regarded as a 1 × 1 canvas, each position in an exon region can be converted into a proportional value on the canvas: actual position / (total length of all exon regions + intron length × number of introns).
The chromosome map drawing module is connected with the gene segment map drawing module and is used for marking the chromosome G banding with different colors, converting the lengths of all G banding zones into proportion, judging whether each zone is positioned at a p arm or a q arm, drawing a map of the chromosome where the gene is positioned and marking the position of the gene on the map of the chromosome; here, a map can be made for a gene, first drawing a chromosome, and then marking where the gene is located on the chromosome. And
the genetic variation information labeling module is connected with the data sorting module, the gene segment drawing module and the chromosome drawing module and comprises:
and the submodule A is used for judging whether the gene has a gene variation site, if so, outputting an instruction to start the following submodule B, and if not, outputting a chromosome map and a gene segment map as final results without marking any gene variation information.
The submodule B is connected with the submodule A and used for judging whether the current mapping space is enough to place all the gene variation locus information or not, and if so, outputting an instruction to start the following submodule C; if not, an error prompt pops up to inform that the gene mutation sites are too many to map. It should be noted that, in the case of reporting that mapping is not possible, some sites may be filtered out and mapped, the number of mapped sites is preferably within 50, and the sites remaining after filtering are preferably sites of interest to the user or sites relevant to disease/prognosis.
The submodule C is connected with the submodule B and the submodule H and is used for judging whether the current point is located within the gene; if so, an instruction is output to start the following submodule D; if not, a warning pops up to inform that the current point is not within the gene range, and an instruction is output to start the following submodule H; here, the space currently used for presenting the map of the gene variation detection results is referred to as the current mapping space. Generally, one map is made per gene, and the amount of information a map can show is limited, i.e., the mapping space is limited (the device of the present invention is oriented toward end users, so the mapping space generally refers to the space available in a gene testing report). The upper limit on the number of variation entries drawn can be set according to the pixel dimensions and font size of the map drawn by the device. To keep the content clear, the layout attractive and the map visually legible, only about 50 variation entries can be drawn in one map; beyond 50 entries, drawing the map serves no purpose. On the other hand, almost no patient has more than 15 variation sites on a single gene, so a limit of 50 variation sites essentially guarantees that every sample can be mapped. Variation site entries are drawn one at a time in positional order: the entry being drawn is the current point; the entry drawn most recently before the current point is the previous point; the entry to be drawn immediately after the current point is the next point; and the remaining points are all entries not yet drawn.
The submodule D is connected with the submodule C and the submodule H and used for judging whether the current space is enough for placing the current point and the residual points; if yes, performing a submodule E; if not, directly moving upwards by a specified distance to mark the gene variation information of the current point, and outputting an instruction to start the following sub-module H.
The submodule E is connected with the submodule D and the submodule H and is used for judging whether the current point is particularly close to the previous point, such that the labeling information of the two points would overlap; if so, the gene variation information of the current point is labeled at the specified distance below the position of the previous point, and an instruction is output to start the submodule H; if not, an instruction is output to start the following submodule G; here, "particularly close" and "particularly far" are determined by comparing the distance between two points with preset values. The points are "particularly close" if the distance between them is less than a preset value (e.g., 0.01) and "particularly far" if the distance is greater than a preset value (e.g., 0.1). The preset values can be set as needed.
The submodule G is connected with the submodule E and the submodule H and used for judging whether the current point is particularly far away from the previous point and particularly near to the next point, if so, the current point moves upwards by a specified distance to label the genetic variation information of the current point, and an instruction is output to start the following submodule H; if not, marking the gene variation information of the current point at the current position directly, and outputting an instruction to start the following sub-module H.
And
and the submodule H is used for judging whether the current point is the last gene mutation site or not, if so, marking is finished, the result obtained in the judging process is output as a final result, and if not, jumping to the next gene mutation site and outputting an instruction to start the submodule C.
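The length-to-proportion conversion described for the gene segment drawing module can be illustrated with a short sketch. This is not the patented implementation: the function names, the inclusive coordinate convention, the choice of one fixed-length intron between each pair of adjacent exons, and the omission of the flanking regions around exons are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates (assumed convention)


def compressed_total(exons: List[Exon], intron_len: int) -> int:
    """Denominator from the text: total exon length + intron length x number of introns."""
    n_introns = max(len(exons) - 1, 0)  # assumption: one intron between adjacent exons
    return sum(end - start + 1 for start, end in exons) + intron_len * n_introns


def exon_canvas_spans(exons: List[Exon], intron_len: int) -> List[Tuple[float, float]]:
    """Map each exon to a (start, end) span on a canvas of length 1."""
    total = compressed_total(exons, intron_len)
    spans = []
    offset = 0  # position in the compressed coordinate system
    for start, end in exons:
        length = end - start + 1
        spans.append((offset / total, (offset + length) / total))
        offset += length + intron_len  # every intron is replaced by the same fixed length
    return spans


def position_to_canvas(pos: int, exons: List[Exon], intron_len: int) -> Optional[float]:
    """Convert a genomic position inside an exon to a proportion; None if outside all exons."""
    total = compressed_total(exons, intron_len)
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return (offset + pos - start) / total
        offset += (end - start + 1) + intron_len
    return None


if __name__ == "__main__":
    demo_exons = [(100, 199), (1000, 1099), (5000, 5199)]  # made-up coordinates
    print(exon_canvas_spans(demo_exons, intron_len=50))
    print(position_to_canvas(1050, demo_exons, intron_len=50))
```

With equal-length introns, the real intron sizes no longer influence the layout, so short exons stay visible regardless of how long the surrounding introns are.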
2. A method for constructing a user-friendly chromosomal genetic variation profile, comprising:
acquiring data, namely acquiring gene variation detection data, gene information and chromosome G banding data; here, the gene variation detection data includes, for example, SNP or InDel variation information obtained by processing and aligning raw sequencing data, calling variants with a variant detection algorithm, and annotating the results. Gene information includes, for each gene, all transcripts, the chromosome, exon numbering, and start and stop positions, as provided by the RefSeq database. Chromosome G banding data refers to the following: after a chromosome is treated with fluorescent dye, horizontal stripes of different widths and brightness can be observed along its long axis under a fluorescence microscope; this stripe information is converted into a file that electronically records the absolute position and interval of each stripe on the chromosome.
Data sorting, namely matching the transcript of an input gene, extracting the exons of the transcript and all gene variation detection data within 15-30 bp, preferably 20 bp, around the exons, and sorting and outputting the gene variation detection data in a specified format; here, a transcript refers to one of the mature mRNAs encoding a protein that are formed by transcription of a gene. Each gene can have multiple transcripts, i.e., multiple transcript numbers, so the input data must provide the specific transcript number of the gene to be mapped. The specified format means that the extracted information is arranged in the form in which it is to be displayed in the figure, e.g., "exon 9, heterozygous: c.6513G>C: p.V2171V", and written to a temporary file, e.g., one named gene_mutpos.txt; all temporary files can be deleted once the algorithm has finished running.
Drawing a gene segment map, namely converting the lengths of all exons into proportions according to the gene segment information, while cumulatively adding an intron segment of equal length after each exon; for example, in Figs. 1 and 2, each color-gradient rectangle represents an exon and the blank between two exons represents an intron; the exons are arranged and drawn in ID order, and the gradient is produced by drawing successively lighter rectangles of the same hue, which is easier on the eye and improves the appearance; here, "drawing" means that the script draws the graphic into a graphics device, saves it to a file (e.g., in PNG format) once drawing is complete, and the graphic is shown on a display screen when the file is opened. The gene segment information includes the gene transcript, exon ID, chromosome ID, start position and end position. Lengths are converted into proportions for the following reason: because the introns of a gene are several times longer than its exons, drawing to true scale would leave the exons invisible to the naked eye. If only the exons and the variations within 20 bp around them are to be displayed, all intron regions can be replaced with segments of equal length. If the whole canvas is regarded as a 1 × 1 canvas, each position in an exon region can be converted into a proportional value on the canvas: actual position / (total length of all exon regions + intron length × number of introns).
Drawing a chromosome map, marking chromosome G bands with different colors, converting the lengths of all G band bands into proportion, judging whether each segment is positioned at a p arm or a q arm, drawing a map of a chromosome where genes are positioned, and marking the positions of the genes on the map of the chromosome; for example, the left part in fig. 1 and 2 represents a chromosome, and each color-gradient rectangle or semicircle represents a G band of the chromosome, which is drawn in absolute positional order; here, a map can be made for a gene, first drawing a chromosome, and then marking where the gene is located on the chromosome. And
a genetic variation information annotation comprising:
and step A, judging whether the gene has a gene variation site, if so, performing the following step B, otherwise, not labeling any gene variation information, and outputting only a chromosome map and a gene segment map as final results.
B, judging whether the current mapping space is enough to place all the gene variation site information, and if so, performing the following step C; if not, an error prompt pops up to inform that the gene mutation sites are too many to map. It should be noted that, in the case of reporting that mapping is not possible, some sites may be filtered out and mapped, the number of mapped sites is preferably within 50, and the sites remaining after filtering are preferably sites of interest to the user or sites relevant to disease/prognosis.
Step C, judging whether the current point is located within the gene; if so, performing the following step D; if not, popping up a warning to inform that the current point is not within the gene range, and performing the following step H; here, the space currently used for presenting the map of the gene variation detection results is referred to as the current mapping space. Generally, one map is drawn per gene, and the amount of information a map can show is limited, i.e., the mapping space is limited. To keep the content clear, the layout attractive and the map visually legible, only about 50 variation entries can be drawn in one map; beyond 50 entries, drawing the map serves no purpose. On the other hand, almost no patient has more than 15 variation sites on a single gene, so a limit of 50 variation sites essentially guarantees that every sample can be mapped. Variation site entries are drawn one at a time in positional order: the entry being drawn is the current point; the entry drawn most recently before the current point is the previous point; the entry to be drawn immediately after the current point is the next point; and the remaining points are all entries not yet drawn.
Step D, judging whether the current space is enough to place the current point and the residual points; if yes, performing step E; if not, directly moving upwards by a specified distance to mark the gene variation information of the current point, and carrying out the following step H.
Step E, judging whether the current point is particularly close to the previous point, such that the labeling information of the two points would overlap; if so, labeling the gene variation information of the current point at the specified distance below the position of the previous point, and performing the following step H; if not, performing the following step G; here, "particularly close" and "particularly far" are determined by comparing the distance between two points with preset values. The points are "particularly close" if the distance between them is less than a preset value (e.g., 0.01) and "particularly far" if the distance is greater than a preset value (e.g., 0.1). The preset values can be set as needed.
G, judging whether the current point is particularly far away from the previous point and particularly near to the next point, if so, moving the current point upwards by a specified distance to mark the genetic variation information of the current point, and carrying out the following step H; if not, directly marking the gene variation information of the current point at the current position, and carrying out the following step H; and
and step H, judging whether the current point is the last gene mutation site, if so, ending the marking, outputting the result obtained in the judging process as a final result, and if not, jumping to the next gene mutation site and performing the step C.
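Steps A to H can be read as a single pass over the variation sites in positional order. The sketch below is one possible reading, not the patented implementation. It assumes that positions are proportions on a 0 to 1 axis that increases downward (so "up" subtracts and "down" adds), that step E offsets from where the previous label was placed, and that the step D space test compares the room left below the current point with one label slot per outstanding point. The source does not define that test, so this criterion and all names here are hypothetical.

```python
import warnings
from typing import List, Optional, Tuple


def place_labels(
    points: List[float],
    gene_range: Tuple[float, float] = (0.0, 1.0),
    max_labels: int = 50,   # upper limit suggested in the text
    close: float = 0.01,    # "particularly close" threshold (example value from the text)
    far: float = 0.1,       # "particularly far" threshold (example value from the text)
    shift: float = 0.01,    # the "specified distance"
) -> List[Tuple[float, float]]:
    """Return (site position, label position) pairs for the sites that get labeled."""
    pts = sorted(points)
    if not pts:                                    # Step A: no variation sites
        return []
    if len(pts) > max_labels:                      # Step B: mapping space exceeded
        raise ValueError("too many gene variation sites to map")

    placed: List[Tuple[float, float]] = []
    prev_pos: Optional[float] = None
    prev_label: Optional[float] = None
    for i, pos in enumerate(pts):                  # Step H: advance to the next site
        if not (gene_range[0] <= pos <= gene_range[1]):   # Step C
            warnings.warn(f"site at {pos} is not within the gene range")
            continue
        remaining = len(pts) - i - 1
        next_pos = pts[i + 1] if remaining else None

        if gene_range[1] - pos < (remaining + 1) * shift:
            label = pos - shift                    # Step D (assumed space test): move up
        elif prev_pos is not None and pos - prev_pos < close:
            label = prev_label + shift             # Step E: stack below the previous label
        elif (prev_pos is not None and pos - prev_pos > far
              and next_pos is not None and next_pos - pos < close):
            label = pos - shift                    # Step G: far from previous, close to next
        else:
            label = pos                            # Step G, "no" branch: label in place
        placed.append((pos, label))
        prev_pos, prev_label = pos, label
    return placed


if __name__ == "__main__":
    for site, label in place_labels([0.10, 0.105, 0.50, 0.505, 0.80]):
        print(f"site {site:.3f} -> label {label:.3f}")
```

Moving a label up when the next site is very close (step G) makes room for that next label to be stacked directly below it in step E.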
Effects of the invention
According to the device and the method for constructing the user-friendly chromosome gene variation map, the specific variation condition of any gene on the whole chromosome can be displayed automatically, truly, intuitively and beautifully. Furthermore, the gene detection result can be represented as a colored image, so that the data can be more easily identified through visual detection, and the readability of the gene detection result is improved.
Drawings
Fig. 1 is a diagram showing a user-friendly chromosomal gene variation map of BRCA1 variation assay data of sample VB01562 obtained in example 1.
Fig. 2 is a diagram showing a user-friendly chromosomal gene variation map of BRCA2 variation assay data of sample VB01562 obtained in example 1.
Detailed description of the invention
The BRCA1/2 user-friendly chromosome gene variation map construction device is used for constructing the chromosome gene variation map of the variation data of a sample (sample number VB01562), and the device comprises:
and the data acquisition module is used for acquiring gene variation detection data, gene information and chromosome G band data.
The gene variation detection data comprises SNP or InDel variation information obtained by processing and aligning raw sequencing data, calling variants with a variant detection algorithm, and annotating the results; gene information includes, for each gene, all transcripts, the chromosome, exon numbering, and start and stop positions, as provided by the RefSeq database.
And the data sorting module is connected with the data acquisition module and is used for matching the transcript of the input gene, extracting the exon of the transcript and all the gene variation detection data within the range of 20bp around the exon, and sorting and outputting the gene variation detection data according to a specified format.
Here, the specified format means that the extracted information is arranged in the form in which it is to be shown in the figure, such as 'exon 9, heterozygous: c.6513G>C: p.V2171V', and written to a temporary file named gene_mutpos.txt; all temporary files can be deleted once the algorithm has finished running.
And the gene segment drawing module is connected with the data sorting module and is used for drawing a gene segment drawing, wherein the lengths of all exons are converted into proportion according to the gene segment information, and meanwhile, each exon is cumulatively added with an intron segment with the same length.
And the chromosome map drawing module is connected with the data sorting module and the gene segment map drawing module and is used for drawing a chromosome map, wherein the chromosome G bands are marked with different colors, the lengths of all G band segments are converted into proportions, each segment is judged to be located on the p arm or the q arm, a map of the chromosome on which the gene is located is drawn, and the position of the gene is marked on the chromosome map. And
the genetic variation information labeling module is connected with the data sorting module, the gene segment drawing module and the chromosome drawing module and comprises:
and the submodule A is used for judging whether the gene has a gene variation site, if so, outputting an instruction to start the following submodule B, and if not, outputting a chromosome map and a gene segment map as final results without marking any gene variation information.
The submodule B is connected with the submodule A and used for judging whether the current mapping space is enough to place all the gene variation locus information or not, and if so, outputting an instruction to start the following submodule C; if not, the genetic variation sites are informed to be too many to map.
The submodule C is connected with the submodule B and the submodule H and is used for judging whether the current point is located within the gene; if so, an instruction is output to start the following submodule D; if not, a warning pops up to inform that the current point is not within the gene range, and an instruction is output to start the following submodule H.
The submodule D is connected with the submodule C and the submodule H and used for judging whether the current space is enough for placing the current point and the residual points; if yes, outputting an instruction to start the following sub-module E; if not, directly moving upwards by a specified distance to mark the gene variation information of the current point, and outputting an instruction to start the following sub-module H.
The submodule E is connected with the submodule D and the submodule H and is used for judging whether the current point is particularly close to the previous point, such that the labeling information of the two points would overlap; if so, the gene variation information of the current point is labeled at the specified distance below the position of the previous point, and an instruction is output to start the submodule H; if not, an instruction is output to start the following submodule G.
Here, "particularly close" and "particularly far" are determined by comparing the distance between two points with preset values: the points are "particularly close" if the distance between them is less than 0.01 and "particularly far" if the distance is greater than 0.1. The specified distance is 0.01.
The submodule G is connected with the submodule E and the submodule H and used for judging whether the current point is particularly far away from the previous point and particularly near to the next point, if so, the current point moves upwards by a specified distance to label the genetic variation information of the current point, and an instruction is output to start the following submodule H; if not, marking the gene variation information of the current point at the current position directly, and outputting an instruction to start the following sub-module H. And
the submodule H is used for judging whether the current point is the last gene variation site or not, if so, the marking is finished, and the result generated in the submodule is taken as the final result to be output; if not, jumping to the next gene mutation site and outputting an instruction to start the submodule C.
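As a rough illustration of the data sorting module in this embodiment, the sketch below keeps only the variants that fall in an exon or within 20 bp around it and formats each one in the display form quoted above. The record layouts, field names and genomic coordinates are hypothetical; only the output string pattern and the 20 bp window come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exon:
    number: int
    start: int  # inclusive genomic start (hypothetical layout)
    end: int    # inclusive genomic end


@dataclass
class Variant:
    pos: int
    zygosity: str   # e.g. "heterozygous"
    cdna: str       # e.g. "c.6513G>C"
    protein: str    # e.g. "p.V2171V"


def matching_exon(pos: int, exons: List[Exon], flank: int = 20) -> Optional[Exon]:
    """Return the first exon whose range, widened by `flank` bp on each side, contains pos."""
    for exon in exons:
        if exon.start - flank <= pos <= exon.end + flank:
            return exon
    return None


def format_variants(variants: List[Variant], exons: List[Exon], flank: int = 20) -> List[str]:
    """Filter variants to exons plus flank and format them in positional order."""
    lines = []
    for v in sorted(variants, key=lambda item: item.pos):
        exon = matching_exon(v.pos, exons, flank)
        if exon is None:
            continue  # deep intronic or outside the gene: not shown on the map
        lines.append(f"exon {exon.number}, {v.zygosity}: {v.cdna}: {v.protein}")
    return lines


if __name__ == "__main__":
    exons = [Exon(9, 1000, 1200), Exon(10, 3000, 3100)]          # made-up coordinates
    variants = [
        Variant(1100, "heterozygous", "c.6513G>C", "p.V2171V"),  # label text from the description
        Variant(2000, "heterozygous", "c.0000A>G", "p.?"),       # made-up, deep intronic
    ]
    for line in format_variants(variants, exons):
        print(line)
```

In the described workflow these lines would be written to the temporary file (gene_mutpos.txt in the text) and read back by the labeling module.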
After the chromosome gene variation maps were constructed with the BRCA1/2 user-friendly chromosome gene variation map construction device, the image files VB01562_BRCA1.png (see Fig. 1) and VB01562_BRCA2.png (see Fig. 2), which visualize the BRCA1 and BRCA2 variation detection data of sample VB01562, were obtained. As can be seen from VB01562_BRCA1.png (Fig. 1), no variation was detected in the exons of the BRCA1 gene or within 20 bp around them in sample VB01562, so only the chromosome map and the gene segment map were output as the final result; as can be seen from VB01562_BRCA2.png (Fig. 2), 6 variations were detected in the exons of the BRCA2 gene and within 20 bp around them.
Industrial applicability
According to the invention, the device and the method for constructing the user-friendly chromosome genetic variation map can automatically, truly, intuitively and beautifully display the specific variation condition of any gene on the whole chromosome.

Claims (6)

1. An apparatus for constructing a user-friendly chromosomal genetic variation profile, comprising:
the data acquisition module is used for acquiring gene variation detection data, gene information and chromosome G banding data;
the data sorting module is connected with the data acquisition module and is used for matching the transcript of the input gene, extracting the exon of the transcript and all the gene variation detection data within the range of 15-30 bp around the exon and sorting and outputting the gene variation detection data according to a specified format;
the gene segment drawing module is connected with the data sorting module and is used for converting the lengths of all the exons into proportions according to the gene segment information, while cumulatively adding an intron segment of equal length after each exon;
the chromosome map drawing module is connected with the gene segment map drawing module and is used for marking the chromosome G banding with different colors, converting the lengths of all G banding zones into proportion, judging whether each zone is positioned at a p arm or a q arm, drawing a map of the chromosome where the gene is positioned and marking the position of the gene on the map of the chromosome;
and
the genetic variation information labeling module is connected with the data sorting module, the gene segment drawing module and the chromosome drawing module and comprises:
the submodule A is used for judging whether the gene has a gene variation site; if so, an instruction is output to start the submodule B; if not, no gene variation information is labeled, and only the chromosome map and the gene segment map are output as the final result;
the submodule B is connected with the submodule A and used for judging whether the current mapping space is enough to place all the gene variation locus information or not, and if so, outputting an instruction to start the following submodule C; if not, popping up an error prompt to inform that the gene mutation sites are too many to map;
the submodule C is connected with the submodule B and the submodule H and used for judging whether the current point is positioned in the gene or not, and if so, an instruction is output to start the following submodule D; if not, popping up a warning to inform that the current point is not in the gene range, and outputting an instruction to start the following sub-module H;
the submodule D is connected with the submodule C and the submodule H and used for judging whether the current space is enough for placing the current point and the residual points; if yes, performing a submodule E; if not, directly moving upwards for a specified distance to mark the gene variation information of the current point, and outputting an instruction to start a sub-module H;
the submodule E is connected with the submodule D and the submodule H and is used for judging whether the current point is particularly close to the previous point, such that the labeling information of the two points would overlap; if yes, moving downwards by the specified distance from the previous point position to mark the gene variation information of the current point, and outputting an instruction to start a sub-module H; if not, outputting an instruction to start the following sub-module G;
the submodule G is connected with the submodule E and the submodule H and used for judging whether the current point is particularly far away from the previous point and particularly near to the next point, if so, the current point moves upwards by a specified distance to label the genetic variation information of the current point, and an instruction is output to start the following submodule H; if not, directly marking the gene variation information of the current point at the current position, and outputting an instruction to start a sub-module H;
and
a submodule H for judging whether the current point is the last gene variation site, if yes, marking is finished, the result obtained in the judging process is output as the final result, if no, jumping to the next gene variation site and outputting an instruction to start the submodule C,
wherein "particularly close" means that the distance between two points is less than 0.01, and "particularly far" means that the distance between two points is greater than 0.1.
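The gene segment drawing module recited in claim 1 can be pictured with a short sketch. It is one possible reading, not the patented implementation: exons are drawn proportionally to their lengths, and every intron is drawn as a spacer of the same fixed width regardless of its real length. The function name `layout_exons`, the parameter `intron_width`, the normalised 0 to 1 drawing axis and the inclusive coordinate convention are all assumptions made for illustration.

```python
from typing import List, Tuple


def layout_exons(
    exons: List[Tuple[int, int]],
    intron_width: float = 0.02,
) -> List[Tuple[float, float]]:
    """Map exons to (x0, x1) spans on a 0..1 axis.

    `exons` holds (start, end) coordinates in drawing order; the ends are
    assumed to be inclusive. Each gap between exons gets `intron_width`.
    """
    if not exons:
        return []
    total_intron = intron_width * (len(exons) - 1)
    if total_intron >= 1.0:
        raise ValueError("intron spacers leave no room for exons")
    total_len = sum(end - start + 1 for start, end in exons)
    # Scale factor turning exon lengths into fractions of the free width.
    scale = (1.0 - total_intron) / total_len
    spans: List[Tuple[float, float]] = []
    x = 0.0
    for start, end in exons:
        width = (end - start + 1) * scale
        spans.append((x, x + width))
        x += width + intron_width  # accumulate the equal-length intron
    return spans
```

Drawing introns at a fixed width keeps short exons visible even when the real introns are orders of magnitude longer than the exons.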
2. The apparatus for constructing a user-friendly chromosomal gene variation map according to claim 1, wherein the exons of the transcript and all gene variation detection data within 20 bp around the exons are extracted, sorted and output in the specified format.
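The extraction recited for the data sorting module amounts to an interval filter, which may be sketched as follows. This is an illustrative sketch only: the function name `variants_near_exons`, the dictionary layout of a variation record and the inclusive coordinate convention are assumptions, while the 15 to 30 bp range and the 20 bp value come from claims 1 and 2.

```python
from typing import Dict, List, Tuple


def variants_near_exons(
    variants: List[Dict],
    exons: List[Tuple[int, int]],
    flank: int = 20,
) -> List[Dict]:
    """Keep variations lying in an exon or within `flank` bp around it.

    Each variation is assumed to be a dict with an integer "pos" key on the
    same chromosome as the exons; exon ends are assumed to be inclusive.
    """
    if not 15 <= flank <= 30:
        raise ValueError("flank is expected to lie between 15 and 30 bp")
    kept: List[Dict] = []
    for variant in variants:
        for number, (start, end) in enumerate(exons, start=1):
            if start - flank <= variant["pos"] <= end + flank:
                # Record which exon the variation was matched to.
                kept.append({**variant, "exon": number})
                break
    # Sort by position so that later labeling can walk left to right.
    return sorted(kept, key=lambda v: v["pos"])
```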
3. The apparatus for constructing a user-friendly chromosomal gene variation map of claim 1, wherein the data acquisition module is configured to acquire genetic variation detection data, genetic information, and chromosomal G banding data,
the genetic variation detection data comprises SNP or indel variation information obtained by processing and aligning the original sequencing data, detecting variations with a variation detection algorithm and annotating them;
the gene information includes all transcripts, chromosomes, exon numbering, start positions and end positions of each gene from the RefSeq database.
4. A method for constructing a user-friendly chromosomal genetic variation profile, comprising:
acquiring data, namely acquiring gene variation detection data, gene information and chromosome G banding data;
data sorting, namely matching the transcript of an input gene, extracting exons of the transcript and all gene variation detection data within the range of 15-30 bp around the exons, and sorting and outputting the gene variation detection data according to a specified format;
drawing a gene segment graph, converting the lengths of all exons into proportion according to gene segment information, and adding an intron segment with equal length into each exon in an accumulated manner;
drawing a chromosome map, marking chromosome G bands with different colors, converting the lengths of all G band bands into proportion, judging whether each segment is positioned at a p arm or a q arm, drawing a map of a chromosome where genes are positioned, and marking the positions of the genes on the map of the chromosome;
and
a genetic variation information annotation comprising:
step A, judging whether the gene has a gene variation site, if so, performing the following step B, otherwise, not marking any gene variation information, and only outputting a chromosome map and a gene segment map as final results;
step B, judging whether the current mapping space is enough to place all the gene variation site information, and if so, performing the following step C; if not, popping up an error prompt to inform that the gene mutation sites are too many to map;
step C, judging whether the current point is positioned in the gene, if so, performing the following step D; if not, popping up a warning to inform that the current point is not in the gene range, and carrying out the following step H;
step D, judging whether the current space is enough to place the current point and the residual points; if yes, performing step E; if not, directly moving upwards for a specified distance to mark the gene variation information of the current point, and performing the following step H;
step E, judging whether the current point is particularly close to the previous point, such that the labeling information of the two points would overlap; if yes, moving downwards by the specified distance from the position of the previous point to mark the gene variation information of the current point, and carrying out the following step H; if not, carrying out the following step G;
step G, judging whether the current point is particularly far away from the previous point and particularly near to the next point, if so, moving the current point upwards by a specified distance to mark the genetic variation information of the current point, and carrying out the following step H; if not, directly marking the gene variation information of the current point at the current position, and carrying out the following step H; and
step H, judging whether the current point is the last gene mutation site, if so, marking is finished, outputting the result obtained in the judging process as a final result, if not, jumping to the next gene mutation site and carrying out the step C,
wherein "particularly close" means that the distance between two points is less than 0.01, and "particularly far" means that the distance between two points is greater than 0.1.
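The chromosome map drawing step of claim 4 can likewise be sketched in a few lines. This is a hedged illustration, not the patented implementation: the band tuple layout (name, start, end, stain), the stain names, the colour table `STAIN_COLOURS` and the function names `layout_bands` and `locate_gene` are assumptions modelled on commonly used cytoband tables, and the arm is read from the first letter of the band name.

```python
from typing import Dict, List, Tuple

Band = Tuple[str, int, int, str]  # (name, start, end, stain), an assumed layout

# Hypothetical stain-to-colour table; any palette could be substituted.
STAIN_COLOURS: Dict[str, str] = {
    "gneg": "#ffffff",
    "gpos25": "#c0c0c0",
    "gpos50": "#808080",
    "gpos75": "#404040",
    "gpos100": "#000000",
    "acen": "#cc3333",
}


def layout_bands(bands: List[Band]) -> List[Dict]:
    """Convert bands, ordered from the p end to the q end, into 0..1 spans."""
    if not bands:
        raise ValueError("at least one band is required")
    origin = bands[0][1]
    total = bands[-1][2] - origin
    drawn = []
    for name, start, end, stain in bands:
        drawn.append({
            "name": name,
            "arm": "p" if name.startswith("p") else "q",
            "x0": (start - origin) / total,
            "x1": (end - origin) / total,
            "colour": STAIN_COLOURS.get(stain, "#cccccc"),
        })
    return drawn


def locate_gene(bands: List[Band], gene_start: int, gene_end: int) -> Tuple[float, float]:
    """Return the gene's span on the same 0..1 axis as the bands."""
    if not bands:
        raise ValueError("at least one band is required")
    origin = bands[0][1]
    total = bands[-1][2] - origin
    return ((gene_start - origin) / total, (gene_end - origin) / total)
```

Sharing one normalised axis between the bands and the gene marker is what allows the gene position to be marked directly on the drawn chromosome.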
5. The method for constructing a user-friendly chromosomal gene variation map according to claim 4, wherein the exons of the transcript and all gene variation detection data within 20 bp around the exons are extracted, sorted and output in the specified format.
6. The method for constructing a user-friendly chromosomal gene variation map of claim 4, wherein the data acquisition module is configured to acquire genetic variation detection data, genetic information, and chromosomal G banding data,
the genetic variation detection data comprises SNP or indel variation information obtained by processing and aligning the original sequencing data, detecting variations with a variation detection algorithm and annotating them;
the gene information includes all transcripts, chromosomes, exon numbering, start positions and end positions of each gene from the RefSeq database.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711423213.5A CN110021363B (en) 2017-12-25 2017-12-25 Device and method for constructing user-friendly chromosome gene variation map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711423213.5A CN110021363B (en) 2017-12-25 2017-12-25 Device and method for constructing user-friendly chromosome gene variation map

Publications (2)

Publication Number Publication Date
CN110021363A CN110021363A (en) 2019-07-16
CN110021363B CN110021363B (en) 2021-01-15

Family

ID=67187019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711423213.5A Active CN110021363B (en) 2017-12-25 2017-12-25 Device and method for constructing user-friendly chromosome gene variation map

Country Status (1)

Country Link
CN (1) CN110021363B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955630A (en) * 2014-03-26 2014-07-30 Tian Geng Method for preparing reference database and performing target area sequence alignment on to-be-tested free nucleic acid samples
CN106520940A (en) * 2016-11-04 2017-03-22 BGI Shenzhen Chromosomal aneuploid and copy number variation detecting method and application thereof
CN107133494A (en) * 2017-04-21 2017-09-05 Tianjin University A novel visualization method for analyzing copy number variation in biological genomes
CN107194208A (en) * 2017-04-25 2017-09-22 Beijing Rongzhilian Technology Co., Ltd. Gene analysis and annotation method and apparatus

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004090100A2 (en) * 2003-04-04 2004-10-21 Agilent Technologies, Inc. Visualizing expression data on chromosomal graphic schemes
US9587278B2 (en) * 2010-01-08 2017-03-07 Oxford Gene Technology (Operations) Ltd. Combined CGH and allele specific hybridisation method
US8725422B2 (en) * 2010-10-13 2014-05-13 Complete Genomics, Inc. Methods for estimating genome-wide copy number variations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"INTEGRATE-Vis: a tool for comprehensive gene fusion visualization"; Jin Zhang et al.; Scientific Reports; 20171219; pp. 1-4 *
"Fractal Graphics and Gene Sequence Visualization"; Su Shan; China Master's Theses Full-text Database, Basic Sciences; 20160815; pp. A006-286 *

Also Published As

Publication number Publication date
CN110021363A (en) 2019-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201201

Address after: 322000 3rd floor, building 9, standard factory building, No. 10, Gaoxin Road, chuojiang street, Yiwu City, Jinhua City, Zhejiang Province

Applicant after: ANNOROAD (YIWU) MEDICAL INSPECTION CO.,LTD.

Applicant after: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 100176 Beijing City, Daxing District branch of Beijing economic and Technological Development Zone Street 88 Hospital No. 8 Building 2 unit 701 room

Applicant before: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

GR01 Patent grant
TA01 Transfer of patent application right

Effective date of registration: 20210104

Address after: 322000 3rd floor, building 9, standard workshop, No.10 Gaoxin Road, Houjiang street, Yiwu City, Jinhua City, Zhejiang Province

Applicant after: ANNOROAD (YIWU) MEDICAL INSPECTION CO.,LTD.

Applicant after: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: Room 701, unit 2, building 8, yard 88, Kechuang 6th Street, Daxing District, Beijing 100176

Applicant before: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20240625

Address after: Room 101 and 201, Unit 2, Building 8, No. 88 Kechuang 6th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing, 100176

Patentee after: BEIJING ANNOROAD MEDICAL LABORATORY Co.,Ltd.

Country or region after: China

Patentee after: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 322000 3rd floor, building 9, standard workshop, No.10 Gaoxin Road, Houjiang street, Yiwu City, Jinhua City, Zhejiang Province

Patentee before: ANNOROAD (YIWU) MEDICAL INSPECTION CO.,LTD.

Country or region before: China

Patentee before: ANNOROAD GENE TECHNOLOGY (BEIJING) Co.,Ltd.