CN117746980A - Automatic rapid typing method, device, equipment and medium for influenza virus



Publication number
CN117746980A
Authority
CN
China
Legal status
Pending
Application number
CN202311747772.7A
Other languages
Chinese (zh)
Inventor
梁家豪
万锈琳
郑焱
谢龙旭
Current Assignee
Changsha Kaipu Medical Laboratory Co ltd
Hybribio Ltd
Guangzhou Hybribio Medical Laboratory Co ltd
Original Assignee
Changsha Kaipu Medical Laboratory Co ltd
Hybribio Ltd
Guangzhou Hybribio Medical Laboratory Co ltd
Application filed by Changsha Kaipu Medical Laboratory Co ltd, Hybribio Ltd and Guangzhou Hybribio Medical Laboratory Co ltd
Priority to CN202311747772.7A
Publication of CN117746980A

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of biology and discloses an automated rapid typing method, device, equipment and medium for influenza viruses. The target high-throughput sequencing data are aligned against an influenza virus reference genome sequence database, so that the types of the influenza viruses to be analyzed can be preliminarily determined; genome assembly is then performed on the target high-throughput sequencing data of the typed viruses, using the influenza virus reference genome sequence database as the reference, so that their target typing results can be further determined. With this method, the entire analysis workflow runs automatically starting from the initial high-throughput sequencing data alone, the specific type of influenza virus can be confirmed rapidly, accurately and automatically, and the need for rapid typing diagnosis is met.

Description

Automatic rapid typing method, device, equipment and medium for influenza virus
Technical Field
The invention relates to the technical field of biology, in particular to an automatic rapid typing method, device, equipment and medium for influenza viruses.
Background
Influenza, an acute respiratory disease, is mainly caused by infection with influenza A or B virus (influenza virus, commonly shortened to "flu virus"). Patients show symptoms such as chills, general aches, headache, fever, nasal congestion, watery nasal discharge, sore throat and cough, and severe cases can develop secondary pneumonia. Each influenza A virus carries two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA). There are 18 known hemagglutinin subtypes, each designated by the letter "H" followed by a number, and 11 known neuraminidase subtypes, each designated by the letter "N" followed by a number; influenza A strains are named by their combination of hemagglutinin and neuraminidase subtypes (for example, H3N2). Influenza B viruses mainly comprise the Victoria and Yamagata lineages. The influenza A virus genome consists of 8 segments of single-stranded, negative-sense RNA.
Influenza viruses mutate frequently and evolve rapidly, so their resistance to antiviral drugs keeps increasing and the protection offered by vaccines keeps weakening. Rapid and accurate typing and source-tracing (traceability) analysis of the infecting pathogen is therefore important for preventing influenza outbreaks and treating the disease. Current differential diagnosis of influenza virus relies mainly on virus isolation and culture, antigen detection, RT-PCR amplification and similar methods; these methods are time-consuming and labor-intensive and cannot meet the need for rapid typing diagnosis.
Disclosure of Invention
In view of the above, the invention provides an automated rapid typing method, device, equipment and medium for influenza virus, to solve the problem that existing virus analysis methods are time-consuming and labor-intensive and cannot achieve rapid typing diagnosis.
In a first aspect, the present invention provides an automated rapid typing method for influenza virus, the method comprising:
acquiring an influenza virus reference genome sequence database and initial high-throughput sequencing data of each influenza virus to be analyzed; performing sequencing quality evaluation and processing on each set of initial high-throughput sequencing data to obtain a plurality of sets of target high-throughput sequencing data; determining the types of the plurality of influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database; performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, using the influenza virus reference genome sequence database as the reference, to obtain an assembled sequence; and, when the assembled sequence meets a preset condition, performing virus analysis using the influenza virus reference genome sequence database and the assembled sequence to obtain target typing results for the typed influenza viruses to be analyzed.
According to the automated rapid influenza virus typing method provided by the invention, evaluating and processing the sequencing quality of the initial high-throughput sequencing data of each influenza virus to be analyzed improves the quality of the high-throughput sequencing data. Aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database allows the types of the influenza viruses to be analyzed to be determined preliminarily, and genome assembly of the target high-throughput sequencing data of the typed viruses, using the reference database as the reference, allows their target typing results to be determined further. The entire analysis workflow can therefore be completed automatically starting from the initial high-throughput sequencing data alone, and the specific type of influenza virus can be confirmed rapidly, accurately and automatically, meeting the need for rapid typing diagnosis. This in turn strengthens the monitoring of influenza virus variation, supports rapid and accurate prediction, provides data for influenza prevention and treatment, and offers guidance to clinicians and to influenza prevention and control.
In an alternative embodiment, performing sequencing quality evaluation and processing on each set of initial high-throughput sequencing data to obtain a plurality of sets of target high-throughput sequencing data includes:
performing sequencing quality assessment on each set of initial high-throughput sequencing data to obtain a quality assessment result; and processing each set of initial high-throughput sequencing data based on the quality assessment result to obtain the target high-throughput sequencing data.
According to the invention, the quality of the high-throughput sequencing data can be improved by performing quality evaluation and processing on the initial high-throughput sequencing data.
In an alternative embodiment, determining the types of the plurality of influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database comprises:
aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database to obtain a first alignment file; and determining the types of the influenza viruses to be analyzed based on the first alignment file.
By aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database, the invention can preliminarily determine the types of the influenza viruses to be analyzed.
In an alternative embodiment, after aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database to obtain the first alignment file, the method further comprises:
calculating the average sequencing depth, coverage and abundance based on the first alignment file; and determining the types of the influenza viruses to be analyzed based on the average sequencing depth, coverage and abundance.
By confirming the type of each influenza virus to be analyzed from its average sequencing depth, coverage and abundance, the invention makes influenza virus typing more accurate and reliable.
In an alternative embodiment, performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, based on the influenza virus reference genome sequence database, to obtain an assembled sequence comprises:
obtaining, from the influenza virus reference genome sequence database, a target reference genome sequence for each typed influenza virus to be analyzed; aligning the target high-throughput sequencing data of the typed influenza viruses to be analyzed against the target reference genome sequence to obtain a second alignment file; and performing genome assembly based on the second alignment file and the target reference genome sequence to obtain the assembled sequence.
In an alternative embodiment, when the assembled sequence meets the preset condition, performing virus analysis using the influenza virus reference genome sequence database and the assembled sequence to obtain the target typing results of the influenza viruses to be analyzed comprises:
when the assembled sequence meets the preset condition, merging the influenza virus reference genome sequence database with the assembled sequence to obtain a target sequence set; performing multiple sequence alignment on the target sequence set to obtain a multiple sequence alignment result; and performing virus source-tracing analysis using the multiple sequence alignment result to obtain the target typing results of the typed influenza viruses to be analyzed.
By performing source-tracing analysis on a multiple sequence alignment, the invention can determine the target typing results of the influenza viruses to be analyzed more accurately and reliably.
In an alternative embodiment, performing virus source-tracing analysis using the multiple sequence alignment result to obtain the target typing results of the typed influenza viruses to be analyzed comprises:
constructing a phylogenetic tree based on the multiple sequence alignment result; verifying the target typing result using the phylogenetic tree; and, when the verification passes, outputting the target typing results of the typed influenza viruses to be analyzed.
By constructing a phylogenetic tree from the multiple sequence alignment result, the invention can verify the typing result, further improving the accuracy of the target typing result.
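The text does not name a tree-building method or say how the tree "verifies" a typing result. As a minimal sketch under those gaps, the code below swaps in UPGMA (a simple distance-based clustering method chosen here for brevity; maximum-likelihood or neighbor-joining trees are more usual in practice) over a precomputed pairwise distance matrix, and treats verification as checking that the assembled query's sister group contains only references of the assigned type. The functions `upgma`, `sister_leaves` and `verify_typing`, the labels, and the verification rule are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf label, or a 2-tuple of subtrees


def upgma(labels: List[str], matrix: List[List[float]]) -> Tree:
    """Build a rooted UPGMA tree (nested 2-tuples of labels) from a distance matrix."""
    # cluster id -> (subtree, number of leaves in it)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (lab, 1) for i, lab in enumerate(labels)}
    # pairwise cluster distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # size-weighted average linkage, which is what defines UPGMA
            dist[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sister_leaves(tree: Tree, query: str) -> List[str]:
    """Leaves of the sister group of `query`, i.e. its closest relatives in the tree."""
    if isinstance(tree, str):
        return []
    left, right = tree
    if left == query:
        return leaves(right)
    if right == query:
        return leaves(left)
    child = left if query in leaves(left) else right
    return sister_leaves(child, query)


def verify_typing(tree: Tree, query: str, assigned_type: str,
                  type_of: Dict[str, str]) -> bool:
    """Hypothetical rule: every sister reference must carry the assigned type."""
    sisters = sister_leaves(tree, query)
    return bool(sisters) and all(type_of.get(s) == assigned_type for s in sisters)
```

For example, with references `H1N1_ref`, `H3N2_ref`, `B_Vic_ref` and an assembled `query` that is closest to `H3N2_ref`, the query lands in a clade with `H3N2_ref`, so an H3N2 call verifies and an H1N1 call does not.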
In a second aspect, the present invention provides an automated rapid typing apparatus for influenza virus, the apparatus comprising:
The acquisition module is used for acquiring an influenza virus reference genome sequence database and initial high-throughput sequencing data of each influenza virus to be analyzed; the evaluation and processing module is used for evaluating and processing the sequencing quality of each set of initial high-throughput sequencing data to obtain a plurality of sets of target high-throughput sequencing data; the determining module is used for determining the types of the influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database; the assembly module is used for performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, based on the influenza virus reference genome sequence database, to obtain an assembled sequence; and the analysis module is used for performing virus analysis using the influenza virus reference genome sequence database and the assembled sequence when the assembled sequence meets the preset condition, to obtain the target typing results of the typed influenza viruses to be analyzed.
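One way to picture how the five modules hand data to each other is a small orchestrator in which each module is an injected callable. This is a sketch under stated assumptions, not the patented device: the class `TypingPipeline`, its field names, and the choice to report `None` when the preset condition fails (the text does not say what happens in that case) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class TypingPipeline:
    """Hypothetical wiring of the five modules; each module is a plain callable."""
    acquire: Callable[[], Tuple[Any, Dict[str, Any]]]  # -> (reference_db, raw data per sample)
    evaluate_and_process: Callable[[Any], Any]          # raw reads -> target (cleaned) reads
    determine_type: Callable[[Any, Any], str]           # (reads, reference_db) -> type
    assemble: Callable[[Any, Any, str], str]            # (reads, reference_db, type) -> assembled sequence
    meets_condition: Callable[[str, Any, str], bool]    # preset-condition check
    analyse: Callable[[str, Any, str], str]             # -> target typing result

    def run(self) -> Dict[str, Optional[str]]:
        reference_db, raw_by_sample = self.acquire()
        results: Dict[str, Optional[str]] = {}
        for sample, raw in raw_by_sample.items():
            reads = self.evaluate_and_process(raw)
            vtype = self.determine_type(reads, reference_db)
            assembly = self.assemble(reads, reference_db, vtype)
            if self.meets_condition(assembly, reference_db, vtype):
                results[sample] = self.analyse(assembly, reference_db, vtype)
            else:
                # Assumption: the text does not say what happens on failure; report no result.
                results[sample] = None
        return results
```

Injecting callables keeps each module independently replaceable and testable, which loosely mirrors the module split described for the apparatus.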
In a third aspect, the present invention provides a computer device comprising a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the automated rapid influenza virus typing method according to the first aspect or any of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions for causing a computer to perform the influenza virus automated rapid typing method of the first aspect or any of its corresponding embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of an automated rapid typing method for influenza viruses according to an embodiment of the present invention;
FIG. 2 is a flow diagram of another automated rapid typing method of influenza viruses according to an embodiment of the present invention;
FIG. 3 is a flow diagram of yet another automated rapid typing method for influenza viruses according to an embodiment of the present invention;
FIG. 4 is a flow chart of yet another automated rapid typing method for influenza viruses according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a phylogenetic tree according to an embodiment of the present invention;
FIG. 6 is a block diagram of an automated rapid typing apparatus for influenza viruses according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides an automated rapid typing method for influenza viruses that completes the entire analysis workflow automatically from the initial high-throughput sequencing data alone and confirms the specific type of influenza virus rapidly, accurately and automatically, meeting the need for rapid typing diagnosis.
According to an embodiment of the present invention, an embodiment of an automated rapid typing method for influenza virus is provided. It should be noted that the steps shown in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In this embodiment, an automated rapid influenza virus typing method is provided, and fig. 1 is a flowchart of the automated rapid influenza virus typing method according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
Step S101, obtaining an influenza virus reference genome sequence database and the initial high-throughput sequencing data of each influenza virus to be analyzed.
Specifically, the influenza virus reference genome sequence database can be obtained by screening and downloading sequences from public databases. It may include existing complete-genome reference sequences of influenza A virus covering hemagglutinin subtypes H1-H18 and neuraminidase subtypes N1-N11, as well as genome reference sequences of the influenza B Victoria and Yamagata lineages.
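Because the database is meant to span 18 HA subtypes, 11 NA subtypes and two influenza B lineages (31 categories), a simple completeness check after downloading can catch gaps. The sketch below assumes a hypothetical FASTA header convention, with the influenza A subtype in parentheses such as `A/Hong Kong/1/1968(H3N2)` and the B lineage named in the header text; real public databases format headers differently, so the regular expression and the functions `labels_from_header` and `missing_categories` are illustrative only.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical header convention, e.g. "A/Hong Kong/1/1968(H3N2)"; real databases differ.
SUBTYPE_RE = re.compile(r"\((H\d{1,2})(N\d{1,2})\)")


def expected_categories() -> Set[str]:
    """HA subtypes H1-H18, NA subtypes N1-N11 and the two influenza B lineages."""
    ha = {f"H{i}" for i in range(1, 19)}
    na = {f"N{i}" for i in range(1, 12)}
    return ha | na | {"B/Victoria", "B/Yamagata"}


def labels_from_header(header: str) -> Tuple[str, ...]:
    """Extract ("H3", "N2") from an influenza A header, or a B lineage label."""
    m = SUBTYPE_RE.search(header)
    if m:
        return m.group(1), m.group(2)
    for lineage in ("Victoria", "Yamagata"):
        if lineage in header:
            return (f"B/{lineage}",)
    return ()


def missing_categories(headers: Iterable[str]) -> List[str]:
    """Categories the database is expected to cover but no header provides."""
    found = {lab for h in headers for lab in labels_from_header(h)}
    return sorted(expected_categories() - found)
```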
Further, initial high-throughput sequencing data can be obtained by high-throughput sequencing of each influenza virus to be analyzed.
High-throughput sequencing is also called next-generation sequencing (NGS) or massively parallel sequencing (MPS). Unlike conventional Sanger (dideoxy) sequencing, it sequences a large number of nucleic acid molecules in parallel in a single run, and a single sequencing reaction typically yields no less than 100 Mb of sequencing data.
Step S102, carrying out sequencing quality evaluation and processing on each initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data.
Specifically, sequencing quality evaluation and processing of each set of initial high-throughput sequencing data removes adapter sequences and low-quality reads, yielding the final target high-throughput sequencing data and supporting higher typing accuracy in the subsequent steps.
Step S103, determining the types of the influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database.
Specifically, the target high-throughput sequencing data are aligned against the influenza virus reference genome sequence database as the reference, from which the types of the influenza viruses to be analyzed can be obtained.
Step S104, performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, based on the influenza virus reference genome sequence database, to obtain an assembled sequence.
Specifically, genome assembly is performed on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, using the influenza virus reference genome sequence database as the reference, which provides the basis for the subsequent virus analysis.
Step S105, when the assembled sequence meets the preset condition, performing virus analysis using the influenza virus reference genome sequence database and the assembled sequence to obtain the target typing results of the typed influenza viruses to be analyzed.
Specifically, whether the obtained assembled sequence meets the preset condition can be judged using the influenza virus reference genome sequence database as the reference.
Further, when the assembled sequence meets the preset condition, it is merged with the influenza virus reference genome sequence database, and virus analysis of the typed influenza viruses to be analyzed continues on the merged sequences, yielding their target typing results.
According to the automated rapid influenza virus typing method provided by this embodiment, evaluating and processing the sequencing quality of the initial high-throughput sequencing data of each influenza virus to be analyzed improves data quality. Aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database preliminarily determines the types of the influenza viruses to be analyzed, and reference-guided genome assembly of the target high-throughput sequencing data of the typed viruses further determines their target typing results. The entire analysis workflow can therefore be completed automatically from the initial high-throughput sequencing data alone, confirming the specific influenza virus type rapidly, accurately and automatically and meeting the need for rapid typing diagnosis. This also strengthens the monitoring of influenza virus variation, supports rapid and accurate prediction, and provides data for influenza prevention, treatment and control, as well as guidance for clinicians.
In this embodiment, an automated rapid influenza virus typing method is provided, and fig. 2 is a flowchart of the automated rapid influenza virus typing method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
Step S201, obtaining an influenza virus reference genome sequence database and the initial high-throughput sequencing data of each influenza virus to be analyzed. For details, refer to step S101 in the embodiment shown in FIG. 1; they are not repeated here.
Step S202, carrying out sequencing quality evaluation and processing on each initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data.
Specifically, the step S202 includes:
Step S2021, performing sequencing quality assessment on each set of initial high-throughput sequencing data to obtain a quality assessment result.
Specifically, sequencing quality assessment may be performed on each set of initial high-throughput sequencing data as needed, for example by assessing whether the sequencing quality value (Phred score) of the base at each position of the initial high-throughput sequencing data is greater than 20.
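The Phred quality Q of a base relates to its estimated error probability p by Q = -10 log10(p), so Q20 corresponds to a 1% error rate and Q30 to 0.1%. As a minimal sketch, assuming FASTQ quality strings in the common Phred+33 encoding, the per-base scores and the Q20/Q30 base fractions used in this step could be computed as below. The helper names are hypothetical; the sketch counts a base toward Q20 when its score is at least 20, which is the usual convention, while the text phrases the per-base criterion as "greater than 20".

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def fraction_at_least(qualities: Iterable[str], threshold: int, offset: int = 33) -> float:
    """Fraction of all bases whose Phred score is >= threshold (Q20/Q30 convention)."""
    total = passing = 0
    for q in qualities:
        scores = phred_scores(q, offset)
        total += len(scores)
        passing += sum(1 for s in scores if s >= threshold)
    return passing / total if total else 0.0
```

For instance, the quality string `II5+` decodes to scores 40, 40, 20 and 10, so three of its four bases count toward Q20 and two toward Q30.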
Step S2022, processing each set of initial high-throughput sequencing data based on the quality assessment result to obtain the target high-throughput sequencing data.
Specifically, according to the quality assessment results, adapter sequences and low-quality sequencing data are removed from the initial high-throughput sequencing data, so that the sequencing quality value of the base at each position is greater than 20 and the percentages of Q20 and Q30 bases are both greater than 90%, for subsequent analysis.
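A simplified version of this cleaning step, assuming Phred+33 qualities, could cut each read at an exact adapter match, trim low-quality bases from the 3' end, keep only reads whose remaining bases all reach the threshold, and then check the dataset-level Q20 and Q30 fractions against 90%. The adapter string used in the example (`AGATCGGAAGAGC`, the common Illumina adapter prefix), the minimum length, and all function names are assumptions; in practice dedicated trimmers such as fastp or Trimmomatic also handle partial and mismatched adapter occurrences.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def _q(ch: str, offset: int = 33) -> int:
    """Phred score of one quality character (Phred+33 assumed)."""
    return ord(ch) - offset


def trim_read(seq: str, qual: str, adapter: str, min_q: int = 20) -> Tuple[str, str]:
    # Cut at the first exact adapter match (real trimmers also handle partial matches).
    idx = seq.find(adapter)
    if idx != -1:
        seq, qual = seq[:idx], qual[:idx]
    # Trim low-quality bases from the 3' end.
    end = len(qual)
    while end > 0 and _q(qual[end - 1]) < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_reads(reads: List[Read], adapter: str, min_len: int = 30, min_q: int = 20) -> List[Read]:
    kept = []
    for name, seq, qual in reads:
        s, q = trim_read(seq, qual, adapter, min_q)
        # Keep reads that are long enough and have every base at or above min_q.
        if len(s) >= min_len and all(_q(c) >= min_q for c in q):
            kept.append((name, s, q))
    return kept


def passes_dataset_qc(reads: List[Read], min_frac: float = 0.9) -> bool:
    """Dataset-level check: Q20 and Q30 base fractions both above min_frac."""
    scores = [_q(c) for _, _, q in reads for c in q]
    if not scores:
        return False
    q20 = sum(s >= 20 for s in scores) / len(scores)
    q30 = sum(s >= 30 for s in scores) / len(scores)
    return q20 > min_frac and q30 > min_frac
```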
Step S203, determining the types of the influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database.
Specifically, the step S203 includes:
Step S2031, aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database to obtain a first alignment file.
Specifically, the target high-throughput sequencing data are aligned against the influenza virus reference genome sequence database, producing the corresponding first alignment file.
Step S2032, determining the types of the influenza viruses to be analyzed based on the first alignment file.
Specifically, the types of the influenza viruses to be analyzed can be determined from the first alignment file.
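The text does not say how a type is read off the first alignment file. One plausible reading, sketched here under that assumption, is to count reads per reference type and keep the types supported by a minimum share of mapped reads. The input is a list of `(read_id, reference_name)` hits such as could be extracted from a SAM file (where an unmapped read has `*` as its reference name, represented here as `None`); only the first hit per read is counted. The function names and the default `min_fraction` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def type_read_counts(hits: Iterable[Tuple[str, Optional[str]]],
                     ref_to_type: Dict[str, str]) -> Counter:
    """Count reads per type, using only the first hit reported for each read."""
    seen = set()
    counts: Counter = Counter()
    for read_id, ref in hits:
        if ref is None or read_id in seen:
            continue  # unmapped read, or a secondary hit of a read already counted
        seen.add(read_id)
        counts[ref_to_type[ref]] += 1
    return counts


def dominant_types(counts: Counter, min_fraction: float = 0.05) -> List[str]:
    """Types supported by at least min_fraction of the mapped reads, most abundant first."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [t for t, n in counts.most_common() if n / total >= min_fraction]
```

Returning a list rather than a single type leaves room for mixed infections, where more than one type may pass the threshold.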
In some alternative embodiments, as shown in fig. 3, after the step S2031, the step S203 further includes:
Step S2033, calculating the average sequencing depth, coverage and abundance based on the first alignment file.
Specifically, after the target high-throughput sequencing data are aligned against the influenza virus reference genome sequence database, the average sequencing depth, coverage and abundance of each set of target high-throughput sequencing data can be calculated from the resulting first alignment file.
The calculation may be performed with open-source software; this embodiment does not restrict the choice, and suitable software can be selected according to actual requirements.
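Since the text leaves the software open, here is a minimal sketch of the three metrics for one reference sequence, assuming per-position depths that include zero-depth positions (the kind of table `samtools depth -a` prints) and read counts taken from the alignment. Mean depth is total depth divided by reference length, coverage is the fraction of positions with at least `min_depth` reads, and abundance is taken here as the share of all reads mapped to this reference; other abundance definitions (for example length-normalized) exist, so this choice and the function name `reference_metrics` are assumptions.

```python
from typing import Dict, Sequence


def reference_metrics(depths: Sequence[int], mapped_reads: int, total_reads: int,
                      min_depth: int = 1) -> Dict[str, float]:
    """Mean depth, breadth of coverage and read abundance for one reference sequence.

    depths: per-position depth over the whole reference, zero-depth positions included.
    """
    length = len(depths)
    if length == 0:
        return {"mean_depth": 0.0, "coverage": 0.0, "abundance": 0.0}
    return {
        "mean_depth": sum(depths) / length,
        "coverage": sum(1 for d in depths if d >= min_depth) / length,
        "abundance": mapped_reads / total_reads if total_reads else 0.0,
    }
```

For a 10-base reference with depths `[0, 2, 4, 6, 0, 0, 2, 2, 2, 2]` and 30 of 120 reads mapped, this gives a mean depth of 2.0, coverage of 0.7 and abundance of 0.25.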
Step S2034, determining the types of the influenza viruses to be analyzed based on the average sequencing depth, coverage and abundance.
Specifically, combining the average sequencing depth, coverage and abundance allows the types of the influenza viruses to be analyzed to be determined more accurately.
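The text does not give cut-offs for combining the three metrics. A simple rule, sketched below, is to accept a type only when all three metrics clear minimum thresholds and to rank accepted types by abundance. The threshold defaults (mean depth 10, coverage 0.8, abundance 0.01) and the function name `call_types` are hypothetical placeholders, not values from the source, and would need to be tuned and validated for a real assay.

```python
from typing import Dict, List


def call_types(metrics_by_type: Dict[str, Dict[str, float]],
               min_depth: float = 10.0, min_coverage: float = 0.8,
               min_abundance: float = 0.01) -> List[str]:
    """Types whose depth, coverage and abundance all pass (hypothetical) thresholds."""
    passing = [
        (vtype, m) for vtype, m in metrics_by_type.items()
        if m["mean_depth"] >= min_depth
        and m["coverage"] >= min_coverage
        and m["abundance"] >= min_abundance
    ]
    # Most abundant accepted type first.
    passing.sort(key=lambda item: item[1]["abundance"], reverse=True)
    return [vtype for vtype, _ in passing]
```

Requiring coverage as well as depth guards against a type being called from reads that pile up on only a short conserved stretch of the reference.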
Step S204, performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, based on the influenza virus reference genome sequence database, to obtain an assembled sequence. For details, refer to step S104 in the embodiment shown in FIG. 1; they are not repeated here.
Step S205, when the assembled sequence meets the preset condition, performing virus analysis using the influenza virus reference genome sequence database and the assembled sequence to obtain the target typing results of the typed influenza viruses to be analyzed. For details, refer to step S105 in the embodiment shown in FIG. 1; they are not repeated here.
According to the automated rapid influenza virus typing method provided by this embodiment, evaluating and processing the sequencing quality of the initial high-throughput sequencing data of each influenza virus to be analyzed improves data quality. Aligning the target high-throughput sequencing data against the influenza virus reference genome sequence database preliminarily determines the types of the influenza viruses to be analyzed, and combining the average sequencing depth, coverage and abundance then confirms those types, making typing more accurate and reliable. Reference-guided genome assembly of the target high-throughput sequencing data of the typed viruses further determines their target typing results. The entire analysis workflow can therefore be completed automatically from the initial high-throughput sequencing data alone, confirming the specific influenza virus type rapidly, accurately and automatically and meeting the need for rapid typing diagnosis. This also strengthens the monitoring of influenza virus variation, supports rapid and accurate prediction, and provides data for influenza prevention, treatment and control, as well as guidance for clinicians.
In this embodiment, an automatic rapid typing method for influenza virus is provided, and fig. 4 is a flowchart of the automatic rapid typing method for influenza virus according to an embodiment of the present invention, as shown in fig. 4, where the flowchart includes the following steps:
Step S401, obtaining an influenza virus reference genome sequence database and the initial high-throughput sequencing data of each influenza virus to be analyzed. For details, refer to step S101 in the embodiment shown in FIG. 1; they are not repeated here.
Step S402, performing sequencing quality evaluation and processing on each set of initial high-throughput sequencing data to obtain a plurality of sets of target high-throughput sequencing data. For details, refer to step S202 in the embodiment shown in FIG. 2; they are not repeated here.
Step S403, determining the types of the influenza viruses to be analyzed based on the target high-throughput sequencing data and the influenza virus reference genome sequence database. For details, refer to step S203 in the embodiment shown in FIG. 3; they are not repeated here.
Step S404, performing genome assembly on the target high-throughput sequencing data of the typed influenza viruses to be analyzed, based on the influenza virus reference genome sequence database, to obtain an assembled sequence.
Specifically, the step S404 includes:
Step S4041, obtaining the target reference genome sequences of the typed influenza viruses to be analyzed from the influenza virus reference genome sequence database.
Specifically, the genome sequences in the influenza virus reference genome sequence database that correspond to the determined type of each influenza virus to be analyzed, namely the target reference genome sequences, can be identified.
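How one reference is chosen when several database entries share the determined type is not specified. One reasonable heuristic, sketched here as an assumption, is to pick the reference of that type that attracted the most reads in the first alignment, since it is likely the closest available relative. The function `pick_target_reference` and its inputs are hypothetical.

```python
from typing import Dict, Optional


def pick_target_reference(read_counts_by_ref: Dict[str, int],
                          ref_to_type: Dict[str, str],
                          determined_type: str) -> Optional[str]:
    """Among references of the determined type, return the one with the most reads."""
    candidates = {ref: n for ref, n in read_counts_by_ref.items()
                  if ref_to_type.get(ref) == determined_type}
    if not candidates:
        return None  # no reference of this type received reads
    return max(candidates, key=candidates.get)
```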
Step S4042, aligning the target high-throughput sequencing data of the typed influenza viruses to be analyzed against the target reference genome sequence to obtain a second alignment file.
Specifically, the target high-throughput sequencing data of the typed influenza viruses to be analyzed are aligned against the target reference genome sequence, producing the corresponding second alignment file.
Step S4043, performing genome assembly based on the second alignment file and the target reference genome sequence to obtain an assembled sequence.
Specifically, genome assembly is performed from the second alignment file and the target reference genome sequence, yielding the corresponding assembled sequence.
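Reference-guided assembly from an alignment is often done by calling a consensus over the read pileup, for example with tools such as `bcftools consensus`. As a stripped-down illustration of that idea, and not the method claimed in the text, the sketch below takes per-position base counts (assumed to be already extracted from the second alignment file), calls the base that holds a strict majority at positions with enough depth, and writes `N` elsewhere. Insertions and deletions are ignored, and `consensus_from_pileup` and `min_depth` are hypothetical.

```python
from typing import Dict, List


def consensus_from_pileup(reference: str, pileup: List[Dict[str, int]],
                          min_depth: int = 3) -> str:
    """Reference-guided consensus: strict-majority base per position, 'N' otherwise.

    The reference only fixes the coordinate frame; pileup[i] maps bases to read
    counts at reference position i. Indels are ignored in this sketch.
    """
    out = []
    for i in range(len(reference)):
        counts = pileup[i] if i < len(pileup) else {}
        depth = sum(counts.values())
        if depth < min_depth:
            out.append("N")  # too few reads to call this position
            continue
        base, n = max(counts.items(), key=lambda kv: kv[1])
        out.append(base if n * 2 > depth else "N")  # require a strict majority
    return "".join(out)
```

With reference `ACGT` and pileup `[{"A": 5}, {"C": 1, "T": 4}, {"G": 1}, {"A": 2, "T": 2}]`, the first two positions are called as `A` and `T`, the third is too shallow and the fourth is a tie, giving `ATNN`.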
And step S405, when the assembly sequence meets the preset condition, performing virus analysis by utilizing the influenza virus reference genome sequence database and the assembly sequence to obtain target typing results of a plurality of influenza viruses to be analyzed in a determined type.
Specifically, the step S405 includes:
Step S4051, when the assembly sequence satisfies the preset condition, merging the influenza virus reference genome sequence database and the assembly sequence to obtain the target sequence.
First, it is determined whether the obtained assembly sequence satisfies the preset condition.
Specifically, whether the assembly sequence satisfies the preset condition can be determined by aligning it to the target reference genome sequence. For example, the preset condition may require that the alignment length exceed 90% of the length of the target reference genome sequence and that the sequence identity exceed 90%.
If both values exceed 90%, the assembly sequence satisfies the preset condition, and the subsequent analysis can proceed.
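The preset condition can be expressed as a small check. The sketch below assumes BLAST-style definitions (alignment length counts every alignment column, gap columns included, and identity is the number of identical columns divided by the alignment length) and takes one pairwise alignment as two equal-length gapped strings. The 90% thresholds follow the example above; the function names are hypothetical.

```python
def alignment_stats(aligned_query, aligned_reference):
    """Return (alignment_length, identity) for one pairwise alignment of equal-length gapped strings."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned strings must have the same length")
    length = len(aligned_query)  # number of alignment columns, gap columns included
    identical = sum(1 for q, r in zip(aligned_query, aligned_reference) if q == r and q != "-")
    return length, identical / length


def meets_preset_condition(aligned_query, aligned_reference, reference_length,
                           min_length_fraction=0.9, min_identity=0.9):
    """True if the alignment is longer than 90% of the reference and identity exceeds 90%."""
    length, identity = alignment_stats(aligned_query, aligned_reference)
    return length > min_length_fraction * reference_length and identity > min_identity


ref = "ACGTACGTACGTACGTACGT"  # 20 bases
print(meets_preset_condition("ACGTACGTACGTACGTACGA", ref, len(ref)))  # True (length 20, identity 0.95)
print(meets_preset_condition("ACGTACGTAC", ref[:10], len(ref)))       # False (covers half the reference)
```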
Further, when the assembly sequence satisfies the preset condition, the influenza virus reference genome sequence database and the assembly sequence are merged to obtain the corresponding target sequence. For example, if the influenza virus reference genome sequence database contains 6 sequences and there is 1 assembly sequence, merging them yields 7 independent sequences, which together form the target sequence.
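The merging in step S4051 can be illustrated with plain dictionaries mapping sequence IDs to sequences. This sketch assumes that IDs must stay unique so the merged set can be written as a single FASTA file for the multiple sequence alignment step; `merge_into_target_set` and `to_fasta` are hypothetical helper names. The usage lines reproduce the 6 + 1 = 7 example from the text.

```python
def merge_into_target_set(reference_db, assembled):
    """Merge reference and assembled sequences (ID -> sequence) into one target set."""
    clashes = set(reference_db) & set(assembled)
    if clashes:
        raise ValueError(f"duplicate sequence IDs: {sorted(clashes)}")
    target = dict(reference_db)
    target.update(assembled)
    return target


def to_fasta(records):
    """Serialize an ID -> sequence mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


references = {f"ref_{i}": "ACGT" for i in range(1, 7)}  # six reference sequences
assembled = {"sample_assembly": "ACGA"}                  # one assembled sequence
target = merge_into_target_set(references, assembled)
print(len(target))  # 7
```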
Step S4052, performing multiple sequence alignment on the target sequence to obtain a multiple sequence alignment result.
Specifically, multiple sequence alignment is performed on the sequences contained in the target sequence, yielding the corresponding multiple sequence alignment result.
Step S4053, performing virus traceability analysis using the multiple sequence alignment result to obtain the target typing results of the plurality of influenza viruses to be analyzed whose types have been determined.
Specifically, traceability analysis of the influenza viruses to be analyzed is performed on the basis of the multiple sequence alignment result, yielding the final typing results of the plurality of influenza viruses whose types have been determined.
In an alternative embodiment, the step S4053 includes:
Step a1, constructing a phylogenetic tree based on the multiple sequence alignment result.
Step a2, verifying the target typing result using the phylogenetic tree.
Step a3, when the verification passes, obtaining the target typing results of the plurality of influenza viruses to be analyzed whose types have been determined.
A phylogenetic tree, also called a molecular evolutionary tree, is a representation used in bioinformatics to describe the evolutionary relationships among different organisms.
Specifically, after the multiple sequence alignment, a corresponding phylogenetic tree can be constructed from the multiple sequence alignment result.
Further, the number of bootstrap replicates for the phylogenetic tree is set to 1000, and nodes with bootstrap support greater than 0.7 are considered reliable; in this way, traceability analysis and verification of the target typing result of the influenza virus are achieved.
Further, when the verification passes, the final typing results of the plurality of influenza viruses to be analyzed whose types have been determined, that is, the target typing results, are obtained.
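The disclosure does not state which tree-building method is used. Purely as an illustration, the sketch below swaps in UPGMA, one of the simplest distance-based methods, applied to uncorrected p-distances computed from an already aligned set of sequences (columns with a gap in either sequence are skipped pairwise). The function names and the toy alignment are hypothetical; dedicated phylogenetics software with a substitution model is typically used in practice.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; return a Newick topology without branch lengths."""
    sizes = {name: 1 for name in alignment}  # cluster (as Newick text) -> number of leaves
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}
    while len(sizes) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the merged cluster to each remaining cluster.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * size_a + dist[frozenset((b, other))] * size_b
            ) / (size_a + size_b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree + ";"


alignment = {
    "sample": "ACGTACGTAA",
    "H7_ref": "ACGTACGTAT",
    "H9_ref": "ACGAACCTTT",
    "H5_ref": "TCGAACCTTT",
}
print(upgma(alignment))  # ((H5_ref,H9_ref),(H7_ref,sample));
```

On this toy data the sample ends up in the same branch as the H7 reference, which is the kind of grouping the traceability step looks for.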
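The bootstrap idea (resampling alignment columns with replacement, here 1000 times, and accepting support above 0.7) can be sketched as follows. To keep the example short and self-contained, it does not rebuild a full tree for each replicate; it measures how often a candidate reference remains the query's nearest neighbor by p-distance. This is a simplified stand-in for clade support, not the tree-based support described in the text, and all names are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites) if sites else 1.0


def nearest_reference(alignment, query, columns):
    """Name of the sequence closest to `query` over the given (resampled) columns; ties keep the first."""
    q = "".join(alignment[query][c] for c in columns)
    best = None
    for name, seq in alignment.items():
        if name == query:
            continue
        d = p_distance(q, "".join(seq[c] for c in columns))
        if best is None or d < best[0]:
            best = (d, name)
    return best[1]


def bootstrap_support(alignment, query, candidate, replicates=1000, seed=0):
    """Fraction of column-resampled replicates in which `candidate` is the query's nearest reference."""
    rng = random.Random(seed)
    n_cols = len(alignment[query])
    hits = 0
    for _ in range(replicates):
        columns = [rng.randrange(n_cols) for _ in range(n_cols)]  # sample columns with replacement
        if nearest_reference(alignment, query, columns) == candidate:
            hits += 1
    return hits / replicates


alignment = {"sample": "ACGTACGTAA", "H7_ref": "ACGTACGTAT", "H9_ref": "ACGAACCTTT"}
support = bootstrap_support(alignment, "sample", "H7_ref")
print(support > 0.7)  # True
```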
In the automated rapid influenza virus typing method provided by this embodiment, evaluating and processing the sequencing quality of the initial high-throughput sequencing data of each influenza virus to be analyzed improves the quality of the high-throughput sequencing data. Aligning the plurality of target high-throughput sequencing data to the influenza virus reference genome sequence database allows the types of the influenza viruses to be analyzed to be preliminarily determined, and the subsequent traceability analysis based on multiple sequence alignment allows their target typing results to be determined accurately and reliably. In addition, a phylogenetic tree constructed from the multiple sequence alignment result enables the accuracy of the target typing results to be verified.
In one example, an automated rapid typing and analysis workflow for influenza viruses is provided, which can automatically, rapidly and accurately perform typing and traceability analysis of influenza A and influenza B viruses. It comprises the following steps: (1) screening and downloading reference genome sequences of influenza A and influenza B viruses; (2) receiving high-throughput sequencing data, evaluating its sequencing quality, and removing adapter sequences and low-quality reads; (3) aligning the processed high-throughput sequencing data to the influenza virus reference genome sequence database, calculating the average sequencing depth, coverage and abundance, and determining from these three results together the specific influenza virus type or a negative result; (4) performing reference-guided genome assembly using the influenza virus reference genome sequence database as the reference; (5) merging the assembled sequence with the influenza virus reference genome sequence database and then performing traceability analysis. Compared with other typing methods, this workflow is rapid, automated, simple and accurate, and can play an important role in scientific research and in emergency prevention and control of influenza epidemics.
In an alternative example, a specific embodiment is provided in which the automated rapid influenza virus typing method of the above example is applied to the typing and traceability analysis of the high-throughput sequencing sample SRR6987291 (H7N9), downloaded from the ENA database:
1. Genome sequences of influenza A and influenza B viruses are screened and downloaded from public databases to form the influenza virus reference genome sequence database. The database covers the genome reference sequences of all influenza A subtypes HA1-HA18 and NA1-NA18 available in public databases, as well as the genome reference sequences of the influenza B Yamagata and Victoria lineages. Once constructed, it can serve as the reference database for all subsequently analyzed samples.
2. The raw high-throughput sequencing data are first evaluated to determine whether they meet the quality requirements for subsequent analysis. The data can be used for subsequent analysis when the sequencing quality value of the bases at every position in both reads of the paired-end data is greater than 20.
3. After the sequencing quality assessment, sequencing adapter sequences and low-quality reads are removed; the data are used for subsequent analysis when the proportions of Q20 and Q30 bases remaining after removal of unqualified sequences both exceed 90%.
4. The processed high-throughput sequencing data are aligned to the influenza virus reference genome sequence database to determine the specific influenza virus type or a negative result. After alignment, the average sequencing depth, coverage and abundance are calculated, and the typing result is determined jointly from these values, which makes it more accurate. In this step, the average sequencing depth, coverage and abundance of the determined type are markedly higher than those of the other types; as shown in Table 1 below, the influenza virus type determined for this sample is H7N9.
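The quality thresholds in steps 2 and 3 can be checked from FASTQ quality strings. The sketch assumes Phred+33 encoding, counts a base as Q20 or Q30 when its score is at least 20 or 30, and interprets the per-position requirement of step 2 as the mean quality at each read position exceeding 20. These interpretations and the function names are assumptions; adapter and low-quality trimming itself would normally be done by a dedicated trimming tool.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def q_fraction(qualities, threshold, offset=33):
    """Fraction of all bases whose quality score is at least `threshold` (e.g. Q20, Q30)."""
    scores = [s for q in qualities for s in phred_scores(q, offset)]
    return sum(s >= threshold for s in scores) / len(scores)


def per_position_mean(qualities, offset=33):
    """Mean quality at each read position across reads of possibly different lengths."""
    means = []
    for i in range(max(map(len, qualities))):
        column = [ord(q[i]) - offset for q in qualities if i < len(q)]
        means.append(sum(column) / len(column))
    return means


def passes_qc(qualities, min_position_mean=20, min_fraction=0.9):
    """Hypothetical gate combining the per-position check with the Q20/Q30 proportions."""
    return (min(per_position_mean(qualities)) > min_position_mean
            and q_fraction(qualities, 20) > min_fraction
            and q_fraction(qualities, 30) > min_fraction)


print(passes_qc(["IIIIIIIIII", "IIIIIIIII?"]))  # True  ('I' = Q40, '?' = Q30)
print(passes_qc(["IIII#"]))                     # False ('#' = Q2)
```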
Table 1. Results of influenza virus alignment-based typing analysis
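A minimal sketch of how the three metrics in step 4 might be computed and combined is shown below. It assumes per-position depths and mapped-read counts have already been extracted from the first alignment file, defines abundance as the share of mapped reads assigned to each reference, and calls a type only when one reference leads on all three metrics and reaches a minimum coverage. That decision rule, the 0.5 coverage cut-off and all names are hypothetical, since the disclosure gives no numeric thresholds for this step.

```python
from dataclasses import dataclass


@dataclass
class RefStats:
    label: str
    mean_depth: float
    coverage: float
    abundance: float


def summarize(label, depths, mapped_reads, total_mapped_reads):
    """Average depth, breadth of coverage and read share for one reference."""
    length = len(depths)
    return RefStats(
        label=label,
        mean_depth=sum(depths) / length,
        coverage=sum(d > 0 for d in depths) / length,
        abundance=mapped_reads / total_mapped_reads if total_mapped_reads else 0.0,
    )


def call_type(stats, min_coverage=0.5):
    """Return the label leading on all three metrics, "undetermined" if they disagree, or "negative"."""
    if not stats:
        return "negative"
    leaders = {max(stats, key=lambda s: getattr(s, metric)).label
               for metric in ("mean_depth", "coverage", "abundance")}
    if len(leaders) != 1:
        return "undetermined"
    (label,) = leaders
    best = next(s for s in stats if s.label == label)
    return label if best.coverage >= min_coverage else "negative"


stats = [summarize("H7", [10, 12, 0, 8], 900, 1000),
         summarize("H9", [0, 1, 0, 0], 100, 1000)]
print(call_type(stats))  # H7
```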
5. Genome assembly is performed on the high-throughput sequencing data of the determined influenza virus type, yielding 8 assembled influenza virus genome segments.
6. The assembled influenza virus sequences are aligned to the target reference genome sequences of the influenza virus; for subsequent analysis, the alignment length must exceed 90% of the target reference genome sequence length and the sequence identity must exceed 90%. As shown in Table 2 below, the assembled sequence lengths in this example are identical or nearly identical to the target reference genome sequence lengths, with sequence identity exceeding 98%.
Table 2. Alignment results of the assembled sequences
7. The assembled Segment 4 (HA) sequence is merged with the Segment 4 (HA) sequences of the influenza A reference genomes, and multiple sequence alignment is performed. (For influenza B virus, the assembled whole-genome sequence is instead merged with the influenza B whole-genome reference sequences before alignment.)
8. The multiple sequence alignment file is used to construct a phylogenetic tree, with the number of bootstrap replicates set to 1000; bootstrap values greater than 0.7 are considered reliable. Traceability analysis is achieved once the tree is constructed. As shown in Fig. 5, the phylogenetic tree places the assembled SRR6987291 sample in the same branch as H7N9, indicating that the influenza virus sequenced in sample SRR6987291 is H7N9 and further illustrating the accuracy of the typing result of the invention.
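The choice of sequences for the alignment in step 7 could be coded as below. The sketch assumes each genome is held as a mapping from segment number to sequence and represents an influenza B "whole genome" by concatenating its segments in segment-number order; that representation, the function name `sequences_for_msa` and the `sample_name` parameter are assumptions for illustration.

```python
def sequences_for_msa(virus_type, sample_segments, reference_genomes, sample_name="sample"):
    """Collect sequences to align: HA (segment 4) for influenza A, whole genome for influenza B.

    sample_segments: {segment_number: sequence} for the assembled sample
    reference_genomes: {reference_name: {segment_number: sequence}}
    """
    def pick(segments):
        if virus_type == "A":
            return segments[4]  # segment 4 encodes haemagglutinin (HA)
        if virus_type == "B":
            return "".join(segments[n] for n in sorted(segments))  # concatenated whole genome
        raise ValueError(f"unsupported virus type: {virus_type!r}")

    records = {name: pick(segs) for name, segs in reference_genomes.items()}
    if sample_name in records:
        raise ValueError("sample name collides with a reference name")
    records[sample_name] = pick(sample_segments)
    return records


refs_a = {"A_H7": {4: "ACGT", 6: "TTTT"}, "A_H9": {4: "ACGA", 6: "TTTA"}}
print(sequences_for_msa("A", {4: "ACGC", 6: "TTTC"}, refs_a))
```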
This embodiment further provides an automated rapid influenza virus typing apparatus, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This embodiment provides an automated rapid influenza virus typing apparatus which, as shown in Fig. 6, comprises:
An acquisition module 601, configured to acquire the influenza virus reference genome sequence database and the initial high-throughput sequencing data of each influenza virus to be analyzed.
An evaluation and processing module 602, configured to perform sequencing quality evaluation and processing on each set of initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data.
A determining module 603, configured to determine the types of the plurality of influenza viruses to be analyzed based on the plurality of target high-throughput sequencing data and the influenza virus reference genome sequence database.
An assembly module 604, configured to perform genome assembly, based on the influenza virus reference genome sequence database, on the target high-throughput sequencing data of the plurality of influenza viruses to be analyzed whose types have been determined, to obtain an assembly sequence.
An analysis module 605, configured to, when the assembly sequence satisfies the preset condition, perform virus analysis using the influenza virus reference genome sequence database and the assembly sequence to obtain the target typing results of the plurality of influenza viruses to be analyzed whose types have been determined.
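The cooperation of modules 601 to 605 can be pictured as a thin orchestration layer. The Python sketch below is a hypothetical structure, not the disclosed software: each module is injected as a callable, so the control flow, including the preset-condition gate before module 605, is visible without committing to particular tools. The class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class InfluenzaTypingDevice:
    acquire: Callable[[], tuple]               # module 601: returns (reference_db, raw_data_list)
    qc: Callable[[Any], Any]                   # module 602: one raw data set -> target data
    determine: Callable[[Any, Any], Any]       # module 603: (target_data, reference_db) -> types
    assemble: Callable[[Any, Any, Any], Any]   # module 604: (target_data, types, reference_db) -> assembly
    passes_condition: Callable[[Any], bool]    # preset-condition check applied before module 605
    analyze: Callable[[Any, Any], Any]         # module 605: (reference_db, assembly) -> typing result

    def run(self) -> Optional[Any]:
        reference_db, raw_data = self.acquire()
        target_data = [self.qc(d) for d in raw_data]
        types = self.determine(target_data, reference_db)
        assembly = self.assemble(target_data, types, reference_db)
        if not self.passes_condition(assembly):
            return None  # preset condition not met, so no typing result is reported
        return self.analyze(reference_db, assembly)
```

In use, each field would wrap a concrete step such as the quality gate or the type call; the stubs in a quick test can be simple lambdas.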
In some alternative embodiments, the evaluation and processing module 602 includes:
A quality evaluation unit, configured to perform sequencing quality evaluation on each set of initial high-throughput sequencing data to obtain a quality evaluation result.
A processing unit, configured to process each set of initial high-throughput sequencing data based on the quality evaluation result to obtain a plurality of target high-throughput sequencing data.
In some alternative embodiments, the determining module 603 includes:
A first comparison unit, configured to align the plurality of target high-throughput sequencing data to the influenza virus reference genome sequence database to obtain a first comparison file.
A first determining unit, configured to determine the types of the plurality of influenza viruses to be analyzed based on the first comparison file.
In some alternative embodiments, the determining module 603 further includes:
A calculation unit, configured to calculate the average sequencing depth, coverage and abundance based on the first comparison file.
A second determining unit, configured to determine the types of the plurality of influenza viruses to be analyzed based on the average sequencing depth, the coverage and the abundance.
In some alternative embodiments, the assembly module 604 includes:
An acquisition unit, configured to obtain, based on the influenza virus reference genome sequence database, the target reference genome sequences of the plurality of influenza viruses to be analyzed whose types have been determined.
A second comparison unit, configured to align the target high-throughput sequencing data of the plurality of influenza viruses to be analyzed whose types have been determined to the target reference genome sequence to obtain a second comparison file.
An assembly unit, configured to perform genome assembly based on the second comparison file and the target reference genome sequence to obtain an assembly sequence.
In some alternative embodiments, the analysis module 605 includes:
A merging unit, configured to merge the influenza virus reference genome sequence database and the assembly sequence to obtain the target sequence when the assembly sequence satisfies the preset condition.
A third comparison unit, configured to perform multiple sequence alignment based on the target sequence to obtain a multiple sequence alignment result.
An analysis unit, configured to perform virus traceability analysis using the multiple sequence alignment result to obtain the target typing results of the plurality of influenza viruses to be analyzed whose types have been determined.
In some alternative embodiments, the apparatus further comprises:
A construction module, configured to construct a phylogenetic tree based on the multiple sequence alignment result.
A verification module, configured to verify the target typing result using the phylogenetic tree.
Further functional descriptions of the above modules and units are the same as in the corresponding embodiments above and are not repeated here.
The automated rapid influenza virus typing apparatus in this embodiment is presented in the form of functional units, where a unit may be an ASIC (application-specific integrated circuit), a processor and memory executing one or more software or firmware programs, and/or another device capable of providing the above functions.
An embodiment of the present invention further provides a computer device comprising the automated rapid influenza virus typing apparatus shown in Fig. 6.
Referring to Fig. 7, which is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, the computer device comprises one or more processors 10, a memory 20, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are communicatively coupled to one another via different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the computer device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple computer devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). One processor 10 is taken as an example in Fig. 7.
The processor 10 may be a central processing unit, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the method shown in the above embodiments.
The memory 20 may include a program storage area and a data storage area, where the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created through use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
An embodiment of the present invention further provides a computer-readable storage medium. The method according to the above embodiments of the present invention may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network and stored on a local storage medium, so that the method described herein can be carried out by such software stored on a storage medium and executed by a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, a flash memory, a hard disk, a solid-state drive or the like; the storage medium may also comprise a combination of the above kinds of memory. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods shown in the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. An automated rapid typing method for influenza virus, the method comprising:
acquiring an influenza virus reference genome sequence database and initial high-throughput sequencing data of each influenza virus to be analyzed;
performing sequencing quality evaluation and processing on each initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data;
determining a plurality of types of the influenza viruses to be analyzed based on the plurality of target high-throughput sequencing data and the influenza virus reference genome sequence database;
based on the influenza virus reference genome sequence database, performing genome assembly on the target high-throughput sequencing data of a plurality of determined influenza viruses to be analyzed to obtain an assembly sequence;
when the assembly sequence meets the preset condition, carrying out virus analysis by utilizing the influenza virus reference genome sequence database and the assembly sequence to obtain a plurality of target typing results of the influenza viruses to be analyzed with a determined type.
2. The method of claim 1, wherein performing sequencing quality assessment and processing on each of the initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data comprises:
performing sequencing quality assessment on each initial high-throughput sequencing data to obtain a quality assessment result;
and processing each initial high-throughput sequencing data based on the quality evaluation result to obtain a plurality of target high-throughput sequencing data.
3. The method of claim 1, wherein determining a plurality of types of the influenza viruses to be analyzed based on the plurality of target high-throughput sequencing data and the influenza virus reference genome sequence database comprises:
comparing the plurality of target high-throughput sequencing data to the influenza virus reference genome sequence database to obtain a first comparison file;
and determining the types of the influenza viruses to be analyzed based on the first comparison file.
4. The method of claim 3, wherein after comparing the plurality of target high-throughput sequencing data to the influenza virus reference genome sequence database to obtain a first comparison file, the method further comprises:
calculating average sequencing depth, coverage and abundance based on the first comparison file;
based on the average sequencing depth, the coverage and the abundance, determining a plurality of types of the influenza viruses to be analyzed.
5. The method of claim 1, wherein genome assembling the target high-throughput sequencing data for a determined plurality of influenza viruses to be analyzed based on the influenza virus reference genome sequence database to obtain an assembled sequence, comprising:
obtaining a plurality of target reference genome sequences of the influenza viruses to be analyzed in a determined type based on the influenza virus reference genome sequence database;
comparing the target high-throughput sequencing data of the determined multiple influenza viruses to be analyzed with the target reference genome sequence to obtain a second comparison file;
and carrying out genome assembly based on the second alignment file and the target reference genome sequence to obtain the assembly sequence.
6. The method according to claim 1, wherein when the assembly sequence satisfies a preset condition, performing virus analysis using the influenza virus reference genome sequence database and the assembly sequence to obtain target typing results of a plurality of influenza viruses to be analyzed in a determined type, comprising:
when the assembly sequence meets the preset condition, combining the influenza virus reference genome sequence database with the assembly sequence to obtain a target sequence;
performing multi-sequence comparison based on the target sequence to obtain a multi-sequence comparison result;
and performing virus tracing analysis by using the multi-sequence comparison result to obtain a plurality of target typing results of the influenza viruses to be analyzed in a determined type.
7. The method of claim 6, wherein performing virus traceability analysis using the multiple sequence alignment results to obtain a determined type of target typing results for the plurality of influenza viruses to be analyzed, comprises:
constructing a phylogenetic tree based on the multi-sequence comparison result;
verifying the target typing result by using the phylogenetic tree;
and when the verification is passed, obtaining the target typing results of a plurality of influenza viruses to be analyzed in a determined type.
8. An automated rapid typing apparatus for influenza virus, the apparatus comprising:
the acquisition module is used for acquiring an influenza virus reference genome sequence database and initial high-throughput sequencing data of each influenza virus to be analyzed;
the evaluation and processing module is used for evaluating and processing the sequencing quality of each initial high-throughput sequencing data to obtain a plurality of target high-throughput sequencing data;
a determining module for determining a plurality of types of the influenza viruses to be analyzed based on the plurality of target high-throughput sequencing data and the influenza virus reference genome sequence database;
the assembly module is used for carrying out genome assembly on the target high-throughput sequencing data of a plurality of determined influenza viruses to be analyzed based on the influenza virus reference genome sequence database to obtain an assembly sequence;
and the analysis module is used for carrying out virus analysis by utilizing the influenza virus reference genome sequence database and the assembly sequence when the assembly sequence meets the preset condition, so as to obtain a plurality of determined target typing results of the influenza viruses to be analyzed.
9. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the automated rapid typing method of influenza virus of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the automated rapid typing method of influenza virus of any one of claims 1 to 7.
CN202311747772.7A 2023-12-18 2023-12-18 Automatic rapid typing method, device, equipment and medium for influenza virus Pending CN117746980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311747772.7A CN117746980A (en) 2023-12-18 2023-12-18 Automatic rapid typing method, device, equipment and medium for influenza virus

Publications (1)

Publication Number Publication Date
CN117746980A true CN117746980A (en) 2024-03-22

Family

ID=90253888

Country Status (1)

Country Link
CN (1) CN117746980A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989247A (en) * 2016-01-26 2016-10-05 中国动物卫生与流行病学中心 Influenza A virus fast typing and analyzing process
US20180203976A1 (en) * 2015-09-21 2018-07-19 The Regents Of The University Of California Pathogen detection using next generation sequencing
CN108350498A (en) * 2016-02-18 2018-07-31 深圳华大生命科学研究院 Classifying method and device
CN112967753A (en) * 2021-02-25 2021-06-15 美格医学检验所(广州)有限公司 Pathogenic microorganism detection system and method based on nanopore sequencing
CN113096736A (en) * 2021-03-26 2021-07-09 北京源生康泰基因科技有限公司 Method and system for automatically analyzing viruses in real time based on nanopore sequencing
CN113409890A (en) * 2021-05-21 2021-09-17 银丰基因科技有限公司 HLA typing method based on next generation sequencing data
CN114420212A (en) * 2022-01-27 2022-04-29 上海序祯达生物科技有限公司 Escherichia coli strain identification method and system
CN115976178A (en) * 2022-11-23 2023-04-18 山东大学 SFTSV (Small form-factor TSV) detection method based on nanopore metagenome sequencing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
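闫晓敏 (Yan Xiaomin): "Establishment and evaluation of nanopore sequencing-based rapid detection methods for animal viruses", China Excellent Master's Theses Full-text Database, Basic Sciences, no. 02, 15 February 2021 (2021-02-15), page 2 *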

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination