EA201991907A1 - Method and systems for the efficient compression of genomic sequence reads - Google Patents
Method and systems for the efficient compression of genomic sequence reads
Info
- Publication number
- EA201991907A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- genomic sequence
- readings
- systems
- genomic
- effective compression
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3086—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
- G06F21/12—Protecting executable software
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/50—Compression of genetic data
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3091—Data deduplication
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3091—Data deduplication
- H03M7/3095—Data deduplication using variable length segments
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/70—Type of the data to be coded, other than image and sound
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Biotechnology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioethics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Chemical & Material Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Organic Chemistry (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Technology Law (AREA)
- Multimedia (AREA)
- Wood Science & Technology (AREA)
- Signal Processing (AREA)
- Biochemistry (AREA)
- Public Health (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Artificial Intelligence (AREA)
- Microbiology (AREA)
- Immunology (AREA)
Abstract
Method and apparatus for compressing genomic sequence data produced by genome sequencing machines. Sequence reads are coded by aligning them against pre-existing or constructed reference sequences. The coding process consists of classifying the reads into data classes and then coding each class by means of a plurality of genomic descriptors. Genomic descriptors of the same type are organized into blocks, which are compressed by applying successive stages of transformation, binarization, and entropy coding. Specific source models and entropy coders are used for each data class and for each associated descriptor.
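The stages named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the two-class scheme (`P` for reads that match the reference exactly, `M` otherwise), the choice of the mapping position as the only descriptor, delta coding as the transformation, and order-0 Exp-Golomb codes as the binarization. The entropy-coding stage is left out; a real codec would pass the resulting bins to an entropy coder with a source model chosen per class and per descriptor.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    pos: int  # 0-based mapping position on the reference (assumed already aligned)
    seq: str  # read bases


def classify(read, reference):
    """Hypothetical two-class scheme: 'P' for a perfect match, 'M' otherwise."""
    window = reference[read.pos:read.pos + len(read.seq)]
    return "P" if window == read.seq else "M"


def delta_transform(values):
    """Transformation stage: store each value as its difference from the previous one."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def exp_golomb(n):
    """Binarization stage: order-0 Exp-Golomb code of a non-negative integer."""
    if n < 0:
        raise ValueError("exp_golomb expects a non-negative integer")
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def encode_position_blocks(reads, reference):
    """Group the position descriptor by class, then transform and binarize each block."""
    blocks = {}
    for read in sorted(reads, key=lambda r: r.pos):  # sorted, so deltas are >= 0
        blocks.setdefault(classify(read, reference), []).append(read.pos)
    return {
        cls: "".join(exp_golomb(d) for d in delta_transform(positions))
        for cls, positions in blocks.items()
    }


reference = "ACGTACGTAC"
reads = [
    AlignedRead(0, "ACGT"),
    AlignedRead(2, "GTAC"),
    AlignedRead(4, "ACGA"),  # last base differs from the reference
    AlignedRead(6, "GTAC"),
]
bins = encode_position_blocks(reads, reference)
print(bins)
```

Keeping one block per class and per descriptor means each block has more uniform statistics than a mixed stream would, which is what allows a separate source model to be tuned for each of them.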
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/017842 WO2018071055A1 (en) | 2016-10-11 | 2017-02-14 | Method and apparatus for the compact representation of bioinformatics data |
PCT/US2017/041579 WO2018071078A1 (en) | 2016-10-11 | 2017-07-11 | Method and apparatus for the access to bioinformatics data structured in access units |
PCT/US2017/066863 WO2018151788A1 (en) | 2017-02-14 | 2017-12-15 | Method and systems for the efficient compression of genomic sequence reads |
Publications (1)
Publication Number | Publication Date |
---|---|
EA201991907A1 (ru) | 2020-01-20 |
Family
ID=69374527
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EA201991907A EA201991907A1 (ru) | 2017-02-14 | 2017-12-15 | Method and systems for the efficient compression of genomic sequence reads |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3583250B1 (en) |
JP (1) | JP7324145B2 (ja) |
EA (1) | EA201991907A1 (ru) |
MX (1) | MX2019009681A (es) |
WO (1) | WO2018151788A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116018647A (zh) * | 2020-07-10 | 2023-04-25 | Koninklijke Philips N.V. | Genomic information compression by configurable machine learning-based arithmetic coding |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070192139A1 (en) | 2003-04-22 | 2007-08-16 | Ammon Cookson | Systems and methods for patient re-identification |
US10902937B2 (en) | 2014-02-12 | 2021-01-26 | International Business Machines Corporation | Lossless compression of DNA sequences |
US20160100177A1 (en) * | 2014-10-06 | 2016-04-07 | Qualcomm Incorporated | Non-uniform exponential-golomb codes for palette mode coding |
-
2017
- 2017-12-15 WO PCT/US2017/066863 patent/WO2018151788A1/en active Search and Examination
- 2017-12-15 EA EA201991907A patent/EA201991907A1/ru unknown
- 2017-12-15 JP JP2019542691A patent/JP7324145B2/ja active Active
- 2017-12-15 EP EP17896462.3A patent/EP3583250B1/en active Active
- 2017-12-15 MX MX2019009681A patent/MX2019009681A/es unknown
Also Published As
Publication number | Publication date |
---|---|
JP2020510907A (ja) | 2020-04-09 |
JP7324145B2 (ja) | 2023-08-09 |
MX2019009681A (es) | 2019-10-09 |
EP3583250A1 (en) | 2019-12-25 |
WO2018151788A1 (en) | 2018-08-23 |
EP3583250B1 (en) | 2023-07-12 |
EP3583250A4 (en) | 2020-12-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
PH12019501881A1 (en) | Method and apparatus for the efficient compression of genomic sequence reads | |
PH12017501183A1 (en) | Palette index grouping for high throughput cabac coding | |
WO2018155986A3 (ko) | Video signal processing method and apparatus | |
EA201791429A1 (ru) | Contexts for large coding tree units | |
CO2019009919A2 (es) | Method and systems for the efficient compression of genomic sequence reads | |
MX354002B (es) | Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection | |
SG10201808973XA (en) | Image encoding device, image decoding device, image encoding method and image decoding method | |
EP3754484A3 (en) | Generating encoding software and decoding means | |
MY190014A (en) | Data compression | |
EA201991908A1 (ru) | Method and apparatus for the compact representation of bioinformatics data using multiple genomic descriptors | |
PH12019500791A1 (en) | Efficient data structures for bioinformatics information presentation | |
PH12019500294A1 (en) | Method and apparatus for coding and decoding polar codes | |
MY178527A (en) | Encoder, decoder, system and methods for encoding and decoding | |
TW201615016A (en) | Transport stream for carriage of video coding extensions | |
EA201991906A1 (ru) | Method and systems for the reconstruction of genomic reference sequences from compressed genomic sequence reads | |
PH12019500793A1 (en) | Method and apparatus for compact representation of bioinformatics data | |
EA201991907A1 (ru) | Method and systems for the efficient compression of genomic sequence reads | |
PH12017500790A1 (en) | Image coding device, image coding method, image coding program, transmission device, transmission method, transmission program, image decoding device, image decoding method, image decoding program, reception device, reception method, and reception program | |
MX2020002143A (es) | Methods and apparatuses for encoding and decoding mode information, and electronic device | |
FI4029023T3 (fi) | Method for compressing genome sequence data | |
PL412844A1 (pl) | System and method for coding a disoccluded area in a multi-view sequence data stream | |
MX2022005226A (es) | Encoder, decoder, encoding method, decoding method, and program for compressing visual representations | |
MY178056A (en) | Method for encoding and method for decoding a list of identifiers, related computer program products, transmitter and receiver implementing said methods |