WO2020132935A1 - Method and device for fixed-point editing of a nucleic acid sequence storing data - Google Patents


Info

Publication number
WO2020132935A1
Authority
WO
WIPO (PCT)
Prior art keywords
partition
sequence
joint
fragments
fragment
Application number
PCT/CN2018/123858
Other languages
English (en)
French (fr)
Inventor
平质 (Ping Zhi)
黄小罗 (Huang Xiaoluo)
陈世宏 (Chen Shihong)
柴晨 (Chai Chen)
沈玥 (Shen Yue)
徐讯 (Xu Xun)
杨焕明 (Yang Huanming)
Original Assignee
深圳华大生命科学研究院 (BGI Shenzhen Institute of Life Sciences)
Application filed by 深圳华大生命科学研究院 (BGI Shenzhen Institute of Life Sciences)
Priority to PCT/CN2018/123858 (WO2020132935A1)
Priority to EP18944810.3A (EP3904527A4)
Priority to CN201880100481.XA (CN113228193B)
Priority to US17/417,702 (US20220064705A1)
Publication of WO2020132935A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/123 DNA computing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0009 RRAM elements whose operation depends upon chemical change
    • G11C13/0014 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material
    • G11C13/0019 RRAM elements whose operation depends upon chemical change comprising cells based on organic memory material comprising bio-molecules


Abstract

A method and a corresponding device for fixed-point editing of a nucleic acid sequence in which data is stored.

Description

Method and device for fixed-point editing of a nucleic acid sequence storing data
Technical Field
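The present invention belongs to the field of molecular biology, in particular to the technical field of nucleic acid storage, and more specifically relates to a method and a corresponding device for fixed-point editing of a nucleic acid sequence in which data is stored.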
Background
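With the development of modern technology, and of Internet big data in particular, the volume of data worldwide is growing exponentially, placing ever higher demands on storage technology. Traditional storage technologies such as magnetic tape and optical discs, limited in storage density and lifetime, are increasingly unable to meet current data needs.
DNA storage technology, developed in recent years, offers a new approach to these problems. DNA (deoxyribonucleic acid) is a double-stranded structure composed of deoxyribose and four nitrogenous bases (adenine (A), thymine (T), cytosine (C), and guanine (G)); it is the carrier of genetic information and controls the development and continuation of life and the operation of life functions. DNA is one of the densest and most stable information storage carriers known in nature, and advances in DNA synthesis and sequencing have made it possible to use DNA as a carrier for digital information. Compared with traditional storage media, DNA offers a long storage time (several thousand years or more, over 100 times that of existing tape and optical disc media), a high storage density (up to 10⁹ Gb/mm³, more than ten million times that of tape and optical disc media), and good storage security.
DNA data storage usually includes the following steps: 1) encoding: converting the binary 0/1 code of computer information into A/T/C/G DNA sequence information; 2) synthesis: synthesizing DNA molecules with the corresponding sequences by DNA synthesis technology, and preserving the synthetic DNA molecules in an in vitro medium or in living cells; 3) sequencing: reading the DNA sequences of the stored molecules by sequencing technology; 4) decoding: converting the sequenced DNA sequences back into binary 0/1 code by the method corresponding to the encoding of step 1, and then into computer information. Effective DNA data storage requires further development of technologies for each of these steps.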
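Steps 1 and 4 are inverses of each other. A minimal sketch, assuming a hypothetical fixed mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T); this is not the encoding used in this disclosure, which relies on known rules such as Huffman, fountain, or Church codes, and it ignores the sequence constraints that real encoders enforce:

```python
# Hypothetical mapping for illustration only: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Step 1: bytes -> binary 0/1 string -> A/C/G/T string (4 nt per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[k:k + 2]] for k in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Step 4: A/C/G/T string -> binary 0/1 string -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[k:k + 8], 2) for k in range(0, len(bits), 8))


dna = encode(b"Hi")
print(dna)  # CAGACGGC
assert decode(dna) == b"Hi"
```

The round trip only holds because decoding applies exactly the mapping used for encoding, which is the correspondence step 4 requires.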
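Summary of the Invention
The inventors of the present disclosure found that existing DNA storage methods cannot modify, add, or delete data at a fixed point. Existing DNA storage methods store data for one-time synthesis and long-term preservation. If, after synthesis is complete, the original information to be stored is found to be wrong, or an isolated synthesis error occurs that cannot be recovered with the encoding's error-correcting code, existing methods can only discard all of the originally synthesized DNA and resynthesize it, which greatly reduces the fault tolerance of DNA storage. To address these problems, the present disclosure proposes a method and a corresponding device for fixed-point editing of a nucleic acid sequence in which data is stored.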
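In a first aspect, the present invention provides a method for fixed-point editing of a nucleic acid sequence in which data is stored, comprising the following steps:
(1) splitting the nucleic acid sequence storing data into a plurality of sequence fragments, and dividing all the sequence fragments into i partitions, where i is a positive integer;
(2) adding a partition adapter to one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of different partitions differ from one another;
(3) synthesizing the sequence fragments of each partition from step (2) as nucleic acid fragments;
(4) determining the partition n in which the sequence fragment to be edited is located, denoted the nth partition;
(5) using a partition primer library to amplify the sequence fragments of all partitions except the nth partition, wherein the partition primer library comprises primers at least partially complementary to the partition adapter sequences of the 1st partition, the 2nd partition, ..., the (n-1)th partition, the (n+1)th partition, ..., and the ith partition, respectively, thereby obtaining a library containing the sequence fragments of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions; and
(6) correcting the erroneous sequence in the sequence fragment to be edited in the nth partition, and, once the correct sequence is obtained, synthesizing all sequence fragments of the nth partition according to the correct sequence and adding them to the library of step (5), thereby obtaining a library with the correct sequence.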
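At the level of library composition, steps (5) and (6) can be sketched as follows. This is an idealized model, assuming that amplification with the partition primer library is perfectly selective, so the un-amplified molecules of partition n are treated as diluted out entirely; the function name `edit_partition` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (partition label, fragment number, sequence): a hypothetical record layout
Fragment = Tuple[str, int, str]


def edit_partition(library: List[Fragment], n: str,
                   corrected: List[Tuple[int, str]]) -> List[Fragment]:
    """Idealized steps (5)-(6) of the fixed-point editing method.

    Step (5): amplify every partition except n (modelled as keeping them).
    Step (6): re-synthesize all fragments of partition n from the corrected
    sequences and add them to the amplified library.
    """
    amplified = [frag for frag in library if frag[0] != n]
    resynthesized = [(n, number, seq) for number, seq in corrected]
    return amplified + resynthesized


library = [("A", 1, "ACGT"), ("B", 2, "GGTA"), ("C", 3, "TTAG")]
edited = edit_partition(library, "B", [(2, "GCTA")])
print(sorted(edited, key=lambda frag: frag[1]))
# [('A', 1, 'ACGT'), ('B', 2, 'GCTA'), ('C', 3, 'TTAG')]
```

Every fragment of partition n is passed in `corrected`, not only the erroneous one, because step (5) dilutes out the whole partition and step (6) therefore resynthesizes all of it.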
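In a specific embodiment, in step (1), the data is text information, image information, or sound information.
In a specific embodiment, before step (1), the data is encoded into binary data by a first encoding rule. The first encoding rule is a binary encoding rule known to those skilled in the art.
In a specific embodiment, before step (1), the binary data is encoded into a nucleic acid sequence by a second encoding rule, thereby obtaining the nucleic acid sequence in which the data is stored. The second encoding rule is known to those skilled in the art and includes, but is not limited to, Huffman coding, fountain coding, XOR coding, and the Grass encoding rule.
In a specific embodiment, in step (1), the nucleic acid sequence storing data is split into a plurality of sequence fragments. The fragment length is not particularly limited, but for convenience of synthesis in step (3) and given the limitations of synthesis technology, the nucleic acid sequence storing data can generally be split into fragments of no more than 200 nt. The fragments may have the same or different lengths; splitting into fragments of equal length is preferred.
In a specific embodiment, in step (1), all the sequence fragments are divided into i partitions, where i is a positive integer. Each partition may contain the same or a different number of sequence fragments.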
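A minimal sketch of step (1), assuming equal-length fragments (the preferred option) and contiguous partitions whose sizes differ by at most one, which matches Example 1, where partition A holds sequences 1 to 22; both function names are hypothetical.

```python
from typing import List


def split_sequence(sequence: str, fragment_len: int = 200) -> List[str]:
    """Split into consecutive fragments of `fragment_len` nt (the last may be shorter)."""
    if not 0 < fragment_len <= 200:
        raise ValueError("fragments are generally kept at or below 200 nt")
    return [sequence[k:k + fragment_len] for k in range(0, len(sequence), fragment_len)]


def make_partitions(fragments: List[str], i: int) -> List[List[str]]:
    """Divide fragments, in order, into i contiguous partitions."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    size, extra = divmod(len(fragments), i)
    partitions, start = [], 0
    for p in range(i):
        end = start + size + (1 if p < extra else 0)  # spread any remainder over the first partitions
        partitions.append(fragments[start:end])
        start = end
    return partitions


fragments = split_sequence("ACGT" * 4400, fragment_len=100)
parts = make_partitions(fragments, 8)
print(len(fragments), [len(p) for p in parts])  # 176 [22, 22, 22, 22, 22, 22, 22, 22]
```

Padding a shorter final fragment, if the synthesis platform needs equal lengths, is left out of this sketch.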
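In one specific embodiment, in step (2), a partition adapter A1 is added to one or both ends of each sequence fragment in the 1st partition, a partition adapter A2 to one or both ends of each sequence fragment in the 2nd partition, ..., and a partition adapter Ai to one or both ends of each sequence fragment in the ith partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt.
In another specific embodiment, in step (2), the forward partition adapter of each partition is added to the 5' end of that partition's sequence fragments, and the reverse partition adapter of the partition is added to their 3' end. Specifically, partition adapter A1 is added to the 5' end and partition adapter A1' to the 3' end of each sequence fragment in the 1st partition; A2 to the 5' end and A2' to the 3' end of each sequence fragment in the 2nd partition; ...; and Ai to the 5' end and Ai' to the 3' end of each sequence fragment in the ith partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt.
In another specific embodiment, in step (2), a universal adapter is added to the 5' end of the sequence fragments of every partition, and the partition's own partition adapter is added to their 3' end. Specifically, a universal adapter A may be added to the 5' end of the sequence fragments of every partition, with partition adapter A1 added to the 3' end of each sequence fragment in the 1st partition, A2 to the 3' end of each sequence fragment in the 2nd partition, ..., and Ai to the 3' end of each sequence fragment in the ith partition. As a result, each sequence fragment in the 1st partition carries universal adapter A at its 5' end and partition adapter A1 at its 3' end, each sequence fragment in the 2nd partition carries universal adapter A at its 5' end and partition adapter A2 at its 3' end, ..., and each sequence fragment in the ith partition carries universal adapter A at its 5' end and partition adapter Ai at its 3' end; the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt.
In another specific embodiment, in step (2), universal adapter A is added to the 3' end of the sequence fragments of every partition, and partition adapter A1 is added to the 5' end of each sequence fragment in the 1st partition, A2 to the 5' end of each sequence fragment in the 2nd partition, ..., and Ai to the 5' end of each sequence fragment in the ith partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt.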
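The adapter layouts of step (2) differ only in what is concatenated at each end. A sketch, assuming adapters are given as plain 5'-to-3' strings keyed by partition label; `check_adapters`, `add_adapters`, and the layout names are hypothetical, and the check enforces only the stated constraint that partition adapters are mutually distinct and of equal length.

```python
from typing import Dict, Optional


def check_adapters(adapters: Dict[str, str]) -> None:
    """Partition adapters must differ from one another but share one length."""
    seqs = list(adapters.values())
    if len(set(seqs)) != len(seqs):
        raise ValueError("partition adapters must differ from one another")
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("partition adapters must all have the same length")


def add_adapters(fragment: str, partition: str, layout: str,
                 adapters: Dict[str, str], universal: str = "",
                 reverse_adapters: Optional[Dict[str, str]] = None) -> str:
    """Return the 5'->3' construct for one fragment under one described layout."""
    a = adapters[partition]
    if layout == "partition_5p":        # A_k at the 5' end only
        return a + fragment
    if layout == "partition_3p":        # A_k at the 3' end only
        return fragment + a
    if layout == "partition_both":      # A_k at both ends
        return a + fragment + a
    if layout == "forward_reverse":     # A_k at 5', A_k' at 3'
        if reverse_adapters is None:
            raise ValueError("forward_reverse needs reverse_adapters")
        return a + fragment + reverse_adapters[partition]
    if layout == "universal_5p":        # universal A at 5', A_k at 3'
        return universal + fragment + a
    if layout == "universal_3p":        # A_k at 5', universal A at 3'
        return a + fragment + universal
    raise ValueError(f"unknown layout: {layout}")


# Hypothetical, shortened adapters for illustration only.
adapters = {"A1": "CCGATTGACG", "A2": "GTTCAGGCAT"}
check_adapters(adapters)
print(add_adapters("AAAA", "A1", "universal_5p", adapters, universal="TTT"))
```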
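In the present invention, partition adapters are designed according to rules including, but not limited to, the following: 1) avoid runs of 4 or more identical bases, i.e., "AAA" is acceptable but "AAAA" is not; 2) no tandem repeats or complementary repeats of more than 3 bases, i.e., tandem repeats such as "ATCATCATC" and complementary repeats such as "ATCXXXGAT" are not acceptable; 3) no DNA or RNA secondary structure; 4) no dimers between different adapters; 5) the adapter sequence should overlap as little as possible with the sequence fragments to be stored.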
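Rules 1 and 2 are purely sequence-based and can be screened computationally. A minimal sketch, assuming rule 2 means no 3-nt motif repeated three times in tandem (as in "ATCATCATC") and no 3-nt motif whose reverse complement appears downstream after a spacer of at least 3 nt (as in "ATCXXXGAT"); these thresholds are one reading of the rules, not the authors' code. Rules 3 to 5 need folding, dimer prediction, or comparison against the payload and are not covered. Inputs are assumed to be uppercase A/C/G/T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_homopolymer(seq: str) -> int:
    """Length of the longest single-base run (rule 1 rejects runs of 4 or more)."""
    longest, run, prev = 0, 0, ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def has_tandem_repeat(seq: str, unit: int = 3, copies: int = 3) -> bool:
    """True if a `unit`-nt motif occurs `copies` times back to back, e.g. ATCATCATC."""
    span = unit * copies
    return any(seq[k:k + span] == seq[k:k + unit] * copies
               for k in range(len(seq) - span + 1))


def has_inverted_repeat(seq: str, unit: int = 3, min_spacer: int = 3) -> bool:
    """True if a motif's reverse complement occurs >= `min_spacer` nt downstream, e.g. ATCxxxGAT."""
    for k in range(len(seq) - unit + 1):
        target = reverse_complement(seq[k:k + unit])
        if seq.find(target, k + unit + min_spacer) != -1:
            return True
    return False


def passes_rules_1_and_2(seq: str) -> bool:
    return (longest_homopolymer(seq) < 4
            and not has_tandem_repeat(seq)
            and not has_inverted_repeat(seq))
```

A design loop would generate random candidates, keep those passing this screen, and then apply the thermodynamic checks of rules 3 and 4 with a dedicated tool.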
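In one specific embodiment, the partition adapters can be ordered by binary value (i.e., A or T represents 0 and C or G represents 1, or alternatively A or C represents 0 and T or G represents 1, and so on, 12 combinations in total) or by quaternary value (e.g., A = "0", T = "1", C = "2", G = "3", 24 ways in total), so that they also serve as index numbers; according to these index numbers, the sequences of each partition can be assembled in numerical order.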
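The quaternary ordering (A = 0, T = 1, C = 2, G = 3) amounts to reading a base string as a base-4 number. A sketch, assuming fixed-width codes of 6 nt (the preferred index length in this disclosure), which give 4⁶ = 4096 distinct values; the function names are hypothetical.

```python
ATCG = "ATCG"  # digit value = position in this string: A=0, T=1, C=2, G=3
DIGIT = {base: value for value, base in enumerate(ATCG)}


def index_to_dna(value: int, width: int = 6) -> str:
    """Write a non-negative integer as a fixed-width base-4 string over A/T/C/G."""
    if not 0 <= value < 4 ** width:
        raise ValueError(f"index must be in [0, {4 ** width})")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(ATCG[remainder])
    return "".join(reversed(digits))  # most significant digit first


def dna_to_index(code: str) -> int:
    """Inverse of index_to_dna: read an A/T/C/G string as a base-4 number."""
    value = 0
    for base in code:
        value = value * 4 + DIGIT[base]
    return value


print(index_to_dna(5))  # AAAATT
```

Fragments should be ordered by the decoded integers, not by the code strings: alphabetical order (A, C, G, T) differs from the digit order (A, T, C, G).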
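In another specific embodiment, the method further comprises, after obtaining the sequence fragments carrying partition adapters in step (2), adding an index number to each sequence fragment, wherein the index number is adjacent to the partition adapter. Specifically, the index number is an index code defined by a rule, e.g., "AAAA" = 1, "CCCC" = 2, "TTTT" = 3, "GGGG" = 4, "ATCG" = 5, and so on. Those skilled in the art will understand that this rule is user-defined; any specific encoding rule may be used as long as it establishes a one-to-one correspondence between index codes and the positional order of the sequences. Furthermore, the specific position at which the index number is added to each sequence fragment is not limited, as long as the index number is adjacent to a partition adapter. For example, when the index number is added at the 5' end of the sequence fragment, the sequence reads from 5' to 3' as "partition adapter - index number - sequence fragment storing data - partition adapter", "universal adapter - index number - sequence fragment storing data - partition adapter", or "partition adapter - index number - sequence fragment storing data - universal adapter"; when the index number is added at the 3' end of the sequence fragment, it reads from 5' to 3' as "partition adapter - sequence fragment storing data - index number - partition adapter", "partition adapter - sequence fragment storing data - index number - universal adapter", or "universal adapter - sequence fragment storing data - index number - partition adapter".
In one specific embodiment, the partition adapter is 18 nt long, and the index number sequence is 5-10 nt long, preferably 6 nt.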
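In one specific embodiment, the partition n containing the sequence fragment to be edited is determined according to the encoding rule used when the data were stored. When the stored data need editing, for example when the original data themselves contain an error that must be corrected, the partition n containing the erroneous data is found by means of the encoding rule used to store the data, such as a binary encoding rule, Huffman coding, fountain coding, XOR coding, or the Grass encoding rule.
In another specific embodiment, the partition n containing the sequence fragment to be edited is determined by sequencing the nucleic acid fragments synthesized in step (3) and performing sequence alignment.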
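When partitions are contiguous and equal-sized, as in Example 1 (22 fragments per partition, labelled A to H), the partition of a fragment follows directly from its number. The sketch also shows a simplified stand-in for the alignment route: each sequenced fragment is compared with its intended sequence by exact match rather than true alignment. Both function names are hypothetical.

```python
import string
from typing import Dict, Set


def partition_of(fragment_number: int, per_partition: int,
                 labels: str = string.ascii_uppercase) -> str:
    """Map a 1-based fragment number to its partition label (contiguous, equal-size partitions)."""
    if fragment_number < 1:
        raise ValueError("fragment numbers start at 1")
    return labels[(fragment_number - 1) // per_partition]


def partitions_to_edit(expected: Dict[int, str], observed: Dict[int, str],
                       per_partition: int) -> Set[str]:
    """Partitions holding any fragment whose observed sequence differs from the expected one."""
    return {partition_of(number, per_partition)
            for number, seq in expected.items()
            if observed.get(number) != seq}


print(partition_of(58, 22))  # C, matching sequence 58 in Example 1
```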
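In one specific embodiment, in step (5), the sequence fragments are amplified by multiplex PCR. In the present invention, multiplex PCR can be performed by those skilled in the art based on prior-art knowledge. The multiplex PCR may take forms including, but not limited to, touch-up and touchdown PCR, and the polymerase may include, but is not limited to, Taq, Phusion, Q5, Vent, KlenTaq, and other enzymes, or combinations of them in different ratios.
Those skilled in the art will understand that the primer sequences in the partition primer library of step (5) are at least partially complementary to the partition adapter sequences described in the first aspect of the present invention; the partition primer library comprises primers at least partially complementary to the partition adapter sequences of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions, respectively.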
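For the layout used in Example 1 (universal adapter at the 5' end, partition adapter at the 3' end), one conventional way to realize such a primer set is a forward primer identical to the universal adapter, which anneals to the complementary strand, plus one reverse primer per retained partition equal to the reverse complement of that partition's 3' adapter. This is a sketch under that assumption; `partition_primer_library` is hypothetical, and the partition adapter sequences below are placeholders because the disclosure does not list them.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def partition_primer_library(partition_adapters: Dict[str, str], universal: str,
                             excluded: str) -> Dict[str, str]:
    """Primers that amplify every partition except `excluded` (the nth partition)."""
    if excluded not in partition_adapters:
        raise KeyError(f"no such partition: {excluded}")
    primers = {"universal_fwd": universal}
    for label, adapter in partition_adapters.items():
        if label != excluded:
            primers[f"{label}_rev"] = reverse_complement(adapter)
    return primers


UNIVERSAL = "ATGGTCAGATCGTGCATC"  # universal adapter from Example 1
adapters = {"A": "GACTTCGGAT", "B": "CTAGGTACCA", "C": "TCGAACTGTC"}  # hypothetical placeholders
print(partition_primer_library(adapters, UNIVERSAL, excluded="C"))
```

Leaving out the reverse primer for partition n is what keeps that partition from being exponentially amplified.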
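After the amplification of step (5), the sequence fragments of all partitions except the nth partition have been amplified, yielding a library containing the sequence fragments of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions. The sequences of the nth partition have not been exponentially amplified, so their copy number is far smaller than that of the exponentially amplified, correct sequences of the other partitions.
Those skilled in the art will understand that multiplex PCR amplification serves to dilute the sequence fragments of the nth partition. In this application, dilution means increasing the copy number of the target fragments by exponential amplification, so that the proportion of non-target fragments, which are not exponentially amplified, drops markedly in the final product. For example, if all sequence fragments outside the nth partition undergo 30 cycles of exponential amplification, they are theoretically amplified about 10⁹-fold (2³⁰ ≈ 1.07 × 10⁹), whereas the sequence fragments of the nth partition, because they carry the universal adapter, undergo only linear amplification and are theoretically amplified 32768-fold (about 3.3 × 10⁴). Therefore, the proportion of nth-partition fragments in the final amplification product drops markedly.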
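The dilution argument can be checked numerically. A sketch, assuming equal input copy numbers for every partition and ideal doubling per cycle for the amplified partitions; the fold increase of partition n is a parameter, so both the figure quoted above (32768-fold) and a strictly linear model (one new copy per cycle, i.e. 1 + cycles) can be compared. The function name is hypothetical.

```python
def partition_n_fraction(cycles: int, partitions: int, fold_n: float) -> float:
    """Share of partition-n molecules in the product, given equal input per partition."""
    fold_other = 2 ** cycles  # ideal exponential amplification of the other partitions
    return fold_n / (fold_n + (partitions - 1) * fold_other)


print(2 ** 30)                              # 1073741824, about 1.07e9
print(partition_n_fraction(30, 8, 32768))   # about 4.4e-06 with the quoted fold
print(partition_n_fraction(30, 8, 1 + 30))  # about 4.1e-09 with a strictly linear model
```

Under either assumption partition n ends up as a few parts per million or less of the product, which is what makes the later addition of its resynthesized fragments effectively a replacement.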
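Next, the erroneous sequence in the sequence fragment to be edited in the nth partition can be re-encoded according to the corresponding encoding rule to obtain the correct sequence; all sequence fragments of the nth partition are synthesized according to the correct sequence and then mixed with the library containing the sequence fragments of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions, thereby obtaining a library with the correct sequence.
Optionally, each sequence fragment in the library can be ligated into a vector, or the sequence fragments in the library can be assembled.
Optionally, the library with the correct sequence, the vectors carrying the sequence fragments, or the assembled sequence fragments can be stored in a medium, including but not limited to liquid phase, dry powder, living cells, and the like.
The method of the present invention locates the nucleic acid sequence to be edited by "index partitioning" and can correct erroneous data arising during storage at low cost. Compared with existing DNA storage methods, it greatly reduces the cost of correction when the stored information contains errors and greatly improves the fault tolerance of existing DNA storage systems.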
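In a second aspect, the present invention provides a decoding method comprising: sequencing a library obtained by the method of the first aspect of the present invention to obtain the sequence fragments; obtaining the positional order of the sequence fragments from their index numbers; and splicing the sequence fragments, according to this positional order, into the nucleic acid sequence storing data.
Optionally, the obtained nucleic acid sequence storing data is transcoded into the corresponding binary code, and the binary code is then transcoded into the corresponding data information.
In a specific embodiment, the obtained nucleic acid sequence storing data is transcoded into the corresponding binary code by the second encoding rule, and the binary code is then transcoded into the corresponding data information by the first encoding rule, the first and second encoding rules being as defined in the first aspect of the present invention.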
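In a third aspect, the present invention provides a device for fixed-point editing of a nucleic acid sequence in which data is stored, comprising: a sequence splitting and partitioning module configured to split the nucleic acid sequence storing data into a plurality of sequence fragments and to divide all the sequence fragments into i partitions, where i is a positive integer; a partition adapter adding module configured to add a partition adapter to one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of different partitions differ from one another; a nucleic acid synthesis module configured to synthesize the sequence fragments carrying partition adapters as nucleic acid fragments; a locating module configured to determine the partition n in which the sequence fragment to be edited is located, denoted the nth partition; an amplification module configured to use a partition primer library to amplify the sequence fragments of all partitions except the nth partition, wherein the partition primer library comprises primers at least partially complementary to the partition adapter sequences of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions, respectively, thereby obtaining a library containing the sequence fragments of the 1st, 2nd, ..., (n-1)th, (n+1)th, ..., and ith partitions; and a correction module configured to correct the erroneous sequence in the sequence fragment to be edited in the nth partition and, once the correct sequence is obtained, to synthesize all sequence fragments of the nth partition according to the correct sequence and add them to the library obtained by the amplification module, thereby obtaining a library with the correct sequence.
Optionally, the device further comprises an index number adding module configured to add an index number to the sequence fragments carrying partition adapters, wherein the index number is adjacent to the partition adapter.
The length of the sequence fragments and the number of sequence fragments in each partition are as defined in the first aspect of the present invention.
The partition adapters and index numbers are as defined in the first aspect of the present invention.
Optionally, the device further comprises an assembly module configured to assemble the sequence fragments in the library.
Optionally, the device further comprises a vector ligation module configured to ligate the sequence fragments in the library into vectors.
Optionally, the device further comprises a medium storage module configured to store the sequence fragments of the library, the vectors carrying the sequence fragments, or the assembled sequence fragments in a medium, the medium including but not limited to liquid phase, dry powder, living cells, and the like.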
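In a fourth aspect, the present invention provides a decoding device comprising: a sequencing module configured to sequence a library obtained by the method of the first aspect of the present invention to obtain the sequence fragments; a position information acquisition module configured to obtain the positional order of the sequence fragments from their index numbers; and a splicing module configured to splice the sequence fragments, according to this positional order, into the nucleic acid sequence storing data.
Optionally, the decoding device further comprises a transcoding module configured to transcode the nucleic acid sequence storing data into the corresponding binary code and then to transcode the binary code into the corresponding data information.
In a specific embodiment, the transcoding module transcodes the obtained nucleic acid sequence storing data into the corresponding binary code by the second encoding rule and then transcodes the binary code into the corresponding data information by the first encoding rule, the first and second encoding rules being as defined in the first aspect of the present invention.
In a fifth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements at least one of the following methods: the method for fixed-point editing of a nucleic acid sequence in which data is stored according to the first aspect of the present invention, and the decoding method according to the second aspect of the present invention.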
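Other features and advantages of the present invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application; the illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Figure 1 shows a flowchart of DNA storage.
Figure 2 shows a schematic diagram of sequence fragments after splitting according to some embodiments of the present disclosure.
Figure 3 shows a flowchart of the fixed-point editing process for DNA storage sequences according to some embodiments of the present disclosure.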
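Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its use. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present invention. For ease of description, the parts shown in the drawings are not drawn to scale. Techniques, methods, and equipment known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the granted specification. In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not limiting; other examples of the exemplary embodiments may therefore have different values. Similar reference numerals and letters denote similar items in the drawings, so once an item is defined in one drawing it need not be discussed further in subsequent drawings.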
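Example 1. Fixed-point editing of a nucleic acid sequence storing data
Original file: two Shakespeare sonnets (in English)
Simulated scenario: after the DNA sequences have been synthesized, the stored original file is found to contain errors, so the synthesized sequences must be modified and supplemented.
Experimental procedure: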
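1. On a computer terminal, the erroneous version of the original file was DNA-encoded using the Church simple code [George M. Church, Yuan Gao and Sriram Kosuri, "Next-Generation Digital Information Storage in DNA", Science 337(6102):1628 (16 August 2012), doi:10.1126/science.1226355] combined with a Reed-Solomon error-correcting code. This yielded 176 sequences. Line 11 of the original reads "Like feeble age", but the erroneous version reads "Like feeble old man". Line 17 of the original reads "Lord of my love", but the erroneous version reads "Lord of my".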
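Church et al. map one bit to one base (A or C encodes 0, G or T encodes 1), which leaves a free choice between two bases at every position. The sketch below illustrates only that bit-to-base step. Several parts are assumptions and not taken from the source:

- the tie-breaking rule (take the first candidate unless it would repeat the previous base);
- the 8-bit ASCII text conversion;
- the helper names `text_to_bits`, `church_encode` and `church_bits`.

The Reed-Solomon layer used in the example is omitted.

```python
def text_to_bits(text: str) -> str:
    """Convert ASCII text to a bit string, 8 bits per character."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def church_encode(bits: str) -> str:
    """One bit per base: '0' -> A or C, '1' -> G or T.

    Assumed tie-break: use the first candidate unless it equals the previous
    base, which rules out any homopolymer run longer than one base.
    """
    out = []
    for bit in bits:
        first, second = ("A", "C") if bit == "0" else ("G", "T")
        out.append(second if out and out[-1] == first else first)
    return "".join(out)


def church_bits(dna: str) -> str:
    """Invert the mapping: A/C -> '0', G/T -> '1'."""
    return "".join("0" if base in "AC" else "1" for base in dna)


dna = church_encode(text_to_bits("Like feeble age"))
print(len(dna))  # 120 bases for 15 characters
```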
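2. After encoding, all sequences were divided into 8 partitions. An index number and a partition adapter (8 in total, A-H) were added to each sequence, together with the universal adapter ATGGTCAGATCGTGCATC. This gave 176 DNA sequences, each 114 nt long, with 22 sequences per partition:

- Partition A comprises sequences 1 to 22, each with the universal adapter at its 5' end and the partition-A adapter at its 3' end.
- Partition B comprises sequences 23 to 44, each with the universal adapter at its 5' end and the partition-B adapter at its 3' end.
- ... and so on, up to partition H, which comprises sequences 155 to 176, each with the universal adapter at its 5' end and the partition-H adapter at its 3' end.

The partition adapters of A to H all differ from one another and are each 18 nt long.
From 5' to 3', each sequence has the structure: universal adapter - information sequence to be stored - index number - partition adapter.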
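The layout above and the partition ranges can be checked with simple arithmetic. Assume a 6-nt index, the length of the index region AGCCTA quoted in step 7. Each 114-nt oligo then leaves 114 - 18 - 6 - 18 = 72 nt for the payload. Sequence k falls in partition floor((k - 1) / 22), counting A as 0, which places sequence 58 in partition C as stated in step 4. The 72-nt payload length is derived here, not stated in the source. The partition adapter sequences are not disclosed, so any passed to `build_oligo` (a hypothetical helper) are placeholders.

```python
UNIVERSAL_ADAPTER = "ATGGTCAGATCGTGCATC"  # 18 nt, as given in Example 1
ADAPTER_LEN = 18
INDEX_LEN = 6  # assumption, based on the 6-nt index region AGCCTA
OLIGO_LEN = 114
PAYLOAD_LEN = OLIGO_LEN - 2 * ADAPTER_LEN - INDEX_LEN  # 72 nt under these assumptions
PER_PARTITION = 22
PARTITIONS = "ABCDEFGH"


def partition_of(seq_number: int) -> str:
    """Partition letter for a 1-based sequence number (A: 1-22, ..., H: 155-176)."""
    if not 1 <= seq_number <= PER_PARTITION * len(PARTITIONS):
        raise ValueError("sequence number out of range")
    return PARTITIONS[(seq_number - 1) // PER_PARTITION]


def build_oligo(payload: str, index: str, partition_adapter: str) -> str:
    """Assemble 5'->3': universal adapter - payload - index - partition adapter."""
    if (len(payload) != PAYLOAD_LEN or len(index) != INDEX_LEN
            or len(partition_adapter) != ADAPTER_LEN):
        raise ValueError("unexpected segment length")
    return UNIVERSAL_ADAPTER + payload + index + partition_adapter
```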
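3. The 176 sequences obtained in step 2 were synthesized.
4. Sequence comparison showed that the content of line 11 to be modified lies in partition C, in sequence 58, whose erroneous sequence is:
[Sequence image PCTCN2018123858-appb-000001]
The single-underlined part is the universal adapter sequence, the double-underlined part is the partition-C adapter sequence, and the boxed part is the index-number region.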
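5. Primers complementary to partition adapters A, B, D, E, F, G and H and to the universal adapter sequence were added to the primer pool, and multiplex PCR was performed. As a result, all 154 sequences in partitions A, B, D, E, F, G and H were amplified.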
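The multiplex PCR was run as touchdown PCR using the [image PCTCN2018123858-appb-000002] Reaction Buffer Pack kit, with the two enzymes in the ratio Q5 : Ex Taq = 8 : 1. The program was:

- 98 °C, 5 min;
- 25 cycles of (98 °C, 20 s; 55.2-60 °C, 30 s; 72 °C, 10 s), with the temperature lowered by 0.2 °C in each cycle;
- 72 °C, 5 min;
- hold at 12 °C.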
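This cycling program is internally consistent. If annealing starts at 60 °C and drops by 0.2 °C per cycle, the 25th cycle anneals at 60 - 24 × 0.2 = 55.2 °C, which is the lower bound quoted. The sketch below expands the program into explicit steps under that reading; the start temperature and the direction of the ramp are inferred from the stated range. It also splits a polymerase amount at the stated Q5 : Ex Taq ratio of 8 : 1. The function names and the step-tuple format are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, float, int]  # (label, temperature in deg C, duration in s)


def touchdown_program(start_anneal: float = 60.0, step: float = 0.2,
                      cycles: int = 25) -> List[Step]:
    """Expand the step-5 program into explicit steps.

    Assumption: annealing starts at 60.0 C in cycle 1 and drops by `step` each
    cycle, so cycle 25 anneals at 60.0 - 24 * 0.2 = 55.2 C.
    """
    program: List[Step] = [("initial denaturation", 98.0, 300)]
    for c in range(cycles):
        anneal = round(start_anneal - step * c, 1)  # avoid float drift
        program += [("denature", 98.0, 20), ("anneal", anneal, 30), ("extend", 72.0, 10)]
    program.append(("final extension", 72.0, 300))
    return program  # followed by a 12 C hold, which is not a timed step


def enzyme_split(total: float, ratio: Tuple[int, int] = (8, 1)) -> Tuple[float, float]:
    """Split a total polymerase amount into (Q5, Ex Taq) portions at the given ratio."""
    q5, ex_taq = ratio
    return total * q5 / (q5 + ex_taq), total * ex_taq / (q5 + ex_taq)
```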
6. After the multiplex PCR amplification of step 5 and dilution, an oligo pool containing only partitions A, B, D, E, F, G and H was obtained.
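This step relies on the primer pool selecting exactly the 7 × 22 = 154 oligos outside partition C. The sketch below is an idealized in-silico check of that selection. It assumes three things:

- primers match their binding sites exactly;
- each reverse primer is the reverse complement of an 18-nt 3' partition adapter;
- the forward primer is identical to the universal adapter.

The partition adapter sequences are not disclosed in the text, so any used with this sketch are placeholders. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

UNIVERSAL_ADAPTER = "ATGGTCAGATCGTGCATC"  # 18 nt, from Example 1
ADAPTER_LEN = 18
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pool(adapters: Dict[str, str], exclude: str) -> Set[str]:
    """Reverse primers for every partition except `exclude`.

    Assumption: each reverse primer is the exact reverse complement of the
    18-nt 3' partition adapter.
    """
    return {reverse_complement(seq) for name, seq in adapters.items() if name != exclude}


def in_silico_pcr(oligos: Iterable[str], reverse_primers: Set[str]) -> List[str]:
    """Idealized amplification: an oligo survives only if it begins with the
    universal adapter and its 3' adapter is the binding site of a primer in the
    pool. Mismatch tolerance, primer dimers and amplification bias are ignored."""
    return [o for o in oligos
            if o.startswith(UNIVERSAL_ADAPTER)
            and reverse_complement(o[-ADAPTER_LEN:]) in reverse_primers]
```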
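7. The information of partition C was re-encoded to obtain 22 new sequences for partition C. The corrected sequence 58 is shown below; the other 21 sequences of partition C are unchanged:
[Sequence image PCTCN2018123858-appb-000003]
The single-underlined part is the universal adapter sequence, the double-underlined part is the partition-C adapter sequence, and the boxed part is the index-number region.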
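At the same time, the content to be added to line 17 was designed. The original index-number region is AGCCTA. Two new sequences were added, with index-number regions A-AGCCTA and T-AGCCTA. The newly added sequences 89-A and 89-B are:
Sequence 89-A:
[Sequence image PCTCN2018123858-appb-000004, PCTCN2018123858-appb-000005]
Sequence 89-B:
[Sequence image PCTCN2018123858-appb-000006]
The single-underlined part is the universal adapter sequence, the double-underlined part is the partition adapter sequence, and the boxed part is the index-number region.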
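This step adds two sequences without renumbering the existing ones: it prefixes one base to the existing 6-nt index region AGCCTA. The text does not say how a decoder orders such extended indices. The sketch below shows one possible rule, which is an assumption:

- an extended index sorts immediately after its 6-nt parent;
- siblings sort by their prefix base (A before C before G before T).

Indices written A-AGCCTA and T-AGCCTA in the text appear below without the hyphen. The function names are hypothetical.

```python
from typing import List, Tuple

BASE_INDEX_LEN = 6  # length of an original index region such as AGCCTA


def index_sort_key(index: str) -> Tuple[str, str]:
    """Hypothetical ordering key for original and extended index regions.

    The last BASE_INDEX_LEN bases are the parent index; any extra 5' bases form
    a prefix. Sorting by (parent, prefix) puts AGCCTA before AAGCCTA before
    TAGCCTA, and all of them before the next parent AGCCTC. For equal-length
    parents, lexicographic order over A < C < G < T matches base-4 numeric order.
    """
    if len(index) < BASE_INDEX_LEN:
        raise ValueError("index shorter than the base index length")
    return index[-BASE_INDEX_LEN:], index[:-BASE_INDEX_LEN]


def order_indices(indices: List[str]) -> List[str]:
    """Sort index regions with the hypothetical key above."""
    return sorted(indices, key=index_sort_key)


print(order_indices(["TAGCCTA", "AGCCTC", "AGCCTA", "AAGCCTA"]))
# ['AGCCTA', 'AAGCCTA', 'TAGCCTA', 'AGCCTC']
```

Under this rule, no existing index changes when sequences are inserted, which is the property the A-/T- prefixing scheme appears designed to preserve.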
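8. The sequences newly synthesized in step 7 were mixed with the oligo pool obtained in step 6 to obtain a new mixed pool.
9. The new oligo pool obtained in step 8 was Sanger sequenced.
10. The sequencing results were returned to the computer for decoding, and the correct original file was obtained.
11. The new oligo pool obtained in step 8 was freeze-dried to a dry powder and stored at -20 °C.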
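Example 2: Decoding
The edited, correct oligo pool from Example 1 was sequenced, giving sequence set A. The 18-nt segment at each end of every sequence in set A (the universal adapter and the partition adapter, respectively) was removed to give sequence set A'. The index-number information was read first and decoded into numbers of different magnitudes.
Sequence set A' was then rearranged in ascending order according to the index rule, and the index numbers were removed to give sequence set A''.
According to the encoding rule used in Example 1, the nucleic acid sequences of set A'' were transcoded into the corresponding binary code. The binary code of all sequences was concatenated in the index order established above. The original file was then recovered by interpreting this binary code according to the corresponding computer encoding.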
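The three decoding steps of Example 2 map directly onto list operations. The sketch below illustrates them under assumptions; it is not the authors' decoder. The assumptions are:

- reads are error-free and full length;
- the index is a fixed 6-nt region at the 3' end of the trimmed read, read as a base-4 number (A=0, C=1, G=2, T=3);
- the payload uses the A/C = 0, G/T = 1 read-out of the Church code;
- the Reed-Solomon layer is skipped.

Extended indices such as those of sequences 89-A and 89-B would need a different ordering rule.

```python
from typing import List

END_LEN = 18    # universal adapter / partition adapter length (Example 2)
INDEX_LEN = 6   # assumption: fixed 6-nt index at the 3' end of the trimmed read
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed base-4 index code


def index_value(index: str) -> int:
    """Read an index region as a base-4 number (assumed mapping A=0 ... T=3)."""
    value = 0
    for base in index:
        value = value * 4 + BASE_VALUE[base]
    return value


def decode_reads(reads: List[str]) -> str:
    """Set A -> A' (trim 18-nt ends) -> sort by index -> A'' (drop index) -> text."""
    trimmed = [read[END_LEN:-END_LEN] for read in reads]        # set A'
    trimmed.sort(key=lambda s: index_value(s[-INDEX_LEN:]))     # ascending index
    payloads = [s[:-INDEX_LEN] for s in trimmed]                # set A''
    # Church-style read-out assumed: A/C -> 0, G/T -> 1.
    bits = "".join("0" if base in "AC" else "1" for base in "".join(payloads))
    usable = len(bits) - len(bits) % 8                          # whole bytes only
    data = bytes(int(bits[k:k + 8], 2) for k in range(0, usable, 8))
    return data.decode("ascii")
```

Because the order is restored from the index alone, the reads can arrive in any order, as they do after sequencing a pooled library.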

Claims (17)

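  1. A method for fixed-point editing of a nucleic acid sequence storing data, comprising the following steps:
    (1) splitting the nucleic acid sequence storing data into a plurality of sequence fragments, and dividing all the sequence fragments into i partitions, i being a positive integer;
    (2) adding a partition adapter to one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of different partitions differ from one another;
    (3) synthesizing the sequence fragments in each partition described in step (2) as nucleic acid fragments;
    (4) determining the partition n in which the sequence fragment to be edited is located, denoted the n-th partition;
    (5) amplifying, using a partition primer pool, the sequence fragments of all partitions except the n-th partition, wherein the partition primer pool comprises primers at least partially complementary to the partition adapter sequences of the 1st, 2nd, ..., (n-1)-th, (n+1)-th, ..., and i-th partitions respectively, thereby obtaining a library containing the sequence fragments of the 1st, 2nd, ..., (n-1)-th, (n+1)-th, ..., and i-th partitions; and
    (6) correcting the erroneous sequence in the sequence fragment to be edited in the n-th partition and, once the correct sequence is obtained, synthesizing all sequence fragments of the n-th partition according to the correct sequence and adding them to the library of step (5), thereby obtaining a library with correct sequences.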
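  2. The method according to claim 1, further characterized by one or more of the following:
    (a) in step (1), the data are text information, image information or sound information;
    (b) before step (1), the data are encoded into binary data by a first encoding rule, the first encoding rule preferably being a binary encoding rule; and/or
    the binary data are encoded into a nucleic acid sequence by a second encoding rule, thereby obtaining the nucleic acid sequence storing data, the second encoding rule preferably being a Huffman encoding rule, a fountain-code encoding rule, an XOR encoding rule or a Grass encoding rule;
    (c) in step (1), the nucleic acid sequence storing data is split into a plurality of sequence fragments each no longer than 200 nt, all fragments having the same length.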
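  3. The method according to claim 1 or 2, wherein in step (2) the partition adapter is added to one or both ends of the sequence fragments in each partition according to any one of the following rules:
    adding partition adapter A1 to one or both ends of each sequence fragment in the 1st partition, partition adapter A2 to one or both ends of each sequence fragment in the 2nd partition, ..., and partition adapter Ai to one or both ends of all sequence fragments in the i-th partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt;
    adding partition adapter A1 to the 5' end and partition adapter A1' to the 3' end of each sequence fragment in the 1st partition, partition adapter A2 to the 5' end and partition adapter A2' to the 3' end of each sequence fragment in the 2nd partition, ..., and partition adapter Ai to the 5' end and partition adapter Ai' to the 3' end of each sequence fragment in the i-th partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt;
    adding a universal adapter A to the 5' end of the sequence fragments of every partition, and adding partition adapter A1 to the 3' end of each sequence fragment in the 1st partition, partition adapter A2 to the 3' end of each sequence fragment in the 2nd partition, ..., and partition adapter Ai to the 3' end of each sequence fragment in the i-th partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt; or
    adding a universal adapter A to the 3' end of the sequence fragments of every partition, and adding partition adapter A1 to the 5' end of each sequence fragment in the 1st partition, partition adapter A2 to the 5' end of each sequence fragment in the 2nd partition, ..., and partition adapter Ai to the 5' end of each sequence fragment in the i-th partition, wherein the partition adapter sequences differ from one another but have the same length, preferably 16-20 nt.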
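  4. The method according to any one of claims 1-3, wherein the sequence fragments in the library of step (6) are stored in a medium; or the sequence fragments in the library of step (6) are ligated into a vector and the vector is stored in a medium; or the sequence fragments in the library of step (6) are assembled and the assembled sequence fragments are stored in a medium,
    preferably, the medium is selected from liquid phase, dry powder, living cells, or a combination thereof.
  5. The method according to any one of claims 1-4, wherein, after the adapter-tagged sequence fragments are obtained in step (2), an index number is added to the sequence fragments, the index number being adjacent to the partition adapter.
  6. The method according to any one of claims 1-5, wherein the partition adapter is 18 nt long and the index-number sequence is 5-10 nt long, preferably 6 nt.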
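  7. The method according to any one of claims 1-6, wherein the partition n in which the sequence fragment to be edited is located is determined as follows:
    determining the partition n from the encoding rule used when the data were stored, or by sequencing the nucleic acid sequence fragments synthesized in step (3) and performing sequence alignment.
  8. The method according to any one of claims 1-7, wherein multiplex PCR is used to amplify the sequence fragments in step (5),
    preferably, the multiplex PCR is touch-up or touchdown PCR,
    preferably, the polymerase used is selected from Taq, Phusion, Q5, Vent, KlenTaq, or a combination thereof.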
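  9. A decoding method, comprising: sequencing a library obtained by the method according to any one of claims 1-8 to obtain the sequence fragments; obtaining the positional order of the sequence fragments from their index numbers; and joining the sequence fragments, in that positional order, into the nucleic acid sequence storing data,
    optionally, transcoding the acquired nucleic acid sequence storing data into the corresponding binary code and then transcoding the binary code into the corresponding data information.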
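  10. A device for fixed-point editing of a nucleic acid sequence storing data, comprising: a sequence splitting and partitioning module configured to split the nucleic acid sequence storing data into a plurality of sequence fragments and to divide all the sequence fragments into i partitions, i being a positive integer; a partition adapter adding module configured to add a partition adapter to one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of different partitions differ from one another; a nucleic acid synthesis module configured to synthesize the adapter-tagged sequence fragments as nucleic acid fragments; a positioning module configured to determine the partition n in which the sequence fragment to be edited is located, denoted the n-th partition; an amplification module configured to amplify, using a partition primer pool, the sequence fragments of all partitions except the n-th partition, wherein the partition primer pool comprises primers at least partially complementary to the partition adapter sequences of the 1st, 2nd, ..., (n-1)-th, (n+1)-th, ..., and i-th partitions respectively, thereby obtaining a library containing the sequence fragments of the 1st, 2nd, ..., (n-1)-th, (n+1)-th, ..., and i-th partitions; and a correction module configured to correct the erroneous sequence in the sequence fragment to be edited in the n-th partition and, once the correct sequence is obtained, to synthesize all sequence fragments of the n-th partition according to the correct sequence and add them to the library obtained by the amplification module, thereby obtaining a library with correct sequences,
    optionally, the device further comprises an index-number adding module configured to add an index number to the adapter-tagged sequence fragments, the index number being adjacent to the partition adapter.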
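  11. The device according to claim 10, wherein the partition adapters and index numbers are as defined in claim 3 or 6.
  12. The device according to claim 10 or 11, further comprising an assembly module configured to assemble the sequence fragments in the library.
  13. The device according to any one of claims 10-12, further comprising a vector ligation module configured to ligate the sequence fragments in the library into a vector.
  14. The device according to any one of claims 10-13, further comprising a medium storage module configured to store in a medium the sequence fragments in the library, the vectors ligated with sequence fragments, or the assembled sequence fragments,
    preferably, the medium is selected from liquid phase, dry powder, living cells, or a combination thereof.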
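  15. A decoding device, comprising:
    a sequencing module configured to sequence a library obtained by the method according to any one of claims 1-8 to obtain the sequence fragments; a position information acquisition module configured to obtain the positional order of the sequence fragments from their index numbers; and a splicing module configured to join the sequence fragments, in that positional order, into the nucleic acid sequence storing data.
  16. The decoding device according to claim 15, further comprising a transcoding module configured to transcode the nucleic acid sequence storing data into the corresponding binary code and then transcode the binary code into the corresponding data information.
  17. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.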
PCT/CN2018/123858 2018-12-26 2018-12-26 Method and device for fixed-point editing of a nucleic acid sequence storing data WO2020132935A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2018/123858 WO2020132935A1 (zh) 2018-12-26 2018-12-26 Method and device for fixed-point editing of a nucleic acid sequence storing data
EP18944810.3A EP3904527A4 (en) 2018-12-26 2018-12-26 METHOD AND APPARATUS FOR FIXED-POINT EDITING OF A DATA-STORED NUCLEOTIDE SEQUENCE
CN201880100481.XA CN113228193B (zh) 2018-12-26 2018-12-26 Method and device for fixed-point editing of a nucleic acid sequence storing data
US17/417,702 US20220064705A1 (en) 2018-12-26 2018-12-26 Method and device for fixed-point editing of nucleotide sequence with stored data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/123858 WO2020132935A1 (zh) 2018-12-26 2018-12-26 Method and device for fixed-point editing of a nucleic acid sequence storing data

Publications (1)

Publication Number Publication Date
WO2020132935A1 true WO2020132935A1 (zh) 2020-07-02

Family

ID=71125899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123858 WO2020132935A1 (zh) 2018-12-26 2018-12-26 Method and device for fixed-point editing of a nucleic acid sequence storing data

Country Status (4)

Country Link
US (1) US20220064705A1 (zh)
EP (1) EP3904527A4 (zh)
CN (1) CN113228193B (zh)
WO (1) WO2020132935A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758703A (zh) * 2022-06-14 2022-07-15 Shenzhen Institute of Advanced Technology Data information storage method based on recombinant plasmid DNA molecules
CN114958828A (zh) * 2022-06-14 2022-08-30 Shenzhen Institute of Advanced Technology Data information storage method based on DNA molecular media
WO2022203958A1 (en) * 2021-03-24 2022-09-29 Catalog Technologies, Inc. Fixed point number representation and computation circuits
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
WO2023108616A1 (zh) * 2021-12-17 2023-06-22 BGI Shenzhen Method and *** for information storage using DNA
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850760A (zh) * 2015-03-27 2015-08-19 Suzhou Hongxun Biotechnology Co., Ltd. Synthetic DNA storage medium carrying encoded information, method for storing and reading the information, and application thereof
WO2017083177A1 (en) * 2015-11-13 2017-05-18 Microsoft Technology Licensing, Llc Error correction for nucleotide data stores
CN106845158A (zh) * 2017-02-17 2017-06-13 Suzhou Hongxun Biotechnology Joint Stock Co., Ltd. Method for storing information using DNA
WO2018094115A1 (en) * 2016-11-16 2018-05-24 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
WO2018132457A1 (en) * 2017-01-10 2018-07-19 Roswell Biotechnologies, Inc. Methods and systems for dna data storage
WO2018148257A1 (en) * 2017-02-13 2018-08-16 Thomson Licensing Apparatus, method and system for digital information storage in deoxyribonucleic acid (dna)
CN108875312A (zh) * 2012-07-19 2018-11-23 President and Fellows of Harvard College Method for storing information using nucleic acids

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
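CN102682226B (zh) * 2012-04-18 2015-09-30 Sheng Sitong Nucleic acid sequencing information processing *** and method
EP3346404A1 (en) * 2012-06-01 2018-07-11 European Molecular Biology Laboratory High-capacity storage of digital information in DNA
CN105022935A (zh) * 2014-04-22 2015-11-04 Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences Encoding method and decoding method for information storage using DNA
CN104293941B (zh) * 2014-09-30 2017-01-11 Tianjin BGI Genomics Technology Co., Ltd. Method for constructing a sequencing library and application thereof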
US10838939B2 (en) * 2016-10-28 2020-11-17 Integrated Dna Technologies, Inc. DNA data storage using reusable nucleic acids
US10650312B2 (en) * 2016-11-16 2020-05-12 Catalog Technologies, Inc. Nucleic acid-based data storage
SG11201907713WA (en) * 2017-02-22 2019-09-27 Twist Bioscience Corp Nucleic acid based data storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEORGE M. CHURCH; YUAN GAO; SRIRAM KOSURI, SCIENCE, vol. 337, no. 6102, 16 August 2012 (2012-08-16), pages 1628
See also references of EP3904527A4

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11763169B2 (en) 2016-11-16 2023-09-19 Catalog Technologies, Inc. Systems for nucleic acid-based data storage
US11610651B2 (en) 2019-05-09 2023-03-21 Catalog Technologies, Inc. Data structures and operations for searching, computing, and indexing in DNA-based data storage
US11535842B2 (en) 2019-10-11 2022-12-27 Catalog Technologies, Inc. Nucleic acid security and authentication
WO2022203958A1 (en) * 2021-03-24 2022-09-29 Catalog Technologies, Inc. Fixed point number representation and computation circuits
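WO2023108616A1 (zh) * 2021-12-17 2023-06-22 BGI Shenzhen Method and *** for information storage using DNA
CN114758703A (zh) * 2022-06-14 2022-07-15 Shenzhen Institute of Advanced Technology Data information storage method based on recombinant plasmid DNA molecules
CN114958828A (zh) * 2022-06-14 2022-08-30 Shenzhen Institute of Advanced Technology Data information storage method based on DNA molecular media
CN114758703B (zh) * 2022-06-14 2022-09-13 Shenzhen Institute of Advanced Technology Data information storage method based on recombinant plasmid DNA molecules
CN114958828B (zh) * 2022-06-14 2024-04-19 Shenzhen Institute of Advanced Technology Data information storage method based on DNA molecular media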

Also Published As

Publication number Publication date
EP3904527A4 (en) 2022-08-10
CN113228193B (zh) 2023-06-09
US20220064705A1 (en) 2022-03-03
CN113228193A (zh) 2021-08-06
EP3904527A1 (en) 2021-11-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18944810

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018944810

Country of ref document: EP

Effective date: 20210726