CN117955629A

CN117955629A - Encryption encoding method and device in DNA storage, electronic equipment and storage medium

Info

Publication number: CN117955629A
Application number: CN202311738756.1A
Authority: CN
Inventors: 袁涛; 曲强
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2023-12-15
Filing date: 2023-12-15
Publication date: 2024-04-30

Abstract

The application provides an encryption encoding method and device in DNA storage, an electronic device and a storage medium, and relates to the technical field of DNA storage. The encryption encoding method comprises the following steps: acquiring data to be stored, and a first random number sequence and a second random number sequence generated by a hyperchaotic pseudo-random sequence generator; encrypting the data to be stored by using the first random number sequence to obtain source data; performing DNA Raptor coding on the source data, and introducing the second random number sequence into the DNA Raptor coding process as a random number seed, so as to obtain at least one piece of coded data; based on DNA base mapping rules, converting each piece of coded data into a corresponding DNA sequence, and generating a target DNA sequence from the DNA sequences corresponding to the pieces of coded data. The application solves the problem in the related art that data security is not considered in the DNA storage process.

Description

Encryption encoding method and device in DNA storage, electronic equipment and storage medium

Technical Field

The application relates to the technical field of DNA storage, in particular to an encryption coding method, an encryption coding device, electronic equipment and a storage medium in DNA storage.

Background

DNA is a promising storage medium with high density, long durability and low maintenance costs compared to conventional storage media. Theoretically, all the information in human history could be stored in a space approximately the size of a double garage. These characteristics make DNA an ideal choice for information storage, with large-scale practical applications expected in the future, so data storage with DNA is becoming a research hotspot at the intersection of computer science and biology. The general flow of DNA storage is divided into six steps: encoding, synthesis, storage, retrieval, sequencing and decoding. To meet the requirements of DNA storage, a large number of coding methods have been proposed from the viewpoints of cost and the related biochemical technology. Mainstream coding methods can be divided into two categories: the first category encodes by mapping according to fixed rules, e.g., DNA cryptography; the second category encodes by screening, e.g., DNA fountain codes.

However, both categories of mainstream coding have major limitations. The DNA fountain code is well suited to DNA storage but does not consider data security: the data is completely transparent in the encoding and decoding process, so privacy is easily leaked. In theory, the data can first be encrypted by a conventional encryption method and then encoded, but this makes the encoding process more complex, makes security depend entirely on the conventional encryption algorithm, and leaves the encoded data comparatively easy to decode. Research on encryption in DNA cryptography, for its part, only encrypts by using DNA base calculation rules and mapping rules, and cannot directly generate DNA sequences that meet the DNA storage requirements of high information density and specific constraint conditions.

It follows that how to ensure data security during DNA storage remains to be resolved.

Disclosure of Invention

In order to solve the technical problems, embodiments of the present application provide an encryption encoding method, an encryption encoding device, an electronic device, and a storage medium in DNA storage. The technical scheme is as follows:

According to one aspect of the present application, an encryption encoding method in DNA storage comprises: acquiring data to be stored, and a first random number sequence and a second random number sequence generated by a hyperchaotic pseudo-random sequence generator; encrypting the data to be stored by using the first random number sequence to obtain source data; performing DNA Raptor coding on the source data, and introducing the second random number sequence into the DNA Raptor coding process as a random number seed, to obtain at least one piece of coded data; based on DNA base mapping rules, converting each piece of coded data into a corresponding DNA sequence, and generating a target DNA sequence from the DNA sequences corresponding to the pieces of coded data.

According to one aspect of the present application, an encryption encoding apparatus in DNA storage comprises: a data acquisition module, used for acquiring data to be stored, and a first random number sequence and a second random number sequence generated by a hyperchaotic pseudo-random sequence generator; a data encryption module, used for encrypting the data to be stored by using the first random number sequence to obtain source data; a DNA encoding module, used for performing DNA Raptor coding on the source data and introducing the second random number sequence into the DNA Raptor coding process as a random number seed, to obtain at least one piece of coded data; and a DNA mapping module, used for converting each piece of coded data into a corresponding DNA sequence based on DNA base mapping rules and generating a target DNA sequence from the DNA sequences corresponding to the pieces of coded data.

According to one aspect of the application, an electronic device comprises at least one processor and at least one memory, wherein the memory has computer readable instructions stored thereon; the computer readable instructions are executed by one or more of the processors to cause the electronic device to implement the encryption encoding method in DNA storage as described above.

According to one aspect of the application, a storage medium has stored thereon computer readable instructions that are executed by one or more processors to implement the encryption encoding method in DNA storage as described above.

According to one aspect of the application, a computer program product includes computer readable instructions stored in a storage medium; one or more processors of an electronic device read the computer readable instructions from the storage medium, and load and execute them, causing the electronic device to implement the encryption encoding method in DNA storage as described above.

The technical scheme provided by the application has the beneficial effects that:

In the above technical scheme, a first random number sequence and a second random number sequence are first generated by a hyperchaotic pseudo-random sequence generator. The first random number sequence is then used to encrypt the data to be stored to obtain source data, and the source data is DNA Raptor coded, with the second random number sequence introduced into the DNA Raptor coding as a random number seed, to obtain at least one piece of coded data. Finally, a target DNA sequence is generated from the DNA sequences obtained by converting the pieces of coded data according to a DNA base mapping rule. The scheme thus addresses the requirements of both encoding and encryption in DNA storage: by combining a hyperchaotic pseudo-random sequence generator from chaotic systems with DNA Raptor codes from DNA fountain codes, it integrates the data encryption process into the DNA encoding process, so that data security in the DNA storage process is fully ensured and the problem in the related art that data security is not considered in the DNA storage process can be effectively solved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the application and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a hardware block diagram of an electronic device shown in accordance with an exemplary embodiment;

FIG. 2 is a flow chart illustrating a method of encryption encoding in a DNA store, according to an exemplary embodiment;

FIG. 3 is a schematic diagram illustrating a pseudo-random sequence generation process according to an example embodiment;

FIG. 4 is a schematic diagram illustrating encryption using a first random sequence of numbers, according to an example embodiment;

FIG. 5a is a schematic diagram of a constraint matrix shown according to an exemplary embodiment;

FIG. 5b is a schematic diagram of DNA Raptor coding shown according to an example embodiment;

FIG. 6 is a schematic diagram illustrating DNA mapping and screening according to an exemplary embodiment;

FIG. 7 is a schematic diagram of an implementation of an encryption encoding method in a DNA storage in an application scenario;

FIG. 7a is a schematic diagram of data to be stored related to the application scenario shown in FIG. 7;

FIG. 7b is a schematic diagram of a target DNA sequence related to the application scenario shown in FIG. 7;

FIG. 8 is a block diagram illustrating a structure of an encryption encoding apparatus in a DNA storage according to an exemplary embodiment;

FIG. 9 is a block diagram of an electronic device, according to an example embodiment.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The following is an introduction and explanation of several terms involved in the present application:

DNA storage: DNA storage is a data storage method that uses DNA (deoxyribonucleic acid) molecules as a medium to store digital data. DNA is a molecule that stores genetic information in organisms, has excellent information density and long-term stability, and is therefore considered to be a potentially efficient, high-capacity, long-lasting data storage medium. The basic idea of DNA storage is to encode digital data into DNA sequences, which are then synthesized by synthetic chemistry and stored in test tubes or other containers. When the data is required to be retrieved, the DNA sequence can be read out by DNA sequencing technology, decoded and restored into digital data.

DNA cryptography: DNA cryptography refers to the field of information encryption and storage using the characteristics and molecular structure of DNA. The related research encrypts data physically or logically by means of biochemical technology, DNA encoding/decoding, base calculation rules and other methods, and finally synthesizes specific DNA sequences.

Chaotic system: a chaotic system is a dynamic system that is deterministic, aperiodic, and extremely sensitive to initial conditions. Such a system exhibits seemingly disordered, complex and unpredictable behavior even though its law of motion is described by a simple nonlinear equation or rule. In practical applications, chaotic systems are also used for data encryption, random number generation, information hiding and other purposes, and chaotic cryptography based on chaotic systems has become a new branch of cryptography.

Fountain code: fountain codes are error correction coding techniques derived from research in the field of communications, and aim to meet the amount of information required by a receiving end by generating an unlimited number of coded symbols at the transmitting end, without knowing the length of transmitted data in advance. The method has the characteristics of no need of fixed length, randomness and high-efficiency error correction capability.

Current research in the field of DNA storage coding focuses on coding methods that achieve high information density (the number of binary data bits represented by each nucleotide) under specified constraint conditions (arising from biochemical technologies such as DNA synthesis, storage and sequencing), and pays less attention to information security in the DNA storage process. Conventional DNA cryptography, in turn, takes no measures to handle the sequence errors produced by the DNA storage process (i.e., errors in the stored data), which may result in incomplete or incorrect decrypted data.

Therefore, the encryption encoding method in the DNA storage provided by the application can effectively improve the data security in the DNA storage, and is correspondingly applicable to the encryption encoding device in the DNA storage, wherein the encryption encoding device in the DNA storage can be deployed in electronic equipment, and the electronic equipment can be computer equipment with a von Neumann architecture, for example, the computer equipment comprises a desktop computer, a notebook computer, a server and the like.

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.

Fig. 1 shows a schematic structure of an electronic device according to an exemplary embodiment.

It should be noted that the electronic device is only an example adapted to the present application, and should not be construed as providing any limitation on the scope of use of the present application. Nor should the electronic device be construed as necessarily relying on or necessarily having one or more of the components of the exemplary electronic device 200 shown in fig. 1.

The hardware structure of the electronic device 200 may vary widely depending on configuration or performance. As shown in fig. 1, the electronic device 200 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.

Specifically, the power supply 210 is configured to provide an operating voltage for each hardware device on the electronic device 200.

The interface 230 includes at least one wired or wireless network interface 231 for interacting with external devices. Of course, in other examples of the adaptation of the present application, the interface 230 may further include at least one serial-parallel conversion interface 233, at least one input-output interface 235, at least one USB interface 237, and the like, as shown in fig. 1, which is not particularly limited herein.

The memory 250 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, where the resources stored include an operating system 251, application programs 253, and data 255, and the storage mode may be transient storage or permanent storage.

The operating system 251 is used for managing and controlling the hardware devices and the application programs 253 on the electronic device 200, so that the central processor 270 can operate on and process the mass data 255 in the memory 250. It may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The application program 253 is a computer program that performs at least one specific task on top of the operating system 251. It may include at least one module (not shown in fig. 1), each of which may contain computer readable instructions for the electronic device 200. For example, the encryption encoding device in DNA storage may be regarded as an application program 253 deployed on the electronic device 200.

The data 255 may be a photograph, a picture, or the like stored in a disk, or may be a target file, a DNA sequence, or the like, and stored in the memory 250.

The central processor 270 may include one or more processors and is configured to communicate with the memory 250 via at least one communication bus to read the computer readable instructions stored in the memory 250, thereby operating on and processing the mass data 255 in the memory 250. For example, the encryption encoding method in DNA storage is accomplished by the central processor 270 reading a series of computer readable instructions stored in the memory 250.

Furthermore, the present application can be realized by hardware circuitry or by a combination of hardware circuitry and software, and thus, the implementation of the present application is not limited to any specific hardware circuitry, software, or combination of the two.

Referring to fig. 2, an embodiment of the present application provides an encryption encoding method in DNA storage, which is applicable to an electronic device, and a hardware structure of the electronic device is shown in fig. 1.

In the following method embodiments, for convenience of description, the execution subject of each step of the method is described as an electronic device, but this configuration is not particularly limited.

As shown in fig. 2, the method may include the steps of:

Step 310, obtaining data to be stored, and a first random number sequence and a second random number sequence generated by a hyperchaotic pseudo-random sequence generator.

The data to be stored may be any type of data, such as text, pictures, audio, video, etc., and is not limited herein.

The first random number sequence is used for encrypting the data to be stored, and the second random number sequence is used for encoding the data to be stored. In some embodiments, the first random number sequence and the second random number sequence are generated by a hyperchaotic pseudorandom sequence generator.

In some embodiments, the hyperchaotic pseudo-random sequence generator is derived using the operations of a five-dimensional hyperchaotic system and the accompanying perturbation, floating point rounding, majority processing, variable exclusive or, and other additional operations. The calculation formula of the five-dimensional hyper-chaotic system is as follows:

Where a, b, c, d, e, h, r, m, k, p and q are constant parameters and x, y, z, w and u are state variables.

FIG. 3 illustrates a schematic diagram of constructing a hyperchaotic pseudo-random sequence generator using a five-dimensional hyperchaotic system in one embodiment. In fig. 3, the input of the hyperchaotic pseudo-random sequence generator comprises the state variables x, y, z, w and u and the sequence length n of the first random number sequence or the second random number sequence; the output is the first random number sequence or the second random number sequence, each of which contains n keys.

As shown in fig. 3, the process of generating the first random number sequence and the second random number sequence by the hyperchaotic pseudo-random sequence generator may include the following steps. In the first step, the initial parameters a, b, c, d, e, h, r, m, k, p and q configured for the hyperchaotic pseudo-random sequence generator are initialized. In the second step, the first key and the second key are respectively input into the hyperchaotic pseudo-random sequence generator to perform the pseudo-random sequence generation operation, so as to obtain the corresponding first random number sequence and second random number sequence. Specifically, in the second step, while the length of the sequence being generated has not reached n, a perturbation is applied at fixed intervals, the calculation formula of the hyperchaotic system is iterated a specified number of times, and at least one group of a specified number of state variables is selected to generate key values, which are appended to the sequence, until its length reaches n. The key value for each group of state variables is generated by converting each real number into a fixed-length integer and performing exclusive OR on the five integer parts.
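The generation loop described above can be illustrated with a minimal Python sketch. The five-dimensional system itself is not reproduced here, so `toy_step` is a hypothetical stand-in (five coupled logistic maps), not the system of this application; the 32-bit integer width, the iteration count and the perturbation interval are likewise assumptions chosen only for illustration.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float, float]


def toy_step(state: State) -> State:
    """Stand-in 5-variable map (NOT the patent's system): coupled logistic maps."""
    shifted = state[1:] + state[:1]
    return tuple((3.99 * a * (1.0 - a) + 0.1 * b) % 1.0 for a, b in zip(state, shifted))


def real_to_uint32(value: float) -> int:
    """Convert a real number into a fixed-length (32-bit) integer (assumed width)."""
    return int((abs(value) % 1.0) * 2**32) & 0xFFFFFFFF


def generate_sequence(key: State, n: int, iterations: int = 8,
                      perturb_every: int = 16) -> List[int]:
    """Produce n key values from an initial state, which plays the role of the key."""
    state = key
    sequence: List[int] = []
    while len(sequence) < n:
        # Apply a small perturbation to one state variable at fixed intervals.
        if sequence and len(sequence) % perturb_every == 0:
            state = ((state[0] + 1e-6) % 1.0,) + state[1:]
        # Iterate the system a specified number of times.
        for _ in range(iterations):
            state = toy_step(state)
        # XOR the five integer parts into a single key value.
        key_value = 0
        for variable in state:
            key_value ^= real_to_uint32(variable)
        sequence.append(key_value)
    return sequence


first_sequence = generate_sequence((0.11, 0.22, 0.33, 0.44, 0.55), 40)
```

The same function called with a different initial state would play the role of the second random number sequence; the sketch makes no claim about the statistical quality of the toy map.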

Step 330, encrypting the data to be stored by using the first random number sequence to obtain the source data.

In some embodiments, the encryption process may include the steps of: acquiring a global hash value of the data to be stored, and preprocessing the data to be stored by using the global hash value to obtain a plurality of coding blocks; and performing a logic operation on the first random number sequence and each coding block to obtain the source data.

FIG. 4 is a schematic diagram illustrating encryption using the first random number sequence. In fig. 4, on the one hand, a hash operation is performed on the data to be stored (e.g., a target file) using the SHA256 algorithm to obtain a global hash value, which is stored in the first coding block; this block is identified as 0, i.e., it is coding block 0. On the other hand, the data to be stored is partitioned into a plurality of coding blocks identified as 1 to n, i.e., coding blocks 1, 2, ..., n. After the n+1 coding blocks are obtained, an encryption traversal is performed on them: a logic operation is performed on coding block 0 and coding block 1, and the result is stored in coding block 1; a logic operation is performed on coding block 1 and coding block 2, and the result is stored in coding block 2; and so on, until a logic operation is performed on coding block n-1 and coding block n and the result is stored in coding block n. When the traversal of coding block n is complete, n+1 processed coding blocks are obtained. The source data is then obtained through a logic operation on these n+1 coding blocks and the first random number sequence, which completes the encryption of the data to be stored.

It should be noted that the logic operation in this embodiment is exclusive OR (XOR). Of course, in other embodiments, the logic operation may further include, but is not limited to, AND, OR, NOT, NOR, and the like, which is not specifically limited in this embodiment.
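The encryption traversal can be sketched in Python as a chained XOR followed by an XOR with a keystream. This is only an illustration under stated assumptions: the 8-byte block size, the function names and the use of `random.Random` as a stand-in for the first random number sequence are all hypothetical. Because XOR is its own inverse, the same operations applied in reverse order recover the original blocks.

```python
import hashlib
import random
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_blocks(blocks: List[bytes], keystream: List[bytes]) -> List[bytes]:
    """Chain-XOR blocks 0..n, then XOR every block with the keystream."""
    chained = list(blocks)
    for i in range(1, len(chained)):
        # Block i is combined with the already-updated block i-1.
        chained[i] = xor_bytes(chained[i - 1], chained[i])
    return [xor_bytes(block, key) for block, key in zip(chained, keystream)]


def decrypt_blocks(source: List[bytes], keystream: List[bytes]) -> List[bytes]:
    """Invert encrypt_blocks: strip the keystream, then undo the chain."""
    chained = [xor_bytes(block, key) for block, key in zip(source, keystream)]
    return [chained[0]] + [
        xor_bytes(chained[i - 1], chained[i]) for i in range(1, len(chained))
    ]


# Usage: block 0 holds part of the SHA-256 digest, blocks 1..n hold the data.
data_blocks = [b"ACGTACGT", b"storage!", b"example."]
block0 = hashlib.sha256(b"".join(data_blocks)).digest()[:8]
rng = random.Random(2023)  # stand-in for the first random number sequence
keystream = [bytes(rng.getrandbits(8) for _ in range(8)) for _ in range(4)]
source = encrypt_blocks([block0] + data_blocks, keystream)
```

Storing the hash in block 0 means every later block depends on the whole file, so a change anywhere in the input changes all chained blocks.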

Step 350, performing DNA Raptor coding on the source data, and introducing the second random number sequence into the DNA Raptor coding process as reference table data, so as to obtain at least one piece of coded data.

In some embodiments, DNA Raptor encoding includes constraint-matrix-based precoding and LT encoding. The precoding uses a constraint matrix to encode the source data into a plurality of intermediate symbols; the LT encoding is used to encode the plurality of intermediate symbols into a plurality of pieces of coded data.

Specifically, the DNA Raptor encoding process may include the following steps: extending the source data, and precoding the extended source data using the constraint matrix to obtain a plurality of intermediate symbols; performing LT coding on the intermediate symbols to obtain a plurality of pieces of coded data, where the pieces of coded data correspond to the intermediate symbols one-to-one. It should be noted that extension here refers to dividing the source data into a plurality of source symbols, taking K source symbols as a source block, adding (S+H) zero elements before the K source symbols in each source block to form extended source symbols, and finally obtaining the extended source data from the plurality of extended source symbols.

Fig. 5a shows a schematic diagram of the constraint matrix in one embodiment. In fig. 5a, the constraint matrix is composed of G_LDPC, G_Half, I_S, I_H, Z and G_LT, where G_LDPC is the S×K-dimensional matrix of the LDPC symbols; G_Half is the H×(K+S)-dimensional matrix of the Half symbols (Gray code); I_S is the S×S-dimensional identity matrix; I_H is the H×H-dimensional identity matrix; Z is the S×H-dimensional zero matrix; and G_LT is the K×L-dimensional matrix of the LT code symbols. In some embodiments, the G_LT matrix in the constraint matrix, as well as the degree values and random numbers in the LT encoding process, are each generated using a random number generator that uses the second random number sequence as reference table data.
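Since fig. 5a is not reproduced in this text, the block layout below is a reconstruction from the stated dimensions only. It assumes the arrangement used for the standard Raptor code (RFC 5053), which is built from the same six blocks with the same sizes; the drawing in fig. 5a may differ in presentation.

```latex
A =
\left(
\begin{array}{ccc}
G_{\mathrm{LDPC}} & I_S & Z \\
\multicolumn{2}{c}{G_{\mathrm{Half}}} & I_H \\
\multicolumn{3}{c}{G_{\mathrm{LT}}}
\end{array}
\right),
\qquad
L = K + S + H
```

With this layout the three block rows have S, H and K rows respectively and each spans K + S + H = L columns, so A is an L×L matrix, which is consistent with precoding by the inverse of A.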

Fig. 5b shows a schematic diagram of the DNA Raptor coding process in one embodiment. As shown in fig. 5b, the extended source data D is first precoded using the constraint matrix A, that is, the extended source data D is multiplied by the inverse of the constraint matrix A to obtain L intermediate symbols C. The intermediate symbols are then LT-coded to obtain n pieces of coded data, as follows: in the first step, an integer value d is randomly selected from the range 1 to n according to a degree distribution function and used as the degree value of the LT coding; in the second step, d random numbers in the range 1 to n are generated, the intermediate symbols corresponding to these random numbers are selected from the n intermediate symbols, and an XOR operation is performed on the selected intermediate symbols to generate one piece of coded data; in the third step, the first and second steps are repeated until n pieces of coded data are generated. In the DNA Raptor coding process, a random number generator (namely a random function R(x)) is introduced, and its random basis tables V0 and V1 are formed from the second random number sequence, which the hyperchaotic pseudo-random sequence generator produces from the key Key1.
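The three LT coding steps can be sketched in Python as follows. The degree distribution function is not specified in the text, so the ideal soliton distribution is used here purely as an assumption, and `random.Random(seed)` is a hypothetical stand-in for the key-driven random number generator; indices are 0-based in the code.

```python
import random
from typing import List, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ideal_soliton_degree(rng: random.Random, n: int) -> int:
    """Sample a degree in 1..n: P(1) = 1/n, P(d) = 1/(d(d-1)) for d >= 2."""
    u = rng.random()
    cumulative = 1.0 / n
    if u < cumulative:
        return 1
    for d in range(2, n + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return n  # guard against floating-point round-off


def lt_encode_symbol(intermediate: List[bytes], seed: int) -> Tuple[List[int], bytes]:
    """Generate one piece of coded data from the intermediate symbols."""
    rng = random.Random(seed)  # stand-in for the seeded generator
    n = len(intermediate)
    degree = ideal_soliton_degree(rng, n)        # step 1: degree value d
    indices = sorted(rng.sample(range(n), degree))  # step 2: d distinct symbols
    coded = bytes(len(intermediate[0]))
    for index in indices:
        coded = xor_bytes(coded, intermediate[index])
    return indices, coded


# Step 3: repeat with a new seed for every piece of coded data.
symbols = [bytes([i] * 4) for i in range(1, 9)]
coded_data = [lt_encode_symbol(symbols, seed) for seed in range(len(symbols))]
```

Because each piece of coded data is fully determined by its seed, a decoder that knows the seed and the generator can recompute which intermediate symbols were combined.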

Step 370, based on the DNA base mapping rule, each coded data is converted into a corresponding DNA sequence, and a target DNA sequence is generated from the DNA sequence corresponding to each coded data.

After a plurality of encoded data are obtained, each encoded data can be mapped into a corresponding DNA sequence according to a given mapping rule.

In some embodiments, the given mapping rule comprises a DNA base mapping rule that substantially reflects the correspondence between different binary data and different DNA bases.

Specifically, as shown in fig. 6, the mapping process based on the DNA base mapping rule may include the following steps: for the n pieces of coded data, determining the DNA base mapping rule corresponding to each piece of coded data, and generating a corresponding DNA sequence from each piece of coded data according to the determined DNA base mapping rule; taking the identifier of the coded data as a random number seed, and acquiring an error correction code generated for the coded data; according to the determined DNA base mapping rule, converting the random number seed and the error correction code into corresponding DNA bases, and adding the converted DNA bases to the DNA sequence corresponding to the coded data. It should be noted that the DNA bases corresponding to the random number seed may be added before the DNA sequence, and the DNA bases corresponding to the error correction code may be added after the DNA sequence, which is not limited herein.

In this way, redundancy is added by increasing the length of the DNA sequence, which gives the sequence a certain error correction capability, so that the stored information can still be restored to the original file under a certain error rate.
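As an illustration of such a mapping, the Python sketch below assumes one common two-bits-per-base rule (00→A, 01→C, 10→G, 11→T); the actual rule selected for each piece of coded data is not given in the text. The 4-byte seed width and the function names are also assumptions, and the error correction code is passed in as ready-made bytes because its construction is not specified.

```python
BASES = "ACGT"  # assumed rule: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Map every byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(sequence: str) -> bytes:
    """Inverse mapping; the sequence length must be a multiple of four."""
    out = bytearray()
    for start in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def build_strand(seed: int, payload: bytes, ecc: bytes) -> str:
    """Seed bases before the payload, error correction bases after it."""
    return (
        bytes_to_dna(seed.to_bytes(4, "big"))
        + bytes_to_dna(payload)
        + bytes_to_dna(ecc)
    )


strand = build_strand(seed=1, payload=b"\x1b", ecc=b"\xff")
```

At two bits per base this mapping reaches the maximum density of an unconstrained four-letter alphabet; constraint handling is left to the later screening step.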

After the DNA sequences corresponding to the coding data are obtained, the DNA sequences are also screened according to given constraint conditions, and finally the stored target DNA sequences are obtained.

In some embodiments, the given constraint includes a biological constraint including, but not limited to: homopolymers, GC content, palindromic sequences, and the like.

Specifically, with continued reference to fig. 6, the biological constraint-based screening process may include the steps of: performing biological constraint detection on the DNA sequences corresponding to the coded data; regenerating the coded data corresponding to the DNA sequence which does not accord with the biological constraint through LT codes in the DNA Raptor codes until the DNA sequence corresponding to the regenerated coded data accords with the biological constraint; the target DNA sequence is generated from all n DNA sequences that meet biological constraints.
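The screening loop can be sketched in Python as below. The thresholds (homopolymer runs of at most 3 bases, GC content between 40% and 60%) are illustrative assumptions, not values from this application; palindromic-sequence checks are omitted for brevity; and `make_strand` is a hypothetical callback standing for "regenerate the coded data by LT coding and map it to bases".

```python
from typing import Callable, Tuple


def max_homopolymer(sequence: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = ""
    for base in sequence:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def gc_content(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def meets_constraints(sequence: str, max_run: int = 3,
                      gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check the (assumed) homopolymer and GC-content thresholds."""
    return (max_homopolymer(sequence) <= max_run
            and gc_min <= gc_content(sequence) <= gc_max)


def first_valid_strand(make_strand: Callable[[int], str], start_seed: int,
                       max_tries: int = 1000) -> Tuple[int, str]:
    """Regenerate with successive seeds until a strand passes screening."""
    for seed in range(start_seed, start_seed + max_tries):
        strand = make_strand(seed)
        if meets_constraints(strand):
            return seed, strand
    raise RuntimeError("no strand satisfied the constraints")
```

A fountain code suits this kind of rejection sampling because any rejected piece of coded data can be replaced by a freshly generated one without affecting the others.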

Through the above process, the requirements of both encoding and encryption in DNA storage are addressed: a hyperchaotic pseudo-random sequence generator from chaotic systems and DNA Raptor codes from DNA fountain codes are combined in one DNA encryption encoding scheme, so as to integrate the data encryption process into the DNA encoding process, thereby fully guaranteeing data security in the DNA storage process and effectively solving the problem in the related art that data security is not considered in the DNA storage process.

In addition, the method can meet various custom constraint conditions while guaranteeing the high information density of DNA storage, and solve the problem of a small number of sequence errors generated in the DNA storage process.

Fig. 7 is a schematic diagram of an implementation of an encryption encoding method in DNA storage in an application scenario. Such application scenarios include, but are not limited to: semiconductor application scenarios, biological application scenarios, etc.

In this application scenario, the encryption encoding flow in DNA storage is divided into four parts: hyperchaotic pseudo-random sequence generation, file preprocessing, DNA Raptor coding, and mapping and screening. The specific steps are as follows:

The first step is hyperchaotic pseudo-random sequence generation, with the following steps:

(1) The initial parameters of the hyperchaotic system are determined according to the key Key1 (a group of state variables), and one hyperchaotic pseudo-random sequence generation operation is performed to obtain 512 4-byte unsigned integers, which are used as the data of the reference tables V0 and V1 (256 4-byte unsigned integers each) of the random number generator Rand[X, i, m] in the Raptor code. Rand[X, i, m] is used to generate the random numbers for the LT encoding process and is defined as follows:

Here X is the input value, i is an index variable, and m is the modulus applied to the result. The values taken from V0 and V1 are combined with a bitwise XOR operation, and a modulo operation is then applied to obtain the final result, i.e., an integer between 0 and m-1.

(2) The random number sequence used for the subsequent encryption is generated by the same pseudo-random sequence generator according to the key Key2.
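The table-lookup generator described in step (1) can be sketched as follows. The sketch assumes the form of Rand[X, i, m] given in the Raptor code specification RFC 5053, which matches the description above (two 256-entry tables, a bitwise XOR, then a modulo). The function make_tables is a hypothetical stand-in that fills V0 and V1 from Python's random module; it is not the Key1-driven hyperchaotic generator of the application.

```python
import random


def make_tables(seed):
    """Hypothetical stand-in for the Key1-driven hyperchaotic generator.

    Returns two reference tables V0 and V1, each holding 256 unsigned
    32-bit integers (512 values in total, as in the text).
    """
    rng = random.Random(seed)
    values = [rng.getrandbits(32) for _ in range(512)]
    return values[:256], values[256:]


def rand(x, i, m, v0, v1):
    """Rand[X, i, m] in the form used by RFC 5053 (assumed here)."""
    return (v0[(x + i) % 256] ^ v1[(x // 256 + i) % 256]) % m


v0, v1 = make_tables(seed=2023)
# Every result lies between 0 and m-1; here m = 383 is only an example value.
samples = [rand(x, 1, 383, v0, v1) for x in range(1000)]
```

Because the tables are the only keyed input, two parties that derive the same V0 and V1 obtain the same random numbers, which is what lets the decoder reproduce the encoder's choices.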

The second step is file preprocessing, which comprises the following steps:

(1) Blocking. The target file is divided into blocks of 30 bytes each, referred to as coding blocks. The blocks are numbered, and each block has a corresponding unique 4-byte id (the last 2 bytes record the block number); the id does not participate in encryption. Meanwhile, a global hash value of the whole file is calculated with SHA256, and 30 bytes of it are taken as the initial hash key and stored in the first coding block, so that the blocks of the target file start from the second coding block. The maximum block number is 20000, and the number of blocks is recorded in the first 2 bytes of the id, so that the id consists of the number of blocks and the block number.

(2) Encryption. Each coding block is XORed with the hash key, and the result is taken as the new hash key. The constructed hyperchaotic pseudo-random sequence generator then generates 20 random numbers, which are XORed with the hash-XORed coding block data to complete the encryption.
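The blocking and encryption steps above can be sketched as follows, under several labeled assumptions: the initial hash key is taken as the first 30 bytes of the SHA256 digest, the last block is zero-padded, the keystream function is a hypothetical stand-in for the Key2-driven hyperchaotic generator, and the keystream is applied as 30 bytes per block (the text specifies 20 random numbers per block without giving their width). The 4-byte ids are left out because they do not participate in encryption.

```python
import hashlib
import random

BLOCK_SIZE = 30  # bytes per coding block, as in the text


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data):
    """Block 0 holds the initial hash key; file data starts at block 1."""
    hash_key = hashlib.sha256(data).digest()[:BLOCK_SIZE]  # assumption: first 30 bytes
    padded = data + b"\x00" * (-len(data) % BLOCK_SIZE)    # assumption: zero padding
    chunks = [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]
    return [hash_key] + chunks


def keystream(seed, n_blocks):
    """Hypothetical stand-in for the Key2-driven hyperchaotic generator."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE)) for _ in range(n_blocks)]


def encrypt(blocks, seed):
    key = blocks[0]
    chained = [key]
    for block in blocks[1:]:
        key = xor_bytes(block, key)  # the result becomes the new hash key
        chained.append(key)
    return [xor_bytes(c, k) for c, k in zip(chained, keystream(seed, len(chained)))]


def decrypt(cipher, seed):
    chained = [xor_bytes(c, k) for c, k in zip(cipher, keystream(seed, len(cipher)))]
    plain = [chained[0]]
    for prev, cur in zip(chained, chained[1:]):
        plain.append(xor_bytes(cur, prev))  # undo the chaining with the previous block
    return plain
```

Chaining each block to the previous result means that a change in one block propagates to all later blocks, and the same key reproduces the keystream so that every step can be reversed.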

The third step is DNA Raptor encoding, which comprises the following steps:

(1) Precoding to generate intermediate symbols. The number of blocks K is counted, and all coding blocks form the source data matrix D, from which the intermediate symbols C are generated as shown in fig. 5a; the G_LT part in fig. 5a is generated by the random number generator Rand[X, i, m]. Taking 10 KB of data as an example, K = 342, and a table lookup gives the corresponding S = 31 and H = 10, so L = K + S + H = 383 (the parameters corresponding to different K are recorded in a dedicated table). Finally, 383 intermediate symbols are generated by multiplying the source data matrix D by the inverse of matrix A.

(2) LT encoding. The intermediate symbols are LT-encoded to generate the encoded data. The degree value and the random numbers used in LT encoding are generated by the random number generator Rand[X, i, m], with each intermediate symbol corresponding to one row of encoded data. LT encoding must generate more encoded symbols than the number of source symbols (i.e., the number of blocks K), so the lowest redundancy is selected, ultimately resulting in 350 rows of encoded data. By setting a higher redundancy, a corresponding amount of additional encoded data can be generated.
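The core of LT encoding, XORing a pseudo-randomly chosen set of intermediate symbols into one encoded symbol, can be sketched as follows. This is a simplified illustration and not the Raptor degree and triple generators: the degree is drawn uniformly from 1 to max_degree (a toy distribution assumed here), and Python's random module, seeded by a hypothetical symbol_id, stands in for Rand[X, i, m].

```python
import random


def lt_encode_symbol(intermediate, symbol_id, max_degree=4):
    """Build one encoded symbol from a list of equal-length intermediate symbols.

    Returns the indices of the chosen intermediate symbols and the XOR of
    those symbols. The same symbol_id always selects the same indices.
    """
    rng = random.Random(symbol_id)        # stand-in for Rand[X, i, m]
    degree = rng.randint(1, max_degree)   # toy degree distribution (assumption)
    neighbours = rng.sample(range(len(intermediate)), degree)
    encoded = bytes(len(intermediate[0]))  # all-zero symbol to start the XOR
    for index in neighbours:
        encoded = bytes(a ^ b for a, b in zip(encoded, intermediate[index]))
    return neighbours, encoded
```

Since the selection depends only on the seed, a sequence that fails the later biological screening can be replaced simply by encoding again with a different seed.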

The fourth step is mapping and screening, which comprises the following steps:

(1) Mapping. The rule for mapping 1 byte of binary data to DNA bases is determined by the id of each piece of encoded data modulo 8, and each byte yields a 4 nt DNA sequence. The DNA mapping rules are shown in Table 1. Each row of encoded data thus generates a 120 nt DNA sequence. In addition, a 4-byte random number seed (i.e., the block id) is added before the sequence, and a 2-byte RS error correction code generated from the 30-byte coding block is added after the sequence; these 6 bytes of data are also converted into DNA bases according to the mapping rule. A 144 nt DNA sequence is finally obtained.

Table 1 DNA mapping rules

(2) Screening. Biological constraint detection, namely homopolymer detection and GC content detection, is performed on each DNA sequence. According to the detection result, the DNA sequences that pass are retained and those that do not meet the constraints are removed, and the steps are repeated from LT encoding until all encoded data have been converted into qualified DNA sequences.
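The mapping and screening steps can be sketched as follows. The sketch assumes a single hypothetical mapping rule (00 to A, 01 to C, 10 to G, 11 to T) in place of the eight id-dependent rules of Table 1, and uses the two constraints named in this description (GC content of 40% to 60% and homopolymer length less than 4) as default thresholds.

```python
BASES = "ACGT"  # hypothetical rule: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data):
    """Map each byte to 4 nt, two bits per base, most significant bits first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def max_homopolymer(seq):
    """Length of the longest run of identical bases in a non-empty sequence."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_constraints(seq, gc_min=0.4, gc_max=0.6, max_run=3):
    return gc_min <= gc_content(seq) <= gc_max and max_homopolymer(seq) <= max_run


# 4-byte id + 30-byte coding block + 2-byte RS code = 36 bytes -> 144 nt
strand = bytes_to_dna(bytes(range(36)))
```

A strand for which passes_constraints returns False would be discarded and its encoded data regenerated, as described in the screening step.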

It is worth mentioning that DNA decryption decoding restores the DNA sequences to the target file. The whole process corresponds one-to-one to the encryption encoding process: the DNA sequences are converted back to the original target file through three steps, namely inverse mapping and screening, DNA Raptor decoding, and file recovery. Each step is reversible, and the random numbers used are generated by the hyperchaotic pseudo-random sequence generator with the same keys, which ensures that decryption decoding is correct and complete.

Based on the above process, the encryption encoding method in DNA storage provided by the application can encrypt and encode any type of data and finally convert it into DNA sequences for DNA storage. The redundancy (the number of additional DNA strands under the condition of 100% decoding) and the constraint rules of the DNA sequences are customizable. The simulation experiments were analyzed with 20% redundancy and under two common biological constraints (GC content of 40%-60% and homopolymer length less than 4).

Taking the 10 KB text shown in FIG. 7a as the data to be stored, the target DNA sequence obtained with the encryption encoding method of the application is shown in FIG. 7b. The 10 KB of data corresponds to 342 blocks and thus, in theory, to 342 DNA strands, each 144 nt long. Owing to the characteristics of the Raptor code, slightly more than 342 DNA strands are needed for successful decoding, so the initial encoding result is set to 350 DNA strands: DNA strands are generated and screened without limit until successful decoding is possible and the number of DNA strands reaches 350. Taking these 350 DNA strands as the initial result, a redundancy of 0.26 and an information capacity of 1.59 bits/nt (the theoretical upper limit is 2 bits/nt) are achieved. In addition, the error correction capability can be improved by increasing the number of DNA strands; with 20% redundancy selected, 20% more DNA strands are generated, so that 411 DNA strands are finally produced, with a redundancy of 0.512 and an information capacity of 1.32 bits/nt.
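The 350-strand figures can be reproduced with a short calculation under two assumptions that the text does not state explicitly: that 10 KB means 10,000 bytes, and that redundancy is the raw strand capacity (at the theoretical 2 bits/nt) divided by the payload, minus 1. The function names are illustrative only.

```python
def information_density(payload_bytes, strands, strand_nt):
    """Payload bits stored per nucleotide."""
    return payload_bytes * 8 / (strands * strand_nt)


def redundancy(payload_bytes, strands, strand_nt, bits_per_nt=2):
    """Raw capacity relative to the payload, minus 1 (assumed definition)."""
    return strands * strand_nt * bits_per_nt / (payload_bytes * 8) - 1


density = information_density(10_000, 350, 144)
extra = redundancy(10_000, 350, 144)
print(round(density, 2), round(extra, 2))  # 1.59 0.26
```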

In this application scenario, the encryption effect was evaluated with conventional encryption algorithm performance analyses, including key space analysis, key sensitivity analysis, correlation analysis, information entropy analysis, ciphertext change rate analysis and randomness analysis, which show that the encryption of the application ensures data security. For error correction performance, the decoding success rate was tested at different error rates (based on sequence loss and base errors) under different redundancies, and the results reflect good error correction performance.

The following is an embodiment of the apparatus of the present application that may be used to perform the encryption encoding method in the DNA storage according to the present application. For details not disclosed in the device embodiments of the present application, please refer to a method embodiment of the encryption encoding method in the DNA storage according to the present application.

Referring to fig. 8, in an embodiment of the present application, an encryption encoding apparatus 900 in DNA storage is provided, including but not limited to: a data acquisition module 910, a data encryption module 930, a DNA encoding module 950, and a DNA mapping module 970.

The data acquisition module 910 is configured to acquire data to be stored, and a first random number sequence and a second random number sequence generated by the hyperchaotic pseudorandom sequence generator.

The data encryption module 930 is configured to encrypt data to be stored using the first random number sequence to obtain source data.

The DNA encoding module 950 is configured to perform DNA Raptor encoding on the source data, and to introduce the second random number sequence into the DNA Raptor encoding process as reference table data, so as to obtain at least one piece of encoded data.

The DNA mapping module 970 is used for converting each coded data into a corresponding DNA sequence based on a DNA base mapping rule, and generating a target DNA sequence from the DNA sequence corresponding to each coded data.

In an exemplary embodiment, the apparatus 900 further includes: and a random number sequence generation module.

The random number sequence generation module is used for determining initial parameters configured for the hyper-chaos pseudo-random sequence generator; initializing a hyper-chaos pseudo-random sequence generator, and respectively inputting a first key and a second key into the hyper-chaos pseudo-random sequence generator to perform pseudo-random sequence generation operation to obtain a corresponding first random number sequence and a corresponding second random number sequence.

In an exemplary embodiment, the data encryption module 930 is further configured to obtain a global hash value of the data to be stored, and perform preprocessing on the data to be stored by using the global hash value to obtain a plurality of coding blocks; and to perform a logic operation on the first random number sequence and each coding block to obtain the source data.

In an exemplary embodiment, the data encryption module 930 is further configured to perform a hash operation on the data to be stored by using an SHA256 algorithm to obtain a global hash value, and store the global hash value to the first encoding block; the identity of the first coded block is 0; partitioning data to be stored to obtain a plurality of coding blocks; the identification of each coding block is 1-n respectively; and (3) performing encryption traversal processing on the n+1 coding blocks, wherein the encryption traversal processing comprises the following steps: performing logic operation on the previous coding block and the current coding block, and storing a logic operation result into the current coding block; and traversing until the nth coding block is completed, and obtaining n+1 coding blocks.

In an exemplary embodiment, the DNA encoding module 950 is further configured to expand the source data, and precode the expanded source data with a constraint matrix to obtain a plurality of intermediate symbols; LT encoding is performed on each intermediate symbol to obtain a plurality of pieces of encoded data; each piece of encoded data corresponds to one intermediate symbol; wherein the G_LT matrix in the constraint matrix, and the degree value and the random numbers in the LT encoding process, are generated by a random number generator that takes the second random number sequence as reference table data.

In an exemplary embodiment, the DNA mapping module 970 is further configured to determine, for each piece of encoded data, the DNA base mapping rule corresponding to the encoded data, and generate a corresponding DNA sequence from the encoded data according to the determined DNA base mapping rule; take the identification of the encoded data as a random number seed, and acquire an error correction code generated for the encoded data; and, according to the determined DNA base mapping rule, convert the random number seed and the error correction code into corresponding DNA bases and add the converted DNA bases to the DNA sequence corresponding to the encoded data.

In an exemplary embodiment, the DNA mapping module 970 is further configured to perform biological constraint detection on the DNA sequence corresponding to each piece of encoded data; regenerate, through DNA Raptor encoding, the encoded data corresponding to any DNA sequence that does not meet the biological constraints, until the DNA sequence corresponding to the regenerated encoded data meets the biological constraints; and generate the target DNA sequence from all DNA sequences that meet the biological constraints.

It should be noted that, when the encryption encoding device in DNA storage provided in the foregoing embodiment performs encryption encoding in DNA storage, the division into the foregoing functional modules is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the encryption encoding device in DNA storage may be divided into different functional modules to complete all or part of the functions described above.

In addition, the encryption encoding device in DNA storage provided in the above embodiment belongs to the same concept as the embodiment of the encryption encoding method in DNA storage. The specific manner in which each module performs its operations has been described in detail in the method embodiment and is not repeated here.

Referring to fig. 9, in an embodiment of the present application, an electronic device 4000 is provided, and the electronic device 4000 may include: desktop computers, notebook computers, servers, etc.

In fig. 9, the electronic device 4000 includes at least one processor 4001 and at least one memory 4003.

Data interaction between the processor 4001 and the memory 4003 may be achieved through at least one communication bus 4002. The communication bus 4002 may include a path for transferring data between the processor 4001 and the memory 4003. The communication bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or one type of bus.

Optionally, the electronic device 4000 may further comprise a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.

Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program instructions or code in the form of instructions or data structures and that can be accessed by the electronic device 4000.

The memory 4003 has computer readable instructions stored thereon, and the processor 4001 can read the computer readable instructions stored in the memory 4003 through the communication bus 4002.

The computer readable instructions are executed by the one or more processors 4001 to implement the encryption encoding method in DNA storage in the embodiments described above.

Further, in an embodiment of the present application, there is provided a storage medium having stored thereon computer readable instructions that are executed by one or more processors to implement the encryption encoding method in DNA storage as described above.

In an embodiment of the present application, a computer program product is provided, where the computer program product includes computer readable instructions stored in a storage medium, and one or more processors of an electronic device read the computer readable instructions from the storage medium, load and execute the computer readable instructions, so that the electronic device implements the encryption encoding method in DNA storage as described above.

Compared with existing fountain code encoding techniques in DNA storage, the application integrates the encryption operation into the encoding process while ensuring that the encoding result meets DNA storage requirements; it thus provides data encryption and improves data security while retaining the advantages of fountain codes in the field of DNA storage encoding. Meanwhile, a new hyperchaotic pseudo-random sequence generator is constructed: a new five-dimensional hyperchaotic system with good chaotic characteristics is built, and a series of data processing operations is added on top of it to form the corresponding hyperchaotic pseudo-random sequence generator, which generates random numbers with strong randomness.

In addition, compared with encryption methods in DNA cryptography, the method is not limited to image encryption and places no requirement on the type or size of the data to be stored. The length of the generated DNA sequences, the constraint conditions and the redundancy can all be chosen freely, so the application range is wider: the method can be adapted to any current DNA storage system, generating DNA sequences of any length and redundancy that satisfy any constraint conditions while achieving a comparable encryption effect. Compared with existing encryption methods, the DNA sequences generated by the method have higher information density and can satisfy more constraint conditions.

Finally, with respect to the data errors that current technology introduces in the DNA storage process, the application has a certain error correction capability and can restore the stored information to the original file at a certain error rate.

It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.

The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims

1. A method of encryption encoding in DNA storage, the method comprising:

acquiring data to be stored, and a first random number sequence and a second random number sequence generated by a hyper-chaos pseudo-random sequence generator;

Encrypting the data to be stored by using the first random number sequence to obtain source data;

performing DNA Raptor coding on the source data, and introducing the second random number sequence into the DNA Raptor coding process to serve as a random number seed, so as to obtain at least one piece of coded data;

Based on DNA base mapping rules, each piece of coding data is respectively converted into a corresponding DNA sequence, and a target DNA sequence is generated by the DNA sequence corresponding to each piece of coding data.

2. The method of claim 1, wherein prior to obtaining the first random number sequence and the second random number sequence generated by the hyperchaotic pseudorandom sequence generator, the method further comprises:

Initializing initial parameters configured for the hyper-chaotic pseudorandom sequence generator;

and respectively inputting a first key and a second key into the hyper-chaos pseudo-random sequence generator to perform pseudo-random sequence generation operation, so as to obtain the corresponding first random number sequence and second random number sequence.

3. The method of claim 1, wherein encrypting the data to be stored using the first random number sequence to obtain source data comprises:

Acquiring a global hash value of the data to be stored, and preprocessing the data to be stored by utilizing the global hash value to obtain a plurality of coding blocks;

And carrying out logic operation on the first random number sequence and each coding block to obtain the source data.

4. The method of claim 3, wherein the obtaining the global hash value of the data to be stored and the preprocessing the data to be stored using the global hash value to obtain the plurality of encoded blocks comprises:

carrying out hash operation on the data to be stored by adopting an SHA256 algorithm to obtain the global hash value, and storing the global hash value into a first coding block; the identity of the first of said encoded blocks is 0;

partitioning the data to be stored to obtain a plurality of coding blocks; the identification of each coding block is respectively 1-n;

Performing encryption traversal processing on n+1 coding blocks, wherein the encryption traversal processing comprises: performing logic operation on the previous coding block and the current coding block, and storing a logic operation result into the current coding block;

And traversing until the nth coding block is completed, and obtaining n+1 coding blocks.

5. The method of claim 1, wherein said DNA Raptor encoding said source data and introducing said second random number sequence into the DNA Raptor encoding process as a random number seed to obtain at least one encoded data, comprising:

Expanding the source data, and pre-encoding the expanded source data by utilizing a constraint matrix to obtain a plurality of intermediate symbols;

Performing LT coding on each intermediate symbol to obtain a plurality of coded data; each coded data corresponds to each intermediate symbol one by one;

The G_LT matrix in the constraint matrix, the degree value and the random number in the LT coding process are generated by a random number generator taking the second random number sequence as reference table data.

6. The method of claim 1, wherein said converting each of said encoded data into a corresponding DNA sequence based on DNA base mapping rules, respectively, comprises:

Determining a DNA base mapping rule corresponding to each piece of coded data, and generating a corresponding DNA sequence from the coded data according to the determined DNA base mapping rule;

taking the identification of the coded data as a random number seed, and acquiring an error correction code generated for the coded data;

And respectively converting the random number seeds and the error correction codes into corresponding DNA bases according to the determined DNA base mapping rules, and respectively adding the converted DNA bases into DNA sequences corresponding to the coding data.

7. The method of any one of claims 1 to 6, wherein generating a target DNA sequence from the DNA sequence corresponding to each of the encoded data comprises:

performing biological constraint detection on the DNA sequences corresponding to the coding data;

Regenerating the coding data corresponding to the DNA sequence which does not accord with the biological constraint through DNA Raptor coding until the regenerated DNA sequence corresponding to the coding data accords with the biological constraint;

the target DNA sequence is generated from all DNA sequences that meet biological constraints.

8. An encryption encoding device in DNA storage, the device comprising:

The data acquisition module is used for acquiring data to be stored and a first random number sequence and a second random number sequence generated by the hyper-chaos pseudo-random sequence generator;

the data encryption module is used for encrypting the data to be stored by using the first random number sequence to obtain source data;

The DNA coding module is used for carrying out DNA Raptor coding on the source data, and introducing the second random number sequence into the DNA Raptor coding process to be used as reference table data so as to obtain at least one piece of coding data;

and the DNA mapping module is used for respectively converting each coded data into corresponding DNA sequences based on DNA base mapping rules and generating target DNA sequences from the DNA sequences corresponding to each coded data.

9. An electronic device, comprising: at least one processor, and at least one memory, wherein,

The memory has computer readable instructions stored thereon;

The computer readable instructions are executed by one or more of the processors to cause the electronic device to implement the encryption encoding method in DNA storage of any one of claims 1 to 7.

10. A storage medium having stored thereon computer readable instructions, wherein the computer readable instructions are executed by one or more processors to implement the encryption encoding method in DNA storage of any one of claims 1 to 7.