CN110647585A - Data deployment system with automatic screening and backup functions - Google Patents

Info

Publication number
CN110647585A
Authority
CN
China
Prior art keywords
data
unit
backup
screening
deployment system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910906104.1A
Other languages
Chinese (zh)
Inventor
宋仪轩
阚苏立
谢可辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Healthcare Big Data Protection And Development Co Ltd
Original Assignee
Jiangsu Healthcare Big Data Protection And Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-24
Filing date
2019-09-24
Publication date
2020-01-03
Application filed by Jiangsu Healthcare Big Data Protection And Development Co Ltd
Priority to CN201910906104.1A
Publication of CN110647585A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data deployment, and in particular to a data deployment system with automatic screening and backup functions. The system comprises a data screening unit and a data backup unit that exchange data over the Internet. The data screening unit addresses the big data setting, in which there are massive numbers of initial detectors but only a limited number of self bodies: following the requirements of the initial-detector screening algorithm, it distributes the initial detectors and self bodies with reasonable redundancy, constructing an environment in which the massive initial detectors can be screened rapidly. In the data backup unit, a data channel acquires data from a client and caches it in the cloud storage unit, and the processed data is stored in virtualized storage, which achieves cloud backup of the data and prevents data loss.

Description

Data deployment system with automatic screening and backup functions
Technical Field
The invention relates to the technical field of data deployment, and in particular to a data deployment system with automatic screening and backup functions.
Background
With the advent of the big data era, the volume of data keeps growing, which makes data deployment inefficient. Moreover, because the data carries a large amount of information, once a data deployment system is damaged the data is difficult to recover, causing serious losses.
Disclosure of Invention
The present invention aims to provide a data deployment system with automatic screening and backup functions that addresses one or more of the deficiencies set forth in the Background above.
To achieve the above object, the present invention provides a data deployment system with automatic screening and backup functions, comprising a data screening unit and a data backup unit that exchange data over the Internet. The data screening unit screens data according to the following flow:
S11, store the limited set of self bodies in the memory of each storage and computation node;
S12, according to the matching rule, match and check the detectors in the massive initial detector subset held by each storage and computation node;
S13, determine which initial detectors in the massive initial detector subset can become candidate mature detectors;
S14, send each candidate mature detector, together with its maximum matching value with the self bodies, to the optimization node.
Preferably, in S12, the BMH2CKMP algorithm is used as the matching rule, with the following steps:
① based on the BMH2C algorithm, adapt the self-body data to forward matching;
② preprocess the pattern string to obtain the next array;
③ during pattern matching, when a character mismatch occurs, determine whether the jump value is positive or negative: in the positive case, jump forward to obtain the maximum displacement; in the negative case, look up the next array and move position j, so that i never backtracks.
Preferably, the next array is defined as follows:
[Figure: definition of the next array]
That is, when $next[j] = k > 0$, $P[0 \ldots k-1] = P[j-k \ldots j-1]$.
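The next array described above is the failure table of the Knuth-Morris-Pratt (KMP) algorithm. The following Python sketch builds it and uses it in a plain KMP search in which the text index i never moves backwards, as step ③ requires. It assumes the common convention next[0] = -1 (the figure with the full definition is not reproduced here), and it covers only the KMP half of BMH2CKMP: the BMH2C forward-jump rule is not specified in enough detail to reproduce, so it is omitted. The function names are hypothetical.

```python
from typing import List


def build_next(pattern: str) -> List[int]:
    """next[j] = length k of the longest proper prefix of pattern[:j] that is
    also a suffix of pattern[:j]; next[0] = -1 by (assumed) convention."""
    m = len(pattern)
    nxt = [0] * m
    if m == 0:
        return nxt
    nxt[0] = -1
    k, j = -1, 0
    while j < m - 1:
        if k == -1 or pattern[j] == pattern[k]:
            k += 1
            j += 1
            nxt[j] = k          # P[0..k-1] == P[j-k..j-1]
        else:
            k = nxt[k]          # fall back to a shorter border
    return nxt


def kmp_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    nxt = build_next(pattern)
    i = j = 0
    while i < len(text) and j < len(pattern):
        if j == -1 or text[i] == pattern[j]:
            i += 1              # i only ever moves forward
            j += 1
        else:
            j = nxt[j]          # on mismatch, shift the pattern via next[j]
    return i - j if j == len(pattern) else -1
```

For example, `build_next("ababc")` gives `[-1, 0, 0, 1, 2]`, and `kmp_search("abababc", "ababc")` returns `2`.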
Preferably, the process for determining the candidate mature detectors is as follows:
S21, collect the candidate mature detectors sent to the optimization node by all storage and computation nodes;
S22, sort the massive candidate mature detectors in ascending order of maximum matching degree and construct a set;
S23, take candidate mature detectors out of the set in a loop;
S24, determine whether the number of detectors has reached the value set by the system; if so, loop back to S23; if not, proceed to the next step;
S25, take the first candidate mature detector in the set and put it into the optimized set.
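A minimal Python sketch of S21-S25 at the optimization node, assuming each storage and computation node delivers a list of (detector, max_match) pairs and that the "value set by the system" is a target size for the optimized set. Because S24 can be read in more than one way, this sketch adopts one interpretation: candidates are taken from the front of the ascending list until the target size is reached. `optimize_candidates` and `target_size` are hypothetical names.

```python
from typing import Hashable, List, Sequence, Tuple

Candidate = Tuple[Hashable, float]  # (detector, maximum matching degree)


def optimize_candidates(node_outputs: Sequence[Sequence[Candidate]],
                        target_size: int) -> List[Hashable]:
    # S21: pool the candidates received from every storage/computation node
    pooled = [pair for output in node_outputs for pair in output]
    # S22: sort in ascending order of maximum matching degree (stable sort)
    pooled.sort(key=lambda pair: pair[1])
    optimized: List[Hashable] = []
    # S23-S25: repeatedly move the first remaining candidate into the
    # optimized set until the system-set size is reached (assumed reading)
    for detector, _ in pooled:
        if len(optimized) >= target_size:
            break
        optimized.append(detector)
    return optimized
```

Preferring the smallest maximum matching degree keeps the detectors that are least similar to any self body, which is the usual goal of negative selection.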
Preferably, the data backup unit comprises a cloud storage unit and a client unit, and the cloud storage unit operates according to the following flow:
S31, the data channel obtains data from the client and caches it in the cloud storage unit;
S32, perform correlation processing on the data;
S33, store the processed data in virtualized storage.
Preferably, in S32, the correlation processing is performed by a data compression unit, a data encryption unit and a data de-duplication unit.
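The S31-S33 flow can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the patented implementation: `BackupPipeline`, `receive`, `flush` and the `encrypt` callback are hypothetical names, zlib's DEFLATE stands in for the LZHJ compressor, an in-memory dict stands in for virtualized storage, and the encryption step is left to a caller-supplied function.

```python
import hashlib
import zlib
from typing import Callable, Dict, List


class BackupPipeline:
    """Hypothetical sketch of S31-S33 (zlib stands in for LZHJ)."""

    def __init__(self, encrypt: Callable[[bytes], bytes]):
        self.encrypt = encrypt                # stand-in for the encryption unit
        self.cache: List[bytes] = []          # S31: staging cache
        self.storage: Dict[str, bytes] = {}   # S33: stand-in for virtualized storage

    def receive(self, data: bytes) -> None:
        # S31: the data channel hands client data to the cloud storage cache
        self.cache.append(data)

    def flush(self) -> List[str]:
        # S32 + S33: process each cached item and store it under its content hash
        keys = []
        for data in self.cache:
            key = hashlib.sha256(data).hexdigest()
            if key not in self.storage:       # de-duplication: skip stored content
                compressed = zlib.compress(data)
                self.storage[key] = self.encrypt(compressed)
            keys.append(key)
        self.cache.clear()
        return keys
```

De-duplicating on the plaintext hash before compressing and encrypting is a deliberate ordering in this sketch: if the data were encrypted first, a randomized cipher could turn identical inputs into different ciphertexts, and duplicates would no longer be detectable.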
Preferably, the data compression unit adopts the LZHJ algorithm, whose flow is as follows:
① denote the string in the forward (look-ahead) buffer as $X_1, X_2, \ldots, X_N$;
② denote the current matching string as $Y_1, Y_2, \ldots, Y_K$, where $Y_K$ is the last character in the sliding compression window;
③ denote the current maximum matching length as $m$, where $N > m$, and $X_1 = Y_1$, $X_2 = Y_2$;
④ compare $X_1, X_2, \ldots, X_N$ with $Y_1, Y_2, \ldots, Y_K$ sequentially, and record the resulting matching length as $Length = \max\{\, i \mid X_i = Y_i,\ 2 \le i \le \min(N, K) \,\}$.
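The original does not spell out how LZHJ differs from LZ77, so the following Python sketch shows only the generic LZ77 step that ③ and ④ describe: searching the sliding window for the longest prefix of the look-ahead buffer. It is a brute-force search chosen for clarity; `longest_match` is a hypothetical name, and the window and buffer sizes and the output token format are left open.

```python
from typing import Tuple


def longest_match(window: str, lookahead: str) -> Tuple[int, int]:
    """Return (offset, length) of the longest prefix of `lookahead` found in
    `window`; offset counts back from the end of the window, (0, 0) = no match.
    Simplification: a match may not run past the end of the window."""
    best_offset, best_length = 0, 0
    for start in range(len(window)):
        length = 0
        # compare X_1..X_N (look-ahead) with Y_1..Y_K (window) character by character
        while (length < len(lookahead)
               and start + length < len(window)
               and window[start + length] == lookahead[length]):
            length += 1
        if length > best_length:
            best_offset, best_length = len(window) - start, length
    return best_offset, best_length
```

For instance, `longest_match("abcabd", "abdx")` finds "abd" three characters back from the end of the window and returns `(3, 3)`; an LZ77 encoder would emit that pair and advance the look-ahead by three characters.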
Preferably, the data encryption unit adopts the RSA algorithm, with the following steps:
select two different large prime numbers $p$ and $q$ at random, and compute their product $r = p \times q$;
randomly select a large integer $e$ that is coprime to $(p-1)(q-1)$, and use $e$ as the encryption key;
determine the decryption key $d$, satisfying $e \cdot d \equiv 1 \pmod{(p-1)(q-1)}$;
publish the integers $r$ and $e$, but keep $d$ secret.
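The four steps can be sketched in Python as textbook RSA. This is a toy for illustration only: the function names are hypothetical, the primes are supplied by the caller rather than generated and tested for primality, and there is no padding. A production system should use a vetted cryptographic library with a padding scheme such as OAEP.

```python
import math
from typing import Tuple

PublicKey = Tuple[int, int]  # (r, e)


def rsa_keygen(p: int, q: int, e: int) -> Tuple[PublicKey, int]:
    """Return the public key (r, e) and the private exponent d."""
    if p == q:
        raise ValueError("p and q must be different primes")
    r = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)  # modular inverse, e*d ≡ 1 (mod phi); needs Python 3.8+
    return (r, e), d


def rsa_encrypt(plain: int, public: PublicKey) -> int:
    r, e = public
    if not 0 <= plain < r:
        raise ValueError("plaintext must be an integer in [0, r)")
    return pow(plain, e, r)  # C = P^e mod r


def rsa_decrypt(cipher: int, public: PublicKey, d: int) -> int:
    r, _ = public
    return pow(cipher, d, r)  # P = C^d mod r


public, d = rsa_keygen(61, 53, 17)
cipher = rsa_encrypt(65, public)
print(public, d, cipher, rsa_decrypt(cipher, public, d))  # (3233, 17) 2753 2790 65
```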
Preferably, the data de-duplication unit classifies de-duplication by granularity, with the following steps:
① delete duplicate data at the whole-file level;
② eliminate redundant file blocks;
③ eliminate byte-level redundancy.
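Whole-file de-duplication (step ①) is usually done by hashing each file and looking the hash up in the store. A minimal Python sketch, assuming SHA-256 as the hash and an in-memory dict as the store; `dedup_files` is a hypothetical name, and a production system might also compare bytes on a hash hit to rule out collisions.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def dedup_files(files: Iterable[Tuple[str, bytes]]) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Store each distinct file content once; map every file name to its content hash."""
    store: Dict[str, bytes] = {}   # content hash -> content (kept once)
    index: Dict[str, str] = {}     # file name -> content hash
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:    # first time this exact file content is seen
            store[digest] = content
        index[name] = digest       # duplicates only add an index entry
    return store, index
```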
Preferably, the client unit comprises an application interface module and an operation request module: the application interface module provides the relevant application programs to clients that need data backup and recovery, and the operation request module sends the clients' data backup or recovery requests through these application programs.
Compared with the prior art, the invention has the following beneficial effects:
1. In the data deployment system with automatic screening and backup functions, the data screening unit addresses the big data setting of massive initial detectors and limited self bodies: following the requirements of the initial-detector screening algorithm, it distributes the initial detectors and self bodies with reasonable redundancy, constructing an environment suitable for rapidly screening the massive initial detectors.
2. In the data deployment system with automatic screening and backup functions, the data backup unit has a data channel that acquires data from a client and caches it in a cloud storage unit; the processed data is stored in virtualized storage, achieving cloud backup of the data and preventing data loss.
3. In the data deployment system with automatic screening and backup functions, data is processed by the data compression unit, the data encryption unit and the data de-duplication unit, which reduces redundant information and speeds up data processing.
Drawings
FIG. 1 is a block diagram of the overall structure of the present invention;
FIG. 2 is a flow chart of a data screening unit of the present invention;
FIG. 3 is a flow chart for determining the candidate mature detectors of the present invention;
FIG. 4 is a flow chart of the BMH2CKMP algorithm of the present invention;
FIG. 5 is a block diagram of a data backup unit according to the present invention;
FIG. 6 is a flow chart of a cloud storage unit of the present invention;
FIG. 7 is a block diagram of data processing modules of the present invention;
FIG. 8 is a flow chart of the LZHJ algorithm of the present invention;
FIG. 9 is a flow chart of the RSA algorithm of the present invention;
FIG. 10 is a flow chart of the de-duplication granularity classification of the present invention;
FIG. 11 is a block diagram of a client unit of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
The invention provides a data deployment system with automatic screening and backup functions which, as shown in FIGS. 1-4, comprises a data screening unit and a data backup unit that exchange data over the Internet. The data screening unit screens data according to the following flow:
S11, store the limited set of self bodies in the memory of each storage and computation node;
S12, according to the matching rule, match and check the detectors in the massive initial detector subset held by each storage and computation node;
S13, determine which initial detectors in the massive initial detector subset can become candidate mature detectors;
S14, send each candidate mature detector, together with its maximum matching value with the self bodies, to the optimization node.
In this embodiment, the data screening unit is based on the Map/Reduce model. Exploiting distributed concurrent execution on each storage and computation node, and with a subset of the massive initial detectors stored on each node, a partitioned monitoring strategy is designed for the massive initial detectors.
Specifically, in S12, the BMH2CKMP algorithm is used as the matching rule, with the following steps:
① based on the BMH2C algorithm, adapt the self-body data to forward matching;
② preprocess the pattern string to obtain the next array;
③ during pattern matching, when a character mismatch occurs, determine whether the jump value is positive or negative: in the positive case, jump forward to obtain the maximum displacement; in the negative case, look up the next array and move position j, so that i never backtracks.
The next array is defined as follows:
[Figure: definition of the next array]
That is, when $next[j] = k > 0$, $P[0 \ldots k-1] = P[j-k \ldots j-1]$.
Further, the specific steps for sending the candidate mature detectors and their maximum matching values with the self bodies to the optimization node are given by the following algorithm:
Partition_Selection(detector_subset, self_set)
{
    while (detector_subset still has an unchecked initial detector)
    {
        take out an unchecked initial detector;
        set the maximum matching degree max_match to 0;
        set flag, which indicates whether the detector can become a candidate mature detector, to 1;
        while (self_set still has an unchecked self body)
        {
            use the matching rule to compute the matching degree m between the initial detector and the self body;
            if (m < M)    // M: threshold set by the system
            {
                if (m > max_match)
                    max_match = m;
            }
            else
                flag = 0;    // the detector cannot become a candidate mature detector
        }
        if (flag == 1)
            output the candidate mature detector and its maximum matching degree max_match with the self bodies to the optimization node;
    }
}
The algorithm repeatedly takes out an unchecked initial detector and initializes it: its maximum matching degree is set to 0 and its candidate flag is set to 1. The detector is then matched in turn against each unchecked self body in the system. If the matching degree is smaller than the threshold set by the system, the maximum matching degree is updated whenever the new value exceeds it; otherwise, the flag is set to 0, marking the detector as unable to become a candidate mature detector. Once a detector is confirmed as a candidate mature detector, it is output to the optimization node together with its maximum matching degree max_match with the self bodies for the next optimization step. The computational cost of the algorithm depends mainly on the number of initial detectors in the big data system and grows linearly as that number increases.
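The Partition_Selection pseudocode can be sketched in Python as follows. The matching rule is passed in as a function, because BMH2CKMP does not by itself produce a numeric matching degree; `hamming_similarity` (the fraction of agreeing positions) is a hypothetical stand-in used only to make the example runnable. The early `break` is an addition not present in the pseudocode: once one self body reaches the threshold, the remaining ones cannot change the outcome.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def hamming_similarity(a: str, b: str) -> float:
    """Hypothetical matching rule: fraction of positions where a and b agree."""
    length = max(len(a), len(b))
    if length == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / length


def partition_selection(
    detector_subset: Iterable[str],
    self_set: Sequence[str],
    match: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Screen one node's detector subset against the self set; return
    (detector, max_match) for detectors below `threshold` on every self body."""
    candidates = []
    for detector in detector_subset:       # take out an unchecked initial detector
        max_match = 0.0
        is_candidate = True                # flag = 1
        for self_body in self_set:
            m = match(detector, self_body)
            if m < threshold:
                max_match = max(max_match, m)
            else:
                is_candidate = False       # flag = 0: too close to a self body
                break                      # added: later self bodies cannot help
        if is_candidate:
            candidates.append((detector, max_match))  # sent to the optimization node
    return candidates
```

With detectors `["1100", "0011", "1111"]`, self bodies `["1110", "1101"]` and a threshold of 0.75, only `"0011"` survives, giving `[("0011", 0.25)]`. In a Map/Reduce deployment, each storage and computation node would run this function on its own detector subset as the map step.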
It is worth noting that the candidate mature detectors are determined as follows:
S21, collect the candidate mature detectors sent to the optimization node by all storage and computation nodes;
S22, sort the massive candidate mature detectors in ascending order of maximum matching degree and construct a set;
S23, take candidate mature detectors out of the set in a loop;
S24, determine whether the number of detectors has reached the value set by the system; if so, loop back to S23; if not, proceed to the next step;
S25, take the first candidate mature detector in the set and put it into the optimized set.
Example 2
As a second embodiment of the present invention, the data backup unit is further improved in order to implement data backup. In a preferred embodiment, as shown in FIGS. 5 to 11, the data backup unit comprises a cloud storage unit and a client unit, and the cloud storage unit operates according to the following flow:
S31, the data channel obtains data from the client and caches it in the cloud storage unit;
S32, perform correlation processing on the data;
S33, store the processed data in virtualized storage.
In S32, the correlation processing is performed by a data compression unit, a data encryption unit and a data de-duplication unit.
In this embodiment, the cloud storage unit is built on Hadoop. HDFS (the Hadoop Distributed File System) is a distributed file system designed to run on inexpensive commodity hardware: it provides the underlying storage for distributed computing, builds data fault tolerance into the software layer, and offers applications high-throughput data access, which makes it suitable for building and developing a cloud storage system.
Further, the data compression unit adopts the LZHJ algorithm, whose flow is as follows:
① denote the string in the forward (look-ahead) buffer as $X_1, X_2, \ldots, X_N$;
② denote the current matching string as $Y_1, Y_2, \ldots, Y_K$, where $Y_K$ is the last character in the sliding compression window;
③ denote the current maximum matching length as $m$, where $N > m$, and $X_1 = Y_1$, $X_2 = Y_2$;
④ compare $X_1, X_2, \ldots, X_N$ with $Y_1, Y_2, \ldots, Y_K$ sequentially, and record the resulting matching length as $Length = \max\{\, i \mid X_i = Y_i,\ 2 \le i \le \min(N, K) \,\}$.
Further, the data encryption unit adopts the RSA algorithm, which comprises the following steps:
select two different large prime numbers $p$ and $q$ at random, and compute their product $r = p \times q$;
randomly select a large integer $e$ that is coprime to $(p-1) \times (q-1)$ and use it as the encryption key; choosing $e$ is easy since, for example, any prime larger than both $p$ and $q$ will do;
determine the decryption key $d$, which is computed from $e$, $p$ and $q$ as the inverse of $e$ modulo $(p-1)(q-1)$;
publish the integers $r$ and $e$, but keep $d$ secret.
A plaintext $P$ (assumed to be an integer less than $r$) is encrypted into a ciphertext $C$ as follows:
$C = P^e \bmod r$.
The ciphertext $C$ is decrypted back into the plaintext $P$ as follows:
$P = C^d \bmod r$.
However, computing $d$ from $r$ and $e$ alone, without knowing $p$ and $q$, is computationally infeasible. Thus anyone can encrypt a plaintext, but only an authorized user who knows $d$ can decrypt the ciphertext.
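A small worked example makes the arithmetic concrete. The numbers below are the usual textbook values chosen for illustration, not from the original, and are far too small for real security:

```latex
\begin{aligned}
p &= 61,\quad q = 53,\quad r = p \times q = 3233,\\
(p-1)(q-1) &= 60 \times 52 = 3120,\quad e = 17 \quad (\gcd(17, 3120) = 1),\\
d &= 2753, \quad \text{since } 17 \times 2753 = 46801 = 15 \times 3120 + 1,\\
C &= 65^{17} \bmod 3233 = 2790,\\
P &= 2790^{2753} \bmod 3233 = 65.
\end{aligned}
```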
It should be noted that the data de-duplication unit classifies de-duplication by granularity, with the following steps:
① delete duplicate data at the whole-file level;
② eliminate redundant file blocks;
③ eliminate byte-level redundancy.
Whole-file de-duplication: duplicate data is detected and deleted with the entire file as the unit. The hash value of the whole file is computed, and the storage system is searched by that hash value for an identical file. The advantage of this method is that it is very fast on ordinary hardware.
File-block redundancy elimination: a file is divided into data blocks in one of several ways, and duplicates are detected with the data block as the unit.
Byte-level de-duplication: duplicate content is located and deleted at the byte level, and the differing content is typically produced by a delta (differential) compression strategy.
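File-block redundancy elimination can be sketched with fixed-size chunking, which is one of the possible ways of dividing a file; content-defined chunking is a common alternative that copes better with insertions. This Python sketch assumes SHA-256 block fingerprints and an in-memory store; `dedup_blocks`, `restore` and `block_size` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def dedup_blocks(data: bytes, block_size: int, store: Dict[str, bytes]) -> List[str]:
    """Split `data` into fixed-size blocks, keep each distinct block once in
    `store`, and return the recipe (ordered block hashes) to rebuild the file."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # only the first copy of a block is kept
        recipe.append(digest)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble a file from its recipe of block hashes."""
    return b"".join(store[digest] for digest in recipe)
```

For example, `b"ABCDABCDXY"` with a block size of 4 produces a three-entry recipe but only two stored blocks, because the two `ABCD` blocks share one copy.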
The client unit comprises an application interface module and an operation request module: the application interface module provides the relevant application programs to clients that need data backup and recovery, and the operation request module sends the clients' data backup or recovery requests through these application programs.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and the preferred embodiments of the present invention are described in the above embodiments and the description, and are not intended to limit the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A data deployment system with automatic screening and backup functions, comprising a data screening unit and a data backup unit, characterized in that: the data screening unit and the data backup unit exchange data over the Internet, the data screening unit is used for screening data, and the flow of the data screening unit is as follows:
S11, store the limited set of self bodies in the memory of each storage and computation node;
S12, according to the matching rule, match and check the detectors in the massive initial detector subset held by each storage and computation node;
S13, determine which initial detectors in the massive initial detector subset can become candidate mature detectors;
S14, send each candidate mature detector, together with its maximum matching value with the self bodies, to the optimization node.
2. The data deployment system with automatic screening and backup functions of claim 1, wherein in S12 the BMH2CKMP algorithm is used as the matching rule, with the following steps:
① based on the BMH2C algorithm, adapt the self-body data to forward matching;
② preprocess the pattern string to obtain the next array;
③ during pattern matching, when a character mismatch occurs, determine whether the jump value is positive or negative: in the positive case, jump forward to obtain the maximum displacement; in the negative case, look up the next array and move position j, so that i never backtracks.
3. The data deployment system with automatic screening and backup functions of claim 2, wherein: the next array is defined as follows:
That is, when $next[j] = k > 0$, $P[0 \ldots k-1] = P[j-k \ldots j-1]$.
4. The data deployment system with automatic screening and backup functions of claim 1, wherein the candidate mature detectors are determined as follows:
S21, collect the candidate mature detectors sent to the optimization node by all storage and computation nodes;
S22, sort the massive candidate mature detectors in ascending order of maximum matching degree and construct a set;
S23, take candidate mature detectors out of the set in a loop;
S24, determine whether the number of detectors has reached the value set by the system; if so, loop back to S23; if not, proceed to the next step;
S25, take the first candidate mature detector in the set and put it into the optimized set.
5. The data deployment system with automatic screening and backup functions of claim 1, wherein the data backup unit comprises a cloud storage unit and a client unit, and the cloud storage unit operates according to the following flow:
S31, the data channel obtains data from the client and caches it in the cloud storage unit;
S32, perform correlation processing on the data;
S33, store the processed data in virtualized storage.
6. The data deployment system with automatic screening and backup functions of claim 5, wherein in S32 the correlation processing is performed by a data compression unit, a data encryption unit and a data de-duplication unit.
7. The data deployment system with automatic screening and backup functions of claim 6, wherein the data compression unit adopts the LZHJ algorithm, whose flow is as follows:
① denote the string in the forward (look-ahead) buffer as $X_1, X_2, \ldots, X_N$;
② denote the current matching string as $Y_1, Y_2, \ldots, Y_K$, where $Y_K$ is the last character in the sliding compression window;
③ denote the current maximum matching length as $m$, where $N > m$, and $X_1 = Y_1$, $X_2 = Y_2$;
④ compare $X_1, X_2, \ldots, X_N$ with $Y_1, Y_2, \ldots, Y_K$ sequentially, and record the resulting matching length as $Length = \max\{\, i \mid X_i = Y_i,\ 2 \le i \le \min(N, K) \,\}$.
8. The data deployment system with automatic screening and backup functions of claim 6, wherein the data encryption unit adopts the RSA algorithm, with the following steps:
select two different large prime numbers $p$ and $q$ at random, and compute their product $r = p \times q$;
randomly select a large integer $e$ that is coprime to $(p-1)(q-1)$, and use $e$ as the encryption key;
determine the decryption key $d$;
publish the integers $r$ and $e$, but keep $d$ secret.
9. The data deployment system with automatic screening and backup functions of claim 6, wherein the data de-duplication unit classifies de-duplication by granularity, with the following steps:
① delete duplicate data at the whole-file level;
② eliminate redundant file blocks;
③ eliminate byte-level redundancy.
10. The data deployment system with automatic screening and backup functions of claim 1, wherein the client unit comprises an application interface module and an operation request module: the application interface module provides the relevant application programs to clients that need data backup and recovery, and the operation request module sends the clients' data backup or recovery requests through these application programs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906104.1A CN110647585A (en) 2019-09-24 2019-09-24 Data deployment system with automatic screening and backup functions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910906104.1A CN110647585A (en) 2019-09-24 2019-09-24 Data deployment system with automatic screening and backup functions

Publications (1)

Publication Number Publication Date
CN110647585A true CN110647585A (en) 2020-01-03

Family

ID=68992523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906104.1A Pending CN110647585A (en) 2019-09-24 2019-09-24 Data deployment system with automatic screening and backup functions

Country Status (1)

Country Link
CN (1) CN110647585A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604408A (en) * 2009-04-03 2009-12-16 江苏大学 A kind of generation of detecting device and detection method
CN102073907A (en) * 2011-02-10 2011-05-25 江苏大学 Novel artificial immune system and ant colony optimization-based detector set optimization method
CN105867323A (en) * 2016-03-31 2016-08-17 东华大学 Industrial cloud data safety automatic production line based on dynamic clonal selection algorithm
CN109858260A (en) * 2019-01-08 2019-06-07 莱芜职业技术学院 A kind of big data management system Internet-based

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
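WEIXIN_34378922: "Asymmetric Encryption (2): Asymmetric Encryption Algorithms" *
姜正禄: "An Improved LZ77 Data Compression Algorithm", Software Engineering and Applications *
韦安垒: "Design and Implementation of a Fast Single-Pattern Matching Algorithm", Cyberspace Security *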

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214479A (en) * 2020-12-01 2021-01-12 王跃 Medical data management system and method based on big data
CN112214479B (en) * 2020-12-01 2021-07-13 Shaanxi Yachuang Medical Software Information Technology Co., Ltd. Medical data management system and method based on big data

Similar Documents

Publication Publication Date Title
Xia et al. FastCDC: A fast and efficient Content-Defined chunking approach for data deduplication
Xia et al. A comprehensive study of the past, present, and future of data deduplication
US9727573B1 (en) Out-of core similarity matching
US7478113B1 (en) Boundaries
US9244623B1 (en) Parallel de-duplication of data chunks of a shared data object using a log-structured file system
US8489612B2 (en) Identifying similar files in an environment having multiple client computers
US9063947B2 (en) Detecting duplicative hierarchical sets of files
CN106611035A (en) Retrieval algorithm for deleting repetitive data in cloud storage
US20070255758A1 (en) System and method for sampling based elimination of duplicate data
US10366072B2 (en) De-duplication data bank
US10936228B2 (en) Providing data deduplication in a data storage system with parallelized computation of crypto-digests for blocks of host I/O data
US11314598B2 (en) Method for approximating similarity between objects
Bhalerao et al. A survey: On data deduplication for efficiently utilizing cloud storage for big data backups
Marques et al. Secure deduplication on mobile devices
Barik et al. GeoBD2: Geospatial big data deduplication scheme in fog assisted cloud computing environment
Kumar et al. Bucket based data deduplication technique for big data storage system
US9256503B2 (en) Data verification
Viji et al. Comparative analysis for content defined chunking algorithms in data deduplication
Kumar et al. Genetic optimized data deduplication for distributed big data storage systems
Kim et al. Design and implementation of binary file similarity evaluation system
CN110647585A (en) Data deployment system with automatic screening and backup functions
Kirubakaran et al. A cloud based model for deduplication of large data
Kumar et al. Differential Evolution based bucket indexed data deduplication for big data storage
Gang et al. [Retracted] Dynamic Deduplication Algorithm for Cross‐User Duplicate Data in Hybrid Cloud Storage
Vikraman et al. A study on various data de-duplication systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103