CN105930534B - A data fragmentation reduction method based on cloud storage service pricing

Info

Publication number: CN105930534B
Authority: CN (China)
Prior art keywords: data, fragmentation, service, service charge, section
Legal status: Active
Application number: CN201610443197.5A
Other languages: Chinese (zh)
Other versions: CN105930534A (en)
Inventors: 谭玉娟, 晏志超
Current assignee: Chongqing University
Original assignee: Chongqing University
Priority date: 2016-06-20
Filing date: 2016-06-20
Publication date: 2019-05-17

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1748 De-duplication implemented within the file system, e.g. based on file segments
    • G06F16/1752 De-duplication implemented within the file system, e.g. based on file segments based on file chunks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1724 Details of de-fragmentation performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data fragmentation reduction method based on cloud storage service pricing. The method weighs the storage space service fee against the fragment read service fee caused by data deduplication (the fragment read service fee does not include the data transfer fee) and identifies the data fragment blocks whose data read service fee exceeds the storage space service fee they save. These data fragment blocks are retained in the cloud storage system without being deduplicated. This reduces the amount of data fragmentation in the cloud storage system, lowers the data read service fee the user has to pay, and lets the user use the data storage service provided by the cloud storage system at the lowest service price.

Description

A data fragmentation reduction method based on cloud storage service pricing
Technical field
The invention belongs to the technical field of computer information storage, and specifically relates to a method for reducing data fragmentation in a cloud storage system based on cloud storage service pricing.
Background art
With the arrival of the information age, data is growing explosively; IDC predicts that by 2020 the world will generate 44 ZB of data. Cloud storage systems, which provide storage as a service to external users, now commonly use data deduplication to reduce the volume of data that users store. Although deduplication saves storage space cost, it introduces a large amount of data fragmentation. The main reason is that, after deduplication, data blocks that are logically contiguous are stored scattered across the physical space, so reading the data requires many data read operations and the user incurs a large data read service fee. For example, consider the N consecutive data blocks that make up a file after its duplicate data blocks have been deleted. These N consecutive data blocks may well be stored in N different storage objects (a storage object is the physical unit in which a cloud storage system stores data, typically several hundred MB or several GB in size), so reading the file takes N data read operations. In a cloud storage service, the data read service is billed by the number of data read operations. The user therefore pays for N data read operations each time the file is read, and if the user reads the file frequently, a very large data read service fee results.
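The effect described in this example can be illustrated with a minimal Python sketch. It is not part of the patent: the function names, the fee value, and the rule that one request fetches a run of consecutive blocks located in the same storage object are assumptions made only for illustration.

```python
from itertools import groupby


def read_operations(block_objects):
    """Count the read requests needed to fetch a file once.

    block_objects lists, in file order, the storage object that holds
    each block. Assumption: a single request can fetch a run of
    consecutive blocks that sit in the same storage object.
    """
    return sum(1 for _ in groupby(block_objects))


def read_fee(block_objects, fee_per_read, times_read):
    """Total read fee when the whole file is read times_read times."""
    return read_operations(block_objects) * fee_per_read * times_read


# Hypothetical layouts for a file of 8 blocks.
contiguous = ["obj-0"] * 8                    # all blocks in one object
scattered = [f"obj-{i}" for i in range(8)]    # one object per block

print(read_operations(contiguous), read_operations(scattered))
print(read_fee(scattered, fee_per_read=0.25, times_read=10))
```

Under these assumptions the scattered layout needs eight requests per read of the file where the contiguous layout needs one, so the read fee grows with both the degree of fragmentation and the read frequency.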
To reduce the storage service cost that users incur when using a cloud storage service (comprising the storage space service fee and the data read operation service fee), the invention provides a data fragmentation reduction method based on cloud storage service pricing. The method weighs the storage space service fee against the fragment read service fee caused by data deduplication (the fragment read service fee does not include the data transfer fee) and identifies the data fragment blocks whose data read service fee incurred by the user exceeds the storage space service fee they save. These data fragment blocks are retained in the cloud storage system without being deduplicated. This reduces the amount of data fragmentation in the cloud storage system, lowers the data read service fee the user has to pay, and lets the user use the data storage service provided by the cloud storage system at the lowest service price.
Summary of the invention
The invention provides a data fragmentation reduction method based on cloud storage service pricing. The method weighs the storage space service fee against the fragment read service fee caused by data deduplication (the fragment read service fee does not include the data transfer fee) and identifies the data fragment blocks whose data read service fee incurred by the user exceeds the storage space service fee they save. These data fragment blocks are retained in the cloud storage system without being deduplicated. This reduces the amount of data fragmentation in the cloud storage system, lowers the data read service fee the user has to pay, and lets the user use the data storage service provided by the cloud storage system at the lowest service price.
The core idea of the invention is to identify the data fragment segments whose data read service fee incurred by the user exceeds the storage space service fee they save. A data fragment segment is a group of duplicate data blocks stored at contiguous addresses in the same storage object; one storage object may contain several data fragment segments. If the data read service fee generated by a data fragment segment exceeds the storage space service fee it saves, all data fragment blocks in that segment are treated as rewrite data fragment blocks, and no deduplication is performed on them. In Formula 1, the left side of the inequality is the data read service fee generated by a data fragment segment and the right side is the storage space service fee saved; a data fragment segment that satisfies Formula 1 generates a data read service fee greater than the storage space fee it saves. On the left side, C_get is the data read service fee generated by one data read operation; it does not include the data transfer fee and is independent of the total size of the data fragment segment. N_read is the number of times the user reads the data fragment segment, so C_get × N_read is the data read service fee generated when the user reads the segment. On the right side, DataSize is the total size of the data fragment segment in GB and C_storage is the storage space service fee per GB, so DataSize × C_storage is the storage space service fee the user has to pay to store the segment.
$$C_{get} \times N_{read} > \mathit{DataSize} \times C_{storage} \qquad \text{(Formula 1)}$$
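As a minimal sketch, Formula 1 can be written as a predicate in Python. The function names and the numeric fee values are hypothetical, and the break-even helper is a simple rearrangement of the inequality rather than something the patent states; it also assumes both fees are accounted over the same billing period.

```python
import math


def should_rewrite(c_get, n_read, data_size_gb, c_storage):
    """Formula 1: True when the read fee of a data fragment segment
    exceeds the storage space fee saved by deduplicating it."""
    return c_get * n_read > data_size_gb * c_storage


def break_even_reads(c_get, data_size_gb, c_storage):
    """Smallest whole number of reads for which Formula 1 holds.

    Derived by rearranging Formula 1 to
    N_read > DataSize * C_storage / C_get.
    Floating-point rounding can shift the result by one when the
    ratio is very close to an integer.
    """
    return math.floor(data_size_gb * c_storage / c_get) + 1


# Hypothetical prices in arbitrary currency units.
c_get, c_storage = 0.5, 2.0      # fee per read, fee per GB stored
segment_gb = 1.0                 # total size of the fragment segment

n = break_even_reads(c_get, segment_gb, c_storage)
print(n, should_rewrite(c_get, n, segment_gb, c_storage))
print(should_rewrite(c_get, n - 1, segment_gb, c_storage))
```

Because C_get does not depend on segment size, small segments reach the break-even point after very few reads, which is why short fragment segments are the natural candidates for rewriting.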
The main flow of the invention is as follows:
(1) The cloud storage server receives the data block information, sent by the user client, for the data blocks belonging to the same file (including data block content, data block length, data block fingerprint, etc.).
(2) The cloud storage server looks up in the cloud storage system whether each data block obtained in step (1) is a duplicate data block. If it is, the data block is marked as a duplicate data block; otherwise, it is marked as a non-duplicate data block.
(3) The data fragment segments formed by the duplicate data blocks marked in step (2) are obtained. A data fragment segment consists of duplicate data blocks stored at contiguous addresses in the same storage object; one storage object may contain several data fragment segments.
(4) The data read service fee and the storage space service fee of each data fragment segment obtained in step (3) are calculated according to Formula 1 (the number of data reads is either a fixed value supplied by the system or a value the system predicts from empirical data). If the data read service fee exceeds the storage space service fee, all data blocks in the segment are marked as rewrite data fragment blocks; otherwise, all data blocks in the segment are marked as non-rewrite data fragment blocks.
(5) All rewrite data fragment blocks marked in step (4) and all non-duplicate data blocks marked in step (2) are written together into storage objects of the cloud storage system, and the non-rewrite data fragment blocks marked in step (4) are deleted.
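Steps (3) and (4) of the flow above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the `Block` fields, the treatment of 1 GB as 10**9 bytes, and the rule that a non-duplicate block ends the current segment are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

GB = 10**9  # assumption: fees are quoted per decimal gigabyte


@dataclass
class Block:
    fingerprint: str
    size_bytes: int
    # Location of the already-stored copy; None for a non-duplicate block.
    object_id: Optional[str] = None
    offset: Optional[int] = None


def fragment_segments(blocks: List[Block]) -> List[List[Block]]:
    """Step (3): group duplicate blocks whose stored copies are
    address-contiguous within the same storage object."""
    segments, current = [], []
    for b in blocks:
        if b.object_id is None:          # non-duplicate ends the segment
            if current:
                segments.append(current)
                current = []
            continue
        if current:
            prev = current[-1]
            if (b.object_id == prev.object_id
                    and b.offset == prev.offset + prev.size_bytes):
                current.append(b)        # extends the current segment
                continue
            segments.append(current)
        current = [b]                    # starts a new segment
    if current:
        segments.append(current)
    return segments


def classify(segments, c_get, n_read, c_storage):
    """Step (4): apply Formula 1 to each segment."""
    rewrite, non_rewrite = [], []
    for seg in segments:
        size_gb = sum(b.size_bytes for b in seg) / GB
        if c_get * n_read > size_gb * c_storage:
            rewrite.extend(seg)          # keep a fresh copy, no dedup
        else:
            non_rewrite.extend(seg)      # deduplicate as usual
    return rewrite, non_rewrite
```

In step (5), the blocks returned in `rewrite` would be written together with the non-duplicate blocks, while those in `non_rewrite` would be dropped in favour of their existing stored copies.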
Brief description of the drawings
Fig. 1 is a flowchart of the invention.
Detailed description of the embodiments
The invention involves a user client and a cloud storage server. The user client sends the information of the file data to be stored to the cloud storage server, and the cloud storage server stores the data sent by the user in the cloud storage system. Data deduplication can be implemented independently on the cloud storage server, or jointly by the cloud storage server and the user client. The method of the invention can be combined with any implementation of data deduplication.
Fig. 1 is a flowchart of the invention. The specific steps are as follows:
(1) The cloud storage server receives the data block information, sent by the user client, for the data blocks belonging to the same file. The data block information includes the metadata and the data content of each data block. The metadata of a data block includes the data block length, the data block fingerprint that uniquely identifies the data block, etc.
(2) The cloud storage server looks up in the cloud storage system whether each data block obtained in step (1) is a duplicate data block. The specific steps are as follows:
(2.1) Read one data block and check whether it has already been stored in the cloud storage system. If it has, mark the data block as a duplicate data block; otherwise, mark it as a non-duplicate data block.
(2.2) If all data blocks from step (1) have been read, proceed to the next step; otherwise, return to step (2.1).
(3) Obtain the data fragment segments formed by the duplicate data blocks marked in step (2.1). A data fragment segment consists of duplicate data blocks, marked in step (2.1), that are stored at contiguous addresses in the same storage object.
(4) Determine whether each data fragment segment obtained in step (3) is a rewrite data fragment segment. The specific steps are as follows:
(4.1) Take one data fragment segment and calculate its data read service fee and storage space service fee according to Formula 1 (the number of data reads is either a fixed value supplied by the system or a value the system predicts from empirical data). If the data read service fee exceeds the storage space service fee, mark all data blocks in the segment as rewrite data fragment blocks; otherwise, mark all data blocks in the segment as non-rewrite data fragment blocks.
(4.2) If all data fragment segments in step (4) have been evaluated, proceed to the next step; otherwise, return to step (4.1).
(5) Write all rewrite data fragment blocks marked in step (4.1) and all non-duplicate data blocks marked in step (2.1) together into storage objects of the cloud storage system, and delete the non-rewrite data fragment blocks marked in step (4.1).
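The duplicate check of steps (2.1) and (2.2) and the choice of the read count used in step (4.1) might look like the sketch below. The patent does not name a fingerprint function or a prediction method; SHA-1, the in-memory set used as a fingerprint index, and the use of a simple mean over past read counts are assumptions for illustration only.

```python
import hashlib
from statistics import mean


def fingerprint(content: bytes) -> str:
    """Fingerprint of a data block (SHA-1 is an assumed choice)."""
    return hashlib.sha1(content).hexdigest()


def mark_duplicates(chunks, stored_fingerprints):
    """Steps (2.1)-(2.2): visit every block of the file once and mark
    it as duplicate if its fingerprint is already stored.

    stored_fingerprints is a hypothetical index of fingerprints of the
    blocks already held by the cloud storage system.
    """
    marks = []
    for content in chunks:
        fp = fingerprint(content)
        marks.append((fp, fp in stored_fingerprints))
    return marks


def expected_reads(read_history, fixed_value):
    """Step (4.1): N_read is a prediction from empirical data when
    history exists, otherwise a fixed system-supplied value."""
    return mean(read_history) if read_history else fixed_value


index = {fingerprint(b"already stored block")}
file_chunks = [b"already stored block", b"brand new block"]
for fp, is_dup in mark_duplicates(file_chunks, index):
    print(fp[:8], "duplicate" if is_dup else "non-duplicate")
```

A production system would typically keep the fingerprint index in persistent storage and might use a richer predictor for the read count; the sketch only shows where those two inputs enter the flow.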

Claims (2)

1. A data fragmentation reduction method based on cloud storage service pricing, comprising the following steps:
(1) a cloud storage server receives the data block information, sent by a user client, for the data blocks belonging to the same file;
(2) the cloud storage server looks up in the cloud storage system whether each data block obtained in step (1) is a duplicate data block; if so, the corresponding data block is marked as a duplicate data block; otherwise, the corresponding data block is marked as a non-duplicate data block;
(3) the data fragment segments formed by the duplicate data blocks marked in step (2) are obtained, wherein a data fragment segment consists of duplicate data blocks stored at contiguous addresses in the same storage object, and multiple data fragment segments exist in the same storage object;
(4) the data read service fee and the storage space service fee of each data fragment segment obtained in step (3) are calculated; if the data read service fee is greater than the storage space service fee, all data blocks in the corresponding data fragment segment are marked as rewrite data fragment blocks; otherwise, all data blocks in the corresponding data fragment segment are marked as non-rewrite data fragment blocks;
(5) all rewrite data fragment blocks marked in step (4) and all non-duplicate data blocks marked in step (2) are written together into storage objects of the cloud storage system, and the non-rewrite data fragment blocks marked in step (4) are deleted.
2. The method according to claim 1, wherein in step (4) the data read service fee and the storage space service fee of each data fragment segment are calculated and the data fragment segments whose data read service fee is greater than the storage space service fee are found, as shown in Formula 1; if Formula 1 is satisfied, the data read service fee is greater than the storage space service fee, all data blocks in the corresponding data fragment segment are marked as rewrite data fragment blocks, and no deduplication is performed on these data blocks; otherwise, the data read service fee is less than or equal to the storage space service fee, and all data blocks in the corresponding data fragment segment need to be deleted;
$$C_{get} \times N_{read} > \mathit{DataSize} \times C_{storage} \qquad \text{(Formula 1)},$$
In Formula 1, C_get is the data read service fee generated by one data read operation; this fee does not include the data transfer fee and is independent of the total size of the data fragment segment; N_read is the number of times the user reads the data fragment segment, and C_get × N_read is the data read service fee generated when the user reads the data fragment segment; DataSize is the total size of the data fragment segment in GB, C_storage is the storage space service fee per GB, and DataSize × C_storage is the storage space service fee the user has to pay to store the data fragment segment.
CN201610443197.5A (priority date 2016-06-20, filing date 2016-06-20): A data fragmentation reduction method based on cloud storage service pricing. Status: Active. Publication: CN105930534B (en)

Publications (2)

Publication Number    Publication Date
CN105930534A (en)     2016-09-07
CN105930534B (en)     2019-05-17

Family

ID=56830935

Country Status (1)

Country Link
CN (1) CN105930534B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant