WO2021114464A1 - Data deduplication method, system and device, and computer-readable storage medium

Data deduplication method, system and device, and computer-readable storage medium

Info

Publication number
WO2021114464A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
sub
data
target data
target
Prior art date
Application number
PCT/CN2020/073400
Other languages
French (fr)
Chinese (zh)
Inventor
岳斌
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2021114464A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Definitions

  • the core idea of judging whether data is duplicate is to calculate the fingerprint value of the data, and calculating the fingerprint value consumes a large amount of CPU (central processing unit) resources, which affects the performance of the device.
  • CPU: central processing unit
  • the purpose of this application is to provide a data deduplication method, which can solve the technical problem of how to reduce the amount of CPU resources occupied by the data deduplication method to a certain extent.
  • This application also provides a data deduplication system, equipment, and computer-readable storage medium.
  • a data deduplication method includes:
  • Parallel operations are performed on the initial value and the target data to obtain the operation data.
  • the construction of the initial value corresponding to the preset size through the SSE instruction set includes:
  • updating the loop value based on the first mask value, the second mask value, the loop value, and the target data includes:
  • the sub-loop value corresponding to the sub-target data is updated.
  • the product of the sub-loop value corresponding to the sub-target data and the first mask value is used as the sub-loop value corresponding to the sub-target data.
  • the preset size is 128 bits, and the preset number is 4.
  • the method further includes:
  • a data deduplication system includes:
  • the first calculation module is configured to perform calculations on the target data through the SSE instruction set to obtain calculation data corresponding to the preset size
  • the first obtaining module is configured to obtain the fingerprint value of the target data in the target storage device
  • the first judgment module is used to judge whether the hash value is consistent with the fingerprint value, and if not, no longer write the target data into the target storage device.
  • a data deduplication device including:
  • Memory used to store computer programs
  • a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the steps of any of the above data deduplication methods are implemented.
  • FIG. 1 is a flowchart of a data deduplication method provided by an embodiment of the application
  • FIG. 2 is a flowchart of obtaining the operation data in this application.
  • FIG. 3 is a flowchart of updating the cycle value in this application.
  • FIG. 4 is a flowchart of updating sub-cycle values in this application.
  • FIG. 5 is a schematic structural diagram of a data deduplication system provided by an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a data deduplication device provided by an embodiment of this application.
  • FIG. 7 is a schematic diagram of another structure of a data deduplication device provided by an embodiment of the application.
  • FIG. 1 is a flowchart of a data deduplication method provided by an embodiment of the application.
  • the data deduplication method provided in the embodiments of the present application can be applied to devices such as servers and user terminals, and includes the following steps:
  • Step S101 Read target data of a preset size in a target storage device.
  • the target data of a preset size can be read in the target storage device first, and the preset size can be determined according to actual needs, for example, according to the calculation efficiency, the size of the data stored in the target storage device, and so on.
  • Step S102 Operate the target data through the SSE instruction set to obtain the operation data corresponding to the preset size.
  • the SSE (Streaming SIMD Extensions) instruction set can be used to perform operations on the target data.
  • the SSE instruction set includes SIMD (Single Instruction Multiple Data) floating-point calculations plus additional SIMD integer and cache control instructions, so the SSE instruction set can speed up operations on the target data.
  • the size of the calculation data can be determined according to a preset size.
  • Step S103 Perform a hash operation on the calculated data to obtain a corresponding hash value.
  • Step S104 Obtain the fingerprint value of the target data in the target storage device.
  • Step S105 Determine whether the hash value is consistent with the fingerprint value. If not, perform step S106: no longer write the target data into the target storage device. If yes, perform step S107: end; other operations can of course be performed instead.
  • the size of the hash value can be determined according to the performance of the SSE instruction set, or it can be determined according to specific computing requirements.
  • the calculated hash value is the actual fingerprint value of the target data in the target storage device. Therefore, the hash value and the acquired fingerprint value can be used to determine whether to delete the target data.
  • the data deduplication method reads target data of a preset size in a target storage device; performs operations on the target data through an SSE instruction set to obtain operation data corresponding to the preset size; performs a hash operation on the operation data to obtain the corresponding hash value; obtains the fingerprint value of the target data in the target storage device; and determines whether the hash value is consistent with the fingerprint value, and if not, no longer writes the target data into the target storage device.
  • the data deduplication method provided by this application uses the SSE instruction set to perform operations on the target data, which improves the efficiency of those operations and, in turn, of computing the hash value. Whether the target data is duplicate data can then be judged simply by checking whether the hash value is consistent with the fingerprint value, which improves the efficiency of the deduplication operation on the target data and reduces CPU resource consumption.
  • when the target data is operated on through the SSE instruction set to obtain the operation data corresponding to the preset size, the SSE instruction set may be used to construct an initial value corresponding to the preset size; parallel operations are then performed on the initial value and the target data to obtain the operation data.
  • FIG. 2 is a flowchart of obtaining the operation data in this application.
  • in step S102, the step of performing operations on the target data through the SSE instruction set to obtain the operation data corresponding to the preset size may specifically be as follows:
  • Step S111 Construct a first mask value and a second mask value with the data length equal to the preset size through the SSE instruction set.
  • Step S112 Construct a loop value with the data length equal to the preset size.
  • Step S113 Use the first mask value, the second mask value, and the loop value as initial values.
  • Step S114 Update the loop value based on the first mask value, the second mask value, the loop value, and the target data.
  • Step S115 Determine whether the data length of the loop value corresponds to the preset size. If yes, proceed to step S116: use the loop value as the operation data; if not, proceed to step S117: end, or return to the step of reading target data of a preset size in the target storage device, and so on.
  • a first mask value and a second mask value whose data length is equal to the preset size can be constructed separately through the SSE instruction set; a loop value whose data length is equal to the preset size is constructed; and the first mask value, the second mask value, and the loop value are used as the initial value. Correspondingly, when the parallel operation is performed on the initial value and the target data, the loop value can be updated based on the first mask value, the second mask value, the loop value, and the target data; it is then determined whether the data length of the loop value corresponds to the preset size, and if so, the loop value is used as the operation data; if not, the method returns to the step of reading the target data of the preset size in the target storage device.
  • Fig. 3 is a flowchart of updating the cycle value in this application.
  • in step S114, the step of updating the loop value based on the first mask value, the second mask value, the loop value, and the target data may specifically be:
  • Step S121 Split the loop value into a preset number of sub loop values.
  • Step S122 Split the target data into sub-target data corresponding to the sub-cycle values one-to-one.
  • Step S123 Based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-cycle value, update the sub-cycle value corresponding to the sub-target data.
  • when the loop value is updated based on the first mask value, the second mask value, the loop value, and the target data, the loop value can be split into a preset number of sub-loop values; the target data is split into sub-target data in one-to-one correspondence with the sub-loop values; and, based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value, the sub-loop value corresponding to the sub-target data is updated.
  • FIG. 4 is a flowchart of updating sub-cycle values in this application.
  • in step S123, the step of updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value may specifically be:
  • Step S131 Multiply the sub-target data by the second mask value to obtain the first multiplication value.
  • Step S132 The value obtained by adding the first multiplication value to the sub-loop value corresponding to the sub-target data is used as the sub-loop value corresponding to the sub-target data.
  • Step S133 The sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same value shifted right by 31 bits are ANDed, and the result is used as the sub-loop value corresponding to the sub-target data.
  • Step S134 The product value of the sub-loop value corresponding to the sub-target data and the first mask value is used as the sub-loop value corresponding to the sub-target data.
  • when updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value, the sub-target data can be multiplied by the second mask value to obtain the first multiplication value; the value obtained by adding the first multiplication value to the sub-loop value corresponding to the sub-target data is used as the sub-loop value corresponding to the sub-target data; the sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same value shifted right by 31 bits are ANDed, and the result is used as the sub-loop value corresponding to the sub-target data; and the product of the sub-loop value corresponding to the sub-target data and the first mask value is used as the sub-loop value corresponding to the sub-target data.
  • the values of the preset size and the preset number can be determined according to actual needs.
  • the preset size can be 128 bits
  • the preset number can be 4.
  • Uint32 date_1, Uint32 date_2, Uint32 date_3, Uint32 date_4 represent the sub-target data obtained by splitting the target data
  • Uint32 v_1, Uint32 v_2, Uint32 v_3, Uint32 v_4 represent the sub-loop values obtained by splitting the loop value
  • Uint32 date_x corresponds to Uint32 v_x one-to-one
  • the sub-loop value corresponding to the sub-target data is updated based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value
  • Table 1 The structure of target data
  • deletion information of the target data can also be generated and saved.
  • FIG. 5 is a schematic structural diagram of a data deduplication system provided by an embodiment of this application.
  • the first reading module 101 is configured to read target data of a preset size in a target storage device
  • the first calculation module 102 is configured to perform calculations on target data through the SSE instruction set to obtain calculation data corresponding to a preset size
  • the second operation module 103 is configured to perform a hash operation on the operation data to obtain a corresponding hash value
  • the first obtaining module 104 is configured to obtain the fingerprint value of the target data in the target storage device
  • the first judging module 105 is used to judge whether the hash value is consistent with the fingerprint value, and if not, the target data is no longer written into the target storage device.
  • the first calculation module may include:
  • the first construction sub-module is used to construct the initial value corresponding to the preset size through the SSE instruction set;
  • the first operation sub-module is used to perform parallel operations on the initial value and the target data to obtain the operation data.
  • the first construction submodule may include:
  • the first construction unit is configured to separately construct a first mask value and a second mask value with a data length equal to a preset size through the SSE instruction set;
  • the second construction unit is used to construct a cyclic value whose data length is equal to the preset size
  • the first setting unit is used to use the first mask value, the second mask value and the loop value as initial values
  • the first operation sub-module may include:
  • the first update submodule is used to update the cycle value based on the first mask value, the second mask value, the cycle value, and the target data;
  • the first judging sub-module is used to judge whether the data length of the loop value corresponds to the preset size; if so, the loop value is used as the operation data; if not, it prompts the first reading module to execute the step of reading target data of a preset size in the target storage device.
  • the first update submodule may include:
  • the first splitting sub-module is used to split the loop value into a preset number of sub loop values
  • the second splitting sub-module is used to split the target data into sub-target data corresponding to the sub-cycle value one-to-one;
  • the second update sub-module is used to update the sub-cycle value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-cycle value.
  • the second update submodule may include:
  • the first calculation unit is configured to multiply the sub-target data by the second mask value to obtain the first multiplication value
  • the second calculation unit is configured to use the value obtained by adding the first multiplication value to the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data;
  • the third calculation unit is used to AND the sub-loop value corresponding to the sub-target data shifted left by 31 bits with the same value shifted right by 31 bits, and to use the result as the sub-loop value corresponding to the sub-target data;
  • the fourth calculation unit is configured to use the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
  • the preset size may be 128 bits, and the preset number may be 4.
  • the first generation module is configured to generate and save the deletion information of the target data after the first judgment module deletes the target data in the target storage device.
  • the present application also provides a data deduplication device and a computer-readable storage medium, both of which have the corresponding effects of the data deduplication method provided in the embodiments of the present application.
  • FIG. 6 is a schematic structural diagram of a data deduplication device provided by an embodiment of this application.
  • a data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • the memory 201 stores a computer program.
  • the processor 202 implements the following steps when the computer program is executed:
  • a data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202.
  • the memory 201 stores a computer program.
  • when the processor 202 executes the computer program, the following steps are implemented: construct an initial value corresponding to the preset size through the SSE instruction set; perform parallel operations on the initial value and the target data to obtain the operation data.
  • a data deduplication device provided by an embodiment of the application includes a memory 201 and a processor 202.
  • the memory 201 stores a computer program.
  • when the processor 202 executes the computer program, the following steps are implemented: a first mask value and a second mask value whose data length is equal to the preset size are constructed separately through the SSE instruction set.
  • a data deduplication device includes a memory 201 and a processor 202.
  • the memory 201 stores a computer program.
  • when the processor 202 executes the computer program, the following steps are implemented: multiply the sub-target data by the second mask value to obtain the first multiplication value; use the value obtained by adding the first multiplication value to the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data; AND the sub-loop value corresponding to the sub-target data shifted left by 31 bits with the same value shifted right by 31 bits, and use the result as the sub-loop value corresponding to the sub-target data; use the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
  • another data deduplication device may further include: an input port 203 connected to the processor 202, used to transmit externally input commands to the processor 202; a display unit 204 connected to the processor 202, used to display the processing results of the processor 202 to the outside; and a communication module 205 connected to the processor 202, used for communication between the data deduplication device and the outside.
  • the display unit 204 can be a display panel, a laser scanning display, etc.; the communication modes adopted by the communication module 205 include but are not limited to mobile high-definition link (MHL), universal serial bus (USB), high-definition multimedia interface (HDMI), and wireless connections such as wireless fidelity (WiFi), Bluetooth, low-power Bluetooth, and communication based on IEEE802.11s.
  • MHL: mobile high-definition link technology
  • USB: universal serial bus
  • HDMI: high-definition multimedia interface
  • WiFi: wireless fidelity technology
  • Bluetooth communication technology; low-power Bluetooth communication technology
  • communication technology based on IEEE802.11s
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: construct an initial value corresponding to a preset size through the SSE instruction set; perform parallel operations on the initial value and the target data to obtain the operation data.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented: separately construct, through the SSE instruction set, a first mask value and a second mask value whose data length is equal to a preset size; construct a loop value whose data length is equal to the preset size; use the first mask value, the second mask value, and the loop value as the initial value; update the loop value based on the first mask value, the second mask value, the loop value, and the target data; determine whether the data length of the loop value corresponds to the preset size, and if so, use the loop value as the operation data; if not, return to the step of reading target data of a preset size in the target storage device.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented: split the loop value into a preset number of sub-loop values; split the target data into sub-target data in one-to-one correspondence with the sub-loop values; update the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: multiply the sub-target data by the second mask value to obtain the first multiplication value; use the value obtained by adding the first multiplication value to the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data; AND the sub-loop value corresponding to the sub-target data shifted left by 31 bits with the same value shifted right by 31 bits, and use the result as the sub-loop value corresponding to the sub-target data; use the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored.
  • when the computer program is executed by a processor, the following applies: the preset size is 128 bits, and the preset number is 4.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: after the target data is deleted from the target storage device, deletion information of the target data is generated and saved.
  • RAM: random access memory
  • ROM: read-only memory
  • electrically programmable ROM
  • EEPROM: electrically erasable programmable ROM
  • registers
  • hard disks
  • removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
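
Steps S131 to S134 above, which update each sub-loop value, can be sketched as follows. This is a minimal illustration in plain Python rather than SSE instructions, and it rests on several assumptions: the function names are hypothetical, the mask values are treated as one 32-bit constant per lane, and all arithmetic wraps at 32 bits to mimic a Uint32. Read literally on a 32-bit lane, step S133 always yields zero, because the two shifted values share no set bit position, so the sketch exposes that step as a replaceable `combine` argument rather than guessing at the intended operation.

```python
MASK32 = 0xFFFFFFFF  # keep every lane at 32 bits, like a Uint32


def literal_s133(v: int) -> int:
    """Step S133 read literally: (v << 31) AND (v >> 31) on a 32-bit lane."""
    return ((v << 31) & MASK32) & (v >> 31)


def update_sub_loop_value(v, data, mask1, mask2, combine=literal_s133):
    """Apply steps S131 to S134 to one 32-bit sub-loop value (hypothetical helper)."""
    first_multiplication = (data * mask2) & MASK32  # S131: sub-target data * second mask value
    v = (v + first_multiplication) & MASK32         # S132: add to the sub-loop value
    v = combine(v) & MASK32                         # S133: shift-and-AND step (replaceable)
    return (v * mask1) & MASK32                     # S134: multiply by the first mask value


def update_loop_value(sub_loop_values, sub_target_data, mask1, mask2, combine=literal_s133):
    """Update v_1..v_4 against date_1..date_4, pairing them one-to-one."""
    return [
        update_sub_loop_value(v, d, mask1, mask2, combine)
        for v, d in zip(sub_loop_values, sub_target_data)
    ]
```

Passing `combine=lambda x: x` skips step S133 entirely, which makes it easy to check the multiply and add steps in isolation.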

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data deduplication method, system and device, and a computer-readable storage medium. The method comprises: reading target data of a preset size in a target storage device (S101); performing an operation on the target data by means of an SSE instruction set to obtain operational data corresponding to the preset size (S102); performing a hash operation on the operational data to obtain a corresponding hash value (S103); acquiring a fingerprint value of the target data in the target storage device (S104); and determining whether the hash value is consistent with the fingerprint value (S105), and if not, not writing the target data into the target storage device any more (S106). By means of the data deduplication method, the operation efficiency of target data is improved by means of an SSE instruction set, the operation efficiency of a hash value is further improved, and whether the target data is duplicated data can be determined only by determining whether the hash value is consistent with a fingerprint value, such that the resource consumption of a CPU can be reduced. The corresponding technical problems are also solved by means of the data deduplication system and device and the computer-readable storage medium.

Description

一种数据重删方法、***、设备及计算机可读存储介质Data deduplication method, system, equipment and computer readable storage medium
本申请要求于2019年12月12日提交中国专利局、申请号为201911275091.9、发明名称为“一种数据重删方法、***、设备及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office, the application number is 201911275091.9, and the invention title is "a data deduplication method, system, equipment, and computer-readable storage medium" on December 12, 2019. The entire content is incorporated into this application by reference.
技术领域Technical field
本申请涉及存储技术领域,更具体地说,涉及一种数据重删方法、***、设备及计算机可读存储介质。This application relates to the field of storage technology, and more specifically, to a data deduplication method, system, device, and computer-readable storage medium.
背景技术Background technique
当前,在存储领域中,海量数据的查询和存储均需要占用超大的资源,严重影响了数据存储的性能。为了降低存储数据所需占用的资源,提高数据存储性能,现有的一种方法是对数据进行重删处理,重删处理也即对重复数据进行删除,使得存储设备中只保留有一份相同数据,在不影响数据一致性的前提下,减少盘上的数据存放量。At present, in the storage field, the query and storage of massive data require huge resources, which seriously affects the performance of data storage. In order to reduce the resources required to store data and improve data storage performance, an existing method is to deduplicate data. Deduplication is to delete duplicate data so that only one copy of the same data remains in the storage device. , On the premise of not affecting data consistency, reduce the amount of data storage on the disk.
然而,判断数据是否为重复数据的核心思想是计算数据的指纹值,而指纹值的计算需要占用大量的CPU(central processing unit,中央处理器)资源,从而影响设备的性能。However, the core idea of judging whether the data is duplicate data is to calculate the fingerprint value of the data, and the calculation of the fingerprint value requires a large amount of CPU (central processing unit, central processing unit) resources, which affects the performance of the device.
综上所述,如何降低数据重删方法占用的CPU资源量是目前本领域技术人员亟待解决的问题。In summary, how to reduce the amount of CPU resources occupied by the data deduplication method is a problem to be solved urgently by those skilled in the art.
Summary of the invention
The purpose of this application is to provide a data deduplication method that can, to a certain extent, solve the technical problem of how to reduce the amount of CPU resources consumed by data deduplication. This application also provides a data deduplication system, a device, and a computer-readable storage medium.
To achieve the above purpose, this application provides the following technical solutions:
A data deduplication method, including:
reading target data of a preset size from a target storage device;
performing an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
performing a hash operation on the operation data to obtain a corresponding hash value;
acquiring a fingerprint value of the target data in the target storage device;
judging whether the hash value is consistent with the fingerprint value, and if not, no longer writing the target data into the target storage device.
Preferably, performing the operation on the target data through the SSE instruction set to obtain the operation data corresponding to the preset size includes:
constructing, through the SSE instruction set, initial values corresponding to the preset size;
performing a parallel operation on the initial values and the target data to obtain the operation data.
Preferably, constructing, through the SSE instruction set, the initial values corresponding to the preset size includes:
constructing, through the SSE instruction set, a first mask value and a second mask value, each with a data length equal to the preset size;
constructing a loop value with a data length equal to the preset size;
using the first mask value, the second mask value, and the loop value as the initial values;
and performing the parallel operation on the initial values and the target data to obtain the operation data includes:
updating the loop value based on the first mask value, the second mask value, the loop value, and the target data;
judging whether the data length of the loop value corresponds to the preset size; if so, using the loop value as the operation data; if not, returning to the step of reading target data of the preset size from the target storage device.
Preferably, updating the loop value based on the first mask value, the second mask value, the loop value, and the target data includes:
splitting the loop value into a preset number of sub-loop values;
splitting the target data into sub-target data in one-to-one correspondence with the sub-loop values;
updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value.
Preferably, updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value includes:
multiplying the sub-target data by the second mask value to obtain a first product;
using the sum of the first product and the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data;
performing an AND operation on the sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same sub-loop value shifted right by 31 bits, and using the result as the sub-loop value corresponding to the sub-target data;
using the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
Preferably, the preset size is 128 bits and the preset number is 4.
Preferably, after deleting the target data from the target storage device, the method further includes:
generating and saving deletion information of the target data.
A data deduplication system, including:
a first reading module, configured to read target data of a preset size from a target storage device;
a first operation module, configured to perform an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
a second operation module, configured to perform a hash operation on the operation data to obtain a corresponding hash value;
a first acquisition module, configured to acquire a fingerprint value of the target data in the target storage device;
a first judgment module, configured to judge whether the hash value is consistent with the fingerprint value, and if not, to no longer write the target data into the target storage device.
A data deduplication device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of any of the data deduplication methods described above when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the data deduplication methods described above.
In the data deduplication method provided by this application, target data of a preset size is read from a target storage device; an operation is performed on the target data through the SSE instruction set to obtain operation data corresponding to the preset size; a hash operation is performed on the operation data to obtain a corresponding hash value; a fingerprint value of the target data in the target storage device is acquired; and it is judged whether the hash value is consistent with the fingerprint value, and if not, the target data is no longer written into the target storage device. Because the method operates on the target data through the SSE instruction set, the operation on the target data is more efficient, which in turn makes computing the hash value more efficient. Moreover, whether the target data is duplicate data can be judged simply by checking whether the hash value is consistent with the fingerprint value, which improves the efficiency of deduplicating the target data and can reduce CPU resource consumption. The data deduplication system, device, and computer-readable storage medium provided by this application solve the corresponding technical problem as well.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a data deduplication method provided by an embodiment of this application;
FIG. 2 is a flowchart of obtaining the operation data in this application;
FIG. 3 is a flowchart of updating the loop value in this application;
FIG. 4 is a flowchart of updating the sub-loop values in this application;
FIG. 5 is a schematic structural diagram of a data deduplication system provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data deduplication device provided by an embodiment of this application;
FIG. 7 is another schematic structural diagram of a data deduplication device provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Please refer to FIG. 1, which is a flowchart of a data deduplication method provided by an embodiment of this application.
The data deduplication method provided by an embodiment of this application can be applied to devices such as servers and user terminals, and includes the following steps:
Step S101: Read target data of a preset size from a target storage device.
In practice, target data of a preset size can first be read from the target storage device. The preset size can be determined according to actual needs, for example according to the operation efficiency or the size of the data stored in the target storage device.
Step S102: Perform an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size.
In practice, once the target data has been read, an operation can be performed on it through the SSE (Streaming SIMD Extensions) instruction set. Because the SSE instruction set includes single-instruction-multiple-data floating-point calculation as well as additional SIMD (Single Instruction Multiple Data) integer and cache control instructions, it can speed up the operation on the target data. In a specific application scenario, the size of the operation data can be determined according to the preset size.
Step S103: Perform a hash operation on the operation data to obtain a corresponding hash value.
Step S104: Acquire the fingerprint value of the target data in the target storage device.
Step S105: Judge whether the hash value is consistent with the fingerprint value. If not, perform step S106: no longer write the target data into the target storage device. If so, step S107 (end) can be performed; other operations can of course be performed instead.
In practice, the size of the hash value can be determined according to the performance of the SSE instruction set, or according to specific computing requirements. In addition, the calculated hash value is the actual fingerprint value of the target data in the target storage device, so the hash value and the acquired fingerprint value can be used to decide whether to delete the target data.
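Steps S102 to S105 can be outlined in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of this application: the function names `operate` and `hash_matches_fingerprint` are hypothetical, the SSE-based operation of step S102 is replaced by a pass-through placeholder, and SHA-256 is assumed as the hash operation because the text does not name one. The function only reports the comparison result of step S105 and leaves the follow-up action to the caller.

```python
import hashlib

PRESET_SIZE = 16  # bytes (128 bits, the example preset size given in the text)


def operate(block: bytes) -> bytes:
    """Placeholder for step S102; the real step uses SSE instructions."""
    return block  # assumption: pass-through, only to keep the sketch runnable


def hash_matches_fingerprint(block: bytes, stored_fingerprint: bytes) -> bool:
    """Steps S102 to S105 for one block that was already read in step S101."""
    if len(block) != PRESET_SIZE:
        raise ValueError("target data must have the preset size")
    operation_data = operate(block)                       # S102
    hash_value = hashlib.sha256(operation_data).digest()  # S103 (SHA-256 is assumed)
    return hash_value == stored_fingerprint               # S105
```

The caller would pass in the fingerprint value acquired in step S104 and decide, from the returned boolean, which branch of step S105 to take.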
In the data deduplication method provided by this application, target data of a preset size is read from a target storage device; an operation is performed on the target data through the SSE instruction set to obtain operation data corresponding to the preset size; a hash operation is performed on the operation data to obtain a corresponding hash value; a fingerprint value of the target data in the target storage device is acquired; and it is judged whether the hash value is consistent with the fingerprint value, and if not, the target data is no longer written into the target storage device. Because the method operates on the target data through the SSE instruction set, the operation on the target data is more efficient, which in turn makes computing the hash value more efficient. Moreover, whether the target data is duplicate data can be judged simply by checking whether the hash value is consistent with the fingerprint value, which improves the efficiency of deduplicating the target data and can reduce CPU resource consumption.
In the data deduplication method provided by an embodiment of this application, when the operation is performed on the target data through the SSE instruction set to obtain the operation data corresponding to the preset size, initial values corresponding to the preset size can first be constructed through the SSE instruction set, and a parallel operation can then be performed on the initial values and the target data to obtain the operation data.
Please refer to FIG. 2, which is a flowchart of obtaining the operation data in this application.
In the data deduplication method provided by an embodiment of this application, step S102 (performing an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size) can specifically be:
Step S111: Construct, through the SSE instruction set, a first mask value and a second mask value, each with a data length equal to the preset size.
Step S112: Construct a loop value with a data length equal to the preset size.
Step S113: Use the first mask value, the second mask value, and the loop value as the initial values.
Step S114: Update the loop value based on the first mask value, the second mask value, the loop value, and the target data.
Step S115: Judge whether the data length of the loop value corresponds to the preset size. If so, perform step S116: use the loop value as the operation data. If not, step S117 (end) can be performed, or the process can return to the step of reading target data of the preset size from the target storage device.
That is, in the data deduplication method provided by an embodiment of this application, when the initial values corresponding to the preset size are constructed through the SSE instruction set, a first mask value and a second mask value, each with a data length equal to the preset size, can be constructed through the SSE instruction set; a loop value with a data length equal to the preset size can be constructed; and the first mask value, the second mask value, and the loop value can be used as the initial values. Correspondingly, when the parallel operation is performed on the initial values and the target data to obtain the operation data, the loop value can be updated based on the first mask value, the second mask value, the loop value, and the target data; it is then judged whether the data length of the loop value corresponds to the preset size, and if so, the loop value is used as the operation data, and if not, the process returns to the step of reading target data of the preset size from the target storage device.
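The control flow of steps S111 to S116 can be sketched as follows. The sketch models a 128-bit value as a list of four 32-bit integers rather than using real SSE registers, and everything named here (`build_initial_values`, `compute_operation_data`, the `update` callback, and a starting loop value of zero) is an assumption made for illustration; the text does not specify the constants or the starting value.

```python
from typing import Callable, Iterable, List, Tuple

LANES = 4         # assumed: four 32-bit lanes make up one 128-bit value
PRESET_SIZE = 16  # bytes, i.e. 128 bits

Update = Callable[[List[int], List[int], List[int], bytes], List[int]]


def build_initial_values(p1: int, p2: int) -> Tuple[List[int], List[int], List[int]]:
    """S111 to S113: two mask values and a loop value, each 128 bits wide."""
    first_mask = [p1] * LANES   # the same constant in every lane
    second_mask = [p2] * LANES
    loop_value = [0] * LANES    # assumed starting value
    return first_mask, second_mask, loop_value


def compute_operation_data(blocks: Iterable[bytes], update: Update,
                           p1: int, p2: int) -> List[int]:
    """S114 to S116: update per block until the length check passes."""
    first_mask, second_mask, loop_value = build_initial_values(p1, p2)
    for block in blocks:  # each iteration stands for one read in S101
        loop_value = update(loop_value, first_mask, second_mask, block)  # S114
        if len(loop_value) * 4 == PRESET_SIZE:  # S115: length check in bytes
            return loop_value                   # S116: loop value is the operation data
    raise ValueError("ran out of target data before the length check passed")
```

Passing the update step in as a callback keeps this outline independent of how the sub-loop values are actually combined.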
Please refer to FIG. 3, which is a flowchart of updating the loop value in this application.
In the data deduplication method provided by an embodiment of this application, step S114 (updating the loop value based on the first mask value, the second mask value, the loop value, and the target data) can specifically be:
Step S121: Split the loop value into a preset number of sub-loop values.
Step S122: Split the target data into sub-target data in one-to-one correspondence with the sub-loop values.
Step S123: Update the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value.
That is, in practice, when the loop value is updated based on the first mask value, the second mask value, the loop value, and the target data, the loop value can be split into a preset number of sub-loop values; the target data can be split into sub-target data in one-to-one correspondence with the sub-loop values; and the sub-loop value corresponding to each piece of sub-target data can be updated based on that sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value.
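The splitting in steps S121 and S122 can be illustrated with a short Python sketch. It assumes that values are held as byte strings and that sub-values are read in little-endian order, as is usual on x86; neither the byte order nor the helper names `split_lanes` and `pair_lanes` come from the text.

```python
from typing import List, Tuple


def split_lanes(value: bytes, count: int = 4) -> List[int]:
    """Split a value into `count` equally sized unsigned sub-values."""
    if len(value) % count:
        raise ValueError("value length must be a multiple of the lane count")
    width = len(value) // count
    return [int.from_bytes(value[i * width:(i + 1) * width], "little")
            for i in range(count)]


def pair_lanes(loop_value: bytes, target_data: bytes,
               count: int = 4) -> List[Tuple[int, int]]:
    """S121 and S122: pair each sub-loop value with its sub-target data."""
    return list(zip(split_lanes(loop_value, count),
                    split_lanes(target_data, count)))
```

With a 128-bit input and a count of 4, each pair holds one 32-bit sub-loop value and the 32-bit sub-target data at the same position.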
Please refer to FIG. 4, which is a flowchart of updating the sub-loop values in this application.
In the data deduplication method provided by an embodiment of this application, step S123 (updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value) can specifically be:
Step S131: Multiply the sub-target data by the second mask value to obtain a first product.
Step S132: Use the sum of the first product and the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data.
Step S133: Perform an AND operation on the sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same sub-loop value shifted right by 31 bits, and use the result as the sub-loop value corresponding to the sub-target data.
Step S134: Use the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
That is, in a specific application scenario, when the sub-loop value corresponding to the sub-target data is updated based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value, the sub-target data can be multiplied by the second mask value to obtain a first product; the sum of the first product and the sub-loop value corresponding to the sub-target data is used as the sub-loop value corresponding to the sub-target data; an AND operation is performed on that sub-loop value shifted left by 31 bits and shifted right by 31 bits, and the result is used as the sub-loop value corresponding to the sub-target data; and the product of that sub-loop value and the first mask value is used as the sub-loop value corresponding to the sub-target data.
In the data deduplication method provided by an embodiment of this application, both the preset size and the preset number can be determined according to actual needs; for example, the preset size can be 128 bits and the preset number can be 4. Suppose the target data and the loop value have the structures shown in Table 1 and Table 2 respectively, where Uint32 date_1, Uint32 date_2, Uint32 date_3, and Uint32 date_4 denote the sub-target data obtained by splitting the target data, Uint32 v_1, Uint32 v_2, Uint32 v_3, and Uint32 v_4 denote the sub-loop values obtained by splitting the loop value, and Uint32 date_x corresponds one-to-one with Uint32 v_x. The process of updating the sub-loop value corresponding to the sub-target data, based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value, can then be expressed as: v_x = v_x + date_x * P2; v_x = v_x << 31 | v_x >> 33; v_x = v_x * P1. Here P1 denotes the first mask value and P2 denotes the second mask value.
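The per-lane update can be tried out with the Python sketch below, which applies the three assignments of the formula to each of the four 32-bit lanes. It is a hedged illustration only. The step description speaks of an AND of two 31-bit shifts while the formula uses an OR with a right shift of 33, so the sketch follows the formula and exposes the shift counts as parameters. The function names and the wrap-around to 32 bits after every step are assumptions. Note that on a 32-bit lane a right shift by 33 always yields 0, so a true rotation would need shift counts that add up to the lane width.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # keeps every intermediate result within a 32-bit lane


def update_sub_loop_value(v: int, date: int, p1: int, p2: int,
                          left: int = 31, right: int = 33) -> int:
    """One lane of the formula: multiply-add, shift-combine, multiply."""
    v = (v + date * p2) & MASK32                 # v_x = v_x + date_x * P2
    v = ((v << left) | (v >> right)) & MASK32    # v_x = v_x << 31 | v_x >> 33
    v = (v * p1) & MASK32                        # v_x = v_x * P1
    return v


def update_loop_value(loop: List[int], data: List[int],
                      p1: int, p2: int) -> List[int]:
    """Apply the lane update to all sub-loop values and their sub-target data."""
    return [update_sub_loop_value(v, d, p1, p2) for v, d in zip(loop, data)]
```

An SSE implementation would carry out the same arithmetic on all four lanes with single instructions instead of a Python loop.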
Table 1: Structure of the target data
Figure PCTCN2020073400-appb-000001
Table 2: Structure of the loop value
Figure PCTCN2020073400-appb-000002
To make it easier for external parties to manage the target data, the data deduplication method provided by an embodiment of this application can also generate and save deletion information of the target data after the target data is deleted from the target storage device.
Please refer to FIG. 5, which is a schematic structural diagram of a data deduplication system provided by an embodiment of this application.
A data deduplication system provided by an embodiment of this application can include:
a first reading module 101, configured to read target data of a preset size from a target storage device;
a first operation module 102, configured to perform an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
a second operation module 103, configured to perform a hash operation on the operation data to obtain a corresponding hash value;
a first acquisition module 104, configured to acquire a fingerprint value of the target data in the target storage device;
a first judgment module 105, configured to judge whether the hash value is consistent with the fingerprint value, and if not, to no longer write the target data into the target storage device.
In the data deduplication system provided by an embodiment of this application, the first operation module can include:
a first construction submodule, configured to construct, through the SSE instruction set, initial values corresponding to the preset size;
a first operation submodule, configured to perform a parallel operation on the initial values and the target data to obtain the operation data.
In the data deduplication system provided by an embodiment of this application, the first construction submodule can include:
a first construction unit, configured to construct, through the SSE instruction set, a first mask value and a second mask value, each with a data length equal to the preset size;
a second construction unit, configured to construct a loop value with a data length equal to the preset size;
a first setting unit, configured to use the first mask value, the second mask value, and the loop value as the initial values;
and the first operation submodule can include:
a first update submodule, configured to update the loop value based on the first mask value, the second mask value, the loop value, and the target data;
a first judgment submodule, configured to judge whether the data length of the loop value corresponds to the preset size; if so, to use the loop value as the operation data; if not, to prompt the first reading module to perform the step of reading target data of the preset size from the target storage device.
In the data deduplication system provided by an embodiment of this application, the first update submodule can include:
a first splitting submodule, configured to split the loop value into a preset number of sub-loop values;
a second splitting submodule, configured to split the target data into sub-target data in one-to-one correspondence with the sub-loop values;
a second update submodule, configured to update the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value, and the corresponding sub-loop value.
In the data deduplication system provided by an embodiment of this application, the second update submodule can include:
a first calculation unit, configured to multiply the sub-target data by the second mask value to obtain a first product;
a second calculation unit, configured to use the sum of the first product and the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data;
a third calculation unit, configured to perform an AND operation on the sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same sub-loop value shifted right by 31 bits, and to use the result as the sub-loop value corresponding to the sub-target data;
a fourth calculation unit, configured to use the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
In the data deduplication system provided by an embodiment of this application, the preset size can be 128 bits and the preset number can be 4.
The data deduplication system provided by an embodiment of this application can further include:
a first generation module, configured to generate and save deletion information of the target data after the first judgment module deletes the target data from the target storage device.
This application also provides a data deduplication device and a computer-readable storage medium, both of which have effects corresponding to those of the data deduplication method provided by the embodiments of this application. Please refer to FIG. 6, which is a schematic structural diagram of a data deduplication device provided by an embodiment of this application.
A data deduplication device provided by an embodiment of this application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program:
reading target data of a preset size from a target storage device;
performing an operation on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
performing a hash operation on the operation data to obtain a corresponding hash value;
acquiring a fingerprint value of the target data in the target storage device;
judging whether the hash value is consistent with the fingerprint value, and if not, no longer writing the target data into the target storage device.
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program: constructing, through the SSE instruction set, an initial value corresponding to the preset size; and performing parallel operations on the initial value and the target data to obtain the operation data.
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program: separately constructing, through the SSE instruction set, a first mask value and a second mask value whose data length equals the preset size; constructing a loop value whose data length equals the preset size; using the first mask value, the second mask value and the loop value as the initial value; updating the loop value based on the first mask value, the second mask value, the loop value and the target data; and determining whether the data length of the loop value corresponds to the preset size, and if so, using the loop value as the operation data, and if not, returning to the step of reading target data of the preset size from the target storage device.
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program: splitting the loop value into a preset number of sub-loop values; splitting the target data into sub-target data in one-to-one correspondence with the sub-loop values; and updating the sub-loop value corresponding to each piece of sub-target data based on that sub-target data, the first mask value, the second mask value and the corresponding sub-loop value.
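The splitting step can be illustrated with a minimal sketch, assuming the 128-bit preset size and the preset number of 4 that another embodiment states. Plain Python integers stand in for an SSE register here; the names `split_lanes` and `join_lanes` are hypothetical, and the little-endian lane order is an assumption rather than something the application specifies.

```python
import struct

LANES = 4        # preset number of sub-values (128-bit / 4 embodiment)
LANE_BYTES = 4   # 128 bits / 4 lanes = 32 bits per lane


def split_lanes(block):
    """Split a 16-byte block into four unsigned 32-bit sub-values.

    Little-endian lane order is assumed for illustration only.
    """
    if len(block) != LANES * LANE_BYTES:
        raise ValueError("expected a 16-byte (128-bit) block")
    return list(struct.unpack("<4I", block))


def join_lanes(lanes):
    """Pack four unsigned 32-bit sub-values back into a 16-byte block."""
    return struct.pack("<4I", *lanes)
```

The same helper would apply to both the loop value and the target data, which gives the one-to-one pairing of sub-loop values and sub-target data by lane index.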
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program: multiplying the sub-target data by the second mask value to obtain a first product; using the sum of the first product and the sub-loop value corresponding to the sub-target data as that sub-loop value; performing an AND operation on the sub-loop value shifted left by 31 bits and the sub-loop value shifted right by 31 bits, and using the result as the sub-loop value; and using the product of the sub-loop value and the first mask value as the sub-loop value.
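The four update steps can be sketched per lane as follows. This is an illustrative emulation in Python rather than actual SSE code, assuming 32-bit lanes (128 bits split four ways), unsigned wrap-around arithmetic and logical shifts. The function names and the `combine` parameter are hypothetical; `combine` defaults to the AND operation described above.

```python
import operator

LANE_MASK = 0xFFFFFFFF  # keep every intermediate result within one 32-bit lane


def update_sub_loop(sub_loop, sub_target, mask1, mask2, combine=operator.and_):
    """Apply the four described steps to one sub-loop value."""
    # Step 1: multiply the sub-target data by the second mask value.
    first_product = (sub_target * mask2) & LANE_MASK
    # Step 2: add the first product to the sub-loop value.
    sub_loop = (sub_loop + first_product) & LANE_MASK
    # Step 3: combine the value shifted left by 31 bits with the value
    # shifted right by 31 bits (AND by default, as in the description).
    shifted_left = (sub_loop << 31) & LANE_MASK
    shifted_right = sub_loop >> 31
    sub_loop = combine(shifted_left, shifted_right)
    # Step 4: multiply by the first mask value.
    return (sub_loop * mask1) & LANE_MASK


def update_loop(loop_lanes, target_lanes, mask1_lanes, mask2_lanes,
                combine=operator.and_):
    """Update all lanes; SSE would process them in a single instruction each."""
    return [
        update_sub_loop(c, d, m1, m2, combine)
        for c, d, m1, m2 in zip(loop_lanes, target_lanes, mask1_lanes, mask2_lanes)
    ]
```

Under these assumptions the left-shifted value can only have bit 31 set and the right-shifted value can only have bit 0 set, so the default AND always yields 0 in a 32-bit lane. The `combine` parameter is left open so that a different operator or lane width can be tried; which one the application intends is not stated in this section.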
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and when the processor 202 executes the computer program, the preset size is 128 bits and the preset number is 4.
A data deduplication device provided by an embodiment of the present application includes a memory 201 and a processor 202. The memory 201 stores a computer program, and the processor 202 implements the following steps when executing the computer program: after the target data is deleted from the target storage device, generating and saving deletion information for the target data.
Referring to FIG. 7, another data deduplication device provided by an embodiment of the present application may further include: an input port 203 connected to the processor 202, for transmitting externally input commands to the processor 202; a display unit 204 connected to the processor 202, for displaying the processing results of the processor 202; and a communication module 205 connected to the processor 202, for communication between the data deduplication device and the outside. The display unit 204 may be a display panel, a laser scanning display, or the like. The communication modes used by the communication module 205 include, but are not limited to, Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), and wireless connections: Wireless Fidelity (WiFi), Bluetooth, Bluetooth Low Energy, and communication based on IEEE 802.11s.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
reading target data of a preset size from the target storage device;
operating on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
performing a hash operation on the operation data to obtain a corresponding hash value;
obtaining the fingerprint value of the target data in the target storage device;
determining whether the hash value is consistent with the fingerprint value, and if not, no longer writing the target data into the target storage device.
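The hash-and-compare steps above can be sketched as follows. This is only an illustration: the application does not name the hash function in this section, so SHA-256 from Python's standard library is an assumption, and `compute_hash` and `is_consistent` are hypothetical names. The operation data is taken as given, since producing it is the job of the SSE step.

```python
import hashlib


def compute_hash(operation_data):
    """Hash the operation data (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(operation_data).digest()


def is_consistent(operation_data, stored_fingerprint):
    """Return True when the hash value equals the stored fingerprint value."""
    return compute_hash(operation_data) == stored_fingerprint
```

The caller would use the boolean result to decide how to handle the target data, as the steps above describe.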
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: constructing, through the SSE instruction set, an initial value corresponding to the preset size; and performing parallel operations on the initial value and the target data to obtain the operation data.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: separately constructing, through the SSE instruction set, a first mask value and a second mask value whose data length equals the preset size; constructing a loop value whose data length equals the preset size; using the first mask value, the second mask value and the loop value as the initial value; updating the loop value based on the first mask value, the second mask value, the loop value and the target data; and determining whether the data length of the loop value corresponds to the preset size, and if so, using the loop value as the operation data, and if not, returning to the step of reading target data of the preset size from the target storage device.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: splitting the loop value into a preset number of sub-loop values; splitting the target data into sub-target data in one-to-one correspondence with the sub-loop values; and updating the sub-loop value corresponding to each piece of sub-target data based on that sub-target data, the first mask value, the second mask value and the corresponding sub-loop value.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: multiplying the sub-target data by the second mask value to obtain a first product; using the sum of the first product and the sub-loop value corresponding to the sub-target data as that sub-loop value; performing an AND operation on the sub-loop value shifted left by 31 bits and the sub-loop value shifted right by 31 bits, and using the result as the sub-loop value; and using the product of the sub-loop value and the first mask value as the sub-loop value.
An embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the preset size is 128 bits and the preset number is 4.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: after the target data is deleted from the target storage device, generating and saving deletion information for the target data.
The computer-readable storage media referred to in this application include random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.
For descriptions of the relevant parts of the data deduplication system, device and computer-readable storage medium provided by the embodiments of the present application, refer to the detailed description of the corresponding parts of the data deduplication method provided by the embodiments of the present application; they are not repeated here. In addition, the parts of the above technical solutions that follow the same implementation principles as the corresponding prior-art solutions are not described in detail, to avoid redundancy.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A data deduplication method, characterized by comprising:
    reading target data of a preset size from a target storage device;
    operating on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
    performing a hash operation on the operation data to obtain a corresponding hash value;
    obtaining the fingerprint value of the target data in the target storage device;
    determining whether the hash value is consistent with the fingerprint value, and if not, no longer writing the target data into the target storage device.
  2. The method according to claim 1, characterized in that the operating on the target data through the SSE instruction set to obtain operation data corresponding to the preset size comprises:
    constructing, through the SSE instruction set, an initial value corresponding to the preset size;
    performing parallel operations on the initial value and the target data to obtain the operation data.
  3. The method according to claim 2, characterized in that the constructing, through the SSE instruction set, an initial value corresponding to the preset size comprises:
    separately constructing, through the SSE instruction set, a first mask value and a second mask value whose data length equals the preset size;
    constructing a loop value whose data length equals the preset size;
    using the first mask value, the second mask value and the loop value as the initial value;
    and the performing parallel operations on the initial value and the target data to obtain the operation data comprises:
    updating the loop value based on the first mask value, the second mask value, the loop value and the target data;
    determining whether the data length of the loop value corresponds to the preset size, and if so, using the loop value as the operation data, and if not, returning to the step of reading target data of a preset size from the target storage device.
  4. The method according to claim 3, characterized in that the updating the loop value based on the first mask value, the second mask value, the loop value and the target data comprises:
    splitting the loop value into a preset number of sub-loop values;
    splitting the target data into sub-target data in one-to-one correspondence with the sub-loop values;
    updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value and the corresponding sub-loop value.
  5. The method according to claim 4, characterized in that the updating the sub-loop value corresponding to the sub-target data based on the sub-target data, the first mask value, the second mask value and the corresponding sub-loop value comprises:
    multiplying the sub-target data by the second mask value to obtain a first product;
    using the sum of the first product and the sub-loop value corresponding to the sub-target data as the sub-loop value corresponding to the sub-target data;
    performing an AND operation on the sub-loop value corresponding to the sub-target data shifted left by 31 bits and the same sub-loop value shifted right by 31 bits, and using the result of the AND operation as the sub-loop value corresponding to the sub-target data;
    using the product of the sub-loop value corresponding to the sub-target data and the first mask value as the sub-loop value corresponding to the sub-target data.
  6. The method according to claim 5, characterized in that the preset size is 128 bits and the preset number is 4.
  7. The method according to any one of claims 1 to 6, characterized in that, after the deleting of the target data from the target storage device, the method further comprises:
    generating and saving deletion information for the target data.
  8. A data deduplication system, characterized by comprising:
    a first reading module, configured to read target data of a preset size from a target storage device;
    a first operation module, configured to operate on the target data through the SSE instruction set to obtain operation data corresponding to the preset size;
    a second operation module, configured to perform a hash operation on the operation data to obtain a corresponding hash value;
    a first obtaining module, configured to obtain the fingerprint value of the target data in the target storage device;
    a first judgment module, configured to determine whether the hash value is consistent with the fingerprint value, and if not, no longer write the target data into the target storage device.
  9. A data deduplication device, characterized by comprising:
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the data deduplication method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the data deduplication method according to any one of claims 1 to 7.
PCT/CN2020/073400 2019-12-12 2020-01-21 Data deduplication method, system and device, and computer-readable storage medium WO2021114464A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911275091.9 2019-12-12
CN201911275091.9A CN111090397B (en) 2019-12-12 2019-12-12 Data deduplication method, system, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021114464A1 true WO2021114464A1 (en) 2021-06-17

Family

ID=70396318

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073400 WO2021114464A1 (en) 2019-12-12 2020-01-21 Data deduplication method, system and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111090397B (en)
WO (1) WO2021114464A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786141A (en) * 2022-04-29 2022-07-22 恒玄科技(上海)股份有限公司 Message filtering method and device in Bluetooth wireless mesh network

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361346B (en) * 2023-06-02 2023-08-08 山东浪潮科学研究院有限公司 Data table analysis method, device and equipment based on mask calculation and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130185541A1 (en) * 2001-10-29 2013-07-18 Yen-Kuang Chen Bitstream buffer manipulation with a simd merge instruction
CN107004031A (en) * 2016-04-19 2017-08-01 华为技术有限公司 Split while using Vector Processing
CN107534445A (en) * 2016-04-19 2018-01-02 华为技术有限公司 For splitting the Vector Processing of cryptographic Hash calculating
CN107644081A (en) * 2017-09-21 2018-01-30 锐捷网络股份有限公司 Data duplicate removal method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629258B (en) * 2012-02-29 2013-12-18 浪潮(北京)电子信息产业有限公司 Repeating data deleting method and device
US8856546B2 (en) * 2012-06-07 2014-10-07 Intel Corporation Speed up secure hash algorithm (SHA) using single instruction multiple data (SIMD) architectures
JP6028567B2 (en) * 2012-12-28 2016-11-16 富士通株式会社 Data storage program, data search program, data storage device, data search device, data storage method, and data search method
KR101473837B1 (en) * 2013-05-03 2014-12-18 인하대학교 산학협력단 An Invalid Data Recycling Method for Improving I/O Performance in SSD-based Storage System
CN104077380B (en) * 2014-06-26 2017-07-18 深圳信息职业技术学院 A kind of data de-duplication method, apparatus and system
CN104462388B (en) * 2014-12-10 2017-12-29 上海爱数信息技术股份有限公司 A kind of redundant data method for cleaning based on tandem type storage medium
CN104881470B (en) * 2015-05-28 2018-05-08 暨南大学 A kind of data de-duplication method towards mass picture data
CN105930101A (en) * 2016-05-04 2016-09-07 中国人民解放军国防科学技术大学 Weak fingerprint repeated data deletion mechanism based on flash memory solid-state disk
CN107276745B (en) * 2017-06-23 2020-08-04 上海兆芯集成电路有限公司 Processor for implementing secure hash algorithm and digital signal processing method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114786141A (en) * 2022-04-29 2022-07-22 恒玄科技(上海)股份有限公司 Message filtering method and device in Bluetooth wireless mesh network
CN114786141B (en) * 2022-04-29 2023-11-21 恒玄科技(上海)股份有限公司 Message filtering method and device in Bluetooth wireless mesh network

Also Published As

Publication number Publication date
CN111090397B (en) 2021-10-22
CN111090397A (en) 2020-05-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20899652

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20899652

Country of ref document: EP

Kind code of ref document: A1
