TWI804466B

TWI804466B - Method of retrieving data stored in memory and dedupe module

Info

Publication number: TWI804466B
Application number: TW106116633A
Authority: TW
Inventors: 冬岩姜; 林常惠; 克里希納馬拉迪; 金鍾民; 鄭宏忠
Original assignee: 南韓商三星電子股份有限公司
Priority date: 2016-05-20
Filing date: 2017-05-19
Publication date: 2023-06-11
Also published as: JP2017208096A; TW201741883A; JP6920107B2; CN107402889B; KR20170131274A; CN107402889A; KR102190403B1

Abstract

A method of retrieving data stored in a memory associated with a dedupe module is provided. The method includes: identifying a logical address of the data; identifying a physical line ID (PLID) of the data in accordance with the logical address by looking up at least a portion of the logical address in a translation table; locating a respective physical line, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder to a read cache, the respective hash cylinder including: a respective hash bucket, the respective hash bucket including the respective physical line; and a respective reference counter bucket, the respective reference counter bucket including a respective reference counter associated with the respective physical line.

Description

Method for retrieving data stored in memory and deduplication module

[CROSS-REFERENCE TO RELATED APPLICATIONS]

This application is a continuation-in-part of U.S. Non-Provisional Patent Application No. 15/161,136, filed May 20, 2016, and is also a continuation-in-part of U.S. Non-Provisional Patent Application No. 15/162,517, filed May 23, 2016. U.S. Non-Provisional Patent Application No. 15/161,136 claims priority to and the benefit of U.S. Provisional Patent Application No. 62/314,918, filed March 29, 2016, and U.S. Non-Provisional Patent Application No. 15/162,517 claims priority to and the benefit of U.S. Provisional Patent Application No. 62/316,397, filed March 31, 2016. The present application further claims priority to and the benefit of U.S. Provisional Patent Application No. 62/453,461, filed February 1, 2017; No. 62/368,775, filed July 29, 2016; No. 62/451,157, filed January 27, 2017; No. 62/316,402, filed March 31, 2016; and No. 62/450,502, filed January 25, 2017. The entire contents of all of these applications are incorporated herein by reference.

One or more aspects of embodiments according to the present invention relate to system memory and storage, and more particularly to high-capacity, low-latency memory and storage.

Typical modern computer applications such as databases, virtual desktop infrastructure, and data analytics require a large main memory. As computer systems scale to run more complex data- and storage-intensive applications, the need for larger memory capacity grows proportionally.

Typically, random-access memory (RAM) is limited by its physical design in the amount of data it can store. For example, 8 gigabytes (GB) of dynamic random-access memory (DRAM) can typically hold at most 8 GB of data. In addition, future data center applications will use high-capacity, low-latency memory.

The above information disclosed in this Background section is only for enhancing understanding of the background of the invention, and it may therefore contain information that does not constitute prior art.

Aspects of embodiments of the present invention relate to methods and associated structures that enable the memory capacity of random-access memory (RAM) to be larger than the physical memory size of the RAM. According to embodiments of the present invention, a deduplication algorithm is used to achieve data memory reduction and context addressing. According to embodiments of the present invention, user data is stored in a hash table indexed by a hash value of the user data.

According to an embodiment of the present invention, there is provided a method of retrieving data stored in a memory associated with a deduplication module. The deduplication module includes a read cache, and the memory includes a translation table and a combined data structure. The combined data structure includes a hash table and a reference counter table, each stored in a plurality of hash cylinders of the combined data structure. The hash table includes a plurality of hash buckets, each including a plurality of physical lines, and each physical line stores data. The reference counter table includes a plurality of reference counter buckets, each including a plurality of reference counters.
The method includes: identifying a logical address of the data; identifying a physical line ID (PLID) of the data in accordance with the logical address by looking up at least a portion of the logical address in the translation table; locating a respective physical line of the plurality of physical lines, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line. The retrieving includes copying a respective hash cylinder of the plurality of hash cylinders to the read cache, the respective hash cylinder including: a respective hash bucket of the plurality of hash buckets, the respective hash bucket including the respective physical line; and a respective reference counter bucket of the plurality of reference counter buckets, the respective reference counter bucket including a respective reference counter associated with the respective physical line.
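
The retrieval steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `HashCylinder`, `DedupeMemory`, and `read`, the dictionary used as the translation table, and the `(row, way)` tuple standing in for the PLID are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HashCylinder:
    """One hash bucket plus its reference counter bucket (hypothetical layout)."""
    hash_bucket: list   # physical lines (data), one per way
    ref_counters: list  # one reference counter per physical line


@dataclass
class DedupeMemory:
    translation_table: dict  # logical address -> PLID, modeled here as (row, way)
    cylinders: list          # cylinders[row] is a HashCylinder
    read_cache: dict = field(default_factory=dict)  # row -> cached HashCylinder

    def read(self, logical_address):
        # Identify the PLID by looking up the logical address.
        row, way = self.translation_table[logical_address]
        # On a cache miss, copy the whole cylinder into the read cache so the
        # data and its reference counter arrive together.
        if row not in self.read_cache:
            src = self.cylinders[row]
            self.read_cache[row] = HashCylinder(list(src.hash_bucket),
                                                list(src.ref_counters))
        # Retrieve the data from the located physical line.
        return self.read_cache[row].hash_bucket[way]
```

Two logical addresses that map to the same `(row, way)` return the same stored line, which is how a deduplicated line would be shared in this sketch.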

The method may further include determining, based on the PLID, that the data is stored in the hash table.

The PLID may be generated using a first hash function applied to the data. The PLID may include an address pointing to a location in the hash table.

The PLID may include: a first identifier indicating whether the data is stored in the hash table or in an overflow memory area; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column (or way) in which the data is stored.
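
A PLID carrying these three identifiers could be packed into a single integer, as sketched below. The field widths (`ROW_BITS`, `WAY_BITS`) and the function names are assumptions chosen for illustration; the source does not specify them.

```python
ROW_BITS = 20  # assumed width of the row index
WAY_BITS = 4   # assumed width of the column (way) index


def pack_plid(in_overflow: bool, row: int, way: int) -> int:
    """Pack region bit | row index | way index into one integer."""
    assert 0 <= row < (1 << ROW_BITS) and 0 <= way < (1 << WAY_BITS)
    return (int(in_overflow) << (ROW_BITS + WAY_BITS)) | (row << WAY_BITS) | way


def unpack_plid(plid: int):
    """Return (in_overflow, row, way) from a packed PLID."""
    way = plid & ((1 << WAY_BITS) - 1)
    row = (plid >> WAY_BITS) & ((1 << ROW_BITS) - 1)
    in_overflow = bool(plid >> (ROW_BITS + WAY_BITS))
    return in_overflow, row, way
```

With this layout, a reader of the PLID can test the region bit first and only then interpret the remaining bits as a hash table position.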

The combined data structure may further include a signature table, the signature table including a plurality of signature buckets, each including a plurality of signatures. The respective hash cylinder may further include a respective signature bucket of the plurality of signature buckets, the respective signature bucket including a respective signature associated with the respective physical line.

The PLID may be generated using a first hash function applied to the data. The PLID may include an address pointing to a location in the hash table. The plurality of signatures may be generated using a second hash function that is smaller than the first hash function.

Each reference counter may track the number of deduplications performed for the corresponding data stored in the hash table.

According to an embodiment of the present invention, there is provided a method of storing data in a memory associated with a deduplication engine. The method includes: identifying the data to be stored; determining, using a first hash function, a first hash value corresponding to where the data should be stored in a hash table in the memory; storing the data in the hash table at a location corresponding to the first hash value; determining, using a second hash function that is smaller than the first hash function, a second hash value also corresponding to where the data should be stored; storing the first hash value in a translation table in the memory; and storing the second hash value in a signature table in the memory.

The method may further include incrementing a reference counter, in a reference counter table, that corresponds to the data.
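
The storing steps can be sketched as below. Everything named here is an assumption for illustration: the table sizes, the use of SHA-256 and BLAKE2b as stand-ins for the first and second hash functions, the `(row, way)` tuple recorded as the PLID, and the class name `DedupeStore`. Handling of a full bucket and of overwriting an existing logical address is omitted.

```python
import hashlib

NUM_ROWS = 8  # assumed number of hash buckets
NUM_WAYS = 4  # assumed physical lines per bucket


def first_hash(data: bytes) -> int:
    """Larger hash: selects the hash bucket (row). SHA-256 is a stand-in."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % NUM_ROWS


def second_hash(data: bytes) -> int:
    """Smaller hash: a 1-byte signature. BLAKE2b is a stand-in."""
    return hashlib.blake2b(data, digest_size=1).digest()[0]


class DedupeStore:
    def __init__(self):
        self.hash_table = [[None] * NUM_WAYS for _ in range(NUM_ROWS)]
        self.signatures = [[None] * NUM_WAYS for _ in range(NUM_ROWS)]
        self.ref_counts = [[0] * NUM_WAYS for _ in range(NUM_ROWS)]
        self.translation_table = {}  # logical address -> (row, way)

    def write(self, logical_address: int, data: bytes):
        row, sig = first_hash(data), second_hash(data)
        # A signature match narrows the search; a full compare confirms a duplicate.
        for way in range(NUM_WAYS):
            if self.signatures[row][way] == sig and self.hash_table[row][way] == data:
                break
        else:
            # No duplicate found: take the first free way in the bucket.
            # (Raises ValueError if the bucket is full; spill handling is omitted.)
            way = self.hash_table[row].index(None)
            self.hash_table[row][way] = data
            self.signatures[row][way] = sig
        self.ref_counts[row][way] += 1
        self.translation_table[logical_address] = (row, way)
        return row, way
```

Writing the same content to two logical addresses stores one physical line and leaves its reference counter at 2.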

The memory may include: the hash table, storing a plurality of data; the translation table, storing a plurality of physical line IDs (PLIDs) generated using the first hash function; the signature table, storing a plurality of signatures generated using the second hash function; a reference counter table, storing a plurality of reference counters, each tracking the number of deduplications performed for corresponding data stored in the hash table; and an overflow memory area.

Each of the PLIDs may include: a first identifier indicating whether the data is stored in the hash table or in the overflow memory area; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column in which the data is stored.

The hash table, the signature table, and the reference counter table may be integrated into a combined data structure. The combined data structure may include a plurality of hash cylinders, each hash cylinder including: a hash bucket including a plurality of physical lines; a signature bucket including respective signatures corresponding to the plurality of physical lines; and a reference counter bucket including respective reference counters corresponding to the plurality of physical lines.

Storing the data in the hash table at the location corresponding to the first hash value may include storing the data in the hash bucket corresponding to the first hash value. Storing the second hash value in the signature table may include storing the second hash value in the signature bucket corresponding to the hash bucket in which the data is stored.

According to an embodiment of the present invention, there is provided a deduplication module including: a read cache; a deduplication engine to receive a data retrieval request from a host system; and a memory. The memory includes a translation table and a combined data structure, the combined data structure including: a hash table including a plurality of hash buckets, each hash bucket including a plurality of physical lines, each physical line storing data; a reference counter table including a plurality of reference counter buckets, each reference counter bucket including a plurality of reference counters; and a plurality of hash cylinders, each hash cylinder including one of the hash buckets and one of the reference counter buckets.
The data retrieval request causes the deduplication engine to: identify a logical address of the data; identify a physical line ID (PLID) of the data in accordance with the logical address by looking up at least a portion of the logical address in the translation table; locate a respective physical line of the plurality of physical lines, the respective physical line corresponding to the PLID; and retrieve the data from the respective physical line. Retrieving the data includes copying a respective hash cylinder of the plurality of hash cylinders to the read cache, the respective hash cylinder including: a respective hash bucket of the plurality of hash buckets, the respective hash bucket including the respective physical line; and a respective reference counter bucket of the plurality of reference counter buckets, the respective reference counter bucket including a respective reference counter associated with the respective physical line.

The data retrieval request may further cause the deduplication engine to determine, based on the PLID, that the data is stored in the hash table.

The PLID may include: a first identifier indicating whether the data is stored in the hash table or in an overflow memory area; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column in which the data is stored.

Each of the reference counters may track the number of deduplications performed for the corresponding data stored in the hash table.

According to an embodiment of the present invention, there is provided a deduplication module including: a host interface; a transfer manager to receive a data transfer request from a host system via the host interface; and a plurality of partitions, each partition including: a deduplication engine to receive a partition data request from the transfer manager; a plurality of memory controllers; a memory manager between the deduplication engine and the memory controllers; and a plurality of memory modules, each memory module being coupled to one of the memory controllers.

According to an embodiment of the present invention, there is provided a deduplication module including a read cache, a memory, and a deduplication engine. The memory includes: a translation table; a hash table including a plurality of hash buckets, each hash bucket including a plurality of physical lines, each physical line storing data; and a reference counter table including a plurality of reference counter buckets, each reference counter bucket including a plurality of reference counters. The deduplication engine identifies V virtual buckets of a first hash bucket of the plurality of hash buckets. The virtual buckets are other hash buckets of the plurality of hash buckets that are located near the first hash bucket, and they store some of the first hash bucket's data when the first hash bucket is full. V is an integer that is set dynamically based on how full the virtual buckets of the first hash bucket are.
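
The virtual bucket idea can be sketched as a bounded search over neighbouring buckets. The function name `place`, the choice of the next `v` rows as the "nearby" buckets, and the wrap-around at the end of the table are assumptions; the source does not say how neighbours are chosen or how V is adjusted, so V is simply passed in here.

```python
def place(buckets, home, data, v):
    """Store data in the home bucket or, if full, in one of its v virtual buckets.

    buckets: list of fixed-size lists, with None marking a free physical line.
    Returns (row, way), or None when the home bucket and all v neighbours are
    full (a caller could then fall back to an overflow memory area).
    """
    for offset in range(v + 1):               # offset 0 is the home bucket itself
        row = (home + offset) % len(buckets)  # wrap-around is an assumption
        if None in buckets[row]:
            way = buckets[row].index(None)
            buckets[row][way] = data
            return row, way
    return None
```

A larger `v` lets a full bucket spill further before overflow is needed, at the cost of searching more buckets on each placement.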

100, 150: deduplication module

130: bridge

140: memory controller

142: memory controller 0

144: memory controller

160, 162: host interface

170, 170': read cache

180: memory module

182: dual in-line memory module/flash memory 0

184: dual in-line memory module/flash memory 1

200, 202: deduplication engine

210: memory manager

220, 220': hash table

230: transfer manager

240: translation table

242, 242': page index table

244, 244': level 2 mapping table

260, 260': signature and reference counter table

280, 280', 280": overflow memory area

310, 310': logical address

312: translation table index

314, 314': granularity

316: page index

318: page table entry

320, 320': physical address

322: region bit

326, 326': row index

328, 328': column index

400, 400': physical line

500, 500-i, 500-M-1: hash cylinder

500-0: first cylinder

520, 520-i, 520-M-1: signature bucket

520-0: first signature bucket

540, 540', 540-i, 540-M-1: reference counter bucket

540-0: first reference counter bucket

560, 560', 560-i, 560-M-1: hash bucket

560-0: first hash bucket

600: combined data structure/combined structure/combined table

1000, 1010, 1020, 1030, 1040, 1050, 1060, 1100, 1110, 1120, 1130, 1140, 1150: steps

COL_INDX: column index

LA: logical address

R_INDX: row index

RGN: region bit

V: height

VBH: virtual bucket height index

These and other features and aspects of the present invention will be appreciated and understood with reference to the specification, claims, and accompanying drawings, in which: FIG. 1 is a block diagram of a deduplication module according to an embodiment of the present invention.

FIG. 2 is a block diagram of a deduplication module according to another embodiment of the present invention.

FIG. 3 is a block diagram of a logical view of a deduplication engine according to an embodiment of the present invention.

FIG. 4 is a block diagram of a logical view of a deduplication engine including a one-level translation table according to an embodiment of the present invention.

FIG. 5 is a block diagram of a logical view of a deduplication engine including a two-level translation table according to an embodiment of the present invention.

FIG. 6 is a block diagram of a logical view of a deduplication engine including a two-level translation table having a dynamic level two (L2) mapping table, a signature and reference counter table, and an overflow memory area, according to an embodiment of the present invention.

FIG. 7 is a block diagram of a logical view of a hash cylinder according to an embodiment of the present invention.

FIG. 8 is a block diagram of a logical view of a combined data structure according to an embodiment of the present invention.

FIG. 9 is a block diagram of a logical view of a hash bucket associated with virtual buckets, and a corresponding reference counter bucket, according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating a method of retrieving data stored in random-access memory (RAM) according to an embodiment of the present invention.

FIG. 11 is a flowchart illustrating a method of storing data in RAM according to an embodiment of the present invention.

Embodiments of the present invention relate to methods and associated structures that enable the memory capacity of a memory (e.g., random-access memory (RAM)) to be larger than its physical memory size. According to embodiments of the present invention, a deduplication algorithm is used to achieve data memory reduction and context addressing. According to embodiments of the present invention, user data is stored in a hash table indexed by a hash value of the user data.

Although dynamic random-access memory (DRAM) technology is scaling aggressively beyond 20 nanometer (nm) process technology to meet this growing demand for memory capacity, techniques such as deduplication may be applied to increase the virtual memory capacity of system memory by as much as 2 to 3 times or more relative to its physical memory capacity. In addition, embodiments of the present invention may use other types of memory (e.g., flash memory).

Using auxiliary compression methods, embodiments of the present invention may provide advanced deduplicated memory and data structures that consistently achieve a high deduplication ratio by fully utilizing all memory resources.

Data center applications strongly demand memory devices with high capacity and low latency. Such a memory device may employ a deduplication scheme and a data compression scheme to provide a memory capacity larger than its physical memory size. A deduplicated memory device can consistently achieve a high deduplication ratio by reducing duplicate user data and fully utilizing the available memory resources. In addition, the deduplication scheme employed by the deduplicated memory device can achieve efficient addressing of the deduplicated data.

Data deduplication (or data duplication elimination) refers to reducing redundant data in a memory device, thereby reducing the capacity cost of the memory device. In data deduplication, a data object/item (e.g., a data file) is divided into one or more data lines/chunks/blocks. By associating multiple data blocks consisting of identical data with a single stored data block, duplicate copies of data blocks can be reduced or eliminated by the computer memory, thereby reducing the total number of redundant data copies in the memory device. Reducing redundant data copies may improve read latency and memory bandwidth, and may potentially save power.
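
The block-level idea described above can be illustrated with a short sketch. The function name `deduplicate` and its return shape are hypothetical and chosen only for this example; a real device would match blocks by hash rather than by a dictionary keyed on the full content.

```python
def deduplicate(blocks):
    """Return (unique_blocks, references, ref_counts) for a list of byte blocks."""
    unique, index_of, refs, counts = [], {}, [], []
    for block in blocks:
        if block not in index_of:
            # First time this content is seen: store a single copy.
            index_of[block] = len(unique)
            unique.append(block)
            counts.append(0)
        i = index_of[block]
        refs.append(i)   # each logical block points at the single stored copy
        counts[i] += 1   # reference count for that stored copy
    return unique, refs, counts
```

For four logical blocks of which three are identical, only two blocks are stored, giving a deduplication ratio of 4 / 2 = 2 in this toy case.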

Therefore, if duplicate data copies can be reduced to a single data copy, the total usable capacity of the memory device increases while the same amount of physical resources is used. Because the resulting economization of the memory device allows the data rewrite count to be reduced, and because write requests for duplicate data blocks already stored in the memory can be discarded, the lifetime of a memory device that implements data deduplication can be extended by effectively improving write endurance.

Related-art data deduplication methods may use in-memory deduplication technology, in which a deduplication engine is integrated with a central processing unit (CPU) or a memory controller (MC) in a CPU-centric approach. Such methods typically implement a deduplicated cache (DDC) that operates with the memory controller so that the CPU is aware of duplicates, and that attempts to provide deduplicated memory operations (e.g., content lookups, reference count updates, etc.) under the control of the memory controller. Deduplication methods may also implement a direct translation buffer (DTB), which is a cache for caching translation lines to improve data reads by removing translation fetches from the critical path, and which may be similar to a lookaside buffer.

Deduplication has most commonly been used for hard drives. However, there is also interest in providing fine-grain deduplication in the area of volatile memory such as dynamic random-access memory (DRAM).

The detailed description set forth below in connection with the accompanying drawings is intended as a description of exemplary embodiments of methods and associated structures, provided in accordance with the present invention, that enable the memory capacity of random-access memory (or other memory storage) to be larger than its physical memory size, and is not intended to represent the only forms in which the present invention may be constructed or utilized. The description sets forth the features of the present invention in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the spirit and scope of the invention. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.

FIG. 1 is a block diagram of a deduplication module according to an embodiment of the present invention. Referring to FIG. 1, a deduplication module 100 according to an embodiment of the present invention includes a bridge 130, a memory controller 140, a host interface (host I/F) 160, a read cache 170, one or more memory modules 180, and a deduplication engine 200.

The bridge 130 may provide an interface that allows the deduplication engine 200 and the read cache 170 to communicate with the memory controller 140. The memory controller 140 may provide an interface that allows the bridge 130 to communicate with the memory modules 180. The read cache 170 may be part of the memory modules 180.

In some embodiments, the bridge 130 may be absent. In that case, the memory controller 140 may communicate directly with the deduplication engine 200 and the read cache 170.

The deduplication engine 200 communicates with a host system via the host interface 160 to store or access data in the memory modules 180. The deduplication engine 200 may further communicate with other components of the host system via the host interface 160.

The memory module 180 may be a dual in-line memory module (DIMM) slot for connecting DRAM, or it may be flash memory (a slot for connecting other types of memory), and so on.

FIG. 2 is a block diagram of a deduplication module according to another embodiment of the present invention. Referring to FIG. 2, a deduplication module 150 may include one or more partitions 250 (e.g., partition 0 250-0, partition 1 250-1, etc.), a transfer manager 230, and a host interface (I/F) 162. Each partition 250 may include a deduplication engine 202, a memory manager 210, one or more memory controllers (e.g., memory controller (MC0) 142, memory controller (MC1) 144, etc.), and one or more memory modules (e.g., DIMM/flash memory 0 182, DIMM/flash memory 1 184, etc.).

Each of the deduplication engines 202 may communicate directly with the transfer manager 230, or with the host system via the host interface 162. The transfer manager 230 may communicate with the host system via the host interface 162.

The transfer manager 230 may receive data transfer requests from the host system via the host interface 162. The transfer manager 230 may further manage data transfers to and from the one or more partitions 250 of the deduplication module. In some embodiments, the transfer manager 230 may determine on which partition 250 to store the data that is to be stored (e.g., stored in RAM). In other embodiments, the transfer manager 230 receives instructions from the host system as to which partition 250 the data should be stored on. In some embodiments, the transfer manager 230 may split the data received from the host system and send it to two or more of the partitions.

The deduplication module 150 may communicate with components of the host system via the host interface 162.

去重複引擎202可為其相應分區250而自傳送管理器230接收分區資料請求。去重複引擎202可進一步控制資料在記憶體模組中的存取及儲存。記憶體管理器210可確定在所述一或多個記憶體模組中的哪一記憶體模組上儲存所述資料或者應在所述一或多個記憶體模組中的哪一記憶體模組上儲存所述資料。所述一或多個記憶體控制器可控制資料在其相應記憶體模組上的儲存或存取。 The deduplication engine 202 may receive partition data requests from the transfer manager 230 for its corresponding partition 250. The deduplication engine 202 may further control the access and storage of data in the memory modules. The memory manager 210 may determine on which of the one or more memory modules the data is stored or should be stored. The one or more memory controllers may control the storage or access of data on their respective memory modules.

在一些實施例中，去重複引擎202及記憶體管理器210可被實作成能夠執行記憶體管理器的功能與去重複引擎的功能二者的單一的記憶體管理器。 In some embodiments, deduplication engine 202 and memory manager 210 may be implemented as a single memory manager capable of performing both the functions of the memory manager and the functions of the deduplication engine.

所述一或多個記憶體控制器、記憶體管理器210、及去重複引擎可各自利用任何適合的硬體(例如，應用專用積體電路(application-specific integrated circuit，ASIC))、韌體(例如，數位訊號處理器(digital signal processor，DSP)或現場可程式化閘陣列(field programmable gate array，FPGA))、軟體、或者軟體、韌體、及硬體的適合組合來實作。此外，在下文中，可更詳細地闡述去重複引擎。 The one or more memory controllers, the memory manager 210, and the deduplication engine may each be implemented using any suitable hardware (e.g., an application-specific integrated circuit (ASIC)), firmware (e.g., a digital signal processor (DSP) or a field programmable gate array (FPGA)), software, or a suitable combination of software, firmware, and hardware. The deduplication engine is described in more detail below.

根據一些實施例，當記憶體具有大的容量時，可使用分區來減小轉譯表大小。 According to some embodiments, when the memory has a large capacity, partitioning can be used to reduce the translation table size.

圖3是根據本發明實施例的去重複引擎的邏輯概念的方塊圖。參照圖3，去重複引擎200可包括多個表。去重複引擎200可包括雜湊表220、轉譯表240、簽名及參考計數器表260、以及溢出記憶體區280。 FIG. 3 is a block diagram of a logical concept of a deduplication engine according to an embodiment of the present invention. Referring to FIG. 3, the deduplication engine 200 may include a plurality of tables. The deduplication engine 200 may include a hash table 220 , a translation table 240 , a signature and reference counter table 260 , and an overflow memory area 280 .

雜湊表220可包括多個實體線(physical line，PL)。每一實體線可包括資料(例如，使用者資料)。雜湊表220內的資料被去重複(即，重複資料已合併至單一位置中以減少儲存空間使用)。 The hash table 220 may include a plurality of physical lines (PLs). Each physical line may include data (e.g., user data). The data in the hash table 220 is deduplicated (i.e., duplicate data has been consolidated into a single location to reduce storage space usage).

轉譯表240包括儲存於轉譯表240中的多個實體線ID。雜湊表的每一實體線具有儲存於轉譯表240中的相關聯實體線ID(PLID)。儲存於轉譯表240中的實體線ID是邏輯位址到實體位址的轉譯。舉例而言，當去重複引擎200需要對與特定邏輯位址相關聯的資料進行定位時，去重複引擎200可利用轉譯表240來查詢儲存於所述邏輯位址處的資料並接收與儲存有所述資料的雜湊表220的實體線對應的所述資料的實體線ID。去重複引擎200可接著存取儲存於雜湊表220中的對應實體線處的資料。 The translation table 240 includes a plurality of physical line IDs. Each physical line of the hash table has an associated physical line ID (PLID) stored in the translation table 240. A physical line ID stored in the translation table 240 is a logical-address-to-physical-address translation. For example, when the deduplication engine 200 needs to locate the data associated with a specific logical address, the deduplication engine 200 may use the translation table 240 to look up the data stored at that logical address and receive the physical line ID of the data, which corresponds to the physical line of the hash table 220 in which the data is stored. The deduplication engine 200 may then access the data stored at the corresponding physical line in the hash table 220.

實體線ID可使用第一雜湊函數來產生。舉例而言，當需要將資料保存於雜湊表內時，對所述資料運行第一雜湊函數以確定與應儲存所述資料的實體線對應的第一雜湊值。第一雜湊值被保存作為所述資料的實體線ID。 The physical line ID may be generated using a first hash function. For example, when data needs to be stored in the hash table, the first hash function is run on the data to determine a first hash value corresponding to the physical line in which the data should be stored. The first hash value is saved as the physical line ID of the data.

每一實體線ID表示目標資料線的實體位置。由於資料線可位於雜湊表220中或溢出記憶體區280中，因此實體線ID可為雜湊表220中的或溢出記憶體區280中的位置。 Each physical line ID represents the physical position of the target data line. Since the data line can be located in the hash table 220 or in the overflow memory area 280 , the physical line ID can be a location in the hash table 220 or in the overflow memory area 280 .

雜湊表220可被視作具有列行結構的表。在此種情形中，實體線ID是由區位元、列位元、及行位元組成(例如，參見圖4及其說明)。第一雜湊函數可產生列位元作為起點來尋找可在其中儲存資料的可用實體線。當找到可用實體線時可確定其他位元。 The hash table 220 may be viewed as a table with a row-and-column structure. In this case, the physical line ID is composed of region bits, row bits, and column bits (see, e.g., FIG. 4 and its description). The first hash function may generate the row bits as a starting point for finding an available physical line in which the data can be stored. The other bits may be determined when an available physical line is found.

若在以上步驟中在雜湊表220中未找到可用實體線，則可將資料寫入至溢出記憶體區280。在此種情形中，實體線ID將為溢出記憶體區表項的實體位置。 If no available physical line is found in the hash table 220 in the above steps, the data can be written into the overflow memory area 280 . In this case, the physical line ID will be the physical location of the overflow memory area entry.
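The placement steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy table sizes, the use of SHA-256 as the first hash function, the tuple form of the PLID, and the names `h1_row` and `place` are all hypothetical, and duplicate detection is deliberately left out.

```python
import hashlib

N_ROWS, N_WAYS = 8, 4   # toy sizes, assumed for illustration only
hash_table = [[None] * N_WAYS for _ in range(N_ROWS)]
overflow = []           # stand-in for the overflow memory area


def h1_row(line: bytes) -> int:
    """Hypothetical first hash function: maps a data line to a starting row."""
    return int.from_bytes(hashlib.sha256(line).digest()[:4], "big") % N_ROWS


def place(line: bytes):
    """Store a line and return its PLID as (region, row, column).

    Region 0 means the hash table; region 1 means the overflow area,
    in which case `row` is the index of the overflow entry.
    """
    row = h1_row(line)                      # row bits come from the hash
    for col, slot in enumerate(hash_table[row]):
        if slot is None:                    # first available physical line
            hash_table[row][col] = line
            return (0, row, col)            # column is fixed once a slot is found
    overflow.append(line)                   # no free line: fall back to overflow
    return (1, len(overflow) - 1, 0)
```

The row is known from the hash alone, while the column and the region only become known after the search, which mirrors the statement that the remaining PLID bits are determined once an available physical line is found.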

使用第二雜湊函數計算出的資料的第二雜湊值(例如，簽名)被儲存於簽名表中。第二雜湊函數可較第一雜湊函數小。第一雜湊函數與第二雜湊函數可為任何適合的雜湊函數且其可為不同的雜湊函數。 The second hash value (eg, signature) of the data calculated using the second hash function is stored in the signature table. The second hash function may be smaller than the first hash function. The first hash function and the second hash function can be any suitable hash function and they can be different hash functions.

可使用簽名來對兩個資料線進行快速比較。當有新資料線即將被寫入至雜湊表220時，可進行檢查以查看在所述雜湊表中是否已存在相同的資料線。執行此檢查可避免多次儲存相同的資料。 Signatures may be used to quickly compare two data lines. When a new data line is about to be written to the hash table 220, a check may be made to see whether the same data line already exists in the hash table. Performing this check avoids storing the same data multiple times.

若不使用簽名來進行所述檢查，則對記憶體的特定區(整個桶或整個虛擬桶)中的所有資料進行讀取以偵測重複。當使用簽名來進行所述檢查時，僅自記憶體讀取用於所述特定區的資料的簽名，此可節省頻寬。 If signatures are not used for the check, all data in a specific area of memory (the entire bucket or the entire virtual bucket) is read to detect duplication. When signatures are used for the checking, only the signatures for the data in that particular region are read from memory, which saves bandwidth.

當不存在匹配簽名時，不存在與新資料線匹配的資料線。否則，當找到匹配簽名時，由於簽名比較可能為誤報(false positive)，因此自記憶體讀取具有匹配簽名的資料線以進行進一步比較。 When there is no matching signature, no stored data line matches the new data line. Otherwise, when a matching signature is found, the data line with the matching signature is read from memory for further comparison, because a signature match may be a false positive.
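The signature-based duplicate check can be illustrated with a short sketch. The 16-bit truncated CRC32 used as the second hash function and the names `signature` and `find_duplicate` are assumptions made for this example; the source only requires some suitable hash whose result is smaller than the data line.

```python
import zlib


def signature(line: bytes) -> int:
    """Hypothetical second hash function: a 16-bit signature of the line."""
    return zlib.crc32(line) & 0xFFFF


def find_duplicate(new_line, bucket_lines, bucket_sigs):
    """Return the index of a stored line equal to new_line, or None."""
    sig = signature(new_line)
    for i, stored_sig in enumerate(bucket_sigs):
        if stored_sig != sig:
            continue                      # cheap reject: the data line is never read
        if bucket_lines[i] == new_line:   # full compare rules out a false positive
            return i
    return None
```

Only the signatures are scanned on the common path; a full data line is fetched and compared only when its signature matches.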

雜湊表中的每一資料線在簽名表中具有對應簽名且每一資料線在參考計數器表中具有對應參考計數器。 Each data line in the hash table has a corresponding signature in the signature table and each data line has a corresponding reference counter in the reference counter table.

參考計數器表追蹤針對雜湊表220中的實體線中的每一者進行的去重複次數(例如，資料已重複的次數)。當將經去重複資料的實例增添至雜湊表時，可使參考計數器表中的對應參考計數器遞增，而不是增添與前面所儲存的使用者資料相同的新使用者資料，且當自雜湊表刪除經去重複資料的實例時，可將參考計數器表中的對應參考計數器減小一。 The reference counter table tracks the number of deduplications performed for each of the physical lines in the hash table 220 (e.g., the number of times the data has been duplicated). When an instance of deduplicated data is added to the hash table, the corresponding reference counter in the reference counter table may be incremented instead of adding new user data identical to the previously stored user data, and when an instance of deduplicated data is deleted from the hash table, the corresponding reference counter in the reference counter table may be decremented by one.
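A minimal sketch of the reference counting described above follows. The dictionary keyed by PLID and the rule that a line may be freed when its count reaches zero are assumptions for illustration; the source only states that the counter is incremented on a duplicate write and decremented on a delete.

```python
ref_count = {}   # PLID -> number of references (hypothetical layout)


def add_reference(plid) -> int:
    """A write matched an existing line: bump its counter instead of storing again."""
    ref_count[plid] = ref_count.get(plid, 0) + 1
    return ref_count[plid]


def drop_reference(plid) -> bool:
    """An instance was deleted: decrement, and report whether the line is now unused."""
    ref_count[plid] -= 1
    if ref_count[plid] == 0:
        del ref_count[plid]   # assumed policy: an unreferenced line can be reclaimed
        return True
    return False
```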

此外，經去重複記憶體(即，雜湊表)是由實體線(PL)構成，其是具有固定位元寬度的使用者資料C。預設實體線長度可為64位元組，但本發明並非僅限於此。實體線長度可被配置成其他大小，舉例而言，實體線大小可大於或小於64位元組。舉例而言，實體線大小可為32位元組。 In addition, the deduplicated memory (i.e., the hash table) is composed of physical lines (PLs), each of which is user data C with a fixed bit width. The default physical line length may be 64 bytes, but the present invention is not limited thereto. The physical line length may be configured to other sizes; for example, the physical line size may be larger or smaller than 64 bytes. For example, the physical line size may be 32 bytes.

較大的實體線大小可減小轉譯表的大小而且亦可減少重複資料的數量(即，減少因需要匹配大得多的位元圖案而進行的去重複次數)。較小的實體線大小可增大轉譯表的大小且亦可增大重複資料的數量(即，增大去重複次數)。 A larger physical line size may reduce the size of the translation table but may also reduce the amount of duplicate data found (i.e., fewer deduplications occur, because a much larger bit pattern must match). A smaller physical line size may increase the size of the translation table but may also increase the amount of duplicate data found (i.e., more deduplications occur).

轉譯表儲存被稱作實體線ID(PLID)的邏輯位址到實體位址的轉譯。實體線ID是藉由雜湊函數h₁(C)而產生。另外，對於每一實體線，在簽名表中儲存有與所述實體線相關聯的簽名。所述簽名較藉由雜湊函數h₂(C)而產生的使用者資料的雜湊結果小得多。在參考計數器表中儲存有亦與所述實體線相關聯的參考計數器。所述參考計數器對使用者資料匹配實體線內容的次數(即，去重複比率)進行計數。 The translation table stores logical-address-to-physical-address translations, which are called physical line IDs (PLIDs). The physical line ID is generated by the hash function h₁(C). In addition, for each physical line, a signature associated with that physical line is stored in the signature table. The signature is a much smaller hash result of the user data, generated by the hash function h₂(C). A reference counter that is also associated with the physical line is stored in the reference counter table. The reference counter counts the number of times user data has matched the content of the physical line (i.e., the deduplication ratio).

雜湊表、簽名表、及參考計數器表皆可具有相同的資料結構但具有不同的粒度(granularity)。 Hash table, signature table, and reference counter table can all have the same data structure but with different granularity.

儘管所述多個表被示作去重複模組的一部分，然而本發明並非僅限於此。根據本發明的一些實施例，所述多個表可儲存於位於去重複模組內的記憶體(例如，隨機存取記憶體)中，且根據其他實施例，所述多個表儲存於位於去重複模組外部的記憶體(例如，隨機存取記憶體)中且由所述去重複模組以本文所述方式進行控制。 Although the plurality of tables is shown as part of the deduplication module, the invention is not limited thereto. According to some embodiments of the invention, the plurality of tables may be stored in memory (e.g., random access memory) located within the deduplication module, and according to other embodiments, the plurality of tables is stored in memory (e.g., random access memory) external to the deduplication module and is controlled by the deduplication module in the manner described herein.

對本發明的以上特徵的其他說明可在美國專利申請案第15/473,311號中找到，所述美國專利申請案的全部內容併入本案供參考。 Additional descriptions of the above features of the present invention can be found in US Patent Application Serial No. 15/473,311, which is hereby incorporated by reference in its entirety.

圖4是根據本發明實施例的包括單層式轉譯表的去重複引擎的邏輯概念的方塊圖。轉譯表是主要元資料表，其可因自身的大小及在使用時所耗用的時間而對去重複比率、系統容量、及/或系統潛時造成影響。參照圖4，邏輯位址(LA)310可作為資料在系統記憶體(例如，動態隨機存取記憶體)中所儲存的位置而被電腦系統使用。 FIG. 4 is a block diagram of a logical concept of a deduplication engine including a single-level translation table according to an embodiment of the present invention. The translation table is the main metadata table that can have an impact on the deduplication ratio, system capacity, and/or system latency due to its size and the time it takes to use. Referring to FIG. 4, a logical address (LA) 310 may be used by a computer system as a location where data is stored in system memory (eg, DRAM).

邏輯位址310可為x位元長，其中x是整數。邏輯位址310可包括為g位元長的粒度314，其中g是整數。粒度314可定位於邏輯位址310的位元0至位元g-1處。邏輯位址310可更包括轉譯表索引312。轉譯表索引312可為x-g位元長且可定位於邏輯位址310的位元g至位元x-1處。在一些實施例中，當實體線400為32位元組長時，g為5(2⁵=32)，且當實體線400為64位元組長時，g為6(2⁶=64)。在一些實施例中，當支援1太位元組(terabyte)(1TB)的虛擬容量時，x為40(2⁴⁰為1太位元組)。 The logical address 310 may be x bits long, where x is an integer. The logical address 310 may include a granularity 314 that is g bits long, where g is an integer. The granularity 314 may be located at bit 0 through bit g-1 of the logical address 310. The logical address 310 may further include a translation table index 312. The translation table index 312 may be x-g bits long and may be located at bit g through bit x-1 of the logical address 310. In some embodiments, g is 5 (2⁵ = 32) when the physical line 400 is 32 bytes long, and g is 6 (2⁶ = 64) when the physical line 400 is 64 bytes long. In some embodiments, when a virtual capacity of 1 terabyte (1 TB) is supported, x is 40 (2⁴⁰ bytes is 1 terabyte).
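The field layout of the logical address can be checked with a few lines of code. This is a sketch only; the function name `split_logical_address` is hypothetical, and the defaults x = 40 and g = 6 are the example values given above for a 1 TB virtual capacity and 64-byte physical lines.

```python
def split_logical_address(la: int, x: int = 40, g: int = 6):
    """Split an x-bit logical address into (translation table index, granularity)."""
    if not 0 <= la < (1 << x):
        raise ValueError("logical address does not fit in x bits")
    granularity = la & ((1 << g) - 1)   # bits 0 .. g-1: byte offset within the line
    tt_index = la >> g                  # bits g .. x-1: translation table index
    return tt_index, granularity
```

With these values the translation table index is 34 bits wide, so a single-level table would need up to 2³⁴ entries to cover the full virtual capacity.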

轉譯表索引312對應於轉譯表240內的實體位址320。實體位址320可包括區位元(RGN)322、列索引(R_INDX)326、及行索引(COL_INDX)328。區位元(RGN)322可為單一位元且可表示雜湊表220中或溢出記憶體區280中是否儲存有資料。列索引(R_INDX)326可為m個位元且對應於雜湊表220中的M個列(0至M-1或0至2^m-1)。行索引(COL_INDX)328可為n個位元且對應於雜湊表220中的N個行(0至N-1或0至2ⁿ-1)。M、N、m、及n是整數。根據一些實施例，當雜湊表為128吉位元組(2³⁷)且g=6時，m=26，n=5，M=2²⁶，且N=2⁵。 The translation table index 312 corresponds to a physical address 320 in the translation table 240. The physical address 320 may include a region bit (RGN) 322, a row index (R_INDX) 326, and a column index (COL_INDX) 328. The region bit (RGN) 322 may be a single bit and may indicate whether the data is stored in the hash table 220 or in the overflow memory area 280. The row index (R_INDX) 326 may be m bits and corresponds to the M rows (0 to M-1, or 0 to 2^m-1) of the hash table 220. The column index (COL_INDX) 328 may be n bits and corresponds to the N columns (0 to N-1, or 0 to 2ⁿ-1) of the hash table 220. M, N, m, and n are integers. According to some embodiments, when the hash table is 128 gigabytes (2³⁷ bytes) and g = 6, m = 26, n = 5, M = 2²⁶, and N = 2⁵.
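The example sizes can be verified, and the three fields packed into one integer, with the sketch below. The bit order (RGN in the top bit, then R_INDX, then COL_INDX) and the function names are assumptions chosen for illustration; the source names the fields but does not fix their order.

```python
M_BITS, N_BITS, G_BITS = 26, 5, 6   # example values from the text


def pack_plid(rgn: int, r_indx: int, col_indx: int) -> int:
    """Pack RGN | R_INDX | COL_INDX into one integer (assumed bit order)."""
    return (rgn << (M_BITS + N_BITS)) | (r_indx << N_BITS) | col_indx


def unpack_plid(plid: int):
    """Recover (RGN, R_INDX, COL_INDX) from a packed PLID."""
    col_indx = plid & ((1 << N_BITS) - 1)
    r_indx = (plid >> N_BITS) & ((1 << M_BITS) - 1)
    rgn = plid >> (M_BITS + N_BITS)
    return rgn, r_indx, col_indx


# 2^26 rows x 2^5 columns x 2^6 bytes per line = 2^37 bytes = 128 gigabytes
HASH_TABLE_BYTES = (1 << M_BITS) * (1 << N_BITS) * (1 << G_BITS)
```

Under this layout a PLID occupies 1 + 26 + 5 = 32 bits.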

此外，溢出記憶體區280儲存未放置在雜湊表中的資料。 In addition, the overflow memory area 280 stores data not placed in the hash table.

圖5是根據本發明實施例的包括兩層式轉譯表的去重複引擎的邏輯概念的方塊圖。轉譯表是主要元資料表，其可對去重複比率、系統容量、及系統潛時造成影響。在圖5所示去重複引擎中，轉譯表包括兩層，即頁面索引表242及第2層(L2)映射表244。 FIG. 5 is a block diagram of a logical concept of a deduplication engine including a two-level translation table according to an embodiment of the present invention. The translation table is the main metadata table that can affect the deduplication ratio, system capacity, and system latency. In the deduplication engine shown in FIG. 5 , the translation table includes two layers, namely the page index table 242 and the layer 2 (L2) mapping table 244 .

邏輯位址310’可作為資料在記憶體(例如，隨機存取記憶體)中所儲存的位置被電腦系統使用。邏輯位址310’可為x位元長，其中x是整數。邏輯位址310’可包括為g位元長的粒度314’，其中g是整數。粒度314’可定位於邏輯位址310’的位元0至位元g-1處。邏輯位址310’可更包括頁面表項318及頁面索引316。頁面表項318可為12-g位元長且可定位於邏輯位址310’的位元g至位元11處。頁面索引可為x-12位元長且可定位於邏輯位址310’的位元12至位元x-1處。在一些實施例中，當實體線400'為32位元組長時，g為5(2⁵=32)，且當實體線400'為64位元組長時，g為6(2⁶=64)。在一些實施例中，當支援1太位元組(1TB)的虛擬容量時，x為40(2⁴⁰為1太位元組)。 The logical address 310' may be used by the computer system as the location at which data is stored in memory (e.g., random access memory). The logical address 310' may be x bits long, where x is an integer. The logical address 310' may include a granularity 314' that is g bits long, where g is an integer. The granularity 314' may be located at bit 0 through bit g-1 of the logical address 310'. The logical address 310' may further include a page table entry 318 and a page index 316. The page table entry 318 may be 12-g bits long and may be located at bit g through bit 11 of the logical address 310'. The page index may be x-12 bits long and may be located at bit 12 through bit x-1 of the logical address 310'. In some embodiments, g is 5 (2⁵ = 32) when the physical line 400' is 32 bytes long, and g is 6 (2⁶ = 64) when the physical line 400' is 64 bytes long. In some embodiments, x is 40 (2⁴⁰ bytes is 1 terabyte) when a virtual capacity of 1 terabyte (1 TB) is supported.

頁面索引316對應於頁面索引表242內的頁面。頁面索引表242內的頁面對應於第2層映射表244內的表項0位置。頁面表項318表示哪一表項會在表項0之後儲存所儲存資料的與邏輯位址310’對應的實體位址320’。 The page index 316 corresponds to a page within the page index table 242. A page within the page index table 242 corresponds to the location of entry 0 in the layer 2 mapping table 244. The page table entry 318 indicates which entry, counting from entry 0, stores the physical address 320' of the stored data corresponding to the logical address 310'.

換言之，頁面索引316與一組第2層映射表項相關聯且頁面表項318指定所述組中的表項。頁面索引316引向所述組中的第一表項，且頁面表項318示出此表項組中的哪一特定表項含有實體位址320’。頁面索引表242中的每一頁面可包括區位元(RGN)。區位元(RGN)322’可為單一位元且可表示雜湊表220’中或溢出記憶體區280’中是否儲存有資料。 In other words, the page index 316 is associated with a group of layer 2 mapping table entries, and the page table entry 318 specifies an entry within that group. The page index 316 leads to the first entry in the group, and the page table entry 318 identifies which particular entry in the group contains the physical address 320'. Each page in the page index table 242 may include a region bit (RGN). The region bit (RGN) 322' may be a single bit and may indicate whether the data is stored in the hash table 220' or in the overflow memory area 280'.

實體位址320’可包括列索引(R_INDX)326’及行索引(COL_INDX)328’。列索引(R_INDX)326’可為m個位元且對應於雜湊表220’中的M個列(0至M-1或0至2^m-1)。行索引(COL_INDX)328’可為n個位元且對應於雜湊表220’中的N個行(0至N-1或0至2ⁿ-1)。M、N、m、及n是整數。根據一些實施例，當雜湊表為128吉位元組(2³⁷)且g=6時，m=26，n=5，M=2²⁶，且N=2⁵。 The physical address 320' may include a row index (R_INDX) 326' and a column index (COL_INDX) 328'. The row index (R_INDX) 326' may be m bits and corresponds to the M rows (0 to M-1, or 0 to 2^m-1) of the hash table 220'. The column index (COL_INDX) 328' may be n bits and corresponds to the N columns (0 to N-1, or 0 to 2ⁿ-1) of the hash table 220'. M, N, m, and n are integers. According to some embodiments, when the hash table is 128 gigabytes (2³⁷ bytes) and g = 6, m = 26, n = 5, M = 2²⁶, and N = 2⁵.
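The two-level lookup described above can be sketched as follows. The Python containers standing in for the page index table and the layer 2 mapping table, and the name `lookup_two_level`, are assumptions for illustration; only the bit positions (granularity in bits 0 to g-1, page table entry in bits g to 11, page index in bits 12 and up) come from the text.

```python
G = 6   # granularity bits for 64-byte physical lines (example value)


def lookup_two_level(la: int, page_index_table, l2_map):
    """Translate a logical address through the page index table and the L2 map."""
    page_index = la >> 12                            # bits 12 .. x-1
    page_entry = (la >> G) & ((1 << (12 - G)) - 1)   # bits g .. 11
    base = page_index_table[page_index]              # location of entry 0 for this page
    return l2_map[base + page_entry]                 # entry holding the physical address
```

For example, with 64-byte lines each page covers 2⁶ = 64 layer 2 entries, so a page whose entry 0 sits at position 64 of the map has its entry 5 at position 69.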

圖6是根據本發明實施例的包括具有動態第2層映射表及溢出記憶體區的兩層式轉譯表的去重複引擎的邏輯概念的方塊圖。參照圖6，兩層式轉譯表可為溢出記憶體區騰出額外空間。 FIG. 6 is a block diagram of the logical concept of a deduplication engine including a two-level translation table with a dynamic layer 2 mapping table and an overflow memory area according to an embodiment of the present invention. Referring to FIG. 6, the two-level translation table may free up extra space for the overflow memory area.

根據一些實施例，簽名及參考計數器表260’的大小以及頁面索引表242’的大小是固定的，但第2層映射表244’及溢出記憶體區280”的大小是動態的。 According to some embodiments, the size of the signature and reference counter table 260' and the size of the page index table 242' are fixed, but the size of the layer 2 mapping table 244' and overflow memory area 280" is dynamic.

隨著第2層映射表244’及溢出記憶體區280”的大小增大，其會朝彼此擴展。如此一來，儲存空間可藉由容許第2層映射表244’或溢出記憶體區280”向未使用空間中擴展而得到高效使用。 As the layer 2 mapping table 244' and the overflow memory area 280" grow, they expand toward each other. In this way, storage space can be used efficiently by allowing either the layer 2 mapping table 244' or the overflow memory area 280" to expand into the unused space.
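One way to picture two dynamic regions that expand toward each other is a shared span with one region growing up from the bottom and the other growing down from the top. The class below is a minimal sketch of that idea; the class name, the unit of allocation, and the choice of which region sits at which end are assumptions, not details from the source.

```python
class SharedRegion:
    """Layer 2 map grows upward from 0; overflow area grows downward from `size`."""

    def __init__(self, size: int):
        self.size = size
        self.l2_top = 0             # L2 map occupies [0, l2_top)
        self.ovf_bottom = size      # overflow occupies [ovf_bottom, size)

    def free(self) -> int:
        """Unused space between the two regions."""
        return self.ovf_bottom - self.l2_top

    def grow_l2(self, n: int) -> int:
        if n > self.free():
            raise MemoryError("regions would collide")
        start = self.l2_top
        self.l2_top += n
        return start                # start offset of the newly added L2 space

    def grow_overflow(self, n: int) -> int:
        if n > self.free():
            raise MemoryError("regions would collide")
        self.ovf_bottom -= n
        return self.ovf_bottom      # start offset of the newly added overflow space
```

Neither region needs a fixed quota: whichever one fills up first can keep claiming space until the two boundaries meet.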

圖7是根據本發明實施例的雜湊圓柱體的邏輯概念的方塊圖。圖8是根據本發明實施例的組合資料結構的邏輯概念的方塊圖。參照圖7及圖8，簽名表、參考計數器表、及雜湊表被劃分開且排列於組合資料結構600(例如，組合結構600或組合表600)的雜湊圓柱體500(例如，雜湊圓柱體500-i)內的桶(例如，雜湊桶i)中。每一雜湊圓柱體500包括雜湊表的雜湊桶560(例如，雜湊桶560-i)、簽名表的簽名桶520(例如，簽名桶520-i)、及參考計數器表的參考計數器桶540(例如，參考計數器桶i)。 FIG. 7 is a block diagram of the logical concept of a hash cylinder according to an embodiment of the present invention. FIG. 8 is a block diagram of the logical concept of a combined data structure according to an embodiment of the present invention. Referring to FIGS. 7 and 8, the signature table, the reference counter table, and the hash table are divided up and arranged in buckets (e.g., hash bucket i) within hash cylinders 500 (e.g., hash cylinder 500-i) of a combined data structure 600 (e.g., combined structure 600 or combined table 600). Each hash cylinder 500 includes a hash bucket 560 of the hash table (e.g., hash bucket 560-i), a signature bucket 520 of the signature table (e.g., signature bucket 520-i), and a reference counter bucket 540 of the reference counter table (e.g., reference counter bucket i).

雜湊桶560包括多個表項或實體線(例如，表項0至表項N-1)。 The hash bucket 560 includes a plurality of entries or physical lines (e.g., entry 0 to entry N-1).

簽名桶520包括多個簽名，所述多個簽名對應於儲存於同一雜湊圓柱體500的雜湊桶560內的實體線中的資料。 The signature bucket 520 includes a plurality of signatures corresponding to the data stored in the physical lines within the hash bucket 560 of the same hash cylinder 500.

參考計數器桶540包括多個參考計數器，所述多個參考計數器對應於儲存於同一雜湊圓柱體500的雜湊桶560內的實體線中的資料已被去重複的次數。 The reference counter bucket 540 includes a plurality of reference counters corresponding to the number of times data in a physical line stored in the hash bucket 560 of the same hash cylinder 500 has been deduplicated.

換言之，雜湊表被劃分成多個雜湊桶560，每一雜湊桶560包括多個表項。簽名表被劃分成多個簽名桶520，每一簽名桶520包括多個簽名。參考計數器表被劃分成多個參考計數器桶540，每一參考計數器桶540包括多個參考計數器。 In other words, the hash table is divided into multiple hash buckets 560, and each hash bucket 560 includes multiple entries. The signature table is divided into a plurality of signature buckets 520, each signature bucket 520 including a plurality of signatures. The reference counter table is divided into a plurality of reference counter buckets 540, each reference counter bucket 540 including a plurality of reference counters.

組合資料結構600被組織成將一個雜湊桶560、一個簽名桶520、及一個參考計數器桶540一起放置於雜湊圓柱體500中。根據本發明的一些實施例，各所述桶以下次序進行排列：第一簽名桶520-0、第一參考計數器桶540-0、第一雜湊桶560-0、第二簽名桶520-1、第二參考計數器桶540-1、第二雜湊桶560-1等。 The combined data structure 600 is organized to place a hash bucket 560 , a signature bucket 520 , and a reference counter bucket 540 together in the hash cylinder 500 . According to some embodiments of the present invention, the buckets are arranged in the following order: first signature bucket 520-0, first reference counter bucket 540-0, first hash bucket 560-0, second signature bucket 520-1, The second reference counter bucket 540-1, the second hash bucket 560-1, and so on.

在此排列中，第一簽名桶520-0包括與儲存於第一雜湊桶560-0中的資料相關聯的簽名，且第一參考計數器桶540-0包括與儲存於第一雜湊桶560-0中的資料相關聯的參考計數器。此外，第二簽名桶520-1包括與儲存於第二雜湊桶560-1中的資料相關聯的簽名，且第二參考計數器桶540-1包括與儲存於第二雜湊桶560-1中的資料相關聯的參考計數器。此外，第一圓柱體500-0包括第一簽名桶520-0、第一參考計數器桶540-0、及第一雜湊桶560-0，且第二圓柱體500-1包括第二簽名桶520-1、第二參考計數器桶540-1、及第二雜湊桶560-1。 In this arrangement, the first signature bucket 520-0 includes the signatures associated with the data stored in the first hash bucket 560-0, and the first reference counter bucket 540-0 includes the reference counters associated with the data stored in the first hash bucket 560-0. Likewise, the second signature bucket 520-1 includes the signatures associated with the data stored in the second hash bucket 560-1, and the second reference counter bucket 540-1 includes the reference counters associated with the data stored in the second hash bucket 560-1. Further, the first cylinder 500-0 includes the first signature bucket 520-0, the first reference counter bucket 540-0, and the first hash bucket 560-0, and the second cylinder 500-1 includes the second signature bucket 520-1, the second reference counter bucket 540-1, and the second hash bucket 560-1.

如此一來，每一雜湊圓柱體500包括資料以及與儲存於同一雜湊圓柱體500內的所述資料相關聯的簽名及參考計數器。 As such, each hash cylinder 500 includes data and a signature and reference counter associated with said data stored within the same hash cylinder 500 .

當對儲存於組合資料結構600的雜湊圓柱體500-i內的資料作出請求時，整個雜湊圓柱體500-i被拷貝至讀取快取170’中。由於整個雜湊圓柱體500-i被拷貝至讀取快取170’中，因此擷取所請求資料、對應簽名(或相應簽名)、及對應參考計數器(或相應參考計數器)中的所有者所需的時間可減少。 When a request is made for data stored in hash cylinder 500-i of the combined data structure 600, the entire hash cylinder 500-i is copied into the read cache 170'. Because the entire hash cylinder 500-i is copied into the read cache 170', the time required to retrieve all of the requested data, the corresponding signature (or respective signature), and the corresponding reference counter (or respective reference counter) may be reduced.
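The benefit of fetching a whole hash cylinder at once can be illustrated with a toy model. The `HashCylinder` dataclass, the dictionary-based one-cylinder read cache, and the `fills` counter (which stands in for memory accesses) are all hypothetical constructs for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HashCylinder:
    signatures: list   # signature bucket
    ref_counts: list   # reference counter bucket
    lines: list        # hash bucket (physical lines)


def read_line(cylinders, read_cache: dict, i: int, way: int):
    """Return (data, signature, reference counter) for one way of cylinder i."""
    if read_cache.get("index") != i:
        # Miss: copy the entire cylinder in one go (modelled as one fill).
        read_cache["index"] = i
        read_cache["cylinder"] = cylinders[i]
        read_cache["fills"] = read_cache.get("fills", 0) + 1
    cyl = read_cache["cylinder"]
    return cyl.lines[way], cyl.signatures[way], cyl.ref_counts[way]
```

After the first access to a cylinder, the data, its signature, and its reference counter are all served from the cache without a further fill.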

根據一些實施例，讀取資料快取可與雜湊圓柱體為相同大小。 According to some embodiments, the read data cache may be the same size as the hash cylinder.

此外，當去重複引擎判斷雜湊表內是否已存在資料(以避免重複)時，整個雜湊圓柱體500可被拷貝至讀取快取170’中。由於去重複引擎在判斷是否可進行去重複時且在儲存資料時存取簽名、參考計數器、及資料，因此使讀取快取拷貝整個雜湊圓柱體可減少存取時間及提高總計算速度。 In addition, when the deduplication engine determines whether data already exists in the hash table (to avoid duplication), the entire hash cylinder 500 may be copied into the read cache 170'. Because the deduplication engine accesses the signatures, the reference counters, and the data when determining whether deduplication is possible and when storing the data, having the read cache copy the entire hash cylinder may reduce access time and increase overall computation speed.

換言之，為改善潛時及效能，可創建雜湊圓柱體500作為雜湊表項、簽名、及參考計數器表項的整合單元。整合雜湊圓柱體500可藉由減少系統記憶體存取循環來改善系統潛時。所述密實的資料結構可減少記憶體存取時間。每一雜湊圓柱體500包括去重複引擎執行計算所需的所有資訊。組合資料結構600亦可使快取更容易。 In other words, to improve latency and performance, the hash cylinder 500 may be created as an integrated unit of hash entries, signatures, and reference counter entries. The integrated hash cylinder 500 may improve system latency by reducing system memory access cycles. The compact data structure may reduce memory access time. Each hash cylinder 500 includes all the information the deduplication engine needs to perform its computations. The combined data structure 600 may also make caching easier.

圖9是根據本發明實施例的與虛擬桶相關聯的雜湊桶及對應的參考計數器桶的邏輯概念的方塊圖。參照圖9，每一雜湊桶560’可與一或多個虛擬桶VB(例如，VB-0至VB-V-1)相關聯。每一雜湊桶560’可包括N個路線(例如，WAY0至WAYN-1)。 FIG. 9 is a block diagram of the logical concept of a hash bucket associated with virtual buckets and the corresponding reference counter bucket according to an embodiment of the present invention. Referring to FIG. 9, each hash bucket 560' may be associated with one or more virtual buckets VB (e.g., VB-0 to VB-V-1). Each hash bucket 560' may include N ways (e.g., WAY0 to WAYN-1).

與相關技術的雜湊表不同，本實施例的雜湊表各自包括多個虛擬雜湊桶或虛擬桶，所述虛擬桶是由多個實體雜湊桶或實體桶構成。在下文中，用語「實體桶」將指代前面所論述的雜湊桶，且將用於將前面所論述的所述雜湊桶與所述虛擬桶加以區別。 Different from the hash tables in the related art, the hash tables in this embodiment each include multiple virtual hash buckets or virtual buckets, and the virtual buckets are composed of multiple physical hash buckets or physical buckets. Hereinafter, the term "physical bucket" will refer to the previously discussed hash bucket and will be used to distinguish the previously discussed hash bucket from the virtual bucket.

每一虛擬桶可包括雜湊表的實體桶中的部分實體桶。然而，應注意，所述虛擬桶中的不同虛擬桶可共享一或多個實體桶。如以下將闡述，使用根據本發明實施例的虛擬桶，額外的維數被增添至雜湊表。因此，可在排列及放置資料方面提供更大的撓性，由此提高效率且提高去重複動態隨機存取記憶體系統的壓縮比率(compression ratio)。 Each virtual bucket may include some of the physical buckets of the hash table. However, it should be noted that different ones of the virtual buckets may share one or more physical buckets. As will be explained below, using virtual buckets according to embodiments of the present invention, additional dimensions are added to the hash table. Thus, greater flexibility is provided in the arrangement and placement of data, thereby increasing efficiency and increasing the compression ratio of the deduplicated DRAM system.

本實施例使用虛擬桶來將資料放置撓性提高另一程度以釋放由其他虛擬桶所共享的其他實體桶，乃因儲存於雜湊桶中的一者中的資料區塊可在對應虛擬桶內移動、或移動至不同的實體桶。藉由釋放雜湊表內的空間，可藉由移除陳舊的/重複的資料來達成去重複。亦即，藉由使用根據本發明的實施例的虛擬桶，使用雜湊函數對資料線進行雜湊不會對受約束的對應位置造成嚴格限制，且資料能夠被放置於附近/「附近位置(near-location)」實體桶中，附近/「附近位置」實體桶指代位於包括最初意圖(但被佔用)實體雜湊桶的同一虛擬桶內的實體桶。 The present embodiment uses virtual buckets to add another degree of data placement flexibility, because a data block stored in one of the hash buckets may be moved within the corresponding virtual bucket, or moved to a different physical bucket, to free up other physical buckets shared by other virtual buckets. By freeing up space within the hash table, deduplication may be achieved by removing stale or duplicate data. That is, by using virtual buckets according to embodiments of the present invention, hashing a data line with a hash function does not strictly restrict the line to a single corresponding location, and the data can be placed in a nearby ("near-location") physical bucket, which refers to a physical bucket located within the same virtual bucket that includes the originally intended (but occupied) physical hash bucket.

作為例子，內容(例如，資料線)將被放置於實體桶中的一者中。若資料線將被放置於第一實體桶中，則作為對需要將資料線放置於實體桶中的替代，本實施例容許使用較單一的實體桶大且包括所述實體桶、但亦包括其他實體桶的虛擬桶。亦即，虛擬桶含有對齊於雜湊表內的毗鄰的、或相鄰的實體桶的集合。 As an example, content (e.g., a data line) is to be placed in one of the physical buckets. If the data line is to be placed in a first physical bucket, then instead of requiring the data line to be placed in that physical bucket, the present embodiment allows the use of a virtual bucket that is larger than a single physical bucket and that includes that physical bucket as well as other physical buckets. That is, a virtual bucket contains a set of contiguous, or adjacent, physical buckets aligned within the hash table.

因此，虛擬桶容許資料區塊在雜湊表內移動以為未來的寫入操作釋放空間。 Thus, virtual buckets allow data blocks to be moved within the hash table to free up space for future write operations.

對虛擬桶的其他說明請參見於2016年5月23日提出申請的美國專利申請案第15/162,512號及於2016年5月23日提出申請的美國專利申請案15/162,517號，該些美國專利申請案的全部內容皆併入本案供參考。 Additional descriptions of virtual buckets can be found in U.S. Patent Application No. 15/162,512, filed May 23, 2016, and U.S. Patent Application No. 15/162,517, filed May 23, 2016, the entire contents of which are incorporated herein by reference.

此外，虛擬桶可具有動態高度或大小。具有動態虛擬桶高度(virtual bucket height，VBH)可使得在限制潛時影響的同時提高記憶體利用率。 Additionally, virtual buckets can have dynamic heights or sizes. Having a dynamic virtual bucket height (VBH) allows for improved memory utilization while limiting the impact of latency.

與實體桶相關聯的虛擬桶的數目是由虛擬桶(virtual bucket，VB)高度索引來表示。虛擬桶高度資訊被儲存於與雜湊桶560’相關聯的參考計數器桶540’中的最末參考計數器中。參考計數器的位元的一部分被用作虛擬桶高度索引(例如，VBH[1：0])。 The number of virtual buckets associated with the physical bucket is represented by a virtual bucket (virtual bucket, VB) height index. Virtual bucket height information is stored in the last reference counter in reference counter bucket 540' associated with hash bucket 560'. A portion of the bits of the reference counter is used as a virtual bucket height index (eg, VBH[1:0]).

使用雜湊桶i作為例子，若虛擬桶高度為V，則雜湊桶i的虛擬桶可指代雜湊桶i+1至雜湊桶i+V。當雜湊桶i已滿時，去重複引擎將會將使用者資料放入虛擬桶中。 Using hash bucket i as an example, if the height of the virtual bucket is V, the virtual buckets of hash bucket i may refer to hash bucket i+1 to hash bucket i+V. When the hash bucket i is full, the deduplication engine will put the user data into the virtual bucket.

旗標(flag)(一個參考計數器(RC)位元的一部分，例如雜湊桶M中的最末RC位元)表示當前雜湊桶i正使用多少虛擬桶。如此一來，由於無需搜索多於所需數量的虛擬桶，因此潛時可減少。相關技術的虛擬桶使用固定的虛擬桶高度。使用固定的虛擬桶高度使得搜索邏輯將搜索所有虛擬桶而無論雜湊桶i實際使用多少虛擬桶，此可能增大潛時。 A flag (part of a reference counter (RC) bit, for example the last RC bit in hash bucket M) indicates how many virtual buckets the current hash bucket i is using. This may reduce latency, because no more virtual buckets than necessary need to be searched. Virtual buckets in the related art use a fixed virtual bucket height. With a fixed virtual bucket height, the search logic searches all of the virtual buckets regardless of how many virtual buckets hash bucket i actually uses, which may increase latency.

虛擬桶不需要其他記憶體空間。其使用附近雜湊桶中的未使用表項。舉例而言，對於雜湊桶i+1，其虛擬桶可指代雜湊桶i+2至雜湊桶i+V’+1。 Virtual buckets do not require additional memory space. They use unused entries in nearby hash buckets. For example, for hash bucket i+1, its virtual buckets may refer to hash bucket i+2 through hash bucket i+V'+1.

此外，當雜湊桶i的虛擬桶(例如，雜湊桶i+1至雜湊桶i+V)已滿時，根據本發明實施例的去重複引擎會增加所述虛擬桶的高度V以利用更多附近的雜湊桶中的可用空間。由於相關技術的虛擬桶高度是預先設定的(而非動態的)，因此其無法增大。如此一來，當雜湊桶i的虛擬桶(例如，雜湊桶i+1至雜湊桶i+V)已滿時，相關技術的去重複引擎無法使高度V增大。 In addition, when the virtual buckets of hash bucket i (for example, hash bucket i+1 to hash bucket i+V) are full, the deduplication engine according to the embodiment of the present invention will increase the height V of the virtual buckets to utilize more Available space in nearby hash buckets. Since the height of the virtual bucket in the related art is preset (rather than dynamic), it cannot be increased. In this way, when the virtual buckets of hash bucket i (for example, hash bucket i+1 to hash bucket i+V) are full, the related art deduplication engine cannot increase the height V.

另外，藉由動態地調整虛擬桶的高度，當去重複引擎判斷雜湊表內是否已存在資料(以避免重複)時，所述去重複引擎將僅需檢查正被使用的虛擬桶而不是檢查為預先設定數目的虛擬桶。此可減少存取時間且提高總計算速度。 In addition, because the virtual bucket height is adjusted dynamically, when the deduplication engine determines whether data already exists in the hash table (to avoid duplication), it only needs to check the virtual buckets that are actually in use rather than a preset number of virtual buckets. This may reduce access time and increase overall computation speed.
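The dynamic virtual bucket height can be sketched as below. This is an illustration under assumptions: the per-bucket height is kept in a plain list `vbh` rather than in reference counter bits, `max_height` is a hypothetical upper bound, buckets are Python lists, and wrap-around at the end of the table is not modelled.

```python
def find_line(buckets, vbh, i, line):
    """Search hash bucket i, then only the vbh[i] virtual buckets currently in use."""
    last = min(i + vbh[i], len(buckets) - 1)
    for b in range(i, last + 1):
        if line in buckets[b]:
            return b, buckets[b].index(line)
    return None


def insert_line(buckets, vbh, i, line, ways, max_height):
    """Place a line in bucket i or in a nearby bucket, raising the height on demand."""
    last = min(i + max_height, len(buckets) - 1)
    for b in range(i, last + 1):
        if len(buckets[b]) < ways:
            buckets[b].append(line)
            vbh[i] = max(vbh[i], b - i)   # height grows only as far as actually needed
            return b
    return None                           # caller would fall back to the overflow area
```

Because `find_line` stops at the recorded height, a lightly loaded bucket is searched quickly, while a crowded bucket can still spill into more neighbours than a fixed height would allow.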

FIG. 10 is a flowchart illustrating a method of retrieving data stored in a random access memory (RAM) according to an embodiment of the present invention. Although FIG. 10 shows the use of RAM, the present invention is not limited thereto, and any other suitable memory type may be used with the methods described herein.

Referring to FIG. 10, a central processing unit (CPU) of a computer system may request data stored in the RAM. The CPU may provide an address of the location of the data within the RAM. The present invention is not limited thereto; for example, other components may also request data from the RAM and provide the logical address.

A method of retrieving data stored in the RAM according to an embodiment of the present invention includes identifying a logical address of the data stored in the RAM (1000). The logical address may correspond to a location in a translation table.

The method further includes identifying a physical line ID (PLID) of the data in accordance with the logical address by looking up the logical address in the translation table (1010).

The method further includes determining, based on the PLID, whether the data is stored in a hash table of the RAM or in an overflow memory region of the RAM (1020).

When the data is stored in the hash table, the method further includes locating the physical line of the hash table that corresponds to the PLID (1030) and retrieving the data from that physical line of the hash table (1040). Retrieving the data may include retrieving corresponding data from a signature table and a reference counter table.

When the data is stored in the overflow memory region, the method further includes locating the physical line of the overflow memory region that corresponds to the PLID (1050) and retrieving the data from that physical line of the overflow memory region (1060).
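Steps 1000 through 1060 can be summarized in a short sketch. The container types, the region codes `HASH_TABLE` and `OVERFLOW`, and the function name `read_line` are hypothetical choices for illustration; the disclosure does not prescribe them.

```python
HASH_TABLE, OVERFLOW = 0, 1  # assumed region codes carried in the PLID


def read_line(logical_addr, translation_table, hash_table, overflow_region):
    """Resolve a logical address to its stored data (steps 1000-1060)."""
    # Steps 1000/1010: look up the PLID for the logical address.
    region, row, col = translation_table[logical_addr]
    # Step 1020: the PLID tells us which region holds the data.
    if region == HASH_TABLE:
        # Steps 1030/1040: locate the physical line in the hash table and read it.
        return hash_table[row][col]
    # Steps 1050/1060: locate the physical line in the overflow region and read it.
    return overflow_region[(row, col)]
```

Here the translation table is modeled as a mapping from logical address to a `(region, row, column)` tuple, so the region test is a single comparison before any data memory is touched.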

The PLID may be generated using a first hash function applied to the data. The PLID may include an address pointing to a location in the hash table of the RAM or in the overflow memory region of the RAM.

The PLID may include: a first identifier (for example, see RGN in FIG. 4) indicating whether the data is stored in the hash table or in the overflow memory region; a second identifier (for example, see R_INDX in FIG. 4) indicating the row in which the data is stored; and a third identifier (for example, see COL_INDX in FIG. 4) indicating the column in which the data is stored.
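One way to picture the three PLID fields is as a packed integer. The field names follow FIG. 4 (RGN, R_INDX, COL_INDX), but the bit widths, the field order, and the encoding of RGN below are assumptions for this sketch only.

```python
from typing import NamedTuple

R_BITS = 20    # assumed width of R_INDX
COL_BITS = 6   # assumed width of COL_INDX


class PLID(NamedTuple):
    rgn: int       # assumed encoding: 0 = hash table, 1 = overflow memory region
    r_indx: int    # row holding the data
    col_indx: int  # column holding the data


def pack(plid: PLID) -> int:
    """Pack the three fields into one integer: [RGN | R_INDX | COL_INDX]."""
    if not (0 <= plid.rgn <= 1 and 0 <= plid.r_indx < (1 << R_BITS)
            and 0 <= plid.col_indx < (1 << COL_BITS)):
        raise ValueError("field out of range")
    return (plid.rgn << (R_BITS + COL_BITS)) | (plid.r_indx << COL_BITS) | plid.col_indx


def unpack(word: int) -> PLID:
    """Recover the three fields from a packed PLID."""
    col = word & ((1 << COL_BITS) - 1)
    row = (word >> COL_BITS) & ((1 << R_BITS) - 1)
    rgn = word >> (R_BITS + COL_BITS)
    return PLID(rgn, row, col)
```

For example, `unpack(pack(PLID(1, 12345, 7)))` returns the same three fields that were packed.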

The method may further include retrieving a signature associated with the data from the signature table.

The RAM may include: the hash table, storing a plurality of data; the translation table, storing a plurality of PLIDs generated using the first hash function; the signature table, storing a plurality of signatures generated using a second hash function that is smaller than the first hash function; the reference counter table, storing a plurality of reference counters, each of which tracks the number of deduplications performed for the corresponding data stored in the hash table; and the overflow memory region.

The hash table, the signature table, and the reference counter table may be integrated into a combined data structure. The combined data structure may include a plurality of hash cylinders, and each hash cylinder may include: a hash bucket including a plurality of physical lines; a signature bucket including respective signatures corresponding to the plurality of physical lines; and a reference counter bucket including respective reference counters corresponding to the plurality of physical lines.

Retrieving the data from the physical line or the overflow memory region may include copying the entire hash cylinder, including the physical line, the corresponding signature, and the corresponding reference counter, to a read cache.
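A hash cylinder and the copy of a whole cylinder into the read cache might be modeled as below. The class `HashCylinder`, the function `fetch`, and the use of a dictionary keyed by row as the read cache are hypothetical; they only illustrate that the lines, signatures, and reference counters of one row travel together.

```python
from dataclasses import dataclass


@dataclass
class HashCylinder:
    lines: list       # hash bucket: the physical lines of this row
    signatures: list  # signature bucket: one signature per physical line
    ref_counts: list  # reference counter bucket: one counter per physical line


def fetch(cylinders, read_cache, row, col):
    """Return the line at (row, col), copying its whole cylinder on a cache miss."""
    if row not in read_cache:
        src = cylinders[row]
        # Copy the hash bucket, signature bucket, and reference counter bucket together.
        read_cache[row] = HashCylinder(
            list(src.lines), list(src.signatures), list(src.ref_counts)
        )
    return read_cache[row].lines[col]
```

Keeping the three buckets in one unit means a later access to a neighbouring line, its signature, or its reference counter can be served from the cached copy in this model.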

FIG. 11 is a flowchart illustrating a method of storing data in a RAM according to an embodiment of the present invention. Although FIG. 11 shows the use of RAM, the present invention is not limited thereto, and any other suitable memory type may be used with the methods described herein.

Referring to FIG. 11, a CPU of a computer system may request that data be stored in the RAM. The CPU may provide the data to be stored in the RAM. The present invention is not limited thereto; for example, other components may also request that data be stored in the RAM and provide that data.

A method of storing data in the RAM according to an embodiment of the present invention includes identifying the data to be stored in the RAM (1100).

The method further includes using a first hash function to determine a first hash value corresponding to where the data should be stored in a hash table of the RAM (1110).

The method further includes storing the data in the hash table at the location corresponding to the first hash value (1120).

The method further includes using a second hash function to determine a second hash value that also corresponds to where the data should be stored (1130). The second hash function may be smaller than the first hash function.

The method further includes storing the first hash value in a translation table (1140).

The method further includes storing the second hash value in a signature table (1150).

The RAM may include: the hash table, storing a plurality of data; the translation table, storing a plurality of physical line IDs (PLIDs) generated using the first hash function; the signature table, storing a plurality of signatures generated using the second hash function; a reference counter table, storing a plurality of reference counters, each of which tracks the number of deduplications performed for the corresponding data stored in the hash table; and an overflow memory region.

Each of the PLIDs may include: a first identifier (for example, see RGN in FIG. 4) indicating whether the data is stored in the hash table or in the overflow memory region; a second identifier (for example, see R_INDX in FIG. 4) indicating the row in which the data is stored; and a third identifier (for example, see COL_INDX in FIG. 4) indicating the column in which the data is stored.

The hash table, the signature table, and the reference counter table may be integrated into a combined data structure. The combined data structure may include a plurality of hash cylinders. Each hash cylinder may include: a hash bucket including a plurality of physical lines; a signature bucket including respective signatures corresponding to the plurality of physical lines; and a reference counter bucket including respective reference counters corresponding to the plurality of physical lines.

Storing the data in the hash table at the location corresponding to the first hash value may include storing the data in the hash bucket corresponding to the first hash value. Storing the second hash value in the signature table may include storing the second hash value in the signature bucket corresponding to the hash bucket in which the data is stored.
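The store path of FIG. 11 (steps 1100 through 1150) can be sketched as follows. Several details are assumptions made only for the example: BLAKE2b with different digest sizes stands in for the first hash function and the smaller second hash function, the PLID is modeled as a `(row, column)` pair, a new line starts with a reference count of 1, and the signature is used as a cheap pre-check before the full comparison.

```python
import hashlib

NUM_BUCKETS = 16  # assumed number of hash buckets (rows)
WAYS = 4          # assumed number of physical lines per bucket


def h1(data: bytes) -> int:
    """First hash function: selects the hash bucket (row) for the data."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


def h2(data: bytes) -> int:
    """Second, smaller hash function: produces a 16-bit signature."""
    digest = hashlib.blake2b(data, digest_size=2, person=b"sig").digest()
    return int.from_bytes(digest, "big")


def store(logical_addr, data, lines, sigs, refs, translation):
    """Store `data`, deduplicating against lines already in its bucket."""
    row, sig = h1(data), h2(data)                      # steps 1110 and 1130
    for col, (s, line) in enumerate(zip(sigs[row], lines[row])):
        if s == sig and line == data:                  # signature match, then full compare
            refs[row][col] += 1                        # one more reference to the same line
            translation[logical_addr] = (row, col)
            return row, col
    if len(lines[row]) >= WAYS:
        raise OverflowError("bucket full; virtual buckets or overflow region not modeled")
    lines[row].append(data)                            # step 1120
    sigs[row].append(sig)                              # step 1150
    refs[row].append(1)
    col = len(lines[row]) - 1
    translation[logical_addr] = (row, col)             # step 1140
    return row, col
```

Writing the same bytes to two logical addresses in this sketch yields one stored line, two translation table entries pointing at it, and a reference count of 2.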

Accordingly, embodiments of the present invention relate to methods and associated structures that enable the memory capacity of a memory (e.g., RAM) to be larger than the physical memory size. According to embodiments of the present invention, deduplication is used to achieve data memory reduction and context addressing. According to embodiments of the present invention, user data is stored in a hash table that is indexed by hash values of the user data.

It should be understood that although the terms "first", "second", "third", etc. may be used herein to describe various elements, components, regions, layers, and/or sections, these elements, components, regions, layers, and/or sections should not be limited by these terms. These terms are used to distinguish one element, component, region, layer, or section from another element, component, region, layer, or section. Thus, a first element, component, region, layer, or section discussed below could be termed a second element, component, region, layer, or section without departing from the spirit and scope of the present invention.

A relevant device or component (or relevant devices or components), e.g., a deduplication engine, according to embodiments of the present invention described herein may be implemented using any suitable hardware (e.g., an application-specific integrated circuit), firmware (e.g., a digital signal processor (DSP) or a field-programmable gate array (FPGA)), software, or a suitable combination of software, firmware, and hardware. For example, the various components of the relevant device(s) may be formed on one integrated circuit (IC) chip or on separate IC chips. Further, the various components of the relevant device(s) may be implemented on a flexible printed circuit film, a tape carrier package (TCP), or a printed circuit board (PCB), or formed on the same substrate as one or more circuits and/or other devices. Further, the various components of the relevant device(s) may be a process or thread, running on one or more processors in one or more computing devices, executing computer program instructions and interacting with other system components to perform the various functionalities described herein. The computer program instructions are stored in a memory that may be implemented in a computing device using a standard memory device, such as a random access memory (RAM). The computer program instructions may also be stored in other non-transitory computer-readable media such as a CD-ROM, a flash drive, or the like. Also, a person skilled in the art should recognize that the functionality of various computing devices may be combined or integrated into a single computing device, or the functionality of a particular computing device may be distributed across one or more other computing devices, without departing from the spirit and scope of the exemplary embodiments of the present invention.

In addition, it should also be understood that when an element, component, region, layer, and/or section is referred to as being "between" two elements, components, regions, layers, and/or sections, it can be the only element, component, region, layer, and/or section between the two, or one or more intervening elements, components, regions, layers, and/or sections may also be present.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprise", "comprises", "comprising", "includes", "including", and "include", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. Expressions such as "at least one of", "one of", and "selected from", when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of "may" when describing embodiments of the present invention refers to "one or more embodiments of the present invention". Also, the term "exemplary" is intended to refer to an example or illustration.

As used herein, the terms "use", "using", and "used" may be considered synonymous with the terms "utilize", "utilizing", and "utilized", respectively.

Features described in relation to one or more embodiments of the present invention are available for use in conjunction with features of other embodiments of the present invention. For example, features described in a first embodiment may be combined with features described in a second embodiment to form a third embodiment, even though the third embodiment may not be specifically described herein.

A person skilled in the art should also recognize that the process may be executed via hardware, firmware (e.g., via an application-specific integrated circuit), or any combination of software, firmware, and/or hardware. Furthermore, the sequence of steps of the process is not fixed, but can be altered into any desired sequence as recognized by a person skilled in the art. The altered sequence may include all of the steps or a portion of the steps.

Although the present invention has been described with regard to certain specific embodiments, those skilled in the art will have no difficulty devising variations of the described embodiments that in no way depart from the scope and spirit of the present invention. Furthermore, to those skilled in the various arts, the invention described herein will itself suggest solutions to other tasks and adaptations for other applications. It is the applicant's intention that the claims cover all such uses of the invention, as well as those changes and modifications that could be made to the embodiments of the invention chosen herein for the purpose of disclosure, without departing from the spirit and scope of the invention. Thus, the present embodiments of the invention should be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims and their equivalents.

1000, 1010, 1020, 1030, 1040, 1050, 1060: steps

Claims

A method of retrieving data stored in a memory associated with a deduplication module, the deduplication module including a read cache, the memory including a translation table and a combined data structure, the combined data structure including a hash table and a reference counter table, the hash table and the reference counter table each being stored in a plurality of hash cylinders of the combined data structure, the hash table including a plurality of hash buckets, each of the hash buckets including a plurality of physical lines, each of the physical lines storing data, the reference counter table including a plurality of reference counter buckets, each of the reference counter buckets including a plurality of reference counters, the method comprising: identifying a logical address of the data; identifying a physical line ID (PLID) of the data in accordance with the logical address by looking up at least a portion of the logical address in the translation table; locating a respective physical line of the plurality of physical lines, the respective physical line corresponding to the PLID; and retrieving the data from the respective physical line, the retrieving including copying a respective hash cylinder of the plurality of hash cylinders to the read cache, the respective hash cylinder including: a respective hash bucket of the plurality of hash buckets, the respective hash bucket including the respective physical line; and a respective reference counter bucket of the plurality of reference counter buckets, the respective reference counter bucket including a respective reference counter associated with the respective physical line, wherein the PLID is generated using a first hash function applied to the data.

The method of claim 1, further comprising determining, based on the PLID, that the data is stored in the hash table.

The method of claim 1, wherein the PLID includes an address pointing to a location in the hash table.

The method of claim 3, wherein the PLID includes: a first identifier indicating whether the data is stored in the hash table or in an overflow memory region; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column in which the data is stored.

The method of claim 1, wherein the combined data structure further includes a signature table, the signature table including a plurality of signature buckets, each of the signature buckets including a plurality of signatures, and wherein the respective hash cylinder further includes a respective signature bucket of the plurality of signature buckets, the respective signature bucket including a respective signature associated with the respective physical line.

The method of claim 5, wherein the PLID includes an address pointing to a location in the hash table, and wherein the plurality of signatures are generated using a second hash function that is smaller than the first hash function.

The method of claim 1, wherein each of the reference counters tracks the number of deduplications performed for the corresponding data stored in the hash table.

A method of storing data in a memory associated with a deduplication engine, the method comprising: identifying the data to be stored; using a first hash function to determine, as a physical line ID (PLID), a first hash value corresponding to where the data should be stored in a hash table in the memory; storing the data in the hash table at a location corresponding to the first hash value; using a second hash function that is smaller than the first hash function to determine a second hash value that also corresponds to where the data should be stored; storing the first hash value in a translation table in the memory; and storing the second hash value in a signature table in the memory.

The method of claim 8, further comprising incrementing a reference counter, in a reference counter table, that corresponds to the data.

The method of claim 8, wherein the memory includes: the hash table, storing a plurality of the data; the translation table, storing a plurality of physical line IDs (PLIDs) generated using the first hash function; the signature table, storing a plurality of signatures generated using the second hash function; a reference counter table, storing a plurality of reference counters, each of which tracks the number of deduplications performed for the corresponding data stored in the hash table; and an overflow memory region.

The method of claim 10, wherein each of the plurality of PLIDs includes: a first identifier indicating whether the data is stored in the hash table or in the overflow memory region; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column in which the data is stored.

The method of claim 10, wherein the hash table, the signature table, and the reference counter table are integrated into a combined data structure, and wherein the combined data structure includes a plurality of hash cylinders, each of the hash cylinders including: a hash bucket including a plurality of physical lines; a signature bucket including respective signatures corresponding to the plurality of physical lines; and a reference counter bucket including respective reference counters corresponding to the plurality of physical lines.

The method of claim 12, wherein storing the data in the hash table at the location corresponding to the first hash value includes storing the data in the hash bucket corresponding to the first hash value, and wherein storing the second hash value in the signature table includes storing the second hash value in the signature bucket corresponding to the hash bucket in which the data is stored.

A deduplication module comprising: a read cache; a deduplication engine that receives a data retrieval request from a host system; and a memory, the memory including: a translation table; and a combined data structure including: a hash table including a plurality of hash buckets, each of the hash buckets including a plurality of physical lines, each of the physical lines storing data; a reference counter table including a plurality of reference counter buckets, each of the reference counter buckets including a plurality of reference counters; and a plurality of hash cylinders, each of the hash cylinders including one of the hash buckets and one of the reference counter buckets, wherein the data retrieval request causes the deduplication engine to: identify a logical address of the data; identify a physical line ID (PLID) of the data in accordance with the logical address by looking up at least a portion of the logical address in the translation table; locate a respective physical line of the plurality of physical lines, the respective physical line corresponding to the PLID; and retrieve the data from the respective physical line, the retrieving of the data including copying a respective hash cylinder of the plurality of hash cylinders to the read cache, the respective hash cylinder including: a respective hash bucket of the plurality of hash buckets, the respective hash bucket including the respective physical line; and a respective reference counter bucket of the plurality of reference counter buckets, the respective reference counter bucket including a respective reference counter associated with the respective physical line, wherein the PLID is generated using a first hash function applied to the data.

The deduplication module of claim 14, wherein the data retrieval request further causes the deduplication engine to determine, based on the PLID, that the data is stored in the hash table.

The deduplication module of claim 14, wherein the PLID includes an address pointing to a location in the hash table.

The deduplication module of claim 16, wherein the PLID includes: a first identifier indicating whether the data is stored in the hash table or in an overflow memory region; a second identifier indicating the row in which the data is stored; and a third identifier indicating the column in which the data is stored.

The deduplication module of claim 14, wherein the combined data structure further includes a signature table, the signature table including a plurality of signature buckets, each of the signature buckets including a plurality of signatures, and wherein the respective hash cylinder further includes a respective signature bucket of the plurality of signature buckets, the respective signature bucket including a respective signature associated with the respective physical line.

The deduplication module of claim 18, wherein the PLID includes an address pointing to a location in the hash table, and wherein the plurality of signatures are generated using a second hash function that is smaller than the first hash function.

The deduplication module of claim 14, wherein each of the reference counters tracks the number of deduplications performed for the corresponding data stored in the hash table.

A deduplication module comprising: a host interface; a transfer manager that receives data transfer requests from a host system via the host interface; and a plurality of partitions, each of the partitions including: a deduplication engine that receives partition data requests from the transfer manager; a plurality of memory controllers; a memory manager located between the deduplication engine and the memory controllers; and a plurality of memory modules, each of the memory modules being coupled to one of the memory controllers, wherein the deduplication engine applies a first hash function to data to generate a physical line ID (PLID), the PLID pointing to the location in the memory modules at which the data is stored.

A deduplication module comprising: a read cache; a memory, the memory including: a translation table; and a hash table including a plurality of hash buckets, each of the hash buckets including a plurality of physical lines, each of the physical lines storing data; a reference counter table including a plurality of reference counter buckets, each of the reference counter buckets including a plurality of reference counters; and a deduplication engine that accesses the read cache and identifies V virtual buckets of a first hash bucket of the plurality of hash buckets, the virtual buckets being other hash buckets of the plurality of hash buckets located near the first hash bucket, the virtual buckets storing a portion of the data of the first hash bucket when the first hash bucket is full, V being an integer that is dynamically set based on how full the virtual buckets of the first hash bucket are, wherein the deduplication engine applies a first hash function to first data to generate a physical line ID (PLID), the PLID pointing to the corresponding physical line in which the first data is stored.