TWI798536B - Method and computer program product for analyzing failures on flash data

Info

Publication number
TWI798536B
Authority
TW
Taiwan
Prior art keywords
mentioned
data
storage device
block
processing unit
Prior art date
Application number
TW109106865A
Other languages
Chinese (zh)
Other versions
TW202134876A (en
Inventor
袁竟成
Original Assignee
慧榮科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 慧榮科技股份有限公司 filed Critical 慧榮科技股份有限公司
Priority to TW109106865A priority Critical patent/TWI798536B/en
Publication of TW202134876A publication Critical patent/TW202134876A/en
Application granted granted Critical
Publication of TWI798536B publication Critical patent/TWI798536B/en

Landscapes

  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure describes a method for analyzing failures on flash data, performed by a processing unit of an analysis host when it loads and executes software program code. The processing unit is coupled to a peripheral storage interface, which connects to a malfunctioning storage device. The method includes obtaining the data in the storage device that can be used for failure analysis and backing up the obtained data to a local storage device, rather than backing up all of the data stored in the storage device, to facilitate preliminary analysis.

Description

Method and computer program product for analyzing failures on flash data

The invention relates to storage devices, and in particular to a method and a computer program product for analyzing failures on flash data.

As demand grows for more storage space and greater circuit miniaturization in memory devices (such as dynamic random access memory and flash memory), memory manufacturers seek to raise chip performance without sacrificing precious motherboard space. Instead of traditional single-chip or multi-chip packages, chip designers adopt multi-die stacking. Stacking multiple dies (such as multiple memory chips) not only reduces the overall package area but also consumes less energy when driving signals and achieves faster transmission, which effectively improves electrical signal performance. Wire bonding for stacked dies connects the stacked memory chips and the printed circuit board, so that the controller is coupled or connected to the memory chips through the substrate. The stacked memory chips, the controller, and the substrate are then packaged into a single integrated circuit (IC). However, when a serious error occurs in the data stored in the memory chips and failure analysis is required, the packaged chip may have to be destroyed. The invention therefore proposes a method and a computer program product for analyzing failures on flash data, which back up data efficiently before destructive analysis and so benefit the subsequent failure analysis.

In view of this, how to mitigate or eliminate the above shortcomings of the related art is a problem to be solved.

This specification relates to a method for analyzing failures on flash data, performed by a processing unit of an analysis host when it loads and executes software program code. The processing unit is coupled to a peripheral storage interface, which connects to a storage device in which a failure has occurred. The method includes: obtaining the data in the storage device that is used for failure analysis and backing it up to a local storage device, without fully backing up all of the data in the storage device, to assist preliminary analysis.

This specification also relates to a computer program product for analyzing failures on flash data, containing program code that is loaded and executed by a processing unit of an analysis host. The processing unit is coupled to a peripheral storage interface, which connects to a storage device in which a failure has occurred. The program code includes: obtaining the data in the storage device that is used for failure analysis and backing it up to a local storage device, without fully backing up all of the data in the storage device, to assist preliminary analysis.

One advantage of the above embodiments is that the obtaining and backing-up described above avoid the large amount of time that a full backup of all data would take.

Another advantage of the above embodiments is that the obtaining and backing-up described above prevent users' private or confidential information in the storage module from being stolen by unauthorized persons from the local storage device used for the backup.

Other advantages of the invention are explained in more detail in the following description and drawings.

10: failure analysis system

100: storage device

110: storage IC

120: controller

130: storage module

140: substrate

150: printed circuit board

160: connector

200: computer

210: host

211: processing unit

212: memory

213: display interface

215: input interface

217: peripheral storage interface

219: local storage device

230: display

250: input device

310: processing unit

320: host interface

340: flash interface

360: RAM

S410~S460: method steps

S510~S580: method steps

S610~S680: method steps

FIG. 1 is a schematic diagram of a failure analysis system according to an embodiment of the invention.

FIG. 2 is a block diagram of an analysis host according to an embodiment of the invention.

FIG. 3 is a block diagram of a storage device according to an embodiment of the invention.

FIG. 4 is a flowchart of a failure analysis method according to an embodiment of the invention.

FIG. 5 is a flowchart of a data backup method according to an embodiment of the invention.

FIG. 6 is a flowchart of a failure analysis method according to an embodiment of the invention.

The following description presents preferred implementations of the invention. Its purpose is to describe the basic spirit of the invention, not to limit the invention. The actual scope of the invention is defined by the claims that follow.

It must be understood that words such as "comprise" and "include" used in this specification indicate the presence of specific technical features, values, method steps, operations, elements, and/or components, but do not exclude the addition of further technical features, values, method steps, operations, elements, components, or any combination of the above.

Words such as "first", "second", and "third" are used in the claims to modify claim elements. They do not indicate a priority, a precedence relationship, that one element precedes another, or a chronological order in which method steps are performed; they serve only to distinguish elements that have the same name.

Referring to FIG. 1, the failure analysis system 10 may include at least a storage device 100 (the object to be analyzed) and a computer 200. The storage device 100 may be a memory card, a portable disk, a solid state disk (SSD), or the like, and includes at least a storage integrated circuit (IC) 110, a printed circuit board (PCB) 150, and a connector 160. The storage device 100 may be installed in a wide range of electronic products, such as personal computers, laptop PCs, tablet computers, mobile phones, digital cameras, and digital video cameras. The computer 200 includes at least a host 210, a display 230, and an input device 250. The display 230 may be a thin film transistor liquid crystal display (TFT-LCD) panel, an organic light-emitting diode (OLED) panel, or the like, and displays text, numbers, symbols, and patterns that represent the content stored in the storage device 100, or screens that an application presents to the analysis engineer. The input device 250 may include a keyboard, a mouse, a touch panel, or the like, through which the analysis engineer enters information or control actions to download the data stored in the storage device 100 and to perform failure analysis.

The storage IC 110 may include at least a controller 120, a storage module 130, and a substrate 140. The controller 120 and the storage module 130 may be mounted on the substrate 140, so that the controller 120 is coupled or connected to the storage module 130 through circuits in the substrate 140. The storage module 130 may include multiple NAND flash dies stacked on the substrate 140, providing a large amount of storage space, typically hundreds of gigabytes (GB) or even several terabytes (TB), for storing user data such as high-resolution pictures and videos. The memory cells in each NAND die may include single-level cells (SLCs), multi-level cells (MLCs), triple-level cells (TLCs), quad-level cells (QLCs), or any combination of these. The controller 120 may communicate with the storage module 130 using the Open NAND Flash Interface (ONFI) synchronous mode (ONFI Sync), the ONFI asynchronous mode (ONFI Async), double data rate toggle (DDR Toggle), or another communication protocol. The controller 120, the storage module 130, and the substrate 140 may be packaged as the single storage IC 110 and mounted on the printed circuit board 150, so that the controller 120 can be coupled to the host 210 through circuits in the printed circuit board 150 and the connector 160.

Referring to FIG. 2, the host (also called the analysis host) 210 may include at least a processing unit 211, a memory 212, a display interface 213, an input interface 215, a peripheral storage interface 217, and a local storage device 219. The processing unit 211 may be implemented in many ways, for example with general-purpose hardware (such as a single processor, multiple processors with parallel processing capability, a graphics processor, or another processor with computing capability), and provides the functions described below when it executes the program code of the failure analysis application. The peripheral storage interface 217 may be Universal Serial Bus (USB), Advanced Technology Attachment (ATA), Serial Advanced Technology Attachment (SATA), Peripheral Component Interconnect Express (PCI-E), Universal Flash Storage (UFS), Non-Volatile Memory Express (NVMe), Open-Channel SSD, Embedded MultiMedia Card (eMMC), or another interface, and is used to connect to the storage device 100. When the processing unit 211 loads and executes the failure analysis application, it sends commands to the storage device 100 through the peripheral storage interface 217 to read the data in the storage device 100 for further failure analysis. The memory 212 may be a dynamic random access memory (DRAM) that provides volatile storage space for buffering the data the processing unit 211 needs while executing the failure analysis application, such as variables and data tables, as well as the data read from the storage device 100. The local storage device 219 may be a hard disk, a solid state disk, or the like, and provides non-volatile storage space for permanently storing the data read from the storage device 100. The processing unit 211 may also connect to the display 230 and the input device 250 through the display interface 213 and the input interface 215, respectively.

Referring to FIG. 3, the controller 120 includes at least a processing unit 310, a host interface 320, a flash interface 340, and a random access memory (RAM) 360. The processing unit 310 may be implemented in many ways, for example with general-purpose hardware (such as a single processor, multiple processors with parallel processing capability, a graphics processor, or another processor with computing capability), and provides the functions described below when it executes the program code of the data backup application. The host interface 320, such as USB, ATA, SATA, PCI-E, UFS, NVMe, Open-Channel SSD, eMMC, or another interface, connects to the peripheral storage interface 217 of the analysis host 210. Through the host interface 320, the processing unit 310 receives commands from the analysis host 210 (more specifically, the processing unit 211) and then starts and executes the backup method, and returns the data stored in the storage module 130 and related information to the analysis host 210. The flash interface 340 may be ONFI Sync, ONFI Async, DDR Toggle, or another interface. The RAM 360 may be a dynamic random access memory (DRAM) or a static random access memory (SRAM), and stores the program code of the data backup application and the data read from the storage module 130. In other embodiments, the RAM 360 may be placed on the printed circuit board 150 outside the storage IC 110, with the processing unit 310 accessing the RAM 360 through a memory controller (not shown in FIG. 3); the invention is not limited in this respect.

When the storage device 100 fails or is judged to be defective, failure analysis is needed to find the cause of the failure and, based on it, to improve the manufacturing process of specific components in the storage device 100 or to correct the program code of the firmware loaded and executed by the processing unit 310. Before destructive analysis (such as destruction of the hardware, the firmware, or the stored data), the analysis host 210 may first back up the data stored in the storage module 130 and perform a preliminary analysis on the backup. In some implementations, the analysis host 210 may back up all of the data stored in the storage module 130. However, the storage module 130 is large, ranging from several GB to several TB, so a full backup of all data takes a long time, possibly 6 to 12 hours, which hinders the quick production of a preliminary report. In addition, a full backup of all data in the storage module 130 occupies a large amount of space in the local storage device 219, and users' private or confidential information in the storage module 130 can easily be stolen by unauthorized persons.

To solve the problems described above, the invention proposes a method for analyzing failures on flash data that obtains and backs up the key data in the storage device 100 that is used for failure analysis, without fully backing up all of the data in the storage device 100, to assist preliminary analysis.

In one embodiment, the analysis host 210 decides which key data in the storage device 100 needs to be backed up, consolidates the data sent by the storage device 100, and generates files that can be parsed by engineers, by failure analysis program code executed on a computer, or by both. The failure analysis method shown in FIG. 4 is implemented when the processing unit 211 loads and executes the program code of the failure analysis module, and is described in detail below:

Step S410: Obtain the device information of the storage device 100. The processing unit 211 may send, through the peripheral storage interface 217, one or more standard-defined management commands (Administration Commands or Management Commands), or one or more commands defined by the manufacturer of the storage IC 110 (also called vendor commands), to the storage device 100, and receive the device information returned by the storage device 100. This information includes, for example, the factory identifier (FID) of the storage IC 110, the identifier of the controller 120, the identifier of the storage module 130, the type of NAND flash it contains, the NAND flash structure (such as the number of planes, channels, and interleaves), the length of each page, and the number of pages in each physical block.

Step S420: Generate a device description file and a device basic parameter file from the device information of the storage device 100, and store them in the local storage device 219. For ease of interpretation, the content of the device description file may be encoded as an Extensible Markup Language (XML) document that contains the identifier of the controller 120, the identifier of the storage module 130, the type of NAND flash it contains, the NAND flash structure, the length of each page, the number of pages in each physical block, and similar information. The device basic parameter file may be a binary file of 512 or 1024 bytes that contains the FID of the storage IC 110 dumped from the storage device 100.
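Step S420 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the XML element names, the example values, the zero padding of the FID record, and the function names `build_device_description` and `build_basic_parameters` are all hypothetical, since the source does not specify a schema or a record layout.

```python
import xml.etree.ElementTree as ET


def build_device_description(info: dict) -> bytes:
    """Serialize device information into an XML document (hypothetical schema)."""
    root = ET.Element("device")
    for key, value in info.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_basic_parameters(fid: bytes, size: int = 512) -> bytes:
    """Place the dumped FID in a fixed-size binary record.

    Padding the remainder with zero bytes is an assumption.
    """
    if len(fid) > size:
        raise ValueError("FID is longer than the record size")
    return fid.ljust(size, b"\x00")


# Example values are invented for illustration only.
device_info = {
    "controller_id": "CTRL-EXAMPLE",
    "storage_module_id": "NAND-EXAMPLE",
    "nand_type": "TLC",
    "page_length": 16384,
    "pages_per_block": 256,
}
description_xml = build_device_description(device_info)
basic_params = build_basic_parameters(b"\x12\x34\x56\x78")
```

In a real tool both byte strings would then be written to files on the local storage device; the sketch stops at building them so that it stays independent of any file layout.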

Step S430: Generate an empty backup data list file and an empty data file and store them in the local storage device 219. These files will hold the key data in the storage device 100 (specifically, the storage module 130) that can be used for failure analysis. The content of the backup data list file may be encoded as a text file, which is easy for engineers and/or program code to interpret. It contains multiple rows of fixed length, which record the metadata of each page of every block other than the original bad blocks. The data file may be a binary file. If a page in the storage module 130 stores system data, that system data is stored in one contiguous region of the data file, and a specific field in the corresponding row of the backup data list file stores the location of that region, such as its start position and length.
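The relationship between the two files in step S430 can be sketched as follows: each fixed-length text row points at a contiguous region of the binary data file by start position and length. The row layout, the field widths, and the names `ROW_FORMAT` and `append_page` are assumptions made for illustration; the source only states that rows have a fixed length and that a field records the region's location.

```python
import io

# Hypothetical fixed-width row: block, page, kind, offset, length.
ROW_FORMAT = "{block:08d} {page:06d} {kind:<8} {offset:012d} {length:08d}\n"


def append_page(list_file, data_file, block, page, kind, payload: bytes) -> None:
    """Append a payload to the data file and record its region in the list file."""
    offset = data_file.seek(0, io.SEEK_END)  # region starts at the current end
    data_file.write(payload)
    list_file.write(
        ROW_FORMAT.format(
            block=block,
            page=page,
            kind=kind[:8],  # truncate so that every row keeps the same length
            offset=offset,
            length=len(payload),
        )
    )


# In-memory stand-ins for the two files on the local storage device.
list_file = io.StringIO()
data_file = io.BytesIO()
append_page(list_file, data_file, 3, 0, "system", b"\xaa" * 16)
append_page(list_file, data_file, 3, 1, "system", b"\xbb" * 8)
rows = list_file.getvalue().splitlines()
```

Because every row has the same length, a reader can jump straight to row n of the list file and then seek to the recorded offset in the data file without scanning either file from the start.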

Step S440: Set (initialize) the variable i to 0. The storage module 130 contains multiple physical blocks, and each physical block contains multiple pages. The hardware of the storage module 130 is organized into multiple channels, each channel connects to one or more dies, and each die contains multiple physical blocks (which can be identified by information such as a logical unit number and a block number). Even so, the processing unit 310 may logically arrange all of the physical blocks into a single sequence according to rules based on the physical configuration, and the variable i indicates which physical block is currently being processed. Note that in most current standards, to reduce the complexity of operating the storage device 100, the host-side program code that drives the storage device 100 in normal mode (that is, not in factory mode) has no knowledge of the physical architecture of the storage module 130. However, because the program code of the failure analysis application is provided by the manufacturer of the storage IC 110 to the manufacturer of the storage device 100 and runs in factory mode, it has the knowledge needed to operate the storage module 130.
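One way to picture how a single index i can stand for a physical block spread across channels and dies is the sketch below. The ordering rule (blocks within a die first, then dies within a channel, then channels), the geometry values, and the names `BlockAddress` and `block_address` are assumptions; the source says only that the blocks are arranged into a sequence according to rules based on the physical configuration.

```python
from typing import NamedTuple


class BlockAddress(NamedTuple):
    channel: int
    die: int
    block: int


def block_address(i: int, channels: int, dies_per_channel: int,
                  blocks_per_die: int) -> BlockAddress:
    """Map a linear block index to a (channel, die, block) triple.

    The ordering used here is hypothetical and chosen only for illustration.
    """
    total = channels * dies_per_channel * blocks_per_die
    if not 0 <= i < total:
        raise IndexError(f"block index {i} is outside 0..{total - 1}")
    die_index, block = divmod(i, blocks_per_die)
    channel, die = divmod(die_index, dies_per_channel)
    return BlockAddress(channel, die, block)


# Example geometry: 4 channels, 2 dies per channel, 1024 blocks per die.
first = block_address(0, 4, 2, 1024)
next_die = block_address(1024, 4, 2, 1024)
last = block_address(4 * 2 * 1024 - 1, 4, 2, 1024)
```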

Next, the processing unit 211 repeatedly executes a loop (steps S451 to S459) to obtain, from each physical block of the storage module 130, the key data that can be used for failure analysis, until all physical blocks have been processed (the "yes" path of step S457).

Step S451: Determine whether the i-th physical block is an original bad block. If so, the processing unit 211 proceeds to step S457; otherwise, it proceeds to step S453. During the card-initialization procedure performed before the storage device 100 leaves the factory, all physical blocks in the storage module 130 are tested to detect which of them are bad (called original bad blocks here). The physical location information of these bad blocks is recorded in an original bad block table (Original-Bad-Block Table), which is stored in a specific physical block of the storage module 130. At the start of the backup flow, the processing unit 211 may first issue a specific vendor command through the peripheral storage interface 217 to the storage device 100 (specifically, the controller 120) to read the original bad block table from that physical block, and store the returned table in the memory 212 for fast lookup. In this step, the processing unit 211 may search the original bad block table and determine whether the physical location information of the i-th physical block is recorded in it. If so, the i-th physical block is an original bad block and need not be backed up, which saves data reading time. In short, through step S451 the processing unit 211 skips original bad blocks according to the original bad block table, which shortens the backup.
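The control flow of the loop, with original bad blocks skipped as in step S451, can be sketched as follows. This is a simplified illustration: the function name `blocks_to_back_up` is hypothetical, the original bad block table is modeled as a plain collection of block indices, and the comments that tie lines to step numbers follow the description above rather than the flowchart itself.

```python
def blocks_to_back_up(total_blocks, original_bad_blocks):
    """Yield the index of every physical block that is not an original bad block."""
    bad = set(original_bad_blocks)  # table cached in host memory for fast lookup
    i = 0                           # step S440: start with the first block
    while i < total_blocks:         # loop ends once every block has been visited
        if i not in bad:            # step S451: skip original bad blocks
            yield i                 # the caller would run steps S453 and S455 here
        i += 1                      # move on to the next physical block


# Example: 6 blocks, of which blocks 1 and 4 failed factory testing.
selected = list(blocks_to_back_up(6, [1, 4]))
```

Holding the table in a set makes each membership check constant time on average, which matches the stated purpose of caching the table in memory for fast lookup.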

Step S453: Obtain the type information of the i-th physical block. The processing unit 211 may issue a specific vendor command through the peripheral storage interface 217 to the storage device 100 (specifically, the controller 120) to read page 0 of the i-th physical block, and obtain the type information of the i-th physical block from the metadata in page 0. Apart from original bad blocks, each physical block may use one byte of the metadata in page 0 to identify itself as a system block, a link block (also called a map block), a cache block (also called a user-data block), a mother block, a child block, and so on, and another byte of the metadata in page 0 to indicate whether it is a new bad block.

Each page in a system block may store metadata and system data. The system data includes, for example, profile data of the controller 120 and the storage module 130, the original bad block table, an original bad column table, a section start table, in-system programming (ISP) code, or any combination of these. The metadata describes the system data in the same page.

Each page in a link block may store metadata and system data. The system data includes, for example, a high-level group table, a logical-to-physical (L2P) mapping table, a physical-to-logical (P2L) mapping table, a host-to-flash (H2F) mapping table, a flash-to-host (F2H) mapping table, or any combination of these. The metadata describes the system data in the same page.

Each page in a cache block, other than the end-of-block (EOB) page, may store metadata and user data, where the metadata describes the user data in the same page. The controller 120 cannot interpret user data; it can be interpreted by the operating system, drivers, or specific application software executed on the host side 210, and it may contain the user's private or confidential information. The EOB page of a cache block is a special system page that marks where the cache block ends: the pages after it store no useful data. The EOB page stores metadata and system data, where the metadata describes the system data in the same page.

Parent blocks and child blocks are cache blocks produced by the garbage collection (GC) process, and the data they store is similar to that of cache blocks. The GC process gathers fragmented user data from multiple pages and writes it into new pages (that is, pages in a parent block or child block), so that the freed pages can be erased and reused for other user data. A parent block is one whose pages have been completely filled with user data during the GC process; a child block is one whose pages have not been completely filled.

A new bad block is a physical block in which an error check and correction (ECC) error occurred during user operation. Before the ECC error occurred, it may have been a system block, link block, cache block, parent block, or child block. When an ECC error occurs in a block during a read operation, information marking the block as a new bad block is stored in specific bytes of the metadata in its page 0.

Step S455: Read complete or partial data of each page in the i-th block according to the type information, and add content to the backup data list file and the data file according to the received data. The processing unit 211 can issue a vendor-specific command to the storage device 100 (specifically, the controller 120) through the peripheral storage interface 217 to read complete or partial data of all or some pages in the i-th block. In addition, the processing unit 211 can receive, in the reply from the storage device 100, profile information for each page of the i-th block, such as its physical location in the storage module 130 and its length. The policy for reading page data is as follows:

For a system block, read the metadata and system data of all its pages.

For a link block, read the metadata and system data of all its pages.

For a cache block, read the metadata of all its pages except the EOB page. If the cache block contains an EOB page, also read the metadata and system data of the EOB page.

For a parent block or child block, read the metadata of its page 1.

For a new bad block, read complete or partial data of all or some of its pages according to the read policy above and the block's type information. For example, if the new bad block was a system block or link block before the error occurred, read the metadata and system data of all its pages. The read policy for other types of new bad blocks follows by analogy and is not repeated here for brevity.
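The read policy above can be sketched as a small lookup function. This is only an illustrative sketch, not the patented implementation: the names `BlockType` and `pages_to_read`, the string labels for the page parts, and the `eob_page` parameter are all assumptions introduced here, and "page 1" of a parent or child block is taken literally as page index 1.

```python
from enum import Enum


class BlockType(Enum):
    """Hypothetical labels for the block types named in the description."""
    SYSTEM = "system"
    LINK = "link"
    CACHE = "cache"
    PARENT = "parent"
    CHILD = "child"


def pages_to_read(block_type, page_count, eob_page=None):
    """Return {page_number: parts_to_read} for one physical block.

    The parts are "metadata" and "system_data". User data is never requested.
    For a new bad block, pass the type the block had before the ECC error.
    """
    if block_type in (BlockType.SYSTEM, BlockType.LINK):
        # System and link blocks: metadata and system data of every page.
        return {p: {"metadata", "system_data"} for p in range(page_count)}
    if block_type is BlockType.CACHE:
        # Cache blocks: metadata only, except for the EOB page.
        plan = {p: {"metadata"} for p in range(page_count) if p != eob_page}
        if eob_page is not None:
            # The EOB page is a system page, so its system data is read too.
            plan[eob_page] = {"metadata", "system_data"}
        return plan
    if block_type in (BlockType.PARENT, BlockType.CHILD):
        # Parent and child blocks: metadata of page 1 only (assumed index 1).
        return {1: {"metadata"}}
    raise ValueError(f"unknown block type: {block_type!r}")
```

Under these assumptions, a new bad block that used to be a link block would be handled by calling `pages_to_read(BlockType.LINK, page_count)`, which mirrors the "by analogy" rule in the text.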

For each page of this physical block, if the page contains system data, the processing unit 211 can append that system data after the current end position of the data file in the local storage device 219. The processing unit 211 can also add a line to the backup data list file in the local storage device 219 to store the page's metadata. The line contains multiple fields that respectively store the page's physical location information (such as logical unit number, physical block number, and page number), the page type (such as user data page, ISP page, or mapping table page), the block's serial number, the page content index, the ECC error flag, and region pointer information (such as offset and length) that points to where the page's system data is stored in the data file. Note that the meaning of the page content index depends on the page type. For example, for a data page the page content index is the logical block number (LBN), whereas for a mapping table page it is the mapping table number. The ECC error flag indicates whether the page's system data or user data contains error bits that could not be corrected. If the page is a user data page, the region pointer field stores a null value (NULL).
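The bookkeeping in this paragraph can be illustrated with a short sketch. It assumes a binary data file and a text list file, which matches the file types named in the claims, but the comma-separated field layout, the `offset:length` encoding of the region pointer, and the dictionary keys used for a page are hypothetical choices made only for this example.

```python
import io


def back_up_page(data_file, list_file, page):
    """Append one page's system data and add one line to the list file.

    data_file: binary file object opened for writing (the data file).
    list_file: text file object opened for writing (the backup data list file).
    page: dict with the hypothetical keys lun, block, page, page_type,
          block_serial, content_index, ecc_error and system_data.
    """
    region = "NULL"  # user data pages carry no region pointer
    system_data = page.get("system_data")
    if system_data:
        # Append after the current end of the data file and remember where.
        offset = data_file.seek(0, io.SEEK_END)
        data_file.write(system_data)
        region = f"{offset}:{len(system_data)}"
    fields = [
        page["lun"], page["block"], page["page"],  # physical location
        page["page_type"],
        page["block_serial"],
        page["content_index"],
        int(page["ecc_error"]),                    # 1 = uncorrectable error
        region,
    ]
    list_file.write(",".join(str(f) for f in fields) + "\n")
```

Because the list file only points into the data file, an analysis tool could read the list file first and then fetch just the system data regions it needs.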

Step S457: Determine whether all necessary physical blocks in the storage module 130 have been processed. If so, the processing unit 211 executes step S460; otherwise, it executes step S459. Those skilled in the art will understand that the necessary physical blocks in the storage module 130 are the physical blocks already used to store data, excluding empty blocks that have not yet been used.

Step S459: Increment the variable i by 1.

Step S460: Perform failure analysis based on the contents of the generated files. The processing unit 211 can display, on the display 230 through the display interface 213, a user interface (UI) that is easy for engineers to read and that presents the contents of the device description file, the device basic parameter file, the backup data list file, and the data file described above, so that engineers can perform a preliminary analysis based on their professional experience. Alternatively, the processing unit 211 executes the program code of a failure analysis application, uses various algorithms to parse the contents of the device description file, the device basic parameter file, the backup data list file, and the data file in the local storage device 219, completes the preliminary analysis, and displays the analysis result on the display 230 through the display interface 213.
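The description does not specify which algorithms the failure analysis application uses. As one hedged illustration of what a preliminary analysis over the backup data list file might look like, the sketch below counts pages flagged with uncorrectable ECC errors per physical block. The eight-field comma-separated line layout (logical unit number, block, page, page type, block serial number, content index, ECC flag, region pointer) is an assumption made for this example only.

```python
from collections import Counter


def ecc_errors_per_block(list_lines):
    """Count pages whose ECC error flag is set, keyed by (lun, block).

    list_lines: iterable of text lines in the assumed layout
    lun,block,page,page_type,block_serial,content_index,ecc_flag,region
    """
    counts = Counter()
    for line in list_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        lun, block, _page, _type, _serial, _index, ecc_flag, _region = line.split(",")
        if ecc_flag == "1":
            counts[(int(lun), int(block))] += 1
    return counts
```

A block with many flagged pages would be a natural starting point for an engineer reviewing the backup, although the real application may rank or filter blocks quite differently.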

In another embodiment, the controller 120 of the storage IC 110 and the analysis host 210 divide the work differently: the controller 120 of the storage IC 110 decides which key data needs to be backed up, while the analysis host 210 consolidates the data sent up by the storage device 100 and generates files that can be parsed by engineers, by failure analysis program code executed by a computer, or by both. The data backup method, implemented by the program code of a data backup module loaded and executed by the processing unit 310 as shown in FIG. 5, is described in detail as follows:

Step S510: Receive a message to start data backup from the analysis host 210 through the host interface 320. The message can be carried in reserved bytes of a standard-defined read command, or in a vendor command.

Step S520: Set (that is, initialize) the variable i to 0. Other technical details are similar to step S440 and are not repeated here for brevity.

Next, the processing unit 310 repeatedly executes a loop (steps S530 to S570) to obtain, from each physical block of the storage module 130, the key data that can be used for failure analysis, until all physical blocks have been processed (the "yes" path of step S560).

Step S530: Determine whether the i-th physical block is an original bad block. If so, the processing unit 310 executes step S560; otherwise, it executes step S540. Some technical details are similar to step S451 and are not repeated here for brevity. Unlike step S451, the processing unit 310 can read the original bad block table from a specific physical block through the flash interface 340 and store the returned original bad block table in the RAM 360 for fast lookup.

Step S540: Obtain the type information of the i-th physical block. The processing unit 310 can issue a command to the storage module 130 through the flash interface 340 to read page 0 of the i-th physical block, and obtain the type information of the i-th physical block from the metadata in page 0. For the type information of a physical block, refer to the description of step S453, which is not repeated here for brevity.

Step S550: According to the type information, issue commands to the storage module 130 through the flash interface 340 to read complete or partial data of all or some pages in the i-th block, and reply with the data to the analysis host 210 through the host interface 320. In addition to the data read, the processing unit 310 can also reply with profile information for each page of the i-th block, such as its physical location in the storage module 130 and its length. For the policy for reading page data, refer to the relevant part of the description of step S455, which is not repeated here for brevity.

Step S560: Determine whether all necessary physical blocks in the storage module 130 have been processed. If so, the processing unit 310 executes step S580; otherwise, it executes step S570. Those skilled in the art will understand that the necessary physical blocks in the storage module 130 are the physical blocks already used to store data, excluding empty blocks that have not yet been used.

Step S570: Increment the variable i by 1.

Step S580: Reply to the analysis host 210 with a message indicating that data backup is complete.
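Steps S520 to S580 on the controller side can be sketched as a generator that yields one message per backed-up block followed by a completion message. This is a simplified model under stated assumptions: real controller firmware would not be written in Python, and `backup_blocks`, its callback parameters, and the tuple message format are hypothetical names chosen for illustration.

```python
def backup_blocks(used_blocks, original_bad_blocks, read_block_type, read_block_data):
    """Model of the controller-side loop of FIG. 5.

    used_blocks: indices of physical blocks that already store data.
    original_bad_blocks: indices listed in the original bad block table.
    read_block_type(i): returns the type of block i (step S540).
    read_block_data(i, block_type): returns the data to back up (step S550).
    Yields ("data", i, payload) per block, then ("done",).
    """
    # Step S530 support: keep the bad block table in memory for fast lookup.
    bad = set(original_bad_blocks)
    for i in used_blocks:                     # steps S520, S560 and S570
        if i in bad:                          # step S530: skip original bad blocks
            continue
        block_type = read_block_type(i)       # step S540
        yield ("data", i, read_block_data(i, block_type))  # step S550
    yield ("done",)                           # step S580
```

Modeling the replies as a stream makes the division of labor visible: the controller decides what to read, and whoever consumes the stream decides how to store it.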

Note that, because the storage module 130 can be designed with a multi-channel access architecture, multiple iterations of the loop in FIG. 5 can be executed in a time-sharing multitasking environment or on a multi-core processor, and the execution of some steps in two or more iterations can overlap in time to optimize read efficiency. For example, iteration i+1 can execute step S540 while iteration i executes step S550. The invention should not be limited in this respect.
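The overlap described here, where iteration i+1 performs step S540 while iteration i performs step S550, can be approximated in a host-language sketch with a single worker thread that prefetches the next block's type. This only illustrates the scheduling idea; `pipelined_backup` and its callbacks are hypothetical, and a real controller would rely on its flash channels and firmware scheduler rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def pipelined_backup(blocks, read_block_type, read_block_data):
    """Read block data while the next block's type is fetched in parallel.

    Returns a list of (block, payload) in the original block order.
    """
    blocks = list(blocks)
    results = []
    if not blocks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start step S540 for the first block.
        next_type = pool.submit(read_block_type, blocks[0])
        for n, block in enumerate(blocks):
            block_type = next_type.result()
            if n + 1 < len(blocks):
                # Step S540 for block n+1 overlaps step S550 for block n.
                next_type = pool.submit(read_block_type, blocks[n + 1])
            results.append((block, read_block_data(block, block_type)))
    return results
```

The output is the same as a purely sequential loop would produce; only the timing of the type lookups changes.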

In addition, the failure analysis method, implemented by the program code of a failure analysis module loaded and executed by the processing unit 211 as shown in FIG. 6, is described in detail as follows:

Step S610: The technical details are similar to step S410 and are not repeated here for brevity.

Step S620: The technical details are similar to step S420 and are not repeated here for brevity.

Step S630: The technical details are similar to step S430 and are not repeated here for brevity.

Step S640: Send a message to start data backup to the storage device 100. The processing unit 211 can send a standard-defined read command or a vendor command through the peripheral storage interface 217 to notify the controller 120 to start data backup, so that the processing unit 310 starts executing the data backup method of FIG. 5. This step corresponds to step S510 of FIG. 5.

Next, the processing unit 211 repeatedly executes a loop (steps S650 to S670) to obtain from the storage device 100 the key data in each physical block of the storage module 130 that can be used for failure analysis, and to generate the contents of the backup data list file and the data file, until all physical blocks have been received and processed (the "yes" path of step S670).

Step S650: Receive data about one physical block from the storage device 100. This step corresponds to step S550 of FIG. 5. The received physical block can be a system block, link block, cache block, parent block, or child block as described above. The received physical block may also be a new bad block.

Step S660: Add content to the backup data list file and the data file according to the received data. The technical details are similar to the relevant part of step S455 and are not repeated here for brevity.

Step S670: Determine whether a message indicating that data backup is complete has been received from the storage device 100 through the peripheral storage interface 217. If so, the processing unit 211 executes step S680; otherwise, it executes step S650. This step corresponds to step S580 of FIG. 5.

Step S680: Perform failure analysis based on the contents of the generated files. The technical details are similar to step S460 and are not repeated here for brevity.
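The host-side loop of FIG. 6 (steps S640 to S670) can be sketched as follows. The function name `run_host_backup`, its three callbacks, and the tuple message format are assumptions for illustration. As a simplification, the sketch folds the receive in step S650 and the completion check in step S670 into a single `receive()` call.

```python
def run_host_backup(send_start, receive, handle_block):
    """Model of the analysis-host loop of FIG. 6.

    send_start(): tells the storage device to begin the backup (step S640).
    receive(): returns ("data", block_index, payload) or ("done",).
    handle_block(block_index, payload): updates the backup data list file
        and the data file (step S660).
    Returns the number of blocks received.
    """
    send_start()                              # step S640
    received = 0
    while True:
        message = receive()                   # steps S650 and S670 combined
        if message[0] == "done":
            break                             # "yes" path of step S670
        _, block_index, payload = message
        handle_block(block_index, payload)    # step S660
        received += 1
    return received                           # step S680 would follow
```

The host never asks for a particular block here. It simply stores whatever the controller chose to send, which reflects the division of labor in this embodiment.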

Note that although steps S510 to S580 of FIG. 5 are performed by the processing unit 310 in the controller 120, the method is triggered by the processing unit 211 of the analysis host 210 when it executes step S640. The processing unit 211 of the analysis host 210 can therefore be regarded as indirectly performing the technical means of steps S510 to S580.

To summarize the methods of the embodiments of FIG. 4 to FIG. 6 from one aspect, the data that the processing unit 211 of the analysis host 210 obtains from the storage device 100 and backs up does not include user data. From another aspect, the data that the processing unit 211 of the analysis host 210 obtains from the storage device 100 and backs up includes the metadata and system data of each page in each block other than original bad blocks.

All or some of the steps of the method described in the invention can be implemented as computer instructions, for example a computer's operating system, a driver for specific hardware in the computer, or a software program. They can also be implemented in other types of programs. Those skilled in the art can write the methods of the embodiments as computer instructions, which are not described further for brevity. Computer instructions implementing the methods of the embodiments can be stored on a suitable computer-readable medium, such as a DVD, CD-ROM, USB drive, or hard disk, or placed on a network server accessible through a network (for example, the Internet or another suitable carrier).

Although FIG. 1 to FIG. 3 include the elements described above, the use of additional elements to achieve better technical effects is not excluded, provided the spirit of the invention is not violated. In addition, although the flowcharts of FIG. 4 to FIG. 6 are executed in the specified order, those skilled in the art can change the order of the steps while achieving the same effect and without violating the spirit of the invention. The invention is therefore not limited to the order described above. Those skilled in the art can also combine several steps into one, or execute additional steps sequentially or in parallel, and the invention is not limited in this respect either.

Although the invention has been described using the above embodiments, these descriptions are not intended to limit the invention. On the contrary, the invention covers modifications and similar arrangements that are obvious to those skilled in the art. The scope of the claims must therefore be interpreted in the broadest manner to encompass all obvious modifications and similar arrangements.

S410~S460: method steps

Claims (11)

1. A method for failure analysis of flash memory data, performed by a processing unit of an analysis host when loading and executing software program code, wherein the processing unit is coupled to a peripheral storage interface for connecting to a storage device in which a failure has occurred, the method comprising: obtaining, by the processing unit of the analysis host through the peripheral storage interface, data in the storage device that is used for failure analysis, and backing up the data in a local storage device, without completely backing up all the data in the storage device, so as to assist preliminary analysis, wherein the storage device comprises a storage module, the storage module comprises a plurality of physical blocks, and each of the physical blocks comprises a plurality of pages for storing metadata, system data, and user data, wherein the user data contains private or confidential information of a user, and wherein the obtained and backed-up data includes the metadata and the system data of each page in each physical block other than original bad blocks, but does not include the user data of each page in each physical block other than original bad blocks.
2. The method for failure analysis of flash memory data of claim 1, comprising: reading an original bad block table from the storage module in the storage device; and skipping the original bad blocks in the storage module, without processing them, according to the content of the original bad block table.
3. The method for failure analysis of flash memory data of claim 1, comprising: obtaining, from the storage module in the storage device, type information of each physical block other than original bad blocks; and reading, from the storage device, complete or partial data of each page in a physical block according to the type information of that physical block, and storing the read data in the local storage device.
4. A computer program product for failure analysis of flash memory data, comprising program code loaded and executed by a processing unit of an analysis host, wherein the processing unit is coupled to a peripheral storage interface for connecting to a storage device in which a failure has occurred, the program code comprising: obtaining, by the processing unit of the analysis host through the peripheral storage interface, data in the storage device that is used for failure analysis, and backing up the data in a local storage device, without completely backing up all the data in the storage device, so as to assist preliminary analysis, wherein the storage device comprises a storage module, the storage module comprises a plurality of physical blocks, and each of the physical blocks comprises a plurality of pages for storing metadata, system data, and user data, wherein the user data contains private or confidential information of a user, and wherein the obtained and backed-up data includes the metadata and the system data of each page in each physical block other than original bad blocks, but does not include the user data of each page in each physical block other than original bad blocks.
5. The computer program product of claim 4, comprising program code loaded and executed by the processing unit of the analysis host for: reading an original bad block table from the storage module in the storage device; and skipping the original bad blocks in the storage module, without processing them, according to the content of the original bad block table.
6. The computer program product of claim 4, comprising program code loaded and executed by the processing unit of the analysis host for: obtaining, from the storage module in the storage device, type information of each physical block other than original bad blocks; and reading, from the storage device, complete or partial data of each page in a physical block according to the type information of that physical block, and storing the read data in the local storage device.
7. The computer program product of claim 6, comprising program code loaded and executed by the processing unit of the analysis host for: storing the system data of each page of each physical block read from the storage module in a data file in the local storage device; storing the metadata of each page of each physical block read from the storage module in a backup data list file in the local storage device, wherein the metadata is data that describes the system data or the user data in the same page; and, when any page of any physical block read from the storage module contains system data, storing location information indicating where that system data is recorded in the data file in a specific field of the corresponding line of the backup data list file.
8. The computer program product of claim 7, wherein the data file is a binary file and the backup data list file is a text file.
9. The computer program product of claim 6, wherein the policy for reading and backing up the complete or partial data comprises: for a system block, reading and backing up the metadata and system data of all pages in the system block; for a link block, reading and backing up the metadata and system data of all pages in the link block; for a cache block, reading and backing up the metadata of all pages in the cache block except the end-of-block page and, if the cache block contains an end-of-block page, also reading and backing up the metadata and system data of the end-of-block page; and, for a new bad block, reading complete or partial data of all or some pages in the new bad block according to the type information of the new bad block and the above read and backup policy.
10. The computer program product of claim 6, wherein the policy for reading and backing up the complete or partial data comprises: for a parent block or child block, reading and backing up the metadata of page 1 of the parent block or child block.
11. The computer program product of claim 4, comprising program code loaded and executed by the processing unit of the analysis host for: obtaining device information of the storage device from the storage device; and generating a device description file and a device basic parameter file according to the device information of the storage device, and storing the device description file and the device basic parameter file in the local storage device, wherein the device basic parameter file records the vendor identifier of the storage integrated circuit in the storage device.
TW109106865A 2020-03-03 2020-03-03 Method and computer program product for analyzing failures on flash data TWI798536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109106865A TWI798536B (en) 2020-03-03 2020-03-03 Method and computer program product for analyzing failures on flash data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109106865A TWI798536B (en) 2020-03-03 2020-03-03 Method and computer program product for analyzing failures on flash data

Publications (2)

Publication Number Publication Date
TW202134876A TW202134876A (en) 2021-09-16
TWI798536B true TWI798536B (en) 2023-04-11

Family

ID=78777326

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109106865A TWI798536B (en) 2020-03-03 2020-03-03 Method and computer program product for analyzing failures on flash data

Country Status (1)

Country Link
TW (1) TWI798536B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200912758A (en) * 2007-09-05 2009-03-16 Chipsbank Technologies Shenzheng Co Ltd Method of manufacturing memory card and apparatus thereof
CN102473126A (en) * 2009-08-11 2012-05-23 桑迪士克科技股份有限公司 Controller and method for providing read status and spare block management information in flash memory system
CN105446655A (en) * 2015-04-23 2016-03-30 北京天诚盛业科技有限公司 Method and device for operating Nand Flash
CN109783017A (en) * 2015-01-27 2019-05-21 华为技术有限公司 It is a kind of to store the processing method of equipment bad block, device and storage equipment

Also Published As

Publication number Publication date
TW202134876A (en) 2021-09-16
