CN117546136A - Reliability of four channel memory module - Google Patents


Info

Publication number
CN117546136A
Authority
CN
China
Prior art keywords
memory
channel
codeword
controller
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280044680.XA
Other languages
Chinese (zh)
Inventor
S·C·伍
D·李
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rambus Inc
Original Assignee
Rambus Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rambus Inc
Priority claimed from PCT/US2022/034338 (WO2022271695A1)
Publication of CN117546136A

Landscapes

  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A four-channel memory module includes dual channel memory devices and four independent memory channels, each having twenty (20) data bits. The two channels of each dual channel memory device are accessed independently. Thus, each of the four channels used to access the memory module accesses one channel of the dual channel memory devices in either a first set or a second set of dual channel memory devices on the module. Error detection and correction codeword configurations and schemes may implement chipkill or single symbol data correction/double symbol data detection (SSDC/DSDD). Single symbol data correction using fewer memory devices may also be implemented. The error detection and correction codeword configurations and schemes may be switched in response to detecting a device failure, a signal line failure, or a memory channel failure.

Description

Reliability of four channel memory module
Drawings
FIG. 1 illustrates a buffered memory module.
FIGS. 2 to 6 illustrate codeword configurations.
FIG. 7 is a flowchart illustrating a method of operating a memory module.
FIG. 8 is a flow chart illustrating a method of operating a memory module with multiple error correction and detection schemes.
FIG. 9 is a flow chart illustrating a method of reconfiguring operation of a memory module after a device fails.
FIG. 10 is a flow chart illustrating a method of reconfiguring operation of a memory module after a data signal failure.
FIG. 11 is a flowchart illustrating a method of reconfiguring operation of a memory module after a memory channel fails.
Fig. 12 is a block diagram of a processing system.
Detailed Description
A four-channel memory module includes dual channel memory devices and four independent memory channels, each having twenty (20) data bits. The two channels of each dual channel memory device are accessed independently. Thus, each of the four channels used to access the memory module accesses one channel of the dual channel memory devices in either a first set or a second set of dual channel memory devices on the module. Error detection and correction codeword configurations and schemes may implement chipkill or single symbol data correction/double symbol data detection (SSDC/DSDD). Single symbol data correction using fewer memory devices may also be implemented. The error detection and correction codeword configurations and schemes may be switched in response to detecting a device failure, a signal line failure, or a memory channel failure.
FIG. 1 illustrates a buffered memory module. In FIG. 1, memory system 100 includes a module 150 and a controller 120. Controller 120 includes memory channel interfaces 121a-121d, a common signal interface 121e, error detection and correction (EDC) circuitry 125, and persistent error detection circuitry 126. Memory channel interfaces 121a-121d are operatively coupled to channel A-D interfaces 145a-145d, respectively, of module 150. Common signal interface 121e is operatively coupled to a registering clock driver (RCD) 135 of module 150.
In FIG. 1, module 150 includes left side dual channel DRAM devices 110a-110f (representing ten DRAM devices L0-L9), right side dual channel DRAM devices 110g-110l (representing ten DRAM devices R0-R9), left side dual channel buffer devices 130a-130c (representing five buffer devices BL0-BL4), right side dual channel buffer devices 130d-130f (representing five buffer devices BR0-BR4), registering clock driver (RCD) 135, channel A interface 145a, channel B interface 145b, channel C interface 145c, and channel D interface 145d. RCD 135 receives certain signals (e.g., clock, chip select) that are common to channel A-D interfaces 145a-145d. The dual channel DRAM devices 110a-110l may also be referred to as dual x2 DRAM devices.
Each dual channel DRAM device 110a-110l includes two non-overlapping sets of memory arrays that are accessed via two channel interfaces 111aa-111lb, respectively, that operate independently of each other. In other words, each DRAM device 110a-110l operates the command, address, and data transfer functions of each of its channel interfaces 111aa-111lb independently of the other channel interface 111aa-111lb on the same DRAM device. Thus, for example, channel A interface 111aa of DRAM L0 110a accesses a first set of memory arrays in DRAM L0 110a, and channel B interface 111ab of DRAM L0 110a accesses a second set of memory arrays in DRAM L0 110a, where the first set and the second set have no memory arrays in common (i.e., they are non-overlapping sets).
At least the command/address (CA) signals of channel A interface 145a are operatively coupled to RCD 135. RCD 135 operatively couples the CA signals of channel A interface 145a to channel A interfaces 111aa-111fa of left side DRAM devices 110a-110f. Likewise, at least the CA signals of channel B interface 145b are operatively coupled to RCD 135. RCD 135 operatively couples the CA signals of channel B interface 145b to channel B interfaces 111ab-111fb of left side DRAM devices 110a-110f.
At least the CA signals of channel C interface 145c are operatively coupled to RCD 135. RCD 135 operatively couples the CA signals of channel C interface 145c to channel C interfaces 111ga-111la of right side DRAM devices 110g-110l. Likewise, at least the CA signals of channel D interface 145d are operatively coupled to RCD 135. RCD 135 operatively couples the CA signals of channel D interface 145d to channel D interfaces 111gb-111lb of right side DRAM devices 110g-110l.
Channel A interface 111aa of DRAM device 110a is operatively coupled to communicate N data bits through device-side channel A interface 132aa of data buffer device 130a. In one embodiment, N=2. Channel B interface 111ab of DRAM device 110a is operatively coupled to communicate N data bits through device-side channel B interface 132ab of data buffer device 130a. Channel A interface 111ba of DRAM device 110b is operatively coupled to communicate N data bits through device-side channel A interface 132aa of data buffer device 130a; channel B interface 111bb of DRAM device 110b is operatively coupled to communicate N data bits through device-side channel B interface 132ab of data buffer device 130a; channel A interface 111ca of DRAM device 110c is operatively coupled to communicate N data bits through device-side channel A interface 132ba of data buffer device 130b; channel B interface 111cb of DRAM device 110c is operatively coupled to communicate N data bits through device-side channel B interface 132bb of data buffer device 130b; and so on, with all DRAM devices 110a-110l and data buffer devices 130a-130f on module 150 following the same connection pattern (not repeated here for brevity).
Controller-side channel A interface 131aa is operatively coupled to channel A interface 145a. Controller-side channel A interface 131aa communicates 2*N bits through channel A interface 145a. These 2*N bits comprise the N bits communicated by DRAM device 110a and the N bits communicated by DRAM device 110b. Likewise, controller-side channel B interface 131ab is operatively coupled to channel B interface 145b. Likewise, controller-side channel A interfaces 131ba-131ca of data buffer devices 130b-130c are operatively coupled to channel A interface 145a; controller-side channel B interfaces 131bb-131cb of data buffer devices 130b-130c are operatively coupled to channel B interface 145b; controller-side channel C interfaces 131da-131fa of data buffer devices 130d-130f are operatively coupled to channel C interface 145c; and controller-side channel D interfaces 131db-131fb of data buffer devices 130d-130f are operatively coupled to channel D interface 145d. Thus, each of memory channels A-D 145a-145d communicates with five (5) data buffer devices (left side data buffers 130a-130c or right side data buffers 130d-130f), and each data buffer device communicates using 2*N data signals, so that when N=2, each of memory channels A-D 145a-145d has twenty (20) data (DQ) signals.
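The channel width stated above follows from simple arithmetic, which the following minimal sketch reproduces. The function and parameter names are illustrative assumptions and do not appear in the description.

```python
def dq_signals_per_channel(buffers_per_channel: int, n_bits_per_dram: int) -> int:
    """Return the number of DQ signals in one memory channel.

    Each data buffer device serves two DRAM devices on a given channel,
    and each DRAM device communicates N bits, so one buffer carries 2*N signals.
    """
    signals_per_buffer = 2 * n_bits_per_dram
    return buffers_per_channel * signals_per_buffer


# Five data buffer devices per channel and N = 2, as in the embodiment above.
channel_width = dq_signals_per_channel(buffers_per_channel=5, n_bits_per_dram=2)
print(channel_width)  # 20
```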
FIG. 2 illustrates a first codeword configuration for reliability. In FIG. 2, burst 202 from a memory module (e.g., memory module 150) includes thirty-two (32) time slots labeled t0 to t31. A channel (e.g., channel A) of each DRAM device (e.g., DRAM devices L0-L9 110a-110f) communicates two (2) bits (i.e., N=2) per time slot of burst 202 via the data buffer devices (e.g., data buffer devices BL0-BL4 130a-130c). Each codeword 204 of burst 202 is made up of eight (8) data symbols S0-S7 and two (2) check symbols C0-C1. Each symbol S0-S7, C0-C1 of codeword 204 consists of four (4) bits communicated over two time slots of burst 202 by a single DRAM device L0-L9. See, for example, symbol S6 206, which is detailed in FIG. 2. Symbol S6 206 is composed of DQ[0] and DQ[1] communicated by DRAM L6 at time slot t0 and DQ[0] and DQ[1] communicated by DRAM L6 at time slot t1, thereby forming a four-bit symbol communicated over two time slots (t0 and t1). In one embodiment, the two time slots are consecutive, as shown in FIG. 2. In other embodiments, the two time slots may be non-consecutive.
It should be appreciated that each codeword 204 consists of forty (40) bits organized into a total of ten (10) 4-bit symbols. The ten symbols consist of eight data symbols and two check symbols. Thus, codeword 204 may be generated, checked, and corrected using a Reed-Solomon (RS) error detection and correction scheme, specifically RS (10, 8) (e.g., by EDC circuitry 125 of controller 120). Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine whether an error in codeword 204 is persistent. Furthermore, because each symbol S0-S7, C0-C1 is communicated to/from a single DRAM L0-L9, the RS (10, 8) scheme provides chipkill capability, in which the failure of an entire DRAM device L0-L9 is a correctable error.
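To illustrate why two check symbols allow one failed device to be corrected, the following is a minimal sketch of a shortened Reed-Solomon style code with ten 4-bit symbols. The description does not specify the field polynomial, the code roots, or the symbol ordering, so the primitive polynomial x^4 + x + 1, the roots alpha^0 and alpha^1, the placement of the check symbols in positions 8 and 9, and all function names are assumptions made for illustration only.

```python
# GF(16) arithmetic, assuming the primitive polynomial x^4 + x + 1.
PRIM_POLY = 0b10011
EXP = [0] * 30  # EXP[i] = alpha**i (table doubled so gf_mul needs no modulo)
LOG = [0] * 16  # LOG[alpha**i] = i

_x = 1
for _i in range(15):
    EXP[_i] = _x
    EXP[_i + 15] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x10:
        _x ^= PRIM_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]


def syndromes(symbols):
    """Return (S0, S1), where S0 = sum(c_i) and S1 = sum(c_i * alpha**i)."""
    s0 = s1 = 0
    for i, c in enumerate(symbols):
        s0 ^= c
        s1 ^= gf_mul(c, EXP[i])
    return s0, s1


def encode(data):
    """Append two check symbols to eight 4-bit data symbols so both syndromes are zero."""
    a, b = syndromes(data)  # partial syndromes over positions 0..7
    c9 = gf_div(b ^ gf_mul(EXP[8], a), EXP[8] ^ EXP[9])
    c8 = a ^ c9
    return list(data) + [c8, c9]


def decode(codeword):
    """Correct at most one symbol error; return (corrected codeword, error position or None)."""
    s0, s1 = syndromes(codeword)
    if s0 == 0 and s1 == 0:
        return list(codeword), None
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable: more than one symbol in error")
    position = (LOG[s1] - LOG[s0]) % 15  # alpha**position = S1 / S0
    if position >= len(codeword):
        raise ValueError("uncorrectable: error locator outside the codeword")
    fixed = list(codeword)
    fixed[position] ^= s0  # S0 equals the error value
    return fixed, position


# Whole-device failure: every bit of the symbol carried by DRAM L3 is flipped.
codeword = encode([1, 2, 3, 4, 5, 6, 7, 8])
damaged = list(codeword)
damaged[3] ^= 0xF
repaired, failed_device = decode(damaged)
print(repaired == codeword, failed_device)  # True 3
```

Because each symbol position maps to exactly one DRAM device in this configuration, the error position returned by the decoder also identifies the device that produced the bad symbol.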
When chipkill capability is used on both channels (e.g., 111aa-111fa and 111ab-111fb) of a set of dual channel DRAMs (e.g., DRAM devices L0-L9 110a-110f), a failure that appears on both channels (e.g., 111da and 111db) of the same DRAM (e.g., DRAM device L3 110d) indicates that the DRAM has failed. Thus, a symbol error on one of the two channels (e.g., channel A 145a) may indicate a need to "kill" the failing/failed DRAM on the other channel (e.g., channel B 145b). In one embodiment, a symbol error on one of the two channels (e.g., channel A 145a) is used to initiate an error checking process (e.g., an erase operation) on the other channel (e.g., channel B 145b) before an error condition (e.g., a chip failure) is detected on the other channel. In one embodiment, a symbol error on only one of the two channels (e.g., channel A 145a) may indicate that the failing/failed channel (e.g., channel A 145a) needs to be "killed" without altering the operation of the other channel (e.g., channel B 145b). Thus, the channel that has not failed (e.g., channel B 145b) may be operated using a different error correction and detection scheme than the one used by the failing/failed channel (e.g., channel A 145a).
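The cross-channel reasoning above can be sketched as a small decision function. This is only one possible policy under stated assumptions: the names `Action` and `correlate` are hypothetical, and the sketch covers the embodiments in which an error on both channels marks the device as failed and an error on one channel triggers a check of the other.

```python
from enum import Enum


class Action(Enum):
    CHECK_OTHER_CHANNEL = "initiate an error check on the other channel"
    KILL_DEVICE = "treat the DRAM device as failed"


def correlate(errors_a, errors_b):
    """Map each DRAM index that showed a symbol error to a suggested action.

    errors_a and errors_b are sets of DRAM indices (0-9) whose symbol was
    found in error on channel A and on channel B, respectively.
    """
    actions = {}
    for device in sorted(errors_a | errors_b):
        if device in errors_a and device in errors_b:
            # Same device failing on both of its channels: the device has failed.
            actions[device] = Action.KILL_DEVICE
        else:
            # Error seen on one channel only: check the other channel early.
            actions[device] = Action.CHECK_OTHER_CHANNEL
    return actions


print(correlate({3}, {3}))
print(correlate({3}, set()))
```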
FIG. 3 illustrates a second codeword configuration for reliability. In FIG. 3, burst 302 from a memory module (e.g., memory module 150) includes thirty-two (32) time slots labeled t0 to t31. A channel (e.g., channel A) of each DRAM device (e.g., DRAM devices L0-L9 110a-110f) communicates two (2) bits (i.e., N=2) per time slot of burst 302 via the data buffer devices (e.g., data buffer devices BL0-BL4 130a-130c). Each codeword 304 of burst 302 is made up of sixteen (16) data symbols S0-S15, three (3) check symbols C0-C2, and an additional symbol, which may be a check symbol C3 or may be used to carry additional data (ADL). For simplicity, this additional symbol is referred to hereinafter as check symbol C3. Each symbol S0-S15, C0-C3 of codeword 304 consists of eight (8) bits communicated over eight (8) time slots of burst 302 by a single DRAM device L0-L9. See, for example, symbol S9 306, which is detailed in FIG. 3. Symbol S9 306 is composed of DQ[1] communicated by DRAM L4 at time slot t0, DQ[1] communicated by DRAM L4 at time slot t1, DQ[1] communicated by DRAM L4 at time slot t2, and so on through DQ[1] communicated by DRAM L4 at time slot t7, thereby forming an eight-bit symbol communicated over eight time slots (t0 to t7). In one embodiment, the eight time slots are consecutive, as shown in FIG. 3. In other embodiments, the eight time slots may be non-consecutive.
It should be appreciated that each codeword 304 consists of 160 bits organized into a total of twenty (20) 8-bit symbols. The twenty symbols consist of sixteen data symbols, three check symbols, and the additional symbol, which serves either as a fourth check symbol or as a carrier of additional data. Thus, codeword 304 may be generated, checked, and corrected using an RS (20, 16) or an RS (20, 17) error detection and correction scheme (e.g., by EDC circuitry 125 of controller 120). Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine whether an error in codeword 304 is persistent. The RS (20, 16) and RS (20, 17) schemes provide single symbol data correction and double symbol data detection (SSDC/DSDD) capabilities.
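The symbol layout of this second configuration can be sketched as follows. The only mapping the description gives explicitly is that symbol S9 travels on DQ[1] of DRAM L4 over t0 to t7; the general rule used here (symbol index = 2 * device + DQ), the bit order within a symbol, and the `burst[slot][device][dq]` data layout are assumptions chosen to be consistent with that one example.

```python
SLOTS_PER_SYMBOL = 8
DQ_PER_DEVICE = 2
DEVICES = 10


def symbol_location(symbol_index):
    """Return (device, dq) for a symbol, assuming symbol_index = 2 * device + dq."""
    return divmod(symbol_index, DQ_PER_DEVICE)


def extract_symbol(burst, symbol_index, first_slot=0):
    """Pack eight time slots of one DQ signal into an 8-bit symbol.

    burst[slot][device][dq] holds a single bit; the earliest slot becomes the
    most significant bit (an assumed ordering).
    """
    device, dq = symbol_location(symbol_index)
    value = 0
    for slot in range(first_slot, first_slot + SLOTS_PER_SYMBOL):
        value = (value << 1) | burst[slot][device][dq]
    return value


# A 32-slot burst of zeros, with DQ[1] of DRAM L4 driven high during t0..t7.
burst = [[[0, 0] for _ in range(DEVICES)] for _ in range(32)]
for slot in range(SLOTS_PER_SYMBOL):
    burst[slot][4][1] = 1

print(symbol_location(9))              # (4, 1)
print(hex(extract_symbol(burst, 9)))   # 0xff
print(DEVICES * DQ_PER_DEVICE * SLOTS_PER_SYMBOL)  # 160 bits per codeword
```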
FIG. 4 illustrates a third codeword configuration for reliability. In FIG. 4, burst 402 from a memory module includes thirty-two (32) time slots labeled t0 to t31. In FIG. 4, the channel includes only eighteen (18) DQ signals, so only nine (9) DRAM devices L0-L8 need to communicate. A channel (e.g., channel A) of each DRAM device (e.g., DRAM devices L0-L8) communicates two (2) bits (i.e., N=2) per time slot of burst 402 via the data buffer devices (e.g., data buffers BL0-BL4). Each codeword 404 of burst 402 is made up of sixteen (16) data symbols S0-S15 and two (2) check symbols C0-C1. Each symbol S0-S15, C0-C1 of codeword 404 consists of eight (8) bits communicated over eight (8) time slots of burst 402 by a single DRAM device L0-L8. See, for example, symbol S9 406, which is detailed in FIG. 4. Symbol S9 406 is composed of DQ[1] communicated by DRAM L4 at time slot t0, DQ[1] communicated by DRAM L4 at time slot t1, DQ[1] communicated by DRAM L4 at time slot t2, and so on through DQ[1] communicated by DRAM L4 at time slot t7, thereby forming an eight-bit symbol communicated over eight time slots (t0 to t7). In one embodiment, the eight time slots are consecutive, as shown in FIG. 4. In other embodiments, the eight time slots may be non-consecutive.
It should be appreciated that each codeword 404 is comprised of 144 bits, which are organized into a total of eighteen (18) 8-bit symbols. A total of eighteen symbols consists of sixteen data symbols and two check symbols. Thus, the codeword 404 may be generated, checked and corrected using an RS (18, 16) error detection and correction scheme (e.g., by EDC circuitry 125 of the controller 120). Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine whether the error in codeword 404 is persistent. The RS (18, 16) scheme provides Single Symbol Data Correction (SSDC) capability.
FIG. 5 illustrates a fourth codeword configuration, which carries no redundant information for reliability. In FIG. 5, burst 502 from a memory module includes thirty-two (32) time slots labeled t0 to t31. In FIG. 5, the channel includes only sixteen (16) DQ signals, so only eight (8) DRAM devices L0-L7 need to communicate. A channel (e.g., channel A) of each DRAM device (e.g., DRAM devices L0-L7) communicates two (2) bits (i.e., N=2) per time slot of burst 502 via four data buffer devices (e.g., data buffers BL0-BL3). Each codeword 504 of burst 502 is made up of sixteen (16) data symbols S0-S15. Each symbol S0-S15 of codeword 504 consists of eight (8) bits communicated over eight (8) time slots of burst 502 by a single DRAM device L0-L7. See, for example, symbol S9 506, which is detailed in FIG. 5. Symbol S9 506 is composed of DQ[1] communicated by DRAM L4 at time slot t0, DQ[1] communicated by DRAM L4 at time slot t1, DQ[1] communicated by DRAM L4 at time slot t2, and so on through DQ[1] communicated by DRAM L4 at time slot t7, thereby forming an eight-bit symbol communicated over eight time slots (t0 to t7). In one embodiment, the eight time slots are consecutive, as shown in FIG. 5. In other embodiments, the eight time slots may be non-consecutive.
It should be appreciated that each codeword 504 consists of 128 bits organized into a total of sixteen (16) 8-bit symbols, none of which is a check symbol. Thus, no error detection and correction scheme can be used to generate, check, and correct codeword 504.
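The sizes of the first four codeword configurations can be compared side by side. The sketch below only restates the symbol counts and symbol widths given in the description; the class name, field names, and the overhead metric (check symbols divided by total symbols) are illustrative assumptions. The FIG. 3 entry treats the additional symbol as check symbol C3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodewordConfig:
    name: str
    data_symbols: int
    check_symbols: int
    bits_per_symbol: int

    @property
    def total_bits(self) -> int:
        return (self.data_symbols + self.check_symbols) * self.bits_per_symbol

    @property
    def check_overhead(self) -> float:
        """Fraction of the codeword's symbols that are check symbols."""
        return self.check_symbols / (self.data_symbols + self.check_symbols)


CONFIGS = [
    CodewordConfig("FIG. 2, RS (10, 8)", 8, 2, 4),
    CodewordConfig("FIG. 3, RS (20, 16)", 16, 4, 8),
    CodewordConfig("FIG. 4, RS (18, 16)", 16, 2, 8),
    CodewordConfig("FIG. 5, no check symbols", 16, 0, 8),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.total_bits} bits, {cfg.check_overhead:.1%} check symbols")
```

The comparison shows the trade-off that the mode switching described later relies on: configurations that use fewer devices carry fewer check symbols and therefore offer weaker protection.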
FIG. 6 illustrates a fifth codeword configuration for reliability. In FIG. 6, burst 602 from a memory module (e.g., memory module 150) includes thirty-two (32) time slots labeled t0 to t31. A channel (e.g., channel A) of each DRAM device (e.g., DRAM devices L0-L9 110a-110f) communicates two (2) bits (i.e., N=2) per time slot of burst 602 via the data buffer devices (e.g., data buffer devices BL0-BL4 130a-130c). Each bit communicated per time slot by a given DRAM device L0-L9 110a-110f is assigned to a different symbol of codeword 604. Each of the different symbols communicated by a given DRAM device L0-L9 is assigned to a different code group.
Each codeword 604 of burst 602 is made up of twenty (20) symbols divided into two ten-symbol groups, S0_0-S9_0 and S0_1-S9_1. Each symbol S0_0-S9_0, S0_1-S9_1 of codeword 604 consists of four (4) bits communicated over four (4) time slots of burst 602 by a single DQ signal of a single DRAM device L0-L9. See, for example, symbols S4_0 606 and S4_1 608, which are detailed in FIG. 6. Symbol S4_0 606 is composed of DQ[0] communicated by DRAM L4 at time slot t0, DQ[0] communicated by DRAM L4 at time slot t1, DQ[0] communicated by DRAM L4 at time slot t2, and DQ[0] communicated by DRAM L4 at time slot t3, thereby forming a four-bit symbol communicated over four time slots (t0 to t3). Likewise, symbol S4_1 608 is composed of DQ[1] communicated by DRAM L4 at time slot t0, DQ[1] communicated by DRAM L4 at time slot t1, DQ[1] communicated by DRAM L4 at time slot t2, and DQ[1] communicated by DRAM L4 at time slot t3, thereby forming a four-bit symbol communicated over four time slots (t0 to t3). In one embodiment, the four time slots are consecutive, as shown in FIG. 6. In other embodiments, the four time slots may be non-consecutive.
It should be appreciated that each codeword 604 consists of 80 bits organized into a total of twenty (20) 4-bit symbols. The twenty symbols consist of ten symbols (S0_0-S9_0) assigned to a first code group and ten symbols (S0_1-S9_1) assigned to a second code group. Thus, each code group S0_0-S9_0 and S0_1-S9_1 of codeword 604 may be generated, checked, and corrected using a separate RS (10, 8) error detection and correction scheme (e.g., by EDC circuitry 125 of controller 120). Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine whether an error in codeword 604 is persistent. The RS (10, 8) scheme provides single symbol data correction capability. Thus, the dual RS (10, 8) group scheme of codeword 604 provides one-DQ, or quarter-device, correction capability, because each of the two bits communicated by a DRAM device L0-L9 is assigned to a different symbol and the two different symbols are assigned to different code groups.
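The effect of assigning the two DQ signals of a device to different code groups can be seen by counting how many symbols a single stuck DQ line corrupts in each group. The sketch below is illustrative only: the `burst[slot][device][dq]` layout, the bit order within a symbol, and the function names are assumptions, and no actual Reed-Solomon decoding is performed.

```python
import copy

DEVICES = 10
SLOTS_PER_SYMBOL = 4


def build_groups(burst, first_slot=0):
    """Return two code groups; group g holds one 4-bit symbol per device, taken from DQ[g]."""
    groups = []
    for dq in (0, 1):
        group = []
        for device in range(DEVICES):
            value = 0
            for slot in range(first_slot, first_slot + SLOTS_PER_SYMBOL):
                value = (value << 1) | burst[slot][device][dq]
            group.append(value)
        groups.append(group)
    return groups


def count_corrupted(good_groups, bad_groups):
    """Count differing symbols in each code group."""
    return [
        sum(g != b for g, b in zip(good, bad))
        for good, bad in zip(good_groups, bad_groups)
    ]


clean = [[[0, 0] for _ in range(DEVICES)] for _ in range(32)]
faulty = copy.deepcopy(clean)
for slot in range(SLOTS_PER_SYMBOL):
    faulty[slot][4][1] = 1  # DQ[1] of DRAM L4 stuck at 1 during t0..t3

damage = count_corrupted(build_groups(clean), build_groups(faulty))
print(damage)  # [0, 1]
```

A failed DQ signal therefore touches at most one symbol of one code group, which is within the single symbol correction capability of that group's RS (10, 8) scheme.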
Fig. 7 is a flowchart illustrating a method of operation of the memory module. One or more of the steps illustrated in fig. 7 may be performed by, for example, memory system 100 and/or components thereof. A first codeword is generated (702) having a first data symbol field and a first check symbol field. For example, EDC circuitry 125 of controller 120 may generate codeword 204, where codeword 204 has data symbol fields S0-S7 and check symbol fields C0-C1.
The first codeword is communicated (704) via a first independent channel of a plurality of dual independent channel dynamic random access memory (DRAM) devices disposed on the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codeword 204 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L9 110a-110f.
A second codeword is generated (706) having a second data symbol field and a second check symbol field. For example, EDC circuitry 125 of controller 120 may generate codeword 304, where codeword 304 has data symbol fields S0-S15 and check symbol fields C0-C3. The second codeword is communicated (708) via a second independent channel of the plurality of dual independent channel DRAM devices disposed on the module. For example, via memory channel B interface 121b, memory channel B 145b of module 150, and data buffer devices 130a-130c, controller 120 may communicate codeword 304 via memory channel B interfaces 111ab-111fb of DRAM devices L0-L9 110a-110f.
FIG. 8 is a flowchart illustrating a method of operating a memory module with multiple error correction and detection schemes. One or more of the steps illustrated in FIG. 8 may be performed by, for example, memory system 100 and/or components thereof. Using a first error detection and correction scheme, codewords are communicated (802) via a first channel of the module and a first channel of a plurality of dual channel DRAMs on the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 204 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L9 110a-110f, where codewords 204 are encoded using an RS (10, 8) error detection and correction scheme.
Using a second error detection and correction scheme, codewords are communicated (804) via a second channel of the module and a second channel of the plurality of dual channel DRAMs on the module. For example, via memory channel B interface 121b, memory channel B 145b of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 304 via memory channel B interfaces 111ab-111fb of DRAM devices L0-L9 110a-110f, where codewords 304 are encoded using an RS (20, 17) error detection and correction scheme.
FIG. 9 is a flowchart illustrating a method of reconfiguring operation of a memory module after a device failure is identified. One or more of the steps illustrated in FIG. 9 may be performed by, for example, memory system 100 and/or components thereof. Using a first error detection and correction scheme, codewords that span a first number of time slots and have a second number of bits per symbol are communicated (902) via a first channel of the module and a first channel of a plurality of dual channel DRAMs on the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 204 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L9 110a-110f, where codewords 204 have a symbol size of four bits communicated over two time slots and are encoded using an RS (10, 8) error detection and correction scheme.
A failure of a first channel DRAM of the dual channel DRAMs is detected using a first error detection and correction scheme (904). For example, the EDC circuitry 125 of the controller 120 may use an RS (10, 8) EDC scheme to detect a failure of the DRAM device L3 110d. Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine that DRAM device L3 110d has a persistent failure. An indicator associated with a failure of a first channel DRAM of the dual channel DRAMs is set (906). For example, in response to detecting a failure of DRAM device L3 110d, controller 120 may set an internal bit or register with an indicator that DRAM device L3 110d has failed. The controller 120 may also transmit an indicator to the host and/or host operating system that the DRAM device L3 110d has failed.
The first channel is reset (908). For example, in response to detecting the failure of DRAM device L3 110d, controller 120 may cease using DRAM device L3 110d. Using a second error detection and correction scheme, codewords that span a third number of time slots and have a fourth number of bits per symbol are communicated (910) via the first channel of the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 404 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L2 and L4-L9 110a-110f, where codewords 404 have a symbol size of eight bits communicated over eight time slots and are encoded using an RS (18, 16) error detection and correction scheme.
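The reconfiguration sequence of FIG. 9 can be sketched as a small piece of controller state. The description does not say how persistence is determined, so the threshold of three corrected errors on the same device is purely an assumption, as are the names `ChannelState`, `PERSISTENCE_THRESHOLD`, and `report_corrected_symbol`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PERSISTENCE_THRESHOLD = 3  # assumed; the description does not define a threshold


@dataclass
class ChannelState:
    scheme: str = "RS(10,8)"
    active_devices: List[int] = field(default_factory=lambda: list(range(10)))
    failed_device: Optional[int] = None
    error_counts: Dict[int, int] = field(default_factory=dict)


def report_corrected_symbol(state: ChannelState, device: int) -> ChannelState:
    """Record a corrected symbol error and switch schemes once it looks persistent."""
    state.error_counts[device] = state.error_counts.get(device, 0) + 1
    persistent = state.error_counts[device] >= PERSISTENCE_THRESHOLD
    if persistent and state.failed_device is None:
        state.failed_device = device          # set the failure indicator (906)
        state.active_devices.remove(device)   # stop using the failed device (908)
        state.scheme = "RS(18,16)"            # continue with nine devices (910)
    return state


channel_a = ChannelState()
for _ in range(3):
    report_corrected_symbol(channel_a, device=3)  # repeated errors from DRAM L3

print(channel_a.scheme, channel_a.active_devices)
```

After the switch, the nine remaining devices supply the eighteen DQ signals that the RS (18, 16) configuration of FIG. 4 requires.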
FIG. 10 is a flowchart illustrating a method of reconfiguring operation of a memory module after a data signal failure. One or more of the steps illustrated in FIG. 10 may be performed by, for example, memory system 100 and/or components thereof. Using a first error detection and correction scheme, a plurality of codewords that span a first number of time slots and have a second number of bits per symbol are simultaneously communicated (1002) via a first channel of the module and a first channel of a plurality of dual channel DRAMs on the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 604 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L9 110a-110f, where codewords 604 are divided into two code groups, have a symbol size of four bits communicated over four time slots, and are encoded using two independent RS (10, 8) error detection and correction schemes.
Using the first error detection and correction scheme, a failure of a first data signal of one of the dual channel DRAMs is detected (1004). For example, EDC circuitry 125 of controller 120 may detect a failure of a data (DQ) signal of DRAM device L3 110d using the RS (10, 8) EDC scheme. Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine that the data (DQ) signal of DRAM device L3 110d has a persistent fault. An indicator associated with the failure of the first data signal of the one of the dual channel DRAMs is set (1006). For example, in response to detecting the failure of the DQ signal of DRAM device L3 110d, controller 120 may set an internal bit or register with an indicator that the DQ signal of DRAM device L3 110d has failed. Controller 120 may also transmit, to the host and/or host operating system, an indicator that the DQ signal of DRAM device L3 110d has failed.
The first channel is reset (1008). For example, in response to detecting the failure of the DQ signal of DRAM device L3 110d, controller 120 may cease using DRAM device L3 110d. Using a second error detection and correction scheme, codewords that span a third number of time slots and have a fourth number of bits per symbol are communicated (1010) via the first channel of the module. For example, in response to detecting the failure of the DQ signal of DRAM device L3 110d, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 404 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L2 and L4-L9 110a-110f, where codewords 404 have a symbol size of eight bits communicated over eight time slots and are encoded using an RS (18, 16) error detection and correction scheme.
FIG. 11 is a flowchart illustrating a method of reconfiguring operation of a memory module after a memory channel fails. One or more of the steps illustrated in FIG. 11 may be performed by, for example, memory system 100 and/or components thereof. In a first mode, using a first error detection and correction scheme, codewords that span a first number of time slots and have a second number of bits per symbol are communicated (1102) via a first channel of the module and a first channel of a plurality of dual channel DRAMs on the module. For example, via memory channel A interface 121a, memory channel A 145a of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 204 via memory channel A interfaces 111aa-111fa of DRAM devices L0-L9 110a-110f, where codewords 204 have a symbol size of four bits communicated over two time slots and are encoded using an RS (10, 8) error detection and correction scheme.
In the first mode, using the first error detection and correction scheme, codewords that span the first number of time slots and have the second number of bits per symbol are communicated (1104) via a second channel of the module and a second channel of the plurality of dual channel DRAMs on the module. For example, via memory channel B interface 121b, memory channel B 145b of module 150, and data buffer devices 130a-130c, controller 120 may communicate codewords 204 via memory channel B interfaces 111ab-111fb of DRAM devices L0-L9 110a-110f, where codewords 204 have a symbol size of four bits communicated over two time slots and are encoded using an RS (10, 8) error detection and correction scheme.
A failure of the second channel is detected (1106) using the first error detection and correction scheme. For example, EDC circuitry 125 of controller 120 may use the RS (10, 8) EDC scheme to detect a failure of circuitry associated with the B channel of DRAM device L3 110d (e.g., memory channel B interface 111db, an array accessed using memory channel B interface 111db, etc.). Using the results from EDC circuitry 125, persistent error detection circuitry 126 may determine that the circuitry associated with the B channel of DRAM device L3 110d has a persistent fault. An indicator associated with the failure of the first device is set (1108). For example, in response to detecting the failure of circuitry associated with the B channel of DRAM device L3 110d, controller 120 may set an internal bit or register with an indicator that circuitry associated with the B channel of DRAM device L3 110d has failed. Controller 120 may also transmit, to the host and/or host operating system, an indicator that circuitry associated with the B channel of DRAM device L3 110d has failed.
The first channel and the second channel are combined and a second mode is entered (1110). For example, controller 120 may enter a mode in which the data symbols and check symbols of a codeword are spread across both memory channel A interface 121a and memory channel B interface 121b. In the second mode, codewords are communicated (1112) via the combined first and second channels. For example, controller 120 may communicate with module 150 using an error detection and correction scheme in which the data symbols and check symbols of each codeword are spread across memory channel A interface 121a and memory channel B interface 121b. For example, when only nine (9) DRAM devices with x4 data signals are operating normally, an RS (18, 16) scheme spread across the two channels A and B may be used. One symbol may be 4 bits, with 2 bits per DRAM spread over two bursts. Correcting one symbol then means that one "half" of a DRAM is corrected when the DRAM is internally configured to follow a "bounded fault" scheme.
The methods, systems, and devices described above may be implemented in or stored by a computer system. The methods described above may also be stored on a non-transitory computer readable medium. The devices, circuits, and systems described herein may be implemented using computer-aided design tools available in the art and may be embodied in computer-readable files containing software descriptions of such circuits. This includes, but is not limited to, one or more elements of memory system 100 and components thereof. These software descriptions may be behavioral, register transfer, logic, transistor, and layout geometry-level descriptions. Moreover, the software descriptions may be stored on a storage medium or conveyed by a carrier wave.
Data formats in which such descriptions may be implemented include, but are not limited to: formats that support behavioral languages (e.g., C), formats that support Register Transfer Level (RTL) languages (e.g., Verilog and VHDL), formats that support geometric description languages (e.g., GDSII, GDSIV, CIF, and MEBES), and other suitable formats and languages. Moreover, such files may be transferred electronically between machine-readable media over the various media of the Internet or, for example, via e-mail. Note that the physical files may be implemented on machine-readable media (such as 4 mm tape, 8 mm tape, 3-1/2 inch floppy disk media, CD, DVD, etc.).
FIG. 12 is a block diagram illustrating one embodiment of a processing system 1200 for including, processing, or generating a representation of a circuit component 1220. The processing system 1200 includes one or more processors 1202, memory 1204, and one or more communication devices 1206. The processor 1202, memory 1204, and communication device 1206 communicate using any suitable type, number, and/or configuration of wired and/or wireless connections 1208.
The processor 1202 executes instructions of one or more processes 1212 stored in the memory 1204 to process and/or generate the circuit component 1220 in response to user input 1214 and parameters 1216. The process 1212 may be any suitable Electronic Design Automation (EDA) tool or portion thereof for designing, simulating, analyzing and/or verifying electronic circuitry and/or generating photomasks for electronic circuitry. Representation 1220 includes data describing all or a portion of memory system 100 and its components, as shown.
Representation 1220 may include one or more of the following: behavioral, register transfer, logic, transistor, and layout geometry-level descriptions. Moreover, representation 1220 may be stored on a storage medium or conveyed by a carrier wave.
The data formats in which representation 1220 may be implemented include, but are not limited to: formats that support behavioral languages (e.g., C), formats that support Register Transfer Level (RTL) languages (e.g., Verilog and VHDL), formats that support geometric description languages (e.g., GDSII, GDSIII, GDSIV, CIF, and MEBES), and other suitable formats and languages. Moreover, such files may be transferred electronically between machine-readable media over the various media of the Internet or, for example, via e-mail.
User input 1214 may include input parameters from a keyboard, mouse, voice recognition interface, microphone and speakers, graphical display, touch screen, or other type of user interface device. The user interface may be distributed among a plurality of interface devices. Parameters 1216 may include specifications and/or characteristics entered to help define representation 1220. For example, parameters 1216 may include information defining a device type (e.g., NFET, PFET, etc.), topology (e.g., block diagram, circuit description, schematic, etc.), and/or device description (e.g., device characteristics, device size, supply voltage, simulation temperature, simulation model, etc.).
Memory 1204 includes non-transitory computer readable storage media storing any suitable type, number, and/or configuration of processes 1212, user inputs 1214, parameters 1216, and circuit components 1220.
Communication device 1206 includes any suitable type, number, and/or configuration of wired and/or wireless devices that transmit information from processing system 1200 to and/or receive information from another processing or storage system (not shown). For example, communication device 1206 may transmit circuit component 1220 to another system. The communication device 1206 may receive the process 1212, the user input 1214, the parameter 1216, and/or the circuit component 1220, and cause the process 1212, the user input 1214, the parameter 1216, and/or the circuit component 1220 to be stored in the memory 1204.
Implementations discussed herein include, but are not limited to, the following examples:
Example 1: a controller comprising four memory channel controller interfaces for communicating with four memory channel module interfaces on a memory module, the memory module comprising a substrate and dual x2 Dynamic Random Access Memory (DRAM) devices, the dual x2 DRAM devices each having a respective first memory access interface and a respective second memory access interface, the first memory access interface and the second memory access interface operating independently of each other to each access a respective one of two respective sets of memory cores that are non-overlapping sets; a first one of the four memory channel controller interfaces for communicating a first data symbol and a first check symbol through the respective first memory access interfaces of the dual x2 DRAM devices, the first data symbol and the first check symbol being arranged into a first codeword; and a second one of the four memory channel controller interfaces for communicating a second data symbol and a second check symbol through the respective second memory access interfaces of the dual x2 DRAM devices, the second data symbol and the second check symbol being arranged into a second codeword.
Example 2: the controller of example 1, comprising error detection and correction circuitry to process the first codeword to determine whether an error is present in the first codeword.
Example 3: the controller of example 2, wherein the first data symbol and the first check symbol have 4 bits.
Example 4: the controller of example 2, comprising persistent error detection circuitry to determine whether an error in the first codeword is persistent.
Example 5: the controller of example 4, wherein when the persistent error detection circuitry determines that the error in the first codeword is persistent, the controller communicates a third data symbol and a third check symbol through the first memory access interface and the second memory access interface of the dual x2 DRAM device, the third data symbol and third check symbol being arranged into a third codeword.
Example 6: the controller of example 5, wherein the third data symbol and the third check symbol have more bits than the first data symbol and the first check symbol.
Example 7: the controller of example 1, wherein the first data symbols and the first check symbols are encoded according to a first error detection and correction scheme, and the second data symbols and the second check symbols are encoded according to a second error detection and correction scheme, the second error detection and correction scheme being different from the first error detection and correction scheme.
Example 8: a memory controller comprising a first memory channel for communicating a first data symbol field and a first check symbol field through a first independent channel of a plurality of dual independent channel Dynamic Random Access Memory (DRAM) devices disposed on a module, the first data symbol field and the first check symbol field being arranged into a first codeword; and a second memory channel for communicating a second data symbol field and a second check symbol field through a second independent channel of the plurality of dual independent channel DRAM devices disposed on the module, the second data symbol field and the second check symbol field being arranged into a second codeword.
Example 9: the memory controller of example 8, further comprising error detection and correction circuitry to correct errors in a first one of the first data symbol fields based on values in at least one of the first check symbol fields.
Example 10: the memory controller of example 8, wherein each dual independent channel DRAM device of the plurality of dual independent channel DRAM devices communicates with each memory channel of the first memory channel and the second memory channel using a data width of two data bits.
Example 11: the memory controller of example 10, wherein each of the first data symbol field, the first check symbol field, the second data symbol field, and the second check symbol field is a four-bit wide field.
Example 12: the memory controller of example 8, wherein contents of the first data symbol field and the first check symbol field are encoded according to a first error detection and correction scheme, and contents of the second data symbol field and the second check symbol field are encoded according to a second error detection and correction scheme.
Example 13: the memory controller of example 12, wherein the first error detection and correction scheme and the second error detection and correction scheme have different error detection and correction capabilities.
Example 14: the memory controller of example 8, further comprising error detection and correction circuitry to correct errors in a third data symbol field based on values in the third data symbol field and a third check symbol field arranged into a third codeword, wherein the third codeword is communicated using the first memory channel and the second memory channel.
Example 15: the memory controller of example 14, wherein in a first mode, the first codeword is communicated using the first channel and the second codeword is communicated using the second channel, and in a second mode, the third codeword is communicated using both the first memory channel and the second memory channel.
Example 16: a method of operation of a memory controller, comprising: generating a first codeword having a first data symbol field and a first check symbol field; communicating the first codeword through a first independent channel of a plurality of dual independent channel Dynamic Random Access Memory (DRAM) devices disposed on a module; generating a second codeword having a second data symbol field and a second check symbol field; and communicating the second codeword through a second independent channel of the plurality of dual independent channel DRAM devices disposed on the module.
Example 17: the method of example 16, further comprising: correcting an error in a first value of a third codeword received via the first independent channel based on the first value.
Example 18: the method of example 16, wherein the first codeword is generated from a first value of the first data symbol field using a first error detection and correction scheme and the second codeword is generated from a second value in the second data symbol field using a second error detection and correction scheme.
Example 19: the method of example 18, further comprising: generating a third codeword having a third data symbol field and a third check symbol field; and communicating the third codeword through the first and second independent channels of the plurality of dual independent channel DRAM devices disposed on the module.
Example 20: the method of example 18, further comprising: detecting that the first independent channel has a persistent device failure; and based on detecting that the first independent channel has a persistent device failure, placing the memory controller in a mode that generates and communicates a third codeword.
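Examples 10 and 11 pair a two-bit data width per device and channel with four-bit-wide symbol fields. One way the two widths could be reconciled is sketched below, assuming that each symbol is gathered from two consecutive two-bit transfers of one device. The function names and the bit ordering (first transfer in the low two bits) are illustrative assumptions, not details given in this description.

```python
def pack_symbols(transfers):
    """Combine consecutive 2-bit transfers into 4-bit symbols.

    Assumed ordering: the first transfer of each pair supplies the low two
    bits of the symbol and the second transfer supplies the high two bits.
    """
    if len(transfers) % 2 != 0:
        raise ValueError("expected an even number of 2-bit transfers")
    symbols = []
    for i in range(0, len(transfers), 2):
        low, high = transfers[i], transfers[i + 1]
        if not (0 <= low < 4 and 0 <= high < 4):
            raise ValueError("each transfer must fit in 2 bits")
        symbols.append((high << 2) | low)
    return symbols


def unpack_symbols(symbols):
    """Split 4-bit symbols back into the 2-bit transfers that carried them."""
    transfers = []
    for sym in symbols:
        if not 0 <= sym < 16:
            raise ValueError("each symbol must fit in 4 bits")
        transfers.append(sym & 0b11)
        transfers.append((sym >> 2) & 0b11)
    return transfers
```

With this layout every bit of a given symbol comes from the same device, so a fault confined to one device's interface on a channel corrupts whole symbols rather than fragments of several symbols.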
The foregoing description of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and other modifications and variations are possible in light of the above teachings. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. The appended claims are intended to be construed to include other alternative embodiments of the invention except insofar as limited by the prior art.

Claims (20)

1. A controller, comprising:
four memory channel controller interfaces for communicating with four memory channel module interfaces on a memory module, the memory module comprising a substrate and dual x2 Dynamic Random Access Memory (DRAM) devices, the dual x2 DRAM devices each having a respective first memory access interface and a respective second memory access interface, the first memory access interface and the second memory access interface operating independently of each other to each access a respective one of two respective sets of memory cores that are non-overlapping sets;
a first memory channel controller interface of the four memory channel controller interfaces for communicating first data symbols and first check symbols through the respective first memory access interfaces of the dual x2 DRAM devices, the first data symbols and the first check symbols being arranged into a first codeword; and
a second memory channel controller interface of the four memory channel controller interfaces for communicating second data symbols and second check symbols through the respective second memory access interfaces of the dual x2 DRAM devices, the second data symbols and the second check symbols being arranged into a second codeword.
2. The controller according to claim 1, comprising:
error detection and correction circuitry to process the first codeword to determine if an error exists in the first codeword.
3. The controller of claim 2, wherein the first data symbol and the first check symbol have 4 bits.
4. The controller according to claim 2, comprising:
persistent error detection circuitry to determine whether an error in the first codeword is persistent.
5. The controller of claim 4, wherein when the persistent error detection circuitry determines that errors in the first codeword are persistent, the controller communicates third data symbols and third check symbols through the first and second memory access interfaces of the dual x2 DRAM device, the third data symbols and third check symbols being arranged into a third codeword.
6. The controller of claim 5, wherein the third data symbol and the third check symbol have more bits than the first data symbol and the first check symbol.
7. The controller of claim 1, wherein first data symbols and first check symbols are encoded according to a first error detection and correction scheme, and second data symbols and second check symbols are encoded according to a second error detection and correction scheme, the second error detection and correction scheme being different from the first error detection and correction scheme.
8. A memory controller, comprising:
a first memory channel for communicating a first data symbol field and a first check symbol field through a first independent channel of a plurality of dual independent channel Dynamic Random Access Memory (DRAM) devices disposed on a module, the first data symbol field and the first check symbol field being arranged into a first codeword; and
a second memory channel for communicating a second data symbol field and a second check symbol field through a second independent channel of the plurality of dual independent channel DRAM devices disposed on the module, the second data symbol field and the second check symbol field being arranged into a second codeword.
9. The memory controller of claim 8, further comprising:
error detection and correction circuitry to correct errors in a first one of the first data symbol fields based on values in at least one of the first check symbol fields.
10. The memory controller of claim 8, wherein each dual independent channel DRAM device of the plurality of dual independent channel DRAM devices communicates with each memory channel of the first memory channel and the second memory channel using a data width of two data bits.
11. The memory controller of claim 10, wherein each of the first data symbol field, first check symbol field, second data symbol field, and second check symbol field is a four-bit wide field.
12. The memory controller of claim 8, wherein contents of the first data symbol field and the first check symbol field are encoded according to a first error detection and correction scheme, and contents of the second data symbol field and the second check symbol field are encoded according to a second error detection and correction scheme.
13. The memory controller of claim 12, wherein the first error detection and correction scheme and the second error detection and correction scheme have different error detection and correction capabilities.
14. The memory controller of claim 8, further comprising:
error detection and correction circuitry to correct errors in a third data symbol field based on values in the third data symbol field and a third check symbol field arranged into a third codeword, wherein the third codeword is communicated using the first memory channel and the second memory channel.
15. The memory controller of claim 14, wherein in a first mode, the first codeword is communicated using the first channel and the second codeword is communicated using the second channel; and in a second mode, the third codeword is communicated using both the first memory channel and the second memory channel.
16. A method of operation of a memory controller, comprising:
generating a first codeword having a first data symbol field and a first check symbol field;
communicating the first codeword through a first independent channel in a plurality of dual independent channel Dynamic Random Access Memory (DRAM) devices disposed on a module;
generating a second codeword having a second data symbol field and a second check symbol field; and
communicating the second codeword through a second independent channel of the plurality of dual independent channel DRAM devices disposed on the module.
17. The method of claim 16, further comprising:
correcting an error in a first value of a third codeword received via the first independent channel based on the first value.
18. The method of claim 16, wherein the first codeword is generated from a first value of the first data symbol field using a first error detection and correction scheme and the second codeword is generated from a second value of the second data symbol field using a second error detection and correction scheme.
19. The method of claim 18, further comprising:
generating a third codeword having a third data symbol field and a third check symbol field; and
communicating the third codeword through the first independent channel and the second independent channel of the plurality of dual independent channel DRAM devices disposed on the module.
20. The method of claim 18, further comprising:
detecting that the first independent channel has a persistent device failure; and
based on detecting that the first independent channel has a persistent device failure, placing the memory controller in a mode that generates and communicates a third codeword.
CN202280044680.XA 2021-06-23 2022-06-21 Reliability of four channel memory module Pending CN117546136A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/214,024 2021-06-23
US202163252237P 2021-10-05 2021-10-05
US63/252,237 2021-10-05
PCT/US2022/034338 WO2022271695A1 (en) 2021-06-23 2022-06-21 Quad-channel memory module reliability

Publications (1)

Publication Number Publication Date
CN117546136A true CN117546136A (en) 2024-02-09

Family

ID=89794384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280044680.XA Pending CN117546136A (en) 2021-06-23 2022-06-21 Reliability of four channel memory module

Country Status (1)

Country Link
CN (1) CN117546136A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination