SE1300783A1 - Handling soft errors in connection with data storage - Google Patents

Handling soft errors in connection with data storage

Info

Publication number
SE1300783A1
Authority
SE
Sweden
Prior art keywords
bit
representation symbol
symbol
bits
representation
Application number
SE1300783A
Other languages
Swedish (sv)
Inventor
Trond Loekstad
Original Assignee
Abb Technology Ltd
Application filed by Abb Technology Ltd
Priority to SE1300783A
Publication of SE1300783A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A data reliability management device (10) comprises a storage controller (12), which obtains a data bit (11) and an indication of a desired degree of reliability, arranges a first representation symbol (RS) that represents the value of the bit and comprises a number of bits corresponding to the desired degree of reliability, and stores the first representation symbol in a location (L) in a data memory (16). An examination unit (14) reads a bit pattern (BP) from the location (L), compares the bit pattern (BP) with the first representation symbol (RS), and indicates the first bit value if a sufficient number of bits in the bit pattern are identical to the corresponding bits of the first representation symbol. Otherwise it compares the bit pattern with a second representation symbol and indicates the second bit value if a sufficient number of bits in the bit pattern are identical to the corresponding bits of the second representation symbol. The examination unit indicates an error if neither comparison is successful. (Fig. 1)

Description

determine the correct data, or even to discover that an error is present at all. In addition, before the correction can occur, the system may have crashed, in which case the recovery procedure must include a reboot.

Soft errors involve changes to data (the electrons in a storage circuit, for example) but not changes to the physical circuit itself (the atoms). If the data is rewritten, the circuit will work perfectly again.

Soft errors can occur on transmission lines, in digital logic, in analog circuits, in magnetic storage, and elsewhere, but they are best known in semiconductor storage.

General problem description

Requirements for increased functionality and processing power (Moore's law), combined with reducing (or at least maintaining) chip size and power consumption, have shrunk the bit area inside the chip to the size of only a few atoms. This makes the chip increasingly vulnerable to its external environment, such as radiation. This vulnerability increases dramatically at higher altitudes but is now also very noticeable at sea level.

Next, the three most common error detection mechanisms are described.

Several others exist.

This increase in intermittent soft errors is mitigated by Error Correction Code (ECC) built into the chip hardware, typically providing 1-bit error correction and 2-bit error detection. By running a scrubbing algorithm, a memory word in which a 1-bit error has been corrected is rewritten and restored to its original state, ready to correct again if another soft error occurs. If it is not scrubbed, another soft error in the same ECC word will be detected but cannot be repaired.
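To make "1-bit correction, 2-bit detection" and scrubbing concrete, here is a minimal sketch of a textbook extended Hamming (8,4) code: 4 data bits, 3 Hamming parity bits at positions 1, 2 and 4, and an overall parity bit at position 0. It is an illustration only; memory ECC in real chips is implemented in hardware over wider words (commonly 64 data bits with 8 check bits). The names secded_encode, secded_read_scrub and ecc_status are hypothetical.

```c
#include <stdint.h>

/* Illustrative extended Hamming (8,4) SEC-DED code (hypothetical names). */
typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

static const int DATA_POS[4] = {3, 5, 6, 7};   /* data bit positions */

static int get_bit(uint8_t w, int i) { return (w >> i) & 1; }

uint8_t secded_encode(uint8_t data)
{
    uint8_t cw = 0;
    for (int k = 0; k < 4; k++)                 /* place the 4 data bits */
        if ((data >> k) & 1)
            cw |= (uint8_t)(1u << DATA_POS[k]);
    for (int p = 1; p <= 4; p <<= 1) {          /* Hamming parity bits 1, 2, 4 */
        int par = 0;
        for (int i = 1; i < 8; i++)
            if (i & p)
                par ^= get_bit(cw, i);
        if (par)
            cw |= (uint8_t)(1u << p);
    }
    int all = 0;                                 /* overall parity bit 0 */
    for (int i = 1; i < 8; i++)
        all ^= get_bit(cw, i);
    if (all)
        cw |= 1u;
    return cw;
}

/* Decodes *word in place. A single-bit error is corrected and the clean
 * codeword is written back (scrubbing); a double-bit error is only detected. */
ecc_status secded_read_scrub(uint8_t *word, uint8_t *data)
{
    int syndrome = 0, overall = 0;
    for (int i = 0; i < 8; i++)
        if (get_bit(*word, i)) {
            overall ^= 1;
            syndrome ^= i;
        }
    ecc_status st = ECC_OK;
    if (overall) {                               /* odd number of flips: one */
        *word ^= (uint8_t)(1u << syndrome);      /* syndrome 0: parity bit 0 */
        st = ECC_CORRECTED;
    } else if (syndrome != 0) {
        return ECC_UNCORRECTABLE;                /* two flips: detect only */
    }
    *data = 0;
    for (int k = 0; k < 4; k++)
        *data |= (uint8_t)(get_bit(*word, DATA_POS[k]) << k);
    return st;
}
```

If the corrected word were not written back, a second flip in the same word would leave two errors, which this code can only report, exactly as described above.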

Another error detection mechanism is the simplest form, called a parity bit. This algorithm consumes less power and area; it performs a quick summation (modulo 2) over a defined number of bits to check whether the result equals a predefined value: zero for even parity or one for odd parity.
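A minimal sketch of the parity check, assuming a 32-bit data word with one separately stored parity bit (the function names are hypothetical). It also shows the limitation discussed below: a single flipped bit is detected but cannot be located, and two flipped bits cancel out.

```c
#include <stdint.h>

/* XOR (sum modulo 2) of all bits in a 32-bit word. */
unsigned parity32(uint32_t w)
{
    unsigned p = 0;
    while (w) {
        p ^= (unsigned)(w & 1u);
        w >>= 1;
    }
    return p;
}

/* Parity bit to store with the data: even parity makes the total number
 * of 1-bits even, odd parity makes it odd. */
unsigned parity_bit(uint32_t data, int odd)
{
    return parity32(data) ^ (odd ? 1u : 0u);
}

/* Returns 1 if the stored word and its parity bit are consistent. */
int parity_ok(uint32_t data, unsigned pbit, int odd)
{
    return (parity32(data) ^ pbit) == (odd ? 1u : 0u);
}
```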

The third related error detection mechanism is the Cyclic Redundancy Check (CRC). Here the check value is computed on a word basis (16, 32, or 64 bits) over several registers, using polynomial division rather than a plain sum. This is typically done for a memory area such as Flash (ROM), or on a communication line to verify messages. If the result differs from the expected value, an error is detected.
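A minimal bitwise sketch of one common variant, CRC-32 as used by IEEE 802.3 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), shows the division-based check. The function name crc32_ieee is illustrative; table-driven software or dedicated CRC hardware is more typical in practice.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3): reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];                      /* fold in the next byte */
        for (int b = 0; b < 8; b++)         /* polynomial division, bit by bit */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A Flash image or message is accepted only if the recomputed value equals the stored one; the standard check value for the ASCII string "123456789" is 0xCBF43926.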

As can be seen, most single-bit errors that occur are detected, but without the possibility of repair. In the safety domain this is handled by redundancy, where two or more executions of the same code compare their results and thereby detect an error.

State of the art solutions

As the reliability of memory devices continues to increase, the hard error rate has continued to decrease. DRAM vendors report 10 FIT (Failures In Time, where the time base is 10^9 device hours). The dominant failure mode will be soft errors (transient faults), with rates in the 100-500 FIT range. These faults are categorized as single-bit errors related to the memory cell's ability to store and retain charge over time. Chip vendors mitigate this by building ECC into registers, cache, and RAM. For low-level cache, ECC costs too much processing power, so only single-bit parity is used. In normal situations this is good enough, but safety applications may require higher reliability.
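To put these rates into perspective, the arithmetic below converts the 500 FIT upper bound into an expected number of soft errors per device over one year of continuous operation (8760 h), assuming a constant failure rate:

```latex
\begin{aligned}
\lambda &= 500\ \text{FIT} = \frac{500}{10^{9}\ \text{h}} = 5 \times 10^{-7}\ \text{h}^{-1} \\
\lambda t &= 5 \times 10^{-7}\ \text{h}^{-1} \times 8760\ \text{h} \approx 4.38 \times 10^{-3} \\
P(\text{at least one soft error in a year}) &= 1 - e^{-\lambda t} \approx 4.37 \times 10^{-3}
\end{aligned}
```

The same calculation for the 10 FIT hard error rate gives about 8.76 × 10^-5 per year, i.e. 50 times lower.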

What is the purpose of these mechanisms if an error can occur anyway during processing or internal communication on the bus? The point is that stored data normally persists for a long time (several years), while processing often lasts just a few microseconds or even nanoseconds. Given this time difference, soft errors are most likely to strike where the data spends its time, namely in storage.

In addition, the current trend from single-core to multi-core processing provides ample potential for the processing power and separation needed to implement this invention.

This invention ensures that stored data can be trusted with a defined probability of failure, as is mainly required in the safety application domain today. Because of its cost in space and processing, it focuses on preserving specific configuration registers that must be reliable. This is especially relevant in the safety domain when claiming integrity between different cores in a multi-core environment, where that integrity is based on configuration settings made during boot-up. These settings must remain valid for the entire lifetime of a device.

It boils down to the probability required to reach a given safety integrity level (SIL). If this probability can be tuned by adjusting the number of replicated bits, the proof of concept becomes much easier to demonstrate to certification authorities.

- What is the probability of failure in RAM without any protection?
- What is the probability of failure in RAM with ECC protection (1-bit correct / 2-bit detect)?
- If the two previous questions can be answered (these are familiar problems in the safety domain, where a lot of statistics are available), how many replicated bits are then needed to reach the required probability of failure?

Each important bit is replicated a defined number of times, depending on the required probability of failure. An original bit can be either 0 or 1. If an 8-bit replication scheme is used, the original bit is replicated 8 times, meaning that a 1 becomes binary 1111 1111 (0xFF) and a 0 becomes binary 0000 0000 (0x00).
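The third question can be approached with a binomial model. The sketch below assumes that each of the n stored replica bits flips independently with probability p during the period of interest (p would come from the field statistics mentioned above) and that a read fails when more than t bits have flipped. Independence is an assumption: multi-bit upsets in neighbouring cells would violate it. The function name p_read_failure is hypothetical.

```c
/* Probability that more than t of n replica bits are flipped, when each
 * bit flips independently with probability p (requires 0 <= p < 1). */
double p_read_failure(int n, int t, double p)
{
    double pmf = 1.0;                    /* P(K = 0) = (1 - p)^n */
    for (int i = 0; i < n; i++)
        pmf *= 1.0 - p;
    double tail = 0.0;
    for (int k = 0; k < n; k++) {
        /* Binomial recurrence: turn P(K = k) into P(K = k + 1). */
        pmf *= (double)(n - k) / (double)(k + 1) * p / (1.0 - p);
        if (k + 1 > t)
            tail += pmf;
    }
    return tail;
}
```

Summing the tail terms directly, instead of computing 1 minus the head, keeps precision in the interesting regime where p is tiny. Evaluating the function for candidate schemes (for example n = 8 with t = 1 versus n = 32 with t = 8) shows which replication width meets a given target probability.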

This can be implemented as two functions, one for reading and one for writing.

They may both be implemented in a data reliability handling device 10.

Alternatively, only the storing function is provided in the data reliability handling device, while the reading function may in this case be implemented in a symbol investigating device.

The writing function may then be implemented through a storing control unit 12, while the reading function may be implemented through an investigating unit 14.

In order to store data reliably, the storing control unit 12 obtains a data bit 11 to be stored in a location L of a data storage 16, such as the above-mentioned RAM memory, and obtains an indication of a desired degree of reliability of the bit, such as the above-mentioned SIL. It then provides a first representation symbol RS that represents the value of the bit and comprises a number of bits corresponding to the desired degree of reliability, in this case by replicating the bit a number of times, and stores the first representation symbol in the location L. In this first example the first representation symbol comprises the original bit and its replicas; it is thus obtained by replicating the bit 11.

The investigating unit 14 may in turn read a bit pattern BP from the location L, compare the bit pattern BP with the first representation symbol RS, and indicate a first bit value v if a sufficient number of bits of the bit pattern are identical to the corresponding bits of the first representation symbol. If this fails, the investigating unit 14 may go on to compare the bit pattern with a second representation symbol and indicate a second bit value if a sufficient number of bits of the bit pattern are identical to the corresponding bits of the second representation symbol. If neither comparison is successful, the investigating unit 14 may indicate a fault.
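The comparison order described above (first representation symbol, then second, otherwise a fault) can be sketched as follows, assuming symbols of at most 32 bits and a match threshold min_match. The function investigate, its parameters, and INVESTIGATE_FAULT are hypothetical names. When the second symbol is the complement of the first, a threshold above half the width ensures that at most one symbol can match.

```c
#include <stdint.h>

#define INVESTIGATE_FAULT (-1)   /* hypothetical fault indication */

/* Number of bit positions in which a and b differ. */
static int hamming_distance32(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int d = 0;
    while (x) {
        d += (int)(x & 1u);
        x >>= 1;
    }
    return d;
}

/* Compares bit pattern bp with rs1 (meaning value v1) and then with rs2
 * (meaning the opposite value). A comparison succeeds if at least
 * min_match of the low `width` bits (1..32) agree with the symbol. */
int investigate(uint32_t bp, uint32_t rs1, uint32_t rs2,
                int width, int min_match, int v1)
{
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    if (width - hamming_distance32(bp & mask, rs1 & mask) >= min_match)
        return v1;
    if (width - hamming_distance32(bp & mask, rs2 & mask) >= min_match)
        return !v1;
    return INVESTIGATE_FAULT;
}
```

For 8-bit replication, investigate(0xF7, 0xFF, 0x00, 8, 7, 1) accepts a 1 despite one flipped bit, while 0x0F matches neither symbol and is reported as a fault.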

The function for reading one bit, performed by the investigating unit 14, can be implemented as:
1. Use a fetch function to read the 8-bit replicated value, i.e. the bit pattern BP, stored somewhere, here at location L of the memory 16.
2. Count the number of 1s and 0s.
3. Depending on robust criteria, return a 1 or a 0.

The function for writing one bit, performed by the storing control unit 12, can be implemented as:
1. Use a storage function with the value v of 1 or 0.
2. Replicate the value v according to the replication scheme, i.e. according to the desired degree of reliability.
3. Store the replicated value at location L of memory 16 as a representation symbol; thus all the bits of the representation symbol have the same value v.

Robust criteria are a weighting of the bits counted as 0 and as 1. Depending on the reliability requirement, it might be required that all bits have the same value. This will of course reduce availability, since a single-bit fault is statistically more likely when the bit is replicated several times over a larger memory area. For example, if one bit fault is accepted, a written 1 replicated as 0xFF is still a valid 1 when read back as 0xF7 (or any other combination of seven 1-bits and one 0-bit).

A replication scheme specifies how many times the bit shall be replicated. If 8-bit replication is not good enough, each bit might have to be replicated 16 or 32 times (bits). The advantage of increasing the number of replicated bits is increased availability. If each bit is replicated 32 times, it might be acceptable to have 4 or more bit errors. For example, if a 1 is stored as a 32-bit replica (0xFFFFFFFF), it might be entirely reasonable to read back 0xFAF5FAF5 (8 zeros and 24 ones), accepting 8 bit errors and still obtaining a valid result.
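The read and write steps, together with the robust criteria and a configurable replication width, can be sketched as follows. The names replicate_bit, read_replicated and REPL_ERROR are hypothetical, and the robust criteria are modelled as a maximum number of accepted faulty bits (max_faults), which is one possible weighting.

```c
#include <stdint.h>

#define REPL_ERROR (-1)   /* hypothetical error indication */

/* Write: replicate v (0 or 1) into the low `width` bits (1..32). */
uint32_t replicate_bit(int v, int width)
{
    uint32_t ones = (width >= 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);
    return v ? ones : 0u;
}

/* Read: count the 1s and 0s in the low `width` bits of bp and accept a
 * value only if at most max_faults bits disagree with it (the robust
 * criteria). Keep max_faults below width / 2 so only one value can pass. */
int read_replicated(uint32_t bp, int width, int max_faults)
{
    int ones = 0;
    for (int i = 0; i < width; i++)
        ones += (int)((bp >> i) & 1u);
    int zeros = width - ones;
    if (zeros <= max_faults)
        return 1;
    if (ones <= max_faults)
        return 0;
    return REPL_ERROR;
}
```

With max_faults = 0 (all bits must agree), 0xF7 is rejected; with max_faults = 1 it reads as 1. With width 32 and max_faults = 8, the pattern 0xFAF5FAF5 from the example above still reads as 1.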

This replication mechanism can be used for storing critical temporary and permanent data, i.e. data that must be reliable. Typical usage is storage of hardware configuration data (registers for cache, memory management unit (MMU), HV, peripherals, etc.) and software values (such as CRC values, critical pointers, etc.) that must be trusted in order to claim separation. Building a write-back function into the read function will repair any bit faults that occur (scrubbing). This provides a strong solution in the safety domain, ensuring both safety and availability.
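A read function with built-in write-back could look like the sketch below, assuming 32-bit replicas and the same "maximum accepted faults" criterion as above. The name read_and_scrub is hypothetical. An unrecoverable pattern is deliberately left untouched so that fault handling can inspect it.

```c
#include <stdint.h>

/* Reads the 32-bit replica at *loc, tolerating up to max_faults flipped
 * bits, and rewrites a clean replica when a tolerated fault was found
 * (scrubbing). Returns 0, 1, or -1 for an unrecoverable pattern. */
int read_and_scrub(volatile uint32_t *loc, int max_faults)
{
    uint32_t bp = *loc;
    int ones = 0;
    for (int i = 0; i < 32; i++)
        ones += (int)((bp >> i) & 1u);

    int v;
    if (32 - ones <= max_faults)
        v = 1;
    else if (ones <= max_faults)
        v = 0;
    else
        return -1;                     /* leave the pattern for diagnosis */

    uint32_t clean = v ? 0xFFFFFFFFu : 0u;
    if (bp != clean)
        *loc = clean;                  /* write back: repair the bit faults */
    return v;
}
```

The volatile qualifier reflects that the location may be a memory-mapped register or shared memory; on a multi-core system the read-check-write sequence would also need to be protected against concurrent writers.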

Next is sample code showing the implemented algorithms (the type untSB, the SAFE_BITS_* constants and the loop bound are not given in the source; the definitions below are assumed examples for a 32-bit replica that accepts up to 8 bit faults):

    #include <stdint.h>

    /* Assumed definitions: a signed 64-bit untSB holds a 32-bit replica
     * and can still return -1 as an error code. */
    typedef int64_t untSB;
    #define SAFE_BITS_SIZE   32          /* replica width in bits (assumed) */
    #define SAFE_BITS_ZEROS  0x00000000  /* replicated 0 */
    #define SAFE_BITS_ONES   0xFFFFFFFF  /* replicated 1 */
    #define SAFE_BITS_1      23          /* more than 23 ones reads as 1 (assumed) */
    #define SAFE_BITS_0      8           /* at most 8 ones reads as 0 (assumed) */
    #define SAFE_BITS_ERROR  (-1)

    untSB cnt1bits(untSB bigBit)
    {
        /* This function counts the number of ones in the input parameter.
         * The distribution of faults could be considered as well
         * (if several faults are in sequence or spread around). */
        untSB ii;
        untSB jj = 0;
        for (ii = 0; ii < SAFE_BITS_SIZE; ii++)   /* loop bound assumed */
            jj += ((bigBit >> ii) & 1);
        return jj;
    }

    untSB bigbit2bit(untSB bigbit)
    {
        /* This is the read function returning either a 0 or a 1.
         * If error it returns -1. */
        untSB cnt = cnt1bits(bigbit);
        if (cnt > SAFE_BITS_1)
            return 1;
        else if (cnt > SAFE_BITS_0)
            return SAFE_BITS_ERROR;   /* error */
        else
            return 0;
    }

    untSB bit2bigbit(untSB bit)
    {
        /* This is the write function returning either a replicated 0 or 1
         * (e.g. for a 32-bit replica 0x00000000 or 0xFFFFFFFF). */
        if (0 == bit)
            return SAFE_BITS_ZEROS;
        else if (1 == bit)
            return SAFE_BITS_ONES;
        else
            return SAFE_BITS_ERROR;   /* error */
    }

Advantages of the invention include:

- Data is always stored according to the required probability of failure, based on accepted experience data.

- It is easy to adjust the probability of failure to the required Safety Integrity Level (SIL) and to prove this.

- The amount of data handled by this algorithm is costly to process and must be carefully planned for. When multiple cores are configured as a redundant system (claiming HFT > 1), core integrity must be preserved. This can be achieved by handling the MMU and HV configuration data reliably using this replication algorithm.

- The new generation of multi-core CPUs opens up new possibilities to implement this kind of functionality, by letting a CPU core that is separated from time-critical operations perform the replicated read/write job without affecting the performance of the system.

- Instead of using 0xFFFFFFFF to represent a binary 1 and 0x00000000 to represent a binary 0, a "magic number" could be used to represent the binary values. This avoids the problem that clearing a memory area (filling it with 0x00 or 0xFF) would otherwise leave values that decode as valid but are incorrect. For example, the magic number 0x55555555 could represent a binary 1 and 0xAAAAAAAA (in this case the one's complement of 0x55555555) a binary 0.

In this case, the first representation symbol RS, which represents the bit and comprises a number of bits corresponding to the desired degree of reliability, is a magic number. This magic number may comprise a combination of ones and zeroes, for instance alternating ones and zeroes.

Furthermore, in order to obtain the magic number, the storing control unit may, as an example, first replicate the bit to obtain an intermediate symbol and then invert bits in the intermediate symbol to obtain the first representation symbol RS. It is also possible that the first representation symbol is assigned to a first value of the bit and a second representation symbol is assigned to a second value of the bit. Since the bits are binary, with a value v, and the second representation symbol is a second magic number for the opposite value of the bit, the two representation symbols may be related to each other. The second representation symbol may for instance be the one's complement of the first representation symbol. As can also be seen above, the inverting may comprise inverting the even bits or inverting the odd bits. In that case the first representation symbol is a combination of ones and zeroes and the second representation symbol is the one's complement of the first symbol.
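The replicate-then-invert construction can be checked with a short sketch, assuming 32-bit symbols and inversion of the odd bits (positions 1, 3, 5, ...). The names magic_symbol, bit_distance and ODD_BITS_MASK are hypothetical. The construction reproduces 0x55555555 for a 1 and 0xAAAAAAAA for a 0, and shows why cleared memory is rejected: both 0x00000000 and 0xFFFFFFFF lie exactly 16 bits from either symbol, so any acceptance threshold above 16 matching bits treats them as errors.

```c
#include <stdint.h>

#define ODD_BITS_MASK 0xAAAAAAAAu   /* bits 1, 3, 5, ..., 31 set */

/* Builds the representation symbol for bit value v: replicate v into an
 * intermediate 32-bit symbol, then invert its odd bits. */
uint32_t magic_symbol(int v)
{
    uint32_t intermediate = v ? 0xFFFFFFFFu : 0x00000000u;
    return intermediate ^ ODD_BITS_MASK;
}

/* Number of bit positions in which a and b differ. */
int bit_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int d = 0;
    while (x != 0u) {
        d += (int)(x & 1u);
        x >>= 1;
    }
    return d;
}
```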

Next is sample code showing an implementation of this algorithm (the type untSB, the SB_* constants other than the two magic numbers, and the loop bound are not given in the source; the definitions below are assumed examples):

    #include <stdint.h>

    /* Assumed definitions. */
    typedef int64_t untSB;
    #define SB_BITS     32          /* symbol width in bits (assumed) */
    #define SB_MAGIC_1  0x55555555  /* represents binary 1 */
    #define SB_MAGIC_0  0xAAAAAAAA  /* represents binary 0 (one's complement) */
    #define SB_SUM_HIT  24          /* hits needed to accept a value (assumed) */
    #define SB_ERROR    (-1)

    /*********************************************************************
     * COMPARE MAGIC NUMBER STATISTICALLY
     *********************************************************************/
    static untSB complbit(untSB bigBit)
    {
        /* This function compares the magic number bit by bit, counting the
         * number of hits. The result is either 0, 1, or SB_ERROR.
         * (Could also consider if several faults are in sequence or spread around.) */
        untSB ii;
        untSB j0 = 0;
        untSB j1 = 0;
        for (ii = 0; ii < SB_BITS; ii++) {         /* loop bound assumed */
            /* A bit equal to SB_MAGIC_1 is a hit for 1; since SB_MAGIC_0 is
             * its complement, any other bit is a hit for 0. */
            if (((bigBit >> ii) & 1) == ((SB_MAGIC_1 >> ii) & 1))
                j1++;
            else
                j0++;
        }
        if (j1 >= SB_SUM_HIT)
            return 1;
        else if (j0 >= SB_SUM_HIT)
            return 0;
        else
            return SB_ERROR;
    }

    /*********************************************************************
     * ONE BIT CONVERSION
     *********************************************************************/
    static untSB bigBit2bit(untSB bigBit)
    {
        /* This is the read function returning either a plain 0 or a 1. */
        if (bigBit == SB_MAGIC_1)
            return 1;
        else if (bigBit == SB_MAGIC_0)
            return 0;
        else
            return complbit(bigBit);
    }

    static untSB bit2bigBit(untSB bit)
    {
        /* This is the write function returning a safe (magic) 0 or 1. */
        if (1 == bit)
            return SB_MAGIC_1;
        else if (0 == bit)
            return SB_MAGIC_0;
        else
            return SB_ERROR;   /* error */
    }

The data reliability handling device and the symbol investigating device, and therefore also the storing control unit and the investigating unit, may be provided in the form of one or more processors with associated program memories comprising computer program code with computer program instructions executable by the processor for performing the functionality of these units.

The computer program code of a device may also be in the form of a computer program on a data carrier, for instance a CD-ROM disc or a memory stick. In this case, the data carrier carries a computer program with the computer program code, which will implement the functionality of the above-described data reliability handling device or symbol investigating device. One such data carrier 18 with computer program code 20 is schematically shown in fig. 3.

While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the invention is not to be limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements. Therefore, the invention is to be limited only by the following claims.

Claims (19)

1. A data reliability handling device (10) comprising a storing control unit (12) configured to: obtain a data bit (11), obtain an indication of a desired degree of reliability of the bit, provide a first representation symbol (RS) representing the value of the bit and comprising a number of bits corresponding to the desired degree of reliability, and store the first representation symbol in a location (L) of a data storage (16).

2. The data reliability handling device (10) according to claim 1, wherein the storing control unit is configured to replicate the bit a number of times for obtaining the first representation symbol.

3. The data reliability handling device (22) according to claim 1, wherein the storing control unit is configured to provide the first representation symbol as a combination of ones and zeroes.

4. The data reliability handling device (22) according to claim 3, wherein the first representation symbol (RS) comprises alternating ones and zeroes.

5. The data reliability handling device according to claim 3 or 4, wherein the bit is binary and has the value (v) and there is a second representation symbol for the opposite value (vi), which second representation symbol is the one's complement of the first representation symbol.

6. The data reliability handling device according to any previous claim, further comprising an investigating unit (14) configured to read a bit pattern (BP) from the location (L), compare the bit pattern (BP) with the first representation symbol (RS) and indicate a first bit value (V) if a sufficient number of bits of the bit pattern are identical to corresponding bits of the first representation symbol, and indicate a fault if the comparison is unsuccessful.

7. A method for increasing the reliability of a bit, the method being performed by a data reliability handling device (10) and comprising obtaining a data bit (11), obtaining an indication of a desired degree of reliability of the bit, providing a first representation symbol (RS) representing the value of the bit and comprising a number of bits corresponding to the desired degree of reliability, and storing the first representation symbol in a location (L) of a data storage (16).

8. The method according to claim 7, wherein the obtaining of the first representation symbol comprises replicating the bit a number of times.

9. The method according to claim 7, wherein the providing of the first representation symbol comprises providing the first representation symbol as a combination of ones and zeroes.

10. The method according to claim 9, wherein the first representation symbol (RS) comprises alternating ones and zeroes.

11. The method according to claim 9 or 10, wherein the bit is binary and has the value (v) and there is a second representation symbol for the opposite value (vi), which second representation symbol is the one's complement of the first representation symbol.

12. A computer program product for increasing the reliability of a bit, the computer program product comprising a data carrier (18) with computer program code (20) which, when run in a data reliability increasing device (10), causes the data reliability increasing device (10) to: obtain a data bit (11), obtain an indication of a desired degree of reliability of the bit, provide a first representation symbol (RS) representing the value of the bit and comprising a number of bits corresponding to the desired degree of reliability, and store the first representation symbol in a location (L) of a data storage (16).

13. A symbol investigating device for investigating data stored in a location (L) of a data storage (16) and comprising an investigating unit (12) configured to read a bit pattern (BP) from the location (L), compare the bit pattern with a first representation symbol (RS) representing a first bit value and indicate the first bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the first representation symbol, compare the bit pattern with a second representation symbol representing a second bit value and indicate the second bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the second representation symbol, and indicate a fault if no comparison is successful.

14. The symbol investigating device according to claim 13, wherein all the bits of a representation symbol have the same value (v).

15. The symbol investigating device according to claim 13, wherein the first representation symbol is a combination of ones and zeroes and the second representation symbol is the one's complement of the first symbol.

16. A method for investigating data stored in a location (L) of a data storage (16), the method being performed in a symbol investigating device and comprising reading a bit pattern (BP) from the location (L), comparing the bit pattern with a first representation symbol (RS) representing a first bit value and indicating the first bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the first representation symbol, comparing the bit pattern with a second representation symbol representing a second bit value and indicating the second bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the second representation symbol, and indicating a fault if no comparison is successful.

17. The method according to claim 16, wherein all the bits of a representation symbol have the same value (v).

18. The method according to claim 16, wherein the first representation symbol is a combination of ones and zeroes and the second representation symbol is the one's complement of the first symbol.

19. A computer program product for investigating data stored in a location (L) of a data storage (16), the computer program product comprising a data carrier (18) with computer program code (20) which, when run in a symbol investigating device, causes the symbol investigating device to: read a bit pattern (BP) from the location, compare the bit pattern (BP) with a first representation symbol (RS) representing a first bit value and indicate the first bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the first representation symbol, compare the bit pattern with a second representation symbol representing a second bit value and indicate the second bit value if a sufficient number of bits of the bit pattern are identical to corresponding bits of the second representation symbol, and indicate a fault if no comparison is successful.
SE1300783A 2013-12-20 2013-12-20 Handling soft errors in connection with data storage SE1300783A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SE1300783A SE1300783A1 (en) 2013-12-20 2013-12-20 Handling soft errors in connection with data storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
SE1300783A SE1300783A1 (en) 2013-12-20 2013-12-20 Handling soft errors in connection with data storage

Publications (1)

Publication Number Publication Date
SE1300783A1 (en) 2013-12-23

Family

ID=49921039

Family Applications (1)

Application Number Title Priority Date Filing Date
SE1300783A SE1300783A1 (en) 2013-12-20 2013-12-20 Handling soft errors in connection with data storage

Country Status (1)

Country Link
SE (1) SE1300783A1 (en)

Similar Documents

Publication Publication Date Title
US10789117B2 (en) Data error detection in computing systems
US9747148B2 (en) Error monitoring of a memory device containing embedded error correction
Nair et al. XED: Exposing on-die error detection information for strong memory reliability
US8010875B2 (en) Error correcting code with chip kill capability and power saving enhancement
US9535784B2 (en) Self monitoring and self repairing ECC
US9208027B2 (en) Address error detection
KR20130033416A (en) Methods and apparatus to protect segments of memory
US10606692B2 (en) Error correction potency improvement via added burst beats in a dram access cycle
US20140344643A1 (en) Hybrid memory protection method and apparatus
US8707133B2 (en) Method and apparatus to reduce a quantity of error detection/correction bits in memory coupled to a data-protected processor port
Gottscho et al. Software-defined error-correcting codes
US20140281681A1 (en) Error correction for memory systems
Kajmakovic et al. Flexible soft error mitigation strategy for memories in mixed-critical systems
Longofono et al. Predicting and mitigating single-event upsets in DRAM using HOTH
US9043655B2 (en) Apparatus and control method
TW202246979A (en) Error rates for memory with built in error correction and detection
Henderson Power8 processor-based systems ras
SE1300783A1 (en) Handling soft errors in connection with data storage
TWI509622B (en) Fault bits scrambling memory and method thereof
Argyrides et al. Decimal Hamming: a software-implemented technique to cope with soft errors
Dopson SoftECC: A system for software memory integrity checking
US9921906B2 (en) Performing a repair operation in arrays
Kajmakovic et al. Challenges in mitigating soft errors in safety-critical systems with cots microprocessors
GB2455212A (en) Error detection in processor status register files
US11809272B2 (en) Error correction code offload for a serially-attached memory device

Legal Events

Date Code Title Description
NAV Patent application has lapsed