Documentation: mtd: improve nand_ecc.txt for readability and correctness
This patch corrects some representation errors, adds a little clarification in some places, and fixes indentation problems in the pseudo code. It also deletes one extra white space in one place.

Signed-off-by: Wang YanQing <udknight@gmail.com>
[Brian: a few tweaks]
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Parent: 1b15b1f5a0
Commit: fc5adbebac
@@ -107,7 +107,7 @@ for (i = 0; i < 256; i++)
     if (i & 0x01)
       rp1 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
     else
-      rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp1;
+      rp0 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp0;
     if (i & 0x02)
       rp3 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp3;
     else
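The fix in this hunk (the else branch must accumulate into rp0, not rp1) can be illustrated with a minimal C sketch of the byte-wise loop. The helper names `byte_parity` and `row_parity_01` are hypothetical, and only rp0 and rp1 are computed; this is an illustration under those assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parity of one byte: bit7 ^ bit6 ^ ... ^ bit0 (hypothetical helper). */
static unsigned byte_parity(uint8_t b)
{
    b ^= b >> 4;
    b ^= b >> 2;
    b ^= b >> 1;
    return b & 1u;
}

/* Hypothetical helper: accumulates only rp0 and rp1 over a buffer. */
static void row_parity_01(const uint8_t *buf, size_t len,
                          unsigned *rp0, unsigned *rp1)
{
    assert(buf && rp0 && rp1);
    *rp0 = 0;
    *rp1 = 0;
    for (size_t i = 0; i < len; i++) {
        if (i & 0x01)
            *rp1 ^= byte_parity(buf[i]);   /* odd-indexed bytes */
        else
            *rp0 ^= byte_parity(buf[i]);   /* even-indexed bytes */
    }
}
```

With the old pseudo code, every even-indexed byte would have overwritten rp0 with a value derived from rp1, so rp0 would not have been a running parity at all.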
@@ -127,7 +127,7 @@ for (i = 0; i < 256; i++)
     if (i & 0x20)
       rp11 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp11;
     else
       rp10 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp10;
     if (i & 0x40)
       rp13 = bit7 ^ bit6 ^ bit5 ^ bit4 ^ bit3 ^ bit2 ^ bit1 ^ bit0 ^ rp13;
     else
@@ -158,7 +158,7 @@ the values in any order. So instead of calculating all the bits
 individually, let us try to rearrange things.
 For the column parity this is easy. We can just xor the bytes and in the
 end filter out the relevant bits. This is pretty nice as it will bring
-all cp calculation out of the if loop.
+all cp calculation out of the for loop.
 
 Similarly we can first xor the bytes for the various rows.
 This leads to:
@@ -271,11 +271,11 @@ to write our code in such a way that we process data in 32 bit chunks.
 Of course this means some modification as the row parity is byte by
 byte. A quick analysis:
 for the column parity we use the par variable. When extending to 32 bits
-we can in the end easily calculate p0 and p1 from it.
+we can in the end easily calculate rp0 and rp1 from it.
 (because par now consists of 4 bytes, contributing to rp1, rp0, rp1, rp0
-respectively)
+respectively, from MSB to LSB)
 also rp2 and rp3 can be easily retrieved from par as rp3 covers the
-first two bytes and rp2 the last two bytes.
+first two MSBs and rp2 covers the last two LSBs.
 
 Note that of course now the loop is executed only 64 times (256/4).
 And note that care must taken wrt byte ordering. The way bytes are
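The clarified wording in this hunk (MSB to LSB the four bytes of par feed rp1, rp0, rp1, rp0; rp3 covers the two most significant bytes and rp2 the two least significant ones) can be checked with a small C sketch. The struct `low_rp` and the functions below are hypothetical names, the accumulators are kept byte-wide (still to be reduced to single parity bits), and the word is packed explicitly with byte 0 as the LSB so the sketch does not depend on host byte ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wide accumulators for the four lowest row parities. */
struct low_rp { uint8_t rp0, rp1, rp2, rp3; };

/* Reference: byte-by-byte over one group of four bytes b[0..3]. */
static struct low_rp low_rp_bytewise(const uint8_t b[4])
{
    struct low_rp r = {0, 0, 0, 0};
    for (unsigned i = 0; i < 4; i++) {
        if (i & 0x01) r.rp1 ^= b[i]; else r.rp0 ^= b[i];
        if (i & 0x02) r.rp3 ^= b[i]; else r.rp2 ^= b[i];
    }
    return r;
}

/* Same values taken from a 32 bit par whose LSB is b[0] and MSB is b[3]. */
static struct low_rp low_rp_from_par(uint32_t par)
{
    struct low_rp r;
    r.rp3 = (uint8_t)((par >> 24) ^ (par >> 16)); /* two most significant bytes */
    r.rp2 = (uint8_t)((par >> 8) ^ par);          /* two least significant bytes */
    r.rp1 = (uint8_t)((par >> 24) ^ (par >> 8));  /* bytes 3 and 1 */
    r.rp0 = (uint8_t)((par >> 16) ^ par);         /* bytes 2 and 0 */
    return r;
}

/* Packs four bytes with b[0] as the least significant byte. */
static uint32_t pack_le(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Asserts that both routes agree for one group of four bytes. */
static void check_group(const uint8_t b[4])
{
    struct low_rp a = low_rp_bytewise(b);
    struct low_rp w = low_rp_from_par(pack_le(b));
    assert(a.rp0 == w.rp0 && a.rp1 == w.rp1);
    assert(a.rp2 == w.rp2 && a.rp3 == w.rp3);
}
```

Because xor is linear, the same extraction applies when par is the xor of all 64 words rather than a single group.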
@@ -387,11 +387,11 @@ Analysis 2
 
 The code (of course) works, and hurray: we are a little bit faster than
 the linux driver code (about 15%). But wait, don't cheer too quickly.
-THere is more to be gained.
+There is more to be gained.
 If we look at e.g. rp14 and rp15 we see that we either xor our data with
 rp14 or with rp15. However we also have par which goes over all data.
 This means there is no need to calculate rp14 as it can be calculated from
-rp15 through rp14 = par ^ rp15;
+rp15 through rp14 = par ^ rp15, because par = rp14 ^ rp15;
 (or if desired we can avoid calculating rp15 and calculate it from
 rp14). That is why some places refer to inverse parity.
 Of course the same thing holds for rp4/5, rp6/7, rp8/9, rp10/11 and rp12/13.
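The identity added in this hunk, par = rp14 ^ rp15, follows because every word is xored into par and into exactly one of rp14 or rp15. A minimal C sketch, assuming 32 bit words and the `i & 0x20` selector used for rp15 elsewhere in this patch; the struct `rp_pair` and both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three accumulators. */
struct rp_pair {
    uint32_t par;   /* xor of every word */
    uint32_t rp14;  /* xor of words with (i & 0x20) == 0 */
    uint32_t rp15;  /* xor of words with (i & 0x20) != 0 */
};

/* Straightforward version: both halves are accumulated in the loop. */
static struct rp_pair accumulate(const uint32_t *words, size_t n)
{
    struct rp_pair r = {0, 0, 0};
    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        r.par ^= words[i];
        if (i & 0x20)
            r.rp15 ^= words[i];
        else
            r.rp14 ^= words[i];
    }
    return r;
}

/* Derived version: rp14 is never touched inside the loop. */
static struct rp_pair accumulate_derived(const uint32_t *words, size_t n)
{
    struct rp_pair r = {0, 0, 0};
    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        r.par ^= words[i];
        if (i & 0x20)
            r.rp15 ^= words[i];
    }
    r.rp14 = r.par ^ r.rp15;   /* valid because par = rp14 ^ rp15 */
    return r;
}
```

Both functions return the same three values for any input. Whether the derived form is actually faster depends on the compiler and target, so it should be measured rather than assumed.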
@@ -419,12 +419,12 @@ with
     if (i & 0x20) rp15 ^= cur;
 
 and outside the loop added:
     rp4 = par ^ rp5;
     rp6 = par ^ rp7;
     rp8 = par ^ rp9;
     rp10 = par ^ rp11;
     rp12 = par ^ rp13;
     rp14 = par ^ rp15;
 
 And after that the code takes about 30% more time, although the number of
 statements is reduced. This is also reflected in the assembly code.
@@ -524,12 +524,12 @@ THe code within the for loop was changed to:
 
     cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp4 ^= cur;
     cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
 
     cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur; rp8 ^= cur;
     cur = *bp++; tmppar ^= cur; rp6 ^= cur; rp8 ^= cur;
     cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp8 ^= cur;
     cur = *bp++; tmppar ^= cur; rp8 ^= cur;
 
     cur = *bp++; tmppar ^= cur; rp4 ^= cur; rp6 ^= cur;
@@ -537,7 +537,7 @@ THe code within the for loop was changed to:
     cur = *bp++; tmppar ^= cur; rp4 ^= cur;
     cur = *bp++; tmppar ^= cur;
 
     par ^= tmppar;
     if ((i & 0x1) == 0) rp12 ^= tmppar;
     if ((i & 0x2) == 0) rp14 ^= tmppar;
 }
@@ -548,8 +548,8 @@ to rp12 and rp14.
 
 While making the changes I also found that I could exploit that tmppar
 contains the running parity for this iteration. So instead of having:
-    rp4 ^= cur; rp6 = cur;
+    rp4 ^= cur; rp6 ^= cur;
-I removed the rp6 = cur; statement and did rp6 ^= tmppar; on next
+I removed the rp6 ^= cur; statement and did rp6 ^= tmppar; on next
 statement. A similar change was done for rp8 and rp10
 
 
@@ -593,22 +593,22 @@ The new code now looks like:
 
     cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp4 ^= cur;
     cur = *bp++; tmppar ^= cur; rp10 ^= tmppar;
 
     notrp8 = tmppar;
     cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp4 ^= cur;
     cur = *bp++; tmppar ^= cur;
     rp8 = rp8 ^ tmppar ^ notrp8;
 
     cur = *bp++; tmppar ^= cur; rp4_6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp6 ^= cur;
     cur = *bp++; tmppar ^= cur; rp4 ^= cur;
     cur = *bp++; tmppar ^= cur;
 
     par ^= tmppar;
     if ((i & 0x1) == 0) rp12 ^= tmppar;
     if ((i & 0x2) == 0) rp14 ^= tmppar;
 }
@@ -700,7 +700,7 @@ Conclusion
 The gain when calculating the ecc is tremendous. Om my development hardware
 a speedup of a factor of 18 for ecc calculation was achieved. On a test on an
 embedded system with a MIPS core a factor 7 was obtained.
 On a test with a Linksys NSLU2 (ARMv5TE processor) the speedup was a factor
 5 (big endian mode, gcc 4.1.2, -O3)
 For correction not much gain could be obtained (as bitflips are rare). Then
 again there are also much less cycles spent there.