md/raid10: fix the 'new' raid10 layout to work correctly.

In Linux 3.9 we introduced a new 'far' layout for RAID10 which was
supposed to rotate the replicas differently and so provide better
resilience.  In particular it could survive more combinations of 2
drive failures.

Unfortunately, due to a coding error, this sometimes did what was
wanted, sometimes improved less than we hoped, and sometimes - in very
unlikely circumstances - put multiple replicas on the same device so
the redundancy was harmed.

No public user-space tool has created arrays using this layout so it
is very unlikely that zero-redundancy arrays actually exist.  Probably
no arrays using any form of the new layout exist.  But we cannot be
certain.

So use another bit in the 'layout' number and introduce a bug-fixed
version of the layout.
Also when assembling an array, if it has a zero-redundancy layout,
give a warning.

Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
NeilBrown 2015-10-22 13:20:15 +11:00
Parent c340702ca2
Commit 8bce6d35b3
1 changed file with 20 additions and 2 deletions

@@ -39,6 +39,7 @@
  *    far_copies (stored in second byte of layout)
  *    far_offset (stored in bit 16 of layout )
  *    use_far_sets (stored in bit 17 of layout )
+ *    use_far_sets_bugfixed (stored in bit 18 of layout )
  *
  * The data to be stored is divided into chunks using chunksize. Each device
  * is divided into far_copies sections. In each section, chunks are laid out
@@ -1497,6 +1498,8 @@ static void status(struct seq_file *seq, struct mddev *mddev)
 			seq_printf(seq, " %d offset-copies", conf->geo.far_copies);
 		else
 			seq_printf(seq, " %d far-copies", conf->geo.far_copies);
+		if (conf->geo.far_set_size != conf->geo.raid_disks)
+			seq_printf(seq, " %d devices per set", conf->geo.far_set_size);
 	}
 	seq_printf(seq, " [%d/%d] [", conf->geo.raid_disks,
 					conf->geo.raid_disks - mddev->degraded);
@@ -3394,7 +3397,7 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new)
 		disks = mddev->raid_disks + mddev->delta_disks;
 		break;
 	}
-	if (layout >> 18)
+	if (layout >> 19)
 		return -1;
 	if (chunk < (PAGE_SIZE >> 9) ||
 	    !is_power_of_2(chunk))
@@ -3406,7 +3409,22 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new)
 	geo->near_copies = nc;
 	geo->far_copies = fc;
 	geo->far_offset = fo;
-	geo->far_set_size = (layout & (1<<17)) ? disks / fc : disks;
+	switch (layout >> 17) {
+	case 0:	/* original layout. simple but not always optimal */
+		geo->far_set_size = disks;
+		break;
+	case 1: /* "improved" layout which was buggy. Hopefully no-one is
+		 * actually using this, but leave code here just in case.*/
+		geo->far_set_size = disks/fc;
+		WARN(geo->far_set_size < fc,
+		     "This RAID10 layout does not provide data safety - please backup and create new array\n");
+		break;
+	case 2: /* "improved" layout fixed to match documentation */
+		geo->far_set_size = fc * nc;
+		break;
+	default: /* Not a valid layout */
+		return -1;
+	}
 	geo->chunk_mask = chunk - 1;
 	geo->chunk_shift = ffz(~chunk);
 	return nc*fc;
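
The decoding performed by the last two hunks can be tried outside the kernel with a small userspace sketch. It is an illustration only, not the kernel code: `struct layout_info`, `decode_layout()` and the `unsafe` flag are hypothetical names, the placement of near_copies in the low byte is taken from the part of the raid10.c header comment that this diff does not show, and the rejection of zero copy counts is an added guard so the sketch cannot divide by zero.

```c
#include <stdio.h>

/* Hypothetical userspace mirror of the fields setup_geo() fills in. */
struct layout_info {
	int near_copies;
	int far_copies;
	int far_offset;
	int far_set_size;
	int unsafe;	/* 1 where the patch would emit its WARN() */
};

/* Decode a RAID10 'layout' number for an array of 'disks' devices.
 * Returns 0 on success, -1 if the layout is not valid. */
static int decode_layout(unsigned int layout, int disks,
			 struct layout_info *out)
{
	int nc = layout & 255;		/* low byte (assumed, see lead-in) */
	int fc = (layout >> 8) & 255;	/* second byte */
	int fo = (layout >> 16) & 1;	/* bit 16 */

	if (layout >> 19)		/* bits above 18 are not defined */
		return -1;
	if (nc == 0 || fc == 0)		/* guard added for this sketch only */
		return -1;

	out->near_copies = nc;
	out->far_copies = fc;
	out->far_offset = fo;
	out->unsafe = 0;

	switch (layout >> 17) {
	case 0:	/* original layout: one set spanning all devices */
		out->far_set_size = disks;
		break;
	case 1:	/* buggy "improved" layout, kept for existing arrays */
		out->far_set_size = disks / fc;
		out->unsafe = out->far_set_size < fc;
		break;
	case 2:	/* bug-fixed "improved" layout */
		out->far_set_size = fc * nc;
		break;
	default: /* bits 17 and 18 both set: not a valid layout */
		return -1;
	}
	return 0;
}

/* Print the decoded fields in a single line, for quick inspection. */
static void print_layout(const struct layout_info *li)
{
	printf("near=%d far=%d offset=%d set_size=%d unsafe=%d\n",
	       li->near_copies, li->far_copies, li->far_offset,
	       li->far_set_size, li->unsafe);
}
```

Calling `decode_layout(0x40201, 4, &li)` followed by `print_layout(&li)` decodes a bug-fixed layout with one near copy and two far copies on four devices. Setting bit 17 instead (`0x20201`) selects the old buggy calculation, and setting both bits (`0x60201`) is rejected, which matches the `default:` branch in the patch.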