Our Mission

L.A.M.P. refers to a toolset consisting of Linux + Apache + MySQL + PHP|Perl|Python|Postgresql... providing a powerful framework for rapid development of robust solutions. Each component is open source and has proven to be enterprise ready.

Navigation

Syndicate

Syndicate content

lampgroup.org

Why OpenSource Is Superior

Merry Happy Christmas Holiday New Years!

Anyway, the holidays are always stressful... so hardware never fails to fail at this wonderful time of the year.

In the last few days I've replaced three hard drives, a wireless access point, cable modem and power supply. This is after 3+ years of reliable service from most of the failed components. Cable modems never seem to last more than 18 months -- I think I'm pretty rough on 'em bandwidth-wise.

On the software side, I keep losing raid volumes on the new OpenSuSE testing box. Rebuilds are easy but annoying and I've been unable to determine why.

Googling 'mdadmd' gave me some interesting results and the following developer exchange. It's technical but you can see how openness and peer review can only result in superior results. It may be messy and slow seeming, but that's just the natural product of an open development process.

I guarantee that if you were to see the inner workings of any large proprietary software house, you would see that OpenSource is more organized and efficient in producing the best possible code. These aren't wild-eyed zealots, just persons solving problems.

Hope you all have a great whatever-you-celebrate.

Kelley G

************************************************************

[patch] md superblock update failures
--vkogqOf2sHV7VnPd
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Mark found a bug where md doesn't handle write failures when trying to update the superblock.

Attached is the fix he sent to us, and which seems to apply fine to 2.6.11 too.

Sincerely,
Lars Marowsky-Brée

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

--vkogqOf2sHV7VnPd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=md-superblock-failures

From: Mark Rustad
Subject: md does not handle write failures for the superblock
Patch-mainline: 2.6.12
References: 65306

Description by Mark:

I have found that superblock updates that experience write failures to a raid component device, do not fail the device out of the raid. This results in the raid superblock being updated 100 times and ultimately simply fails. It takes a different type of failing access to the failed device to finally fail the device out of the raid. This can be seen by
simply pulling out a raid device in an idle system (but with sgraidmon & mdadmd running).

The following patch will fail the failing device out of the raid after the attempted superblock update and then retry the update with one fewer device. This seems to work very well in our system.

Acked-by: Jens Axboe
Signed-off-by: Lars Marowsky-Bree

Index: linux-2.6.5/drivers/md/md.c
============================================================ =======
--- linux-2.6.5.orig/drivers/md/md.c 2005-03-16 13:57:10.381445927 +0100
+++ linux-2.6.5/drivers/md/md.c 2005-03-16 13:57:10.714396523 +0100
[at] [at] -1115,6 +1115,7 [at] [at] static void export_array(mddev_t *mddev)
{
struct list_head *tmp;
mdk_rdev_t *rdev;
+ mdk_rdev_t *frdev;

ITERATE_RDEV(mddev,rdev,tmp) {
if (!rdev->mddev) {
[at] [at] -1288,6 +1289,7 [at] [at] repeat:
mdname(mddev),mddev->in_sync);

err = 0;
+ frdev = 0;
ITERATE_RDEV(mddev,rdev,tmp) {
char b[BDEVNAME_SIZE];
dprintk(KERN_INFO "md: ");
[at] [at] -1296,13 +1298,21 [at] [at] repeat:

dprintk("%s ", bdevname(rdev->bdev,b));
if (!rdev->faulty) {
- err += write_disk_sb(rdev);
+ int ret;
+ ret = write_disk_sb(rdev);
+ if (ret) {
+ frdev = rdev; /* Save failed device */
+ err += ret;
+ }
} else
dprintk(")\n");
if (!err && mddev->level == LEVEL_MULTIPATH)
/* only need to write one superblock... */
break;
}
+ if (frdev)
+ md_error(mddev, frdev); /* Fail the failed device */
+
if (err) {
if (--count) {
printk(KERN_ERR "md: errors occurred during superblock"

--vkogqOf2sHV7VnPd--
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lars Marowsky-Bree [ Mi, 16 März 2005 14:05 ] [ ID #698421 ]
Re: [patch] md superblock update failures
--lEGEL1/lMxI0MVQ2
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On 2005-03-16T14:05:12, Lars Marowsky-Bree wrote:

> Mark found a bug where md doesn't handle write failures when trying to
> update the superblock.
>
> Attached is the fix he sent to us, and which seems to apply fine to
> 2.6.11 too.

Oops, sorry. Broken diff due to yours truely. Attached patch actually compiles.

Sincerely,
Lars Marowsky-Brée

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

--lEGEL1/lMxI0MVQ2
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=md-superblock-failures

From: Mark Rustad
Subject: md does not handle write failures for the superblock
Patch-mainline: 2.6.12
References: 65306

Description by Mark:

I have found that superblock updates that experience write failures to a raid component device, do not fail the device out of the raid. This results in the raid superblock being updated 100 times and ultimately simply fails. It takes a different type of failing access to the failed device to finally fail the device out of the raid. This can be seen by
simply pulling out a raid device in an idle system (but with sgraidmon & mdadmd running).

The following patch will fail the failing device out of the raid after the attempted superblock update and then retry the update with one fewer device. This seems to work very well in our system.

Acked-by: Jens Axboe
Signed-off-by: Lars Marowsky-Bree

Index: linux-2.6.5/drivers/md/md.c
============================================================ =======
--- linux-2.6.5.orig/drivers/md/md.c 2005-03-16 13:57:10.381445927 +0100
+++ linux-2.6.5/drivers/md/md.c 2005-03-16 13:57:10.714396523 +0100
[at] [at] -1115,6 +1115,7 [at] [at] static void export_array(mddev_t *mddev)
{
struct list_head *tmp;
mdk_rdev_t *rdev;
+ mdk_rdev_t *frdev;

ITERATE_RDEV(mddev,rdev,tmp) {
if (!rdev->mddev) {
[at] [at] -1288,6 +1289,7 [at] [at] repeat:
mdname(mddev),mddev->in_sync);

err = 0;
+ frdev = 0;
ITERATE_RDEV(mddev,rdev,tmp) {
char b[BDEVNAME_SIZE];
dprintk(KERN_INFO "md: ");
[at] [at] -1296,13 +1298,21 [at] [at] repeat:

dprintk("%s ", bdevname(rdev->bdev,b));
if (!rdev->faulty) {
- err += write_disk_sb(rdev);
+ int ret;
+ ret = write_disk_sb(rdev);
+ if (ret) {
+ frdev = rdev; /* Save failed device */
+ err += ret;
+ }
} else
dprintk(")\n");
if (!err && mddev->level == LEVEL_MULTIPATH)
/* only need to write one superblock... */
break;
}
+ if (frdev)
+ md_error(mddev, frdev); /* Fail the failed device */
+
if (err) {
if (--count) {
printk(KERN_ERR "md: errors occurred during superblock"

--lEGEL1/lMxI0MVQ2--
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lars Marowsky-Bree [ Mi, 16 März 2005 14:22 ] [ ID #698423 ]
Re: [patch] md superblock update failures
Lars Marowsky-Bree wrote:
[]
> err = 0;
> + frdev = 0;
> ITERATE_RDEV(mddev,rdev,tmp) {
> char b[BDEVNAME_SIZE];
> dprintk(KERN_INFO "md: ");
> [at] [at] -1296,13 +1298,21 [at] [at] repeat:
>
> dprintk("%s ", bdevname(rdev->bdev,b));
> if (!rdev->faulty) {
> - err += write_disk_sb(rdev);
> + int ret;
> + ret = write_disk_sb(rdev);
> + if (ret) {
> + frdev = rdev; /* Save failed device */
> + err += ret;
> + }

Why not
+ int ret;
+ ret = write_disk_sb(rdev);
+ if (ret) {
+ md_error(mddev, rdev);
+ err += ret;
+ }

instead (and remove frdev entirely)?

I mean, in theory, multiple disks may fail...
But wait.. will md_error() call this routine again?

/mjt
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Michael Tokarev [ Mi, 16 März 2005 15:01 ] [ ID #698424 ]
Re: [patch] md superblock update failures
On Wednesday March 16, lmb [at] suse.de wrote:
> Mark found a bug where md doesn't handle write failures when trying to
> update the superblock.
>
> Attached is the fix he sent to us, and which seems to apply fine to
> 2.6.11 too.

Yes... thanks....
The whole '100 times' thing is completely bogus isn't it...

By co-incidence, I've just recently be modifying this code to do writes more intelligently. My goal was to get it to write all the superblocks in parallel rather than in series. The result is below which will probably go to Andrew shortly. It add the required md_error and removes the 'try 100 times'.
It also loops 'round to re-write the superblock if one of the writes failed, thus dirtying the superblock.

NeilBrown

========================================
Allow md to update multiple superblocks in parallel.

currently, md updates all superblocks (one on each device)
in series. It waits for one write to complete before starting
the next. This isn't a big problem as superblock updates don't
happen that often. However it is neater to do it in parallel, and if the drives in the array have gone to "sleep" after a period of idleness, then waking them is parallel is faster (and someone else should be worrying about power drain).

Futher, we will need parallel superblock updates for a future
patch which keeps the intent-logging bitmap near the superblock.

Also remove the silly code that retired superblock updates
100 times. This simply never made sense.

Signed-off-by: Neil Brown

### Diffstat output
./drivers/md/md.c | 83 ++++++++++++++++++++++++--------------------
./include/linux/raid/md_k.h | 1
2 files changed, 48 insertions(+), 36 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-03-14 11:51:29.000000000 +1100
+++ ./drivers/md/md.c 2005-03-15 10:50:13.000000000 +1100
[at] [at] -328,6 +328,40 [at] [at] static void free_disk_sb(mdk_rdev_t * rd
}

+static int super_written(struct bio *bio, unsigned int bytes_done, int error)
+{
+ mdk_rdev_t *rdev = bio->bi_private;
+ if (bio->bi_size)
+ return 1;
+
+ if (error || !test_bit(BIO_UPTODATE, &bio->bi_flags))
+ md_error(rdev->mddev, rdev);
+
+ if (atomic_dec_and_test(&rdev->mddev->pending_writes))
+ wake_up(&rdev->mddev->sb_wait);
+ return 0;
+}
+
+void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
+ sector_t sector, int size, struct page *page)
+{
+ /* write first size bytes of page to sector of rdev
+ * Increment mddev->pending_writes before returning
+ * and decrement it on completion, waking up sb_wait
+ * if zero is reached.
+ * If an error occurred, call md_error
+ */
+ struct bio *bio = bio_alloc(GFP_KERNEL, 1);
+
+ bio->bi_bdev = rdev->bdev;
+ bio->bi_sector = sector;
+ bio_add_page(bio, page, size, 0);
+ bio->bi_private = rdev;
+ bio->bi_end_io = super_written;
+ atomic_inc(&mddev->pending_writes);
+ submit_bio((1<bi_size)
[at] [at] -1240,30 +1274,6 [at] [at] void md_print_devices(void)
}

-static int write_disk_sb(mdk_rdev_t * rdev)
-{
- char b[BDEVNAME_SIZE];
- if (!rdev->sb_loaded) {
- MD_BUG();
- return 1;
- }
- if (rdev->faulty) {
- MD_BUG();
- return 1;
- }
-
- dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
- bdevname(rdev->bdev,b),
- (unsigned long long)rdev->sb_offset);
-
- if (sync_page_io(rdev->bdev, rdev->sb_offset<<1, MD_SB_BYTES, rdev->sb_page, WRITE))
- return 0;
-
- printk("md: write_disk_sb failed for device %s\n",
- bdevname(rdev->bdev,b));
- return 1;
-}
-
static void sync_sbs(mddev_t * mddev)
{
mdk_rdev_t *rdev;
[at] [at] -1278,7 +1288,7 [at] [at] static void sync_sbs(mddev_t * mddev)

static void md_update_sb(mddev_t * mddev)
{
- int err, count = 100;
+ int err;
struct list_head *tmp;
mdk_rdev_t *rdev;
int sync_req;
[at] [at] -1298,6 +1308,7 [at] [at] repeat:
MD_BUG();
mddev->events --;
}
+ mddev->sb_dirty = 2;
sync_sbs(mddev);

/*
[at] [at] -1325,24 +1336,24 [at] [at] repeat:

dprintk("%s ", bdevname(rdev->bdev,b));
if (!rdev->faulty) {
- err += write_disk_sb(rdev);
+ md_super_write(mddev,rdev,
+ rdev->sb_offset<<1, MD_SB_BYTES,
+ rdev->sb_page);
+ dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
+ bdevname(rdev->bdev,b),
+ (unsigned long long)rdev->sb_offset);
+
} else
dprintk(")\n");
if (!err && mddev->level == LEVEL_MULTIPATH)
/* only need to write one superblock... */
break;
}
- if (err) {
- if (--count) {
- printk(KERN_ERR "md: errors occurred during superblock"
- " update, repeating\n");
- goto repeat;
- }
- printk(KERN_ERR \
- "md: excessive errors occurred during superblock update, exiting\n");
- }
+ wait_event(mddev->sb_wait, atomic_read(&mddev->pending_writes)==0);
+ /* if there was a failure, sb_dirty was set to 1, and we re-write super */
+
spin_lock(&mddev->write_lock);
- if (mddev->in_sync != sync_req) {
+ if (mddev->in_sync != sync_req|| mddev->sb_dirty == 1) {
/* have to write it out again */
spin_unlock(&mddev->write_lock);
goto repeat;

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-03-15 10:26:24.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-03-15 10:49:06.000000000 +1100
[at] [at] -262,6 +262,7 [at] [at] struct mddev_s

spinlock_t write_lock;
wait_queue_head_t sb_wait; /* for waiting on superblock updates */
+ atomic_t pending_writes; /* number of active superblock writes */

unsigned int safemode; /* if set, update "clean" superblock
* when no writes pending.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Neil Brown [ Mi, 16 März 2005 23:15 ] [ ID #698429 ]
Re: [patch] md superblock update failures
On 2005-03-17T09:15:39, Neil Brown wrote:

> Yes... thanks....
> The whole '100 times' thing is completely bogus isn't it...

Yes.

> By co-incidence, I've just recently be modifying this code to do
> writes more intelligently. My goal was to get it to write all the
> superblocks in parallel rather than in series. The result is below
> which will probably go to Andrew shortly. It add the required
> md_error and removes the 'try 100 times'.
> It also loops 'round to re-write the superblock if one of the writes
> failed, thus dirtying the superblock.

Your patch is much cleaner and nicer. I'll pick that up when it goes to
Andrew.

Minor cleanup:

> [at] [at] -1325,24 +1336,24 [at] [at] repeat:
>
> dprintk("%s ", bdevname(rdev->bdev,b));
> if (!rdev->faulty) {
> - err +=3D write_disk_sb(rdev);
> + md_super_write(mddev,rdev,
> + rdev->sb_offset<<1, MD_SB_BYTES,
> + rdev->sb_page);
> + dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
> + bdevname(rdev->bdev,b),
> + (unsigned long long)rdev->sb_offset);
> +
> } else
> dprintk(")\n");
> if (!err && mddev->level =3D=3D LEVEL_MULTIPATH)
> /* only need to write one superblock... */
> break;
> }

The "!err &&" part can probably go away, right?

Sincerely,
Lars Marowsky-Br=E9e

--
High Availability & Clustering
SUSE Labs, Research and Development

SUSE LINUX Products GmbH - A Novell Business

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Lars Marowsky-Bree [ Fr, 18 März 2005 11:39 ] [ ID #701673 ]
Re: [patch] md superblock update failures
Lars Marowsky-Bree wrote:
> Minor cleanup:
>
> > [at] [at] -1325,24 +1336,24 [at] [at] repeat:
> >
> > dprintk("%s ", bdevname(rdev->bdev,b));
> > if (!rdev->faulty) {
> > - err += write_disk_sb(rdev);
> > + md_super_write(mddev,rdev,
> > + rdev->sb_offset<<1, MD_SB_BYTES,
> > + rdev->sb_page);
> > + dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
> > + bdevname(rdev->bdev,b),
> > + (unsigned long long)rdev->sb_offset);
> > +
> > } else
> > dprintk(")\n");
> > if (!err && mddev->level == LEVEL_MULTIPATH)
> > /* only need to write one superblock... */
> > break;
> > }

Maintenance-wise, I'd prefer if (write_disk_sb(rdev) != 0) err++, since seeing things which are of signed type tested ultimately only for difference from zero makes me nervous. Who says somebody won't forget the way it's tested here and let write_disk_sb one day return negative for error, and zero for success?

> The "!err &&" part can probably go away, right?

As to your observation, morally I'm with you on that, since we oughtn't need to write more superblocks if there has been an error than if there hasn't.

Peter

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo [at] vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

User login

Who's online

There are currently 0 users and 0 guests online.

Who's new

Popular content

LAMP News