date: Sun, 22 Jun 2008 11:29:56 +0000 (UTC),
group: uk.comp.os.linux
Disk failure imminent? But which disk?
Friday night (actually very early Saturday morning) I started getting
errors from my backups:
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: read error from /dev/vg0/var-backup: Input/output error: [block 1385034, ext2blk 0]: count=173129
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1385034, ext2blk 0]: count=173129
...
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1385147, ext2blk 0]: count=173143
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1385148, ext2blk 0]: count=173143
DUMP: DUMP: DUMP: DUMP: mount: you must specify the filesystem type
and similar this morning:
/sbin/lvcreate -A n -L500M -s -nvar-backup /dev/vg0/var
Logical volume "var-backup" created
/sbin/e2fsck -p /dev/vg0/var-backup
/dev/vg0/var-backup: recovering journal
/dev/vg0/var-backup: clean, 1933/256000 files, 400751/512000 blocks
ssh -e none -i /root/.ssh/id_rsa_backup backup@dhcpdns 'mkdir -p /mnt/backup/dumps/mailserver.20080622.1'
mount | ssh -e none -i /root/.ssh/id_rsa_backup backup@dhcpdns 'cat >/mnt/backup/dumps/mailserver.20080622.1/mount.log'
/sbin/dump -z9 -1u -f - /dev/vg0/var-backup | ssh -e none -i /root/.ssh/id_rsa_backup backup@dhcpdns 'cat >/mnt/backup/dumps/mai
DUMP: Date of this level 1 dump: Sun Jun 22 02:32:12 2008
DUMP: Date of last level 0 dump: Sun Jun 1 02:38:52 2008
DUMP: Dumping /dev/vg0/var-backup (an unlisted file system) to standard output
DUMP: Label: none
DUMP: Writing 10 Kilobyte records
DUMP: Compressing output at compression level 9 (zlib)
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 1383159 blocks.
DUMP: Volume 1 started with block 1 at: Sun Jun 22 02:32:13 2008
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: read error from /dev/vg0/var-backup: Input/output error: [block 1425368, ext2blk 0]: count=178171
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1425368, ext2blk 0]: count=178171
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1425369, ext2blk 0]: count=178171
...
DUMP: read error from /dev/vg0/var-backup: Input/output error: [sector 1425542, ext2blk 0]: count=178192
DUMP: DUMP: DUMP: DUMP: DUMP: DUMP: fopen on /dev/tty fails: No such device or address
DUMP: The ENTIRE dump is aborted.
mount: you must specify the filesystem type
But I can't find what's wrong.
Manually creating the snapshot and running
dump -0 -f /dev/null /dev/vg0/var-backup
works fine.
dd if=/dev/vg0/var of=/dev/null will read the entire partition ok. ditto
dd if=/dev/vg0/var-backup of=/dev/null (although I think in this case
I'm really still mostly reading from /dev/vg0/var). If I create a
separate non snapshot partition then that also reads OK.
The VG is on a RAID on /dev/hda2 and /dev/hdc2
I've done smartctl -t long /dev/hd[ac] and there are no errors. I do
notice that hdc is running hotter than hda: now that the machine is mostly
idle again, hda is 25C while hdc is 43C. While running the tests they were
about 40C and 55C respectively. hdc is newer than hda - the original hdc
(bought at the same time as hda) failed fairly quickly.
smartctl says power-on hours are 8375 and 64324. (I don't believe that
64324 - that's more than 7 years - the tests say they were run at 14465
and 20459 lifetime hours which is more believable - the maxtor site says
the warranty expires on 25th November 2008 for /dev/hda and 9th July
2009 for /dev/hdc)
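
Besides the self-test result, the individual SMART attributes of the two drives can be compared side by side. The following is only a sketch: `smart_raw` is a hypothetical helper, the column positions assume the usual `smartctl -A` table layout, and the attribute names in the usage comment are the common smartmontools ones, which these particular Maxtor drives may or may not report.

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of one SMART attribute.
# $1 is the attribute name; the `smartctl -A` table is read from stdin.
# Assumes the attribute name is column 2 and the raw value starts in column 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example use (needs root and smartmontools; attribute names are assumptions):
#   for d in /dev/hda /dev/hdc; do
#       for a in Reallocated_Sector_Ct Current_Pending_Sector; do
#           printf '%s %s %s\n' "$d" "$a" "$(smartctl -A "$d" | smart_raw "$a")"
#       done
#   done
```

A non-zero reallocated or pending sector count on only one of the two drives would be a hint, not proof, about which one to pull.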
e2fsck -n -f /dev/vg0/var reports no errors.
I want to identify which disk is having problems before I shut down so I
can then pull that disk. What I really don't want is a problem shutting
down and then the raid getting rebuilt from the faulty disk to the good
disk. I know I've got backups from Friday but I'd rather not have to go
through the effort of restoring.
I'm about to try
dd if=/dev/hda of=/dev/null and likewise for /dev/hdc to see if that
flags anything. But is there anywhere else I should be looking? The
entire dump took two minutes so I don't think it's the snapshot volume
getting full.
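
The snapshot theory could be confirmed or ruled out by sampling the snapshot's usage while the dump runs. A minimal sketch, assuming the installed `lvs` supports the `snap_percent` report field; `snap_nearly_full` and the 90% threshold are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the snapshot usage percentage ($1, e.g. "37.52")
# is at or above the threshold ($2, e.g. 90).
snap_nearly_full() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# Example use while the dump is running (assumes the snap_percent field exists):
#   used=$(lvs --noheadings -o snap_percent vg0/var-backup | tr -d ' ')
#   snap_nearly_full "$used" 90 && echo "snapshot var-backup at ${used}%" >&2
```

If the figure stays well below 100% for the whole run, the 500M snapshot is probably not the cause.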
(I've also noticed that dump exits with 0 even when it says "The ENTIRE
dump is aborted")
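
Since the exit status can't be trusted here, a backup script could check dump's stderr instead. This is only a sketch: `dump_log_ok` is a hypothetical helper, the two patterns are copied from the messages quoted above, and the log path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: return 0 only if the captured dump stderr ($1) contains
# neither of the failure messages seen in the logs above.
dump_log_ok() {
    if grep -q -e 'read error from' -e 'The ENTIRE dump is aborted' "$1"; then
        return 1
    fi
    return 0
}

# Example use (the log path and DEST are placeholders):
#   /sbin/dump -z9 -1u -f - /dev/vg0/var-backup 2>/tmp/dump.err | ssh backup@dhcpdns "cat >$DEST"
#   dump_log_ok /tmp/dump.err || echo "dump reported errors" >&2
```

Matching on message text is fragile across dump versions, so this complements the exit status rather than replacing it.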
Tim.
--
God said, "div D = rho, div B = 0, curl E = - @B/@t, curl H = J + @D/@t,"
and there was light.
http://tjw.hn.org/ http://www.locofungus.btinternet.co.uk/
date: Sun, 22 Jun 2008 11:29:56 +0000 (UTC)
author: Tim Woodall
|
Re: Disk failure imminent? But which disk?
On 22/06/2008 12:29, Tim Woodall wrote:
> I'm about to try
> dd if=/dev/hda of=/dev/null and likewise for /dev/hdc to see if that
> flags anything.
That was my first thought, what next would depend on the results ...
date: Sun, 22 Jun 2008 12:41:36 +0100
author: Andy Burns
|
Re: Disk failure imminent? But which disk?
On Sun, 22 Jun 2008 12:41:36 +0100,
Andy Burns wrote:
> On 22/06/2008 12:29, Tim Woodall wrote:
>
>> I'm about to try
>> dd if=/dev/hda of=/dev/null and likewise for /dev/hdc to see if that
>> flags anything.
>
> That was my first thought, what next would depend on the results ...
Nothing :-(
Both disks have read from start to end without a murmur:
hda: Peaked at 47C
80293248+0 records in
80293248+0 records out
41110142976 bytes transferred in 842.769365 seconds (48779826 bytes/sec)
hdc: Peaked at 60C
80293248+0 records in
80293248+0 records out
41110142976 bytes transferred in 820.883806 seconds (50080343 bytes/sec)
Maybe whatever the problem was, my fiddling has sorted it out. But I
don't know what I might have done. We'll see tonight when the next
backup runs.
Tim.
--
God said, "div D = rho, div B = 0, curl E = - @B/@t, curl H = J + @D/@t,"
and there was light.
http://tjw.hn.org/ http://www.locofungus.btinternet.co.uk/
date: Sun, 22 Jun 2008 12:15:08 +0000 (UTC)
author: Tim Woodall
|
Re: Disk failure imminent? But which disk?
Andy Burns wrote:
> On 22/06/2008 12:29, Tim Woodall wrote:
>
>> I'm about to try
>> dd if=/dev/hda of=/dev/null and likewise for /dev/hdc to see if that
>> flags anything.
>
> That was my first thought, what next would depend on the results ...
badblocks perhaps? It is slightly more... thorough..
--
| spike1@freenet.co.uk | Windows95 (noun): 32 bit extensions and a |
| | graphical shell for a 16 bit patch to an 8 bit |
| Andrew Halliwell BSc | operating system originally coded for a 4 bit |
| in |microprocessor, written by a 2 bit company, that|
| Computer Science | can't stand 1 bit of competition. |
date: Sun, 22 Jun 2008 15:31:57 +0100
author: Andrew Halliwell
|
Re: Disk failure imminent? But which disk?
On Jun 22, 1:15 pm, Tim Woodall wrote:
>
> Maybe whatever the problem was, my fiddling has sorted it out. But I
> don't know what I might have done. We'll see tonight when the next
> backup runs.
>
No problems today.
Oh well. Looks like I'll just have to keep an eye on it.
Tim.
date: Mon, 23 Jun 2008 06:10:48 -0700 (PDT)
author: unknown