Tim (and Friends),
I got SMART Monitor from the marketplace. Nice to see you finally earn at least a beer!
The app seems to work OK, but I do have a red window above the Drive Status window that says:
Error
Match not found in file.
Any idea what that might be?
Thanks!
Drew
-
Tim,
There is only one active line in that file:
/dev/sda -o on -a -I 194 -d ata -m admin@whisperingwoods.org -s (S/../.././04|L/../../6/05)
I have since added sdb and sdc. Should this be updated manually, or is there a procedure to refresh the install?
Thanks,
Drew
Tim Burgess wrote:
Hi Drew, thanks for trying it out.
I'm just pushing an update through the build system which improves the drive detection methodology.
Do you have a line like "DEVICESCAN -H -m email@yourdomain.com" in /etc/smartd.conf?
-
Tim,
I had not thought so, but since I used to run smartd with test entries, perhaps I copied my old 5.2 config file across and forgot? In any case, I have now replaced it with the DEVICESCAN line as you noted, and the error window is gone.
Does your version still perform automatic short and/or long tests on all drives?
Cheers,
Drew
Tim Burgess wrote:
Hi Drew, thanks - the default config only contains the DEVICESCAN line - have you customised the entries by hand?
I ought to patch the app so that it handles custom drive monitoring entries. It's on the todo list!
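For reference, here is a sketch of how the two forms discussed above look in /etc/smartd.conf. The first line is the default entry quoted in this thread. The commented alternative reuses the self-test schedule from the hand-written /dev/sda entry (short test daily at 04:00, long test on Saturdays at 05:00). Whether the app accepts extra options on the DEVICESCAN line is an assumption and has not been confirmed here.

```text
# /etc/smartd.conf
# Monitor every detected drive: check health status (-H), mail warnings (-m).
DEVICESCAN -H -m email@yourdomain.com

# Alternative (assumption: untested with the SMART Monitor app): the same
# entry plus scheduled self-tests (-s). Use only one DEVICESCAN line.
#DEVICESCAN -H -m email@yourdomain.com -s (S/../.././04|L/../../6/05)
```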
-
HELP Please
Shortly after installing SMART Monitor I lost my RAID array, on March 23 exactly. I think the SMART tools knocked the array offline. I've attempted to re-add the two missing drives, sd[hi]1, but get "not large enough to join array".
The only thing I can think of at this point is to --assemble --force /dev/md0 /dev/sd[abcdehikl]1; sdh and sdi are the missing drives.
mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
Raid Devices : 10
Total Devices : 9
Persistence : Superblock is persistent
Update Time : Sat Apr 4 14:55:18 2015
State : active, FAILED, Not Started
Active Devices : 8
Working Devices : 9
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Name : orion.localdomain:0
UUID : 2002d641:a6021121:103a0ed7:81d48464
Events : 145962
Number Major Minor RaidDevice State
10 8 65 0 active sync /dev/sde1
11 8 17 1 active sync /dev/sdb1
2 8 49 2 active sync /dev/sdd1
3 8 33 3 active sync /dev/sdc1
4 8 1 4 active sync /dev/sda1
6 8 81 5 active sync /dev/sdf1
9 8 177 6 active sync /dev/sdl1
8 8 161 7 active sync /dev/sdk1
16 0 0 16 removed
18 0 0 18 removed
12 8 97 - spare /dev/sdg1
mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd'
/dev/sda1:
Events : 145962
/dev/sdb1:
Events : 145962
/dev/sdc1:
Events : 145962
/dev/sdd1:
Events : 145962
/dev/sde1:
Events : 145962
/dev/sdf1:
Events : 145962
/dev/sdg1:
Events : 145946
/dev/sdh1:
Events : 145947
/dev/sdi1:
Events : 145947
/dev/sdk1:
Events : 145962
/dev/sdl1:
Events : 145962
mdadm --examine /dev/sd[a-z]1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : 352b3be8:074585e2:9887a950:511dc0b1
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 65378398 - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 4
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdb1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : f81bb836:f5e1511e:6d1a3560:6f385277
Update Time : Sat Apr 4 14:55:18 2015
Checksum : c49eb2db - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : 5a54237b:a8333001:697b3e52:b736c442
Update Time : Sat Apr 4 14:55:18 2015
Checksum : a963a85a - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : f6b9e98c:4e650e14:7b3a8d59:bb5d2ab1
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 43bd25b2 - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : 2da43fbc:43fbfa27:5408cfb5:5ca30640
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 721dad32 - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Unused Space : before=1968 sectors, after=1200 sectors
State : clean
Device UUID : d31e4d85:8d325917:e2d3242a:f09b8a09
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 68632f6d - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 5
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : 0387b7f6:fd56ca8b:42d08295:714c0b9d
Update Time : Sun Mar 22 13:18:54 2015
Checksum : 4d1622aa - correct
Events : 145946
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : c8b2a372:293296c3:5d5f5263:2b661a66
Update Time : Sun Mar 22 14:32:40 2015
Checksum : 97ace3bb - correct
Events : 145947
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 9
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : 783cec58:981ca542:93d9fccb:88f21a21
Update Time : Sun Mar 22 14:32:40 2015
Checksum : 20af5e6e - correct
Events : 145947
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 8
Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
mdadm: No md superblock detected on /dev/sdj1.
/dev/sdk1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : 4e191b1d:d2bc76f1:0ad9675c:057adf9c
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 9fe68b3f - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 7
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdl1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 2002d641:a6021121:103a0ed7:81d48464
Name : orion.localdomain:0
Creation Time : Sun Mar 20 14:23:38 2011
Raid Level : raid5
Raid Devices : 10
Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
Data Offset : 128 sectors
Super Offset : 8 sectors
Unused Space : before=48 sectors, after=2 sectors
State : clean
Device UUID : ec2344bb:dd405bd2:9c62e643:79c256bd
Update Time : Sat Apr 4 14:55:18 2015
Checksum : 26e9ebf0 - correct
Events : 145962
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
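To compare the members at a glance, the event counter, data offset and role can be pulled out of the --examine output in one pass. This is a minimal sketch that assumes the field labels shown above; examine_summary is a hypothetical helper name, not part of mdadm.

```shell
# Summarise `mdadm --examine` output read from stdin: one line per member
# with its event counter, data offset (in sectors) and device role.
examine_summary() {
    awk '
        function emit() {
            if (dev != "")
                printf "%s events=%s offset=%s role=%s\n", dev, events, offset, role
        }
        # A line such as "/dev/sda1:" starts a new member.
        /^\/dev\/.*:$/  { emit(); dev = $1; sub(/:$/, "", dev); events = offset = role = "?" }
        /Events :/      { events = $NF }
        /Data Offset :/ { offset = $(NF - 1) }   # the value precedes the word "sectors"
        /Device Role :/ { role = $NF }           # slot number, or "spare"
        END             { emit() }
    '
}

# Usage (as root):
#   mdadm --examine /dev/sd[a-l]1 2>/dev/null | examine_summary
```

In the dump above, the members whose event counter lags behind 145962 (sdg1, sdh1 and sdi1) are the ones that dropped out of the array.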
-
[CODE]
parted /dev/sdh "print free"
Model: ATA WDC WD20EZRX-00D (scsi)
Disk /dev/sdh: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 2000GB 2000GB primary raid
2000GB 2000GB 2613kB Free Space
smartctl -A /dev/sdh
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.8.1.v6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 174 173 021 Pre-fail Always - 4283
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5391
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 38
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 25
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2163
194 Temperature_Celsius 0x0022 119 111 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
parted /dev/sdi "print free"
Model: ATA WDC WD20EZRX-00D (scsi)
Disk /dev/sdi: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32.3kB 2000GB 2000GB primary raid
2000GB 2000GB 2613kB Free Space
smartctl -A /dev/sdi
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.8.1.v6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 177 176 021 Pre-fail Always - 4133
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5397
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 38
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 25
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2219
194 Temperature_Celsius 0x0022 116 108 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
[/CODE]
-
Hi Eric, you should really open a new thread instead of appending to an old one!
Sorry to hear of your troubles. The SMART monitor doesn't schedule any short or long drive tests by default, so it shouldn't knock a drive out of an array. The status of sdg1, sdh1 and sdi1 looks OK, but with slightly older array event counts and update times. You should be able to re-add these to the array? What happened to array sdj1?
Have you recently updated the kernel and rebooted?
Have you tried marking these as failed before re-adding?
It's not obvious from the provided info why they should be any different, and the SMART data also looks fine. Can you post the drive layout data for a working array drive, and one or two of the failed ones?
parted /dev/sda "unit s print"
Any further clues in 'dmesg' or '/var/log/messages'? You may need to grep the output for mdadm messages. Google searches only turn up an old bug in the mdadm v2.x code and workarounds for 0.90 array metadata, but these seem obsolete now.
-
Here is the configuration after the RAID array went down.
The 9172s are native to the motherboard with two ports each, the 9230 is an add-in card with 4 ports, and the Intel has 6 ports. Enumeration is Intel, 9172, 9172, 9230, I believe.
# lspci
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 10)
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
Here are the /var/spool/mail/root logs. It looks like the array had problems after the smartd daemon requested SMART status; at one point it is seen running a check. Is the drive organization revealed in the logs? In these logs /dev/sdl is the boot drive and sdj is part of the array.
From root@orion.domain.lan Sun Mar 22 14:36:21 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id 80080200967; Sun, 22 Mar 2015 14:36:21 -0500 (CDT)
Date: Sun, 22 Mar 2015 14:36:21 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedReadSmartData) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322193621.80080200967@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdg [SAT], failed to read SMART Attribute Data
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id F05DC200A68; Sun, 22 Mar 2015 14:36:42 -0500 (CDT)
Date: Sun, 22 Mar 2015 14:36:42 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedHealthCheck) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322193642.F05DC200A68@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdh [SAT], not capable of SMART self-check
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id 04AE0200967; Sun, 22 Mar 2015 14:36:42 -0500 (CDT)
Date: Sun, 22 Mar 2015 14:36:42 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedReadSmartData) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322193643.04AE0200967@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdh [SAT], failed to read SMART Attribute Data
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id 1BBA8200A68; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
Date: Sun, 22 Mar 2015 14:36:43 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedHealthCheck) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322193643.1BBA8200A68@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdi [SAT], not capable of SMART self-check
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id 287AF200DC1; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
Date: Sun, 22 Mar 2015 14:36:43 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedReadSmartData) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322193643.287AF200DC1@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdi [SAT], failed to read SMART Attribute Data
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root@localhost
Delivered-To: root@localhost.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id A3FCE200967; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
From: mdadm monitoring <root@orion.domain.lan>
To: root@localhost.domain.lan
Subject: Fail event on /dev/md0:orion.domain.lan
Message-Id: <20150322193643.A3FCE200967@orion.domain.lan>
Date: Sun, 22 Mar 2015 14:36:42 -0500 (CDT)
This is an automatically generated mail message from mdadm
running on orion.domain.lan
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sdh1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]
[===========>.........] check = 59.4% (1161084156/1953511936) finish=688.8min speed=19170K/sec
unused devices: <none>
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root@localhost
Delivered-To: root@localhost.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id AE52D200941; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
From: mdadm monitoring <root@orion.domain.lan>
To: root@localhost.domain.lan
Subject: Fail event on /dev/md0:orion.domain.lan
Message-Id: <20150322193643.AE52D200941@orion.domain.lan>
Date: Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
This is an automatically generated mail message from mdadm
running on orion.domain.lan
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sdi1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]
unused devices: <none>
From root@orion.domain.lan Sun Mar 22 14:36:43 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root@localhost
Delivered-To: root@localhost.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id C3AD6200967; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
From: mdadm monitoring <root@orion.domain.lan>
To: root@localhost.domain.lan
Subject: FailSpare event on /dev/md0:orion.domain.lan
Message-Id: <20150322193643.C3AD6200967@orion.domain.lan>
Date: Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
This is an automatically generated mail message from mdadm
running on orion.domain.lan
A FailSpare event had been detected on md device /dev/md0.
It could be related to component device /dev/sdg1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]
unused devices: <none>
From root@orion.domain.lan Sun Mar 22 15:05:00 2015
Return-Path: <root@orion.domain.lan>
X-Original-To: root
Delivered-To: root@orion.domain.lan
Received: by orion.domain.lan (Postfix, from userid 0)
id 4C99C200967; Sun, 22 Mar 2015 15:05:00 -0500 (CDT)
Date: Sun, 22 Mar 2015 15:05:00 -0500
To: root@orion.domain.lan
Subject: SMART error (FailedHealthCheck) detected on host:
orion.domain.lan
User-Agent: Heirloom mailx 12.4 7/29/08
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20150322200500.4C99C200967@orion.domain.lan>
From: root@orion.domain.lan (root)
This email was generated by the smartd daemon running on:
host name: orion.domain.lan
DNS domain: domain.lan
NIS domain: (none)
The following warning/error was logged by the smartd daemon:
Device: /dev/sdg [SAT], not capable of SMART self-check
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
-
Tim Burgess wrote:
What happened to array sdj1?
it is the boot drive
Have you recently updated the kernel and rebooted?
up time was over 100 days, only auto updates.
Have you tried marking these as failed before re-adding?
i don't understand
It's not obvious from the provided info why they should be any different, and the SMART data also looks fine. Can you post the drive layout data for a working array drive, and one or two of the failed ones?
parted /dev/sda "unit s print"
good:
# parted /dev/sda "unit s print"
Model: ATA Hitachi HDS5C302 (scsi)
Disk /dev/sda: 3907029168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 2048s 3907029167s 3907027120s primary raid
good:
# parted /dev/sdb "unit s print"
Model: ATA WDC WD20EZRX-00D (scsi)
Disk /dev/sdb: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number Start End Size Type File system Flags
1 63s 3907024064s 3907024002s primary raid
failed:
# parted /dev/sdh "unit s print"
Model: ATA WDC WD20EZRX-00D (scsi)
Disk /dev/sdh: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number Start End Size Type File system Flags
1 63s 3907024064s 3907024002s primary raid
failed:
# parted /dev/sdi "unit s print"
Model: ATA WDC WD20EZRX-00D (scsi)
Disk /dev/sdi: 3907029168s
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number Start End Size Type File system Flags
1 63s 3907024064s 3907024002s primary raid
Any further clues in 'dmesg' or '/var/log/messages'? you may need to grep the output for mdadm messages.
See the /var/spool/mail/root excerpts above.
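One reading of these numbers, offered as an assumption rather than a diagnosis: the "Avail Dev Size" reported by --examine equals the partition size minus the data offset (for sda1, 3907027120 - 2048 = 3907025072; for sdh1, 3907024002 - 128 = 3907023874). If a re-add were to pick a larger data offset than the 128 sectors recorded on sdh1, the partition would fall short of the 3907023872-sector "Used Dev Size", which would fit the "not large enough to join array" message. The helper names below are hypothetical.

```shell
# Sectors left for data once the data offset is subtracted from the
# partition size (both in 512-byte sectors).
avail_sectors() { echo $(( $1 - $2 )); }

# Succeeds if a partition of $1 sectors with data offset $2 can hold
# a used device size of $3 sectors.
fits_array() { [ "$(avail_sectors "$1" "$2")" -ge "$3" ]; }

avail_sectors 3907024002 128    # sdh1 as recorded: prints 3907023874
avail_sectors 3907024002 2048   # sdh1 with a 2048-sector offset: prints 3907021954

if fits_array 3907024002 2048 3907023872; then
    echo "fits"
else
    echo "too small for Used Dev Size"
fi
```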
-
Hi Eric,
Auto-updates also include kernel updates, if I remember correctly, and those can require a reboot.
"Have you tried marking these as failed before re-adding?
i don't understand"
What Tim meant was that you do not just remove the drive and add it again, but mark it as failed first:
mark as failed:
mdadm --manage /dev/md0 --fail /dev/sdb1
remove drive:
mdadm --manage /dev/md0 --remove /dev/sdb1
add drive again:
mdadm /dev/md0 -a /dev/sdb1
md0 and sdb1 need to be adjusted to your setup. If you just remove and add the drive without marking it as failed first, it may simply be added back as it was before, without a resync.
Best
Robert
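The three steps can be wrapped in a small function so that a failure at any step stops the sequence. This is only a sketch: readd_member is a hypothetical name, it uses the long form --add in place of -a, and it assumes the array is running and the commands are issued as root.

```shell
# Mark a member as failed, remove it, then add it back.
# Stops at the first step that fails.
readd_member() {
    local md=$1
    local part=$2
    mdadm --manage "$md" --fail "$part" &&
    mdadm --manage "$md" --remove "$part" &&
    mdadm --manage "$md" --add "$part"
}

# Usage (as root), adjusting the names to your setup:
#   readd_member /dev/md0 /dev/sdh1
```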
-
md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
Why am I seeing so many drive geometry layouts? So far I've not been able to figure out the correct drive order, at least that is what I think.
I've been doing --stop and --create.
What am I to make of the missing device numbers in the above order? I did have two Hitachi drives fail out a long time ago; they were replaced with Western Digital drives, all 2TB, and the array was expanded to ten drives and one spare. Are the numbers of removed devices never reused?
-
Hi Eric,
--stop and --create is not the right way to solve this problem. Better to do the following.
example: sdh1
mdadm --manage /dev/md0 --fail /dev/sdh1 (might not be needed as it is already marked as failed, but better to do this step anyway)
mdadm --manage /dev/md0 --remove /dev/sdh1
mdadm /dev/md0 -a /dev/sdh1
If it still shows (F) afterwards, remove and recreate the partition on drive sdh. If this does not help, your drive might be broken.
In case (I am not sure if I understood you correctly) you never removed the old drives with:
mdadm --manage /dev/md0 --fail /dev/sdh1
mdadm --manage /dev/md0 --remove /dev/sdh1
you might have more drives in your RAID array than you actually have in your computer, because it does not know that they should not be there anymore. So it also does not know that you want to replace the old ones with new ones.
Hope this helps.
Best
Robert
edit: Do not read anything into the numbers, e.g. sdf1[6]. They are not important for this problem.
-
I got the array to assemble (it did complain in dmesg that the array was misaligned) and am offloading data to USB drives now.
# mdadm --create /dev/md0 --level=5 --raid-devices=10 --chunk=512 --name=orion.localdomain:0 --assume-clean --data-offset=variable /dev/sde1:128s /dev/sdb1:128s /dev/sdd1:2048s /dev/sdc1:2048s /dev/sda1:2048s /dev/sdf1:2048s /dev/sdl1:128s /dev/sdk1:128s /dev/sdi1:128s /dev/sdh1:128s
The secret for me was that the drives had different data offsets. Once I identified that, I figured out that you need "--data-offset=variable" and have to append the offset to each device with a colon.
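The device list in that command follows the "Device Role" order from --examine, with each member's own "Data Offset" appended. The sketch below derives that list from --examine output. It assumes the field labels shown earlier in the thread; variable_offset_args is a hypothetical helper, and it only prints text. Re-creating an array over existing data is destructive if the order or offsets are wrong, so treat the result as something to check by hand, not to run blindly.

```shell
# Read `mdadm --examine` output on stdin and print the member list for
# --data-offset=variable: devices in "Device Role" order, each followed by
# ":<offset>s". Slots with no device are printed as "missing".
# This only prints text; it never runs mdadm.
variable_offset_args() {
    awk '
        /^\/dev\/.*:$/         { dev = $1; sub(/:$/, "", dev) }
        /Data Offset :/        { offset[dev] = $(NF - 1) }
        /Device Role : Active/ { slot = $NF + 0; member[slot] = dev
                                 if (slot + 1 > count) count = slot + 1 }
        END {
            for (i = 0; i < count; i++) {
                if (i in member) { arg = member[i] ":" offset[member[i]] "s" }
                else             { arg = "missing" }
                printf "%s%s", arg, (i < count - 1 ? " " : "\n")
            }
        }
    '
}

# Usage (as root):
#   mdadm --examine /dev/sd[a-l]1 2>/dev/null | variable_offset_args
```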
When I'm done getting my files, I plan on stopping the array, deleting the partitions and reconstructing the array as RAID 6. This time I will save the mdadm data and drive serial numbers:
mdadm --examine /dev/sd[a-z]1
hdparm -I /dev/sd[a-z] | grep -E "Number|/dev"
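A sketch of how those two commands could be captured to files, one pair per member, so the roles, data offsets and serial numbers are on hand if the array fails again. save_array_layout and the output directory are hypothetical, and the code assumes sdX-style names, where stripping the trailing digits of a partition gives its parent disk.

```shell
# Save each member's md superblock dump and its disk identity lines.
# $1 is the output directory; the remaining arguments are member partitions.
save_array_layout() {
    local outdir=$1
    local part disk name
    shift
    mkdir -p "$outdir" || return 1
    for part in "$@"; do
        name=${part##*/}         # /dev/sdh1 -> sdh1
        disk=${part%%[0-9]*}     # /dev/sdh1 -> /dev/sdh (sdX naming assumed)
        mdadm --examine "$part" > "$outdir/$name.examine" 2>&1
        hdparm -I "$disk" 2>/dev/null | grep -E 'Number|/dev' > "$outdir/$name.ident"
    done
}

# Usage (as root), with a hypothetical output directory:
#   save_array_layout /root/md0-layout /dev/sd[a-l]1
```

Keeping a copy of the output directory off the array itself is the point of the exercise.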