Tim (and Friends),

I got SMART Monitor from the marketplace. Nice to see you finally earn at least a beer!

The app seems to work OK, but I do have a red window above the Drive Status window that says:

Error
Match not found in file.

Any idea what that might be?

Thanks!
Drew
Friday, May 10 2013, 11:50 PM
Responses (18)
  • Accepted Answer

    Sunday, May 12 2013, 07:38 PM
    Hi Drew, thanks for trying it out :) I'm just pushing an update through the build system which improves the drive detection methodology

    Do you have a line like "DEVICESCAN -H -m email@yourdomain.com" in /etc/smartd.conf?
  • Accepted Answer

    Sunday, May 12 2013, 08:14 PM
    I'd forgotten that we had that app in the marketplace. I've bought the app, Tim! Thank you for developing it.
  • Accepted Answer

    Sunday, May 12 2013, 09:20 PM
    Tim,

    There is only one active line in that file:

    /dev/sda -o on -a -I 194 -d ata -m admin@whisperingwoods.org -s (S/../.././04|L/../../6/05)

    I have since added sdb and sdc. Should this be updated manually? Or is there a procedure to refresh the install?

    Thanks,
    Drew

    Tim Burgess wrote:
    Hi Drew, thanks for trying it out :) I'm just pushing an update through the build system which improves the drive detection methodology

    Do you have a line like "DEVICESCAN -H -m email@yourdomain.com" in /etc/smartd.conf?
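For readers unfamiliar with the `-s` directive in the entry Drew posted: according to the smartd.conf documentation, smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour) and starts a test when the regular expression matches it. The sketch below only imitates that matching with grep so the schedule can be read off. `matches_schedule` is a made-up helper name, not part of smartmontools.

```shell
#!/bin/sh
# Schedule regex copied from the smartd.conf entry quoted above.
SCHEDULE='(S/../.././04|L/../../6/05)'

# matches_schedule is a hypothetical helper: it checks a T/MM/DD/d/HH
# string against the regex, requiring the whole string to match.
matches_schedule() {
    printf '%s\n' "$1" | grep -Eq "^${SCHEDULE}\$"
}

# Short test: any date, any weekday, at 04:00
matches_schedule 'S/05/13/1/04' && echo "short test would start"
# Long test: only on weekday 6 (Saturday) at 05:00
matches_schedule 'L/05/18/6/05' && echo "long test would start"
# Long test on a Monday does not match
matches_schedule 'L/05/13/1/05' || echo "no long test on Monday"
```

Read this way, the posted entry asks for a short self-test every day at 04:00 and a long self-test every Saturday at 05:00, on /dev/sda only.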
  • Accepted Answer

    Monday, May 13 2013, 12:35 PM
    Hi Drew, thanks - the default config only contains the DEVICESCAN line - have you customised the entries by hand?

    I ought to patch the app so that it handles custom drive monitoring entries, on the todo list!
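The exchange above suggests the "Match not found in file" error appears when /etc/smartd.conf has no active DEVICESCAN line. How the app actually parses the file is not shown in this thread, so the check below is only a guess at an equivalent test. `has_devicescan` and the temporary file names are hypothetical.

```shell
#!/bin/sh
# has_devicescan is a hypothetical helper: it succeeds when the given
# smartd.conf-style file has an uncommented line starting with DEVICESCAN.
has_devicescan() {
    grep -Eq '^[ 	]*DEVICESCAN([ 	]|$)' "$1"
}

# Demonstrate on two throwaway files rather than the real /etc/smartd.conf.
tmpdir=$(mktemp -d)
printf '%s\n' '# DEVICESCAN' 'DEVICESCAN -H -m email@yourdomain.com' > "$tmpdir/default.conf"
printf '%s\n' '/dev/sda -o on -a -I 194 -d ata -m admin@whisperingwoods.org' > "$tmpdir/custom.conf"

has_devicescan "$tmpdir/default.conf" && echo "default.conf: DEVICESCAN present"
has_devicescan "$tmpdir/custom.conf"  || echo "custom.conf: no DEVICESCAN line"
rm -rf "$tmpdir"
```

To run the same check against a live system, call `has_devicescan /etc/smartd.conf`.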
  • Accepted Answer

    Monday, May 13 2013, 02:14 PM
    Tim,

    I had not thought so, but as I used to use smart with test entries, perhaps I copied my old 5.2 config file and forgot? In any case, I have now replaced with the DEVICESCAN line as you noted and the error window is gone.

    Does your version still perform automatic short and/or long tests on all drives?

    Cheers,
    Drew

    Tim Burgess wrote:
    Hi Drew, thanks - the default config only contains the DEVICESCAN line - have you customised the entries by hand?

    I ought to patch the app so that it handles custom drive monitoring entries, on the todo list!
  • Accepted Answer

    Saturday, March 21 2015, 09:09 PM
    Drew Vonada-Smith wrote:
    Tim (and Friends),
    I got SMART Monitor from the marketplace. Nice to see you finally earn at least a beer!


    +1

    Can you run a short test on a drive that is in a RAID array? I've never been brave enough to do that with smartctl; I'm afraid of breaking the array.
  • Accepted Answer

    Robert
    Saturday, March 21 2015, 10:01 PM
    Hi Eric,

    That is no problem. I do this regularly and have never had any problems with it.

    Best

    Robert
  • Accepted Answer

    Saturday, April 04 2015, 10:02 PM
    HELP Please
    Shortly after installing SMART Monitor I lost my RAID array, on March 23 exactly. I think the SMART tools knocked the array offline. I've attempted to re-add the two missing drives, sd[hi]1, but get "not large enough to join array".
    The only thing I can think of at this point is to run --assemble --force /dev/md0 /dev/sd[abcdehikl]1. sdh and sdi are the missing drives.

    mdadm --detail /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
    Raid Devices : 10
    Total Devices : 9
    Persistence : Superblock is persistent

    Update Time : Sat Apr 4 14:55:18 2015
    State : active, FAILED, Not Started
    Active Devices : 8
    Working Devices : 9
    Failed Devices : 0
    Spare Devices : 1

    Layout : left-symmetric
    Chunk Size : 512K

    Name : orion.localdomain:0
    UUID : 2002d641:a6021121:103a0ed7:81d48464
    Events : 145962

    Number Major Minor RaidDevice State
    10 8 65 0 active sync /dev/sde1
    11 8 17 1 active sync /dev/sdb1
    2 8 49 2 active sync /dev/sdd1
    3 8 33 3 active sync /dev/sdc1
    4 8 1 4 active sync /dev/sda1
    6 8 81 5 active sync /dev/sdf1
    9 8 177 6 active sync /dev/sdl1
    8 8 161 7 active sync /dev/sdk1
    16 0 0 16 removed
    18 0 0 18 removed

    12 8 97 - spare /dev/sdg1


    mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd'
    /dev/sda1:
    Events : 145962
    /dev/sdb1:
    Events : 145962
    /dev/sdc1:
    Events : 145962
    /dev/sdd1:
    Events : 145962
    /dev/sde1:
    Events : 145962
    /dev/sdf1:
    Events : 145962
    /dev/sdg1:
    Events : 145946
    /dev/sdh1:
    Events : 145947
    /dev/sdi1:
    Events : 145947
    /dev/sdk1:
    Events : 145962
    /dev/sdl1:
    Events : 145962

    mdadm --examine /dev/sd[a-z]1
    /dev/sda1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=1200 sectors
    State : clean
    Device UUID : 352b3be8:074585e2:9887a950:511dc0b1

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 65378398 - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdb1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : f81bb836:f5e1511e:6d1a3560:6f385277

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : c49eb2db - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdc1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=1200 sectors
    State : clean
    Device UUID : 5a54237b:a8333001:697b3e52:b736c442

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : a963a85a - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdd1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=1200 sectors
    State : clean
    Device UUID : f6b9e98c:4e650e14:7b3a8d59:bb5d2ab1

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 43bd25b2 - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sde1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : 2da43fbc:43fbfa27:5408cfb5:5ca30640

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 721dad32 - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdf1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907025072 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
    Unused Space : before=1968 sectors, after=1200 sectors
    State : clean
    Device UUID : d31e4d85:8d325917:e2d3242a:f09b8a09

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 68632f6d - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 5
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdg1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : 0387b7f6:fd56ca8b:42d08295:714c0b9d

    Update Time : Sun Mar 22 13:18:54 2015
    Checksum : 4d1622aa - correct
    Events : 145946

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : spare
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdh1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : c8b2a372:293296c3:5d5f5263:2b661a66

    Update Time : Sun Mar 22 14:32:40 2015
    Checksum : 97ace3bb - correct
    Events : 145947

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 9
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdi1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : 783cec58:981ca542:93d9fccb:88f21a21

    Update Time : Sun Mar 22 14:32:40 2015
    Checksum : 20af5e6e - correct
    Events : 145947

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 8
    Array State : AAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
    mdadm: No md superblock detected on /dev/sdj1.
    /dev/sdk1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : 4e191b1d:d2bc76f1:0ad9675c:057adf9c

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 9fe68b3f - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 7
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
    /dev/sdl1:
    Magic : a92b4efc
    Version : 1.2
    Feature Map : 0x0
    Array UUID : 2002d641:a6021121:103a0ed7:81d48464
    Name : orion.localdomain:0
    Creation Time : Sun Mar 20 14:23:38 2011
    Raid Level : raid5
    Raid Devices : 10

    Avail Dev Size : 3907023874 (1863.01 GiB 2000.40 GB)
    Array Size : 17581607424 (16767.13 GiB 18003.57 GB)
    Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
    Data Offset : 128 sectors
    Super Offset : 8 sectors
    Unused Space : before=48 sectors, after=2 sectors
    State : clean
    Device UUID : ec2344bb:dd405bd2:9c62e643:79c256bd

    Update Time : Sat Apr 4 14:55:18 2015
    Checksum : 26e9ebf0 - correct
    Events : 145962

    Layout : left-symmetric
    Chunk Size : 512K

    Device Role : Active device 6
    Array State : AAAAAAAA.. ('A' == active, '.' == missing, 'R' == replacing)
  • Accepted Answer

    Sunday, April 05 2015, 09:23 PM
    [CODE]
    parted /dev/sdh "print free"
    Model: ATA WDC WD20EZRX-00D (scsi)
    Disk /dev/sdh: 2000GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 32.3kB 2000GB 2000GB primary raid
    2000GB 2000GB 2613kB Free Space


    smartctl -A /dev/sdh
    smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.8.1.v6.x86_64] (local build)
    Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
    3 Spin_Up_Time 0x0027 174 173 021 Pre-fail Always - 4283
    4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
    5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
    7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
    9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5391
    10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
    11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
    12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 38
    192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 25
    193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2163
    194 Temperature_Celsius 0x0022 119 111 000 Old_age Always - 28
    196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
    197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
    198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
    199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
    200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0


    parted /dev/sdi "print free"
    Model: ATA WDC WD20EZRX-00D (scsi)
    Disk /dev/sdi: 2000GB
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 32.3kB 2000GB 2000GB primary raid
    2000GB 2000GB 2613kB Free Space


    smartctl -A /dev/sdi
    smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-504.8.1.v6.x86_64] (local build)
    Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
    3 Spin_Up_Time 0x0027 177 176 021 Pre-fail Always - 4133
    4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 38
    5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
    7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
    9 Power_On_Hours 0x0032 093 093 000 Old_age Always - 5397
    10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
    11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
    12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 38
    192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 25
    193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2219
    194 Temperature_Celsius 0x0022 116 108 000 Old_age Always - 31
    196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
    197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
    198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
    199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
    200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
    [/CODE]
  • Accepted Answer

    Monday, April 06 2015, 09:15 PM
    Hi Eric, you should really open a new thread instead of appending to an old one!

    Sorry to hear of your troubles. The SMART monitor doesn't schedule any short or long drive tests by default, so it shouldn't knock a drive out of an array. The drive status for sdg1, sdh1, and sdi1 looks OK, but with slightly older array event counts and update times. You should be able to re-add these to the array. What happened to array member sdj1?

    Have you recently updated the kernel and rebooted?

    Have you tried marking these as failed before re-adding?

    It's not obvious from the provided info why they should be any different; the SMART data also looks fine. Can you post the drive layout data for a working array drive, and one or two of the failed ones?
     parted /dev/sda "unit s print"


    Any further clues in 'dmesg' or '/var/log/messages'? You may need to grep the output for mdadm messages. Google searches only turn up an old bug in mdadm v2.x code and workarounds for 0.90 array metadata, but these seem obsolete now.
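The "slightly older array events" can be picked out mechanically. The sketch below reads text shaped like the `mdadm --examine /dev/sd[a-z]1 | egrep 'Event|/dev/sd'` output posted earlier and lists every member whose event count is below the highest one seen. `stale_members` is a hypothetical helper name, and the sample is an abridged copy of the posted counts.

```shell
#!/bin/sh
# stale_members is a hypothetical helper. It reads device headers and
# "Events : N" lines on stdin and prints each device whose event count
# is lower than the maximum found in the input.
stale_members() {
    awk '
        /^[ \t]*\/dev\// { dev = $1; sub(/:$/, "", dev) }
        /Events/         { ev[dev] = $NF; if ($NF + 0 > max) max = $NF + 0 }
        END { for (d in ev) if (ev[d] + 0 < max) print d, ev[d] }
    ' | sort
}

# Sample taken from the counts posted earlier in this thread (abridged).
stale_members <<'EOF'
/dev/sda1:
Events : 145962
/dev/sdg1:
Events : 145946
/dev/sdh1:
Events : 145947
/dev/sdi1:
Events : 145947
/dev/sdk1:
Events : 145962
EOF
```

On a live system the same function could be fed directly, for example `mdadm --examine /dev/sd[a-z]1 | stale_members`.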
  • Accepted Answer

    Wednesday, April 08 2015, 02:45 AM
    Here is the configuration after the RAID array went down.

    The 9172s are native to the motherboard with two ports each; the 9230 is an add-in card with 4 ports. The Intel controller has 6 ports. Enumeration order is Intel, 9172, 9172, 9230, I believe.
    # lspci
    00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
    02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 10)
    04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)
    08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9172 SATA 6Gb/s Controller (rev 11)



    Here are the /var/spool/mail/root logs. It looks like the array had problems after the smartd daemon requested SMART status; at one point it is seen running a check. Is the drive organization revealed in the logs? In these logs /dev/sdl is the boot drive and sdj is part of the array.

    From root@orion.domain.lan Sun Mar 22 14:36:21 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id 80080200967; Sun, 22 Mar 2015 14:36:21 -0500 (CDT)
    Date: Sun, 22 Mar 2015 14:36:21 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedReadSmartData) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322193621.80080200967@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdg [SAT], failed to read SMART Attribute Data

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id F05DC200A68; Sun, 22 Mar 2015 14:36:42 -0500 (CDT)
    Date: Sun, 22 Mar 2015 14:36:42 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedHealthCheck) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322193642.F05DC200A68@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdh [SAT], not capable of SMART self-check

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id 04AE0200967; Sun, 22 Mar 2015 14:36:42 -0500 (CDT)
    Date: Sun, 22 Mar 2015 14:36:42 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedReadSmartData) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322193643.04AE0200967@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdh [SAT], failed to read SMART Attribute Data

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id 1BBA8200A68; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
    Date: Sun, 22 Mar 2015 14:36:43 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedHealthCheck) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322193643.1BBA8200A68@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdi [SAT], not capable of SMART self-check

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id 287AF200DC1; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
    Date: Sun, 22 Mar 2015 14:36:43 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedReadSmartData) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322193643.287AF200DC1@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdi [SAT], failed to read SMART Attribute Data

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root@localhost
    Delivered-To: root@localhost.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id A3FCE200967; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
    From: mdadm monitoring <root@orion.domain.lan>
    To: root@localhost.domain.lan
    Subject: Fail event on /dev/md0:orion.domain.lan
    Message-Id: <20150322193643.A3FCE200967@orion.domain.lan>
    Date: Sun, 22 Mar 2015 14:36:42 -0500 (CDT)

    This is an automatically generated mail message from mdadm
    running on orion.domain.lan

    A Fail event had been detected on md device /dev/md0.

    It could be related to component device /dev/sdh1.

    Faithfully yours, etc.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
    17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]
    [===========>.........] check = 59.4% (1161084156/1953511936) finish=688.8min speed=19170K/sec

    unused devices: <none>



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root@localhost
    Delivered-To: root@localhost.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id AE52D200941; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
    From: mdadm monitoring <root@orion.domain.lan>
    To: root@localhost.domain.lan
    Subject: Fail event on /dev/md0:orion.domain.lan
    Message-Id: <20150322193643.AE52D200941@orion.domain.lan>
    Date: Sun, 22 Mar 2015 14:36:43 -0500 (CDT)

    This is an automatically generated mail message from mdadm
    running on orion.domain.lan

    A Fail event had been detected on md device /dev/md0.

    It could be related to component device /dev/sdi1.

    Faithfully yours, etc.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
    17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]

    unused devices: <none>



    From root@orion.domain.lan Sun Mar 22 14:36:43 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root@localhost
    Delivered-To: root@localhost.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id C3AD6200967; Sun, 22 Mar 2015 14:36:43 -0500 (CDT)
    From: mdadm monitoring <root@orion.domain.lan>
    To: root@localhost.domain.lan
    Subject: FailSpare event on /dev/md0:orion.domain.lan
    Message-Id: <20150322193643.C3AD6200967@orion.domain.lan>
    Date: Sun, 22 Mar 2015 14:36:43 -0500 (CDT)

    This is an automatically generated mail message from mdadm
    running on orion.domain.lan

    A FailSpare event had been detected on md device /dev/md0.

    It could be related to component device /dev/sdg1.

    Faithfully yours, etc.

    P.S. The /proc/mdstat file currently contains the following:

    Personalities : [raid6] [raid5] [raid4]
    md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
    17581607424 blocks super 1.2 level 5, 512k chunk, algorithm 2 [10/8] [UUUUUUUU__]

    unused devices: <none>



    From root@orion.domain.lan Sun Mar 22 15:05:00 2015
    Return-Path: <root@orion.domain.lan>
    X-Original-To: root
    Delivered-To: root@orion.domain.lan
    Received: by orion.domain.lan (Postfix, from userid 0)
    id 4C99C200967; Sun, 22 Mar 2015 15:05:00 -0500 (CDT)
    Date: Sun, 22 Mar 2015 15:05:00 -0500
    To: root@orion.domain.lan
    Subject: SMART error (FailedHealthCheck) detected on host:
    orion.domain.lan
    User-Agent: Heirloom mailx 12.4 7/29/08
    MIME-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    Message-Id: <20150322200500.4C99C200967@orion.domain.lan>
    From: root@orion.domain.lan (root)

    This email was generated by the smartd daemon running on:

    host name: orion.domain.lan
    DNS domain: domain.lan
    NIS domain: (none)

    The following warning/error was logged by the smartd daemon:

    Device: /dev/sdg [SAT], not capable of SMART self-check

    For details see host's SYSLOG.

    You can also use the smartctl utility for further investigation.
    No additional email messages about this problem will be sent.
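The mdadm mails above carry the array state at the moment of failure: members marked `(F)` in the /proc/mdstat excerpt are the ones md considered faulty. A small sketch for extracting them follows. `failed_members` is a hypothetical helper name, and the input line is copied from the mail above.

```shell
#!/bin/sh
# failed_members is a hypothetical helper: it reads /proc/mdstat-style text
# on stdin and prints the member names that carry the (F) faulty flag.
failed_members() {
    grep -o '[a-z0-9]*\[[0-9]*](F)' | sed 's/\[.*//' | sort
}

# The md0 line below is copied from the mdadm mail quoted above.
failed_members <<'EOF'
md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)
EOF
```

For the quoted line this lists sdg1, sdh1 and sdi1, the same three devices named in the smartd warnings sent in the same minute.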
  • Accepted Answer

    Wednesday, April 08 2015, 03:17 AM
    Tim Burgess wrote:
    "What happened to array sdj1?"
    It is the boot drive.

    "Have you recently updated the kernel and rebooted?"
    Uptime was over 100 days, with only automatic updates.

    "Have you tried marking these as failed before re-adding?"
    I don't understand.

    it's not obvious from the provided info why they should be any different, the SMART data also looks fine. Can you post the drive layout data for a working array drive, and one or two of the failed ones?
     parted /dev/sda "unit s print"

    good:
    # parted /dev/sda "unit s print"
    Model: ATA Hitachi HDS5C302 (scsi)
    Disk /dev/sda: 3907029168s
    Sector size (logical/physical): 512B/512B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 2048s 3907029167s 3907027120s primary raid

    good:
    # parted /dev/sdb "unit s print"
    Model: ATA WDC WD20EZRX-00D (scsi)
    Disk /dev/sdb: 3907029168s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 63s 3907024064s 3907024002s primary raid

    failed:
    # parted /dev/sdh "unit s print"
    Model: ATA WDC WD20EZRX-00D (scsi)
    Disk /dev/sdh: 3907029168s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 63s 3907024064s 3907024002s primary raid

    failed:
    # parted /dev/sdi "unit s print"
    Model: ATA WDC WD20EZRX-00D (scsi)
    Disk /dev/sdi: 3907029168s
    Sector size (logical/physical): 512B/4096B
    Partition Table: msdos

    Number Start End Size Type File system Flags
    1 63s 3907024064s 3907024002s primary raid
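
For what it's worth, the start sectors in the listings above can be checked against the physical sector size with a little shell arithmetic. This is only a sketch: `is_aligned` is a made-up helper name, and it assumes the start sector, logical sector size, and physical sector size are read off the parted output by hand.

```sh
# is_aligned START_SECTOR LOGICAL_BYTES PHYSICAL_BYTES
# Hypothetical helper: succeeds when the partition's starting byte offset
# is a whole multiple of the physical sector size.
is_aligned() {
    [ $(( ($1 * $2) % $3 )) -eq 0 ]
}

# Values copied from the parted listings above.
is_aligned 2048 512 512  && echo "sda1: aligned" || echo "sda1: misaligned"
is_aligned 63   512 4096 && echo "sdb1: aligned" || echo "sdb1: misaligned"
```

A start at sector 63 on a drive with 4096-byte physical sectors comes out misaligned, while a start at sector 2048 is aligned.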


    Any further clues in 'dmesg' or '/var/log/messages'? you may need to grep the output for mdadm messages.
    See the mail/root output above.
  • Accepted Answer

    Wednesday, April 08 2015, 05:59 PM - #Permalink
    Resolved
    0 votes
    After I run mdadm --create, should the partition table be available?

    So far no luck!!!
  • Accepted Answer

    Robert
    Wednesday, April 08 2015, 09:53 PM - #Permalink
    Resolved
    0 votes
    Hi Eric,

    Automatic updates also include kernel updates, if I remember correctly, and those can require a reboot.

    "Have you tried marking these as failed before re-adding?
    i don't understand"

    What Tim meant was that you do not just remove and add the drive to the array, but mark it as failed first:


    mark as failed:
    mdadm --manage /dev/md0 --fail /dev/sdb1
    remove drive:
    mdadm --manage /dev/md0 --remove /dev/sdb1
    add drive again:
    mdadm /dev/md0 -a /dev/sdb1

    md0 and sdb1 need to be adjusted to your setup. If you just remove and add the drive without marking it as failed first, it may simply be added back as it was before, without a resync.
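
The three steps above could be wrapped in a small function so they always run in the same order. This is a rough sketch, not a tested recovery tool: `readd_member` and the `RUN` variable are names invented here, and by default it only prints the commands rather than running them.

```sh
# readd_member ARRAY MEMBER
# Hypothetical wrapper around the fail / remove / add steps. By default it
# only prints the commands; set RUN=1 to actually execute them (as root).
readd_member() {
    for action in --fail --remove --add; do
        if [ "${RUN:-0}" = "1" ]; then
            mdadm --manage "$1" "$action" "$2" || return 1
        else
            echo "mdadm --manage $1 $action $2"
        fi
    done
}

# Dry run: prints the three commands without touching the array.
readd_member /dev/md0 /dev/sdb1
```

Printing first makes it easy to double-check the array and member names before anything is changed.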

    Best

    Robert
  • Accepted Answer

    Thursday, April 09 2015, 12:36 AM - #Permalink
    Resolved
    0 votes
    md0 : active raid5 sdd1[2] sdc1[3] sdf1[6] sda1[4] sde1[10] sdk1[9] sdb1[11] sdj1[8] sdi1[14](F) sdh1[13](F) sdg1[12](F)

    Why am I seeing so many drive geometry layouts? So far I've not been able to figure out the correct drive order, or at least that is what I think is wrong.
    I've been doing --stop and --create.
    What am I to make of the missing device numbers in the order above? A long time ago two Hitachi drives failed and were replaced with Western Digital drives, all 2TB, and the array was expanded to ten drives plus one spare. Are the numbers of removed devices never reused?
  • Accepted Answer

    Robert
    Thursday, April 09 2015, 07:55 AM - #Permalink
    Resolved
    0 votes
    Hi Eric,

    Stop and create is not the right way to solve this problem. Better to do:

    example: sdh1

    mdadm --manage /dev/md0 --fail /dev/sdh1 (might not be needed as it is already marked as failed, but better to do this step anyway)
    mdadm --manage /dev/md0 --remove /dev/sdh1
    mdadm /dev/md0 -a /dev/sdh1

    If it still shows (F) afterwards, delete and recreate the partition on drive sdh. If that does not help, your drive might be broken.
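
To see which members still carry the (F) flag after each attempt, something like the following could be used. It is a sketch under the assumption that /proc/mdstat lists members as `name[number](F)`, as in the output posted earlier; `failed_members` is a made-up name.

```sh
# failed_members: reads /proc/mdstat-style text on stdin and prints the
# member names flagged (F), one per line. Hypothetical helper name.
failed_members() {
    tr ' ' '\n' | sed -n 's/\[[0-9]*](F)$//p'
}

# Typical use on a live system:
#   failed_members < /proc/mdstat
```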

    In case (I am not sure if I understood you correctly) you never removed the old drives by:

    mdadm --manage /dev/md0 --fail /dev/sdh1
    mdadm --manage /dev/md0 --remove /dev/sdh1

    You might have more drives in your RAID array than you actually have in your computer, because mdadm does not know that they should no longer be there. It therefore also does not know that you want to replace the old ones with new ones.

    Hope this helps.

    Best

    Robert

    Edit: Do not pay attention to the numbers, e.g. sdf1[6]. They are not important for this problem.
  • Accepted Answer

    Saturday, April 18 2015, 08:22 PM - #Permalink
    Resolved
    0 votes
    I got the array to assemble (it did complain in dmesg that the array was misaligned) and am now offloading data to USB drives.

    # mdadm --create /dev/md0 --level=5 --raid-devices=10 --chunk=512 --name=orion.localdomain:0 --assume-clean --data-offset=variable /dev/sde1:128s /dev/sdb1:128s /dev/sdd1:2048s /dev/sdc1:2048s /dev/sda1:2048s /dev/sdf1:2048s /dev/sdl1:128s /dev/sdk1:128s /dev/sdi1:128s /dev/sdh1:128s

    The secret for me was that the drives had different data offsets. Once I identified that, I figured out that you need "--data-offset=variable" and have to append the offset to each device with a colon.
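
For anyone in the same spot, the per-device tokens could be pulled out of `mdadm --examine` output along these lines. This is only a sketch with assumptions: `examine_offsets` is a made-up name, it assumes each device section starts with a "/dev/...:" line and contains a "Data Offset : N sectors" line (as printed for 1.2 superblocks), and the values are only useful if the superblocks have not already been overwritten by an earlier --create. It says nothing about the device order, which still has to be worked out separately.

```sh
# examine_offsets: reads `mdadm --examine` output on stdin and prints one
# DEVICE:OFFSETs token per member. Hypothetical helper; assumes each device
# section starts with "/dev/...:" and contains "Data Offset : N sectors".
examine_offsets() {
    awk '
        /^\/dev\//       { dev = $1; sub(/:$/, "", dev) }
        /Data Offset :/  { print dev ":" $4 "s" }
    '
}

# Typical use on a live system (run as root):
#   mdadm --examine /dev/sd[a-z]1 | examine_offsets
```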

    When I'm done getting my files, I plan on stopping the array, deleting the partitions, and reconstructing the array as RAID 6. This time I will save the mdadm data and drive serial numbers:

    mdadm --examine /dev/sd[a-z]1
    hdparm -I /dev/sd[a-z] | grep -E "Number|/dev"
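
A possible way to keep those records in files is sketched below. `drive_serials` and the output file names are invented for illustration, and the parsing assumes `hdparm -I` prints a "/dev/...:" line per drive followed by a "Serial Number:" line.

```sh
# drive_serials: reads `hdparm -I` output on stdin and prints
# "DEVICE SERIAL" pairs. Hypothetical helper; assumes each device section
# starts with "/dev/...:" and contains a "Serial Number:" line.
drive_serials() {
    awk '
        /^\/dev\//        { dev = $1; sub(/:$/, "", dev) }
        /Serial Number:/  { print dev, $3 }
    '
}

# Typical use, keeping a dated record next to the superblock dump:
#   mdadm --examine /dev/sd[a-z]1          > "mdadm-examine-$(date +%F).txt"
#   hdparm -I /dev/sd[a-z] | drive_serials > "drive-serials-$(date +%F).txt"
```

Storing both files somewhere off the array means the device order and offsets can be looked up later even if drive letters change.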
  • Accepted Answer

    Wednesday, April 22 2015, 08:17 AM - #Permalink
    Resolved
    0 votes
    Thanks for the follow-up reply, Eric. I didn't know you could do that with mdadm, but I did notice the differing data offsets. Perhaps more recent versions are pickier than previous ones? :-)