www.fabiankeil.de/gehacktes/gpt-and-geli-recovery/

GPT and geli recovery with ElectroBSD

Recently the following disk experienced a data corruption event of unknown origin:

[fk@steffen ~]$ diskinfo -v /dev/ada1
/dev/ada1
        512             # sectorsize
        4000785948160   # mediasize in bytes (3.6T)
        7814035055      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        7752018         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        ST4000DM005-2DP166      # Disk descr.
        ZDH0ZY73        # Disk ident.
        No              # TRIM/UNMAP support
        5980            # Rotation rate in RPM
        Not_Zoned       # Zone Mode

The disk was working fine while it was in use. Then the system was shut down and the disk was detached. When the disk was supposed to be used again a couple of days later, some sectors were corrupted and the ElectroBSD kernel could no longer even read the partition table:

2021-12-05T13:52:00.527737+01:00 steffen kernel <2>1 - - - ada1: <ST4000DM005-2DP166 0001> ACS-3 ATA SATA 3.x device
2021-12-05T13:52:00.527757+01:00 steffen kernel <2>1 - - - ada1: Serial Number ZDH0ZY73
2021-12-05T13:52:00.527777+01:00 steffen kernel <2>1 - - - ada1: 150.000MB/s transfers (SATA, UDMA5, PIO 8192bytes)
2021-12-05T13:52:00.527797+01:00 steffen kernel <2>1 - - - ada1: 3815446MB (7814035055 512 byte sectors)
2021-12-05T13:52:00.527826+01:00 steffen kernel <2>1 - - - ada1: quirks=0x1<4K>
2021-12-05T13:52:00.846081+01:00 steffen kernel <2>1 - - - GEOM: ada1: corrupt or invalid GPT detected.
2021-12-05T13:52:00.846156+01:00 steffen kernel <2>1 - - - GEOM: ada1: GPT rejected -- may not be recoverable.

As a result, no devices for the individual partitions were created and the data could not be easily accessed.

The disk didn't store particularly important data, but recovering it was a useful exercise in case another disk with more important data behaves similarly in the future.

SMART data not helpful

The disk itself did not report any problems:

fk@t520.local /home/fk $ssh steffen sudo smartctl -a /dev/ada1
smartctl 7.2 2020-12-30 r5155 [ElectroBSD 12.3-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM005-2DP166
Serial Number:    ZDH0ZY73
LU WWN Device Id: 5 000c50 0a23b5ea9
Firmware Version: 0001
User Capacity:    4,000,785,948,160 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Dec 17 13:13:40 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  601) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 675) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x10a5)	SCT Status supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   006    Pre-fail  Always       -       90559
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       154
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   080   060   045    Pre-fail  Always       -       93302826
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       552 (209 46 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       154
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   053   040    Old_age   Always       -       25 (Min/Max 16/25)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1940
194 Temperature_Celsius     0x0022   025   047   000    Old_age   Always       -       25 (0 10 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       493h+16m+55.235s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       13556691644
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       49089938488

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       433         -
# 2  Extended offline    Interrupted (host reset)      00%       409         -
# 3  Short offline       Completed without error       00%         1         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

It was known that the disk had once been partitioned with cloudiatr. It therefore had five partitions; the last one was the most important, as it contained a ZFS data pool that was only accessible through geli. The data pool contained a volume that was accessed through ggated and contained another ZFS pool with another geli layer, managed with zogftw.
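Spelled out, the layering looks like this (reconstructed from the description above; device names as they appear later in this article):

    ada1 (GPT)
    └── ada1p5 (freebsd-zfs partition)
        └── ada1p5.eli (geli)
            └── dpool (ZFS pool)
                └── dpool/ggated/cloudia2 (zvol, exported via ggated)
                    └── cloudia2.eli (second geli layer, managed with zogftw)
                        └── cloudia2 (ZFS pool)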

Unfortunately no backup of the partition table was available, but there was a backup of the geli metadata for the data pool:

fk@t520 ~ $sudo geli dump ~/.config/zogftw/geli/metadata-backups/gpt_dpool-ada0_baracuda_4tb.eli 
Metadata on /home/fk/.config/zogftw/geli/metadata-backups/gpt_dpool-ada0_baracuda_4tb.eli:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 128
  provsize: 3985543004160
sectorsize: 512
      keys: 0x01
iterations: 447024
      Salt: 23[...]05
Master Key: 20[...]29
  MD5 hash: ba992035efd4fac233b83c17c354c99b

A backup of the geli metadata for the cloudia2 pool was available as well:

fk@t520 ~ $sudo geli dump ~/.config/zogftw/geli/metadata-backups/cloudia2.eli 
Metadata on /home/fk/.config/zogftw/geli/metadata-backups/cloudia2.eli:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 256
  provsize: 3298534882816
sectorsize: 4096
      keys: 0x01
iterations: 805315
      Salt: 8d[...]25
Master Key: 58[...]a7
  MD5 hash: 3fa6840214b3fa32d3e60dbfe76596f1

Failed attempt to discover the geli label

In theory it should obviously be possible to ignore the partition table completely and simply use gnop with the right parameters to create a provider that contains a valid geli label and has the proper size.

Thus geli was patched to add a search subcommand whose purpose is to:

Search for metadata on the provider, starting at the end and going backwards until valid metadata is found or the beginning of the provider is reached. This subcommand may be useful if, for example, the GPT partition data got corrupted or deleted while the data on the previously accessible partitions is still expected to be valid.

While the patch worked as advertised in tests, it failed to discover the geli label on the actual disk, presumably because the label was completely gone or corrupted.
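The scan the subcommand performs can be sketched in shell. This is a simplified sketch under the assumption that matching the GEOM::ELI magic string is enough; the actual patch decodes and validates the full metadata structure:

```shell
# search_geli_magic: scan a provider (here: a regular file) backwards,
# sector by sector, looking for the GEOM::ELI magic string.
# Simplified sketch only; the real geli search verifies the complete
# metadata instead of just matching the magic.
search_geli_magic() {
    provider="$1"
    sectorsize=512
    size=$(wc -c < "$provider")
    sector=$(( size / sectorsize - 1 ))
    while [ "$sector" -ge 0 ]; do
        if dd if="$provider" bs="$sectorsize" skip="$sector" count=1 2>/dev/null |
                grep -q 'GEOM::ELI'; then
            echo "Found GEOM::ELI meta data at offset $(( sector * sectorsize ))."
            return 0
        fi
        sector=$(( sector - 1 ))
    done
    echo "No GEOM::ELI meta data found." >&2
    return 1
}
```

On a real 4 TB disk a sector-by-sector shell loop would of course be far too slow; the point is only to illustrate the backward search order.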

Theoretically it should also be possible to simply put the backup label back on the disk, but if the correct position isn't known this would additionally require the use of gnop to make sure the ZFS metadata is where ZFS looks for it.

Partition table recovery

Instead of doing that, an attempt was made to make the kernel less picky about the state of the partition data.

ElectroBSD already inherited a kern.geom.part.check_integrity sysctl from FreeBSD and a patch was created to extend it.

With the patch and kern.geom.part.check_integrity set to 0 the kernel was able to find some partition data:

2021-12-05T15:49:45.631357+01:00 steffen kernel <2>1 - - - GEOM: hdr_lba_end (7814037127) < hdr->hdr_lba_start (40) or hrdr_lba_end >= last (7814035054) for ada1.
2021-12-05T15:49:45.631376+01:00 steffen kernel <2>1 - - - GEOM: Reading sector 4000787029504 of size 512 from ada1.
2021-12-05T15:49:45.631393+01:00 steffen kernel <2>1 - - - GEOM: ada1: the secondary GPT table is corrupt or invalid.
2021-12-05T15:49:45.631409+01:00 steffen kernel <2>1 - - - GEOM: ada1: using the primary only -- recovery suggested.
2021-12-05T15:49:45.631424+01:00 steffen kernel <2>1 - - - GEOM_PART: integrity check failed (ada1, GPT)
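With the patched kernel, the check could presumably also be disabled persistently like other sysctls (an assumption on my part; the sysctl name is inherited from FreeBSD):

```
# /etc/sysctl.conf (assumption: the setting persists like other sysctls)
# Let GEOM taste partition tables that fail the integrity check:
kern.geom.part.check_integrity=0
```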

gpart also showed a corrupt partition table but that's better than no partition table at all:

[fk@steffen ~]$ gpart show ada1
=>        40  7814037088  ada1  GPT  (3.6T) [CORRUPT]
          40         512     1  freebsd-boot  (256K)
         552        1496        - free -  (748K)
        2048      409600     2  freebsd-zfs  (200M)
      411648    20971520     3  freebsd-zfs  (10G)
    21383168     8388608     4  freebsd-swap  (4.0G)
    29771776  7784263680     5  freebsd-zfs  (3.6T)
  7814035456        1672        - free -  (836K)

The first four partitions appeared valid, but unfortunately partition five could still not be attached with geli, and gpart's recover subcommand didn't help either.

Deleting partition five and recreating it without specifying a size worked, though:

[fk@steffen ~]$ sudo gpart delete -i 5 /dev/ada1
ada1p5 deleted
[fk@steffen ~]$ sudo gpart add -i 5 -t freebsd-zfs /dev/ada1
ada1p5 added
[fk@steffen ~]$ gpart show ada1
=>        40  7814034982  ada1  GPT  (3.6T) [CORRUPT]
         40         512     1  freebsd-boot  (256K)
        552        1496        - free -  (748K)
       2048      409600     2  freebsd-zfs  (200M)
     411648    20971520     3  freebsd-zfs  (10G)
   21383168     8388608     4  freebsd-swap  (4.0G)
   29771776  7784263240     5  freebsd-zfs  (3.6T)
 7814035016           6        - free -  (3.0K)

The [CORRUPT] marker was gone after a reboot.

At first, geli was still not able to read metadata from the fifth partition, but after restoring the backup metadata with the force flag (to ignore the size mismatch) the geli provider could be attached again and the ZFS pool could be imported:

[fk@steffen ~]$ sudo geli restore -f gpt_dpool-ada0_baracuda_4tb.eli /dev/ada1p5
[fk@steffen ~]$ sudo geli dump /dev/ada1p5
Metadata on /dev/ada1p5:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 128
  provsize: 3985542778880
sectorsize: 512
      keys: 0x01
iterations: 447024
      Salt: 23[...]05
Master Key: 20[...]29
  MD5 hash: 9fdffeb97ca6b34379512a9191cbfaeb
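The size mismatch the force flag ignores is modest: comparing the provsize from the metadata backup with the one of the recreated partition shows the new ada1p5 ended up 440 sectors smaller, which matches the difference between the partition sizes in the two gpart listings above (7784263680 vs. 7784263240 sectors):

```shell
# provsize from the metadata backup vs. the recreated ada1p5
backup_provsize=3985543004160
new_provsize=3985542778880
sectorsize=512
echo $(( (backup_provsize - new_provsize) / sectorsize ))  # → 440 sectors
```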

A zpool scrub revealed a few errors, but far fewer than expected:

[fk@steffen ~]$ sudo zpool status -v dpool
  pool: dpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 1 days 09:00:47 with 66 errors on 2021-12-07 18:40:55
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          ada1p5.eli  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        dpool/ggated/cloudia2:<0x1>
        dpool/ggated/cloudia2@2017-04-20_21:27:<0x1>

Unfortunately all the errors occurred in the zvol for the cloudia2 pool. The cloudia2 pool could be accessed over ggated using zogftw on another system and was scrubbed as well:

fk@t520.local /home/fk $sudo zpool status -v cloudia2
  pool: cloudia2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 3 days 21:03:27 with 6 errors on 2021-12-17 08:14:24
config:

	NAME                  STATE     READ WRITE CKSUM
	cloudia2              ONLINE       0     0     0
	  label/cloudia2.eli  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        cloudia2/dvds/the-good-wife/season-2@2016-06-30_16:33:/THE_GOOD_WIFE_S2D3/VIDEO_TS/VTS_03_4.VOB

Apparently all the errors affected the same file. Luckily, two copies of it were available on other pools:

fk@t520 ~ $zogftw lookup the-good-wife/season-2
NAME                                  USED   AVAIL  REFER  MOUNTPOINT
cloudia2/dvds/the-good-wife/season-2  41.3G  99.4G  41.3G  /cloudia2/dvds/the-good-wife/season-2
intenso5/dvds/the-good-wife/season-2  41.3G  2.40T  41.3G  /intenso5/dvds/the-good-wife/season-2
wde5/dvds/the-good-wife/season-2      41.3G  7.00G  41.3G  /wde5/dvds/the-good-wife/season-2

Instead of restoring the file right away, I decided to keep the partially corrupt file around until error correction with zfs receive becomes available in OpenZFS.

Corruption cause unknown

While it would be great to know how the data corruption occurred, I was unable to figure it out. As it only affected one disk, I suspect a firmware issue is more likely than a bug in the ElectroBSD patch set or in FreeBSD itself.

While human error can't be ruled out either, I was the only person with access to the disk and I'm not sure how one would accidentally cause corruption like this.

geli search in action

Finally, just to show that the implemented geli search subcommand actually works when the metadata is still valid:

[fk@steffen ~]$ uname -a
ElectroBSD steffen 12.3-STABLE ElectroBSD 12.3-STABLE #22 electrobsd-n234792-52515feff497-dirty: Fri Dec 17 12:51:48 UTC 2021     fk@steffen:/usr/obj/usr/src/amd64.amd64/sys/ELECTRO_BEER  amd64
[fk@steffen ~]$ sudo geli search /dev/ada1
Searching for GEOM::ELI metadata on /dev/ada1.
Found GEOM::ELI meta data at offset 4000785927680.
Metadata found on /dev/ada1:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 128
  provsize: 3985542778880
sectorsize: 512
      keys: 0x01
iterations: 447024
      Salt: 23[...]05
Master Key: 20[...]29
  MD5 hash: 9fdffeb97ca6b34379512a9191cbfaeb

Try making the data attachable with: gnop create -o 15243149312 -s 3985542778880 /dev/ada1
[fk@steffen ~]$ sudo gnop create -o 15243149312 -s 3985542778880 /dev/ada1
[fk@steffen ~]$ sudo geli attach /dev/ada1.nop 
Enter passphrase: 
[fk@steffen ~]$ sudo zpool import dpool
[fk@steffen ~]$ sudo zpool status -v dpool
  pool: dpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 1 days 09:00:47 with 66 errors on 2021-12-07 18:40:55
config:

        NAME            STATE     READ WRITE CKSUM
        dpool           ONLINE       0     0     0
          ada1.nop.eli  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        dpool/ggated/cloudia2:<0x1>
        dpool/ggated/cloudia2@2017-04-20_21:27:<0x1>
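The gnop parameters suggested by geli search follow directly from where the metadata was found: the GEOM::ELI label occupies the last sector of its provider, so the provider has to start provsize - sectorsize bytes before the metadata offset. Reassuringly, that is exactly the byte offset at which the old fifth partition started (sector 29771776):

```shell
metadata_offset=4000785927680  # reported by geli search
provsize=3985542778880         # from the metadata
sectorsize=512
echo $(( metadata_offset + sectorsize - provsize ))  # → 15243149312 (gnop -o)
echo $(( 29771776 * sectorsize ))                    # → 15243149312 as well
```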