Samsung SSD resurrection

Yesterday, a discarded Samsung SM863 datacenter SSD found its way into my hands. Although a bit older (this one is from 2016), these drives are supposed to be quite reliable, except for a few rumoured firmware bugs, and 480GB of capacity is not to be disregarded even at today’s remarkable flash prices. Since it did not suffer from the usual drillholes of unreadability, I hooked it up to a Linux machine, hopefully immune to virus-infected, supposedly “lost” thumbdrives, to give it a try. Unsurprisingly though, the drive announced 1GB of capacity, which does not fit its type MZ7KM480HAHP and explains its disposal. I had a hunch that the reason might lie with the microcode:

ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, ERRORMOD, max UDMA/133
...
sd 5:0:0:0: [sdd] 1965352 512-byte logical blocks: (1.01 GB/960 MiB)

ERRORMOD in place of a firmware revision suggests that something went horribly wrong and corrupted the firmware. At least the drive still communicates through SATA, which indicates some kind of fallback capability.

I want to emphasize at this point that drives which failed in operation most likely still contain valuable or private data. Before throwing a defective drive out, you should make sure the actual flash chips are destroyed if you are not sure about its contents! Otherwise, a malicious third party might use special access tools to recover data even in the event of failing firmware. In my opinion, the hardware encryption is also not to be trusted since it stores key information within the drive and can therefore theoretically be bypassed by undocumented commands.

A final word of warning: this article is not about data recovery, but about restoring a working state of the hardware. Data stored on the drive will most certainly be lost!

My first approach was to download the manufacturer tool “Magician” from the Samsung website. There are two versions, for consumer and for datacenter (DC) applications. However, both failed to recognize the SSD under both Windows and Linux. Then again, if this had worked as a rescue method, the original owner would surely have tried it. I also tried to query SMART data using smartmontools in Linux, which partially worked. In addition, hdparm -I showed that none of the safety locks were engaged. From here on, all further repair attempts are undertaken in Linux since it allows much more versatile access to low-level commands.
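The two queries mentioned above can be run as sketched below. smartctl -a and hdparm -I are the standard invocations; the helper security_flag is a hypothetical name introduced here, and it assumes the usual layout of the Security section printed by hdparm -I, in which each flag appears either bare (set) or prefixed with “not” (clear).

```shell
#!/bin/sh
# Typical read-only queries (run as root; /dev/sdX is a placeholder):
#   smartctl -a /dev/sdX
#   hdparm -I /dev/sdX

# Hypothetical helper: reads `hdparm -I` output on stdin and reports whether
# a security flag (enabled, locked, frozen) is set.
# Assumption: inside the "Security:" section a flag is printed either as
# "not <flag>" (clear) or as "<flag>" alone (set).
security_flag() {
    awk -v f="$1" '
        /^Security:/                 { s = 1; next }
        s && $1 == "not" && $2 == f  { state = "no" }
        s && $1 == f                 { state = "yes" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, hdparm -I /dev/sdX | security_flag frozen prints yes, no, or unknown. A frozen or locked drive would have to be dealt with before any erase or firmware command is attempted.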

After browsing for a while, I found several most interesting resources: people having the identical problem of suddenly failing drives showing 1GB capacities (here), people pondering the general idea of getting around Samsung’s update restrictions or modifying their drive’s capabilities, and well-versed tinkerers trying to extract and reverse-engineer the firmware itself. An interesting read altogether, and highly informative. In addition, I found some hints on a commercial tool called PC-3000, which appears to be able to talk low-level to SSDs of different brands for the purpose of data rescue.

However, none of these sources tackled my problem exactly. Although “my” SSD responds to the safe-mode pin bridge mentioned in the PC-3000 video (connect the two separated vias next to the 5 JTAG vias on the PCB while powering up the drive, picture will follow) by showing its internal ROM as a drive full of zeros, this only allows me to apply certain temporary access methods using modified firmware and manufacturer commands, which I do not have or know. The much more promising approach would be to fix the firmware by uploading a new one. Since the drive reaches the ready state and actually communicates, its download capability should be intact. hdparm even shows DOWNLOAD_MICROCODE as an active capability. So, two problems need a solution right now:

  • Where to get a MATCHING firmware file, since none is supplied by Samsung directly?
  • How to upload it? Since Magician does not recognize the drive – probably due to some crucial piece of identity information not being provided – an alternative means needs to be found.
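The DOWNLOAD_MICROCODE capability mentioned above can be checked without reading the whole hdparm -I listing. This is a minimal sketch; feature_enabled is a hypothetical helper name, and it assumes the usual hdparm -I convention of marking enabled features with an asterisk in the first column.

```shell
#!/bin/sh
# Hypothetical helper: reads `hdparm -I` output on stdin and succeeds if the
# named feature is listed with a leading "*" (hdparm's marker for "enabled").
feature_enabled() {
    awk -v f="$1" '
        $1 == "*" && index($0, f) { found = 1 }
        END { exit !found }
    '
}

# Usage (as root; /dev/sdX is a placeholder):
#   hdparm -I /dev/sdX | feature_enabled DOWNLOAD_MICROCODE && echo "download supported"
```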

Getting firmware

Regarding the first problem, this forum discussion gave me the idea of searching at well-known resellers of Samsung products who have a habit of re-branding stuff as their own, like Dell. In the Dell driver repositories, there is an actual firmware update to be found for this very drive version, although the subrevisions don’t match (mine has -00005, Dell states -00D3 or -00D03): LINK. The Linux version of this update is supplied as a .BIN file, which is actually a packed directory that can be easily unpacked:

bash FIRMWARE.BIN --extract firmware

In the freshly created folder “firmware”, there are many Linux tools (explained in more detail here) along with a “payload” folder. Inside, there is a single file called SAMSUNG_SM863_GB55.fwh. So let’s assume that this is the actual firmware binary. Since I do not own the corresponding Dell system, it is a pretty safe guess that the update will not run natively. How to upload the contents of this file to the drive?

In the previously mentioned hddguru thread, “sourcerer” states that a correct firmware file for Samsung drives is encrypted and contains exactly 1MiB of data. The .fwh is slightly larger, which makes me believe that it is padded with some extra stuff, which also matches the discussion in the thread. Let’s leave this for later and focus on the second problem, uploading the binary blob.

Uploading microcode

There is an easy – and dangerous – way to send firmware to any drive using the standardized ATA microcode download commands for this purpose. To stream a file to the device, hdparm needs to be called with rather unusual parameters:

hdparm --fwdownload FW.BIN --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdX

No explanations required, I guess. Although the drive most likely performs checksumming and verification of authenticity of the file, once accepted and committed to memory there might be no way back. Before trying this with the firmware file, let’s have a look at the encryption mentioned in the hddguru forums first. This is supposed to be present when the file is distributed through Magician, so it might or might not be needed here since Dell has their own distribution toolset and Magician is not part of the extracted directory. In the case of encryption, chances are that this tool can remove it before loading the firmware as the 2016 release is rather old. Newer ones are supposed to have different encryption or hashing.

Examining the file using a text editor (e.g., nano) shows the following:

  • Some binary gibberish (256 bytes of that, to be precise) before a run of actual space characters follows.
  • Spaces and obvious Samsung firmware revisions/drive type number strings in clear text (144 bytes of that) before spaces stop.
  • Binary gibberish (a LOT).
  • Intermediate clear-text strings which refer to SMART value IDs and variable definitions, sometimes debugging remarks which seem to have survived the compiler.

As mentioned here, it appears that Dell adds a custom 256-byte prefix – check! Then, there are mentions of a Samsung-specific prefix of variable length – 0x200 and 0x1B0 according to the hddguru thread. In my case, 144 bytes if I guessed correctly. The reason for this conclusion is simple: If 256+144 bytes are removed from the beginning, the remaining part is exactly 1,048,576 bytes, or 1MiB (1,024×1,024 bytes). Now, there is discussion whether these 144 bytes are actually needed by the drive. Let’s find out.
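The arithmetic above can be done directly on the file. A minimal sketch, assuming (as the hddguru thread suggests) that the actual firmware image is exactly 1MiB and sits at the end of the file; header_len is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed size of the actual firmware image: 1 MiB.
PAYLOAD_SIZE=1048576

# Hypothetical helper: prints how many bytes precede the last 1 MiB of a file.
header_len() {
    size=$(wc -c < "$1" | tr -d ' ')
    echo $((size - PAYLOAD_SIZE))
}

# Usage:
#   header_len SAMSUNG_SM863_GB55.fwh
```

For the file discussed here this should print 400 (256 + 144). Whether those leading bytes really are header data still has to be confirmed by looking at them, for example with a hex editor.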

Putting it together

Let’s start by truncating 256 bytes first and uploading:

# dd if=fw_original.bin of=fw_short.bin ibs=1 obs=1 skip=256
# hdparm --fwdownload fw_short.bin --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdd
...
SM863_short.fw: file size (1048720) not a multiple of 512

Aha, so hdparm requires the file length to be a multiple of 512. Nice that it checks before actually streaming the file. This means that another 144 bytes need to go (1048720 − 1048576 = 144):

# dd if=fw_original.bin of=fw_short.bin ibs=1 obs=1 skip=400
# hdparm --fwdownload fw_short.bin --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdd
...
 Done .

Success!

Follow-up: It turns out that either Dell is using multiple versions of the upload tool, or that the two header sections are both dynamic in length after all. If the truncation for your firmware file is of different length, that may be correct! Check with a hex editor whether the required difference in length matches the pattern at the beginning of the file. Also, it seems as if the drive may actually perform internal checksumming, which blocks the download in case of a device type mismatch (see comments below).
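Since the header length differs between firmware files, one way to avoid hard-coding the dd offset is to keep only the trailing 1MiB. This is a sketch under the same assumption as before (a 1MiB image at the end of the file), which may not hold for every drive model; strip_header is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed size of the actual firmware image: 1 MiB (a multiple of 512,
# so hdparm's length check is satisfied).
PAYLOAD_SIZE=1048576

# Hypothetical helper: writes the last 1 MiB of the source file to the
# destination and refuses files that are too small to contain it.
strip_header() {
    src=$1
    dst=$2
    size=$(wc -c < "$src" | tr -d ' ')
    if [ "$size" -lt "$PAYLOAD_SIZE" ]; then
        echo "error: $src is smaller than 1 MiB" >&2
        return 1
    fi
    tail -c "$PAYLOAD_SIZE" "$src" > "$dst"
}

# Usage:
#   strip_header fw_original.bin fw_short.bin
```

The result should be identical to the dd call with the matching skip value. It is still worth comparing the removed bytes against the pattern described above before flashing anything.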

At this point I power-cycled the drive to make the new firmware load if it was not rejected. The drive then was recognized by the PC significantly faster and now showed a proper firmware revision:

ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, GXM1003Q, max UDMA/133

GXM1003Q, which is the very one it is supposed to have according to the label. This means that the drive has accepted the microcode and there were no gross mismatches in versions. Note how the revision number -00005 is still the same, although the firmware should not have matched. However, the capacity at this point was still 1GB, although the I/O errors were gone. I then requested the drive to perform an ATA security-erase cycle to delete all data possibly remaining and re-initialize the entire memory. I assumed the password to be unset, requiring a NULL value:

hdparm --security-erase NULL /dev/sdd

which turned out to be correct. Either the password was never set or the drive forgot it along with the rest. The erase takes ~30 minutes for the internal operations to finish. Let’s remove and hot-plug the drive now:
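If the erase with a NULL password is rejected, the generic ATA Secure Erase procedure sets a temporary user password first and then erases with it. The sketch below wraps both hdparm calls; the function names and the DRY_RUN switch are hypothetical additions for illustration, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the command when DRY_RUN=1 (the default here),
# executes it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Two-step ATA secure erase: set a temporary user password, then erase.
# The password is an arbitrary placeholder. Note that it stays set if the
# erase step fails.
secure_erase() {
    dev=$1
    pass=${2:-tmp}
    run hdparm --user-master u --security-set-pass "$pass" "$dev" &&
    run hdparm --user-master u --security-erase "$pass" "$dev"
}

# Usage (prints the two commands; set DRY_RUN=0 to actually run them as root):
#   secure_erase /dev/sdX
```

This only makes sense if hdparm -I reports the drive as not frozen. A successful erase normally clears the password again; should the erase fail, hdparm --security-disable with the same password should remove it.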

ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, GXM1003Q, max UDMA/133
ata6.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 32), AA
ata6.00: configured for UDMA/133
ata6.00: detaching (SCSI 5:0:0:0)
...
sd 5:0:0:0: [sdd] Synchronizing SCSI cache
sd 5:0:0:0: [sdd] Stopping disk
scsi 5:0:0:0: Direct-Access     ATA      SAMSUNG MZ7KM480 003Q PQ: 0 ANSI: 5
sd 5:0:0:0: [sdd] 937703088 512-byte logical blocks: (480 GB/447 GiB)
sd 5:0:0:0: [sdd] Write Protect is off
sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdd] Attached SCSI removable disk

Looking good! SMART no longer shows any notable errors, and the lifetime counters have apparently mostly been reset. The disk responds at full capacity and full speed, and has just completed a full self-test and a full-memory write cycle without errors.
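The capacities in the kernel messages follow directly from the sector counts. As a quick cross-check, assuming 512-byte logical sectors as reported in the logs (sectors_to_size is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: converts a count of 512-byte sectors into bytes,
# decimal megabytes and binary mebibytes (integer division, rounded down).
sectors_to_size() {
    bytes=$(($1 * 512))
    printf '%s bytes, %s MB, %s MiB\n' \
        "$bytes" "$((bytes / 1000000))" "$((bytes / 1048576))"
}

sectors_to_size 1965352      # sector count reported by the broken drive
sectors_to_size 937703088    # sector count after the repair
```

The first call gives roughly 1006 MB (959 MiB), the second roughly 480103 MB (457862 MiB, about 447 GiB), which matches the kernel’s rounded “1.01 GB/960 MiB” and “480 GB/447 GiB”.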

18 thoughts on “Samsung SSD resurrection”

  1. I’ve just recovered four PM863a 1.92TB (MZ7LM1T9HMJP-00005) drives following and adapting your guide.

    First I downloaded the Win64 .exe file from https://www.dell.com/support/home/de-ch/drivers/driversdetails?driverid=p3c5p, doubleclicked and used the “Extract…” option to get the raw firmware file (GC5B_NF.fwh) that can be found in the payload subfolder of the extracted files/folders

    On a Linux box I then used “dd if=GC5B_NF.fwh of=GC5B_NF_MOD.fwh ibs=1 obs=1 skip=304” to remove the leading garbage of this file.
    Then I tried to flash it with “hdparm --fwdownload GC5B_NF_MOD.fwh --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdb” but that only worked halfway, as it finished with an error message. (SG_IO: bad/missing sense data error)

    see also: https://imgur.com/a/8CGwHwq

    I then removed another 512 bytes from the firmware file (dd if=GC5B_NF_MOD.fwh of=GC5B_NF_MOD2.fwh ibs=1 obs=1 skip=512) and tried to flash this one.
    Now the process completed without any errors.

    see also: https://imgur.com/a/t6tLo7a

    Anyhow, after unplugging and replugging the disks, they were all working again. The “ERRORMOD” information was gone and the disks showed again their correct version in the firmware field.
    And as mentioned in this blog post, the version did not change by this whole procedure and still shows the same as printed on the label of the disk

    see also: https://imgur.com/a/jgdjgir

    Interestingly, we had two disks with firmware GXT5104Q and two with GXT5204Q
    On the ones with GXT5104Q this procedure erased? some SMART values, at least the “Uncorrectable Error Count” and “ECC Error Rate” were reset to zero and also the “Total Bytes Written” is 0.00

    see also: https://imgur.com/a/jgdjgir

    On the GXT5204Q drives though, these values persisted

    see also: https://imgur.com/a/mymUPrG

    Btw. the command “hdparm --security-erase NULL /dev/sdX” did not work in my case, on any of the disks. I always got an “SG_IO: bad/missing sense data” error no matter what I tried.
    So I’ve used the Samsung DataCenter tool for Windows, to achieve a similar result.

    see also: https://imgur.com/a/h3vd3pY

  2. Many thanks for writing this up. I’ve been living with a dead SM863 in my iMac for the last couple of years (due to laziness not wanting to take it apart). After reading someone else’s comment about not needing the firmware update, I too just ran the secure erase part (after booting the iMac to Linux from USB). The command came straight back with “validation failed” – but with the correct size returned and not the 1GB figure. And voila, it all works now (sans data). I owe you a beer.

  3. Hi,
    From what I can tell, you download the .BIN file, extract it and remove the header of the .fwh file.  That makes sense. But later on you’re actually uploading the .bin file to the hard drive. 
    My question is, how did you get the edited .fwh file back into the .bin file?
    The header of the .bin file is not the same as the .fwh file and not the one that should be edited, correct?
    Thanks a lot for your help
    Roy

  4. Man! You just saved an SSD (Samsung SM863) which slept for almost three years at the bottom of a drawer, I left it there after hours and hours of research on the net as well as sleepless nights, I had even viewed most of the links mentioned in your article, and tried a lot of things without success.

    The command: “hdparm --yes-i-know-what-i-am-doing --please-destroy-my-drive --fwdownload ./xxx /dev/sdx” always returned this error to me: “SG_IO: bad / missing sense data, sb [ ]: 70 00 05 00 00 00 00 0a 04 51 e0 00 21 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00”.

    And without any hope I then tried this one: “hdparm --security-erase NULL /dev/sdx”, and against all odds it solved the problem.

    Thank you very much, you made my day!

  5. Hi,
    It’s me again. I now have same issue with 2x SM863a mz7km960hmjp, one has dell label on it model is ending 0D3 and one samsung label (no dell p/n on it) ending 00005
    I downloaded both firmwares I could find – both links below, truncated by 304, hdparm upload starts but gives error “SG_IO: bad/missing sense data” and there is no change, after restart still 960mb
    Anything I can try? Are these firmwares correct? With the previous drives, the first firmware didn’t work but the second one did; in this case, none of the ones I could find work.
    https://www.dell.com/support/home/uk/en/ukbsdt1/drivers/driversdetails?driverid=mfwyr
    https://www.dell.com/support/home/uk/en/ukbsdt1/drivers/driversdetails?driverid=97d8j
    I found these by googling dell mz7km960hmjp but also found them here
    http://www.poweredgec.com/latest_poweredge-14g.html
    Your help would be much appreciated.

    1. Hi Stef,
      I’m sorry for the late answer – pretty busy right now. I hope I can manage to look into your problem next weekend – I’ll let you know.
      Mario

        1. Ok, I need a while to check the firmware, but it seems to me you probably picked the right one. GD57, Serial-ATA version.

          Until then:
          * Can you run hdparm -I on the drive in question and check the security section to see if any lock-down mode is active? If so, try to perform a secure erase as shown in the blog article before uploading microcode.
          * You are on a native SATA interface, correct? Not a USB converter or a Software-RAID or something like that?
          * Check that AHCI mode is enabled and any drive-level security disabled in BIOS to allow for the full capabilities of the command set.

            1. Ok, according to the binary of GD57 as downloaded from Dell, removing 304 bytes is correct.
              This very firmware should work for both of your drives independent of the model number postfix. Apparently the error you receive corresponds to the drive not entering the correct state when starting the upload. It could be that the upload/checking mechanism behaves differently for this model.

              It seems this is not a trivial issue, need to investigate this more.

              1. Hi
                Quick update, I just resolved the issue with both.
                I used ata secure erase https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
                After checking it was “not frozen” I used these 2 lines

                hdparm --user-master u --security-set-pass Eins /dev/X
                then
                time hdparm --user-master u --security-erase Eins /dev/X

                The drives now show the normal GB size and work fine in Windows
                I did not even update/use the Dell firmware with hdparm, they work now and will not touch them :)
                I remember trying the secure erase command with the previous 2 SSDs and it didn’t work, this is why I found this post

    1. I have done it, this firmware worked fine and now the drive shows just under 900GB, which is what it is supposed to show for a 960GB SSD. Initialized as GPT and formatted NTFS fine
      thank you very much!!!!!!!!!!!!!!!!!

        1. That matches what I see in the hex editor. The first 64 bytes are no longer binary, but contain blanks and the prior drive firmware revision required for update (irrelevant without the DELL tool). The next 240 bytes are actual strings enumerating the applicable drive part numbers and their codes. In sum, your 304 bytes. It seems that both areas are dynamic in length after all. However, if the drive rejects non-matching updates, that means that the binary blob following immediately as the header of the first firmware block contains some kind of clear identification to the drive. Interesting!

      1. Hi stef,
        congratulations ;-)
        I was just writing an answer to you, but I guess it is irrelevant now. The first version you tried to download is actually a SAS firmware. I guess the drive really does some plausibility checking regarding the content, then. Your files also have different headers, same content as mine but different length and order. Your decision for the offset is correct, though. Seems like Dell is using different types of update tools.
        As you probably know by now, the download does not take a long time. Usually it finishes in below one minute.
        Enjoy your drives’ second life ;-)
        Mario

  6. Hi
    I have 2x pm863a with same issue (showing under 1gb and not initializing/formatting etc). The drives don’t have a branded label on them.
    I downloaded and extracted the fwh file from Dell’s firmware https://www.dell.com/support/home/uk/en/ukbsdt1/drivers/driversdetails?driverid=h5p55
    After executing the sudo hdparm… I get “/dev/sda:” and a cursor blinking under this. No movement in hours, no error, no “done”.
    I truncated the above Dell firmware by 304 with dd command as the dell firmware has 1,048,880
    Anything else I can try?
    I presume updating the firmware shouldn’t take hours with hdparm?
