Yesterday, a discarded Samsung SM863 datacenter SSD found its way into my hands. Although a bit older – this one’s from 2016 – these are supposed to be quite reliable (except for a few rumoured firmware bugs), and 480GB of capacity is not to be disregarded even at today’s remarkable flash prices. Since it did not suffer from the usual drill holes of unreadability, I hooked it up to a Linux machine – hopefully immune to virus-infected, supposedly “lost” thumbdrives – to give it a try. Unsurprisingly though, the drive announced 1GB of capacity, which does not fit its type MZ7KM480HAHP and explains its disposal. I had a hunch that the reason might lie with the microcode:
ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, ERRORMOD, max UDMA/133
...
sd 5:0:0:0: [sdd] 1965352 512-byte logical blocks: (1.01 GB/960 MiB)
ERRORMOD as a version index looks rather like something went horribly wrong, causing some kind of corruption to the firmware. At least it still communicates through SATA, which indicates some backup capabilities.
I want to emphasize at this point that drives which failed in operation most likely still contain valuable or private data. Before throwing a defective drive out, you should make sure the actual flash chips are destroyed if you are not sure about its contents! Otherwise, a malicious third party might use special access tools to recover data even in the event of failing firmware. In my opinion, the hardware encryption is also not to be trusted since it stores key information within the drive and can therefore theoretically be bypassed by undocumented commands.
My first approach was to download the manufacturer tool “Magician” from the Samsung website. There are two versions, for consumer and for datacenter (DC) applications. However, both failed to recognize the SSD under both Windows and Linux. Then again, if this had worked as a rescue method, the original owner would surely have tried it. I also tried to query SMART data using smartmontools in Linux, which partially worked. In addition, hdparm -I showed that none of the safety locks were engaged. From here on, all further repair attempts are undertaken in Linux since it allows much more versatile access to low-level commands.
After browsing for a while, I found several most interesting resources: from people having the identical problem of suddenly failing drives showing 1GB capacities (here), over people pondering the general idea of not having to follow Samsung’s restrictions to update or wanting to modify their drive’s capabilities to well-versed tinkerers trying to extract and reverse-engineer the firmware itself. An interesting read altogether, and highly informative. In addition, I found some hints on a commercial tool called PC-3000, which appears to be able to talk low-level to SSDs of different brands for the purpose of data rescue.
However, none of these sources tackled my problem exactly. “My” SSD does respond to the safe-mode pin bridge mentioned in the PC-3000 video (connect the two separated vias next to the 5 JTAG vias on the PCB while powering up the drive, picture will follow) by showing its internal ROM as a drive full of zeros. But this only allows certain temporary access methods using modified firmware and manufacturer commands, which I neither have nor know. The much more promising approach would be to fix the firmware by uploading a new one. Since the drive reaches the ready state and actually communicates, its download capability should be intact. hdparm even shows DOWNLOAD_MICROCODE as an active capability. So, two problems need a solution right now:
- Where to get a MATCHING firmware file, since none is supplied by Samsung directly?
- How to upload it? Since Magician does not recognize the drive – probably due to some crucial piece of identity information not being provided – an alternative means needs to be found.
Regarding the first problem, this forum discussion brought me the idea of searching at well-known resellers of Samsung products, who have a habit of re-branding stuff to their own, like Dell. In the Dell driver repositories, there is an actual firmware update to be found for this very drive version, although the subrevisions don’t match (mine has -00005, Dell states -00D3 or -00D03): LINK. The linux-version of this update is supplied as a .BIN file, which is actually a packed directory that can be easily unpacked:
bash FIRMWARE.BIN --extract firmware
In the freshly created folder “firmware”, there are many linux tools (explained more in detail here) along with a “payload”-folder. Inside, there is a single file called SAMSUNG_SM863_GB55.fwh. So let’s assume that this is the actual firmware binary. Since I do not own the corresponding Dell system it is a pretty safe guess that this will not run natively. How to upload the contents of this file to the drive?
In the previously mentioned hddguru thread, it is mentioned by “sourcerer” that a correct firmware file for Samsung drives is encrypted and contains exactly 1MiB of data. The .fwh is slightly larger which makes me believe that it is padded with some extra stuff, which also matches the discussion in the thread. Let’s leave this for later right now and focus on the second problem, uploading the binary blob.
There is an easy – and dangerous – way to send firmware to any drive using the standardized ATA microcode download commands for this purpose. To stream a file to the device, hdparm needs to be called with rather unusual parameters:
hdparm --fwdownload FW.BIN --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdX
No explanations required, I guess. Although the drive most likely performs checksumming and verification of authenticity of the file, once accepted and committed to memory there might be no way back. Before trying this with the firmware file, let’s have a look at the encryption mentioned in the hddguru forums first. This is supposed to be present when the file is distributed through Magician, so it might or might not be needed here since Dell has their own distribution toolset and Magician is not part of the extracted directory. In the case of encryption, chances are that this tool can remove it before loading the firmware as the 2016 release is rather old. Newer ones are supposed to have different encryption or hashing.
Examining the file using a text editor (e.g., nano) shows the following:
- Some binary gibberish (256 bytes of that, to be precise) before actual string spaces follow.
- Spaces and obvious Samsung firmware revisions/drive type number strings in clear text (144 bytes of that) before spaces stop.
- Binary gibberish (a LOT).
- Intermediate clear-text strings which refer to SMART value IDs and variable definitions, sometimes debugging remarks which seem to have survived the compiler.
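The same survey can be done programmatically instead of by eye. What follows is a minimal Python sketch, similar in spirit to `strings -t d`, that lists runs of printable ASCII together with their byte offsets. The function name `printable_runs` and the minimum run length of 8 are my own choices for illustration, and the demo data is synthetic, not the real firmware file.

```python
import string

# Bytes counted as "readable": printable ASCII, with space as the only whitespace.
PRINTABLE = set(string.printable.encode("ascii")) - set(b"\t\n\r\x0b\x0c")


def printable_runs(data: bytes, min_len: int = 8):
    """Return (offset, length, text) for every printable run of at least min_len bytes."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if byte in PRINTABLE:
            if start is None:
                start = i  # a new readable run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start, data[start:i].decode("ascii")))
            start = None
    # Handle a run that extends to the very end of the data.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start, data[start:].decode("ascii")))
    return runs


if __name__ == "__main__":
    # Synthetic stand-in: 256 unreadable bytes, a 144-byte text field, more unreadable bytes.
    fake = bytes([0x80]) * 256 + b"GXM1003Q".ljust(144) + bytes([0x9C]) * 64
    for offset, length, text in printable_runs(fake):
        print(offset, length, repr(text.strip()))
```

On a real file, random binary data will contain short printable sequences by chance, so the minimum length may need raising to keep the output readable.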
As mentioned here, it appears that Dell adds a custom 256-byte prefix – check! Then, there are mentions of a Samsung-specific prefix of variable length – 0x200 and 0x1B0 according to the hddguru thread. In my case, 144 bytes if I guessed correctly. The reason for this conclusion is simple: If 256+144 bytes are removed from the beginning, the remaining part is exactly 1,048,576 bytes – or 1MiB (1,024 × 1,024 bytes). Now, there is discussion whether these 144 bytes are actually needed by the drive. Let’s find out.
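The arithmetic behind that guess can be written down as a short Python sketch. The total file size used here is not a measured value but the one implied by the numbers above (256 + 144 + 1 MiB), and the helper names are made up for illustration. The sketch tries the two header lengths from the hddguru thread alongside the 144 bytes seen in this file.

```python
MIB = 1024 * 1024
SECTOR = 512

DELL_PREFIX = 256                    # leading block attributed to Dell in the text
FILE_SIZE = DELL_PREFIX + 144 + MIB  # size implied by the text, not measured here


def remainder_after(file_size: int, *prefixes: int) -> int:
    """Bytes left once all given prefix lengths are cut from the front."""
    return file_size - sum(prefixes)


def check(header_len: int):
    """Return (remaining bytes, is exactly 1 MiB, is a multiple of 512)."""
    rest = remainder_after(FILE_SIZE, DELL_PREFIX, header_len)
    return rest, rest == MIB, rest % SECTOR == 0


if __name__ == "__main__":
    for header_len in (0x200, 0x1B0, 144):
        rest, is_mib, aligned = check(header_len)
        print(f"header {header_len:4d}: {rest} bytes left, 1 MiB: {is_mib}, 512-aligned: {aligned}")
```

Of the three candidates, only the 144-byte header leaves a remainder that is both exactly 1 MiB and a multiple of 512, which is what makes that split plausible for this particular file.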
Putting it together
Let’s start by truncating 256 bytes first and uploading:
# dd if=fw_original.bin of=fw_short.bin ibs=1 obs=1 skip=256
# hdparm --fwdownload fw_short.bin --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdd
...
SM863_short.fw: file size (1048720) not a multiple of 512
Aha, so hdparm seems to require that the file length is a multiple of 512. Nice that it checks before actually streaming the file. This means that another 144 bytes need to go:
# dd if=fw_original.bin of=fw_short.bin ibs=1 obs=1 skip=400
# hdparm --fwdownload fw_short.bin --yes-i-know-what-i-am-doing --please-destroy-my-drive /dev/sdd
...
Done .
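For anyone who prefers to script the truncation, here is a hedged Python equivalent of the dd step. It is only a sketch: the function name `strip_prefix` is hypothetical, and the multiple-of-512 check simply mirrors the hdparm message above so that a wrong offset is caught before anything gets near the drive. It does not talk to the drive at all.

```python
from pathlib import Path

SECTOR = 512  # hdparm refused a file whose size was not a multiple of this


def strip_prefix(src: str, dst: str, skip: int) -> int:
    """Write src without its first `skip` bytes to dst and return the new size.

    Raises ValueError (and writes nothing) if the result would not be a
    whole number of 512-byte blocks.
    """
    data = Path(src).read_bytes()
    if skip < 0 or skip > len(data):
        raise ValueError(f"cannot skip {skip} bytes of a {len(data)}-byte file")
    body = data[skip:]
    if len(body) % SECTOR != 0:
        raise ValueError(f"{len(body)} bytes is not a multiple of {SECTOR}")
    Path(dst).write_bytes(body)
    return len(body)
```

Calling `strip_prefix("fw_original.bin", "fw_short.bin", 400)` would correspond to the second dd invocation; with `skip=256` the function raises instead of producing a file that hdparm rejects.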
Success! At this point I power-cycled the drive to make the new firmware load if it was not rejected. The drive then was recognized by the PC significantly faster and now showed a proper firmware revision:
ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, GXM1003Q, max UDMA/133
GXM1003Q, which is the very one it is supposed to have according to the label. This means that the drive has accepted the microcode and there were no gross mismatches in versions. Note how the revision number -00005 is still the same, although the firmware should not have matched. However, the capacity at this point was still 1GB, although there were no more I/O errors. I then requested the drive to perform an ATA security-erase cycle to delete all data possibly remaining and re-initialize the entire memory. The password I assumed to be un-set, requiring a NULL value:
hdparm --security-erase NULL /dev/sdd
which turned out to be correct. Either the password was never set or the drive forgot it along with the rest. The erase takes ~30 minutes for the internal operations to finish. Let’s remove and hot-plug the drive now:
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ATA-9: SAMSUNG MZ7KM480HAHP-00005, GXM1003Q, max UDMA/133
ata6.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 32), AA
ata6.00: configured for UDMA/133
ata6.00: detaching (SCSI 5:0:0:0)
...
sd 5:0:0:0: [sdd] Synchronizing SCSI cache
sd 5:0:0:0: [sdd] Stopping disk
scsi 5:0:0:0: Direct-Access ATA SAMSUNG MZ7KM480 003Q PQ: 0 ANSI: 5
sd 5:0:0:0: [sdd] 937703088 512-byte logical blocks: (480 GB/447 GiB)
sd 5:0:0:0: [sdd] Write Protect is off
sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: [sdd] Attached SCSI removable disk
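To double-check the kernel's figures, the sector counts from the two logs can be converted by hand. This is a small Python sketch; the function name `capacity` is mine, and the 512-byte sector size is taken from the log lines.

```python
def capacity(sectors: int, sector_size: int = 512):
    """Return (bytes, decimal GB, binary GiB) for a given sector count."""
    total = sectors * sector_size
    return total, total / 1e9, total / 2**30


if __name__ == "__main__":
    for label, sectors in (("before", 1965352), ("after", 937703088)):
        total, gb, gib = capacity(sectors)
        print(f"{label}: {total} bytes = {gb:.2f} GB = {gib:.2f} GiB")
```

The broken state works out to 1,006,260,224 bytes (the "1.01 GB" in the first log), while the repaired drive reports 480,103,981,056 bytes, matching the "480 GB/447 GiB" shown by the kernel.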
Looking good! SMART no longer shows any notable errors, and the lifetime counters have apparently mostly been reset. The disk responds at full capacity and full speed, and has just completed a full self-test and a full-memory write cycle without errors.