Harddisk troubles

Somehow I managed to get a little work done last night (at 2 AM), so here’s the story of the hard drive:

Some time ago, a friend dropped a dead 1 TB hard disk on my workbench. It was installed in a brand-new external disk enclosure and had signed off as soon as it was loaded with data. Eeek!

His first attempt was to write to the drive's manufacturer and ask for a replacement PCB – some companies do this, and in most cases the PCBs can simply be swapped if the revision numbers are identical (they MUST be, unless you want to risk even more data loss). In this case, the manufacturer only offered to exchange his drive for a new one. That would have been fine, except that the data would have been lost.

So, I offered to take a look at the device. When I plugged it in for the first time, I immediately noticed that the BIOS could not recognize the disk at all – it even hung during drive identification, which is a strong sign that the drive controller no longer has any idea what the heck is going on. Meanwhile, the disk itself made beeping and whirring noises, followed by sharp clicks. Since I could not hear the distinct sound of the spinning platters, I figured the spindle motor controller was a good place to start.

Current hard drive PCBs (a WD10EACS in this case) consist of two main function groups, each fully integrated into generic ICs or ASICs (application-specific integrated circuits).

WD10EACS circuit board

The ICs in the red area are the buffer and the main controller. The main controller contains the whole intelligence of the disk: the SMART routines and everything related to data organization and transfer run in there. The buffer IC temporarily stores data being written or read until it has been handled by the head mechanism or sent to the PC. I also marked some other important parts of the PCB along the way.

The green area is what makes your disk spin. The spindle/VCM driver IC (STM SMOOTH L7251 3.1) generates a three-phase drive signal for the platter spindle motor and drives the voice coil motor (VCM) that moves the head arm according to the main controller’s commands. You can find a datasheet on the web, but it is for the predecessor, the L7250, which is similar in function but not in pinout.
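To illustrate what a “three-phase drive signal” means in terms of switching, here is a minimal Python sketch of six-step (trapezoidal) commutation, a common scheme for brushless spindle motors. Whether the L7251 uses exactly this sequence, a sinusoidal scheme, or something in between is an assumption I can’t confirm without the datasheet; the table and the names `COMMUTATION_TABLE`, `bridge_states` and `describe` are hypothetical.

```python
# Hypothetical six-step (trapezoidal) commutation table for a three-phase
# spindle motor. NOT taken from the L7251; it only illustrates the principle.
# Each entry gives the half-bridge state of phases (A, B, C):
#   +1 = high-side MOSFET on (phase pulled to the supply rail)
#   -1 = low-side MOSFET on (phase pulled to ground)
#    0 = both MOSFETs off (phase floating)
COMMUTATION_TABLE = [
    (+1, -1, 0),   # step 0: current flows A -> B
    (+1, 0, -1),   # step 1: A -> C
    (0, +1, -1),   # step 2: B -> C
    (-1, +1, 0),   # step 3: B -> A
    (-1, 0, +1),   # step 4: C -> A
    (0, -1, +1),   # step 5: C -> B
]


def bridge_states(step: int) -> tuple[int, int, int]:
    """Return the (A, B, C) half-bridge states for a commutation step.

    The sequence repeats every six steps, i.e. once per electrical revolution.
    """
    return COMMUTATION_TABLE[step % 6]


def describe(step: int) -> str:
    """Human-readable summary of which phase is high, low and floating."""
    names = "ABC"
    states = bridge_states(step)
    high = names[states.index(+1)]
    low = names[states.index(-1)]
    floating = names[states.index(0)]
    return f"{high} high, {low} low, {floating} floating"


if __name__ == "__main__":
    for step in range(6):
        print(step, describe(step))
```

In every step one phase is pulled high, one low and one floats, and at each transition exactly one phase keeps its state, so the current is handed from one coil pair to the next in 60-degree electrical increments.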

Now, what most people don’t realize is how complicated hard drives are. They have full onboard diagnostic programs (VERY poorly documented, of course; these well-protected secrets are what make data rescue companies so special), and even though drives look simple both outside and inside, they have to be precisely calibrated for optimal performance, and some of that calibration data is stored on the PCB itself. Hence the need for exactly matching PCBs. But since this case shows every sign of a hardware failure, no diagnostic program will fix the damage.

Motor drive testing in this case is best done with the drive board unscrewed from the disk. This may increase the “failed start” SMART counters in the disk’s long-term memory, but whatever – it’s busted anyway. The reason for disconnecting the board is that the motor coils, if not defective, present unknown resistances and inductances between the driver outputs, which makes the measurement extremely difficult. After connecting the four probes of my digital scope to the spindle connector, I started sampling and plugged the board in…

Scope screenshot of motor phases

…and this is what happened. Excuse the poor contrast; I usually only use white backgrounds when printing, to save toner.

What you see is the main startup algorithm doing its job. The controller first tries to determine the position of the rotor magnet in the spindle motor, so that it can generate matching signals for rapid acceleration. It does this by applying voltage to different coil combinations in turn, following a pre-programmed pattern. By measuring the current rise time for each combination, it can work out how strongly the rotor magnet influences each coil (the magnet's flux changes the effective inductance of the coils it lines up with) and, from that, the actual rotor position. As for the traces: yellow, blue and green are the three motor phases, and the magenta trace shows the center tap, which is absent here since the motor is not connected. The phases are switched between the supply rails by MOSFET half-bridges integrated into the IC.
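The position detection step can be sketched roughly as follows. This is a simplified Python model of inductive rotor position sensing, assuming that the drive vector aligned with the rotor magnet sees the lowest inductance (and therefore the fastest current rise) because the magnet's flux pushes the stator iron towards saturation. Winding resistance is ignored, and all constants as well as `simulated_rise_time` and `estimate_rotor_angle` are hypothetical; the real firmware on the L7251 and the WD controller is undocumented.

```python
import math

# Hypothetical parameters for a simplified model; none of these values come
# from the L7251 or the WD10EACS.
SUPPLY_V = 12.0          # voltage applied across the energised coil pair (V)
I_THRESHOLD = 0.2        # current at which the rise time is measured (A)
L_NOMINAL = 1.0e-3       # coil-pair inductance without magnet influence (H)
L_SATURATION = 0.1e-3    # inductance drop when the magnet's flux aids the coil flux (H)

# Electrical angles of the six drive vectors (coil combinations), 60 degrees apart.
VECTOR_ANGLES = [k * 60.0 for k in range(6)]


def simulated_rise_time(rotor_deg: float, vector_deg: float) -> float:
    """Time for the current to reach I_THRESHOLD with one drive vector applied.

    Simplified: ignores winding resistance, so di/dt = V / L and t = L * I / V.
    The inductance dips when the drive vector lines up with the rotor magnet.
    """
    delta = math.radians(rotor_deg - vector_deg)
    inductance = L_NOMINAL - L_SATURATION * math.cos(delta)
    return inductance * I_THRESHOLD / SUPPLY_V


def estimate_rotor_angle(rise_times: list[float]) -> float:
    """Pick the drive vector with the shortest current rise time.

    Returns the estimated electrical rotor angle in degrees, resolved to the
    nearest 60-degree sector.
    """
    if len(rise_times) != 6:
        raise ValueError("expected one rise time per drive vector (6)")
    fastest = min(range(6), key=lambda k: rise_times[k])
    return VECTOR_ANGLES[fastest]


if __name__ == "__main__":
    true_angle = 130.0
    times = [simulated_rise_time(true_angle, v) for v in VECTOR_ANGLES]
    print(estimate_rotor_angle(times))  # → 120.0 (nearest sector to 130 degrees)
```

A real controller may refine this further, for example by comparing the rise times of opposite vector pairs, but the principle stays the same: shorter rise time means lower inductance, which means the magnet is nearby.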

I marked the important part in the graph: the yellow phase is missing something at the top of its waveform. There should be short high pulses like on the other two. The controller notices this and restarts the process at the red mark. After failing twice, it tries to spin the disk a little instead. This is called coarse drive mode: the visible pulsing is not meant to bring the motor up to a particular speed, but simply to turn it in case the motor is “stuck” in a state the controller cannot discern (even though that should never happen). This part of the signal is responsible for the whirring noise, since the motor actually moves, but poorly so because of the missing high level on one phase.
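To check captures like this one systematically instead of eyeballing them, something like the following Python sketch could flag a phase whose waveform never reaches the high rail while the other phases do. It assumes the scope traces have been exported as lists of voltage samples per phase; the threshold, the synthetic sample values and the function names are hypothetical, not taken from the actual capture.

```python
# Hypothetical helper: flag motor phases that never reach the high rail while
# at least one other phase does. Thresholds and data are illustrative only.


def count_high_pulses(samples: list[float], high_threshold: float) -> int:
    """Count rising crossings of high_threshold, i.e. distinct high pulses."""
    pulses = 0
    above = False
    for v in samples:
        if v >= high_threshold and not above:
            pulses += 1
            above = True
        elif v < high_threshold:
            above = False
    return pulses


def phases_missing_high(traces: dict[str, list[float]], high_threshold: float) -> list[str]:
    """Return phases with no high pulses, if at least one other phase has some."""
    counts = {name: count_high_pulses(s, high_threshold) for name, s in traces.items()}
    if max(counts.values(), default=0) == 0:
        return []  # nothing reached the threshold at all, so no phase stands out
    return sorted(name for name, n in counts.items() if n == 0)


if __name__ == "__main__":
    # Synthetic 12 V traces: blue and green show short high pulses,
    # yellow only ever reaches mid-level, as in the faulty capture.
    traces = {
        "yellow": [0, 6, 6, 0, 6, 6, 0, 6],
        "blue": [0, 12, 6, 0, 12, 6, 0, 12],
        "green": [0, 6, 12, 0, 6, 12, 0, 6],
    }
    print(phases_missing_high(traces, high_threshold=10.0))  # → ['yellow']
```

A single threshold without hysteresis is enough for clean, digital-looking phase signals; a noisy capture would need a second, lower threshold so that one pulse is not counted several times.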

The next step will be either to find a replacement for the driver chip, since I can’t find a matching PCB, or to find a way to replace the chip's internal MOSFET bridge with an external one. If anyone knows where to get the actual L7251 3.1 datasheet, I’d be grateful for a hint.