Friday, June 24, 2011

Ugly troubleshooting...

Life always seems to go in stages -> alternating between the excitement of possibilities and the frustration of not having things work as well as you want them to...  Whether it's learning to walk across the floor or electronic design, there is always a learning curve (the academic hurdles of studying and thinking and planning I enjoy, building I enjoy; the implementation hurdles of finding design mistakes and understanding exactly what unanticipated things exist - not so much).  If there is only one thing wrong I find troubleshooting an interesting process, but sometimes there are so many things wrong that my normal (comfortable) approaches don't work... at these times I need to be in the "right mood" to figure things out or I just make things worse.

So I finally set aside some time and started to figure things out.  The first problem I found was that while the basic msp430 / PIC communication worked if I used the niblet card (simple card) -> adding the apricot card locks the bus...

It didn't take long to figure out that the fpga was interfering with the EN1 and MSTEN lines, so I disabled the fpga -> and it still failed... So I pulled the msp430 and it still failed (at which point it became apparent that something was very wrong).  I wasn't thinking that I would have to program the fpga to test the msp430, but I thought to myself "ok, well I'll just throw a simple configuration on it and that should fix it.  Maybe it has odd behavior if it isn't programmed at all." -> hooked up the jtag wires -> nothing visible on the chain... ugh... (walked away from it for a few days).

So on a rainy day after work I looked at the problem again and decided to pull the msp430 and just jumper the power supply enables - I used a socket in the design so I could do this, but it didn't work as well as expected - sockets are great for probing but not ideal for jumpering enables - better than nothing but I wish I had added DIP switches.  The 3v3 and 2v5 power supplies come up with the correct voltages, but still nothing on the jtag chain.  Checked the 1v2 core voltage and that's right too.  (walked away again).

Went back over the schematics and nothing jumped out as wrong to me, so the next day I put a scope on the power supplies thinking that maybe there were transient problems that wouldn't show on a meter but could affect the fpga... nope, 1v2, 2v5, and 3v3 are all nice and clean.  Tried a few different options for setting up the bitstreams and those didn't work... Put the jtag wires on a scope and they looked right (didn't decode them but they looked reasonable). (Walked away again because I was getting very frustrated).

Went back and re-read a lot of documents, hundreds of pages.  Made notes, made guesses, came up with a plan but didn't have time for a few days.  Rechecked the board traces and pads - no apparent shorts (good) - decided to pull the board off the backplane and only attach power -> yay JTAG chain worked - correct device seen - good, could write bitstream - good, failed verify - not good.  Checked configuration settings - checked voltages - checked jumpers - all looked good, but it didn't work... hrmm... tried changing lots of configuration settings, dropping the speeds way down, nothing seemed to make a difference (walked away).

Woke up in the morning and realized that I might have made a really stupid mistake - on many microcontrollers pins can have multiple functions and depending on the architecture there are different ways to assign them to a given pin (always through configuration bits and/or software-accessible registers).  FPGAs also have multiple functions on some pins and I had assumed that the ucf and verilog settings would compile into a bitstream that would similarly configure the pins... big big big stupid stupid stupid mistake... doesn't work that way at all for configuration. (kind of happy that I now knew what the problem was though).

So I went back and re-read a lot of documents and found out that while the pins used in configuration can be used once the bitstream is loaded, they can indeed interfere with the configuration process.  M[2:0] were correct and the PUDC/INIT_B/DONE were fine but I screwed up VS[2:0] (my notes originally said that I didn't need these if I was using the internal flash on the S3AN - wrong).  I had used VS1 and VS0 as a differential pair and VS2 wasn't used -> I had tied all my unused pins to GND and this put the chip into an undocumented/reserved state... ugh... got home, desoldered the VS2 pin and bent it up so it wasn't connected (it has an internal pullup because of the M[2:0] settings) -> and now jtag worked to write the bitstream and to verify the bitstream, but I got the dreaded "DONE did not go high" result and so the chip couldn't be programmed.  Ugh... checked and indeed DONE was low.  Tried to change a bunch of configuration settings but nothing made a difference...
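In hindsight this kind of mistake could be caught before layout by writing down what every configuration-time pin needs and comparing that against the schematic. Below is a minimal sketch of that idea in Python, not something I actually ran on this board. The function name, the wiring labels ("vcc", "gnd", "float", "signal") and the requirement table are all hypothetical; the one entry taken from this post is that VS2 tied to GND is bad. The CCLK entry is an assumption for illustration, and real values have to come from the configuration guide.

```python
# Hypothetical pre-layout check: compare how each configuration-time pin is
# wired on the board against the level it needs while the FPGA configures.
# All names and level labels here are illustrative assumptions.

OK_WIRING = {
    "high": {"vcc", "float"},   # "float" assumes an internal pull-up is active
    "low": {"gnd"},
    "undriven": {"float"},      # e.g. a pin nothing else should drive or load
}


def check_config_pins(required, board):
    """Return (pin, message) pairs for pins whose wiring may upset configuration.

    required: pin name -> "high", "low", or "undriven" (from the config guide)
    board:    pin name -> "vcc", "gnd", "float", or "signal" (from the schematic)
    Pins missing from board are treated as "float".
    """
    problems = []
    for pin, need in required.items():
        wired = board.get(pin, "float")
        if wired not in OK_WIRING[need]:
            problems.append(
                (pin, f"needs {need} during configuration, wired to {wired}")
            )
    return problems


if __name__ == "__main__":
    # Assumed requirements, inferred from this post rather than a datasheet.
    required = {"VS2": "high", "CCLK": "undriven"}
    # Wiring as described in the post: unused pin tied to GND, CCLK reused.
    board = {"VS2": "gnd", "CCLK": "signal"}
    for pin, message in check_config_pins(required, board):
        print(f"{pin}: {message}")
```

The check is deliberately dumb: it only flags pins for a second look, and a "signal" entry still needs a human to decide whether whatever is on the other end can hold the pin at the wrong level during configuration.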

Now I thought about a warning that I was getting when generating the bitstream (an ambiguous warning that I couldn't find referenced anywhere online or in XILINX support documents or forums).  Ended up being the differential pair that I was using on pins that also have configuration functions - figured out when I just simplified the design to not use any differential pairs -> but that wasn't the problem with failure to configure.  Back to documents again...

Realized that the CCLK (the configuration clock generated internally to pull the data from the internal flash) should be coming out on a pin (which again I was using for something else) -> checked it on a scope -> no signal... Hrmmm... desoldered and ripped it off -> still no clock on the stub and DONE never goes high... ugh... (Went hiking and swimming)

Thought that maybe, just maybe the SPI lines (also multiuse pins) were messed up (I wasn't using an external flash chip so I didn't care about them but wired them to the SPI part of the backplane anyway) -> I pulled the PIC off the backplane and bingo, jtag scans, configures, verifies, and the fpga programs... yay... (but all the testing signals I expected to see on the bus were not present)

Checked the clock going into the fpga (50MHz XPRESSO from Fox) - nothing... checked the schematic and it looked right but clearly no signal... Soldered another oscillator on a testing pcb, wired GND and 3v3, and that one had a beautiful happy sine wave at 50 MHz.  Connected the GND and 3v3 from this one to the apricot board (same power sources as the onboard oscillator) and nothing... Whoops... Checked the voltage on the board and it's way way off...

At first I thought I might have added the wrong regulator, but that looked right (the voltage was still low though) - desoldered the onboard oscillator (not fun) and then the voltage was right...  Went back to the spec sheet and I had screwed up the pin numbering (while I thought I checked it before, apparently I flipped the pin sequence).  Rigged up a "dead bug" (in this case a dead bug flailing in the wind) with a new crystal (upside down chip, individual wires pad to pad) - still didn't work, but now the voltage was lower... ugh... (went out to the fields to play with the dog).

Came back and looked at the schematic and found a stupid mistake.  I had a small regulator that was always on for the msp430 and a big 3v3 regulator for the fpga that was under enable control with powergood feedback.  I had decided that the msp430 really didn't need a big bulk storage cap so I connected that cap to the big regulator and left the small regulator without a big cap -> the msp430 doesn't need much power at all so this is fine -> it becomes not fine when you attach an oscillator to the little ldo and not the big one (this is why the voltage was low (RMS) and the oscillator couldn't run) -> jumpered it to the big regulator and bingo the oscillator works and the fpga does what it's supposed to... yay... (I have a vague memory of thinking that there was so much available current on the little ldo that I could attach the oscillator there, but somehow the fact that I took the electrolytic cap off got lost).

So now I have something that works (but is ugly as hell and has lots of jumpers) - not all that useful overall, but a very very valuable learning experience.  It's always good to find out where you make mistakes, what things you tend to overlook, and more importantly where in the process of debugging you miss the obvious (many times in this whole process I partially tested an idea and stopped when I found the first thing wrong rather than completing that whole type of analysis - I think that because I have so much control of things at work I have gotten used to only changing one thing at a time and I forgot that in an uncontrolled situation many variables can change at the same time).  I made this board with the intention of testing some signals and testing the power supply/filtering and control logic for a little fpga which I later plan to use with some ADCs that need differential IO (and I can't do this with coolrunners (cplds)).  I totally screwed up the fpga configuration process - discovered some things I would do differently in the future - and discovered a few weaknesses in how I think... Not a bad experience overall considering how much I learned... just not a lot to show for the effort this time.

Sometimes you have to go through the ugly/awkward/frustrating stuff to get to the point that you can do what you want...