Tuesday, September 3, 2019

Two minutes

Of course, blog posts are inspired by daily activity, so look out for DC BIAS!

Two minutes may not seem like a lot of time.  It really depends on what you are doing.  Sometimes I think about how much I cost my company and how I am filling those hours.  "Am I doing something that really benefits the company, or am I wasting time?" is often the thought.  Two minutes is the perfect amount of time to do nothing while nothing is happening.

The product was a switching regulator, a type of DC-to-DC converter that converts a higher, less accurate voltage to a lower, very accurate voltage.  A switch in the primary of the converter closes in series with an inductor.  When the switch closes, the current ramps with time, loading up the inductor with energy.  At a later point, the switch opens and the energy from the inductor is captured by a load device connected to the power converter.  This switching action repeats, and over a period of time delivers power to the load, such as a resistor, a charging circuit or possibly a simple LED.  Our customer returned the chip since it was exhibiting some odd behavior.  I was told to "check it out," so off to the lab I went, finding a socketed board, a power supply and a programmable load resistor.
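
The inductor ramp described above can be sketched in a few lines of Python.  This is an idealized model (no resistance, no saturation, current starting from zero), and every component value in it is made up for illustration rather than taken from the actual chip.

```python
# Idealized inductor ramp while the switch is closed: di/dt = V / L.
# All component values below are hypothetical, chosen only for illustration.

def inductor_current(v_across: float, inductance: float, t_on: float) -> float:
    """Current (A) after the switch has been closed for t_on seconds,
    starting from zero and ignoring resistance and saturation."""
    return v_across * t_on / inductance


def stored_energy(inductance: float, current: float) -> float:
    """Energy (J) held in the inductor: E = 1/2 * L * I^2."""
    return 0.5 * inductance * current ** 2


v_across = 12.0      # volts across the inductor while the switch is closed (assumed)
inductance = 10e-6   # 10 uH inductor (assumed)
t_on = 1e-6          # switch closed for 1 us (assumed)

i_peak = inductor_current(v_across, inductance, t_on)
energy = stored_energy(inductance, i_peak)
print(f"peak current: {i_peak:.2f} A")
print(f"energy per cycle: {energy * 1e6:.2f} uJ")
```

With these assumed values the current ramps to 1.2 A and the inductor holds 7.2 microjoules each time the switch opens.  Repeating that cycle many thousands of times per second is what delivers continuous power to the load.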

I put the DC-DC chip in the socket and tightened it down.  It's important when testing power chips that the chip be really tight in the socket.  Small amounts of resistance in the connections can quickly heat up the socket.  It seems like there is no such thing as a cheap socket anymore.  These modern packages are tricky, and socket vendors step up with their advanced mechanical solutions.  Some use pogo pins, others use polymer resin; all of them are expensive and can have long lead times, so care needs to be taken with the setup.  I plugged in the board power, flipped the power switch on the board, and the POWER LED came to life, as did current through the load.  I checked the output voltage; it was perfect, matching data-sheet specifications.  So why was I looking at the part?

"Works great," I said.
"It hates cold," the VP barked out.

So we have this "cold spray" stuff in the lab.  I do not know what it's made of, other than that it's very volatile and cools quickly enough to form ice after a 20-second blast from the can.  So I powered down the board, waited 30 seconds or so, then gave the DC-DC chip a good 20-second BLAST of cold spray.  I saw ice start to form around the socket, so I knew we were good and cold.  The next step was to test.

I flipped the switch and the main power unit sprang to life, delivering power to the SOCKET.  However, nothing happened.  I turned off the power unit, then turned it back on.  (Did you try turning it off and on?)  Still nothing happened.  No response from the chip.

Often when debugging I take a short walk to clear my head.  When I returned to the chip after doing so, I found that it was "on".  While I was out taking a walk, the chip "came to life".  I think this was near Halloween, since I was wondering about ghosts.  So for the next experiment, I decided not to leave the bench.

Repeat: blast 20 seconds of cold spray on the DC-DC chip.  After I saw ice on the socket, I applied power.  Again, nothing happened.  Then I waited.  Sometimes "Brute Force" is the answer.
So I stared at the board, trying not to blink.
30 seconds go by... nothing happening other than ice melting
60 seconds now... ice is thawing quite a bit... still no light
90 seconds now... it's getting old.  I'm wondering what's for lunch...
120 seconds...  BINGO!  Light turns on, load comes to life.

Once the part is warm, I can power cycle the board and it comes up every time.

So what can do this?  What can cause a circuit to shut-down in a cold environment?
Now I need to say that if you put a "good chip" in the socket, it works quite well with freeze spray.  There is something different about the bad ones.  Of course, the bad chip "found me" not the other way around, so I knew it was a "special" chip with respect to operating cold.

To further debug, I used the next common tool, which is to adjust voltages on the board.  It's not uncommon for DC-DC converter chips to use regulated voltages on a board.  So I started to vary these voltages by adjusting some off-chip components.  Pull the chip out of the board, do some soldering to change something off chip, then put the chip back in the socket, apply cold spray and wait for 2 minutes.

Two minutes is a perfect amount of time to do basically nothing!  You can't browse the web.  You can't do any tricky math.  You can't even hold a conversation with anyone.  Just me, the board, and 2-minute windows of time.  Most of the time nothing made a difference.  However, some off-chip components seemed to improve the situation.  By raising a DC voltage I was able to speed things up a little.

Now, if I am debugging, often I can't solve a problem unless I get MORE information.  When it comes to debugging, information is king.  The good news is that in 2 minutes I can do quite a bit of thinking about what might be causing the 2-minute start.  Since cold had the most dramatic effect, the cold spray had to give way to something more controlled.  We have something called a "Thermonics" unit, which blows hot or cold air on a part in a very controlled way.  With a Thermonics unit and a thermocouple you can set the case temperature of a part in a socket quite accurately.  With forced air at -20C, I was able to re-create the two-minute turn-on.  I then started to gradually increase the temperature.

I had data like:
Temperature     Delay
-20C            2 minutes
-10C            1 minute
0C              10 seconds
10C             1 second
27C             "Instantaneous"
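
One way to look at numbers like these is on a log scale.  The sketch below fits a straight line to log10(delay) versus temperature using only the four rows of the table that have a number attached.  The function name and the plain least-squares approach are my own choices for illustration, and four rough bench readings are not much to fit, so treat the result as a trend rather than a measurement.

```python
import math

# Delay readings from the table, with minutes converted to seconds.
# The 27C "Instantaneous" row is left out because it has no number.
readings = [(-20.0, 120.0), (-10.0, 60.0), (0.0, 10.0), (10.0, 1.0)]


def fit_log_delay(points):
    """Least-squares line through (temperature, log10(delay)).
    Returns (slope in decades per degree C, intercept in log10 seconds)."""
    xs = [temp for temp, _ in points]
    ys = [math.log10(delay) for _, delay in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_log_delay(readings)
degrees_per_decade = -1.0 / slope
print(f"slope: {slope:.3f} decades per degree C")
print(f"roughly {degrees_per_decade:.0f} degrees C of warming per 10x faster start")
```

The fitted slope comes out near -0.07 decades per degree C, or about 14 degrees of warming for every factor of ten in start-up time.  A delay that changes by orders of magnitude over a few tens of degrees is a strong hint that something exponential is involved.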

Again, to debug, more information is always better.  What I did next was attach an oscilloscope.  I set the scope to trigger on the rising edge of the main power while measuring the delay to the LED power indicator.  I started the experiment at 10C and worked the temperature up slowly.  At 27C the start was not "instantaneous" but took several thousandths of a second.  Even better news was that the "good parts" we had in stock took microseconds to start at room temperature.  This means we could easily devise a test that bins the parts based on start-up time and prevents the sensitive ones from ever getting to the customer.
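
A screen like that could be as simple as the sketch below.  The limit, the function names and the serial numbers are all hypothetical.  I picked 100 microseconds only because it sits between "microseconds" for good parts and "thousandths of a second" for the sensitive ones; a real limit would come from characterizing many parts.

```python
# Hypothetical production screen: bin parts by room-temperature start-up time.
# The limit is an assumption for illustration, not a value from the real test.
STARTUP_LIMIT_S = 100e-6


def bin_part(startup_time_s: float, limit_s: float = STARTUP_LIMIT_S) -> str:
    """Return 'pass' if the part started within the limit, else 'reject'."""
    return "pass" if startup_time_s <= limit_s else "reject"


def screen(measurements: dict) -> dict:
    """Map each serial number to its bin."""
    return {serial: bin_part(t) for serial, t in measurements.items()}


# Made-up measurements: one fast part, one slow part.
example = {"part_A": 5e-6, "part_B": 3e-3}
for serial, result in screen(example).items():
    print(serial, result)
```

The appeal of this kind of test is that it runs at room temperature, so the tester never needs a cold chamber to catch a part that would misbehave in the cold.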

So what can do this?

Well, we know that the THRESHOLD voltage, or the voltage at which a MOSFET "turns on", is a strong function of temperature.  At COLD temperatures, it takes more voltage on the gate of a MOSFET to "turn on".  In addition, the THRESHOLD voltage is something that varies as the chips are manufactured.  So, depending on what lot you look at, the THRESHOLD can vary a little.  Of course, there are also MISMATCHES in thresholds, in that not all transistors on the SAME die have the exact same THRESHOLD.  We normally do statistically based simulations where we model the manufacture of chips, including the THRESHOLD voltage.
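
To put rough numbers on the temperature effect, here is a first-order sketch that treats the threshold as falling linearly as temperature rises.  The room-temperature threshold, the temperature coefficient and the gate bias are all assumed placeholder values, not data from this chip; real devices need the foundry models.

```python
# First-order model: threshold voltage falls linearly as temperature rises.
# Every number here is a hypothetical placeholder, not data from the chip.
VTH_AT_27C = 0.70    # volts, assumed threshold at room temperature
TEMP_COEFF = -2e-3   # volts per degree C, assumed temperature coefficient
GATE_BIAS = 0.75     # volts, assumed gate voltage from the bias circuit


def threshold(temp_c: float) -> float:
    """Threshold voltage (V) at the given temperature in degrees C."""
    return VTH_AT_27C + TEMP_COEFF * (temp_c - 27.0)


def overdrive(temp_c: float, gate_v: float = GATE_BIAS) -> float:
    """Gate voltage minus threshold (V); negative means the FET is not on."""
    return gate_v - threshold(temp_c)


for temp_c in (27.0, 0.0, -20.0):
    vth = threshold(temp_c)
    margin_mv = overdrive(temp_c) * 1e3
    print(f"{temp_c:6.1f} C  Vth = {vth:.3f} V  overdrive = {margin_mv:+.0f} mV")
```

With these assumed values the transistor has 50 mV of overdrive at room temperature, but at -20C the threshold has climbed above the gate bias and the device is no longer properly on.  A bias that looks fine on a warm bench can be marginal in the cold.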

In the main bias of this particular chip, there was a single NMOS transistor that had to "turn on" to make the reference clock, and the power converter, run as expected.  Due to manufacturing variation, and some bad luck, that key transistor was "off" when the chip was cold.  Now leakage currents in the power supply and heat in the environment would eventually get the CHIP warm enough that it would start.  Once the chip starts, it creates its own heat, which keeps the converter running.

A key observation in debugging this was the "exponential" improvement in startup time vs. temperature.  The main bias device was in "sub-threshold" when it was in the "bad state".  Sub-threshold-biased FETs behave like BJTs, which are exponential.  To make things worse, this chip was made in SOI (Silicon on Insulator), so the MOSFET is isolated from its environment, for the most part, by glass.  However, there is metal in the chip that goes outside to the package pins and can bring in the heat.  It's clear that the bias circuit was NOT simulated properly.  To do that you must:
1.  Check simulations to show that the voltage on the FET exceeds its THRESHOLD by say 100mV
2.  Check simulations at COLD temperature to prove this is still the case
3.  Check simulations with "SLOW" manufacturing conditions to prove the MOSFET is still ON
4.  Check with "Monte-Carlo" simulations (at least 100 random cases) to prove the FET is STILL ON (with margin)
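
The four checks above can be mimicked with a toy model.  To be clear, this is not a circuit simulation; a real flow runs SPICE with foundry corner and mismatch models.  The threshold model, the corner shift, the mismatch sigma and the gate bias below are all hypothetical numbers picked to show the shape of the check.

```python
import random

# Toy version of the four checks. All values are hypothetical placeholders.
VTH_NOMINAL = 0.70        # volts at 27C (assumed)
TEMP_COEFF = -2e-3        # volts per degree C (assumed)
SLOW_CORNER_SHIFT = 0.05  # volts added to Vth at the "slow" corner (assumed)
MISMATCH_SIGMA = 0.01     # volts, random device-to-device mismatch (assumed)
REQUIRED_MARGIN = 0.10    # volts, the "say 100mV" from check 1
GATE_BIAS = 0.85          # volts from the bias circuit (assumed)


def threshold(temp_c, corner_shift, mismatch):
    """Threshold voltage (V) with temperature, corner and mismatch applied."""
    return VTH_NOMINAL + TEMP_COEFF * (temp_c - 27.0) + corner_shift + mismatch


def count_failures(gate_v, temp_c, corner_shift, runs=100, seed=1):
    """Monte Carlo: count runs where overdrive falls below the required margin."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    failures = 0
    for _ in range(runs):
        mismatch = rng.gauss(0.0, MISMATCH_SIGMA)
        if gate_v - threshold(temp_c, corner_shift, mismatch) < REQUIRED_MARGIN:
            failures += 1
    return failures


conditions = [
    ("27C, typical", 27.0, 0.0),
    ("-20C, typical", -20.0, 0.0),
    ("-20C, slow", -20.0, SLOW_CORNER_SHIFT),
]
for label, temp_c, shift in conditions:
    fails = count_failures(GATE_BIAS, temp_c, shift)
    print(f"{label:15s} failures: {fails} of 100")
```

With these placeholder numbers the nominal margin is 150 mV at room temperature, but it shrinks to about 56 mV at -20C and to about 6 mV at the cold, slow corner, both under the 100 mV target.  A design that only gets checked at typical, room-temperature conditions would sail through and still fail in the lab.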

Now if you do the above four checks, you will find that the circuit on the chip fails.  This chip was designed by very senior people; however, everyone makes mistakes.  A chip NEVER cares who you are or HOW you design.  If it's not a good design, you will see it in the lab sooner or later.  Also, a large number of the DEBUGS I have done involved issues with DC bias.  Never take DC bias casually, especially if you are a senior designer.  Never tape out a DC bias block without proper verification AND ideally a peer review.  That way you won't spend hours in the lab, two minutes at a time.

That took more than 2 minutes to write.  Hopefully not more than 2 minutes to read!

-SSA
"The postings on this site are my own and do not necessarily represent the postings, strategies or opinions of Microsoft."