Monday, November 16, 2015

Si Valley Scourge: The Neglected Band-gap Reference Circuit

Personal Note:
Hi all, it's been a while since my previous post. Thanks for checking out this blog; as I keep saying, it's more for my sanity than anything else. Things in Silicon Valley can change quickly, and it takes focus to hold on, which can take time away from writing. I didn't realize how many people enjoyed this blog until I stopped writing it.

Now that I am between jobs, I can follow up with a few posts. I was let go by Semtech (along with my whole team, as part of a RIF) a week after 64GHz 32nm silicon arrived. It appears that if you do good work, you can work yourself out of a job. It's not unusual for hardware designers to be treated this way in the valley, since the perceived value now is all in software. So Facebook, Twitter, Google: someday you may need to increase the speed of your network software with new chips. Who will do that reliably and cost-effectively is hard to say.

Development costs are skyrocketing, analog designers are retiring, and few students are embracing analog. Eventually it may make sense for software companies to employ chip designers to increase the performance of the complete software solution. This needs to happen soon: as circuit designers age and retire, development risk and cost will increase. Consider this a warning that the chip designers who form the base of the vertical-integration play that software capitalizes on are going away.

Now for a post that underscores the value of senior mentors.

------------------------------------

Bandgap: Scourge of the Valley    

Millions of dollars in yield loss, debug, and mask revisions have been traced to simple DC bias circuits. The neglect of the band-gap is a scourge of Silicon Valley. Every day, people wake up and drive to work to deal with production problems, some of them caused by poor DC circuits that could easily have been prevented. Wasted time and money drive me nuts; maybe this post can help wake senior designers up to the issue.

I have been threatening to write this "bandgap" post for months. Earlier in 2015 I was out having a drink with Jim Gorecki and Mike Le in Santa Clara. We were talking about high-speed track-and-holds, interleavers for GHz-rate ADCs, their different circuit constructions, and how not to build them. Mike was sharing some SAR (successive-approximation ADC) design tips he learned while at Broadcom (pre-Avago); he always has good ideas. I love what I do and really enjoy talking shop with experts, even off hours (while respecting IP), and most people who know me are aware of this. I can turn it off, but my assumption is that engineering is fair game around other good engineers and drinks. So after we split a pitcher (I think it was Bass Ale), Jim asked the waitress for a pen and paper. What is Jim going to do? I was feeling pretty fuzzy, since I don't normally do any analysis when drinking, but Jim wanted some drunken circuit analysis and I was game. The conversation had turned to circuits we hate, and how some rookies unknowingly take personal ownership of bad ideas.

The waitress returns with pen and paper, and Jim begins to draw a band-gap reference circuit. For those who do not know what this is, I will explain. Most analog chips need a reference voltage that is independent of temperature. Say you want your chip to turn on only once the power supply reaches 90% of its nominal value. You can build a circuit that compares the power-supply voltage to this internal reference voltage, allowing that decision to be made. Since the reference compensates for temperature, the turn-on voltage should ideally be independent of temperature. The band-gap circuit produces the DC current and voltage references used throughout the analog circuitry.
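To make the supply-monitor example concrete, here is a minimal sketch that sizes a resistor divider so an ideal comparator trips at 90% of a nominal supply. The 3.3V supply, 1.2V reference, and 100k bottom resistor are hypothetical values chosen for illustration, and the comparator is idealized (no offset, no hysteresis). Because the reference does not move with temperature, the trip point depends only on the resistor ratio.

```python
# Hypothetical values for illustration only.
V_NOMINAL = 3.3        # nominal supply, volts
V_REF = 1.2            # temperature-stable bandgap reference, volts
TRIP_FRACTION = 0.90   # turn on once the supply reaches 90% of nominal
R_BOTTOM = 100e3       # bottom divider resistor, ohms


def divider_ratio(v_trip: float, v_ref: float) -> float:
    """Fraction of VDD the divider must deliver so that it equals v_ref at v_trip."""
    return v_ref / v_trip


def top_resistor(r_bottom: float, ratio: float) -> float:
    """Solve r_bottom / (r_top + r_bottom) = ratio for r_top."""
    return r_bottom * (1.0 - ratio) / ratio


def supply_good(vdd: float, ratio: float, v_ref: float) -> bool:
    """Ideal comparator: True once the divided supply exceeds the reference."""
    return vdd * ratio > v_ref


v_trip = TRIP_FRACTION * V_NOMINAL
ratio = divider_ratio(v_trip, V_REF)
r_top = top_resistor(R_BOTTOM, ratio)

print(f"trip point {v_trip:.2f} V, divider ratio {ratio:.4f}, R_top {r_top / 1e3:.1f} kOhm")
for vdd in (2.5, 2.9, 3.0, 3.3):
    print(f"VDD={vdd:.2f} V -> supply good: {supply_good(vdd, ratio, V_REF)}")
```

With these assumed numbers the divider needs a top resistor of 147.5k; if the two resistors are matched, their ratio (and therefore the trip point) also holds over temperature.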

Bandgap reference circuits get their name because the stable output voltage tends to be around 1.2V, which is very close to the band-gap potential of silicon extrapolated to 0 K (about 1.17V). This voltage is created by summing two different signals: one that is proportional to absolute temperature (PTAT) and another that is complementary to it, falling as temperature rises (CTAT). By scaling and summing CTAT and PTAT, it's possible to get a voltage that is independent of temperature (to first order). What is interesting about the band-gap is that its "operating point," or DC condition, is supposed to be basically the same over temperature. This is supposed to make things easier to design. A subtle point here is that the CTAT and PTAT need to balance, and there are two solutions: one where CTAT and PTAT are non-zero, and the other where everything is off (PTAT = CTAT = 0). So band-gap reference circuits REQUIRE a start-up circuit to make sure they don't get stuck at 0, and start-up issues are a risk in these types of circuits. Of course, the experienced analog designer is ALWAYS trying to make things easier, reducing component count by exploiting the situation, etc. Unfortunately, experienced designers RARELY design band-gap references these days. The chip business is now under cost pressure and consolidating with low staff. This means under-resourced programs and late hours, so DC circuits (like the bandgap reference) are often designed by rookies or even interns to reduce the load on senior designers. The lack of mentors is a huge problem in the valley, made worse by the aging and retirement of senior analog designers. The more transistors there are, the more signals there are, and the more difficult the start-up circuit becomes, often exceeding the skill level of a rookie.
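The PTAT/CTAT summation can be checked numerically. The sketch below assumes a first-order linear diode model (V_BE = 0.63V at 300K, falling at 1.9mV/K; illustrative numbers, not data from any FAB) and takes the PTAT term as the V_BE difference of two diodes with an 8:1 area ratio at equal current, (kT/q)ln(N). The gain G is chosen so the two temperature slopes cancel. Real diodes have curvature that this first-order model ignores, so the perfectly flat output here is an idealization.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # Boltzmann constant / electron charge, V/K

# Assumed first-order diode model (illustrative numbers, not from any PDK):
T0 = 300.0            # reference temperature, kelvin
VBE_T0 = 0.63         # V_BE at T0, volts
CTAT_SLOPE = -1.9e-3  # dV_BE/dT, volts per kelvin
N = 8                 # emitter-area ratio between the two diodes


def vbe(t: float) -> float:
    """CTAT term: linear model of a forward-biased diode voltage."""
    return VBE_T0 + CTAT_SLOPE * (t - T0)


def delta_vbe(t: float) -> float:
    """PTAT term: V_BE difference of two diodes at equal current, area ratio N."""
    return K_OVER_Q * t * math.log(N)


# Scale the PTAT term so its slope cancels the CTAT slope (first order only).
G = -CTAT_SLOPE / (K_OVER_Q * math.log(N))


def vref(t: float) -> float:
    """Bandgap output: CTAT plus scaled PTAT."""
    return vbe(t) + G * delta_vbe(t)


for t in (233.15, 300.0, 398.15):  # about -40 C, 27 C and 125 C
    print(f"T={t:6.2f} K  VBE={vbe(t):.4f} V  "
          f"dVBE={delta_vbe(t) * 1e3:.2f} mV  VREF={vref(t):.4f} V")
```

With these assumed numbers G comes out near 10.6 and the output sits at about 1.20V at every temperature. Note that G also multiplies any offset or mismatch in the PTAT path, which is one reason matching matters so much in this circuit.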

Side note for non-EE folk: a voltage must be available for a chip to function. The circuits that control this need to be simple by nature, since their failure is so devastating. I think the band-gap circuit is analogous to the seat-belt on an airplane. Next time you board a plane, look at the seat-belt buckle: notice how simple it is and how few components are involved. The seatbelt is simple, tough, and reliable, as a good reference circuit should be.

So Jim draws his circuit, starting with the core: a pair of mismatched (different-size) diodes whose voltage difference gives the PTAT signal, while the diode voltages themselves give the CTAT signal. This part of the diagram is standard. It's important that you understand what is going on and not do anything to break the CTAT and PTAT relationship. So the initial bunch of diodes Jim drew made sense. After that, things got crazy. Jim kept drawing more and more transistors (why so many when they aren't needed?). When Jim finally completed the circuit, he asked some questions. The first thing I mentioned (now 4 beers in) is that the amplifier was designed to handle a huge input voltage range. Recall that I said earlier the band-gap voltage should NOT CHANGE; normally this fact is used to simplify the design. But this circuit was designed as if the bandgap voltage could be anywhere at any time. The way the (rookie) designer implemented this was to put a couple of front-ends on the amplifier that cover different voltage ranges. This was probably done because the thinking went: "The bandgap is a boring circuit, it's DC, and I am fresh out of college, so I will use all of my circuit skills to build the world's best bandgap." Or, more likely, the rookie designer didn't want to just start from a previous working design because he or she wanted to prove themselves. No mentor was there to talk the rookie out of it. Chip design is NEVER about the designer and should only be about making a good, solid chip. Chips don't care who designed them or what your name is.
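The two DC solutions mentioned earlier show up directly in a simple model of the kind of core Jim started with. The sketch below assumes an ideal amplifier that forces two nodes equal: one node is a unit diode carrying current I, the other is a resistor R in series with an N-times-larger diode carrying the same I. The diode equation, saturation current, N = 8, and R = 10k are all illustrative assumptions. The loop equation is satisfied exactly at I = 0 and again near I = VT ln(N)/R; bracketing the root search away from zero plays a role loosely analogous to a start-up circuit in silicon.

```python
import math

K_OVER_Q = 1.380649e-23 / 1.602176634e-19  # V/K

# Illustrative assumptions, not values from a real design:
T = 300.0     # kelvin
VT = K_OVER_Q * T
IS = 1e-16    # saturation current of the unit diode, amps
N = 8         # area ratio of the larger diode
R = 10e3      # PTAT resistor in series with the larger diode, ohms


def loop_error(i: float) -> float:
    """V_A - V_B with current i in both branches; an ideal amplifier forces this to 0.

    V_A: unit diode.  V_B: resistor R in series with the N-times diode.
    """
    v_a = VT * math.log1p(i / IS)
    v_b = i * R + VT * math.log1p(i / (N * IS))
    return v_a - v_b


def bisect(f, lo: float, hi: float, iters: int = 200) -> float:
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Solution 1: everything off. The loop equation holds exactly at I = 0.
print("loop error at I = 0:", loop_error(0.0))

# Solution 2: the intended operating point, found by searching away from zero.
i_op = bisect(loop_error, 1e-12, 1e-3)
print(f"operating current {i_op * 1e6:.3f} uA, "
      f"VT*ln(N)/R = {VT * math.log(N) / R * 1e6:.3f} uA")
```

A DC simulator solving the same circuit may converge to either solution depending on its initial guess, which is one reason "it always starts up" in simulation proves little on its own.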

So why did Jim hate the circuit so much? Well, its poor design was preventing a company from making money on some parts. It can be a million-dollar issue over time, depending on the manufacturing cost and how the test is done. We are in industry to sell parts, so this over-designed DC circuit was preventing the work of a team of 10+ year designers, marketing, and production from being fully capitalized. Plus it was a dumb analog idea (I think the latter is what bugged Jim more than it bugged me). Looking at the circuit, even after half a pitcher of Bass, I could see at least 3 operating conditions leading to problems: one when the input of the amplifier starts at a high voltage, another when it starts at a low voltage, and a third when it starts in between. Verifying that start-up circuit was tough, since you would have to force the circuit into each of the different states and watch how it starts. Due to the complexity of the DC bias for the rail-to-rail input, new failure mechanisms were introduced that affected start-up. So again, the road to hell is paved with good intentions... A few trash cans full of parts... Bandgap non-start is a product killer.

"It works great in simulations" and "It always starts up" were statements from the original rookie designer. The lab and the ATE said otherwise. (The lab is the truth and anything else is bullshit. -KD) Jim and I know that "works in the simulator" is interesting, but not sufficient to conclude that a design belongs on a production die. Area, power, temperature performance, power-supply rejection, robustness, start-up, and BJT type and size all affect a design and may not be covered in simple simulations. Also, what about margin? The best designers have a break-it mentality in which the circuit is stressed in simulation beyond the required specifications, to see whether it is "at the edge of a cliff," that is, very sensitive to margin. A new circuit needs to be checked carefully if it doesn't have a long history of ATE results, particularly a reference circuit. In Jim's case, an overly complicated amplifier that sums the PTAT and CTAT signals had a start-up problem.

Now for some bandgap do's and don'ts (pitfalls and hints):
1.  DO NOT: use the smallest BJT available from the FAB. The model may not be accurate and your matching will be poor. The bigger the BJT, the less sensitive it is to modeling imperfections and mismatch. DO NOT edit the layout or the model parameters of the FAB-supplied BJT.
2.  DO NOT: put switches in series with your diodes. The diodes produce the CTAT signal, which can be counteracted by the switch resistance rising at high temperature.
3.  DO: Change the temp-co by adjusting resistors (voltage division), not by switching diodes in and out.
4.  DO: Make sure the output transistors are cascoded for high DC power-supply rejection, and simulate PSRR in both AC and transient.
5.  DO: Make sure your start-up circuit has been peer reviewed by at least one 10+ yr designer, with a proper test bench that functionally covers all start-up devices.
6.  DO NOT: run the bandgap on a high power supply without a voltage limiter on the output voltage. Check for safe operating conditions in both powered and powered-down states; there is a chance you can blow a thin-oxide FET mirror when an internal block powers down.
7.  DO: Use a common-centroid diode array with a full set of dummies around it.
8.  DO: Use a unit resistor. The reference resistors placed around the chip have to match, and the ONLY way to get that is to make the reference resistor its own schematic and layout cell. It should have the proper W/L for matching and an appropriate contact count for electromigration. DO NOT USE SQUARE (L = W) resistors or resistors < 1 square in a reference circuit; they make a poor reference.
9.  DO NOT: run the output mirror in sub-threshold since matching will be poor.  Long channel lengths are not a bad thing.
10.  DO: Include test modes (an analog test bus) to isolate the output voltage, PTAT, and CTAT signals for debug.
11.  DO: Send the band-gap's internal VDD and VSS out on the test bus to identify IR drops to/from the bandgap inside the chip.
12.  DO: Include a bypass mode to force the bandgap output externally.
13.  DO: Make sure your bandgap has more than 80 degrees of phase margin.
14.  DO: Double-check the package model for an off-chip resistor. A common mistake is to have a circuit go unstable when a customer puts capacitance on the external reference pin. It can oscillate, and I have seen this at 3 different companies.
15.  DO: In RF circuits, remember that low-frequency phase noise can come from the bandgap. It's important that the noise from the reference be included in any RF circuit's jitter calculations.
16.  DO:  Think through the design first before getting busy with schematics/layout.
17.  DO: Check for IR drops in reference currents distributed from the band-gap. The pitfall here is using too-narrow wires on the bandgap outputs. Often I will calculate the metal width needed to carry a current of 1000uA and then draw all the output ports on the bandgap with that metal width. This way, when a layout person connects it, they have a hint at how wide the wire needs to be. Schematic notes also help.
18.  DO: monte-carlo to check the start-up circuit.  Does it always start-up?  Does it always disengage?
19.  DO:  Stress the circuit beyond specs (push the temperature high, supply low to see how it fails)
20.  DO NOT: have any extra bandwidth in the circuit. It's easy in deep submicron to get high bandwidths (and noise) from internal nodes, so low power and filter capacitance are good things here. You should have a spec on how free of noise the output current and voltage must be. In audio designs this can be very challenging, since we can hear low frequencies.
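Items 1 and 9 are about matching, and a quick Monte Carlo shows why matching matters so much in a bandgap: any offset in the PTAT path is multiplied by the same gain G as the PTAT signal. The sketch assumes the first-order relation VREF = VBE + G(dVBE + VOS), lumps diode mismatch and amplifier offset into one Gaussian VOS, and assumes (Pelgrom-style) that its sigma scales as 1/sqrt(device area). The 2mV unit-device sigma and G = 10.6 are hypothetical numbers, not data from any process.

```python
import math
import random
import statistics

# Assumed first-order bandgap: VREF = VBE + G * (DVBE + VOS).
# VOS lumps diode mismatch and amplifier offset; it sees the same gain G as
# the PTAT signal, which is why matching matters so much here.
G = 10.6           # PTAT gain (illustrative)
VBE = 0.63         # volts near room temperature (illustrative)
DVBE = 0.0538      # volts, roughly VT*ln(8) at 300 K
SIGMA_UNIT = 2e-3  # hypothetical offset sigma for a unit-area device, volts


def sigma_vos(area: float) -> float:
    """Pelgrom-style assumption: offset sigma scales as 1/sqrt(area)."""
    return SIGMA_UNIT / math.sqrt(area)


def monte_carlo_vref(area: float, samples: int = 20000, seed: int = 1) -> list:
    """Draw VREF samples with a Gaussian offset for the given relative area."""
    rng = random.Random(seed)
    s = sigma_vos(area)
    return [VBE + G * (DVBE + rng.gauss(0.0, s)) for _ in range(samples)]


for area in (1, 4, 16):
    vals = monte_carlo_vref(area)
    print(f"area x{area:2d}: mean {statistics.mean(vals):.4f} V, "
          f"sigma {statistics.stdev(vals) * 1e3:.2f} mV "
          f"(G*sigma_vos = {G * sigma_vos(area) * 1e3:.2f} mV)")
```

In this model, quadrupling device area roughly halves the output spread. In a real design the sigmas would come from the FAB's mismatch models, and the start-up check in item 18 deserves the same Monte Carlo treatment.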

At a previous company I was technical lead on a chip with a band-gap reference owned by an intern. This band-gap had to be re-spun at least 3 times during development (the most expensive I have seen in my career). The weakness here was that the intern had basically no experienced mentor. The base floorplan was flawed and not reviewed before routing, the output was un-cascoded, and the diodes were switched, leading to high-temperature PTAT failures. There were also struggles with phase margin and poor ESD on the external reference pin, and the schematic was nearly impossible to follow. We spent months on the design simply because someone didn't "begin with the end in mind" on the circuit. I feel sorry for our intern, but I was unable to change his situation for political reasons. Few things upset me more than wasted company time and money, and a lost opportunity to train a rookie correctly (lost opportunities are my ultimate frustration).

Most importantly, remember: we are all chip designers, not block designers. If there is ONE bad block on a chip, then the WHOLE chip could be bad. So it's important that rookies ask for help to make sure they are doing the right thing. Experienced people should also be very interested in the pedigree of the reference circuit, since if it fails, the rest of their work could be meaningless. So don't skip that band-gap design review! Let's make sure new designers start and stay on the right track.

So double-check that bandgap.  Who designed it?  Is it in production?

Not only are band-gaps neglected; other, similar circuits also need mentorship. These killer circuits should have a senior mentor, and they include:
1.  UVLO (under-voltage lock-out) circuits with hysteresis
2.  Power-on reset circuits
3.  Start-up oscillators (relaxation oscillators)
4.  Any voltage regulator that has a pin that goes off-chip
5.  Reference circuits with multiple power supplies/sequencing
6.  ESD/pad-ring hookup with multiple supplies
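As an illustration of why a UVLO uses hysteresis, here is a behavioral sketch (not a circuit) comparing a single-threshold comparator with a hysteretic one on a supply that hovers near the turn-on point. The 2.97V rising and 2.80V falling thresholds, the Uvlo class, and the supply samples are all hypothetical.

```python
# Hypothetical thresholds: enable rising at 2.97 V, disable falling at 2.80 V.
V_ON = 2.97
V_OFF = 2.80


class Uvlo:
    """Behavioral under-voltage lock-out with hysteresis (a sketch, not a circuit)."""

    def __init__(self, v_on: float, v_off: float) -> None:
        if v_off >= v_on:
            raise ValueError("hysteresis requires v_off < v_on")
        self.v_on = v_on
        self.v_off = v_off
        self.enabled = False  # the chip starts locked out

    def step(self, vdd: float) -> bool:
        """Update the enable state for one supply sample and return it."""
        if not self.enabled and vdd >= self.v_on:
            self.enabled = True
        elif self.enabled and vdd <= self.v_off:
            self.enabled = False
        return self.enabled


def count_transitions(states):
    """Number of enable/disable changes between consecutive samples."""
    return sum(a != b for a, b in zip(states, states[1:]))


# Hypothetical supply samples: a noisy ramp near V_ON, then a power-down.
supply = [2.0, 2.5, 2.95, 2.98, 2.96, 2.99, 2.95, 3.0, 3.1, 3.3, 2.9, 2.79, 2.5]

single = [v >= V_ON for v in supply]  # plain comparator, no hysteresis
uvlo = Uvlo(V_ON, V_OFF)
hysteretic = [uvlo.step(v) for v in supply]

print("single-threshold transitions:", count_transitions(single))
print("hysteretic transitions:      ", count_transitions(hysteretic))
```

On this sample sequence the single-threshold version toggles six times, while the hysteretic one turns on once at the first crossing and turns off only when the supply falls below 2.80V.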


It just takes one mistake to kill your product, and unlike software people, we can't simply recompile. Senior analog folks are getting rarer and are spread thin; however, it's NOT a good idea to isolate them from the designs above. The cost of a DC failure is increasing, and I would hate to think what a start-up problem would cost to fix in FinFET technology: a few million in masks and months of debug in the lab, all because you wanted to save a few $$ on designers.