Street Smart Analog: 2012

Wednesday, December 12, 2012

Analog test-anti-test path

Earlier in Street Smart Analog Lingo I mentioned an analog test-bus or analog test-path. These animals are excellent for debug of silicon, particularly deep-submicron where probe pads are REALLY Huge. A 2u probe-pad was no big deal back in the day but that area is really useful in geometries below 0.13u.

The test-path issue came up recently, so I figured I would blog about this useful debug tool.

The Good:
DC analog signals such as currents and voltage references can be sent on/off chip, helping to isolate DC bias problems. An "analog mux" is placed on the test-bus; normally each block has a little mux that allows a signal to be passed from the INSIDE to a PAD on the outside of the chip. Outside of the chip, the appropriate test-device or current/voltage source can be attached to the test pin. This is useful for tuning in band-gaps and bias generators, and for debugging low-frequency clocks or slow-speed ADCs.

High-speed (differential) signals can also be sent out an analog test-bus. These are trickier to deal with but I have seen an 800MHz test-bus employed on a 12Gbps receiver.
(ISSCC 2006 - Keyeye 12Gbps). That test-bus had a dedicated output buffer created from a thin-oxide PMOS transistor. This was a "source-follower" with the off-chip load being a several-K Ohm resistor. With the correct (~10V) power-supply, the circuit could be tuned to an impedance of 50 ohms to match the board trace. The poor little transistor was biased well beyond 10 year lifetime limits, however it allowed us to "tune-in" our analog DFE, NEXT and ECHO cancellers. This circuit also made a fine figure for our ISSCC paper. We achieved about 8 bit linearity with a bandwidth of near 800MHz. If you left it on too long or raised the voltage too high the chip would blow. The eye got cleaner until it popped. Later on we included EQ on scope capture data to reduce the burn-out problem.
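
The supply-tuning trick can be sketched with a first-order model. The snippet below is only an illustration: it assumes a square-law PMOS, ignores body effect and output conductance, and uses made-up values for the pull-up resistor (4 kOhm), the pad voltage (1.5 V) and the device gain factor `beta`. None of these numbers come from the Keyeye design.

```python
import math

def follower_zout(v_supply, r_pullup=4e3, v_pad=1.5, beta=0.1):
    """Small-signal impedance seen at the pad of a source follower.

    All default values are hypothetical, chosen only for illustration.
    beta is the square-law gain factor (mu*Cox*W/L) in A/V^2.
    """
    i_bias = (v_supply - v_pad) / r_pullup      # current set by the off-chip resistor
    gm = math.sqrt(2.0 * beta * i_bias)         # square-law transconductance
    r_source = 1.0 / gm                         # impedance looking into the source
    return r_source * r_pullup / (r_source + r_pullup)  # in parallel with the pull-up

# Sweep the off-chip supply and watch the pad impedance move through 50 ohms.
for v in (5.0, 8.0, 9.5, 12.0):
    print(f"supply {v:4.1f} V -> Zout {follower_zout(v):5.1f} ohm")
```

In this model, raising the supply raises the bias current and lowers the impedance; the price is more stress on the thin-oxide device.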

Medium bandwidth signals can also be sent through a mux into a front-end of a receiver. The transformer in an Ethernet chip had a dual-purpose as a balun. You could put a single-ended RF generator (with associated filter network) on the differential input side of the transformer. Then on the "chip side" you could adjust the center-tap to give whatever common-mode was required for the internal block being tested. A "leap-frog" test-path was included to send the signals to the various front-end blocks helping to debug harmonic-distortion problems, AGC ranges, low-pass filter bandwidths and ADC linearity. This path should be simulated before tape-out.

One advantage of an analog test-bus is that you can always disconnect it in a metal-rev, so reliability is not a concern, especially in the early stages of analog-front-end (AFE) bring-up.

The Bad:
I have also seen the analog test-bus cause failures. These are subtle but this is the point of street smart analog. The test-bus needs to be verified like any other circuit. Neglecting to do so can cause bad things to happen.

The ultimate sin of the "test-bus" is to reduce the performance of the circuit's primary function.

Failure #1: Some pads on chips have voltages that go "above the rail". These are called "open-drain", where an off-chip pull-up resistor or transformer is required to supply current. A common mistake is to connect a PMOS switch to the pad with its body tied to the chip supply. If you take a PMOS terminal above the highest supply, a diode will turn on inside the chip and steal current in its characteristic nonlinear, temperature-dependent way, often puzzling the layman. Also these parasitic diodes can blow. We learn in college that the PMOS body needs to be connected to the highest supply. (Source-body connections are do-able, but they also have pitfalls, are tricky, and may affect a circuit in its normal mode.) So as a general rule, unless you really have to, never use a PMOS switch, especially if you have an open-drain or a transformer. Dan Ray said "No P on the Pad". Notice my " ad", it has no P.

Failure #2: Bad neighbor behavior. What I mean by this is that several blocks normally share an analog test-bus such as a "DC" bus. There is a desire to prevent noise from coupling back in from the test-bus, so often we would employ a "T" switch. This is a switch that consists of a T network with three switches. When the bus is "off", the middle switch is on and prevents noise coupling through. When the test-bus is "on", the middle switch is off and the two outer switches connect the internal node to the outside. I have seen a case where someone left out one of the switches in the T. So when the test-bus was disabled, it was pulled to ground, preventing other blocks from using it. So if you have an analog test-bus, a "test-case" should include "open". I would do this in simulations by loading the test-bus with a 1Meg resistor to a mid-rail voltage. You can also pull the resistor above the rail (on an open drain pin) to check for P on the pad if that is a concern.
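
The "open" test-case can be sketched as a small nodal-analysis check. This is a toy resistive model, not a substitute for a real simulation: the switch resistances, the 3.3V-style mid-rail of 1.65V and the internal bias voltage are all hypothetical, and the bug is assumed to be the bus-side outer switch replaced by a wire.

```python
def bus_voltage(r_inner, r_shunt, r_outer, v_int, v_mid=1.65, r_load=1e6):
    """Solve the two-node T network for the test-bus voltage.

    Node M is the center of the T, node B is the shared bus.
    r_inner: internal node -> M, r_shunt: M -> ground, r_outer: M -> B.
    The bus is loaded with r_load to v_mid (the "open" test case).
    """
    g1, g2, g3, gl = 1 / r_inner, 1 / r_shunt, 1 / r_outer, 1 / r_load
    # KCL at M: (g1+g2+g3)*Vm - g3*Vb = g1*v_int
    # KCL at B: -g3*Vm + (g3+gl)*Vb = gl*v_mid
    a11, a12, b1 = g1 + g2 + g3, -g3, g1 * v_int
    a21, a22, b2 = -g3, g3 + gl, gl * v_mid
    det = a11 * a22 - a12 * a21
    return (a11 * b2 - a21 * b1) / det   # Cramer's rule for Vb

R_ON, R_OFF, R_WIRE = 100.0, 1e12, 1.0   # hypothetical switch and wire resistances
V_INT = 1.2                               # hypothetical internal bias voltage

enabled = bus_voltage(R_ON, R_OFF, R_ON, V_INT)
disabled_good = bus_voltage(R_OFF, R_ON, R_OFF, V_INT)
disabled_bug = bus_voltage(R_OFF, R_ON, R_WIRE, V_INT)   # outer switch left out

print(f"enabled:            {enabled:.4f} V")
print(f"disabled, full T:   {disabled_good:.4f} V")
print(f"disabled, no outer: {disabled_bug:.4f} V")
```

A healthy disabled block leaves the bus sitting at the mid-rail voltage set by the 1Meg load. The buggy one drags it to ground, which is exactly what the "open" test-case is meant to catch.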

Failure #3: Low priority verification. The first shot at that 800MHz differential test-bus did not work all that well. We had hired an excellent consultant to design a repeater to send a signal to the source-follower pad. This IP never did make the first tape-out. The focus was on tape-out and verification of the main function, but leaving the repeater out prevented debug later on, forcing a quicker spin. So if you are going to put a test-bus in, you should "Do it like you mean it" and verify it too. If there are buffers they should be reviewed and plot reviewed. The test-bus methodology should be done "up front" in the design and not snuck in at the last moment, since it could ruin your floor plan. Thinking ahead and planning are always a good idea when it comes to analog chip design. You can try to substitute long hours but you'll always lose to the thinker-planner. Think tortoise and hare...

Keyeye Ref: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1696054

Sunday, December 2, 2012

Missing Teeth

With switched-capacitor circuits, one of the most critical parts of the design is the clock generator. As a friend of mine once said:

"When your switched-capacitor circuit doesn't work, check your clocks. After that, check your clocks again."   (Perry Heedley-1998)

It was back in 1999 we had our first-generation gigabit SOC back in the lab. The process was 0.35u. Supply 3.3V. We had a strange problem with non-uniform sampling. When we sent the clock out the "test-bus" we saw that it had missing pulses. Missing pulses are not a good thing and the Flash ADC ENOB was terrible. Lots of tones! On the scope the clock looked like a boxer who was missing teeth. We also had supply dependence where high supply and cold spray made it worse. What was going on?

On Friday I met a new friend who had a similar story. (So sorry buddy!) This inspired me to write this blog post on this common screw-up. If I have seen one common mess-up, it is that something goes wrong with a reference clock.

In these larger Ethernet chips, we distribute the clock as a differential signal. The advantage of going differential is that the signal is not affected by clock skew and the rise/fall times match perfectly (by design). If you distribute critical clocks with single-ended circuits, stop reading now since you are hopeless. The differential approach gives you a uniform sensitivity to noise on the chip and in the environment (see Ali Hajimiri's wonderfully written "Low-Noise Oscillators"). Another advantage of using a differential clock is that ideally you can send it across power-supply domains (when things are normal).

Now if you want a good non-overlapping clock it's easy to go overboard. Normally you have a "non-overlapping clock generator". It's a circuit whose job it is to make sure a set of clocks do not occur at the same time. A trade-off in those designs is the rise/fall time. If the clock coming out of the block has a fast rise and fall time, the clocks are less apt to overlap. However, this comes at a cost. The designer keeps increasing the size of the generator to make the output edges faster and faster, eventually coming to a solution. There is a trade-off between non-overlap time and Operational Transconductance Amplifier (OTA) settling. It almost always seems easier to use big clock buffer transistors than to beef-up your amplifier bandwidth.
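
The non-overlap versus OTA settling trade-off can be put into rough numbers. This is a minimal sketch assuming single-pole linear settling (no slewing); the 125 MHz clock, the 8-bit accuracy target and the non-overlap times are hypothetical values picked for illustration.

```python
import math

def max_ota_time_constant(f_clk, t_nonoverlap, n_bits):
    """Largest single-pole time constant that still settles to n_bits.

    Each clock phase lasts half a period; the non-overlap time is lost
    to settling. Settling to 1/2**n_bits takes n_bits*ln(2) time constants.
    """
    t_settle = 0.5 / f_clk - t_nonoverlap
    return t_settle / (n_bits * math.log(2.0))

def required_bandwidth(f_clk, t_nonoverlap, n_bits):
    """Closed-loop -3 dB bandwidth (Hz) implied by that time constant."""
    return 1.0 / (2.0 * math.pi * max_ota_time_constant(f_clk, t_nonoverlap, n_bits))

F_CLK = 125e6   # hypothetical sample clock
N_BITS = 8      # hypothetical settling accuracy
for t_nov in (0.25e-9, 0.5e-9, 1.0e-9):
    bw = required_bandwidth(F_CLK, t_nov, N_BITS)
    print(f"non-overlap {t_nov * 1e9:.2f} ns -> OTA bandwidth {bw / 1e6:.0f} MHz")
```

Every bit of non-overlap time you give back relaxes the amplifier, which is why it is so tempting to keep making the clock buffers bigger.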

A huge pitfall of these "massive" clock generators is that they can generate huge amounts of noise and "ground bounce". Or as Stephen Lewis (UC Davis) would say, "Making sparks". The huge clock buffer circuits create massive amounts of dI/dt. Huge current spikes with peaks close to an amp can find their way into your big clock buffer. These currents hit your package (with inductance), which translates them into huge voltage spikes.

When it comes to "noisy neighbors" on a chip, it always takes an aggressor and a receptor. In this case, I was able to debug this animal by putting the clock-generator into a schematic along with a simple package model consisting of package inductance. I then put the clock source on a different power-supply in my schematic to see what happened. I did this by hand in HSPICE since I am not the hugest fan of schematic capture. I did this hand-written test-bench in real time in the lab right next to an oscilloscope with the bad clock on it. It was me, Sailesh Rao, Jim Parker and Dave Nack all gathered around the setup. I kept tweaking the test-bench, and Q factor (4) on the bondwires until BINGO. I was able to match the waveform from the scope in HSPICE. High-five from Dr. Rao! What happened?

The ground-bounce was so big that it was measured in VOLTS. Yes, our 3.3V supply had volts of ground-bounce on it from a huge clock generator. By increasing the temperature or lowering the supply on the clock generator, we could work around the problem. This part wasn't going to sample in this state. The ground-bounce was too big from uber-big clockgen!

The main PLL and the ADC with the uber-clockgen were on different power supply pins. Analog guys like to use A BUNCH of power supplies, normally to keep noise from coupling around. However, this can sometimes backfire. When breaking up power-supplies it's important to visualize the return paths of all the currents and how they will affect each other. In this case, the PLL sent the clock to the ADC, which caused so much ground-bounce that the buffer amplifier receiving the clock in the ADC missed pulses. This happened because the amplifier only had a common-mode range of about a volt, with more than a volt of ground bounce between the supplies.
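
A back-of-the-envelope check of this failure uses V = L * dI/dt. The sketch below is illustrative only: the 2 nH supply inductance and the edge times are assumed values, while the current step and the one-volt common-mode range follow the rough figures in the story.

```python
def ground_bounce(l_package, delta_i, delta_t):
    """Voltage across the package inductance: V = L * dI/dt."""
    return l_package * delta_i / delta_t

L_BOND = 2e-9      # hypothetical effective supply inductance, henries
I_PEAK = 1.0       # current step, amps (the text says close to an amp)
T_EDGE = 1e-9      # hypothetical edge time, seconds
CM_RANGE = 1.0     # receiver common-mode range, volts (about a volt)

bounce = ground_bounce(L_BOND, I_PEAK, T_EDGE)
print(f"ground bounce ~ {bounce:.1f} V, common-mode range {CM_RANGE:.1f} V")
if bounce > CM_RANGE:
    print("bounce exceeds the common-mode range: expect missing pulses")

# A smaller, slower clockgen (hypothetical 4 ns edge) stays inside the range.
slow_bounce = ground_bounce(L_BOND, I_PEAK, 4e-9)
print(f"with slower edges: {slow_bounce:.1f} V")
```

The model ignores bondwire ringing and on-chip bypass, so treat it as a sanity check on orders of magnitude and not a replacement for simulating the clockgen with the package.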

So now, hopefully everyone knows that you can make a clock-generator "too-big". A technique for finding these is to just turn on base-layers in your layout and look for huge MOSFETs. Always ask yourself why you have a big transistor, since everything in the area will know about it. Also people should be aware that more supplies are not always better.

So what is a solution?
A. NERF your clockgen - Simulate it with bondwires
B. Add on-chip bypass capacitors to prevent dI/dT from hitting the bondwire
C. Improve the common-mode range of your clock buffer.
D. Design a set of inter-supply "repeaters" with huge common-mode range
E. Use DC Blocking capacitors

We solved this one with A and B. The ADC worked much better after we fixed that. We still had more challenges but....

"When your switched-capacitor circuit doesn't work, check your clocks. After that, check your clocks again."

Sunday, November 18, 2012

Street Smart Analog Terminology

In everyday life I basically use these terms over and over again. I assume that everyone knows what these things mean, but that is probably not the case. So I am going to publish and maintain a list of my "Streetsmart Analog Lingo".

Most of this lingo comes from other smart people in this business. I can't give credit to everyone. The late Dan Ray was an expert in this area. Dan was a founder at Level One and one of my first mentors. He was awesome in analog. Tim Dyer (my identical twin brother), Perry Heedley (CSUS), David Viera, Patrick Isakanian, Paul Hurst, Stephen Lewis, Bob Pease, and Dave Nack contributed some of these over the years.

Street-smart Analog Lingo:

A1 Release: Release of silicon on the very first version. This happens very rarely, maybe 1% of the time with analog circuits. Assuming it will happen is unrealistic and can actually be discouraging.

All Layer: A design that requires all mask layers to be changed or all new masks

APR: Automatic place-and-route. Machine generated layout. Also called DDA (Digital Design Automation)

Antenna: Long piece of metal touching gate-poly - can damage poly leading to huge offsets. Also can be a single-ended wire (test-point) on a circuit board operating at over 200 MHz.

Bake It: Temperature cycle

Boomerang: Bad evaluation board returned from the customer

Brown-thumb: A designer who uses "unconventional" techniques or "special tricks" to do design. Often these characters are associated with unreliable circuit designs and poor execution. Also associated with using poor methodology and bad practice.

Change layer: Metal layer dedicated for changes/programmability

Carpet Bomb It: see NFS

Chip Designer: There are only chip designers. All block designers should be interested in how their circuit affects the chip they are working on. Especially important in a System on a Chip (SOC).

Cup-cake: Cross sectional shape of copper metalization

Expert-layout schematic: an analog schematic without any layout hints or notes

E^Overnight: Bit-error rate test requiring no errors if left overnight.

Fib Slut: A part that has been in and out of a "Focused Ion Beam" (FIB) more than twice

Follow the Dollar: The process of following the customer's money to your paycheck. It should be easy to "follow the Dollar" unless you know you are operating at a loss.

FOS Schedule: Full of shXt schedule. Normally used to get management pregnant on a chip design program

50% Schedule: Project schedule that requires everything to go perfectly. No competent marketing person or design manager ever commits to a 50% schedule unless it's a FOS Schedule.

Hair-dryer: Heat gun

Hare: high-power low-impedance approach to design. Opposite of tortoise.

Hidden state: Circuit state not designed - normally from a bad reset circuit. Often appears when an over-confident analog person does digital design.

Leapfrog test-path: Analog test-path that allows blocks in an analog signal chain to be bypassed for debug.

Luck: When the mistakes you made didn't matter

Irregular layout: Any circuit block that is not square or rectangular. Also called "donut block" or "block with a tit on it"-Dan Ray.

Magic-fingers: The opposite of brown-thumb. Someone who executes

Magic-circuit: Circuit designed by someone who doesn't understand it. Often a Brown-thumb.

Magic smoke: When it leaves the chip it no longer works.

Maskview: Job-deck view which is a manual mask check. Ideally a "Zen" moment and never to be done in a panic.

Metal-up; Metal: A design that requires just metal changes - quicker and bypasses HTOL. Often a way to patch a design for a quick fab turn.

MAS Document: Micro-architecture spec document or "chip Bible". So useful but so shunned, a one stop shop for circuit-block and interface information.

Nack Hack: {named after the late Dave Nack} A circuit board without a "toe-tag" containing unknown changes or hacks.

Nail: Type of die probe that is simply a piece of metal

Noisy Neighbor: Noisy circuit block interacting with nearby circuits

NFS: Nuke it From Space {Aliens II}. A circuit that is flakey or has unknown issues that should be completely re-designed. A circuit or a layout that is fundamentally flawed.

Onion Peel: (peel the onion) When a chip revision comes back and a new problem is uncovered (often hidden by what was fixed)

Pencil Tap: Before GPIB and Labview we would tap knobs with pencils for fine-tuning

Pizza Mask: A multi-reticle run of silicon. Can be "metal-up" or all-layer. Normally used for debug or system level designs without a full set of models.

Poke it with a stick: Low risk vehicle in trying a new technique or new process. Also used in debug to determine if the problem is sensitive to external stimulus.

Popcorn: moisture in the package causing trouble in re-flow splitting the part

Put a fork in it: Basically done, any more effort spent on it is wasted

Leakage: sub-threshold drain-source current in MOS that makes a sensitive temperature sensor.

Relentless beating: To solve a problem with several simultaneous solutions

Sim Slave: Design resource asked to do simulations without understanding

Smoke Test: First power-up of new silicon

Spoiled-via: Connection between metals that is open or flakey

Tape-out: Sending the plans of the chip to the FAB for mask generation. An important milestone for non-experts but meaningless for true experts, since you may have a "Turd"

Team Analog: Design, debug or architecture work done in an interactive manner.

Testbus: Test path snaked through an analog design that goes to an external pin for debugging

Trainwreck: When two layouts crash into each other. Also a machine-generated schematic or one in which the wires cross.

Thump test: Finding a signal integrity problem (loose connection) by thumping your fist on the bench.

Turd: An incompletely verified piece of silicon. Due to time-constraints, not all simulations are run.

Works by accident: Analog circuit or subsystem category with a flaw that works fine anyway. For example, a critical layout sensitivity that happens to balance but, when later edited, may surprise you and fail.

Works in simulations: Analog always works in sims... famous last words.

You can't bullshit electrons: Just because you designed it doesn't mean it will work

Zap: ESD testing

Zorch: Catastrophic failure of a DC DC converter IC. (Parasitic Zener)

Thursday, November 15, 2012

Hiroshi's Desk

Trust is the glue that holds business relationships together.

Today I made a visit to R2 semiconductor where I visited an old friend and met a new one. I saw intelligence, perseverance, and focus. Both are very focused and technically solid people; you often find the cream of the crop in small outfits like R2. It's tough working in a start-up; there are so many issues to deal with, many of them non-technical. I really appreciate what their team has done. My visit reminded me of the story of Hiroshi's wallet.

I joined a start-up, Keyeye, around the late 2002 or early 2003 time-frame. At my previous company we had been doing research and development on communication circuits until that changed. At the time we were trying to start our family and my wife didn't want to move. Keyeye was one of a very few ways to stay in Sacramento and still do cutting edge mixed-signal design outside of university research. I took a huge pay-cut with the goal to make it back in stock.

Hiroshi Takatori was a founder and the CTO of Keyeye at the time. We don't talk much anymore unfortunately, but what I can say about Hiroshi is that he is a brilliant and incredibly hard-working man. Hiroshi basically dedicated a big chunk of his life toward the success of Keyeye. He was very careful in who he hired. He had many criteria but one was to bring aboard straight-shooters (like himself) and people he could trust. His style is Japanese and he liked to do all the system simulations which he did at his desk which was located right in the center of our office building. He had no cube walls around his desk, he would sit watching the company work from his central location. We all had cube-walls fortunately. You could not go into the break-room, the front-door or the lab without passing by his desk. He pretty much had the same set of items on his desk all the time. His computer, butcher paper (for system diagrams), bucket of pens, FORTRAN print-outs, a container of dried sea-weed and his wallet sitting on the edge of the desk.

What I found interesting was that over the first 3 years I worked there (before the move) his wallet basically sat in the same place every day. It was a fat wallet with lots of notes, business cards and money popping out the sides. It was always there, always in the same spot. We would all walk by it every day multiple times. Guests visiting Keyeye would sometimes comment on it since it was so big and bulky looking, no wonder he didn't leave it in his pocket.

Nobody ever touched Hiroshi's wallet. We all feared the dried seaweed.

I found that Hiroshi's wallet symbolized one of the key elements of character in a start-up, which is trust. When in a start-up you wear many hats, do many functions. You focus on the success of the company your funding partners are helping you to create. There are few checks and balances. Your responsibility is huge, and your risk is high. If your character is weak, then you do not belong there. You do not deserve the responsibility. You need to trust each other. Your funding partners need to trust you to deliver. If you are ever at a start-up and looking to hire someone, ask yourself if you would trust them with your wallet. If not, then keep looking.

Sunday, November 11, 2012

$100,000+ Frizbee

After an IC design is completed the plans are "taped-out" to a FAB that processes the wafer. The first step is to generate the masks. Depending on the type of process there could be from 10 to over 40 masks. Each mask, combined with photoresist and a light source, is used to pattern a layer on a wafer. These patterns together form transistors, capacitors, resistors, inductors, diodes and the interconnect layers used to connect the elements together.

The cost of a wafer depends on several things, but for an older process technology, say something 15 years old, the wafer cost may be between $600 and $1000 each. Now if you have a small die, you can get thousands of ICs (or dice) on the wafer, giving them a cost of pennies each. If you can shrink the design (with a more advanced process) you can fit more die on a wafer and lower the cost. Yield also increases with a smaller die size, since each die is less likely to land on a defect. This all makes sense and is documented in many a textbook. However this is when things go right...
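
The textbook arithmetic looks roughly like this. It is a sketch using a common first-order dice-per-wafer estimate and a Poisson yield model; the 200 mm wafer, the die areas and the defect density of 0.5 per square cm are assumptions for illustration, and only the $800 wafer cost is taken from the range quoted above.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common first-order estimate of gross dice on a round wafer."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: fraction of dice with zero defects."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2, defects_per_cm2):
    """Wafer cost divided by the expected number of good dice."""
    gross = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    good = gross * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good

WAFER_COST = 800.0   # dollars, inside the $600 to $1000 range in the text
WAFER_MM = 200.0     # hypothetical wafer diameter
D0 = 0.5             # hypothetical defect density, defects per cm^2

for area in (4.0, 25.0, 100.0):   # hypothetical die areas in mm^2
    n = dies_per_wafer(WAFER_MM, area)
    y = poisson_yield(area, D0)
    c = cost_per_good_die(WAFER_COST, WAFER_MM, area, D0)
    print(f"{area:5.0f} mm^2: {n:5d} dice, yield {y:.2f}, ${c:.2f} per good die")
```

All of this assumes the wafer was processed correctly, which is the whole point of the story that follows.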

Several times in my career I have seen the misprocessed wafer. Normally you are waiting for the wafer to get back from the FAB and you get a funny email. There are WAT (wafer acceptance test) structures on the wafer, normally off to the side or between the "dice". The FAB probes these structures and records the data. They compare the WAT data to a specification table. If there is a problem, then the FAB lets you know; this is all part of their quality control process. Of course, how bad the failure is and how far off spec it is are important. I will discuss one such case. This pretty much represents every case I have seen with bad material.

Case#1: Year~1999. Process 0.13u. Failure: "Transistor threshold off due to incorrect oxide thickness module". In the FAB, the process steps are often called "modules". These modules are sometimes mixed-up or done incorrectly. In this case, the wrong oxide was used for the IO transistors which were also used inside the analog front-end. FAB apologized and was making new material. Now, I was young and new in my career at the time. The chip was a "huge" SOC with more than 16 million transistors. About 1/2 of the content was analog, 1/2 digital. We all worked hard and were waiting for months to get the silicon on the bench. I thought "what could it hurt" to get an early look.

Got the packaged parts and plugged in the first one, and nothing happened. Plugged in another, nothing. Anyone who knows me understands I don't give up easy, so I asked for a "pile". After going through 20 I found one that "wiggled" or gave evidence of activity. We then trained a tech in the screen process and out of 100 parts we found 3 that wiggled. Only one of the 3 actually did anything interesting. The bad parts had a strange problem in that the IO pads would oscillate in different patterns at about 100 Hz.

Why was the chip IO oscillating at 100 Hz? Why was the chip performance so bad? Was it due to the process mistake or was there a problem in the design? We have an early look so let's use this time while we wait for new material. Since it was a base-layer screw-up at the FAB it took 2 months to get new material, so we plowed forward. We found that parts with different packages had better or worse yield. The ADC (which I did the architecture for) worked fine on the good parts. However other strange behavior existed. So we took the 100Hz problem and decided to debug it.

The team was me and 3 other people. We used the "company" debugging process. The team worked for two months (32 man-weeks) to find out what was going on. We started with an "ebeam" prober, which tracks activity at junctions in the IC in a vacuum. We isolated a section of the pad-ring called JTAG, also known as "boundary scan". Since we didn't scan the analog, we could investigate the analog when the digital was not available because the IO interface was oscillating. We used FIB (Focused Ion Beam) to isolate the elements of the JTAG circuitry. To get to this point took us about 4 man-months of work with the expensive debug equipment. We finally got to a point where we found a logic gate that appeared to have a floating input. I got an HSPICE simulation to demonstrate that the floating gate and its surrounding layout can oscillate at around 100 Hz. I thought we had found our smoking gun. We had our FA (Failure Analysis) FIB expert cut the metals around the gate to identify what appeared to be a bad via (connection between layers). We then sent the photo to the FAB (at 24 man-weeks of debug) to ask if this was related to the oxide defect. The FAB said they didn't believe the photo since we did the FA ourselves. So we went back and found another part and isolated the bad spot and had the FAB do the FA. Now we were 32 man-weeks into the debug, and the new wafers were due back soon. I got a "sheepish" email from the FAB saying that the bad oxide layer affected the vias (don't ask me how). Attached was a photo of the bad spot without any name or record of the FAB or the design. The open via caused a "relaxation" oscillator to be formed by a combination of gate-leakage and parasitic coupling in a logic-gate in the JTAG circuitry.

The new material showed up and the part worked as designed.

So what did we learn during the 32 man-week exercise? We learned that the misprocessed wafer caused the problem. During this time the FA team left other priorities aside. Schedules slipped on the next generation part. Engineers and management were worried about the design, we learned nothing new. Or did we? It certainly was educational. What else could have been done with those resources is never to be known. What other things could we have done with that company time and money?

Well I sure learned something. We spent over $100,000 in labor and FA to prove that a bad wafer was bad. We also proved that this delayed the progress of the team and hurt next generation. The wafers were scrapped or "Frizbees".

Now, be careful if you work with me and have a "known bad wafer" shipped. I am very clumsy around those things these days. I tend to smash them against the wall or throw them in the parking lot. It's hard enough to get a mixed-signal IC working when it is processed correctly, but when it's not, it's pointless, especially with more complex designs.

Hopefully I just saved someone a few hundred thousand dollars... I have never seen the damage of a mis-processed wafer debug be any cheaper. I have seen it a total of three times in my career and in every case, greed, impatient people and disappointed customers are involved. In no case was it ever worth the effort.

Tuesday, October 30, 2012

Definitions: Infinite Mass, UCCC and UCTAH

In the old days or with small chips, the system level designer used to request the whole system be created before the design is started. Back in the day the Analog designer did everything including layout. Having one guy do everything sounds great but is basically uncompetitive since it's serial and non-interactive. (You may believe your own bullshit.) I have met a few fans of the serial approach; they tend to be older than I am, defensive, and have low self esteem. A good chip design program should be able to survive if any member disappears, including the lead. There may be a delay but the chip goes on.

I call the "show must go on" approach to integrated circuit (IC) design "Infinite Mass". This term was coined by a mentor of mine, the late Dave Nack. Dave used this expression to describe the management style used on him in a bad way. Dave Nack was my manager at the time and didn't like being told how to manage analog. He told me that pushing back on good solid analog technique was like "Stopping Jupiter in its path".

Infinite mass is all about project momentum. This can be an asset or a weapon. Once you get your team assembled and start the Micro-architecture specification (MAS), you begin to solicit input from the different experts on your team (specific to the task). Having different people work in parallel in the definition stage builds a relationship between the people and the product. Micro-management is the enemy here. Relationships foster dedication and encourage quality. The Micro-architecture spec completion is a key milestone, even if it has a few blank pages or tables. It becomes a cornerstone of the chip and it's often a living document since things are discovered during the development. The MAS eventually should contain key details of all the blocks.

You can't push back against infinite mass. At some point after the MAS is defined and the project is underway, each sub-block needs to go through its own architectural phase. During this time the system is in flux since the analog or digital may not be possible without devastating results. The team needs to be open to marketing input before the architecture is closed. However, after the MAS is closed, the "cooks are in the kitchen" and things need to be stable, otherwise they won't finish. This type of "Infinite Mass" design puts huge pressure on marketing and product definition. The bigger the chip, the longer marketing has before the architecture is closed. However, once it's closed it should be difficult to change, with "chip death" one of the options. A late change in the game could take the whole chip from compelling to crap.

Two killers of SOCs that I know of are:
1. Unanticipated Collateral Consequence of Change (UCCC): Complex systems can break in subtle ways. Late in the game a change could "look ok" but cause pathology that may be hard to detect before the chip comes back. I remind non-analog people that each transistor has at least 4 connections: Drain, Gate, Source, Bulk. If your design has, say 100,000 transistors, then you have 400,000 connections. If you change any one, it could affect the others. The change needs to be carefully executed. The later you go in the project, the more of the connections are "made", therefore increasing the risk of change while increasing verification. It's actually more complicated than this simple example, which leads to the second killer.

2. Underestimation of the complexity of the task at hand (UCTAH): This assumes that the thing you don't know about is easy to solve or doesn't matter. This is big for people who like to assume things or think they know it all. One hidden killer in optical sensors is the package. Delivering a reliable part that detects light in a reasonably cheap package is very challenging. You can have the best circuit designer in the world but he won't help make your part survive the re-flow soldering process. The answer to UCTAH is honest feedback and good relationships with peers in your field. When in doubt, get on the phone and ask a few people you trust. If you find your understanding of the situation lacking, maybe you have UCTAH.
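
The connection count behind UCCC is simple arithmetic, sketched below. The pairwise figure is not from the argument above; it is added only as an illustrative upper bound on how many connection pairs a late change could in principle disturb.

```python
TERMINALS_PER_TRANSISTOR = 4   # drain, gate, source, bulk

def connection_count(n_transistors):
    """Total transistor terminals that must be connected correctly."""
    return n_transistors * TERMINALS_PER_TRANSISTOR

def pairwise_interactions(n_connections):
    """Number of distinct connection pairs, n*(n-1)/2 (an illustrative upper bound)."""
    return n_connections * (n_connections - 1) // 2

n = connection_count(100_000)
print(f"{n:,} connections, {pairwise_interactions(n):,} possible pairs")
```

Nobody verifies every pair, of course, which is the point: the verification you can afford covers a tiny slice of what a late change can touch.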

Tuesday, October 23, 2012

Is There no such thing as Analog IP? The Analog IP paradox

My favorite part of IC design is working on the architecture. At the beginning you have the most freedom to make decisions while enduring a period of study. It's great to be paid to look at the Journal of Solid-State Circuits, textbooks, design reviews and schematic databases: sizing up the task and the different directions it could take, and balancing risk and time-to-market with a handy IP library and a design team ready to go. Personally I have had good luck with IP.

I have met analog designers who believe there is no such thing as analog IP. Almost every chip I have ever worked on has circuits re-used from a previous one. Maybe some of the layouts change, but in the hands of a skilled analog designer the schedule reduction is dramatic. A full understanding of the block and its history is required. Of course, we don't want to start with a problematic block or a known bad architecture. If you don't believe in IP you probably have a high opinion of yourself. Or a big "S" on that t-shirt underneath your button-down shirt. This paragraph is one of the reasons I created this blog. Designing everything from scratch each time is hardly street-smart, although it may appeal to someone new to the game. It's called Not Invented Here (NIH) syndrome.

I recall meeting K. Nagaraj at ISSCC 14 or so years ago. I had used a correlated-double-sampled switched-capacitor integrator he published (around 1996) with Paul Ferguson (lead author). I said, "Hey Nagaraj, thanks for the awesome integrator, it made my chip much easier and reduced the risk." "I never built that circuit" was his response. Amazing. I told him I read the paper and built it right "off the page". He said that wasn't normal and that most people cannot build these things. Does K. Nagaraj believe in IP? Is it not the IP but who can use it?

I can't overemphasize how important it is that you understand the block you "borrowed". This becomes easier with experience and with the more blocks you have seen. If you don't have experience with it, you probably shouldn't mess with it. If an "IP" Czar were to exist, he/she would have experience with several architectures of ADCs, DACs, PLLs and DC-DC converters. Gray hair is on your head. You may write a blog about how you make your living making and selling analog circuits.

I saw an example of this recently when a block was being taken from one chip and "performance improved". The chip lead grabbed the piece of IP but didn't fully understand it. We were just about to close the micro-architecture spec (MAS) when he figured out the issue. As a manager I feel I failed him; however, truthfully, at this point it's easy to change. The architectural phase did its duty: it kept a mistake from even getting near silicon. This is all part of the chip-making process. There will be more posts on this topic in this blog later on.

How much is it worth? Does it have asset value? Well, the answer to that is "no". I asked a VP/GM at my company about this, and he basically said IP for the sake of IP is worthless. This was from a guy who sold IP even back "in the day". IP sitting around has no apparent value. So unless someone gives it a home in a chip that is making money, it's doing nothing but eating up a forgotten chunk of disk space. What is strange is that in the right hands, it may save a company risk, time and therefore money. This doesn't sound worthless to me. It's the IP Paradox.

Thursday, October 18, 2012

STOP and THINK don't waste time

We had a case recently where two methods of analysis didn't agree. This is normally a time to stop and think. However, "customer pressure" caused us to lurch forward with a wild-guess fix. Only I blew the whistle, yes I can be a bummer at work. I get paid way too much to deal with this sort of thing. When different measurement methods conflict, there is information to be had! STOP and THINK.

Circuit analysis can be done in a few different ways. In most normal situations, a bad circuit is bad, no matter how you analyze it. For example, we have both AC and transient analysis in a circuit simulator. The AC analysis uses linearized elements based on the circuit and its operating condition. This analysis is normally used to determine the stability, gain and bandwidth of a circuit. The transient analysis is easier for non-experts to understand. It's a time-domain simulation of the circuit, which is equivalent to having an oscilloscope probe available to monitor every node in the circuit at every point in time. The way the waveforms "wiggle" is based on the inputs to the system as well as the properties of the circuit such as gain and bandwidth.

If you have a case where the AC analysis says your circuit is really fast and functional, and the transient analysis shows a "flat line" or no response, then you have an issue. The correct thing to do in this situation is to determine WHY the two analyses conflict. Fast vs. dead? Both can't be true. One analysis says the circuit is fast, the other says it doesn't work. Never in this situation is it worthwhile to "fix" the dead circuit or "force it" to work. Don't bother until you understand why the two simulations give different results. Any other activity is a waste of time, unless you hate your job and just want to burn time. I discussed this with a circuit expert at Maxim today and we were in violent agreement.

A real-world example of this is op-amp slew. This is a nonlinear phenomenon that occurs when a circuit does not have enough current to charge and discharge the capacitances (junctions included) in the circuit fast enough. A linearized "AC" analysis will show health. However, a transient analysis will show a response in slow motion. What this means is that for small signals the circuit is fast, but for a larger "wiggle" the circuit is starved for current. In this case, you conclude you "didn't use enough current" in the main design. Once you allow the circuit to have the current it requires, the two responses should match. Your circuit was broken. When a linear analysis does not match a transient analysis, 9 times out of 10 it has to do with something nonlinear in the system. The rules of superposition no longer apply. Search for the nonlinearity and you will find your problem.
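To make the slew example concrete, here is a minimal sketch, not taken from any real design. It assumes a unity-gain, single-pole amplifier whose output slope is clipped at a fixed slew rate. The time constant TAU, the slew rate SR and the two step sizes are made-up numbers chosen only for illustration. The "AC" view predicts one settling time for every step size; the time-domain loop agrees for the small step and disagrees for the large one.

```python
import math

def settle_time(step_v, tau, slew_rate, tol=0.01, dt=None):
    """Time for the output to settle within tol (fractional) of a step.

    Assumed model: dv/dt = (vin - vout) / tau, clipped to +/- slew_rate.
    The clip is the only nonlinearity; without it this is a plain RC response.
    """
    dt = dt or tau / 1000.0          # forward-Euler time step
    vout, t = 0.0, 0.0
    while abs(step_v - vout) > tol * abs(step_v):
        dvdt = (step_v - vout) / tau                   # what the linear model wants
        dvdt = max(-slew_rate, min(slew_rate, dvdt))   # what the bias current allows
        vout += dvdt * dt
        t += dt
    return t

TAU = 1e-9    # hypothetical small-signal time constant, 1 ns
SR = 100e6    # hypothetical slew limit, 100 V/us expressed in V/s

# Linear ("AC") prediction: settling to 1% takes tau * ln(100), for ANY step size.
linear_prediction = TAU * math.log(1 / 0.01)

small = settle_time(0.01, TAU, SR)   # 10 mV step: never reaches the slew limit
large = settle_time(1.0, TAU, SR)    # 1 V step: slews for most of the edge

print(f"linear prediction: {linear_prediction * 1e9:.2f} ns")
print(f"10 mV step:        {small * 1e9:.2f} ns")
print(f"1 V step:          {large * 1e9:.2f} ns")
```

With these assumed numbers the small step lands on the linear prediction and the large step takes more than twice as long. Raising SR (giving the stage more current) pulls the large-step result back toward the linear number, which is the "two responses should match" check described above.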

In conclusion, when different methods of analysis don't agree, you need to STOP and THINK about why they don't. That is enough information in itself. Don't assume a fix for something you don't understand, or you will fix it again.. and again.... and again... and again...

Sunday, October 14, 2012

Normal order of Failures watch your DC bias

During our group meeting on Friday we discussed a recent IEEE Journal of Solid-State Circuits paper.

Yunzhi Dong and Kenneth Martin "A High-Speed Fully-Integrated POF Receiver with Large-area Photo Detectors in 65nm CMOS" IEEE JSSC Sept 2012 pp2080.

This is a very well-written paper, going from the dynamic behavior of photodiodes to the implementation of the receiver. The receiver looks similar to those I have worked on in the past, and the line-up is clever: a fully-differential transimpedance amplifier with a replica diode at the input, then a single-point gain control after the TIA, followed by filtering and a buffer amplifier to get the signal out. Dong and Martin chose a continuous-time high-frequency boost approach. These have advantages in power and jitter tolerance but are more difficult to design. That's why it's Dong's Ph.D. I really liked the impedance analysis at the TIA input. The TIA itself was very clever and, amazingly, works at 65nm voltage levels. The eye diagrams look great, and that tells you something.

Dong did a good job in his design and his layout. If you built a circuit with the exact same schematics as Dong and Martin and did a different layout, it may not work at all. At GHz rates the layout is so critical that a missing shield or imbalanced line could ruin, or worse yet, degrade performance. When debugging receivers like this in the past I have seen the same issues over and over (at the chip level):
#1 Bad circuit architecture (wrong circuit, incorrect design, incorrect specs. )
#2 Bad Layout (imbalanced, mirror symmetric, IR drops, coupling)
#3 DC Bias (improper DC bias point)

#1 is obvious. #2: I have met analog designers who claim that the layout is not important. Maybe they don't respect the job of a good layout designer. Depending on the SNR and bandwidth requirements, your layout sensitivity can go from non-existent to extreme. #3: It's amazing how many 15yr+ analog designers mess up DC bias. Famous quote: "That voltage was close to what I needed so I simply used it." If you hear that quote, get ready for the lab.

Thursday, October 11, 2012

Trade Offs vs. Marketing

Sometimes there is a trade-off between being reactionary and doing the right thing. It's easy to confuse poor product definition with inability to execute quickly. We have a customer who wants us to create a lower-cost version of a part they are already using. However, since we are unable to get information on exactly how the part is being used, it's impossible to predict how the replacement part will act in their system.

A key challenge in light sensor design is that the chip is sensitive to geometry. The light sensor is an optical-mechanical-electrical solution. So, depending on the environment around the part, the behavior will change. Reflections, shadows, and objects near the sensor can affect its performance. In general, if the customer is not willing to share information about their system, the odds of giving a satisfactory solution are low. It's hard to pass up money these days, but sometimes a customer doesn't understand when they are being unreasonable.

I told our marketing team that instead of trying to rush to second-source a part without system specifications, it's better to look ahead into the future and anticipate where the customer is going. This allows a normal design-in process. This is especially true when not all the technology to build the replacement part is ready. We need to aim to exceed the performance of the present customer solution with a new part that is lower in cost than their present solution. By saying no to the short-term business, we can create a better, more valuable part. This more valuable part will be useful to many customers, since at least one is already buying an inferior part.

Wednesday, October 10, 2012

The beginning

Hello! I have been considering writing a blog related to my life as an analog integrated circuit designer. I have been working with analog circuits now for over 35 years. I have witnessed hundreds of millions of dollars in product development, not all of it good. Hence my desire to document. Maybe a nugget from this blog will help you save time and money. Or maybe you find chip design and debug interesting. Stay tuned!