Wednesday, May 15, 2019

The Product Development Treadmill - Focus

Yes, I know it's been a long time since I made a post; however, I am posting now for the same reason I have posted in the past.

I was browsing the web and found a Forbes article, Getting Off The "Bad Growth" Treadmill, by Cesare Mainardi and Paul Leinwand.  The point of that article is that the low-risk way to expand your market can be a disaster.  The definition of "low risk" is a moving target as technology development increases its speed.  This has been a problem throughout my career, and it is being compounded by the effects that drive the singularity as discussed by Ray Kurzweil.  As progress speeds up and our competitors get better, the treadmill runs even faster!  Playing it safe is more "unsafe" than ever before.  But what does this mean?  What is safe?

So now I can tell my story.  It was the early days at my first real job.  We had been working on non-ADC-based Ethernet PHYs, 10-T and 100-T.  Those product developments were not without their challenges; however, they were simple compared to today's ADC-based Ethernet chip designs.  Back then we could fit a whole transmitter on one large schematic sheet.  The receiver was a handful of blocks, including a few data slicers.  We had simpler process technologies with only a handful of metal interconnect layers.  Verification was mainly analog at the block level, and the top-level simulations often black-boxed the analog, since few loops went from analog to digital and back.  So when we started the next generation, Gigabit Ethernet, we used the same strategy as before.

Since ADCs were new to the company, we "had to" create ADC test-chips.  Now, test-chips are nice; however, they do consume resources.  The whole design cycle is involved: definition, schematics, layout, verification, tape-out, packaging, test-board, test-program, lab testing and data analysis.  So the test-chip was basically a huge effort.  In the end, we got a reasonably good Flash ADC from the test-chip.  I was studying this as I was working on the gain-control circuit in the Analog Front-end (AFE).  I also owned the ESD and GMII interface.  I worked on these while I watched the more senior people assemble the test-chip for the PHY.  So we were toiling away when...

We heard that Broadcom (our competitor) was sampling 1000-T.  Our competitor went "ugly early": instead of creating a test-chip, they created a functional 1000-T transceiver.  Now, it didn't have the lowest power, nor did it have a 100% standard-compliant link.  However, it was the FIRST 1000-T PHY.  The fact that it was the first changed everything.  Broadcom had been doing ADCs for years and didn't choose to create a lone ADC test-chip just to throw it away later.  They used their valuable resources to focus on what was new, which was the ADC-based PHY concept.  Now that must have been a wild development at Broadcom, but their focus was keen: they concentrated only on the new stuff.  The Broadcom chip didn't even support the slower modes, just enough to prove the 1000-T standard concept.  In hindsight, I admire their focus and the strategy of their approach.

Late in our development it was determined that the chip wouldn't work because some of the "stripped out" features were required.  Confusion about what was "good enough" erupted.  Key analog lead resources were overloaded by the changes.  Test-chip-quality blocks were expected to be product-level quality.  The scale of the development became obvious to the designers, since this chip was about 10X more complicated than the previous analog-based PHYs, and the old methodology was not working.  So we changed: we added program managers and reduced the load on key lead resources.  These changes also took time.  (Looking back now, our competitor also had to deal with the change in style; however, they had a clear focus.  They were acting while we were reacting.)  So after many long nights we did tape out.

Never did we consider that a new or different approach would need to be introduced.  Issue tracking is one such approach; it can free leads from memorizing huge sections of the design.  Also, the design can be staged.  For example, it is possible to run multiple variations of a chip (sometimes called a Pizza mask).  Some versions (slices) could be for debugging, other versions for samples.  This "parallel" approach helps mitigate risk and long schedules.  Now back to the story.

When silicon came back, the evidence of the rush was clearly visible.  On the 5th revision, we were able to send data between two like parts.  By that point, the competitor was in production and selling parts.  While our competition was working on their next generation, we were still fighting with the first.  By the 12th revision, the part was basically working; however, the specs had been changed to match reality.  Meanwhile, more competitors appeared with lower-power parts, such as Marvell's Alaska, ironically assisted by designers who had left us.  I should have followed.

I have often wondered about marketing vs. engineering leading chip development.  A change in project focus at the last minute is a product killer.  Later, during executive training, I realized what had bothered me: when people who understand the technology are not included in decisions, disasters occur.  The reason is that (I believe) it is unethical for those who lack the proper knowledge to make technical decisions that affect the architecture of a chip.  This would be like a pediatrician planning brain surgery.  In good organizations, there is feedback and accountability that can reduce or eliminate this bad behavior if it occurs.  Post-mortems are held, with a wide audience invited.  Good companies even have a step in product development where the execution of marketing and engineering is compared to the original estimates.  A test for this is to simply ask a program manager for the history of resources on a past program.  If that information is not available in any form, then there may be a problem, since you can't learn from your own past.

One thing that is true about product development these days is that the biggest risk is making a mistake in the strategy.  If you take on too little risk, you may not have a compelling part.  If you take on too much risk, then you will have a painfully buggy product.  Marketing and engineering need to pick the balance.  The development strategy and methodology that works with one type of chip may be a horrible idea for another type of chip.  A good post-mortem and accountability help an organization grow in the direction that improves execution, which in turn allows for greater risk.  When in doubt, focus on delivering; there will always be distractions.


1 comment:

  1. You're so right about it: not having a feedback/review, and pushing last minute changes is a disaster formula. Thanks for your blog
