ABSTRACT

Title of Document: ANALYSIS AND DESIGN OF HIGH-SPEED A/D CONVERTERS IN SIGE TECHNOLOGY

PO-HSIN CHEN, Ph.D., 2007

Directed By: Professor, Martin Peckerar, Department of Electrical and Computer Engineering

Mixed-signal systems play a key role in modern communications and electronics. The quality of A/D and D/A conversion strongly affects what we see and hear in real-world video and radio. This dissertation deals with two high-speed ADCs: a 5-bit 2-GSPS ADC and an 8-bit 500-MSPS ADC. These converters can be applied in flat-panel displays, image enhancement and high-speed data links. To achieve state-of-the-art performance, we employed a 0.13-µm/2.5-V 210-GHz (unity-gain frequency) BiCMOS SiGe process for all implementations. The circuit building blocks required by an ADC, such as the Track-and-Hold circuit (T/H) and the comparator, benefit not only from SiGe’s superior ultra-high-frequency properties but also from its power drive capability.
The T/H described here achieved a dynamic performance of 8-bit accuracy at a 2-GHz Nyquist rate with an input full-scale range of 1 V_{p-p}. The T/H consumed 13 mW of power. The unique 4-in/2-out comparator was built from fully differential emitter-coupled pairs in order to operate at such a high frequency. A cascoded cross-coupled amplifier core was employed to reduce the Miller effect and to avoid collector-emitter breakdown of the HBTs. We utilized the comparator interpolation technique between the preamplifier stages and the latches to reduce the total power dissipated by the comparator array. In addition, we developed an innovative D/A conversion and analog subtraction approach, necessary for two-step conversion, based on a bipolar pre-distortion technique. This innovation enabled us to decrease the design complexity of the subranging process in a two-step ADC.

The 5-bit interpolating ADC, operated at 2 GSPS, achieved a differential nonlinearity (DNL) of 0.114 LSB and an integral nonlinearity (INL) of 0.076 LSB. Its effective number of bits (ENOB) is 4.3 bits at low frequency and 4.1 bits near the Nyquist rate. With comparator interpolation, the power dissipation was reduced by more than half, to 66.14 mW. The 8-bit two-step interpolating ADC operated at 500 MSPS. It achieved a DNL of 0.33 LSB and an INL of 0.40 LSB with a power consumption of 172 mW. Its ENOB is 7.5 bits at low frequency and 6.9 bits near the Nyquist rate.
ANALYSIS AND DESIGN OF HIGH-SPEED A/D CONVERTERS
IN SIGE TECHNOLOGY

By

PO-HSIN CHEN

Dissertation submitted to the Faculty of the Graduate School of the
University of Maryland, College Park, in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
2007

Advisory Committee:
Professor Martin Peckerar, Chair
Professor Pamela Abshire
Professor Aris Christou
Professor Neil Goldsman
Professor Timothy Horiuchi
Dedication

To my lovely wife Shu-Jung and to my parents, who supported me through years of hard work. Without them, this would not have been possible.
Acknowledgements

The text of Chapters 3 and 5, in part, is a reprint of the material as it appears in the Proceedings of the 2007 IEEE International Conference on Integrated Circuit Design and Technology (ICICDT). The dissertation author was the primary researcher and the first author of this publication. He designed and developed the research that forms the basis for these chapters.

I came to the U.S. in 2003 to start advanced research on mixed-signal integrated circuits. I completed the course requirements to qualify for the Ph.D. program. From spring 2004, I was fortunate enough to work under the instruction of Professor Peckerar, and I want to thank him for giving me the opportunity to pursue state-of-the-art research using the most advanced SiGe technology. His support and encouraging personality were of great help during my research.

I would also like to thank Professor Abshire and Professor Goldsman for their help and support. I had the chance to share testing equipment with their labs and I also had valuable discussions with their students.

I would like to thank my parents, who gave me both financial and mental support. They encouraged me to pursue higher education. I shared my American experience with my parents on the phone every week, and they always inspired me to keep going.
More than anyone, I would like to thank my lovely wife, Shu-Jung, for her unending support and love. Shu-Jung and I married in 2004. She came here in that year when we started our family life. She sacrificed her time and energy throughout this period. She pointed out many things to me on my road to personal improvement. I learned a lot from her. I have dedicated this dissertation to Shu-Jung and to my parents. I appreciate their support during these times.
# Table of Contents

Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Motivation
  1.2 Applications
  1.3 Proposed Architecture
  1.4 Research Contributions and Achievements
Chapter 2: SiGe Heterojunction Bipolar Transistor Technology
  2.1 Introduction
  2.2 SiGe HBT
    2.2.1 Extrinsic Base
    2.2.2 Integration of SiGe HBT with Standard CMOS Process
  2.3 Summary
Chapter 3: Track-and-Hold Circuit and Comparator
  3.1 Introduction
  3.2 Track-and-Hold Circuit Design
    3.2.1 Introduction
    3.2.2 Switched Emitter Follower T/H
  3.3 Comparator Design
    3.3.1 Introduction
    3.3.2 Proposed Structure
  3.4 Summary
Chapter 4: DAC, Subtractor, Residue Amplifier, Delay Element and Encoder
  4.1 Introduction
  4.2 DAC
List of Tables

Table 1.1 Comparison of reported 8-bit two-step ADCs.
Table 3.1 Comparison of the THD (in dB).
Table 3.2 Droop rate (typical value) of the T/H (in mV/ns).
Table 4.1 Comparison of $\Delta V_{\text{diff\_out}}$ variations (3:1).
Table 5.1 Performance summary.
Table 5.2 Performance summary and comparison.
List of Figures

Fig. 1.1 Block diagram of a flash ADC.

Fig. 1.2 Block diagram of the proposed 8-bit 2-step ADC.

Fig. 1.3 Detail of the proposed 8-bit two-step interpolated ADC.

Fig. 2.1 (a) Doping profile of germanium in the base. (b) Energy band vs. lateral distance.

Fig. 2.2 $f_T$ versus $I_C$ for emitter area of $1 \, \mu m^2$ (dot, 0.5-µm 47-GHz), $0.26 \, \mu m^2$ (square, 0.18-µm 120-GHz) and $0.18 \, \mu m^2$ (triangle, 0.12-µm 210-GHz) [2.3].

Fig. 2.3 Process flow for integration of SiGe HBT and CMOS [2.4].

Fig. 3.1 T/H in an ADC.

Fig. 3.2 A simple switched capacitor sampler.

Fig. 3.3 A fully differential T/H.

Fig. 3.4 Main issues in a T/H.

Fig. 3.5 Issues in a T/H from the circuit point of view: (a) Charge dump/injection. (b) Hold-mode feed-through. (c) Hold-mode droop.

Fig. 3.6 SEF T/H.

Fig. 3.7 Common-emitter amplifier with emitter degeneration.

Fig. 3.8 (a) Differential amplifier with diode connected load. (b) Small-signal model of one branch in (a). (c) Simplified model of (b).

Fig. 3.9 Conventional differential pair with two emitter degenerated resistors in series with current sources.
Fig. 3.10  Current gain of a BJT.  
Fig. 3.11  SEF current switch.  
Fig. 3.12  Frequency response of an emitter follower.  
Fig. 3.13  Small-signal model of the SEF in Fig. 3.10.  
Fig. 3.14  $V_{B(Q1)}$ as a function of time.  
Fig. 3.15  Bottom-plate sampling technique.  
Fig. 3.16  Feedforward capacitors can reduce hold mode feedthrough.  
Fig. 3.17  Quantization method of an analog input signal.  
Fig. 3.18  Proposed preamplifier design.  
Fig. 3.19  (a) A positive-feedback latch. (b) Latching time vs. initial voltage difference.  
Fig. 3.20  Emitter-coupled logic differential latch.  
Fig. 3.21  Loading of $Q_3$.  
Fig. 3.22  Complementary self-biased differential amplifier.  
Fig. 4.1  (a) Functionality of a DAC. (b) Transfer characteristic of a DAC.  
Fig. 4.2  A simple resistor-string DAC.  
Fig. 4.3  A 3-bit DAC using binary switch array.  
Fig. 4.4  A 3-bit R-2R laddered DAC.  
Fig. 4.5  (a) A binary-weighted current-steering DAC. (b) A glitch due to the latch asynchronism.  
Fig. 4.6  An equally-weighted current steering DAC.  
Fig. 4.7  A charge-redistribution DAC.  
Fig. 4.8  Current-steering circuit in [4.12].
Fig. 4.9  Topology of the proposed CSC, which employs bipolar pre-distortion to achieve simpler design equations.

Fig. 4.10  Schematic of the current sources in Fig. 4.9. 

Fig. 4.11  Subtractor using an operational amplifier. 

Fig. 4.12  Schematic of the proposed CSS. The unity-gain buffer amplifier (Q_a–Q_d) works with seven CSCs to perform 3-bit subtraction. The D_0–D_6 signals are thermometer codes from the previous stage.

Fig. 4.13  Saw-toothed waveform representing the ideal residue signal (after amplification) for a 3-bit coarse conversion with a descending ramp input signal.

Fig. 4.14  (a) Histograms of V_1 and V_2 distribution due to device mismatch. (b) Histograms of V_1 – V_2.

Fig. 4.15  $\Delta V_{\text{diff\_out}}$ variations for different geometries.

Fig. 4.16  Residue amplifier. 

Fig. 4.17  (a) Block diagram of the delay amplifier. (b) Propagation delay caused by the first T/H and the second T/H circuit. 

Fig. 4.18  (a) Dynamic shift register. (b) Transient response of the DSR in (a). 

Fig. 4.19  Conventional ROM implementation. 

Fig. 4.20  Using three-input logic gates to correct bubble errors. 

Fig. 5.1  Block diagram of multi-step ADC. 

Fig. 5.2  The interpolation architecture. 

Fig. 5.3  Generation of V_B and V_C. 

Fig. 5.4  Zero-crossing points of a flash comparator array with an LSB of 2 mV.
Fig. 5.5  DNL and INL of the 5-bit ADC.  
Fig. 5.6  Simulated spectra for $f_{in} = 700.3$ MHz at 2-GSample/s.  
Fig. 5.7  Simulated SNDR vs. input frequency.  
Fig. 5.8  The block diagram of the proposed two-step interpolating ADC.  
Fig. 5.9  Clock diagram of the ADC.  
Fig. 5.10 Distribution of the digital output codes.  
Fig. 5.11 Simulated DNL and INL.  
Fig. 5.12 Simulated frequency spectra of the CSS with $f_{in} = 220.3$ MHz.  
Fig. 5.13 Simulated frequency spectra of the ADC with $f_{in} = 10.7$ MHz.  
Fig. 5.14 Simulated SNDR vs. input frequency.  
Fig. 5.15 Chip layout of the 5-bit interpolating ADC.  
Fig. 5.16 Chip layout of the (a) 6-bit two-step interpolating ADC and (b) 8-bit two-step interpolating ADC.  
Fig. 5.17 Verification setup.  
Fig. 5.18 Measured digital output waveforms.  
Fig. 5.19 Measured digital output waveform for B4 at 10-MSPS. Left: $f_{in} =$ 154.39 Hz; right: $f_{in} = 57$ kHz.  
Fig. 5.20 Measured digital output waveforms: $f_{in} = 1$ kHz, input amplitude $\sim 1 V_{pp}$, $f_s = 10$ MHz.  
Fig. 5.21 Measured digital output waveforms: $f_{in} = 5$ MHz, input amplitude $\sim 1 V_{pp}$, $f_s = 10$ MHz.  
Fig. 5.22 Measured digital output waveforms: $f_{in} = 20$ MHz, input amplitude $\sim 1 V_{pp}$, $f_s = 5$ MHz.
Fig. 5.23  Die photos.

Fig. A.1  MOSFET differential amplifier.

Fig. A.2  BJT differential amplifier.
Chapter 1: Introduction

1.1 Motivation

Electronic products that contain high-speed mixed-signal blocks, such as personal wireless handsets and digital cameras, play an increasingly important role in our daily life. Not only portable communication devices but also wired communications are in widespread use. A major portion of these devices deals with discrete-time digital signals through internal digital signal processing (DSP). Nonetheless, signals propagating in the real world are typically analog in nature, for both wired and wireless communications. Therefore, an interface between the analog parts and the digital parts is necessary to transfer the signal from the analog domain to the digital domain or vice versa. Based on the direction of the transformation, interfaces are classified into two categories: analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). An ADC, as its name indicates, digitizes analog input signals and outputs digital codes. A DAC, on the other hand, converts digital input codes to analog waveforms. Low power dissipation and small physical size are the main drivers for the pervasive use of portable electronics. Moreover, the demand for data rate has been increasing due to widely used graphical interfaces. Consequently, high-speed conversion, along with low power and small area, is the main consideration in these applications. This thesis focuses on high-speed ADCs from these perspectives.
1.2 Applications

ADCs find application in communications, instrumentation, data storage and image processing. Requirements for the various applications are quite different. For instance, the conversion rate for wideband communication systems [1.1] [1.2] is required to be in the GHz range with “low” resolution (less than 12 bits). On the other hand, image or video data acquisition [1.3] [1.4] needs 12-bit or higher resolution to reduce the effect of quantization error. Generally speaking, high-speed ADCs operate from 100 MSample/s (MSPS) up to several GSample/s (GSPS) with a resolution ranging from 4 to 8 bits. The sampling rate, power and area consumption are always the driving issues for these applications. Conversely, high-resolution ADCs have resolutions from 12 to 15 bits and sampling rates below some tens of MHz. Additional attention must be paid to process variations and device mismatches [1.5], circuit nonlinearity, noise and distortion if we are to achieve the desired specifications. Some specific applications are optical receivers with 5-b 10-GSample/s [1.6]; 6-b 1.3/1.6-GSPS in disk-drive read channels and Ethernet [1.7] [1.8] [1.9]; 8-b 55-MSample/s video application [1.10]; 8-b 150-MSPS for gigabit Ethernet and flat-panel displays [1.11]; 10-b 300-MSPS in medical engineering systems and HDTV systems [1.12]; and 12-b 5-MSPS [1.13].
1.3 Proposed Architecture

Several architectures are available for achieving analog-to-digital conversion, such as integrating ADCs, flash (or parallel) ADCs, two-step ADCs, pipelined ADCs, successive approximation ADCs and delta-sigma (Δ-Σ or oversampling) ADCs. An integrating ADC charges or discharges a timing capacitor during the conversion cycle. A flash ADC performs the conversion in parallel, as shown in Fig. 1.1. The flash architecture is widely employed in high-speed ADCs. However, because of the parallel conversion, an n-bit flash ADC requires \((2^n - 1)\) comparators. A two-step or pipelined ADC produces digital output codes in the same manner as a flash ADC but in two or more steps. It requires more complex circuitry to perform this kind of A/D conversion, but it consumes less power and area. A successive approximation ADC converts an analog quantity into a digital word through a succession of trial-and-error steps. An oversampling converter, such as a $\Delta$-$\Sigma$ ADC, is difficult to explain briefly but easy to implement using a $\Delta$-$\Sigma$ modulator. Since the $\Delta$-$\Sigma$ ADC trades input bandwidth for better noise performance (and thus resolution), it is not favorable for high-speed conversion.
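
The parallel conversion performed by a flash ADC can be illustrated with a small behavioral model. The following is a minimal sketch in Python, not the circuit implemented in this work: the function name `flash_adc`, the unipolar 0-to-1-V input range and the ideal, offset-free comparators are all assumptions made for illustration.

```python
def flash_adc(vin, n_bits, v_min=0.0, v_max=1.0):
    """Ideal n-bit flash conversion (illustrative model, assumed unipolar range)."""
    levels = 2 ** n_bits
    lsb = (v_max - v_min) / levels
    # One equally spaced reference per comparator: 2**n - 1 references in total.
    refs = [v_min + k * lsb for k in range(1, levels)]
    # All comparators decide at once; each outputs 1 if vin is at or above its reference.
    thermometer = [1 if vin >= r else 0 for r in refs]
    # With ideal comparators the thermometer code has no bubbles,
    # so counting the ones gives the binary output code.
    code = sum(thermometer)
    return thermometer, code


thermo, code = flash_adc(0.40, n_bits=3)
print(len(thermo), thermo, code)  # 7 comparators, [1, 1, 1, 0, 0, 0, 0], code 3
```

The list of references makes the hardware cost visible: every additional bit doubles the number of comparators that must switch in parallel.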

Aiming at high-speed conversion with moderate resolution, this work set out to research and develop an 8-bit 500-MSPS two-step ADC and a 5-bit 2-GSPS ADC. Based on the above discussion, a flash-like architecture is the natural choice. Nevertheless, because of its high power dissipation and die area, a basic flash structure is not cost-effective, since it needs $(2^8 - 1)$ comparators. The proposed 8-bit ADC adopts the two-step structure to reduce the total number of comparators substantially, to at most $(2^3 - 1) + (2^5 - 1) = 38$ comparators for a $(3 + 5)$-bit architecture, while maintaining the conversion rate. The ADC consists of a flash-like 3-bit first-step ADC and a flash-like 5-bit second-step ADC. In order to characterize the individual building blocks, we also implemented a 5-bit interpolating ADC operating at 2 GSPS in this work.
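
The comparator counts quoted above follow directly from the $(2^n - 1)$ rule. The short Python sketch below reproduces the arithmetic; the helper names are hypothetical, and the counts are upper bounds that ignore any further savings from comparator interpolation.

```python
def flash_comparators(n_bits):
    """Comparators needed by a full n-bit flash stage."""
    return 2 ** n_bits - 1


def two_step_comparators(coarse_bits, fine_bits):
    """Upper bound for a two-step ADC built from two flash-like stages."""
    return flash_comparators(coarse_bits) + flash_comparators(fine_bits)


print(flash_comparators(8))        # 255 for a plain 8-bit flash ADC
print(two_step_comparators(3, 5))  # 38 for the (3 + 5)-bit partition
print(two_step_comparators(4, 4))  # 30 for a (4 + 4)-bit partition
```

A $(4 + 4)$-bit split would need slightly fewer comparators, but it requires a 4-bit DAC in the first step, which is the trade-off behind the choice of partition.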

In this work, the analog circuitry was implemented in a fully differential architecture. The fully differential structure can cancel even-order harmonic distortion, and hence it provides better linearity than a single-ended structure. We developed a 4-in/2-out fully differential comparator in SiGe technology. The comparator further utilizes cascoded HBTs to avoid collector-emitter breakdown and to reduce the Miller effect. Therefore, this comparator not only has smaller distortion but also has a high operating frequency.

The \( (3 + 5) \)-bit architecture was chosen out of consideration for circuit complexity. A 3-bit DAC is easier to realize than a DAC with 4 or more bits. With this partition, the total number of comparators was reduced from \( (2^8 - 1) = 255 \) to \( (2^3 - 1) + (2^5 - 1) = 38 \). Fig. 1.2 illustrates the block diagram of the proposed two-step ADC. The input stage is a Track-and-Hold circuit (T/H) that samples the analog input signal. Following the T/H is the first-step ADC, which generates the 3 most significant bits (MSBs). The first-step result is converted back to analog form and subtracted from the original analog input signal. The resulting residue signal is amplified to the full-scale range and converted by the second-step ADC, which produces the 5 least significant bits (LSBs).
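
The signal flow just described (coarse conversion, D/A conversion, subtraction, residue amplification, fine conversion) can be sketched as an ideal behavioral model. The Python code below is only an illustration under stated assumptions: the names `quantize` and `two_step_adc` are hypothetical, the input range is taken as unipolar 0 to 1 V, and all blocks are ideal, with no offsets, no half-LSB shift and no error correction.

```python
def quantize(v, n_bits, full_scale=1.0):
    """Ideal n-bit quantizer over [0, full_scale); returns an integer code."""
    levels = 2 ** n_bits
    code = int(v / full_scale * levels)
    return max(0, min(levels - 1, code))  # clip to the valid code range


def two_step_adc(vin, coarse_bits=3, fine_bits=5, full_scale=1.0):
    """Ideal two-step conversion: MSBs first, then LSBs from the amplified residue."""
    # First step: the coarse ADC produces the MSBs.
    msb = quantize(vin, coarse_bits, full_scale)
    # Ideal DAC: convert the coarse code back to an analog level.
    v_dac = msb * full_scale / 2 ** coarse_bits
    # Subtract, then amplify the residue back to the full-scale range.
    residue = (vin - v_dac) * 2 ** coarse_bits
    # Second step: the fine ADC produces the LSBs.
    lsb = quantize(residue, fine_bits, full_scale)
    return (msb << fine_bits) | lsb


print(two_step_adc(0.40), quantize(0.40, 8))  # both give code 102
```

In this ideal model the two-step result equals that of a direct 8-bit quantizer; in a real converter, errors in the DAC, subtractor and residue amplifier determine how closely the two agree.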

Fig. 1.2 Block diagram of the proposed 8-bit 2-step ADC.
The flash-like ADC in each of the two steps utilizes a comparator interpolation technique [1.12] [1.14] to further reduce the part count and the loading on the T/H. Fig. 1.3 depicts the two-step interpolated ADC in more detail. The CSS block in Fig. 1.3 stands for current-steering subtractor. In this dissertation, we propose a unique DAC/subtractor design that uses bipolar pre-distortion to control the currents flowing through a buffer amplifier. The differential voltage drop in the buffer amplifier performs the digital-to-analog conversion as well as the subtraction. The ADC architecture is discussed in detail in Chapter 5.

Fig. 1.3 Detail of the proposed 8-bit two-step interpolated ADC.
Typically, input offset in complementary metal-oxide-semiconductor (CMOS) differential analog circuits strongly degrades the quality of signal processing. Most input offset cancellation (or autozeroing) techniques use “input or output offset storage” [1.15]. They store the offset information on a capacitor at either the input or the output terminals. However, these mechanisms require an additional clock period to recover the input-referred offset, which limits their use in high-speed ADCs. Furthermore, because bipolar junction transistors (BJTs) are the main components in our analog implementations, the input offset is relatively small compared with that of a CMOS counterpart. A technique called “averaging” [1.9], which comes inherently with comparator interpolation, can further decrease the effect of input offset. Therefore, the interpolation and averaging techniques together can improve the differential nonlinearity (DNL) of the ADC without autozeroing.

1.4 Research Contributions and Achievements

We developed two unique building blocks to realize the 5-bit interpolating ADC and the 8-bit two-step ADC. The 4-in/2-out fully differential comparator is one of the unique contributions of this work. The preamplifier is composed of an input buffer (4 emitter followers), a cross-coupled fully differential amplifier and an output buffer (2 emitter followers). The cross-coupled amplifier is made of two cascoded differential pairs. In this way, we avoid device breakdown and achieve a bandwidth of 13.78 GHz for the 5-bit interpolating ADC. This preamplifier provides a gain of 13 dB. The comparator consumes 6 mW and has a very small metastability probability of $8.53 \times 10^{-10}$ (the latch has a gain of 10 dB).
The other unique contribution is the CSS design. The CSS employs a bipolar pre-distortion technique to control two currents steered in the subtractor core. The currents flowing through the differential resistive load produce a half-LSB subtraction or a half-LSB addition. This operation allows us to retrieve the signal left unconverted by the first-step ADC and feed it into the second-step ADC. Compared with a previously reported technique, our CSS has fewer design variables, which simplifies the design equations. Furthermore, it provides a linear current transfer function that enables us to utilize this structure in either high-resolution or low-resolution applications for a given buffer amplifier. It also allows us to choose a suitable resistance for the load of the buffer amplifier, since different resistances in a given technology have different process variations.

### TABLE 1.1

**Comparison of Reported 8-bit Two-step ADCs**

<table>
<thead>
<tr>
<th></th>
<th>[1.16]</th>
<th>[1.17]</th>
<th>[1.18]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Published year</td>
<td>1995</td>
<td>2001</td>
<td>2004</td>
<td>2007</td>
</tr>
<tr>
<td>Technology</td>
<td>0.8-µm BiCMOS</td>
<td>0.35-µm CMOS</td>
<td>0.13-µm CMOS</td>
<td>0.13-µm BiCMOS</td>
</tr>
<tr>
<td>Conversion rate (MHz)</td>
<td>200</td>
<td>100</td>
<td>125</td>
<td>500 (5-b at 2 GHz)</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>500</td>
<td>109</td>
<td>21</td>
<td>172</td>
</tr>
<tr>
<td>Full scale range (V)</td>
<td>1</td>
<td>-</td>
<td>-</td>
<td>1</td>
</tr>
<tr>
<td>DNL/INL (LSB)</td>
<td>0.50/0.50</td>
<td>0.39/0.43</td>
<td>0.15/0.25</td>
<td>0.33/0.40</td>
</tr>
<tr>
<td>ENOB</td>
<td>7.0</td>
<td>7.3</td>
<td>7.6</td>
<td>7.5</td>
</tr>
</tbody>
</table>
Table 1.1 compares the specifications of three reported 8-bit two-step ADCs. We can see that over the past 6 years the speed of two-step ADCs in CMOS technologies has not increased much. As for the bipolar-CMOS (BiCMOS) counterpart, its speed in 1995 was even higher than that of the more recent CMOS ADC from 2004. Since BJTs have a higher unity-gain frequency ($f_T$) than field-effect transistors (FETs) in a given technology, we can expect a higher operating frequency from BJT circuits. However, the BiCMOS design consumed much more power because of the larger current drive of the BJTs. Thanks to newly developed semiconductor technologies, designers now have a chance to use processes featuring a higher $f_T$ with lower current drive. Silicon-germanium (SiGe) is one of the best choices for high-speed and high-integration implementations. We will discuss SiGe technology in more detail in Chapter 2.

As the comparison shows, this work achieved conversion rates of 500 MSPS for the 8-bit prototype and 2 GSPS for the 5-bit prototype. Reducing power and area consumption was also a major goal of this design. The simulation results showed that a power dissipation of less than 200 mW can be obtained. The static performance (i.e. DNL and integral nonlinearity, or INL) and dynamic performance (i.e. effective number of bits, or ENOB) are also competitive with the state of the art.
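
ENOB is commonly derived from the signal-to-noise-and-distortion ratio (SNDR) through the standard relation ENOB = (SNDR − 1.76 dB) / 6.02 dB for a full-scale sinusoidal input. The Python sketch below assumes that this conventional definition applies and converts the ENOB values reported for the two prototypes into the corresponding SNDR; the function names are hypothetical.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (full-scale sine input assumed)."""
    return (sndr_db - 1.76) / 6.02


def sndr_from_enob(enob):
    """Inverse relation: SNDR in dB that corresponds to a given ENOB."""
    return 6.02 * enob + 1.76


# ENOB values reported in this dissertation for the two prototypes.
reported = [
    ("8-bit ADC, low frequency", 7.5),
    ("8-bit ADC, near Nyquist", 6.9),
    ("5-bit ADC, low frequency", 4.3),
    ("5-bit ADC, near Nyquist", 4.1),
]
for label, enob in reported:
    print(f"{label}: ENOB = {enob} bits, SNDR = {sndr_from_enob(enob):.2f} dB")
```

Under this relation, each lost bit of ENOB corresponds to roughly 6 dB of SNDR degradation.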

The proposed ADC is the first 8-bit two-step implementation in 0.13-µm SiGe BiCMOS technology. The high-$f_T$ property allowed us to drive the devices at lower currents while still obtaining competitive results. In addition, we have developed a novel CSS to generate the residue signal from the first-step ADC.

The thesis is organized as follows. The 0.13-µm SiGe technology is introduced in Chapter 2. Chapter 3 describes the main building blocks in detail: the T/H and the comparator. Chapter 4 describes the remaining building blocks, such as the CSS, the residue amplifier and the encoder. A combined subtractor and DAC circuit is demonstrated in that chapter, and encoders and other elements used for time matching are also discussed. The core analog circuits were all designed with SiGe HBTs (i.e. heterojunction bipolar transistors), while the digital parts were implemented with CMOS devices. Chapter 5 integrates the blocks into the final 8-bit two-step interpolating ADC and the 5-bit ADC, with detailed descriptions. Simulation and measurement results are also presented in that chapter. Conclusions are drawn in Chapter 6.
Chapter 2: SiGe Heterojunction Bipolar Transistor Technology

2.1 Introduction

SiGe HBT (*heterojunction bipolar transistor*) technology has been available since the early 1980s at the IBM Corporation. The need for low-cost, compact and power-efficient wireless communication devices has sped up research in SiGe technology over the last decade. Generally speaking, SiGe’s key attraction is its high $f_T$ (unity-current-gain frequency) and $f_{\text{max}}$ (maximum oscillation frequency). In August 2005, IBM claimed that its 4th-generation (120 nm) SiGe process has an $f_T$ of up to 210 GHz. Such high-$f_T$ devices have found applications in high-speed data acquisition systems, voice/video signal processing and personal cellular communication handsets. As a result, the development of process technologies is driving communication systems to much higher frequencies.

SiGe technology has not only superior frequency response but also several other advantages over other technologies. With a higher gain (thus lower $R_b$), the SiGe HBT has lower noise and lower 1/f noise than an identically fabricated Si BJT, and it has lower noise than CMOS.

One of the most valuable features of SiGe technology is that it can be integrated with an existing standard CMOS process. Current GaAs HBT technology has performance competitive with that of the SiGe HBT. However, it is not possible to integrate the GaAs process with the well-developed Si-based CMOS process. Therefore, SiGe BiCMOS technology is more favorable for application in high-frequency integrated systems.

2.2 SiGe HBT
2.2.1 Extrinsic Base

A SiGe HBT differs from a Si BJT in its base material. In SiGe HBT technology, the germanium (Ge) doping is graded through the base region. It increases gradually from the emitter edge to the collector edge [2.1]. Fig. 2.1 (a) shows the Ge doping profile through the base region. The Ge doping results in a smaller bandgap in the base than that of an intrinsic Si BJT. The bandgap near the collector edge is smaller than that near the emitter edge because a larger Ge concentration leads to a smaller bandgap. Fig. 2.1 (b) presents the change of the energy band diagram at thermal equilibrium versus lateral distance due to Ge doping. The gray dotted line in the base region represents an intrinsic Si BJT, while the solid ramping line stands for the SiGe HBT. The graded energy band produces a small quasi-drift electric field at the emitter-base junction. This field accelerates the electrons injected from the emitter toward the collector. Thus, the electron base transit time is reduced. As a result, both $f_T$ and $f_{max}$ are increased. Along with the quasi-drift field, the narrow base width further improves the high-frequency performance.
Investigations of SiGe HBTs with $f_T$ greater than 200 GHz have been published since early 2000 [2.2] [2.3]. In [2.3], an $f_T$ of 210 GHz was reported. The $f_T$ for different technologies is shown in Fig. 2.2. We can see that, at the same $f_T$, the operating current of the 0.12-μm 210-GHz technology is reduced to one tenth of that of the 0.5-μm 47-GHz technology. Therefore, the newer technology (0.12-μm 210-GHz) can achieve the same performance (to a first-order approximation) while dissipating only one tenth of the current.
2.2.2 Integration of SiGe HBT with Standard CMOS Process

It has been proven that the SiGe HBT has much better high-frequency performance, lower power dissipation and lower noise than the standard pure-Si BiCMOS process. Moreover, its ability to be integrated with a standard CMOS process lets it outshine III-V (e.g. GaAs) technology and gives designers more design flexibility. High-Q passive components (inductors, capacitors) in SiGe technology make monolithic integrated circuits possible.

The most significant step in the SiGe HBT fabrication process is the growth of the low-temperature epitaxial base. Since source and drain formation in the CMOS process requires high-temperature dopant activation, IBM® has developed a “base after gate” integration flow [2.4] in order to integrate the SiGe HBT with Si CMOS technology. By doing so, the MOSFET processes are less susceptible to damage by the low-temperature HBT processes. Fig. 2.3 shows this process flow in detail.

![Diagram of process flow](image)

Fig. 2.3 Process flow for integration of SiGe HBT and CMOS [2.4].

### 2.3 Summary

By replacing the Si base with a graded Ge-doped Si base, a graded base energy bandgap can be formed in the SiGe HBT. This produces a quasi-electric field that decreases the base transit time, reduces the base leakage current, increases $\beta$ (and $g_m$) and results in a higher $f_T$. Furthermore, the SiGe HBT can be integrated with standard Si CMOS processes. Therefore, considering both device performance and cost, SiGe technology is an attractive choice for high-frequency circuits.
Chapter 3: Track-and-Hold Circuit and Comparator

3.1 Introduction

In an ADC, the conversion starts with analog signal sampling, followed by voltage comparisons and final encoding. The analog input signal is sampled by a sampler, such as a sample-and-hold circuit (S/H) or a track-and-hold circuit (T/H). Generally, a T/H is applied as a pre-sampler in front of the comparator array (or quantizer) to improve the high-frequency performance of the ADC. Fig. 3.1 shows such an arrangement. The T/H is intended to increase both the sampling frequency and the resolution. In our two-step ADC, the T/H further enables us to capture synchronized signals for both steps. We will discuss the use of the T/H as an analog delay amplifier in Chapter 4. Thus, the T/H plays an important role in data acquisition systems, especially in high-speed applications and multi-step ADCs. T/Hs are also applied in communications, imaging and video to enhance the sampling quality.

![Diagram of T/H in an ADC](image)

Fig. 3.1 T/H in an ADC.

Basically, either the S/H or the T/H uses two non-overlapping clock phases to sample the input signal. During the first phase, the sampler tracks the input signal. In the second phase, the sampled signal is held at the voltage level it had reached at the end of the tracking period. Switched-capacitor circuits are widely employed to hold the sampled signal. While the switch is on, the capacitor is charged to the signal voltage. Once the switch is off, the signal is held by the capacitor.

According to circuit structure, we separate T/Hs into two categories: closed-loop T/Hs and open-loop T/Hs. The former feature much higher accuracy since the negative feedback loop stabilizes the gain against process-induced parameter variations. However, closed-loop architectures may have a much longer response time. The latter are widely applied in high-frequency systems, but designers need to pay more attention to matching and accuracy issues. We will discuss our T/H design in more detail in Section 3.2.

After the input signal is sampled by the T/H, the output of the T/H is compared to the quantized voltage references. An n-bit flash ADC requires \(2^n - 1\) equally spaced references along with the same number of comparators to generate the thermometer code. Since \(n\) is typically greater than 5, the comparators play a very important role and occupy a large chip area. Any malfunction of a comparator will lead to conversion errors. In Section 3.3, the design considerations for the proposed comparator will be discussed. Since a comparator must distinguish the input signal from a specified voltage, it conventionally offers some gain at the input stage to reduce the effect of input offset. A latching stage then drives its output to a positive or a zero level and stores the compared result. Therefore, the comparator is composed of an amplifier and a decision-making latch. This is dealt with in greater detail below. Section 3.4 summarizes the discussion of the T/H and comparator designs.

### 3.2 Track-and-Hold Circuit Design

#### 3.2.1 Introduction

Since the quality of an A/D conversion depends critically on the sampled analog signal, the T/H design must meet stringent requirements. For an n-bit ADC, assuming no noise contribution from the T/H but only harmonic distortion, the total harmonic distortion (THD) should satisfy the design constraint [3.1]:

\[
THD < -(1.76 + 6.02 \cdot n) \text{ dB.} \tag{3.1}
\]

Therefore, for an 8-bit ADC, the THD should be less than -49.92 dB.
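
The bound in (3.1) is easy to tabulate for the resolutions of interest. The following is a minimal sketch; the function name `thd_bound_db` is only illustrative and not part of the original work.

```python
def thd_bound_db(n_bits: int) -> float:
    """Upper bound on the T/H THD (in dB) from (3.1) for an n-bit ADC."""
    return -(1.76 + 6.02 * n_bits)


# Evaluate the bound for the two resolutions discussed in this work.
for n in (5, 8):
    print(f"{n}-bit ADC: THD < {thd_bound_db(n):.2f} dB")
```

For 8 bits this reproduces the -49.92 dB figure quoted above.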

Theoretically, an ADC needs a frozen signal at each sampling point to perform a precise conversion. Therefore, the input analog signal should be frozen at an instant and then held, ideally unchanged, for a moment. Because of these requirements, the switched-capacitor technique is extensively employed in T/H designs. In Fig. 3.2, the switched-capacitor circuit acts as a simple T/H. The tracking process is initiated by closing \( S_2 \) and opening \( S_1 \) during the same clock period; meanwhile, the hold capacitor, \( C_1 \), is charged to the input signal, \( V_2 \). Once \( S_2 \) is opened, the signal on \( C_1 \) is held at the value \( V_2 \) had at the last instant when \( S_2 \) was still closed. During the hold period (\( S_2 \) open), the held signal can be transferred to the output buffer of the T/H by closing \( S_1 \). Therefore, the output voltage, \( V_1 \), is equal to \( V_2 \) at the last instant of the tracking period.

Fig. 3.2 A simple switched capacitor sampler.

An ideal T/H features a high sampling rate, high resolution and low power consumption. In reality, however, the demands for ultra-high speed and ultra-high resolution contradict each other. Several T/H architectures have been developed in the literature for various applications [3.2] [3.3] [3.4]. Given the speed requirement of the proposed ADC, we chose an open-loop T/H to meet the sampling rate.

In practice, a T/H consists of an input buffer, a switched-capacitor circuit and an output buffer. Fig. 3.3 presents a practical T/H design in fully differential fashion. The input buffer isolates the input analog signal from the switching feedthrough generated by the switched capacitor and should provide high linearity within a wide input range. As mentioned, the switched capacitor tracks and holds the input signal. The output buffer can be considered as a voltage follower that drives the next stage. Both the input buffer and the output buffer give design flexibility to meet the T/H specifications. For instance, the input buffer can filter out undesired signals and the output buffer provides the drive capability for the following stage.

A careful survey of potentially deleterious design issues must be made to lessen or to avoid degradation of the sampling quality. Fig. 3.4 shows a typical transient response of a T/H under a sine wave input [3.5] [3.6]. Some non-ideal phenomena are evident in Fig. 3.4. The first issue, labeled 1 in Fig. 3.4, is the settling of the turned-off switch during the transition from track mode to hold mode. The switch dumps part of its charge onto the hold capacitor during the clock transition. Fig. 3.5(a) depicts the charge dump (or charge injection) process after the MOSFET switches off. This nonlinear behavior depends on the characteristics of the switched charge. The charge dump can be minimized in a closed-loop structure by connecting the inputs of the output buffer to a virtual ground. Moreover, even-order distortions can be cancelled out by a fully differential architecture (see Appendices).

The second limitation in Fig. 3.4 is droop during the hold mode. This is caused by the nonzero input current (or leakage) of the output buffer, as shown in Fig. 3.5(b). The nonzero input current occurs only when a BJT is used as the input transistor of the output buffer.

The third issue is the hold mode feed-through, as shown in Fig. 3.5(c). It also occurs in the hold mode as the input signal is coupled through the parasitic capacitor to the hold capacitor. Therefore, the hold-mode feed-through is signal independent and can occur in both closed-loop and open-loop architectures. Some of the above drawbacks can be cancelled by using fully differential architecture. Others will be alleviated through some specific techniques as addressed later.

Fig. 3.5 Issues in a T/H from the circuit point of view: (a) Charge dump/injection. (b) Hold-mode feedthrough. (c) Hold-mode droop.

#### 3.2.2 Switched Emitter Follower T/H

The proposed T/H employs both input buffer and output buffer along with a “switched emitter follower” (SEF) [3.3] [3.5] [3.7] as the switching circuit. The SEF can be operated at ultra high frequencies due to its current mode operation and follower properties. Fig. 3.6 illustrates the differential SEF T/H. It is composed of a differential input buffer, one set of SEF and a differential output buffer.

![Fig. 3.6 SEF T/H.](image)

**Input Buffer** The input buffer stage is made of a differential pair (Q1 and Q2) with emitter degeneration \((2R_E)\). The loads of the differential pair are two resistors (two \(R_C\)s). Since the purpose of the buffering stage is not to provide high gain for the input signal but to provide speed and linearity, we chose resistive loads for both the input buffer and the output buffer. The major requirement of the input buffer is high linearity within a wide input range. A non-degenerated bipolar differential pair (i.e. $2R_E = 0$) suffers from a small linear input range, which results from the exponential relation between the input voltage and the output current:

$$I_O = I_S e^{V_{BE}/V_T}. \tag{3.2}$$

The output voltage can be obtained through (3.2):

$$
\begin{aligned}
V_O &= V_{CC} - I_O R_C \\
    &= V_{CC} - I_S e^{V_{BE}/V_T} \cdot R_C.
\end{aligned} \tag{3.3}
$$

From (3.3), the linear range is smaller than 2 or 3 $V_T$ which is definitely insufficient for our current purpose.

![Common-emitter amplifier with emitter degeneration.](image)

In this work, we use an emitter degeneration resistor to provide the requisite linearity. We verify this concept by inspecting a single-ended common-emitter amplifier with emitter degeneration, as shown in Fig. 3.7. Applying the small-signal model under practical conditions ($\beta_0 \gg 1$, $r_o \gg R_E$ and $g_m r_o \gg 1$, where $\beta_0$ is the dc current gain, $r_o$ is the output resistance and $g_m$ is the transconductance of the BJT), the effective transconductance becomes:

$$G_m = \frac{g_m}{1 + g_m R_E}. \tag{3.4}$$

The relation between the input voltage and the output voltage becomes:

$$
\begin{aligned}
V_O &= V_{CC} - I_O R_C \\
    &= V_{CC} - g_m R_C \cdot V_{BE}.
\end{aligned} \tag{3.5}
$$

By substituting (3.4) into (3.5), we rearrange (3.5):

$$V_O = V_{CC} - \frac{g_m R_C}{1 + g_m R_E} \cdot V_{BE}. \tag{3.6}$$

Since $R_E$ is greater than 600 $\Omega$ and $g_m$ is near 0.03 S for a 750-$\mu$A collector current, we have $g_m R_E \gg 1$. Thus, (3.6) is further approximated by:

$$V_O = V_{CC} - \frac{R_C}{R_E} \cdot V_{BE}. \tag{3.7}$$

The derivation shows that we can reach an approximately linear relation without using any extra transistors which might cost extra headroom. In (3.5) through (3.7), $V_{BE}$ indicates the difference between the input voltage and the emitter voltage of the input BJT.
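
The numbers behind the $g_m R_E \gg 1$ approximation can be checked with a short calculation. This is a minimal sketch: the junction temperature of 300 K and the helper names are assumptions made for illustration, while the 750-µA collector current and the 600-Ω lower limit on $R_E$ come from the text.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """Thermal voltage V_T = kT/q (300 K is an assumed temperature)."""
    return K_B * temp_k / Q_E


def transconductance(i_c: float, temp_k: float = 300.0) -> float:
    """BJT small-signal transconductance g_m = I_C / V_T."""
    return i_c / thermal_voltage(temp_k)


def effective_gm(g_m: float, r_e: float) -> float:
    """Effective transconductance with emitter degeneration, per (3.4)."""
    return g_m / (1.0 + g_m * r_e)


i_c = 750e-6   # collector current quoted in the text, A
r_e = 600.0    # lower limit on R_E quoted in the text, ohm

g_m = transconductance(i_c)
big_gm = effective_gm(g_m, r_e)

print(f"g_m       = {g_m * 1e3:.2f} mS")
print(f"g_m * R_E = {g_m * r_e:.1f}")
print(f"G_m       = {big_gm * 1e3:.3f} mS (1/R_E = {1e3 / r_e:.3f} mS)")
print(f"G_m * R_E = {big_gm * r_e:.3f}")
```

Under these assumptions $G_m$ is within roughly 5% of $1/R_E$, which indicates how closely (3.7) approximates (3.6) at the lower limit of $R_E$.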

The nonlinearities introduced by the input buffer were minimized by using the smallest active devices in the signal paths (thus, the smallest parasitic nonlinearity). The gain of this stage was maintained at a constant value throughout the input full scale range by a parametric simulation to obtain the optimized $R_E$ and $R_C$. 

There are other ways to obtain the targeted linear range. In [3.5], a diode-connected load was used. In [3.7], a pre-distortion technique was used to obtain a linear relationship between the input and the output of the differential pair. Fig. 3.8 (a) shows the differential amplifier with diode-connected loads. The voltage transfer function obtained by this means is highly linear; however, the diode load was not employed in our design because it would limit our headroom under the 2.5-V supply. Fig. 3.8 (b) illustrates the small-signal model of one branch of the differential amplifier with diode-connected loads. A simplified model is shown in Fig. 3.8 (c). By applying KCL at the output node, we get:

\[
i_1 = g_{m1} v_{ip} + \frac{v_o}{r_{o1}} = i_2 = \frac{v_o}{R_o + R_2}, \tag{3.8}
\]

where the variables with subscript \(i\) (\(i = 1\) and 2) stand for the parameters of \(Q_i\). In (3.8),

\[
R_2 = r_{o3} \parallel R_{o3} \parallel \frac{1}{g_{m3}}. \tag{3.9}
\]

By rearranging (3.8), the transfer function becomes:

\[
\frac{v_o}{v_{ip}} = \frac{g_{m1}}{\dfrac{1}{R_o + R_2} - \dfrac{1}{r_{o1}}}. \tag{3.10}
\]

From this analysis, we find that the low frequency gain is independent of the bias current but the distortion is dominated by \(r_{o1}\). Therefore, the low frequency nonlinearity is mainly derived from Early effect. For a high frequency transfer function, the parasitic capacitances need to be taken into account and the results will be much more complicated.

Fig. 3.8 (a) Differential amplifier with diode-connected load. (b) Small-signal model of one branch in (a). (c) Simplified model of (b).

Conventionally, the emitter degeneration resistors in a differential pair are connected as shown in Fig. 3.9. An emitter resistor is connected in series with the current source in each branch of the differential pair, and thus a voltage drop appears in the signal path. Therefore, this method provides linearity at the expense of headroom. However, emitter degeneration enables us to employ resistive loads for driving the next stage (i.e. the SEF), which are more suitable than active loads for high-frequency operation. The degeneration connection shown in Fig. 3.6 gives the same result as the conventional connection in Fig. 3.9.

![Fig. 3.9 Conventional differential pair with two emitter degenerated resistors in series with current sources.](image)

Other effects causing nonlinearity over the frequency response of the input buffer include the Early effect at low frequency and the roll-off of the current gain at high frequency. As mentioned before, the output resistance of the HBTs should be considered in the calculation for a more precise prediction. The dc-bias dependence (i.e. the change of the $V_{CB}$ bias) of base-width modulation leads to finite output resistance and to distortion of the low-frequency characteristics. On the other hand, the current gain falls off as the operating frequency becomes “relatively” high. By “relatively high” frequency, we mean beyond the 3-dB bandwidth ($f_\beta$) of the BJT current gain [3.8]. Fig. 3.10 shows the current gain vs. frequency of a common-emitter BJT. If we take the limit values of the bipolar device (i.e. a unity-gain frequency of 250 GHz and a dc current gain of 250), $f_\beta$ is close to 1 GHz, so that the highest applicable sampling rate for a Nyquist-rate ADC is 2 GSPS.

![Diagram of current gain vs. frequency](image)

**Fig. 3.10 Current gain of a BJT.**
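
The 1-GHz estimate for $f_\beta$ follows from $f_\beta = f_T / \beta_0$. The sketch below also evaluates the current-gain magnitude with a single-pole roll-off, which is a standard first-order model assumed here for illustration rather than taken from the original analysis.

```python
import math


def f_beta(f_t: float, beta_0: float) -> float:
    """3-dB bandwidth of the common-emitter current gain: f_T / beta_0."""
    return f_t / beta_0


def current_gain_mag(f: float, f_t: float, beta_0: float) -> float:
    """|beta(f)| for an assumed single-pole roll-off with corner f_beta."""
    return beta_0 / math.sqrt(1.0 + (f / f_beta(f_t, beta_0)) ** 2)


F_T = 250e9     # unity-gain frequency quoted in the text, Hz
BETA_0 = 250.0  # dc current gain quoted in the text

print(f"f_beta = {f_beta(F_T, BETA_0) / 1e9:.2f} GHz")
for f in (100e6, 1e9, 10e9, 250e9):
    print(f"|beta| at {f / 1e9:7.2f} GHz = {current_gain_mag(f, F_T, BETA_0):7.2f}")
```

At $f = f_\beta$ the gain has dropped to $\beta_0/\sqrt{2}$, and at $f = f_T$ it is close to unity, as expected for this model.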

**Switched Emitter Follower** An ideal switch has zero “on” resistance and infinite “off” resistance. In addition, the settling time of the switching should be minimized. However, at the transistor level, no device performs such “ideal” switching. In practice, MOSFETs and BJTs are normally employed as switching devices because of their three-terminal properties. MOSFETs are more widely applied than BJTs in voltage switching since MOSFETs produce a negligible voltage drop between source and drain and the gate is the control terminal. However, in high-speed switching, current-mode operation is preferable. Thus, as a result of their superior current drive and higher operating frequency, BJT switches are the better choice in our design.

To meet the high-speed and mid-scale resolution requirements, we employed a switched emitter follower (SEF) [3.3] [3.5] [3.7] as the current-mode switch in the T/H. Fig. 3.11 illustrates a single-ended SEF. It consists of two NPN BJTs (Q₂ and Q₃) as current switches and an emitter follower (Q₁) as a voltage follower. The switches steer the current of the emitter follower between the two current branches. The output (Y) of the SEF is the emitter of Q₁. Therefore, the output signal Y follows the signal at node X under the current drive provided by Q₂. Once the current is steered to the other branch (i.e. Q₃), no signal is transferred to Y. This technique minimizes the circuit complexity by using NPN BJTs only. Since the voltage signal transferred by the SEF drops by one $V_{BE,ON}$, the output buffer can be used to restore the desired output voltage level. The detailed operation in each state is discussed below.

Fig. 3.11 SEF current switch.

**Track mode** During track mode, Q₂ (in Fig. 3.11) is on and Q₃ is off (i.e. clock “Track” is high and clock “Hold” is low). Thus, the tail current, $I_{sw}$, flows through Q₂, and Q₁ is biased by $I_{sw}$. Consequently, the voltage at node Y follows the voltage at node X with one $V_{BE,ON}$ drop. Q₁ operates in the forward-active region in track mode and therefore functions as an emitter follower.

The frequency response of the emitter follower is plotted in Fig. 3.12. It shows that the dominant pole is near the unity-gain frequency of the BJT. In [3.9], it also appears that the dominant pole is close to \( \frac{g_m}{C_\pi} \), which is equal to the \( \omega_T \) (i.e. \( 2\pi f_T \)) of the device. As for the whole SEF, the dominant pole is determined by \( C_H \) and \( g_{m1} \). This pole is located at 31.83 GHz in our design. Thus, the SEF can provide a fixed gain over a large frequency range. Since both the input buffer and the emitter follower are highly linear over a broad frequency range, the signal stored on \( C_H \) depends linearly on the input signal.

![Fig. 3.12 Frequency response of an emitter follower.](image)

In the meantime, the signal at Y continuously charges the hold capacitor, \( C_H \). Therefore, the input signal is stored on \( C_H \). The value (or size) of \( C_H \) affects the T/H performance in two aspects: speed and droop rate. The speed affects the quality of signal tracking, while the droop rate influences the held signal. In the tracking process, the slew rate depends on the value of $C_H$:

$$\frac{dv}{dt} = \frac{I_{sw}}{C_H}. \tag{3.11}$$

From (3.11), a small $C_H$ and/or a large $I_{sw}$ reduces the charging time. However, a small $C_H$ will increase the droop rate in the hold mode. Thus, a tradeoff between speed and droop rate must be made. In our design, $C_H$ is equal to 300 fF and $I_{sw}$ is equal to 1.5 mA. Thus, the slew rate (SR) of the SEF is 5 V/ns. This SR makes 2-GHz sampling feasible. We will look at the droop rate in more detail shortly.
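
The slew-rate figure can be reproduced directly from (3.11). In the sketch below, the 0.5-V step (the per-branch swing of the T/H) and the 250-ps track window (half of a 2-GHz clock period, i.e. a 50% duty cycle) are assumptions used only to put the number in context.

```python
def slew_rate(i_sw: float, c_h: float) -> float:
    """Slew rate dv/dt = I_sw / C_H from (3.11), in V/s."""
    return i_sw / c_h


I_SW = 1.5e-3   # switch current, A
C_H = 300e-15   # hold capacitor, F

sr = slew_rate(I_SW, C_H)
print(f"SR = {sr * 1e-9:.2f} V/ns")

# Assumed context: time needed to slew a 0.5-V step versus a
# 250-ps track window (50% duty cycle at 2 GHz).
step_v = 0.5
track_window_s = 250e-12
t_slew = step_v / sr
print(f"Time to slew {step_v} V: {t_slew * 1e12:.0f} ps "
      f"(track window {track_window_s * 1e12:.0f} ps)")
```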

Due to the switching operation of the T/H, the mixing of the input and clock signals causes distortion and noise. To reduce the noise produced by the current switching, a larger $I_{sw}$ is favorable. Therefore, a large $I_{sw}$ not only increases the slew rate but also benefits the signal-to-noise-and-distortion ratio (SNDR). Besides, in a differential architecture, the even-order harmonics are cancelled. Thus, only the odd-order harmonics count in the total harmonic distortion (THD). Ideally, the input signal is tracked linearly during the track mode. However, in addition to the inherent nonlinearities introduced by the input buffer, the SEF itself contributes distortion since the biasing current, $I_{sw}$, modulates the base-emitter voltage by charging and discharging the hold capacitor in the track mode. By adding this effect to the voltage transfer function, we obtain:

$$v_o = v_{in} - V_T \ln\left(\frac{I_{sw}}{I_S}\right), \tag{3.12}$$

where $I_S$ is a device constant describing the transfer characteristic of the transistor in the forward-active region. By substituting (3.11) into (3.12), we obtain the harmonic distortion in terms of $v_{in}$ due to $I_{sw}$. The THD (in dB) can be estimated by [3.7] [3.10]:

$$THD = 40 \log\left(\frac{f_{in} C_H}{I_{sw}}\right) + 20 \log A_{in} - 18, \tag{3.13}$$

where $f_{in}$ is the input signal frequency and $A_{in}$ is the input signal amplitude. Therefore, we can estimate the THD for different input signals through (3.13). The switch current can further be related to the input signal, $C_H$ and the timing requirements [3.7]:

$$I_{sw} = 2\pi f_{in} C_H \frac{t_{hold}}{t_{sett}}. \tag{3.14}$$

In (3.14), $t_{sett}$ represents the settling time of the held signal in the hold mode. Two aspects affect the settling: the RC time constant of the SEF and the aperture distortion [3.10]. The nonzero aperture time is the time interval of the transition from the track mode to the hold mode. Suppose the input full-scale range is 1 V$_{p-p}$ and the highest allowed input frequency is 1 GHz at 2 GSPS. Then $I_{sw}$ is determined by $C_H$ and $t_{sett}$. By substituting (3.14) into (3.13), the THD is directly related to $t_{sett}$. Therefore, this first-order approximation gives us a design guideline on the distortion performance: the settling time of the switched capacitor dominates the THD. The THD is -58 dB for our design at 2 GSPS.
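
The -58 dB figure can be cross-checked against (3.13). The sketch below assumes SI units, base-10 logarithms and an amplitude of $A_{in} = 0.25$ V; the amplitude is an assumption chosen because it reproduces the quoted value, and the function name is illustrative.

```python
import math


def thd_estimate_db(f_in: float, c_h: float, i_sw: float, a_in: float) -> float:
    """First-order SEF THD estimate of (3.13), in dB.

    f_in in Hz, c_h in F, i_sw in A, a_in in V (assumed units).
    """
    return (40.0 * math.log10(f_in * c_h / i_sw)
            + 20.0 * math.log10(a_in)
            - 18.0)


C_H = 300e-15   # hold capacitor, F
I_SW = 1.5e-3   # switch current, A
A_IN = 0.25     # assumed input amplitude, V

thd_nyquist = thd_estimate_db(1e9, C_H, I_SW, A_IN)
print(f"Estimated THD at f_in = 1 GHz: {thd_nyquist:.1f} dB")

# The estimate improves by 40 dB per decade as f_in is lowered.
for f_in in (100e6, 500e6, 1e9):
    print(f"f_in = {f_in / 1e6:6.0f} MHz -> {thd_estimate_db(f_in, C_H, I_SW, A_IN):6.1f} dB")
```

Because (3.13) is a first-order estimate of the SEF contribution only, it is not expected to match simulated values at low input frequencies, where other distortion sources dominate.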

Fig. 3.13 illustrates the simplified small-signal model of Fig. 3.11. Since the hold capacitor is typically much larger than the input parasitic capacitance, $C_\pi$, the dominant pole is determined by $C_H$. The equivalent resistance seen by $C_H$ is approximately $\frac{1}{g_{m1}}$; hence, the dominant pole appears at:

$$f_1 = \frac{1}{2\pi \frac{C_H}{g_{m1}}} = \frac{g_{m1}}{2\pi C_H} = 31.83 \text{ GHz}. \tag{3.15}$$

The time constant, $\tau_1$, is equal to:

$$\tau_1 = \frac{C_H}{g_{m1}}. \tag{3.16}$$

Thus, in our design, $\tau_1$ is equal to 5.2 ps. For 1% settling ($v_o = 99\%$ of $v_{in}$), the step-function approximation [3.11] gives a settling time close to $4.6\tau_1$, which is 23.92 ps. This shows that the chosen device sizes and biasing conditions are suitable for our application.
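
The factor of 4.6 is the single-pole step-response result $t = \tau \ln(1/\epsilon)$ evaluated at $\epsilon = 1\%$. A minimal sketch, assuming a purely first-order response (the function name is illustrative):

```python
import math


def settling_time(tau: float, error: float) -> float:
    """Time for a single-pole step response to settle within `error`.

    Solves exp(-t / tau) = error, i.e. t = tau * ln(1 / error).
    """
    return tau * math.log(1.0 / error)


TAU_1 = 5.2e-12  # time constant quoted in the text, s

t_1pct = settling_time(TAU_1, 0.01)
print(f"ln(100) = {math.log(100.0):.3f}")
print(f"1% settling time = {t_1pct * 1e12:.2f} ps")
```

The small difference from the 23.92 ps in the text comes from rounding $\ln 100 \approx 4.605$ to 4.6.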

![Small-signal model of the SEF in Fig. 3.11.](image)

We verified the THDs for different input frequencies (differential sine waves with a 250-mV amplitude) at various sampling rates in the simulations. Table 3.1 lists the simulation results. $f_s$ represents the sampling frequency. As expected, higher sampling rate or higher input frequency results in worse THD. From Table 3.1, the
T/H can provide more than 9-bit resolution at low sampling rates or low input frequencies while around 8-bit resolution at 2-GSPS Nyquist rate.

**TABLE 3.1**

Comparison of the THD (in dB)

<table>
<thead>
<tr>
<th>$f_s$ (rows) / $f_{in}$ (columns)</th>
<th>21.37 MHz</th>
<th>99.71 MHz</th>
<th>209.57 MHz</th>
<th>473.1 MHz</th>
<th>872.3 MHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>100 MHz</td>
<td>-61.43</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>200 MHz</td>
<td>-59.29</td>
<td>-53.31</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>500 MHz</td>
<td>-58.44</td>
<td>-55.53</td>
<td>-50.06</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>1 GHz</td>
<td>-61.91</td>
<td>-58.73</td>
<td>-54.42</td>
<td>-49.34</td>
<td>-</td>
</tr>
<tr>
<td>2 GHz</td>
<td>-59.07</td>
<td>-57.33</td>
<td>-54.23</td>
<td>-50.40</td>
<td>-48.37</td>
</tr>
</tbody>
</table>

**Hold mode** In hold mode, as shown in Fig. 3.11, the “Track” clock is low and the “Hold” clock is high. Thus, $Q_2$ is turned off and $Q_3$ is turned on. Consequently, the SEF starts to draw $I_{sw}$ through the input buffer. As a result, the extra voltage drop $R_C I_{sw}$ decreases the voltage level at node X. The reduction of the base voltage of $Q_1$ changes the operating region of $Q_1$: since the emitter voltage of $Q_1$ appears across $C_H$, the lowered base voltage makes $Q_1$ leave the forward-active region and enter the cutoff region ($V_{BE} < V_{BE,on}$ and $V_{BC} < V_{BC,on}$). Fig. 3.14 depicts the variation of the $Q_1$ base voltage, $V_{B(Q1)}$, as a function of time. The analysis of the charge fraction appearing across $C_H$ is given below.

Fig. 3.14 $V_{B(Q1)}$ as a function of time.

Fig. 3.15 Bottom-plate sampling technique.

**Charge Injection/Dump** Although differential sampling has good rejection of coupling noise and supply noise and a small common-mode to differential-mode gain, the charge dump or charge injection from the turned-off switch still exists. In CMOS T/H design, the “bottom-plate sampling” technique [3.12] can be used to eliminate charge injection. Fig. 3.15 shows an example of bottom-plate sampling. Once $M_2$ turns off during the hold mode, the bottom plate of $C_H$ is “floating”, so the stored charge cannot change. Thus, most of the charge from $M_1$ goes to the input and no charge goes to $C_H$. More precisely, since the MIM capacitor has a small parasitic capacitance, a small amount of charge still goes to \( C_H \). Nevertheless, the resulting change should be fairly small.

Bottom-plate sampling is not suitable for the proposed work, which employs HBTs throughout the signal paths. In this work, the switching action is caused by the emitter follower moving from one operating region to another. Ideally, \( Q_2 \) in Fig. 3.11 should be an open device during the hold mode. In reality, \( Q_2 \) looks like a small capacitor whose capacitance varies with the base-emitter junction voltage [3.13]. Part of the charge gathers on the junction capacitor and the rest goes to \( C_H \). Thus, this signal-dependent, nonzero junction capacitance causes non-constant charge injection during the hold mode. If the junction capacitance in cutoff is \( C_{bc(cutoff)} \), the fraction of the held signal affected by the track-to-hold transition is proportional to:

\[
\frac{C_{bc(cutoff)}}{C_H + C_{bc(cutoff)}}. \tag{3.17}
\]

We used the smallest device size and thus reduced the variation factor. We can also find a tradeoff between the speed (the capacitance of \( C_H \)) and the charge injection.
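
To give a feel for the tradeoff in (3.17), the sketch below evaluates the capacitive-divider ratio for a few junction capacitances. The junction capacitance values are hypothetical placeholders, not extracted device data; only the 300-fF hold capacitor comes from this design.

```python
def injection_fraction(c_junction: float, c_h: float) -> float:
    """Capacitive-divider ratio of (3.17): C_j / (C_H + C_j)."""
    return c_junction / (c_h + c_junction)


C_H = 300e-15  # hold capacitor used in this design, F

# Hypothetical cutoff junction capacitances, for illustration only.
for c_j in (2e-15, 5e-15, 10e-15):
    frac = injection_fraction(c_j, C_H)
    print(f"C_j = {c_j * 1e15:4.0f} fF -> ratio = {frac * 100:.2f} %")
```

A smaller device (smaller junction capacitance) or a larger $C_H$ lowers the ratio, at the cost of drive strength or tracking speed, respectively.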

**Droop rate** The droop rate is defined as the voltage drop of the output signal per unit time during the hold mode. It should be minimized as much as possible. In our design, the droop stems from the nonzero base current of the output buffer. It can be described quantitatively as follows. Suppose the voltage held by \( C_H \) is \( V_{\text{hold}} \); the droop rate is then:

\[
\frac{dV_{\text{hold}}}{dt} = \frac{I_{Bo}}{C_H}, \tag{3.18}
\]

where \( I_{Bo} \) is the base current of the output buffer during the hold period. Both (3.18) and (3.11) are inversely proportional to \( C_H \), but the two requirements conflict: we need a smaller \( I_{Bo} \) and/or a larger \( C_H \) for better holding capability, whereas a large \( C_H \) reduces the slew rate during the track mode. As mentioned earlier, \( C_H \) is equal to 300 fF and \( I_{sw} \) is equal to 1.5 mA, thus the SR is equal to 5 V/ns. With this \( C_H \), we can determine the droop rate once \( I_{Bo} \) is known. However, since \( I_{Bo} \) is signal-dependent, the droop rate is also signal-dependent. In other words, a smaller output current in the output buffer results in a smaller \( I_{Bo} \).
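
A rough feel for (3.18) can be obtained with placeholder numbers. In the sketch below, the base current of 2 µA and the 250-ps hold window (half of a 2-GHz clock period) are hypothetical values chosen for illustration; only $C_H$ = 300 fF and the 8-bit, 1-V full-scale range come from this work.

```python
def droop_rate(i_b: float, c_h: float) -> float:
    """Droop rate dV_hold/dt = I_Bo / C_H from (3.18), in V/s."""
    return i_b / c_h


def droop_volts(i_b: float, c_h: float, t_hold: float) -> float:
    """Voltage lost over a hold window, assuming constant I_Bo."""
    return droop_rate(i_b, c_h) * t_hold


C_H = 300e-15          # hold capacitor, F
I_BO = 2e-6            # hypothetical base current, A
T_HOLD = 250e-12       # assumed hold window, s
LSB_8BIT = 1.0 / 2**8  # 8-bit LSB for a 1-V full-scale range, V

dv = droop_volts(I_BO, C_H, T_HOLD)
print(f"Droop rate = {droop_rate(I_BO, C_H) * 1e-6:.2f} V/us")
print(f"Droop over hold window = {dv * 1e3:.2f} mV ({dv / LSB_8BIT:.2f} LSB)")
```

Doubling $C_H$ halves the droop under these assumptions, which is the opposite of its effect on the slew rate in (3.11).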

Table 3.2 shows the simulated results. The worst case (\( f_{in} = 872.3 \) MHz and \( f_s = 2 \) GHz) shows less than one LSB deviation in the hold mode.

**TABLE 3.2**

<table>
<thead>
<tr>
<th>$f_s$ (rows) / $f_{in}$ (columns)</th>
<th>21.37 MHz</th>
<th>99.71 MHz</th>
<th>209.57 MHz</th>
<th>473.1 MHz</th>
<th>872.3 MHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>100 MHz</td>
<td>0.61</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>200 MHz</td>
<td>0.62</td>
<td>0.81</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>500 MHz</td>
<td>0.40</td>
<td>1.34</td>
<td>2.31</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>1 GHz</td>
<td>0.35</td>
<td>1.85</td>
<td>3.07</td>
<td>5.07</td>
<td>-</td>
</tr>
<tr>
<td>2 GHz</td>
<td>0.87</td>
<td>2.03</td>
<td>4.53</td>
<td>7.83</td>
<td>11.70</td>
</tr>
</tbody>
</table>

**Hold mode feedthrough** Another cause of non-ideality during the hold mode is signal *feedthrough*. The feedthrough is caused by the base-emitter junction capacitor, which couples the signal at X to the hold capacitor. We used minimum-size HBTs for the followers to reduce the coupling. In addition, by adding the feedforward capacitors shown in Fig. 3.16, we can cancel the feedthrough.

![Fig. 3.16 Feedforward capacitors can reduce hold mode feedthrough.](image)

We used metal-insulator-metal (MIM) capacitors as the hold capacitors because of their higher process accuracy and more options on the capacitance selection. Furthermore, the bottom plate of the MIM capacitor is connected to the sampling node, Y, such that we can use the parasitic capacitance of the bottom plate as part of the hold capacitor.

**Settling Time** The hold capacitor and switching current dominate the settling time since these two parameters determine the slew rate of the whole T/H. For the desired specifications, a settling time of less than \( \frac{1}{10} \) period was required. Based on the requirements of the settling time, droop rate and charge injection, a 300-fF hold capacitor can achieve the 2-GHz sampling with acceptable distortion. The dynamic
current of the switch, 1.5 mA, also enhances the sampling speed. There is no static current such that the total static power consumption is not affected by the SEFs.

**Output Buffer** The output buffer is made of a differential pair with emitter degeneration. The main functions of this buffering stage are to drive the comparators and to provide proper output common mode. The output common mode was designed to be exactly the same as the input common mode of the comparators. The current of this stage is adequate to drive the comparators. The use of this gain stage can also make up for the insufficient gain from the two previous stages. The total power consumption of the T/H is around 13 mW.

### 3.3 Comparator Design

#### 3.3.1 Introduction

While the T/H circuit captures the analog input signal in a data acquisition system, the comparator is the key component in producing the digital output. The function of a comparator is to compare the analog input signal to a reference voltage level. Fig. 3.17 illustrates a conventional flash ADC. For n-bit resolution, the sampled analog signal is compared to \(2^n - 1\) quantized voltage levels. As indicated in Fig. 3.17, the full-scale range is divided into \(2^n\) segments by \(2^n - 1\) quantized reference levels. Consequently, \(2^n - 1\) comparators are needed in an n-bit conversion. Each comparator compares the input against its own reference voltage and outputs a logical “1” for an input greater than the reference voltage and a logical “0” otherwise. These comparisons result in a \((2^n - 1)\)-bit thermometer code.
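
The flash quantization just described can be illustrated with an idealized behavioral model. This is only a sketch: the uniform reference placement, the ideal offset-free comparators and the ones-counting encoder are simplifying assumptions, and the function names are illustrative.

```python
from typing import List


def flash_thermometer(v_in: float, v_lo: float, v_hi: float, n_bits: int) -> List[int]:
    """Ideal flash quantizer: compare v_in against 2^n - 1 references.

    References are assumed to sit at v_lo + k * LSB for k = 1 .. 2^n - 1.
    Returns the thermometer code, lowest reference first.
    """
    levels = 2 ** n_bits
    lsb = (v_hi - v_lo) / levels
    refs = [v_lo + k * lsb for k in range(1, levels)]
    return [1 if v_in > ref else 0 for ref in refs]


def thermometer_to_binary(code: List[int]) -> int:
    """Simplest encoder: the output code is the number of ones."""
    return sum(code)


code = flash_thermometer(0.40, 0.0, 1.0, n_bits=3)
print("Thermometer code:", code)
print("Binary output   :", format(thermometer_to_binary(code), "03b"))
print("Comparators for 5 bits:", len(flash_thermometer(0.0, 0.0, 1.0, 5)))
```

A practical encoder would also have to handle bubble errors in the thermometer code, which this sketch ignores.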

A typical comparator design is composed of a preamplifier and a latch. The preamplifier can reduce the effect of input offset and thus reduces the probability of metastability errors [3.14]. The input offset can cause invalid comparison leading to incorrect digital codes at the output. Since the input offset can be rejected by the preamplifier, this factor does not have a serious negative impact on latch operation. The latch is normally made up of a positive feedback loop to lock the compared result. Thus, a shorter response time is more appropriate for high-speed applications.

Fig. 3.17 Quantization method of an analog input signal.

Although input offsets are small in most cases, in a flash-based ADC a small difference between the input and the reference will always exist since the input is a continuous waveform. Several techniques known as autozeroing or input offset cancellation [3.15] [3.16] [3.17] [3.18] were developed to reduce the impact of input offsets for generic amplifiers and comparators. Nevertheless, in our ultra-high-speed mid-resolution ADC, we chose not to use any extra circuitry, which would incur an additional time penalty, but simply to use bipolar devices and the interpolation-averaging technique, as discussed in Chapter 5. The proposed work, a combination of a 3-bit ADC and a 5-bit ADC, has a large LSB for each sub-ADC: 125 mV and 31.25 mV, respectively. Thus, the comparators are less susceptible to input offset.
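
The two LSB sizes follow from $\text{LSB} = V_{FS}/2^n$. The sketch below assumes that each sub-ADC sees a 1-V full-scale range, which is the assumption that reproduces the quoted 125 mV (3-bit) and 31.25 mV (5-bit) values.

```python
def lsb_volts(full_scale: float, n_bits: int) -> float:
    """LSB size of an n-bit converter with the given full-scale range."""
    return full_scale / 2 ** n_bits


V_FS = 1.0  # assumed full-scale range seen by each sub-ADC, V

for n in (3, 5, 8):
    print(f"{n}-bit LSB = {lsb_volts(V_FS, n) * 1e3:.3f} mV")
```

For comparison, a single 8-bit flash stage with the same range would have to resolve about 3.9 mV, which is far more demanding on comparator offset.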

Since the preamplifier has already amplified the compared signals, the latch is negligibly affected by the input offset. The main function of the latch is to quickly lock the compared result and store it. Typically, latches with positive feedback circuitry are found in the literature because of their short response time. The positive feedback can provide ultra-high gain in a short instant, such that the charging time of the latch is decreased enormously. The analog output swing of the latch is limited to a few hundred mV (100 ~ 200 mV) to confine the response time within a desired range. In this high-frequency BJT comparator design, the swing is about 200 mV. Thus, additional circuitry is required to convert this small swing into a digital level. This comparator therefore incorporates a digital-level converter as the output buffer to obtain the digital levels, i.e. 0 V and 2.5 V. In a fully differential CMOS design, by contrast, the output common mode could be set to the middle of the rail-to-rail range for each branch such that the differential output is a rail-to-rail logical signal, and no additional buffer is needed. In the following sections, we discuss the design and simulation of the comparator in more detail.

#### 3.3.2 Proposed Structure

**Preamplifier** To exclude the nonlinearity due to even-order harmonic distortion, we employed a fully differential architecture throughout the analog circuitry in the comparator design. As shown in Fig. 3.18, the preamplifier consists of an input buffer and a fully differential amplifier as the core. Since the preamplifier is the input stage of the comparator, it was designed to be sensitive to its input signals such that it can make the right comparison. Thus, the preamplifier should provide a certain gain to distinguish the input differences. A good preamplifier also has a large input range.

**Input Buffer** The input transistors (Q₁ – Q₄), connected as followers, not only provide approximately constant input impedances but also isolate the core amplifier from the T/H. The constant input impedances become constant load impedances for the differential resistor ladder and for the T/H. However, the placement of the followers decreases the input range. Thus, we compensated one V_{BE} drop in the input range to increase the overall performance. In this work, the input and output swings of the T/H are both 1 V_{p-p}, i.e. 500 mV for each branch. Consequently, the use of the followers does not significantly affect the input dynamic range. In Fig. 3.18, assuming that each BJT has a $V_{BE}$ drop of 0.7 V and that the MOSFET tail current source has a minimum overdrive voltage of 0.2 V, the lowest input voltage is 1.6 V (two $V_{BE}$ drops plus the overdrive voltage).

![Fig. 3.18 Proposed preamplifier design.](image)

The input buffer also blocks signal feedthrough from the preamplifier core back to the T/H and to the differential resistor ladder. The bias current of the input buffer must also be large enough to drive the input capacitance of the preamplifier core.

**4-in/2-out Differential Pair as Preamplifier Core** The preamplifier core is composed of two cross-coupled differential amplifiers ($Q_5$ – $Q_{14}$). The output terminals are cross-coupled to the load resistors ($R_C$). Thus, the amplifier has four input terminals and two output terminals. The cascode transistors ($Q_9$ – $Q_{12}$) prevent the amplification transistors ($Q_5$ – $Q_8$) from exceeding the collector-emitter breakdown voltage, $BV_{CE}$, which is between 1.5 V and 1.7 V. $Q_{13}$ and $Q_{14}$ are voltage followers that buffer the outputs of the preamplifier to the latch. They also isolate the preamplifier from the latch to decrease kickback noise. A reset switch can be added across the differential outputs to erase the memory of the last output state.

**Input Offset** Generally, mismatches between devices give rise to the input offset of a differential pair. The random input offset then degrades the linearity of the converter and shows up in the DNL and INL. In Fig. 3.18, the mismatches mainly stem from the load resistors and the amplification transistors. From [3.19], the offset voltage of an emitter-coupled pair can be written as:

\[
V_{os} = V_T \ln\left[\frac{R_{c2} A_2 N_{d2} W_{B2}(V_{CB2})}{R_{c1} A_1 N_{d1} W_{B1}(V_{CB1})}\right]. \quad (3.19)
\]

Eq. (3.19) relates the offset directly to the geometries of the devices and the doping properties of the process. We can reduce the offset voltage through two means: physical layout and a large gain in the differential pairs. The former means a symmetric layout methodology that minimizes device mismatch. The latter reduces the input-referred offset. In the SiGe BiCMOS process employed, the standard cell devices are fabricated with minimal variation. The load resistors are poly resistors, which have smaller process variation than diffused resistors.

**Metastability** A large gain not only overcomes the input offset but also lowers the probability of metastability error, which is defined as the probability of an unknown-state error per unit time in flash-type converters with regenerative latches. It occurs when the sampled input differs so little from the reference voltage level that the output cannot settle to a valid state within the latch period.
Intuitively, the larger the gain of the comparator, the smaller the probability of metastability. If we consider the step response, this probability [3.14] [3.20] can be expressed as

\[ P_{err} = \frac{2(2^n - 1)V_o}{V_{IR}A_{pa}A_{la}} e^{-t_r/\tau}, \]  

(3.20)

where \( n \) is the number of bits of the converter, \( V_o \) is the output swing required for the latch to produce a valid digital state, \( V_{IR} \) is the analog input range, \( A_{pa} \) is the gain of the preamplifier, \( A_{la} \) is the gain of the latch, \( t_r \) is the duration of the latch mode and \( \tau \) is the latch-mode time constant. Eq. (3.20) clearly shows the need for a large preamplifier gain. Nevertheless, reducing \( \tau \) and/or increasing \( t_r \) is more effective in lowering \( P_{err} \), since both appear in the exponential decay term. In our design, the error probability is \( 8.53 \times 10^{-10} \) under 2-GHz operation (\( A_{pa} = 13 \) dB, \( A_{la} = 10 \) dB, \( V_o = 200 \) mV), which is relatively small because of the ultra-low \( \tau \) (e.g. 11.55 ps for the 5-bit interpolated ADC if we ignore the Miller effect). A comparator with a broader bandwidth is therefore not only faster but also much less likely to suffer a metastability error. We discuss this further in the latch design. Other ways to reduce metastability, such as time-interleaved converters and logical encoding (e.g. Gray encoding) [3.21] [3.22], are also available for current ADC designs.
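
As an illustration of how strongly the exponential term in Eq. (3.20) dominates, the following minimal sketch evaluates the expression with the gains, output swing and \( \tau \) quoted above. The latch-mode duration `t_r = 250 ps` (half of a 500 ps clock period) and the input range of 1 V are assumptions made here for the example, so the result is not expected to reproduce the quoted probability exactly. The function names are hypothetical.

```python
import math


def db_to_linear(gain_db: float) -> float:
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)


def metastability_probability(n_bits, v_o, v_ir, a_pa_db, a_la_db, t_r, tau):
    """Evaluate Eq. (3.20): 2(2^n - 1) V_o / (V_IR A_pa A_la) * exp(-t_r / tau)."""
    a_pa = db_to_linear(a_pa_db)
    a_la = db_to_linear(a_la_db)
    prefactor = 2.0 * (2 ** n_bits - 1) * v_o / (v_ir * a_pa * a_la)
    return prefactor * math.exp(-t_r / tau)


# Gains, swing and tau are the values quoted in the text.
# ASSUMPTION: t_r = 250 ps (half of a 500 ps clock period), V_IR = 1 V.
p_err = metastability_probability(5, 0.2, 1.0, 13.0, 10.0, t_r=250e-12, tau=11.55e-12)

# Doubling tau (a slower latch) with everything else unchanged.
p_err_slow = metastability_probability(5, 0.2, 1.0, 13.0, 10.0, t_r=250e-12, tau=23.1e-12)

print(f"P_err            = {p_err:.2e}")
print(f"P_err (2 x tau)  = {p_err_slow:.2e}")
```

Under these assumptions, doubling \( \tau \) raises the error probability by several orders of magnitude, whereas doubling the preamplifier gain would only halve it.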

**Gain** Although a large gain can improve the accuracy of the ADC, it also comes with a large time constant, so more time is needed to charge the input capacitance of the latch. These two requirements conflict in a high-speed ADC. Typically, a preamplifier with a gain of 5 dB to 15 dB is adequate for a conversion rate greater than 200 MHz. The gain of our preamplifier is 13 dB.

**Bandwidth**  For speed, minimum-size HBTs were used. Each HBT has an input capacitance close to $6 \sim 7$ fF. If we ignore the base-collector capacitance and take $7$ fF as the total input capacitance, a single preamplifier can operate at an ultra-high frequency. However, taking the whole 5-bit sub-ADC into account, the 3-dB bandwidth is 4.89 GHz for a flash structure with the output resistance of the T/H equal to 300 Ω. Thanks to the interpolating technique, the bandwidth increases to $13.78$ GHz, because only 11 of the 31 preamplifiers are used. Note that the actual bandwidth is smaller than this estimate, since all the parasitic capacitances and the Miller effect must be included.
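
The two bandwidth figures can be cross-checked with a single-pole RC estimate. The sketch below assumes an effective loading of 3.5 fF per preamplifier; this value is not stated in the text and is inferred here only because it reproduces both quoted numbers with the 300 Ω T/H output resistance.

```python
import math


def rc_bandwidth_hz(r_ohm: float, c_farad: float) -> float:
    """3-dB bandwidth of a single-pole RC network: 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


R_TH = 300.0              # T/H output resistance, from the text
C_PER_PREAMP = 3.5e-15    # ASSUMPTION: effective loading per preamplifier

bw_flash = rc_bandwidth_hz(R_TH, 31 * C_PER_PREAMP)    # full flash: 31 preamplifiers
bw_interp = rc_bandwidth_hz(R_TH, 11 * C_PER_PREAMP)   # interpolated: 11 preamplifiers

print(f"flash:         {bw_flash / 1e9:.2f} GHz")
print(f"interpolating: {bw_interp / 1e9:.2f} GHz")
```

Because the load capacitance scales with the number of preamplifiers, the bandwidth ratio in this model is simply 31/11.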

**Latch** Since the latch secures the comparison result, a positive feedback structure is commonly employed so that the output can be latched much more quickly. Fig. 3.19 (a) depicts a latch incorporating two inverting amplifiers to form the positive feedback [3.23]. Suppose the inverting amplifiers have a gain of $-A_{la}$ and a time constant of $\tau$. The relations between the input and output can be found through:

\begin{align}
-A_{la} V_o &= V_{in} + \tau \frac{dV_{in}}{dt}, \quad (3.21a) \\
-A_{la} V_{in} &= V_o + \tau \frac{dV_o}{dt}. \quad (3.21b)
\end{align}

By subtracting (3.21b) from (3.21a), we can obtain:

\[ \frac{d(V_{in} - V_o)}{dt} = \frac{A_{la} - 1}{\tau} (V_{in} - V_o), \quad (3.22) \]
Eq. (3.22) indicates an exponential term in \((V_{in} - V_o)\), that is:

\[
V_{in} - V_o = V_{io0} e^{\frac{(A_{la} - 1)t}{\tau}},
\]

(3.23)

where \(V_{io0}\) represents the difference between \(V_{in}\) and \(V_o\) at \(t = 0\). From (3.23), we can find the time, \(T_l\), required for the latch to develop a change \(\Delta V_{io}\) in \(V_{io}\) (\(V_{io} = V_{in} - V_o\)):

\[
T_l = \frac{\tau}{A_{la} - 1} \ln \frac{V_{io0} + \Delta V_{io}}{V_{io0}}.
\]

(3.24)

Therefore, if \(V_{io0}\) is very small, \(T_l\) is very large. Fig. 3.19 (b) illustrates this phenomenon. For three different initial voltage differences \(V_{io1}, V_{io2}\) and \(V_{io3}\), the response time for latching varies. As shown in Fig. 3.19 (b), we can see \(V_{io1} > V_{io2} > V_{io3}\). According to (3.24), the latch takes the longest time to latch the case of \(V_{io3}\), while it takes the shortest time for the case of \(V_{io1}\). As a result, \(T_3 > T_2 > T_1\).
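
The dependence of the latching time on the initial difference can be illustrated by evaluating Eq. (3.24) directly. In this sketch the time constant (10 ps), the latch gain (10 dB), the required change (200 mV) and the three initial differences are all illustrative assumptions, not simulation results from this design.

```python
import math


def latch_time(tau: float, a_la: float, v_io0: float, delta_v: float) -> float:
    """Eq. (3.24): time for the latch to develop a change delta_v from v_io0."""
    if a_la <= 1.0:
        raise ValueError("regeneration requires a loop gain greater than 1")
    if v_io0 <= 0.0:
        raise ValueError("the initial difference must be positive")
    return tau / (a_la - 1.0) * math.log((v_io0 + delta_v) / v_io0)


TAU = 10e-12                  # ASSUMPTION: amplifier time constant
A_LA = 10.0 ** (10.0 / 20.0)  # ASSUMPTION: 10 dB latch gain as a linear ratio
DELTA_V = 0.2                 # ASSUMPTION: required change, 200 mV

# Hypothetical initial differences, largest first (V_io1 > V_io2 > V_io3).
initial_differences = [10e-3, 1e-3, 0.1e-3]
latch_times = [latch_time(TAU, A_LA, v, DELTA_V) for v in initial_differences]

for v, t in zip(initial_differences, latch_times):
    print(f"V_io0 = {v * 1e3:6.2f} mV -> T_l = {t * 1e12:6.2f} ps")
```

The latching time grows only logarithmically as the initial difference shrinks, which is why an arbitrarily small \(V_{io0}\) is needed before the latch fails to decide within the latch period.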

We can also define the **metastability error of the latch** as the failure of the latch to reach a decision within the latch period because \(V_{io0}\) is too small. The probability of metastability error becomes [3.24][3.25]:

\[
P(T_l > \frac{T}{2}) = e^{\frac{(1-A_{la})T}{2\tau}},
\]

(3.25)

where \(T\) is the clock period. We have assumed in (3.25) that the latch period is equal to one half of the clock period. Eq. (3.25) suggests that we can reduce the metastability error by increasing \(A_{la}\) and/or reducing \(\tau\).
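
A short sketch of Eq. (3.25) shows how the latch metastability probability scales with the clock period. The latch gain of 10 dB and the time constant of 10 ps used here are assumptions chosen for illustration only.

```python
import math


def latch_metastability_probability(a_la: float, clock_period: float, tau: float) -> float:
    """Eq. (3.25): P(T_l > T/2) = exp((1 - A_la) * T / (2 * tau))."""
    return math.exp((1.0 - a_la) * clock_period / (2.0 * tau))


A_LA = 10.0 ** (10.0 / 20.0)  # ASSUMPTION: 10 dB latch gain as a linear ratio
TAU = 10e-12                  # ASSUMPTION: latch time constant

p_2ghz = latch_metastability_probability(A_LA, 1.0 / 2e9, TAU)      # T = 500 ps
p_500mhz = latch_metastability_probability(A_LA, 1.0 / 500e6, TAU)  # T = 2 ns

print(f"2 GHz:   {p_2ghz:.2e}")
print(f"500 MHz: {p_500mhz:.2e}")
```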

**ECL** Our latch employs an “emitter-coupled logic” (ECL) architecture [3.26] [3.27] because of its high current-drive capability; ECL is used more extensively than any other architecture in high-speed applications. Fig. 3.20 shows the implementation. The circuit contains three emitter-coupled pairs formed by Q₁ – Q₂, Q₃ – Q₄ and Q₅ – Q₆. Q₁ – Q₂ form the input amplifier. Q₃, Q₄, Q₇ and Q₈ form a positive-feedback latching circuit. Q₅ and Q₆ switch the tail current between the two pairs, Q₁/Q₂ and Q₃/Q₄. When CK is low (i.e. $\overline{CK}$ is high), the latch is in the amplification mode, and Q₁ and Q₂ amplify the input signal with a low-frequency gain of $-g_mR_c$. When CK is high, the tail current flows through the Q₃/Q₄ pair and the latch is in the latch mode. The gain in the latch mode is greater than that in the amplification mode since positive feedback increases the gain enormously.

Because the ECL operates in current mode with a voltage swing of ~ 200 mV at the output terminals, the response time is quite small compared to that of a voltage-mode counterpart. To estimate the RC time constant during the latch mode, we look at Q₃ and find its loading. Since $R_C$ is much smaller than any inherent resistance of the HBT (in parallel), the equivalent resistance is close to $R_C$. The capacitance comes from Q₇ and Q₄. To obtain the equivalent capacitance, we draw an approximate loading schematic for Q₃, as shown in Fig. 3.21. We find that the total capacitance is $C_{\pi 7}$ in series with $C_{\pi 4}$. Therefore, the total capacitive loading is reduced by this positive feedback. In this work, the smallest HBT, which has an input capacitance of ~ 7 fF, was used. As a result, the capacitive loading becomes 3.5 fF if we ignore the Miller effect, and the time constant is around 2 ps. If we include the Miller effect, the value becomes 10 ps, which is still quite small for a sampling rate from 500 MHz to 2 GHz.

**Digital-level Converter** Since the output swing of the latch was restricted to ~200 mV, an extra stage converting this analog swing into rail-to-rail digital logic is necessary. We employed a complementary self-biased differential amplifier (CSDA) [3.28] as the swing converter, as shown in Fig. 3.22. The CSDA is made of $M_1$ – $M_6$. This circuit biases itself through a negative feedback loop, which further stabilizes the bias condition without slew rate issues. Each complementary MOSFET works as the load for its counterpart. The inverter ($M_7$ and $M_8$) following the CSDA boosts the digital signal with additional gain and isolates the CSDA from the back-end encoder. Therefore, the complete comparator is composed of the first stage shown in Fig. 3.18, the second stage shown in Fig. 3.20 and the third stage shown in Fig. 3.22. The total static power consumption is 6 mW.

3.4 Summary

This chapter discussed the topologies selected for our T/H and comparator. We also stressed the detailed design considerations for high-speed applications. The T/H showed good linearity over a wide frequency range, as we mentioned in Section 3.2. Noise caused by hold-mode feedthrough and the droop rate was also taken into account in meeting the target operating speed (2 GHz). This T/H has an unloaded bandwidth of 31.83 GHz and dissipates relatively low power, 13 mW. The slew rate is 5 V/ns. According to the THD simulation, it provides a dynamic performance of 9 bits at low input frequency. Furthermore, a near 8-bit dynamic performance can be achieved at 2-GSPS.

The comparator has an input full scale range of 1 V. Because of the low supply voltage of 2.5 V, we employed a fully differential architecture throughout the analog part of the comparator. The differential output of the analog part was then converted to a single-ended rail-to-rail digital signal. In our 8-bit (3-bit plus 5-bit) two-step ADC, the comparator can distinguish two signals that differ by less than the LSB of each step (i.e. 125 mV for the first step and 31.25 mV for the second step). The propagation delay, including the settling time and latch time period, is less than 500 ps, such that we can operate the ADC at the desired sampling rate, 2-GSPS for the 5-bit ADC (500-MSPS for the 8-bit ADC). The total static power consumed is 6 mW. The comparator provides a total gain of 23 dB, which leads to a metastability probability of $8.53 \times 10^{-10}$ at 2-GSPS. We achieved a bandwidth of 13.78 GHz for our 5-bit interpolating ADC.
Chapter 4: DAC, Subtractor, Residue Amplifier, Delay Element and Encoder

4.1 Introduction

In addition to the T/H and comparator, the subtractor, DAC and residue amplifier are also critical components in a two-step ADC. The first-step ADC, or coarse ADC, quantizes the analog signal at low resolution. The first-step result is converted back to analog form and subtracted from the original analog input signal. The resulting residue signal is amplified to the full scale range, applied to the second step and converted to digital codes. Thus, the total resolution becomes the sum of the resolutions provided by the two stages. A digital-to-analog converter (DAC) is therefore a necessary part of the circuit. An analog subtractor subtracts the DAC output from the original input signal. After that, the residue amplifier amplifies the subtracted signal and feeds it to the second-step ADC. Since the second-step ADC performs the high-resolution (lower-order bits) A/D conversion, it is also called the fine ADC.

Accurate subtractions and sub-conversions are major hurdles to overcome in high-performance two-step ADCs. These are the main components dealt with in this chapter. We will introduce some DAC architectures and then focus on our current-steering D/A cell in Section 4.2. In Section 4.3, a novel combinational DAC circuit and subtractor will be presented in detail.

The residue amplifier is another key component in the two-step ADC. Although the topology of the amplifier is very simple, it achieves the desired noise performance, linearity and bandwidth in our application. The discussion and simulation of the residue amplifier are addressed in Section 4.4.

While the DAC, subtractor and residue amplifier work on generating the second step input signal, the first step ADC continues producing the digital output codes. Consequently, the digital outputs of the two steps are not synchronized. To solve this problem, we need both an analog delay element and a digital delay element. The analog delay amplifier can hold up the original analog input signal to wait for the D/A-subtraction. We used two T/Hs to realize the analog delay buffer amplifier. The digital delay elements are placed at each binary output terminal to synchronize the output binary codes of the two step ADCs. The generally known “dynamic shift register” was employed for the digital delay element. Both delay elements will be discussed in Section 4.5.

Section 4.6 presents the design of the thermometer-code to binary-code encoder. Since we have a two-step conversion, two encoders are required in the ADC. Both encoders utilize “read-only-memory” (ROM) along with a bubble-code correction technique to correct possible errors caused by comparator metastability. Finally, our conclusions concerning the designs of these circuit elements are given in Section 4.7.
4.2 DAC

4.2.1 Introduction

In contrast to an ADC, a DAC converts digital codes into analog signals. The block diagram in Fig. 4.1 (a) depicts the functionality of a DAC. The inputs to the DAC are digital codes \((b_0, b_1, b_2, \ldots, b_{N-1})\), typically in binary form. The transfer characteristic of the DAC is shown in Fig. 4.1 (b). This figure depicts a DAC converting a 3-bit digital input into an analog output. \(A_o\) is the analog output, \(D_{in}\) represents the digital input code (000, 001, 010, 011, 100, 101, 110 and 111) and \(A_{FS}\) is the full scale range of the analog output signal. We can obtain the output signal through the transfer function:

\[ A_o = \Delta \times D_{in}, \] (4.1)

where \(\Delta\) represents the step size or the magnitude of one LSB of the converter and is equal to \(\frac{A_{FS}}{2^N}\), \(N\) is the bit number. \(D_{in}\) can be further written in the following form:

\[ D_{in} = b_{N-1}2^{N-1} + \ldots + b_12^1 + b_02^0. \] (4.2)

By substituting (4.2) into (4.1), we get:

\[ A_o = \Delta \times (b_{N-1}2^{N-1} + \ldots + b_12^1 + b_02^0) \] (4.3)

or

\[ A_o = A_{FS} \times \sum_{i=0}^{N-1} b_i2^{i-N}. \] (4.4)

From (4.4), we can easily relate the output analog signal to the input digital codes.
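
The transfer function in (4.1) and (4.2) can be written as a few lines of code. This is a minimal sketch of an ideal DAC only; the function name and the MSB-first bit ordering of the argument are choices made for this example.

```python
def dac_output(bits_msb_first, a_fs):
    """Ideal DAC output A_o = delta * D_in, with delta = A_FS / 2^N (Eqs. 4.1, 4.2)."""
    n = len(bits_msb_first)
    d_in = 0
    for bit in bits_msb_first:
        if bit not in (0, 1):
            raise ValueError("each bit must be 0 or 1")
        d_in = 2 * d_in + bit      # accumulate b_{N-1} 2^{N-1} + ... + b_0 2^0
    delta = a_fs / 2 ** n          # magnitude of one LSB
    return delta * d_in


# 3-bit example with a 1 V full scale range: code 101 is 5 LSBs of 125 mV each.
a_o = dac_output([1, 0, 1], 1.0)
print(a_o)
```

Note that the largest code, 111, gives \(A_{FS} - \Delta\) rather than \(A_{FS}\), as in the transfer characteristic of Fig. 4.1 (b).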
4.2.2 Architectures

There are several architectures available in designing a DAC. Each of them has its own advantages and disadvantages. Depending on application, designers can choose a proper architecture to fulfill the requirements. Some commonly used architectures, such as voltage-scaling DAC (e.g. resistor-string DAC and resistor-ladder DAC), current-steering DAC and charge-redistribution DAC, will be briefly introduced below.
Fig. 4.2 A simple resistor-string DAC.

**Voltage-Scaling DAC**  Fig. 4.2 shows a resistor-string DAC, generally considered the simplest DAC. The digital input could be in any digital format. The DAC employs a thermometer code decoder to produce the thermometer codes. Each thermometer code controls a switch which will turn on or off to indicate a specific voltage to the output. The output will eventually add up all the voltages to complete the D/A conversion.

This is a voltage-type DAC with no resistive load at the output. Hence, there is no output current. The advantages of this DAC are good accuracy, inherent monotonicity and low glitch energy. However, a large parasitic capacitance always loads the output and results in a slower conversion rate. In addition, a decoder is needed. An alternative architecture, shown in Fig. 4.3, greatly reduces the total number of resistors and switches by means of a binary switch array.
Moreover, no decoder is needed: the binary codes directly control the switches. Nonetheless, the switch network still limits the conversion speed. The resistor array directly affects the linearity performance. For instance, the DNL depends on the local matching of the neighboring resistors and the INL depends on the global matching of the resistor string.

![Diagram of a 3-bit DAC using binary switch array.](image)

**Fig. 4.3** A 3-bit DAC using binary switch array.

Given the matching properties of modern VLSI processes, the resistor-string architecture has a resolution limitation (typically 8-9 bits). Most of the device mismatches are due to random variations and gradient variations of the process.
Random variation is mainly caused by resistor mismatches [4.1], while doping, thermal and oxide-thickness gradients cause the gradient variations [4.2] [4.3]. These first-order and higher-order effects lead to a poor INL result [4.1]. The mismatch can be reduced by trimming or calibrating the resistor string.

Alternatively, one can employ special layout schemes [4.4] [4.5] [4.6] to overcome the mismatch without adding any complexity to the circuit. In [4.4], the unit resistor was divided into multiple series- and parallel-connected sub-resistors and was laid out systematically [4.5] [4.6] to minimize both the random and the gradient errors. The tradeoff between power, area and accuracy of this technique should also be taken into account.

Fig. 4.4 A 3-bit R-2R laddered DAC.

Another voltage-scaling DAC employs an R-2R resistance ladder network and an operational amplifier. Fig. 4.4 illustrates a 3-bit example. The R-2R ladder network generates an input-code-dependent output current, which can be expressed as:

\[ I_o = \frac{1}{2R} \left( \frac{V_{ref}}{8} b_0 + \frac{V_{ref}}{4} b_1 + \frac{V_{ref}}{2} b_2 \right), \quad (4.5) \]

where \( V_{ref} \) is the analog full scale range. The operational amplifier connected in the follower scheme provides the output voltage. Therefore, the output voltage becomes:

\[ A_o = -I_o \cdot 2R = -\left( \frac{V_{ref}}{8} b_0 + \frac{V_{ref}}{4} b_1 + \frac{V_{ref}}{2} b_2 \right). \quad (4.6) \]

For an n-bit R-2R laddered DAC, the output voltage can be written as:

\[ A_o = -\sum_{i=0}^{n-1} \frac{V_{ref}}{2^{n-i}} b_i. \quad (4.7) \]

Similar to the resistor-string architecture, this architecture also requires good matching within the resistive ladder to achieve the desired accuracy. Beyond these conventional prototypes, a nonlinear resistor-string DAC [4.7] and an interpolating resistor-string DAC [4.8] were developed to improve performance.
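
The n-bit R-2R expression above can be checked against the 3-bit case with a short sketch. It models only the ideal sum, with perfectly matched resistors and an ideal amplifier; the function name and the LSB-first ordering are choices made for this example.

```python
def r2r_output(bits_lsb_first, v_ref):
    """Ideal n-bit R-2R ladder DAC output: A_o = -sum_i V_ref / 2^(n - i) * b_i."""
    n = len(bits_lsb_first)
    total = 0.0
    for i, bit in enumerate(bits_lsb_first):
        if bit not in (0, 1):
            raise ValueError("each bit must be 0 or 1")
        total += v_ref / 2 ** (n - i) * bit   # b_0 weighs V_ref/2^n, b_{n-1} weighs V_ref/2
    return -total


# 3-bit example, V_ref = 1 V, b_0 = 1, b_1 = 0, b_2 = 1: -(1/8 + 1/2) V.
a_o = r2r_output([1, 0, 1], 1.0)
print(a_o)
```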

**Current-Steering DAC**  In contrast to voltage-scaling architectures, current-steering DACs (also called current-scaling DACs) can directly drive a resistive load. Thanks to their large current-drive capability, they are widely used in high-speed applications. Fig. 4.5 (a) shows a *binary-weighted current-steering DAC*. As the name implies, the current sources are binary weighted. The binary inputs can directly control the latches (or switches). Despite its speed, monotonicity is not guaranteed. Moreover, potentially large “glitches”, depicted in Fig. 4.5 (b), can occur in the output signal during code transitions. The glitches are due to timing skews, so the latches should be synchronized. Another current-steering DAC, called the *equally-weighted current-steering DAC*, is shown in Fig. 4.6. As shown in the schematic, all the current sources carry the same current. This DAC is inherently monotonic and has fewer glitches. However, a thermometer decoder is necessary to generate the thermometer control codes.
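
The difference between the two switching schemes can be illustrated by counting how many switches change state at a code transition. The sketch below is only a behavioral illustration with hypothetical helper names; it does not model the actual latch timing or the size of the glitch.

```python
def binary_bits(code, n_bits):
    """Switch states of a binary-weighted DAC, LSB first."""
    return [(code >> i) & 1 for i in range(n_bits)]


def binary_to_thermometer(code, n_bits):
    """Switch states of an equally-weighted DAC: the first `code` of 2^n - 1 cells are on."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    return [1 if k < code else 0 for k in range(2 ** n_bits - 1)]


def toggles(before, after):
    """Number of switches that change state between two codes."""
    return sum(a != b for a, b in zip(before, after))


# Mid-scale transition of a 3-bit DAC: 011 -> 100.
binary_toggles = toggles(binary_bits(3, 3), binary_bits(4, 3))
thermometer_toggles = toggles(binary_to_thermometer(3, 3), binary_to_thermometer(4, 3))

print("binary-weighted switches toggling: ", binary_toggles)
print("equally-weighted switches toggling:", thermometer_toggles)
```

In the binary-weighted case every switch changes at this transition, so any skew between them briefly produces a wrong output level. In the equally-weighted case a single additional cell turns on, which is also why the output can only move in the direction of the code.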

![Diagram of a binary-weighted current-steering DAC](image)

Fig. 4.5 (a) A binary-weighted current-steering DAC. (b) A glitch due to the latch asynchronism.

Another major drawback of a current-steering DAC is nonlinearity. The current-steering technique is used extensively in high-speed applications because of its current-mode operation, but the high current drive also causes nonlinearities. To reduce their effect, the matching between current sources must be improved. Symmetrical layout strategies and stable current references are known to greatly improve this matching. In [4.9], a standard 0.5-µm digital CMOS technology was used without any calibration or trimming; the 12-bit DAC reached 300-MSPS with a DNL and an INL of 0.3 LSB and 0.6 LSB, respectively. However, different fabrication processes suffer from various degrees of nonlinearity, and trimming mechanisms should then be applied in the circuits to reduce the error caused by device mismatches. In [4.10], a bidirectional laser trimming network was demonstrated to achieve 14-bit accuracy at 100-MSPS. A trimmable current mirror approach introduced in [4.11] was reported with a resolution of 14 bits at 100-MSPS. This methodology can cover the whole nonlinearity range.

**Charge-Redistribution DAC** In addition to voltage-scaling DACs and the current-steering DACs, charge-redistribution DACs (or charge-scaling DACs) are classed as another type of DAC architecture. A charge-redistribution DAC utilizes a capacitor array to produce an output voltage. Charges stored in the capacitors are redistributed, hence the name: charge-redistribution DAC. Fig. 4.7 illustrates a binary-weighted charge-redistribution DAC. Similar to the voltage-scaling DACs and current-steering DACs, the capacitor array can be either binary-weighted or equally-weighted. The operation principle is also similar to its voltage-mode and current-mode counterparts. Thus, the analog output can be represented by:

\[
A_o = \sum_{i=0}^{n-1} \frac{V_{\text{ref}}}{2^{n-i}} b_i.
\]  

(4.8)

![Fig. 4.7 A charge-redistribution DAC.](image)

This type of DAC suffers from the top-plate parasitic capacitance, which introduces a gain error. This is why calibration is needed in high-resolution (≥ 10-bit)
applications. One can place an operational amplifier at the back end to provide the output voltage and to mitigate the effect of the parasitic capacitors.
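
The gain error can be illustrated with a simple charge-sharing model. The sketch below assumes a binary-weighted array with one terminating unit capacitor (so that the total array capacitance is \(2^n\) unit capacitors) and a lumped parasitic capacitance from the top plate to ground. These modeling choices, the unit capacitance and the parasitic value are assumptions for illustration, not values from this work.

```python
import math


def charge_dac_output(bits_lsb_first, v_ref, c_unit, c_parasitic=0.0):
    """Charge-sharing output of a binary-weighted capacitor array.

    ASSUMPTION: array of C, 2C, ..., 2^(n-1) C plus one terminating unit
    capacitor, with a lumped top-plate parasitic capacitance to ground.
    """
    n = len(bits_lsb_first)
    c_selected = sum(bit * (2 ** i) * c_unit for i, bit in enumerate(bits_lsb_first))
    c_total = (2 ** n) * c_unit
    return v_ref * c_selected / (c_total + c_parasitic)


C_UNIT = 100e-15       # hypothetical unit capacitor
C_PARASITIC = 40e-15   # hypothetical top-plate parasitic

ideal = charge_dac_output([1, 0, 1], 1.0, C_UNIT)
loaded = charge_dac_output([1, 0, 1], 1.0, C_UNIT, C_PARASITIC)

print(f"ideal output:  {ideal:.4f} V")
print(f"with parasitic: {loaded:.4f} V (gain factor {loaded / ideal:.4f})")
```

In this model the parasitic scales every code by the same factor, which is a pure gain error and does not by itself disturb the step-to-step linearity.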

Since the trend of the modern integrated circuits is to decrease the die area as much as possible in order to reduce fabrication cost, this architecture is not a favorable candidate in our work. Furthermore, the possible long charging and discharging times may not be appropriate for high-speed applications.

4.2.3 Proposed Design

For multi-GHz applications, the equally-weighted current-steering DAC is a favorable option due to its high current drive and fewer glitches. Another reason for choosing equally-weighted over binary-weighted is that the former leads to a shorter waiting time for the second-step conversion, since the latter must wait for the signals to pass through the encoder. In practical designs, the static transfer characteristic of a current-steering DAC suffers from severe gain loss, offset, non-monotonicity and other nonlinearities. Thus, considerable attention is paid to layout in order to reduce device mismatch.

To begin the discussion of the DAC design, note that a subtractor in a pipelined ADC is conventionally implemented by an operational amplifier connected differentially or in switched-capacitor fashion. These closed-loop structures generally have small bandwidth and limited speed. The use of a current-driven structure along with an open-loop amplifier can avoid this problem. An open-loop D/A current-steering subtraction technique has been introduced in [4.12]. In this section, we will start with this current-steering DAC cell. The control mechanism of this reported architecture depends not only on the two input voltages, but also on the amplifier degeneration resistance and other biasing conditions. We will show how we reduce the design complexity later in this section.

![Current-steering circuit](image)

**Fig. 4.8** Current-steering circuit in [4.12].

Fig. 4.8 illustrates the main elements of this design approach (our new approach is summarized in the section following). The four bipolar transistors, $Q_3$–$Q_6$, are identical and function as current switches. $D$ and $\overline{D}$ are complementary logic signals controlling the steering of $I_1$ and $I_2$ between the two differential branches. $Q_1$, $Q_2$ and the two $R_E$s form the degenerated differential amplifier with resistive loads. The sum of $I_1$ and $I_2$ is equal to the tail current, $I_{EE}$. The two input voltages, $V_1$ and $V_2$, along with $I_{EE}$ determine the values of $I_1$ and $I_2$. Here, we can take $V_1$ and $V_2$ as two differential inputs, i.e. signals of the same amplitude but opposite sign added to a common mode. Therefore, the small-signal approximation can be applied if the amplitude is within the linear operation range given by $R_E$ and $I_{EE}$.

Assuming $Q1$ and $Q2$ are identical, with a transconductance of $g_{m1}$ and $g_{m2}$, respectively, and $D$ at high, the voltages of the current-steering circuit (CSC) become:

$$V_{op} = -\frac{g_{m1} \cdot R_o}{1 + g_{m1} \cdot R_E} V_1, \quad (4.9a)$$

$$V_{on} = -\frac{g_{m2} \cdot R_o}{1 + g_{m2} \cdot R_E} V_2. \quad (4.9b)$$

Because the product of $R_o$ and $(I_1 - I_2)$ is designed to be half an LSB of the first step, the difference between $I_1$ and $I_2$ makes $g_{m1}$ unequal to $g_{m2}$. Here, $g_{m1} = I_1 / V_T$, and

$$I_1 = I_S \exp\left(\frac{V_1 - V_{x1}}{V_T}\right). \quad (4.10)$$

In (4.10), $V_T = \frac{kT}{q}$ and $I_S$ is a device constant describing the transfer characteristic of the transistor in the forward-active region and can be expressed as:

$$I_S = \frac{qAD_n n_{po}}{W_B}. \quad (4.11)$$

In (4.11), $A$ is the cross-sectional area of the emitter junction, $D_n$ is the diffusion constant of electrons, $n_{po}$ is the electron density at the emitter-base interface and $W_B$ is the base width. Thus, we have $g_{m1}$ in terms of the bias condition:

$$g_{m1} = \frac{I_S}{V_T} \exp\left(\frac{V_1 - V_{x1}}{V_T}\right). \quad (4.12a)$$

A similar expression can be derived for $g_{m2}$:

$$g_{m2} = \frac{I_S}{V_T} \exp\left(\frac{V_2 - V_{x2}}{V_T}\right). \quad (4.12b)$$
Therefore the output differential voltage can be written as:

\[
V_{op} - V_{on} = \frac{-R_o V_1}{\frac{V_T}{I_S} e^{(V_{x1} - V_1)/V_T} + R_E} + \frac{R_o V_2}{\frac{V_T}{I_S} e^{(V_{x2} - V_2)/V_T} + R_E}.
\] (4.13)

Through (4.13), one can develop a differential output voltage drop across the output terminals. Because (4.13) depends on \( R_o, R_E, V_1, V_2, V_{x1}, V_{x2}, I_s \) and \( I_{EE} \), careful control and trimming of these variables is essential for an accurate differential output voltage.

The above derivation is valid only if \( I_{EE} R_E \) is much larger than the input voltage range (the small-signal approximation). A more accurate analysis takes the voltage drop produced by the current through the load resistor directly into account, yielding:

\[
\begin{aligned}
V_{op} &= V_{CC} - I_1 R_o = V_{CC} - I_S e^{(V_1 - V_{x1})/V_T} \cdot R_o, \quad (4.14a) \\
V_{on} &= V_{CC} - I_2 R_o = V_{CC} - I_S e^{(V_2 - V_{x2})/V_T} \cdot R_o. \quad (4.14b)
\end{aligned}
\]

Therefore,

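\[
\begin{aligned}
V_{op} - V_{on} &= -I_S \left(e^{(V_1 - V_{x1})/V_T} - e^{(V_2 - V_{x2})/V_T}\right) \cdot R_o \\
&= -I_S e^{(V_2 - V_{x2})/V_T} \left(e^{[(V_1 - V_{x1}) - (V_2 - V_{x2})]/V_T} - 1\right) \cdot R_o \\
&= -I_2 \left(e^{[(V_1 - V_{x1}) - (V_2 - V_{x2})]/V_T} - 1\right) \cdot R_o.
\end{aligned}
\] (4.15)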

As a result, the output still depends on the same variables (\( V_{x1} \) and \( V_{x2} \) depend on \( I_{EE} \) and \( R_E \)) as shown in (4.13).

The use of emitter degeneration gives a greater degree of design freedom. But the use of degenerating resistors is problematic in deeply scaled technologies due to
their inherently small headroom as well as the real-estate consumed by these circuit elements.

These resistors also result in a more complex control mechanism. Because of the exponential relations and the internal biasing conditions ($V_{x1}$ and $V_{x2}$), the matching or trimming process must be complicated to achieve a specific accuracy (a gain-matching stage and resistor matching were used in [4.12] but are not discussed in detail there). Moreover, the tail current, $I_{EE}$, is limited by the restricted headroom, since the supplies have shrunk along with the feature dimension (or the minimum device length) in most advanced semiconductor technologies. Furthermore, considering the absolute value of the output levels, $I_{EE}$ needs to be set within a certain range so that the output signals stay within a reasonable range.

Because the CSC should provide a large controllable range of steered currents, our method employs a **bipolar pre-distortion technique** [4.13]. This application is unique to our work. It provides a broad operating range of currents and reduces the control complexity. The proposed CSC is shown in Fig. 4.9. $Q4$ ($Q3$) and a diode-connected load, $Q5$ ($Q6$), form the input stage. The output stage is composed of an emitter-coupled differential amplifier, formed by $Q1$ and $Q2$, with resistive load, $R_o$. The four n-type MOSFETs, $M1$–$M4$, serve as current switches with no gate current flowing into the signal paths. Because the four MOS switches operate in the triode region, they introduce no significant voltage drop. The diode-connected BJTs, along with the input transistors and the output differential pair, reestablish the relationship between the input and output. The use of $Q1$ and $Q2$ also isolates the current source from the MOS switches. $V_1$ and $V_2$, generated by a simple resistor voltage divider, are the two input control voltages.

![Diagram](image)

Fig. 4.9 Topology of the proposed CSC, which employs bipolar pre-distortion to achieve simpler design equations.

The collector current of an HBT is similar to that of a normal Si-based BJT [4.14]. Assuming that all the HBTs are identical and that $Q1$–$Q4$ operate in the forward-active region, the collector currents of these four transistors can be represented as:

\[ I_4 = I_s \exp\left(\frac{V_2 - V_{e1}}{V_T}\right), \quad (4.16a) \]

\[ I_3 = I_s \exp\left(\frac{V_1 - V_{e1}}{V_T}\right), \quad (4.16b) \]

\[ I_1 = I_s \exp\left(\frac{V_a - V_{e2}}{V_T}\right), \quad (4.16c) \]

\[ I_2 = I_s \exp\left(\frac{V_b - V_{e2}}{V_T}\right), \quad (4.16d) \]

where $V_{e1}$ and $V_{e2}$ denote the common-emitter node voltages of the input pair ($Q3$, $Q4$) and the output pair ($Q1$, $Q2$), respectively.

By dividing (4.16a) by (4.16b) and taking the logarithm of the result, the input voltage difference becomes:

\[ V_1 - V_2 = V_T \ln \frac{I_3}{I_4}. \]  
(4.17)

Similarly, from (4.16c) and (4.16d), we have:

\[ V_a - V_b = V_T \ln \frac{I_1}{I_2}. \]  
(4.18)

The voltage drop across the diode is equal to the product of the thermal voltage \((V_T)\) and the logarithmic value of the ratio between the diode current and \(I_s\). Assuming the two supply rails are \(V_{CC}\) and ground, the following relations are valid:

\[ V_{CC} - V_a = V_T \ln \frac{I_4}{I_s}, \]  
(4.19a)

\[ V_{CC} - V_b = V_T \ln \frac{I_3}{I_s}. \]  
(4.19b)

By subtracting (4.19a) from (4.19b), the difference between \(V_a\) and \(V_b\) is given as:

\[ V_a - V_b = V_T \ln \frac{I_3}{I_4}. \]  
(4.20)

By equating (4.18) and (4.20), we obtain:

\[ \frac{I_1}{I_2} = \frac{I_3}{I_4}. \]  
(4.21)
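The chain from (4.17) to (4.21) can be checked numerically. The following is a minimal sketch, not the author's simulation: it assumes ideal exponential devices, and the values chosen for `i_ctrl`, `i_ee` and `i_s` as well as the helper names `steer` and `predistorted_currents` are hypothetical.

```python
import math

V_T = 0.026  # thermal voltage at room temperature, in volts


def steer(delta_v, i_tail):
    """Split a tail current between two emitter-coupled transistors.

    Returns (i_hi, i_lo), where i_hi belongs to the transistor whose base is
    delta_v volts higher, so that i_hi / i_lo = exp(delta_v / V_T).
    """
    ratio = math.exp(delta_v / V_T)
    i_lo = i_tail / (1.0 + ratio)
    return i_tail - i_lo, i_lo


def predistorted_currents(v1_minus_v2, i_ctrl, i_ee, v_cc=2.5, i_s=1e-16):
    # Input pair: V1 - V2 sets I3/I4, as in (4.17).
    i3, i4 = steer(v1_minus_v2, i_ctrl)
    # Diode-connected loads: logarithmic drops below V_CC, (4.19a)/(4.19b).
    v_a = v_cc - V_T * math.log(i4 / i_s)
    v_b = v_cc - V_T * math.log(i3 / i_s)
    # Output pair: V_a - V_b sets I1/I2, as in (4.18).
    i1, i2 = steer(v_a - v_b, i_ee)
    return i1, i2, i3, i4


# Hypothetical operating point: a large control current steering a small one.
i1, i2, i3, i4 = predistorted_currents(0.0975, i_ctrl=2e-3, i_ee=240e-6)
print(f"I3/I4 = {i3 / i4:.4f}, I1/I2 = {i1 / i2:.4f}")
```

In this idealized model the two ratios agree regardless of the absolute size of the two tail currents, which is the property the text relies on when a large control current sets a much smaller output current.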

Eq. (4.21) states that the ratio of the output-stage currents equals the ratio of the input-stage currents, so the relation between the input-stage currents and the output currents is linearized. This relation allows us to design a higher resolution two-step ADC. For a higher resolution, \( I_{EE} \) (\( = I_1 + I_2 \)) becomes smaller for a given full scale range. We can still guarantee the accuracy of the smaller \( I_1 \) and \( I_2 \) through a large \( I_{Ctrl} \) (\( = I_3 + I_4 \)), because a large current source is less affected by process variation. Using this large current source to control the smaller currents is more accurate than generating the small currents directly. By substituting (4.21) into (4.17), the relation between the input voltages and output currents becomes:

\[
V_1 - V_2 = V_T \ln \frac{I_1}{I_2}.
\] (4.22)

We can rewrite (4.22) to get the relation between \( I_1 \) and \( I_2 \):

\[
I_1 = I_2 e^{\frac{V_1 - V_2}{V_T}}.
\] (4.23)

Because the sum of \( I_1 \) and \( I_2 \) is fixed by the current source, \( I_{EE} \), we can obtain the desired \( I_1 \) and \( I_2 \) by adjusting the input voltage difference. That is:

\[
V_{op} - V_{on} = -(I_1 - I_2)R_o = -I_2 (e^{\frac{V_1 - V_2}{V_T}} - 1)R_o.
\] (4.24)

By comparing (4.24) with (4.13) or (4.15), we see that we have reduced the control complexity by replacing the dependences on \( V_1, V_2, V_{x1} \) and \( V_{x2} \) with a dependence on \((V_1 - V_2)\) alone. In other words, besides the input voltages and the tail current, no other biasing conditions appear in (4.24). Therefore, the proposed approach can reach a certain degree of accuracy without any implicit biasing conditions (which are \( V_{x1} \) and \( V_{x2} \) in (4.13)). We use the dc diode current equation directly throughout the analysis because the voltages involved are dc levels rather than small signals.

The use of bipolar pre-distortion provides a linear relation between the input-stage currents and the output currents, as shown in (4.21). Since the derivation follows the diode equations, (4.16a) to (4.16d), without any further assumption, (4.21) is valid for a broad range of I/O currents. Moreover, the use of $I_{\text{Ctrl}}$ does not cause loss of headroom in the signal branches (i.e. the $I_1$ and $I_2$ branches). In this discussion, $V_1$ and $V_2$ govern the relative value of $I_1$ and $I_2$. However, an accurate $I_{EE}$ and an accurate $R_o$ are also required to generate a precise subtraction, since $I_1 + I_2 = I_{EE}$ for either the reported or the proposed technique. We used a 3-bit DAC because its linearity is much better than that of a 4-bit DAC.

**Sensitivity Analysis** An inherent property of bipolar current steering is that the output voltage depends on $V_1$ and $V_2$ exponentially. To alleviate inaccuracies resulting from the exponential relation in our approach, we set $I_{EE}$ slightly larger than the value needed to produce a half-LSB change at the differential outputs in Fig. 4.9. In theory, this is not necessary for successful operation, but in practice it eases control and also saves power. The small excess current should be limited so that it produces a voltage drop much smaller than one LSB at the output. Thus, even if the ratio $I_1/I_2$ changes significantly, $(I_1 - I_2)$ stays nearly fixed. The effect of variation in $(V_1 - V_2)$ due to process mismatch can be minimized in this way.

The added advantage of the current approach should not be underestimated. There is no straightforward way to relieve the exponential relation derived from Fig. 4.8, which summarizes the approach we have built upon and improved in this work. Trimming is needed when the former circuit is employed. In addition, in the old approach, $I_{EE}$ should also be set close to the half-LSB limit in order to save headroom and to keep the output voltages within a reasonable range to drive the following stage.

In Fig. 4.9, suppose $(I_1 - I_2)$ is equal to 229.26 µA to produce a half LSB of 62.5 mV with an $R_o$ of 272.611 Ω. In our design, we set $I_{EE}$ to 240.30 µA, such that $I_1$ equals 234.78 µA and $I_2$ equals 5.52 µA. Through (4.22), the input differential voltage should be set to 97.52 mV at room temperature ($V_T = 26$ mV). Now suppose process variation makes $(V_1 - V_2)$ become 107.52 mV. The output difference turns into -63.446 mV, only a 0.946 mV deviation from its ideal value. On the other hand, if $(V_1 - V_2)$ is 10 mV less than desired, -61.136 mV is generated at the output, a deviation of only 1.364 mV. Therefore, designers can obtain nearly ideal design values under this constraint. The tolerable variation depends on the ADC resolution. In practice, both $V_1$ and $V_2$ are fixed voltages generated on-chip. The magnitude of $I_{EE}$ can be set by the current mirror.
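The figures in this worked example can be reproduced from (4.22) to (4.24). The sketch below uses only the values quoted in the text; the function name `diff_output` is a hypothetical helper, and the model assumes ideal exponential devices.

```python
import math

V_T = 0.026        # thermal voltage at room temperature (V)
I_EE = 240.30e-6   # tail current (A)
R_O = 272.611      # load resistance (ohm)


def diff_output(v1_minus_v2, i_ee=I_EE, r_o=R_O):
    """Differential output V_op - V_on from (4.23) and (4.24)."""
    ratio = math.exp(v1_minus_v2 / V_T)   # I1 / I2
    i2 = i_ee / (1.0 + ratio)
    i1 = i_ee - i2
    return -(i1 - i2) * r_o


# Control voltage implied by the chosen branch currents, via (4.22).
nominal_dv = V_T * math.log(234.78e-6 / 5.52e-6)
print(f"required V1-V2 = {nominal_dv * 1e3:.2f} mV")

# Nominal control voltage, then +10 mV and -10 mV of variation.
for dv in (97.52e-3, 107.52e-3, 87.52e-3):
    out = diff_output(dv)
    print(f"V1-V2 = {dv * 1e3:7.2f} mV -> Vop-Von = {out * 1e3:8.3f} mV")
```

A 10 mV shift in the control voltage (about 10% of its nominal value) moves the output by roughly 1 mV out of 62.5 mV, because $I_2$ is already a small fraction of $I_{EE}$ at the nominal point.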

The variation of the output difference due to $I_{EE}$ under a fixed $(V_1 - V_2)$ is proportional to the percentage variation of $I_{EE}$, since $I_1$ and $I_2$ are proportional to $I_{EE}$:

$$I_1 = \frac{I_{EE}}{1 + e^{-\frac{V_1 - V_2}{V_T}}},$$

(4.25a)

$$I_2 = \frac{I_{EE}}{1 + e^{\frac{V_1 - V_2}{V_T}}}.$$

(4.25b)

Taking the difference between (4.25a) and (4.25b) shows that the output differential voltage also depends on $I_{EE}$ linearly. Therefore, a 10% $I_{EE}$ variation produces a 10% error at the outputs. As for the reported topology in Fig. 4.8, $I_1$ and $I_2$ are still proportional to $I_{EE}$ according to:

$$I_1 = \frac{I_{EE}}{1 + e^{\frac{V_r}{V_T}}} \left( \frac{x_1}{(x_1 + x_2)} \right),$$

(4.26a)

$$I_2 = \frac{I_{EE}}{1 + e^{\frac{V_r}{V_T}}} \left( \frac{x_2}{(x_1 + x_2)} \right).$$

(4.26b)

Thus, the tail current is another critical part in both topologies. A Monte Carlo simulation has been conducted and will be discussed in detail in Section 4.3 to see the impact of $I_{EE}$ on the accuracy.
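The linear dependence on the tail current can be illustrated with a short sketch for the proposed topology. It assumes the branch currents follow from (4.23) together with $I_1 + I_2 = I_{EE}$; the function names are hypothetical, and the operating point is the one quoted earlier in this section.

```python
import math

V_T = 0.026  # thermal voltage (V)


def branch_currents(v1_minus_v2, i_ee):
    """I1 and I2 of the output pair for a fixed control voltage."""
    x = v1_minus_v2 / V_T
    i1 = i_ee / (1.0 + math.exp(-x))
    i2 = i_ee / (1.0 + math.exp(x))
    return i1, i2


def diff_output(v1_minus_v2, i_ee, r_o):
    i1, i2 = branch_currents(v1_minus_v2, i_ee)
    return -(i1 - i2) * r_o


nominal = diff_output(97.52e-3, 240.30e-6, 272.611)
high = diff_output(97.52e-3, 1.10 * 240.30e-6, 272.611)   # +10% tail current
relative_error = (high - nominal) / nominal
print(f"relative output error for +10% I_EE: {relative_error:.3%}")
```

Unlike an error in $(V_1 - V_2)$, an error in $I_{EE}$ is not attenuated at all, which is why the current source deserves separate attention.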

![Fig. 4.10 Schematic of the current sources in Fig. 4.9.](image)

The setup to generate the tail current is shown in Fig. 4.10. Because $M1$ operates in the saturation region, the fractional variation of the drain current, $I_D$, is approximately:
\[
\frac{\Delta I_D}{I_D} = \frac{-2\Delta V_{th}}{V_G - (V_{th} + \Delta V_{th})}.
\]  

(4.27)

Here the ground (or substrate) is connected to 0 V, \( V_{th} \) is the threshold voltage of \( M1 \) and \( \Delta V_{th} \) is the variation of \( V_{th} \) due to process. Since the \( I_D \) variation is inversely proportional to the gate overdrive, a large \( V_G \) dilutes the current deviation. Thus, the gate voltage, \( V_G \), of \( M1 \) is set to the supply voltage. This is consistent with the testing results in [4.15]. The matching of the \( I_{EE} \) values for multiple CSCs should be handled in the geometry selection and symmetric layout phases of the physical implementation. Further conclusions regarding these simulation results will be drawn later in this chapter.
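To see how strongly the gate voltage dilutes a threshold shift, (4.27) can be evaluated for a few gate voltages. This is only an illustration: the threshold voltage and its shift below are hypothetical placeholders, not values from the process design kit.

```python
def drain_current_variation(v_g, v_th, delta_v_th):
    """Fractional drain-current change from (4.27) for a threshold shift."""
    return -2.0 * delta_v_th / (v_g - (v_th + delta_v_th))


V_TH = 0.5          # hypothetical nominal threshold voltage (V)
DELTA_V_TH = 0.010  # hypothetical threshold shift due to process (V)

for v_g in (1.0, 1.5, 2.5):
    frac = drain_current_variation(v_g, V_TH, DELTA_V_TH)
    print(f"V_G = {v_g:.1f} V -> dI_D/I_D = {frac:.2%}")
```

With these placeholder numbers, raising the gate from 1.0 V to the 2.5 V supply reduces the fractional current error by about a factor of four, in line with the choice of biasing the gate at the supply voltage.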

### 4.3 Subtractor

The conventional implementation of a subtractor incorporates a closed-loop operational amplifier in either the switched-capacitor mode [4.16] or in the simple differential mode [4.17] as shown in Fig. 4.11. We can obtain the output voltage through:

\[
V_{out} = \frac{R_2}{R_1} (V_1 - V_2).
\]

(4.28)

Hence, when \( R_1 \) is equal to \( R_2 \), the output is the difference of the two inputs. To obtain an accurate subtraction by this method, a near-ideal operational amplifier is required. Furthermore, the matching between the four resistors also affects the accuracy. However, the closed-loop architecture is not suitable for high-speed purposes. Based on the proposed current-steering type DAC, the subtractor can instead be integrated with the 3-bit DAC to achieve higher speed.

Fig. 4.11 Subtractor using an operational amplifier.

Fig. 4.12 Schematic of the proposed CSS. The unity-gain buffer amplifier (Qa–Qd) works with seven CSCs to perform 3-bit subtraction. The D_0–D_6 signals are thermometer codes from the previous stage.

Since we designed a combined DAC/subtractor circuit, the discussion and notation in this section follow the previous section. Now let us include M1–M4 and R_o of Fig. 4.9 in the analysis. Fig. 4.12 shows a 3-bit CSS that contains a buffer amplifier and seven CSCs. The unity-gain buffer amplifier, formed by Qa–Qd, works as the subtractor core and has an output common mode voltage equal to V_{cm}. Qa and Qb form the input buffering stage. This stage provides constant input impedance and also isolates the amplifier core (composed of Qc and Qd) from the previous stage.

Ma is the reset switch. The gain of this buffer amplifier is mainly determined by \( R_o/R_E \) and is also affected by the transconductances of Qc and Qd. Although the gain, which approximates \( R_o/R_E \), varies slightly over the input range and tends to saturate slightly near the extremes of the input range, this does not affect the overall linearity much. The product of \( R_E \) and \( I_o \) should be greater than the desired input amplitude to provide an adequate linear input range. Considering only the buffer amplifier and one CSC, the output voltages, \( V_{op} \) and \( V_{on} \), become:

\[
V_{op} = V_{cm} - I_1 R_o \cdot D - I_2 R_o \cdot \overline{D},
\] (4.29a)

\[
V_{on} = V_{cm} - I_2 R_o \cdot D - I_1 R_o \cdot \overline{D}.
\] (4.29b)

When \( D \) is at logic high (or 1), \( V_{op} \) drops by \( I_1 R_o \) below the common-mode level and \( V_{on} \) drops by \( I_2 R_o \). When \( D \) goes logic low (or 0), the currents flowing through the two branches are swapped such that \( V_{op} \) decreases by \( I_2 R_o \) and \( V_{on} \) decreases by \( I_1 R_o \). By subtracting (4.29b) from (4.29a), a voltage difference appears on the differential output:

\[
\Delta V_{diff, out} \equiv V_{op} - V_{on} = -(I_1 - I_2)R_o \cdot D + (I_1 - I_2)R_o \cdot \overline{D}.
\] (4.30)

By designing \( I_1 \) to be greater than \( I_2 \), (4.30) is a subtraction when \( D \) is high and an addition when \( D \) is low. Consequently, we are able to utilize this structure as a combination of a subtractor and a current-steering DAC to implement a combined CSS. The DAC is made up of \( 2^N - 1 \) CSCs for N-bit resolution.

Because the buffer amplifier converts voltage signals into current signals, $V_{cm}$ becomes $(V_{CC} - I_o R_o)$. The seven equally-weighted CSCs act as a 3-bit DAC. They perform eight possible subtractions (0-, 0.5-, 1-, 1.5-, 2-, 2.5-, 3- and 3.5-LSB subtractions) at the outputs. The seven digital signals, labeled as $D_0$–$D_6$, are thermometer codes from the previous ADC stage. Each of them controls either a half-LSB subtraction from $\Delta V_{\text{diff.out}}$ or a half-LSB addition to $\Delta V_{\text{diff.out}}$. By setting:

$$ (I_1 - I_2) R_o = \frac{\text{LSB}}{2} \quad (4.31) $$

and including all operational states, $\Delta V_{\text{diff.out}}$ becomes:

$$ \Delta V_{\text{diff.out}} = V_{\text{diff.out,0}} - \frac{\text{LSB}}{2} \sum_{i=0}^{6} D_i + \frac{\text{LSB}}{2} \sum_{i=0}^{6} \overline{D_i}. \quad (4.32) $$

We define $V_{\text{diff.out,0}}$ as the original differential output produced by the buffering stage (transferred from the input signals). The product of (4.32) and 8 becomes the full scale differential residue signal, which is shown in Fig. 4.13. The dashed line represents the input ramp signal and the saw-toothed line is the residue signal. The residue signal ideally lies within $+\frac{FS}{2}$ and $-\frac{FS}{2}$.

![Saw-toothed waveform represents the ideal residue signal (after amplified) for a 3-bit coarse conversion with a descending ramp input signal.](image-url)

The operation of the CSS follows (4.32). For instance, suppose the input ranges from -4 LSB to 4 LSB. If the input signal has an amplitude of 1.4 LSB at some point, the lowest five thermometer codes will be high and the highest two will be low (0011111). Therefore, the output signal of the CSS will have an amplitude of -0.1 LSB. Ideally, the output amplitude of the CSS ranges from -0.5 LSB to 0.5 LSB.
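The 1.4-LSB example can be traced with a small behavioral sketch. It assumes ideal coarse comparators with thresholds at the integer LSB levels from -3 to +3, and that each CSC subtracts half an LSB when its code is high and adds half an LSB when its code is low; the function names are hypothetical.

```python
def thermometer_code(v_in_lsb, n_bits=3):
    """Coarse thermometer code D_0..D_{2^N - 2} for an input given in LSBs.

    Assumes ideal comparators with thresholds at the integer LSB levels
    between -(2^(N-1) - 1) and +(2^(N-1) - 1).
    """
    half = 2 ** (n_bits - 1)
    thresholds = range(-(half - 1), half)     # -3 ... +3 for 3 bits
    return [1 if v_in_lsb > t else 0 for t in thresholds]


def css_output(v_in_lsb, codes):
    """Residue in LSBs: each CSC shifts the output by half an LSB."""
    ones = sum(codes)
    zeros = len(codes) - ones
    return v_in_lsb - 0.5 * ones + 0.5 * zeros


codes = thermometer_code(1.4)
# Print the code with D_6 first, followed by the residue in LSBs.
print("".join(str(d) for d in reversed(codes)), css_output(1.4, codes))
```

Sweeping the input across the -4 LSB to 4 LSB range with this model keeps the residue within half an LSB of zero, which is the saw-tooth behavior described above.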

Based on the above discussion, the advantages of the proposed technique are as follows: (1.) a simpler design equation – eq. (4.24), (2.) fewer design variables, (3.) no external components, (4.) CMOS switching, and (5.) improved headroom and low real-estate consumption. Our approach dissipates a well-defined (and affordable) extra power through the generation of $I_{Ctrl}$. This current allows us to closely control $I_1$ and $I_2$ over a wide range. Without any trimming circuitry, the resulting circuit operates at higher frequencies. In contrast, the previous technique employs an additional gain matching stage controlled by two off-chip voltages to trim the biasing conditions. This matching stage, used to generate proper $V_1$, $V_2$, $V_{x1}$ and $V_{x2}$, can only trim the relative values of the branch currents $I_1$ and $I_2$. The absolute value of these currents must still be maintained by keeping the tail current, $I_{EE}$, constant.

The gain matching stage and the external control voltages have the drawbacks of extra power dissipation and a complicated, indirect control mechanism. Although the emitter degenerating resistances give more degrees of trimming freedom, they do so at the expense of headroom. As for the current steering, the use of n-type MOSFETs as the switching devices can provide more precise current signals with less distortion. Since the total number of bipolar transistors is the same in either case, the use of the degenerating resistors in the old approach degrades the headroom and occupies more real estate (the MOSFET switches have a relatively small area demand in our approach).

Let us now compare both of the approaches described here on an equal footing (allowing no additional trimming process for the control circuit and requiring equally accurate $I_{EE}$). Then, only the value of $(V_1 - V_2)$ affects the subtraction accuracy in our method, whereas $V_1$, $V_2$, $V_{x1}$ and $V_{x2}$ all affect the previously reported subtraction.

One thing to keep in mind is that in open-loop subtraction, the accuracy depends critically on $I_1$, $I_2$ and $R_o$. A precise tail current, and hence an accurate current source, is essential in all cases. Other current steering methods may have their own advantages and disadvantages but, given a precise tail current, the control mechanism is the major consideration; it is what distinguishes the various approaches. For example, a simple bipolar differential pair is applicable, but it has more variables to control (as shown in (4.16a) and (4.16b)). Alternatively, one may use MOSFETs as the amplifier core or use only two current sources. However, MOSFETs have more control problems due to process variation and mismatch in modern deep sub-micron and nanometer technologies. The matching of the MOSFET devices becomes more critical and generally leads to inferior offset control. This is evident in the existing literature [4.18].

**CSS Sensitivity** To verify the feasibility of the proposed CSS topology, a Monte Carlo simulation with 1000 iterations was conducted. We included the process mismatch parameters, provided by the technology vendor, in the simulation. $V_1$ and $V_2$ in Fig. 4.12 are designed to be 1.95 V and 1.85 V, respectively. We employed an $R_o$ of 272.611 Ω and an $I_{EE}$ of 240.30 µA such that we can set $(V_1 - V_2)$ approximately equal to 100 mV (97.52 mV exactly). In Fig. 4.14 (a) the mean values of $V_1$ and $V_2$ are 1.95077 V and 1.85087 V, respectively. The standard deviations are 1.937 mV and 2.270 mV, respectively. Because both $V_1$ and $V_2$ were generated by the same divider, Fig. 4.14 (b) further illustrates the correlation between these two voltages. The mean of the difference is 99.898 mV with a standard deviation of 0.447 mV. The histogram in Fig. 4.14 (b) shows a normal distribution, and all values lie between 98.50 mV and 101.25 mV. The maximum deviation occurs at 101.25 mV (compared to 97.52 mV), which gives only a 0.39 mV deviation at the outputs. Therefore, correlating $V_1$ and $V_2$ in the design equation not only simplifies the design process but also results in a smaller input voltage deviation. As long as the deviations of $V_1$ and $V_2$ have the same polarity, their influence on the outputs is cancelled to a certain degree. This verifies the efficacy of the control mechanism in the CSS approach.

As emphasized above, the tail current, $I_{EE}$, generated by the MOSFET current mirror, is a critical design parameter. We followed two principles to achieve a low tail current variation: (1.) high gate voltage; (2.) large device dimensions. Thus, the gates of the current sources were biased at the supply voltage, 2.5 V.

The variables related to process mismatch include the device geometries (channel width, channel length and number of fingers), threshold voltage, electron mobility, body effect and other process dependent parameters. To approach an optimized result, we conducted a joint parametric and Monte Carlo simulation with 100 iterations to see the impact of the process mismatch in the geometry. The dimensions of the three MOSFETs in Fig. 4.10 were set to be equal since it was easier to match them in the physical layout. The minimum effective channel length of this thick-oxide MOSFET is 240 nm (under a 2.5-V supply). The error percentage of the resistors provided by the technology PDK is normally between 0.06% and 1.04% at room temperature. We used the resistor with the 0.06% error value as the unit cell and obtained lower or higher resistances for $R_o$ by parallel or series connection. The simulation shows that the total current variations ($I_{EE}$) were typically below 1.5% if the channel length was greater than 360 nm. The variations of the differential output voltage are plotted in Fig. 4.15. The MOSFETs all had a single finger and a single gate contact, except for one set with two fingers. The vertical axis represents the ratio of the standard deviation of $\Delta V_{\text{diff\_out}}$ to its mean value and the horizontal axis represents (channel width : channel length). It reveals that smaller variations occur when larger channel lengths along with larger channel widths are used. Table 4.1 compares different dimensions with the same geometry ratio of 3:1. Thus, we concluded that the more accurate current source came from a large channel width with a large channel length. The smallest deviation, 0.682%, came from the MOSFET of 8.64 µm $\times$ 2.88 µm (single finger). That represents a 0.218 LSB deviation at the output. In our simulation, the variations of $\Delta V_{\text{diff\_out}}$ were typically smaller than 1% if the dimensions were at least 2:1. Geometric ratios larger than 4:1 were not studied in the simulation because they generate too much current (large power dissipation) and matching of the parallel resistors would be much more difficult. The number of fingers did not significantly affect the variation in our simulation.

<table>
<thead>
<tr>
<th>Channel length (µm)</th>
<th>Standard deviation (%)</th>
<th>Output deviation (LSB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.36</td>
<td>0.821</td>
<td>0.263</td>
</tr>
<tr>
<td>0.72</td>
<td>0.731</td>
<td>0.234</td>
</tr>
<tr>
<td>1.44</td>
<td>0.706</td>
<td>0.226</td>
</tr>
<tr>
<td>2.88</td>
<td>0.682</td>
<td>0.218</td>
</tr>
<tr>
<td>5.76</td>
<td>0.684</td>
<td>0.219</td>
</tr>
<tr>
<td>5.76 (2 fingers)</td>
<td>0.683</td>
<td>0.219</td>
</tr>
</tbody>
</table>

Depending on the target resolution and power dissipation, one can select proper design parameters to achieve the required specifications. Here, we demonstrate that the proposed technique can be applied in a practical ADC design.

### 4.4 Residue Amplifier

The topology of the residue amplifier is shown in Fig. 4.16. It is composed of an emitter follower and a degenerated differential pair. The input follower isolates the amplifier core from the CSS and also provides a constant load for the CSS. M1 is the reset switch. The gain approximates $R_o/R_E$, because $g_m R_E$ is much larger than 1. The product of $R_E$ and $I_o$ should give an adequate linear input range (62.5 mV in amplitude). Thus, the residue signal is amplified to the original full scale range with a gain of 8. A deviation from the ideal gain of 8 can be compensated by adjusting the reference levels in the second step. In addition, $I_o$ also needs to drive the following 11 preamplifiers of the interpolating comparator array.

![Fig. 4.16 Residue amplifier.](image-url)

### 4.5 Delay Element

Fig. 4.17 (a) Block diagram of the delay amplifier. (b) Propagation delay caused by the first T/H and the second T/H.

**Analog Delay Element**  The delay amplifier, shown in Fig. 4.17 (a), is implemented by two T/Hs to provide the same latency as that of the thermometer codes from the coarse conversion. The clock terminal on the left of each T/H block represents the “Track” terminal as we have seen in Chapter 3, while the clock terminal on the right of each T/H block represents the “Hold” terminal. Fig. 4.17 (b) depicts the latency produced by the delay amplifier. The first T/H gives a half period delay while the second T/H gives another half period delay. Thus, one period of latency is generated by the delay amplifier. This delay amplifier is placed in front of the CSS to synchronize the analog inputs and the thermometer codes.

**Digital Delay Element**  Aside from the analog delay amplifier, digital counterparts are also required to synchronize the output binary codes and signals elsewhere in the digital circuitry. Fortunately, digital delay elements are much easier to implement and have been used extensively, because a digital signal only contains two possible voltage levels. In our ADC, we utilized the *dynamic shift register* (DSR) as the digital delay element. Fig. 4.18 (a) displays a simple DSR with a half period delay. It comprises two switches (MOSFETs) and two inverters. The switches have non-overlapping input clock signals to control the delay time, while each inverter stores the signal fed into it on $C_{gsn}/C_{gsp}$ ($C_{gsn}$ is the input capacitance of the nMOS, while $C_{gsp}$ is the input capacitance of the pMOS). For instance, when M1 is on, $D_{in}$ is stored in IVT1 in the opposite phase. The resulting signal is labeled $D_{mid}$. When M2 is turned on and M1 is off, $D_{mid}$ is transferred to IVT2 and leads to the output signal, $D_{out}$. Fig. 4.18 (b) shows an example of the transient response, in which the input signal is delayed by a half clock period. Please note that $D_{\text{mid}}$ is the complementary signal of $D_{\text{in}}$.
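The behavior of the DSR can be captured in a few lines of logic-level code. The class below is a hypothetical behavioral model, not a circuit simulation: it treats the inverter input capacitances as ideal one-bit storage nodes and the two switches as ideal pass gates driven by non-overlapping phases `phi1` and `phi2`.

```python
class DynamicShiftRegister:
    """Behavioral model of a two-switch, two-inverter dynamic shift register."""

    def __init__(self):
        self.node1 = 0  # bit held on the input capacitance of IVT1
        self.node2 = 1  # bit held on the input capacitance of IVT2

    def step(self, d_in, phi1, phi2):
        if phi1 and phi2:
            raise ValueError("clock phases must not overlap")
        if phi1:                    # M1 on: sample D_in onto IVT1's input
            self.node1 = d_in
        d_mid = 1 - self.node1      # IVT1 inverts the stored bit
        if phi2:                    # M2 on: pass D_mid to IVT2's input
            self.node2 = d_mid
        d_out = 1 - self.node2      # IVT2 restores the original polarity
        return d_mid, d_out


dsr = DynamicShiftRegister()
bits = [1, 0, 1, 1, 0]
outputs = []
for bit in bits:
    dsr.step(bit, phi1=1, phi2=0)              # first half period: sample
    _, d_out = dsr.step(bit, phi1=0, phi2=1)   # second half period: transfer
    outputs.append(d_out)
print(outputs)
```

Each bit sampled during the first half period appears at the output during the second half period, which corresponds to the half-period delay described above.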

![Dynamic shift register diagram](image)

Fig. 4.18 (a) Dynamic shift register. (b) Transient response of the DSR in (a).

### 4.6 Encoder

For flash-type ADCs, we need to convert the thermometer codes from the comparator array into binary codes. The most popular encoding method is read-only-memory (ROM) encoding. The conventional ROM encoder utilizes standard two-input CMOS digital gates as the interfaces. Each CMOS gate compares an adjacent pair of thermometer codes to decide whether to turn the MOS switches on or off, as shown in Fig. 4.19.

![Fig. 4.19 Conventional ROM implementation.](image)

The thermometer code was named for its similarity to the mercury thermometer. Normally, the mercury fills the glass tube up to the level of the measured temperature. However, if there is air in the mercury, an air bubble will erroneously raise the highest level of the mercury. Consequently, these air bubbles will give us incorrect information about the temperature. Similarly, in a flash ADC, when comparators suffer from input offset, voltage reference shift or timing shift, bubble codes (i.e. “0”-codes) could be generated between the “1”-codes. For example, in Fig. 4.19, the thermometer code should be “00111”, but one bubble makes it “00101”. These bubble codes lead to final conversion errors. The ROM encoder utilizing two-input digital gates cannot discriminate these errors and thus produces the wrong binary codes.

The “101-like” bubbles can be corrected by using three-input CMOS logic gates [4.19]. Fig. 4.20 shows part of the implementation. Since each three-input logic gate compares three consecutive thermometer codes, the comparison corrects a bubble error among these three codes.

![Diagram](image)

Fig. 4.20 Using three-input logic gates to correct bubble errors.
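The difference between the two gate types can be sketched at the logic level. The exact gate inputs of Fig. 4.20 are not reproduced here; the sketch assumes one common form of three-input detection, in which a ROM row is selected only when a 1 is followed by two consecutive 0s above it. Both function names are hypothetical.

```python
def rows_two_input(code):
    """ROM rows selected by two-input gates: a 1 directly below a 0."""
    # code is written highest comparator first; t[0] is the lowest comparator.
    t = [int(c) for c in reversed(code)] + [0]
    return [i for i in range(len(code)) if t[i] == 1 and t[i + 1] == 0]


def rows_three_input(code):
    """ROM rows selected by three-input gates: a 1 below two consecutive 0s."""
    t = [int(c) for c in reversed(code)] + [0, 0]
    return [i for i in range(len(code))
            if t[i] == 1 and t[i + 1] == 0 and t[i + 2] == 0]


for code in ("00111", "00101"):
    print(code, rows_two_input(code), rows_three_input(code))
```

Under this assumption, the bubbled code "00101" activates two ROM rows with two-input gates but only the intended row with three-input gates, so the encoder sees the same row as for the clean code "00111".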

Two-bubble (e.g. 1001), three-bubble (e.g. 10001) and larger errors can also exist in flash ADCs. Other techniques based on digital gates can be developed to eliminate bubble errors. *Gray encoding* is another technique to correct errors without adding extra inputs to the digital gates. However, the additional encoding procedure costs extra time, which could be signal dependent and would lead to timing mismatch. Therefore, depending on the required specifications, different topologies can be used to optimize the result. In our work, the LSBs for each step are big enough that there is almost no blurred area between adjacent thermometer codes. Thus, we simply use three-input logic gates to secure the final output result.

All current sources in this prototype were implemented by n-type MOSFET current mirrors. All devices were simulated with real model parameters extracted and provided in the process design kit (PDK) for the BiCMOS technology [4.20].

### 4.7 Summary

Based on the current steering technique, we integrated the subtractor circuit with a 3-bit DAC. A similar design has been demonstrated in [4.12] for an operating frequency of 75 MHz. The authors of this reference used external control circuits to monitor the DAC currents. In this work, we used a bipolar pre-distortion technique to control the linear I/O current relation over a wide range. This allows us to use either large or small currents in the circuitry depending on the process variation (for both the current devices and the resistors). Our simulation results showed good performance in the presence of device mismatches. Regarding the control signals, we showed a maximum deviation of 0.39 mV at the differential output. A deviation of 0.218 LSB can be achieved when a geometry ratio of 3:1 is used for the MOSFET current source. The CSS provides an ENOB of 7.98 bits (at an input of 220.3 MHz) and an SFDR of 77.25 dB.

Other components in the two-step ADC, such as the residue amplifier and delay elements, were also presented. In addition, we used three-input logic gates to correct single-bubble errors in our thermometer-to-binary encoder. Since the full scale range (1 V) can be used for each step, the LSBs of the two steps are not difficult to distinguish. Thus, single-bubble correction is adequate in our design.

Chapter 5: ADC Implementation: Fully Differential Two-Step and Interpolating

### 5.1 Introduction

The flash-type ADC is the most commonly used architecture in high-speed applications. However, power and area consumption limit its application to resolutions between 4 and 6 bits. The power and area double whenever an additional bit is needed. Furthermore, the total input capacitance of the comparator array also doubles, which reduces the slew rate and makes the speed requirement harder to meet. To overcome these limitations, we employed two-step [5.1] and interpolation [5.2] [5.3] techniques in our ADCs.

We will discuss the two-step architecture in more detail in Section 5.2. The comparator interpolation technique and its benefits are introduced in Section 5.3. The final implementations of the 5-bit interpolating ADC and 8-bit two-step interpolating ADC are shown in Section 5.4. The layouts and measurement results are presented in Section 5.5. Finally, a brief summary of the chapter is given in Section 5.6.

### 5.2 Fully Differential Two-Step Structure

#### 5.2.1 Fully Differential Architecture

A fully differential analog circuit has twice the input/output swing of a single-ended circuit. Because of its odd symmetry, the even-order distortion is cancelled out at the differential outputs. Thus, the total harmonic distortion (THD) can be much reduced. In the previous two chapters, we have seen that the T/H, comparator, Subtractor/DAC and residue amplifier are all fully differential.

#### 5.2.2 Two-Step Structure

Fig. 5.1 Block diagram of multi-step ADC.

The advantages of a multi-step ADC are the reductions in power, chip area and input capacitance of the comparator array. By dividing the total number of bits into a few steps, we can also reduce the loading on the T/H. Fig. 5.1 shows the block diagram of a multi-step ADC (or pipelined ADC). The input stage is a T/H, which samples the input signal, followed by M stages of sub-ADCs. The input of each sub-conversion stage is the amplified residue: the previously converted result is converted back to analog form and subtracted from the original analog input signal. The resulting residue signal is amplified to the full scale range and then converted to digital codes. By doing so for each stage, the total resolution becomes the sum of the resolutions provided by each stage. However, a multi-step ADC with many stages requires more complex circuits. Therefore, we used a two-step architecture to limit the complexity.
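The saving in comparator count can be made concrete with a short calculation. The sketch below only counts comparators, assuming each step is a small flash converter with $2^N - 1$ comparators; it ignores the DAC, subtractor and residue amplifier that a multi-step converter adds, and the function names are hypothetical.

```python
def flash_comparators(bits):
    """An N-bit flash converter needs 2^N - 1 comparators."""
    return 2 ** bits - 1


def multi_step_comparators(bits_per_step):
    """Total comparators when each step is its own small flash converter."""
    return sum(flash_comparators(b) for b in bits_per_step)


print("8-bit flash:   ", flash_comparators(8))
print("3-bit + 5-bit: ", multi_step_comparators([3, 5]))
print("4-bit + 4-bit: ", multi_step_comparators([4, 4]))
```

Comparator count alone does not fix the split between the two steps, since the current drawn by the DAC and subtractor also grows with the resolution of the first step.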

As mentioned in Chapter 4, this 8-bit ADC is made of a 3-bit ADC and a 5-bit ADC. Why not 4-bit plus 4-bit? Because the Subtractor/DAC design employs an equally-weighted DAC, adding one bit to the first step makes the subtractor sink double the current. Such a relatively large current might cause the core transistors in the subtractor to leave the forward-active region. As a result, a 3-bit DAC performs better than a 4-bit DAC.

### 5.3 Interpolating-Averaging Technique

#### 5.3.1 Comparator Interpolation

The interpolating techniques were introduced in flash ADC technology to reduce both the power and the area of the chip. In this work, the second step is a 5-bit ADC. A conventional 5-bit flash ADC needs 31 comparators. This means 31 preamplifiers and 31 latches. However, with an interpolating factor of 2 [5.4], we only need 11 preamplifiers. In other words, we took two preamplifiers away from every three comparator sets. Fig. 5.2 depicts how we have reduced the total number of preamplifiers in the comparator array.

Fig. 5.2 The interpolation architecture.

Fig. 5.3 Generation of $V_B$ and $V_C$.

Fig. 5.3 depicts the relations between $V_A$, $V_B$, $V_C$ and $V_D$ as labeled in Fig. 5.2. Although two preamplifiers are removed, the resistor ladder can still generate linear signals. A descending input signal is shown on the left of Fig. 5.3. On the right are the corresponding output signals of the interpolating preamplifiers. As time goes forward, both $V_A$ and $V_D$ decrease, and so do $V_B$ and $V_C$. $V_B$ and $V_C$ should lie between $V_A$ and $V_D$. Under the assumption of a linear gain given by the preamplifiers, $V_B$ and $V_C$ can be produced by the set-up in Fig. 5.2. The resulting $V_B$ and $V_C$ signals are shown in Fig. 5.3.
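The interpolation described above can be sketched numerically. The model below is an assumption-laden illustration: it takes the preamplifiers to be perfectly linear, takes the interpolating divider to be a string of three equal resistors between $V_A$ and $V_D$, and places $V_B$ next to $V_A$. The gain, reference levels and function names are hypothetical.

```python
def preamp(vin, vref, gain=4.0):
    """Linear preamplifier model (assumed): output proportional to vin - vref."""
    return gain * (vin - vref)


def interpolate(v_a, v_d):
    """Two taps on an assumed string of three equal resistors between V_A and V_D."""
    return (2.0 * v_a + v_d) / 3.0, (v_a + 2.0 * v_d) / 3.0


def zero_crossing(signal, lo, hi, steps=60):
    """Bisection for the input level at which signal(vin) changes sign."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (signal(lo) > 0.0) == (signal(mid) > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


LSB = 1.0 / 32                        # 5-bit step for a 1 V range
REF_A, REF_D = 10 * LSB, 13 * LSB     # two real preamplifiers, three LSBs apart


def v_b(vin):
    return interpolate(preamp(vin, REF_A), preamp(vin, REF_D))[0]


def v_c(vin):
    return interpolate(preamp(vin, REF_A), preamp(vin, REF_D))[1]


zc_b = zero_crossing(v_b, 0.0, 1.0) / LSB   # expected near 11 LSB
zc_c = zero_crossing(v_c, 0.0, 1.0) / LSB   # expected near 12 LSB
preamps_needed = -(-31 // 3)                # ceiling of 31 / 3
print(zc_b, zc_c, preamps_needed)
```

With linear preamplifiers the interpolated zero crossings fall on the two missing reference levels, and keeping one preamplifier out of every three comparator positions leaves 11 preamplifiers for 31 latches.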

5.3.2 Averaging

Through comparator interpolation, the zero-crossing point for every latch should ideally be the same as that in the flash ADC. Furthermore, the input offsets are also averaged by the resistive voltage dividers. Consequently, we can obtain smaller DNLs [5.2] [5.4] [5.5]. Fig. 5.4 depicts the probable zero-crossing distribution of a flash converter [5.6]. Each zero-crossing point is normally distributed (i.e. has a Gaussian distribution) with a standard deviation of $\sigma$.

![Diagram of zero-crossing points](image.png)

Fig. 5.4 Zero-crossing points of a flash comparator array with an LSB of 2 mV [5.6].
Because the input offsets of each comparator are normally distributed, the zero-crossing position is also a Gaussian distribution for each comparator. Since the DNL is directly related to the zero-crossing position, this probability distribution can be taken as the DNL distribution. Therefore, a larger LSB and/or a smaller standard deviation lead to a smaller DNL. Theoretically, the standard deviation in the proposed interpolating architecture can be reduced to just one half of the original one. By running Monte Carlo simulation with random input offsets, we have in fact reduced the standard deviation to 45% of the original one.
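The dependence of the DNL on the LSB size and on the offset standard deviation can be illustrated with a small Monte Carlo sketch. It models a plain flash array only (no interpolation or averaging), uses the 2 mV LSB of Fig. 5.4, and assumes a hypothetical offset sigma of 0.4 mV; it does not reproduce the simulation reported in this work.

```python
import random


def simulate_dnl(n_comparators, lsb, sigma, seed=0):
    """DNL (in LSB) for one random draw of Gaussian comparator offsets."""
    rng = random.Random(seed)
    unit_offsets = [rng.gauss(0.0, 1.0) for _ in range(n_comparators)]
    # Each zero crossing is the ideal threshold plus a scaled random offset.
    thresholds = [(k + 1) * lsb + sigma * unit_offsets[k]
                  for k in range(n_comparators)]
    # Code width between adjacent zero crossings, relative to one LSB.
    return [(thresholds[k + 1] - thresholds[k]) / lsb - 1.0
            for k in range(n_comparators - 1)]


def rms(values):
    return (sum(v * v for v in values) / len(values)) ** 0.5


base = simulate_dnl(31, lsb=2e-3, sigma=0.4e-3)         # hypothetical sigma
half_sigma = simulate_dnl(31, lsb=2e-3, sigma=0.2e-3)   # halve the offsets
double_lsb = simulate_dnl(31, lsb=4e-3, sigma=0.4e-3)   # double the LSB
print(rms(base), rms(half_sigma), rms(double_lsb))
```

Because the same random draw is reused, halving sigma or doubling the LSB each halves the RMS DNL in this model.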

The standard deviation (or input offset) can be decreased by increasing the interpolation factor. However, since the comparator is operated with large signals, the number of latches, the resistor loading and the signal levels all affect the accuracy of the comparator interpolation. The interpolating factor of 2 was carefully chosen through simulation and optimization.

5.4 Full Implementation

To verify the employed techniques and structures, we have designed two prototype ADCs: a 5-bit interpolating ADC and an 8-bit two-step interpolating ADC. The former showed that more than one half of the power can be saved at 2-GSample/s. The latter showed that the CSS can be employed in high-speed conversion. The prototypes, with a full scale range of 1 $V_{p-p}$, have been designed and simulated in a 0.13-µm SiGe BiCMOS technology.
The first ADC, the 5-bit interpolating design, was characterized at 2-GSample/s. Without the comparator interpolation, the total power would be 132.37 mW. By utilizing comparator interpolation, the total power dissipation of the ADC has been reduced to 66.14 mW (without output buffers).

![DNL and INL of the 5-bit ADC](image)

**Fig. 5.5 DNL and INL of the 5-bit ADC.**

To characterize the linearity provided by the interpolation technique, a static simulation has been conducted. The simulation showed that the 5-bit ADC with the interpolating factor of 2 achieves a DNL of 0.114 LSB and an INL of 0.076 LSB. The results are shown in Fig. 5.5. The effective number of bits (ENOB) is equal to 4.3 bits with an input sine wave of 50.7 MHz. The spectrum for an input frequency ($f_{in}$) of 700.3 MHz at 2-GSample/s is shown in Fig. 5.6. The ENOB for this near-Nyquist-rate frequency is approximately 4.1 bits. Fig. 5.7 shows the simulated SNDR versus input frequency. Table 5.1 summarizes the projected performance of the 5-bit interpolating flash ADC.
Fig. 5.6 Simulated spectra for $f_{\text{in}} = 700.3$ MHz at 2-GSample/s.

Fig. 5.7 Simulated SNDR vs. input frequency.

TABLE 5.1

<table>
<thead>
<tr>
<th colspan="2">Performance Summary</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.13-µm 2.5-V SiGe BiCMOS</td>
</tr>
<tr>
<td>Sampling rate</td>
<td>2 GSample/s</td>
</tr>
<tr>
<td>Input range</td>
<td>1 $V_{\text{p-p}}$</td>
</tr>
<tr>
<td>Resolution</td>
<td>5 bit</td>
</tr>
<tr>
<td>DNL/INL</td>
<td>0.114/0.076 LSB</td>
</tr>
<tr>
<td>ENOB</td>
<td>4.3‡/4.1‡‡ bits</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>66.14 mW</td>
</tr>
</tbody>
</table>

‡ $f_{\text{in}} = 50.7$ MHz.

‡‡ $f_{\text{in}} = 700.3$ MHz.
The second prototype utilized the CSS to achieve 8-bit resolution. Fig. 5.8 illustrates the block diagram of this design. The ADC employed a two-step (or two-stage) pipelined architecture, and each stage further utilized comparator interpolation to reduce the total power dissipation. The overall conversion was composed of a 3-bit first-step ADC and a 5-bit second-step sub-ADC. The two-step architecture reduced the loading capacitance for the input T/H in order to speed up the operation. Moreover, with fewer stages it was simpler to generate clocks and to synchronize the signal flow.

Fig. 5.8 The block diagram of the proposed two-step interpolating ADC.

In Fig. 5.8, the input stage is a T/H that tracks the input differential signals for half a period and holds them for the other half period. The stored signals are sent to the first-step 3-bit ADC for conversion. Meanwhile, the held signals are also passed to the second-step ADC. The front-end of the second step is a delay amplifier that delays the held signals by the same propagation delay required in the first-step (coarse) conversion.

Fig. 5.9 Clock diagrams of the ADC

Fig. 5.9 illustrates the clock distribution for each clocked block. We used non-overlapping clocks throughout the system. The “Track” clock and the “Hold” clock are fed into the “Track” terminals and the “Hold” terminals, respectively, of the T/H mentioned in Chapter 3. While the T/H holds the signal, the first-step comparators amplify the input signals. When the T/H tracks the signal, the comparators latch the compared results. Therefore, the “ck1” clock is fed into the “CK” terminal of the latches (as shown in Fig. 3.20 in Chapter 3) in the first step. The $\overline{CK}$ terminal of the latch receives the inverse phase of “ck1”.

According to Fig. 4.17 (a), “ckd1” is the same as the “Hold” clock and “ckd2” is the same as the “Track” clock. Thus, the outputs of the delay amplifier have one-period latency. The CSS buffers the input signals and subtracts the first-step output when the second T/H in the delay amplifier holds the signals (that is, during the “ckd1” clock period). Then, the CSS is reset during the high period of “ckd2” to erase the previous operation. Thus, “cks” is in phase with the “ckd2” clock. The residue amplifier is reset by “ckr”, which is the inverted version of “cks”. The comparators in the second step amplify the input signals when the residue amplifier is in the amplification mode (“ckr” low) and latch the results when the residue amplifier is in the reset mode. Thus, “ck2” is applied to the “CK” terminals of the comparators in the second-step ADC.

The ADC has been characterized at 500-MSample/s without any calibration or trimming correction. Fig. 5.10 shows the simulated distribution of the digital codes when differentially descending ramp signals were applied to the inputs. The ideal code width is 15 ns and the rail-to-rail level is 0-to-2.5 V. Fig. 5.11 extracts the linearity performance from Fig. 5.10. It shows a maximum DNL of 0.33 LSB and a maximum INL of 0.40 LSB. The prototype ADC reached this performance without any compensation or any extra circuitry.
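The extraction of DNL and INL from the code distribution of a ramp test can be sketched as follows. This is a generic illustration, not the script used for Fig. 5.11: the function name is hypothetical, the average code width is taken as one LSB, and the four example widths (in ns, around the 15 ns ideal width quoted above) are made up.

```python
def dnl_inl_from_widths(widths):
    """DNL and INL (in LSB) from the measured duration of each output code."""
    ideal = sum(widths) / len(widths)     # average width taken as one LSB
    dnl = [w / ideal - 1.0 for w in widths]
    inl, running = [], 0.0
    for d in dnl:                         # INL is the running sum of the DNL
        running += d
        inl.append(running)
    return dnl, inl


# Hypothetical code widths in ns for four adjacent codes of a linear ramp.
dnl, inl = dnl_inl_from_widths([15.0, 16.5, 13.5, 15.0])
print(max(abs(d) for d in dnl), max(abs(i) for i in inl))
```

A code that lasts 16.5 ns instead of 15 ns is 10% too wide, which this sketch reports as a DNL of +0.1 LSB.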

![Fig. 5.10 Distribution of the digital output codes.](image)
The CSS was characterized with a sine wave input of 220.3 MHz. During the simulation time span, the period was divided into eight equal intervals and the reset switch was off. Each of the intervals represents one of the eight possible thermometer codes (0000000, 0000001, 0000011, 0000111, 0001111, 0011111, 0111111 and 1111111). Thus, the output was still a sine wave, but with a one-LSB shift of the common mode in each interval. We calculated the average of the spectra over the eight intervals. Fig. 5.12 shows the dynamic frequency spectra of the CSS. The SNDR is 49.82 dB and the spurious free dynamic range (SFDR) is equal to 77.25 dB. Note that at least an equivalent 5-bit resolution should be given by the CSS.
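The eight thermometer codes listed above can be generated with a few lines of code. This is a minimal sketch; the function names are hypothetical and the bit ordering simply follows the way the codes are written in the text.

```python
def thermometer(level, width=7):
    """Thermometer code with `level` ones, written as in the text."""
    return "0" * (width - level) + "1" * level


def thermometer_to_level(code):
    """Decode a valid thermometer code by counting its ones."""
    return code.count("1")


codes = [thermometer(k) for k in range(8)]   # 3-bit first step: 8 levels
print(codes)
```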
Fig. 5.12 Simulated frequency spectra of the CSS with $f_{\text{in}} = 220.3$ MHz.

The ADC achieved an SNDR of 46.73 dB, which is equal to an ENOB of 7.5 bits, for a 10.7 MHz sine wave input and a low static power consumption of 172 mW (which does not include the output buffers). Fig. 5.13 shows the simulated spectra.
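The SNDR-to-ENOB conversion used above follows the standard relation ENOB = (SNDR - 1.76 dB) / 6.02 dB for a full-scale sine wave. A minimal sketch, with hypothetical function names:

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits for a full-scale sine wave input."""
    return (sndr_db - 1.76) / 6.02


def sndr_from_enob(enob_bits):
    """Inverse relation: SNDR in dB for a given ENOB."""
    return 6.02 * enob_bits + 1.76


adc_enob = enob_from_sndr(46.73)   # 8-bit ADC at f_in = 10.7 MHz
css_enob = enob_from_sndr(49.82)   # CSS alone at f_in = 220.3 MHz
print(round(adc_enob, 2), round(css_enob, 2))
```

With the SNDR values quoted in this section, the relation gives about 7.47 bits for the ADC (reported as 7.5 bits) and about 7.98 bits for the CSS.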

Fig. 5.13 Simulated frequency spectra of the ADC with $f_{\text{in}} = 10.7$ MHz.
The SNDR for a 220.3 MHz sine wave is equivalent to 6.9 bits at 500-MSample/s. Fig. 5.14 shows the SNDR versus input frequency. Table 5.2 summarizes the simulated performance and compares the characteristics of this design with two previously reported 8-bit two-step ADCs [5.7] and [5.8]. The table reveals that the presented design operated at a much higher conversion rate with competitive static and dynamic performance. Both the DNL and INL of the proposed ADC are better than those of [5.7]. Although [5.8] has the best DNL/INL performance with the lowest power dissipation, the loading for the ADC is at 90% of the full output loading. Therefore, the total number of output codes is less than 256.

Fig. 5.14 Simulated SNDR vs. input frequency.
TABLE 5.2

PERFORMANCE SUMMARY AND COMPARISON

<table>
<thead>
<tr>
<th>Ref.</th>
<th>This design</th>
<th>[5.7]</th>
<th>[5.8]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>2.5-V 0.13 µm SiGe BiCMOS</td>
<td>3.8-V 0.4 µm CMOS</td>
<td>2.5-V 0.13 µm CMOS</td>
</tr>
<tr>
<td>Conversion rate</td>
<td>500 MHz</td>
<td>100 MHz</td>
<td>125 MHz</td>
</tr>
<tr>
<td>Power</td>
<td>172 mW</td>
<td>167.6 mW*</td>
<td>21 mW</td>
</tr>
<tr>
<td>DNL</td>
<td>-0.33/0.21 LSB</td>
<td>-0.51/0.80 LSB</td>
<td>-0.15/0.15 LSB</td>
</tr>
<tr>
<td>INL</td>
<td>-0.29/0.40 LSB</td>
<td>-0.58/0.59 LSB</td>
<td>-0.25/0.25 LSB</td>
</tr>
<tr>
<td>ENOB</td>
<td>7.5 bits</td>
<td>7.3† bits</td>
<td>7.6‡ bits</td>
</tr>
</tbody>
</table>

* With a supply voltage of 3.8 V.

† At an input frequency of 10 MHz.
‡ At an input frequency of 8 MHz with 90% ADC output loading.

5.5 Circuit Layout and Measurement Results

5.5.1 Circuit Layout

In order to verify the feasibility of the proposed design, we have taped out several ADCs to the IBM East Fishkill foundry for IBM 8HP runs through the Department of Defense (DOD) Trusted Access Program Office (TAPO). Details of the 8HP process are given in [5.9]. 8HP is a BiCMOS process with 130 nm design rules. The three main categories are a 5-bit interpolating-flash ADC, a 6-bit two-step ADC and an 8-bit two-step interpolating ADC. In addition, discrete subsystems were fabricated separately. Fig. 5.15 shows the chip layout of the 5-bit interpolating-flash ADC. This layout included input clock buffers and output buffers. The input clock buffers were made of inverter chains to convert sine wave signals into pulse signals. The output buffers were also made of inverter chains to drive the output load of the chip. The chip size is 2 mm × 2 mm, while only a 1 mm × 1 mm area is occupied by the T/H and comparators.

Fig. 5.16 shows the 6-bit ADC and 8-bit ADC layouts.

Fig. 5.15 Chip layout of the 5-bit interpolating ADC.
Fig. 5.16 Chip layout of the (a) 6-bit two-step interpolating ADC and (b) 8-bit two-step interpolating ADC.
5.5.2 ADC Testing and Measurement Results

Typically, the characterization of an ADC involves static and dynamic tests. The static test measures the static performance of the ADC, such as the DNL, INL, gain error and offset. The dynamic test evaluates the dynamic performance of the ADC. The dynamic specifications include the ENOB, SNDR and SFDR.

The static performance is measured when a slowly varying ramp signal is fed into the ADC. Ideally, the output digital codes should be *monotonic*. By mapping these time-dependent digital codes to an I/O transfer plot and comparing to the original ramp signal, we can find the static specifications of the ADC.

Generally, the inputs for the ADC dynamic test are high-frequency sine waves. Thus a *continuous wave generator* (CWG) should be used to generate the input signal. By measuring the transient response and performing the Fourier transformation, we can get the spectral information. From there, we can obtain the ENOB, SNDR and SFDR. The same function can be performed by a spectrum analyzer directly.
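The dynamic test flow described above (sample a sine wave, transform to the frequency domain, read off SNDR and ENOB) can be sketched with a direct DFT. The model below is an assumption: it uses an ideal 8-bit quantizer instead of the actual ADC, coherent sampling with a hypothetical record of 512 points and 37 input cycles, and no windowing.

```python
import cmath
import math


def quantize(x, bits):
    """Ideal mid-rise quantizer for x in [-1, 1); returns the reconstructed level."""
    levels = 2 ** bits
    code = min(max(int(math.floor((x + 1.0) / 2.0 * levels)), 0), levels - 1)
    return (code + 0.5) * 2.0 / levels - 1.0


def sndr_db(samples, signal_bin):
    """SNDR from a coherently sampled record, using a direct DFT."""
    n = len(samples)
    signal, noise = 0.0, 0.0
    for k in range(1, n // 2):                      # skip the DC bin
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if k == signal_bin:
            signal += abs(acc) ** 2
        else:
            noise += abs(acc) ** 2                  # noise plus distortion
    return 10.0 * math.log10(signal / noise)


n, cycles, bits = 512, 37, 8                        # hypothetical test record
record = [quantize(math.sin(2 * math.pi * cycles * i / n + 0.1), bits)
          for i in range(n)]
enob = (sndr_db(record, cycles) - 1.76) / 6.02
print(round(enob, 2))
```

For an ideal quantizer the result should land close to the nominal resolution; a real converter shows a lower ENOB because noise and distortion add to the quantization error.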

To verify the feasibility of the proposed building blocks and ADCs in the 8HP SiGe technology, we conducted a functionality experiment. A wire-wrap board was built as the sample holder. Fig. 5.17 shows the testing setup. The device under test (DUT) is our 5-bit interpolating ADC (in a DIP40 package), which was mounted on the wire-wrap board. The DUT was biased by a 2.5-V supply. The input signal (a single-ended sinusoidal wave) was generated by the Agilent 33220A 20 MHz Function/Arbitrary Waveform Generator (AWG). The two non-overlapping clocks were generated by the SONY/Tektronix AWG 2041. The digital outputs were probed by the Tektronix MSO 4104 Mixed Signal Oscilloscope, which supports measurements up to 400 MHz.

![Diagram of verification setup]

Fig. 5.17 Verification setup.

The common mode for the input sinusoidal signal was tuned to around 1.85 V and the amplitude was 500 mV. The Track-mode clock for the T/H and the latch-mode clock for the comparators were driven by CH1 of the SONY/Tektronix AWG 2041. The Hold-mode clock for the T/H and the amplification-mode clock for the comparators were driven by $\overline{CH1}$ (complementary to CH1). The clock signals have an offset of 1 V and an amplitude of 1.2 V, that is, a sinusoidal wave swinging between 0.4 V and 1.6 V (ideally, we need a full swing from 0 V to 2.5 V). When an input sinusoidal signal of 1.33 kHz was applied and sampled at 1 MHz, the output digital signals were measured by the Tektronix MSO 4104 Mixed Signal Oscilloscope. Fig. 5.18 shows the measured digital output waveform of B4. B4 is the MSB, B3 is the second MSB, and so on, so B0 is the LSB. Under the testing condition (single-ended input but with 100% full scale), Fig. 5.18 shows the desired output waveform: equal duty cycle for logic 1 and logic 0.

Fig. 5.18 Measured digital output waveform for B4 at 1-MSPS.

Fig. 5.19 Measured digital output waveform for B4 at 10-MSPS. Left: $f_{in} = 154.39$ Hz; right: $f_{in} = 57$ kHz.

Fig. 5.19 shows B4 sampled at 10 MHz for input frequencies of 154.39 Hz and 57 kHz, respectively. As expected, both output waveforms are square waves with equal duty cycle. Therefore, the measurement is consistent with the simulation result.

Fig. 5.20 shows the input sinusoidal signal (In) that has an amplitude around 1 $V_{p-p}$ (from 1.3 V to 2.26 V) and a frequency of 1 kHz. The input was converted at 10-MSPS. Fig. 5.20 also shows the 5 digital outputs (B4 – B0). Because a relatively low input frequency and square-wave clocks have been applied, all 5 output bits latch cleanly to the digital rails (0 V and 2.5 V).

Fig. 5.20 Measured digital output waveforms: $f_{in} = 1$ kHz, input amplitude ~ 1 V$_{p-p}$, $f_s = 10$ MHz.

Fig. 5.21 Measured digital output waveforms: $f_{in} = 5$ MHz, input amplitude ~ 1 V$_{p-p}$, $f_s = 10$ MHz.

To test Nyquist-rate sampling, we increased the input frequency to 5 MHz and maintained the sampling rate at 10 MHz. The outputs are shown in Fig. 5.21. Both B4 and B3 suffer from some high-frequency impulses that may be generated by aliasing tones, since no anti-aliasing filter is placed in the front-end of the ADC. Similar waveforms can be found for the other bits (B0 – B2). However, their waveforms stay at either 0 V or 2.5 V for most of the time, since the sampling rate is only twice the input frequency.
Fig. 5.22 Measured digital output waveforms: $f_{\text{in}} = 20$ MHz, input amplitude $\sim 1$ V$_{\text{p-p}}$, $f_s = 5$ MHz.

Since the under-sampling technique has been utilized in communications, the chip was also characterized in the under-sampling condition ($f_{\text{in}} > f_s$). Fig. 5.22 shows the outputs (B4 – B0) and the clock signal (CK is not to scale in the time domain). Compared to the previous measurements, the outputs have the best latching property (no impulses in the waveforms). Therefore, considering the output waveforms, this implementation would be a better choice in under-sampling applications.
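For the three measurement conditions above, the apparent (folded) input frequency after sampling can be estimated with the usual aliasing relation. This is a sketch under ideal assumptions: the function name is hypothetical, the nominal frequencies from the figure captions are used, and the two signal generators are treated as exactly on frequency.

```python
def alias_frequency(f_in, f_s):
    """Apparent frequency of a tone at f_in after ideal sampling at f_s."""
    r = f_in % f_s
    return min(r, f_s - r)        # fold into the first Nyquist zone [0, f_s/2]


cases = {
    "Fig. 5.20": (1e3, 10e6),     # oversampled
    "Fig. 5.21": (5e6, 10e6),     # input at f_s / 2
    "Fig. 5.22": (20e6, 5e6),     # under-sampled
}
for name, (f_in, f_s) in cases.items():
    print(name, alias_frequency(f_in, f_s))
```

With the nominal values, the 20 MHz input sampled at 5 MHz is an exact multiple of the sampling rate and folds to 0 Hz in this ideal model, so the sampled value changes only as fast as any residual frequency offset between the two generators.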

Fig. 5.23 Die photos, panels (a) and (b). Annotated features: bondpad, density fill patterns, MIM caps.

Fig. 5.23 shows the die photos for one of the chips. Since the IBM process utilized a thick passivation layer and thick metals (on top, near the surface) for density fill patterns, the details of the circuit can hardly be seen in these die photos. Fig. 5.23 (a) shows that the die is filled with density fill patterns. Only the MIM capacitors, which used a thick metal as the top plate, can be seen in Fig. 5.23 (b).

5.6 Summary

Two key techniques, the two-step structure and comparator interpolation, were applied to the fully differential ADCs. The two-step structure not only reduced the input capacitance of the comparator array but also reduced the total power and area. Furthermore, the comparator interpolation technique reduced the power and area of each sub-ADC. The interpolation also averaged the offsets of the comparator preamplifiers and resulted in better DNLs.

The 5-bit interpolating ADC has a DNL of 0.114 LSB and an INL of 0.076 LSB. It consumes 66.14 mW and achieves an ENOB of 4.3 bits and 4.1 bits for low input frequency and near Nyquist rate at 2-GSPS, respectively. The chip size is 4 mm$^2$. The 8-bit two-step interpolating ADC has a DNL of 0.33 LSB and an INL of 0.40 LSB. It consumes 172 mW and achieves an ENOB of 7.5 bits and 6.9 bits for low input frequency and near Nyquist rate at 500-MSPS, respectively. The chip size is 8.75 mm$^2$. Through the measurement results, we also showed that the implementation is feasible in the IBM 8HP SiGe HBT technology.
Chapter 6: Conclusion and Contributions

6.1 ADC Building Blocks

This dissertation presented high-speed T/H and comparator designs in SiGe HBT technology. We discussed the design considerations and tradeoffs based on the high sampling rate. The T/H has a bandwidth of 31.83 GHz and a power dissipation of 13 mW. The comparator has an accumulative gain of 23 dB and consumes 6 mW. The T/H and comparator have been characterized at 2 GHz and the results showed that they are applicable in our 5-bit ADC (2-GSPS) and 8-bit ADC (500-MSPS).

A novel analog subtractor/DAC (the CSS) combinational circuit has been demonstrated and employed in the two-step ADC. The linear relation between I/O current allows us to design a higher resolution two-step ADC. We can guarantee the accuracy of the smaller current through a large current, because a large current source is less affected by process variation. Using this large current source to control the smaller currents is more accurate than generating the small currents directly. We also showed this new topology can reduce the complexity of the control circuitry and it has low sensitivity to the device mismatches caused by the process variations. The simulation result showed the largest deviation caused by the input control voltages is 0.30 mV at the output. This deviation is negligible in our 8-bit ADC. In addition, we have seen the deviation caused by the geometry of the MOSFET current sources can be minimized to 0.218 LSB. It can be further improved, but the power dissipation will also be increased.
6.2 High-Speed ADCs

Both the two-step and interpolation techniques helped us to reduce the power and area of our 5-bit ADC and 8-bit two-step ADC. Through the comparator interpolation technique, the comparator array has a bandwidth of 13.78 GHz for our 5-bit ADC. Based on the circuits and system topologies developed, several chips have been simulated, laid out and fabricated. The simulation results showed competitive performance when compared to the state-of-the-art ADCs. These claims were backed up by limited testing of fabricated parts.

6.3 Future Work

In order to improve the dynamic performance, future implementations can refine the designs, to first order, by increasing the current drives. For instance, increasing the biasing current for the comparator and the CSS can improve the signal-to-noise ratio (SNR). Consequently, the ENOB can be increased. In addition, based on the proposed techniques, ADCs with higher resolutions would be a direction with a broad application field.
Appendix A: Distortion in the Differential Amplifier

A.1 Nonlinearity

The output of a nonlinear system can be represented by a weighted power series of the input. Suppose the input signal in the time domain is $x(t)$; the output becomes:

$$y(t) = a_0 + a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + \cdots. \tag{A.1}$$

In (A.1), the weighted coefficients, $a_0$, $a_1$, $a_2$, $\cdots$, are determined by the system architecture. We can apply (A.1) in a differential amplifier and get the two outputs as:

$$y_+(t) = a_0 + a_1 x(t) + a_2 x^2(t) + a_3 x^3(t) + \cdots, \tag{A.2}$$

$$y_-(t) = a_0 - a_1 x(t) + a_2 x^2(t) - a_3 x^3(t) + \cdots. \tag{A.3}$$

We obtain (A.3) since the input signal is $-x(t)$. Therefore, the differential output becomes:

$$y_{\text{diff}}(t) = y_+(t) - y_-(t)$$

$$= 2[a_1 x(t) + a_3 x^3(t) + a_5 x^5(t) + \cdots]. \tag{A.4}$$

The even-order terms cancel and drop out of the I/O relationship. As a result, differential architectures are extensively used to reduce circuit distortion.
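The cancellation in (A.4) can be checked numerically. The sketch below evaluates a truncated power series with hypothetical coefficients $a_0$ to $a_5$ for inputs $x$ and $-x$ and compares the difference with twice the odd-order terms.

```python
def single_ended(x, coeffs):
    """Truncated version of (A.1): sum of a_n * x**n."""
    return sum(a * x ** n for n, a in enumerate(coeffs))


def differential(x, coeffs):
    """y_+ - y_-, where the second output sees the inverted input -x."""
    return single_ended(x, coeffs) - single_ended(-x, coeffs)


coeffs = [0.1, 1.0, 0.2, 0.05, 0.01, 0.002]   # hypothetical a0..a5
x = 0.3
odd_only = 2.0 * sum(a * x ** n for n, a in enumerate(coeffs) if n % 2 == 1)
print(differential(x, coeffs), odd_only)
```

The two printed values agree, since the constant term and the even-order terms are identical in both outputs and subtract away.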
Suppose $v_{id}$ and $-v_{id}$ are the differential inputs to the amplifier as shown in Fig. A.1. The output currents can be obtained through the square law:

$$I_1 = \frac{I_{tail}}{2} + \frac{\mu C_{ox}}{4} \frac{W}{L} v_{id} \sqrt{\frac{4 I_{tail}}{\mu C_{ox} \frac{W}{L}} - v_{id}^2}, \tag{A.5}$$

$$I_2 = \frac{I_{tail}}{2} - \frac{\mu C_{ox}}{4} \frac{W}{L} v_{id} \sqrt{\frac{4 I_{tail}}{\mu C_{ox} \frac{W}{L}} - v_{id}^2}. \tag{A.6}$$

Given $|x|<1$, we have:

$$\sqrt{1+x} = \sum_{n=0}^{\infty} \frac{(-1)^n (2n)!}{(1-2n)(n!)^2 4^n} x^n.$$

Since both $M_1$ and $M_2$ are in the saturation region for amplification, $|v_{id}| < \sqrt{\frac{2 I_{tail}}{\mu C_{ox} \frac{W}{L}}}$.
Thus, the differential output voltage becomes:

$$v_{od} \equiv (I_1 - I_2) \cdot R = \frac{\mu C_{ox}}{2} \frac{W}{L} \sqrt{\frac{4 I_{tail}}{\mu C_{ox} \frac{W}{L}}} \, R \left[ v_{id} - \frac{\mu C_{ox} \frac{W}{L}}{8 I_{tail}} v_{id}^3 - \frac{\left(\mu C_{ox} \frac{W}{L}\right)^2}{128 I_{tail}^2} v_{id}^5 - \cdots \right]. \tag{A.7}$$

Therefore, only odd-order terms remain; the even-order harmonics cancel each other out.

A.3 BJT Differential Amplifier

![BJT differential amplifier diagram](image)

Fig. A.2 BJT differential amplifier.

Suppose the DC bias voltage for $Q_1$ and $Q_2$ in Fig. A.2 is $V_{BE0}$ and the differential inputs are $v_{id}$ and $-v_{id}$. The output currents can be represented by the diode current equation. That is:

$$I_1 = I_s e^{\frac{V_{BE0} + v_{id}}{V_T}}. \tag{A.8}$$

Similarly,

$$I_2 = I_s e^{\frac{V_{BE0} - v_{id}}{V_T}}. \tag{A.9}$$
Given

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \tag{A.10}$$

we have the differential output current as:

$$I_{od} = I_1 - I_2 = 2 I_s e^{V_{BE0}/V_T} \left[ \frac{v_{id}}{V_T} + \frac{1}{6} \left( \frac{v_{id}}{V_T} \right)^3 + \frac{1}{120} \left( \frac{v_{id}}{V_T} \right)^5 + \cdots \right]. \tag{A.11}$$

The differential output voltage is the product of (A.11) and R. Similarly, the even-order harmonics cancel each other out.
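The odd-order series for the BJT pair can be compared numerically with the exact difference of the two exponential currents. The device values below (saturation current, bias voltage, thermal voltage) are hypothetical and chosen only to make the sketch runnable.

```python
import math

I_S = 1e-16      # hypothetical saturation current, in A
V_BE0 = 0.8      # hypothetical DC base-emitter bias, in V
V_T = 0.02585    # thermal voltage near room temperature, in V


def exact_diff_current(v_id):
    """I_1 - I_2 evaluated directly from the two exponential currents."""
    i1 = I_S * math.exp((V_BE0 + v_id) / V_T)
    i2 = I_S * math.exp((V_BE0 - v_id) / V_T)
    return i1 - i2


def series_diff_current(v_id):
    """Odd-order series truncated after the fifth-order term."""
    u = v_id / V_T
    return 2.0 * I_S * math.exp(V_BE0 / V_T) * (u + u ** 3 / 6.0 + u ** 5 / 120.0)


v_id = 5e-3      # 5 mV differential drive
rel_error = abs(series_diff_current(v_id) / exact_diff_current(v_id) - 1.0)
print(rel_error)
```

For a small drive such as 5 mV the three-term series tracks the exact expression closely; the agreement degrades as $v_{id}$ approaches and exceeds $V_T$.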
References

[1.1] G. Geelen and E. Paulus, “An 8b 600 MS/s 200 mW CMOS Folding A/D
Converter Using an Amplifier Preset Technique,” *IEEE International Solid-

[1.2] K. Azadet et al., “Equalization and FEC Techniques for Optical
Transceivers,” *IEEE Journal of Solid-State Circuits*, vol. 37, no. 3, pp. 317-

69-mW 10-bit 80-MSample/s Pipelined CMOS ADC,” *IEEE Journal of


Receivers,” *IEEE Journal of Solid-State Circuits*, vol. 39, no. 10, pp. 1671-

[1.7] M. Choi and A. A. Abidi, “A 6-b 1.3Gsps A/D Converter in 0.35-µm
CMOS,” *IEEE Journal of Solid-State Circuits*, vol. 36, no. 12, pp. 1847-1858,


IEEE Transaction on Semiconductor Manufacture, vol. 10, no. 2, pp. 209-217, 
May 1997.


Resistor String DACs,” IEEE International Symposium on Circuits And 

Error Compensation in Thermometer-Encoded DAC Arrays,” IEEE 

CMOS DAC,” IEEE Journal of Solid-State Circuits, vol. 34, no. 12, pp. 1708-

Digital Frequency Synthesizer Using Nonlinear Digital-to-Analog Converter,” 
1999.


