**ABSTRACT** 

Title of Document: LEAD-FREE ELECTRONICS USE AND REPAIR DYNAMIC SIMULATION

Andrew Charles Chaloupka, Master of Science, 2009

Directed By: Dr. Peter Sandborn, Professor of Mechanical Engineering

The conversion from tin-lead to lead-free electronics has increased concern amongst engineers about the reliability of electronic assemblies that have been repaired with lead-free parts. Program-level management is often told by engineers that the "sky is falling" due to an unforeseen technical issue but is not moved to action without the occurrence of an unfortunate incident or a quantitative business case. Unfortunately, engineers often do not have the tools to articulate the risks and impacts that they foresee in terms that management understands, such as cost and availability.

In order to communicate the impact of the tin-lead to lead-free electronics conversion in terms of cost and availability, a simulation of fielded electronic systems to and through a repair facility was created. Systems manufactured with tin-lead parts (or a mixture of tin-lead and lead-free parts) that potentially have to be repaired with a mixture of tin-lead and lead-free parts are modeled. The model includes the effects of repair prioritization, multiple possible failure mechanisms, no-fault-founds, and un-repairable units. These effects are used to quantify and demonstrate the system- and enterprise-level risks posed by the tin-lead to lead-free conversion issue.

Example analyses were performed on electronic assemblies that use SAC (tin, silver and copper) and tin-lead solder using a repair process modeled after an NSWC Crane Aviation Repair Process. The components considered consisted of SMT passive, BGA, CSP and TSOP packaged parts that experienced three different thermal cycling profiles. The impact of the conversion from tin-lead to SAC for the example system was studied, and the cost and availability impacts were quantified.

The case studies revealed that when exposed to usage profiles characteristic of consumer electronics (low maximum and mean thermal cycling temperatures with long dwell times), SAC exhibited significantly reduced repair costs when compared to tin-lead. For usage profiles characteristic of aerospace and high-performance applications (high maximum and mean thermal cycling temperatures with short dwell times), SAC exhibited significantly increased repair costs when compared to tin-lead. It was also found that the NSWC Crane Aviation Repair Process (as modeled) is more than capable of handling a population of 8,000 LRUs even when experiencing a 50% reduction in capacity. As a result, prioritizing the repair of LRUs had no significant impact on the cost or availability metrics for the cases considered. In addition, the rate of LRU field deployment had no impact when using the NSWC Crane Aviation Repair Process.

### LEAD-FREE ELECTRONICS USE AND REPAIR DYNAMIC SIMULATION

By

Andrew Charles Chaloupka

Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Master of Science, 2009

Advisory Committee: Professor Peter Sandborn, Chair; Associate Professor Patrick McCluskey; Associate Professor Jeffery Herrmann

© Copyright by Andrew Charles Chaloupka 2009

### Dedication

This thesis is dedicated to my parents Roland and Linda, my sister Tracy, my service dog in training, and my officemates in graduate school.

My mother has supported me through every endeavor I have undertaken. Her motivation and dedication to her career have provided me with a strong model of how to operate both professionally and efficiently in my own work.

My father has strongly influenced me with his engineering background to be the best and most creative in my work. He has instilled in me the fortitude to never settle for anything but the best.

My sister Tracy has always been there to offer support and comic relief. Her continued determination in her career has been an enduring example growing up.

Although it may seem an unconventional dedication, my service dog in training, Maya, has been an overarching source of support over the past seven months. Her ability to show unconditional love and her willingness to stay by my side morning and night, either in my office or at home, have fueled my strength and stamina to push through this thesis in just under one calendar year.

I would also like to dedicate this to my officemates in graduate school. With their help I have experienced what it means to be a Maryland Terrapin.

### Acknowledgements

First and foremost, I would like to thank my advisor, Dr. Peter Sandborn, for his continued support in this thesis. His guidance has allowed me to successfully complete my research in a limited amount of time.

I would also like to thank Dr. McCluskey and Dr. Herrmann for being part of my defense committee and for inspiring me to be a better decision maker.

I would like to acknowledge the contributions of several individuals to this work: first, Bill Russell at Raytheon and Denny Fritz at SAIC, whose vision and guidance proved invaluable throughout the course of my research. I would also like to thank the Naval Surface Warfare Center at Crane, Indiana, for providing data and technical feedback. Finally, I wish to acknowledge the CALCE Electronic Products and Systems Consortium at the University of Maryland for funding this work.

## Table of Contents

- Dedication
- Acknowledgements
- List of Tables
- List of Figures
- Chapter 1: Introduction
  - 1.1 Making a Case to Management
  - 1.2 Lead-Free Solder
    - 1.2.1 The Conversion to Lead-Free
  - 1.3 Repair Culture Concerns
  - 1.4 Thesis Objectives and Tasks
- Chapter 2: Model Development
  - 2.1 Modeling Repair Processes
    - 2.1.1 Modeling Using Discrete Simulation
    - 2.1.2 Advantages of a Simulation
  - 2.2 Introduction to Modeling Repair Using Discrete Event Simulation
    - 2.2.1 Mode of Execution
    - 2.2.2 Process Modeling (Process Flow and Steps)
    - 2.2.3 Conversion of Non-Time Based Distributions
    - 2.2.4 Sampling
  - 2.3 The Modeling Process
    - 2.3.1 Modeling the Queues
    - 2.3.2 Adding Spares/Repairing Process
    - 2.3.3 Early Retirement
    - 2.3.4 Branching
    - 2.3.5 How LRUs Get in and Out of the Repair Process Flow
    - 2.3.6 Time Step Selection and Management
    - 2.3.7 The Impact of Low Capacity Process Steps on the Total Repair Time
  - 2.4 Outputs
    - 2.4.1 Average Cost per LRU
    - 2.4.2 Average Repair Time
    - 2.4.3 Availability
  - 2.5 Model Summary
- Chapter 3: Model Test Case
  - 3.1 Test Case Development
    - 3.1.1 LRU Introduction and Retirement Schedules
    - 3.1.2 LRU Operational Profile
    - 3.1.3 Developing the Failure Mechanism Distributions
    - 3.1.4 Repair Process
  - 3.2 Analysis Results
    - 3.2.1 Test A Results
    - 3.2.2 Test B Results
    - 3.2.3 Test C Results
    - 3.2.4 Test D Results
    - 3.2.5 Test E Results
    - 3.2.6 Test F Results
    - 3.2.7 Test G Results
    - 3.2.7 Test H Results
    - 3.2.8 Test I Results
- Chapter 4: Conclusions
  - 4.1 Conclusions
  - 4.2 Contributions
  - 4.3 Future Work
    - 4.3.1 Throwaway Applications
    - 4.3.2 Process Step Durations
    - 4.3.3 Multiple Instances of a Package Type on a Test LRU
    - 4.3.4 Multiple Failures on the Same Date
    - 4.3.5 Vibration Failure Mechanism
    - 4.3.6 Maintenance Data Integration
    - 4.3.7 Continuation of Damage During the Repair Process
- Appendix A – Simulation Details
  - Repair Process Step Animation
- Appendix B – calceFAST Failure Mechanism Reference
  - First Order Thermal Fatigue Model For Leadless Packages
- Glossary
- References

## List of Tables

- Table 2.1: Process Step "Field Failure Identification"
- Table 2.2: Example Process Step Notation
- Table 3.1: Temperature Cycling Requirements, Mandated and Preferred Test Parameters within Mandated Conditions [GEIA 2008]
- Table 3.2: Thermal Cycling Cases 1-3 Used to Compare Solder Reliability
- Table 3.3: LCC Attributes Defined in calceFAST
- Table 3.4: Weibull Parameters, LCC Package, for Thermal Cases 1-3
- Table 3.5: BGA Attributes Defined in calceFAST
- Table 3.6: Weibull Parameters, BGA Package, for Thermal Cases 1-3
- Table 3.7: BGA Attributes Defined in calceFAST
- Table 3.8: Weibull Parameters, CGA Package, for Thermal Cases 1-3
- Table 3.9: Baseline NSWC Repair Process
- Table 3.10: Parameters for Thermal Cycling Case 1
- Table 3.11: Test A Metrics
- Table 3.12: Parameters for Thermal Cycling Case 2
- Table 3.13: Test B Metrics
- Table 3.14: Parameters for Thermal Cycling Case 3
- Table 3.15: Test C Metrics
- Table 3.16: Test D Metrics
- Table 3.17: Case E Metrics
- Table 3.18: Case F Metrics
- Table 3.19: Case G Metrics
- Table 3.20: Case H Metrics
- Table 3.21: Case I Metrics
- Table 4.1: Case Study Results, Tests A-I, Percent Differences

## List of Figures

- Figure 2.1: Breakdown of a Process Steps Queue into the Repair Section and Waiting Pool
- Figure 2.2: First Time Step of Field Failure Identification Process Step
- Figure 2.3: Second Time Step of Field Failure Identification Process Step
- Figure 2.5: LRU Flow through the model from Fielding to End of Support
- Figure 2.6: Priority Levels and Relation to Mission Criticalness
- Figure 2.7: Impact of Priority on Total Repair Time
- Figure 2.8: No Priority Sorting
- Figure 2.9: Priority Sorting
- Figure 2.10: Example of Original to Spare LRU Relationship
- Figure 2.11: Implementation of Unique and Independent Repair Processes
- Figure 2.12: Example Process Steps with different durations
- Figure 2.13: Example Process Steps
- Figure 2.14: Total Repair Time (hours) for LRUs 1-40 in the example process shown in
- Figure 2.15: Usage Procedure for the Model
- Figure 3.1: Baseline Deployment Schedule
- Figure 3.2: Medium Deployment Schedule
- Figure 3.3: Rapid Deployment Schedule
- Figure 3.4: Comparing Data Generated Using calceFAST to Experimental TTF Data
- Figure 3.5: Convergence of Weibull Parameters by Increasing Sample Size
- Figure 3.6: Case 1, Weibull Plot of LCC Package Cycles to Failure for SnPb and SAC Solders
- Figure 3.7: Case 2, Weibull Plot of LCC Package Cycles to Failure for SnPb and SAC Solders
- Figure 3.8: Case 3, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.9: Case 1, Weibull Plot of BGA Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.10: Case 2, Weibull Plot of BGA Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.11: Case 3, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.12: Case 1, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.13: Case 2, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.14: Case 3, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders
- Figure 3.15: Histogram Comparing Repair Cost for SnPb and SAC, Test A
- Figure 3.16: Effect of Increasing NFF Percent on Population Growth
- Figure 3.17: Histogram Comparing Availability for SnPb and SAC, Test A
- Figure 3.18: Histogram Comparing Repair Time for SnPb and SAC, Test A
- Figure 3.19: Individual Lowest and Highest LRU Repair Cost Compared to the Average LRU Repair Cost for SAC Solder
- Figure 3.20: Individual Lowest and Highest LRU Availability Compared to the Average LRU Availability for SAC Solder
- Figure 3.21: Individual Lowest and Highest LRU Repair Time Compared to the Average LRU Repair Time for SAC Solder
- Figure 3.22: Histogram Comparing Repair Cost for SnPb and SAC, Test B
- Figure 3.23: Histogram Comparing Availability for SnPb and SAC, Test B
- Figure 3.24: Histogram Comparing Repair Time for SnPb and SAC, Test B
- Figure 3.25: Histogram Comparing Repair Cost for SnPb and SAC, Test C
- Figure 3.26: Histogram Comparing Availability for SnPb and SAC, Test C
- Figure 3.27: Histogram Comparing Repair Time for SnPb and SAC, Test C
- Figure 3.28: Histogram Comparing Repair Cost for Baseline and 20% Reduced Post-repair Reliability, Test D
- Figure 3.29: Histogram Comparing Availability for Baseline and 20% Reduced Post-repair Reliability, Test D
- Figure 3.30: Histogram Comparing Repair Time for Baseline and 20% Reduced Post-repair Reliability, Test D
- Figure 3.31: Histogram Comparing Repair Cost for Baseline, Medium and Fast Fielding Rates, Test E
- Figure 3.32: Histogram Comparing Availability for Baseline, Medium and Fast Fielding Rates, Test E
- Figure 3.33: Histogram Comparing Repair Time for Baseline, Medium and Fast Fielding Rates, Test E
- Figure 3.34: Histogram Comparing Repair Cost for Baseline and Reduced Capacity Process Steps, Test F
- Figure 3.35: Histogram Comparing Availability for Baseline and Reduced Capacity Process Steps, Test F
- Figure 3.36: Histogram Comparing Repair Time for Baseline and Reduced Capacity Process Steps, Test F
- Figure 3.37: Histogram Comparing Repair Cost for Prioritized and Un-prioritized LRUs, Test H
- Figure 3.38: Histogram Comparing Availability for Prioritized and Un-prioritized LRUs, Test H
- Figure 3.39: Histogram Comparing Repair Time for Prioritized and Un-prioritized LRUs, Test H
- Figure 3.40: Histogram Comparing Repair Cost for Single and Double Package Instance LRUs, Test I
- Figure 3.41: Histogram Comparing Availability for Single and Double Package Instance LRUs, Test I
- Figure 3.42: Histogram Comparing Repair Time for Single and Double Package Instance LRUs, Test I
- Figure A.1: Progression of a Modeling to Implementation
- Figure A.2: Tab (1), Welcome
- Figure A.3: Tab (2) Reliability Models
- Figure A.3: Distribution Input Window
- Figure A.4: Tab (3) LRU Specific Inputs
- Figure A.5: Tab (4) Process Specific Inputs
- Figure A.6: Tab (5) Runtime Outputs
- Figure A.7: Computational Choice Window
- Figure A.8: Tab (6) Cumulative Metrics Output
- Figure A.9: Distribution of Repair Cost
- Figure A.10: Distribution of LRU Availability
- Figure A.11: Distribution of Repair Time
- Figure A.12: Tab (7) Solution Control
- Figure A.13: Solution Control Details
- Figure A.14: Quantity Plot
- Figure A.15: Process Flow Animation

### Chapter 1: Introduction

The transition to lead-free parts is affecting the electronics industry, and most severely the aerospace and defense industries, whose products require high levels of reliability. Applications known as AHP (Aerospace and High Performance) are characterized by severe or harsh operating environments, long service times, and high consequences of failure [GEIA 2008]. Because of these consequences of failure, AHP manufacturers are currently excluded<sup>1</sup> from the RoHS<sup>2</sup> directive. The current directive excludes equipment intended solely for national security and military purposes, which is not included in the consumer categories described in the RoHS Directive.

Although excluded from the requirement to use lead-free parts, most defense and aerospace manufacturers utilize the same supply chain as commercial electronics manufacturers for parts and boards. This is important because in many cases AHP electronics must be repairable at the soldered assembly level [GEIA 2008]. While the supply chains for AHP parts can still produce legacy products that contain tin-lead solder, they have relatively little motivation to do so because the defense and aerospace industries represent less than 5% of the total market share [Russell 2007]. Therefore, commercial manufacturers are focused on providing parts for the commercial electronics industry. The limited availability of lead-based items has become a major driver in the design and sustainment of defense and aerospace systems as the number of tin-lead electronics suppliers has decreased. This challenge will require the defense and aerospace industries to convert to lead-free long before the RoHS directive requires them to (if ever), i.e., their current exclusion from RoHS is effectively a moot point.

<sup>1</sup> WEEE has only exclusions; RoHS has exemptions and exclusions. When equipment is left entirely out of legislation it is termed excluded. This means that certain types of equipment are out of the scope of WEEE entirely, e.g., equipment for the sole use in Aerospace and Defense applications. Exemptions are a series of applications of banned substances that are exempted from some of the RoHS requirements, e.g., Medical and Telecommunications [U.S. Department of Commerce, 2009]. Equipment for use in Aerospace and Defense applications is excluded and not mentioned in RoHS.

<sup>2</sup> RoHS (Restrictions on Hazardous Substances) is a European directive that restricts hazardous materials in electronics equipment [European Union 2002/95/EC, 2002/96/EC].

Abundant data exists on the short-term reliability (i.e., less than 5 years) of lead-free solder joints under single loading conditions [Ganesan et al. 2005]. However, data on combined loading conditions and long-term reliability is limited. Many AHP lead-free products will be serving in platforms where long-term (greater than 15 years) reliability is a critical requirement. The impact of reliability may be most prevalent at the system- and enterprise-level for legacy tin-lead assemblies that have been repaired with lead-free solder. **Legacy systems** refer to systems that have been manufactured in the past using tin-lead solders and must continue to be supported for the foreseeable future, while new systems refer to those that were manufactured using lead-free technology. **Enterprise-level impact** refers to the impact on support logistics (repair flow: repair time, repair cost, backlog) over the support life cycle of equipment. The impact of the conversion to lead-free must be quantified in order to provide performance expectations and, if and when needed, risk mitigation to program-level management.

### 1.1 Making a Case to Management

Engineers communicate to program-level management every day that the "sky is falling" due to some previously unforeseen technical issue, but management is rarely moved to action without a quantitative demonstration of the system- or enterprise-level risks posed by the issue. The potential for reduced and less predictable reliability of lead-free electronics increases the probability that a serious technical issue will arise. While engineers have the resources to model and quantify system reliability, they often lack the ability to articulate the risk/impact of the reliability (or changes to the reliability) in terms of cost and availability that management will understand. To provide engineers with a tool that they can use to develop sound proposals (i.e., business cases) to program-level management, a model is needed. This model needs to track large populations of LRUs from field introduction to retirement and accumulate characteristics of the repaired units, including repair cost, repair time and unit reliability. An LRU is defined as a "Line Replaceable Unit", i.e., an electronic card (or board) that can be removed from the field and repaired or replaced. The acronym LRU is used in this thesis synonymously with Shop Replaceable Assembly (SRA), Shop Replaceable Unit (SRU), and Weapon Replaceable Assembly (WRA). In addition to tracking a population of LRUs, it is important to provide a distinct comparison of traditional tin-lead and lead-free solder reliability. This will allow engineers to make a direct comparison of tin-lead and lead-free solders and the impact in cost and availability they can have on long-term fielding.
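As a rough illustration of the bookkeeping such a model performs, the following Python sketch walks each LRU from fielding to retirement, samples Weibull times to failure, and accumulates repair cost, repair time and availability. It is only a minimal sketch under stated assumptions, not the model developed in this thesis: the names (`LRU`, `simulate_lru`, `availability`), the Weibull parameters, and the fixed cost and duration per repair are all hypothetical, a repaired unit is assumed to be as good as new, and the repair facility is reduced to a constant delay with no queues, priorities, or no-fault-founds.

```python
import random
from dataclasses import dataclass


@dataclass
class LRU:
    """One line replaceable unit and the metrics accumulated over its life."""
    fielded_at: float          # hours from the start of the simulation
    retire_at: float           # hours from the start of the simulation
    repairs: int = 0
    repair_cost: float = 0.0
    repair_hours: float = 0.0


def simulate_lru(lru, shape, scale, cost_per_repair, hours_per_repair, rng):
    """Advance one LRU from fielding to retirement (hypothetical helper)."""
    clock = lru.fielded_at
    while True:
        # Operate until the next sampled failure; weibullvariate takes
        # the scale parameter first and the shape parameter second.
        clock += rng.weibullvariate(scale, shape)
        if clock >= lru.retire_at:
            break
        # The unit is unavailable while in repair; cap downtime at retirement.
        downtime = min(hours_per_repair, lru.retire_at - clock)
        lru.repairs += 1
        lru.repair_cost += cost_per_repair
        lru.repair_hours += downtime
        clock += downtime
    return lru


def availability(lru):
    """Fraction of the support life during which the unit was not in repair."""
    life = lru.retire_at - lru.fielded_at
    return (life - lru.repair_hours) / life


# Assumed population: 100 LRUs fielded at time zero and supported for 20 years.
rng = random.Random(2009)
population = [LRU(fielded_at=0.0, retire_at=20 * 8760.0) for _ in range(100)]
for unit in population:
    simulate_lru(unit, shape=2.0, scale=30000.0, cost_per_repair=500.0,
                 hours_per_repair=240.0, rng=rng)

mean_cost = sum(u.repair_cost for u in population) / len(population)
mean_avail = sum(availability(u) for u in population) / len(population)
print(f"mean repair cost per LRU: {mean_cost:.2f}")
print(f"mean availability: {mean_avail:.4f}")
```

Running the sketch once with tin-lead parameters and once with lead-free parameters would give the kind of side-by-side cost and availability comparison described above, although only for this simplified repair process.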

### 1.2 Lead-Free Solder

For the past 60 years, soldering materials have traditionally been composed of tin and lead. The transition to lead-free solders, meaning that the content of the element lead is <0.1% by weight, applies both to printed circuit board (PCB) soldering materials, namely solder paste or wave solder for surface-mount or through-hole assembly, respectively, and to finishes used on part terminals and PCB mounting pads [Ganesan et al. 2005, GEIA 2008]. Many different lead-free solders have been proposed and used; however, the most common are Sn-3.0Ag-0.5Cu (SAC 305) alloys due to their low melting temperatures and good wettability compared with the Sn-Ag alloys [Zhu et al. 2005]. Currently, engineers are developing SAC alloys containing indium and bismuth to improve application properties such as oxidation resistance, stability and melting point [Ma 2006]. Although improvements have been made with lead-free solders such as SAC, many concerns with reliability still exist. The example case in this thesis considers the use of SAC 305, the most commonly used lead-free solder in industry [Hillman 2006].

#### 1.2.1 The Conversion to Lead-Free

Legislative pressures resulting from the RoHS Directive on lead in electronics (and similar pending legislation throughout the world), the enacted Japanese take-back legislation (and similar pending legislation elsewhere in the world), and marketing policies from electronics companies, are the driving forces behind lead-free solder adoption [Eveloy 2005].

The primary driving force of the lead-free conversion is RoHS, a European Directive passed in 2003 that restricts the use of certain hazardous substances in electrical and electronic equipment. The aim of the RoHS Directive is to control the use of certain hazardous substances in newly fielded and future repaired electrical and electronic equipment (EEE) [European Union 2002/95/EC, 2002/96/EC]. Hazardous substances whose use is restricted include: lead, mercury, cadmium, hexavalent chromium, polybrominated biphenyls (PBB) and polybrominated diphenyl ethers (PBDE) [European Union 2002/96/EC]. Electrical and electronic equipment is grouped into ten categories ranging from household appliances to sports equipment. However, electronics associated with defense and aerospace applications are not included in this list due to reliability concerns and the implications of failure.

An analysis of individual companies' strategies and consumer reaction within the electronics industry shows that to date, the main benefit of migrating to lead-free electronics has been an increase in market share through product differentiation, in terms of product environmental friendliness [Pecht 2005]. Thus, because consumers are growing more environmentally conscious, manufacturers are voluntarily migrating to lead-free technology in order to be considered environmentally friendly [Casey 2002]. The actual value to the environment (if any) of the conversion from tin-lead to lead-free electronics is not clear and will not be addressed in this thesis.

Regardless of the reasons for conversion from tin-lead to lead-free electronics, the conversion is a reality (the "train has left the station" and there is no going back) and the ramifications of the conversion need to be understood.

### 1.3 Repair Culture Concerns

Two different cultures exist associated with the handling of failed electronics. Commercial electronics manufactures follow a throwaway culture with their consumer products, i.e., the ideology that throwing away a failed product and replacing it is less expensive than repairing it. In the defense and aerospace industry, a repair culture is followed. The belief behind this culture is that it is more economical to repair than to throwaway and replace.

A legacy aerospace or defense system, i.e., a system that was manufactured with tin-lead technology prior to the RoHS directive, must continue to be supported after the RoHS directive (possibly for many years). Many military platforms today are operating with legacy technology from the 1980s and 1990s. If a part fails, it may be necessary to repair it using a newer lead-free technology as equivalent or identical lead-based parts become less available (obsolete). The introduction of lead-free repair and manufacturing processes on a legacy system introduces new failure mechanisms associated with the addition of lead-free parts and the exposure to thermal profiles not experienced during the original tin-lead manufacturing process.

### 1.4 Thesis Objectives and Tasks

The objective of this thesis is to provide a model that can be used by engineers to demonstrate to program management the repair cost and availability impacts of reliability changes and various repair scenarios for mixtures of legacy and new electronic systems. The thesis will accomplish the following:

- Describe the development of a new model for the repair of electronic systems
- Develop test cases based on a combination of reliability simulation and experimental results for a representative set of electronic parts
- Provide demonstration results from the test cases from which application-specific and general conclusions about the impact of lead-free parts can be drawn.

Chapter 2 describes the model developed in this thesis in detail in order to provide the reader with background regarding its operation. Topics discussed include: queuing, priority sorting, reliability distribution sampling, modeling a repair process, and the formulation of cost and availability metrics that are the output of the model. The model developed in this thesis has been implemented in software in a tool called the Lead-Free Dynamic Simulator (LFDS). For a description of the software, see Appendix A.

Chapter 3 applies the model to a set of example problems. The sample cases used for validation have been created under guidance from the Naval Surface Warfare Center (NSWC) at Crane, IN. The repair process was modeled after the Navy's 3M (Maintenance and Material Management) system. Based on this case study, engineers will have the capability to study the implications for repair cost, availability, and repair time of the conversion from tin-lead to SAC solders.

Chapter 4 provides a summary of results and conclusions based on the case study in Chapter 3. Also included are a set of contributions and recommendations for future work associated with this research.

### Chapter 2: Model Development

Traditional methods of studying a system include experimenting with the actual system and experimenting with a model of the system. Both may produce similar results; however, in some situations, it may not be feasible to test the actual system. Creating a model, either physical or mathematical, allows engineers to gain insight into the expected outcome of the system's operation. The emulation of the system's operation over time introduces the capability to monitor years of activity, i.e., the time between LRU field introduction and end of support. One method of modeling a system's operation over time is known as a discrete event simulation. This chapter discusses the development of a discrete event simulation based model for evaluating the impact of reliability on the part repair process for traditional tin-lead and lead-free solders.

### 2.1 Modeling Repair<sup>3</sup> Processes

The service repair model developed in this thesis describes the process in which operating LRUs are tracked to and through repair after failure. The model developed here also models an independent "post-repair" reliability that can represent "as good as new" or "not good as new" repair. Since the process in this model assumes a single echelon of supply (central depot), and does not take into account the product structure of failed units (assemblies, subassemblies), the model will be referred to as a single-echelon, single-indenture model [Sleptchenko et al. 2002].

<sup>3</sup> Repair refers to fixing units that have failed during field use. Alternatively, "rework" refers to fixing units identified as defective during a manufacturing process (prior to fielding). Rework is not addressed in this thesis.

In the area of repair process modeling, a great deal of work has been done to solve classical repair problems such as "the military logistics problem of stocking repairable parts for aircrafts at bases which are capable of repairing some, but not all broken parts, and at a central depot which serves all of the bases" [Guide and Srivastava 1997]. This method of understanding, based on Sherbrooke's METRIC model [Kennedy et al. 2002, Sherbrooke 1968], views the repair process as multi-echelon and multi-indenture, focusing entirely on inventory constraints and replenishment quantities. Later models such as MOD-METRIC and VARI-METRIC are extensions to the base METRIC model that include many modifications to study batch repairs and lateral shipments. These models, however, focus almost entirely on the optimal stocking of parts at bases (or forward locations) and at a central depot facility that repairs failed units returned from the bases while providing some predetermined level of service [Guide and Srivastava 1997]. Their objective is typically to maximize the availability of aircraft, or conversely minimize shortages and hence the number of grounded aircraft, subject to a budget constraint [Guide and Srivastava 1997].

Due to the increased numerical complexity associated with multi-echelon and multi-indentured processes, Diaz and Fu developed a simple model of a single repair shop consisting of one or more single server queues [Diaz and Fu 1997]. This model's primary focus is inventory control and therefore is most appropriate in a resource-constrained environment such as in most industrial settings.

Improvements to the METRIC model continued with Graves in 1985, who introduced the complexities of modeling general service time distributions and multiple types of repairs.

The METRIC model and the single-server model developed by Diaz and Fu do not allow for studies of the impact of components that require different repair steps, or of components that can fail due to multiple mechanisms. Graves's model, although addressing multiple repair types, differentiates the repair type by another Poisson distribution, failing to capture the relationship between LRU and LRU.

The modeling methods METRIC, MOD-METRIC, VARI-METRIC and the single-server model developed by Diaz and Fu assume: a Poisson failure process, an infinite LRU population (so that the arrival rate at the depot is constant and independent of the actual number of working LRUs), and ample repair capacity (so that the distribution of LRUs in the repair facility is Poisson) [Diaz and Fu 1997]. However, the repair model in this thesis must allow for multiple failure mechanisms, the distribution of failures over time (or cycles), and the ability to distinguish between specific failure mechanisms in the repair process in order to address the tin-lead (SnPb) to lead-free (SAC) conversion. The difference between sampling Weibull distributions for time to failure data and assuming a Poisson failure rate makes the model significantly different. Although existing models track components or LRUs independently, not as populations, they do not carry the specific information unique to the LRU. Tracking the component's specific TTF, the mechanism that caused failure, priority level, introduction date and end of support date is very important because this information can be very different for each LRU.

In the model required in this thesis, the repair process is resource limited (in fact, part of the outcome of this research is the required repair process capacity) and each LRU is tracked individually following FIFO queuing rules when interacting with LRUs of similar priority. None of the known repair models meet the specific requirements; therefore, a new model will be developed.

### 2.1.1 Modeling Using Discrete Simulation

Discrete simulations include two distinct modeling techniques, time-based and discrete event.

For time-based simulation, the progress of the modeled repair process occurs at discrete points in simulation time, which are labeled time steps. **Simulation time** is defined as the time being represented within the model. While the state of the process may be observed precisely at time steps 1, 2, 3, etc., its progress between any two consecutive time steps is assumed to be atomic and cannot be perceived by an external observer [Ghosh et al. 2000]. Time-based simulation assumes that important changes only occur at the discrete time steps, and nothing important occurs between consecutive time steps. Therefore, the choice of the time step value is determined by the maximum desired rate of progress, in terms of time, of the simulation process.

In discrete event simulation, the process being modeled is advanced by events, not time steps. The Cambridge English Dictionary [Cassidy 2007] defines "event" as anything that happens, especially something important or unusual. In the discipline of discrete event simulation, an event refers to any significant incident associated with the state of the process being modeled, expressed in terms of any frame of reference (time, space, energy, etc.) [Ghosh et al. 2000].

### 2.1.2 Advantages of a Simulation

Advantages of computer simulation include the ability to compress and expand time, the ability to control sources of variation, avoidance of errors in measurement, the ability to stop and review, the ability to restore system state, facilitation of replication, and control over the level of detail [Fishman 2001]. The ability to compress or expand time is facilitated in the simulation by running through multiple years of events in a matter of minutes or even seconds depending on the required level of computation. The ability to control (and identify) variation is accomplished through a statistical analysis of the relationship between the independent (input) and dependent (output) factors [Fishman 1978]. Unlike field experiments, which exhibit unavoidable errors of measurement, no measurement errors exist in simulations since the programmed simulation produces numbers free of any superimposed variation due to external and uncontrollable sources [Fishman 1978]. The ability to stop and review intermediate results only exists in simulations, as with field experiments it is often impossible to completely stop all active processes. The ability to restore the system's state allows the researcher to re-run the model to output additional data, and to duplicate the previous run to include this data. The ability to replicate experiments allows for changes in select operating parameters and the investigation of their impact on the result. The model's detail level affects the analysis cost, time, chance of errors and debugging time.

### 2.2 Introduction to Modeling Repair Using Discrete Event Simulation

This section describes the development of a new discrete event simulation based repair model that can be used to evaluate the repair of tin-lead and lead-free electronic systems.

The Lead-Free Dynamic Simulator (LFDS) developed in this thesis exhibits many of the characteristics of both a discrete event simulation and a time-based simulation. The model utilized in the simulation is stochastic, dynamic, partially discrete, and partially time-based. It is stochastic because its variables are treated as random. This randomness is achieved in the model by sampling reliability distributions so that a population of non-identical fielded systems can be assessed. The model is intrinsically dynamic, being dependent on time as the primary state variable. The simulation time of the model is represented by tracking each LRU from introduction to retirement (referred to as end of support). In order to comprehend how the model can be partially discrete and partially time-based, the framework of the model must be explored. While the discrete list of LRU failure events is determined prior to the advancement of simulation time, the repair events are dependent on more than the simulation time and the state of the individual LRU. The total repair time is dependent on the quantity of LRUs in repair. This quantity and repair time relationship is therefore only advanced by a discrete set of monotonically increasing time steps, where the choice of the duration of the time step interval reflects the desired accuracy of the model.

#### 2.2.1 Mode of Execution

The mode of execution for the Lead-Free Dynamic Simulator is as-fast-as-possible execution. This method is also known as unpaced execution because no relationship exists between the simulation time and wall clock time. The simulator operates by determining the earliest LRU introduction date into the field, then advances by time steps or jumps to discrete events. The simulation operates as a discrete event simulation by jumping to a failure event when there are no LRUs in the repair process. This jump is accomplished by increasing its time step size to the difference of the next failure date minus the current date. The time step can increase when there are no LRUs in the repair process without compromising the simulation's accuracy because state values are not changing. When one or more LRUs are in the repair process, the model still operates as a discrete event simulation and the simulation time advances by a predetermined time step length because events are occurring at the instant of the time step.

The disadvantage of a purely time-stepped simulation is the addition of unnecessary computations when no events are present (which results in slow simulations). Periods in which no events are present include those when there are no LRUs in repair, and when no LRUs are failing.

The Lead-Free Dynamic Simulator utilizes an event-based method of time advancement at discrete time instants in order to: 1) minimize the total wall clock time the simulation operates, and 2) maximize state value accuracies.
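The time advancement rule described in this section can be illustrated with a short Python sketch. It is not taken from the LFDS source; the function name `next_simulation_time` and its arguments are hypothetical, and the sketch assumes that time is measured in hours and that failure dates are already known from preprocessing.

```python
def next_simulation_time(current_time, next_failure_time, lrus_in_repair, time_step):
    """Return the next simulation time under the hybrid advancement rule.

    Hypothetical helper: with an empty repair process the clock jumps straight
    to the next failure event; otherwise it advances by the fixed time step.
    """
    if lrus_in_repair == 0:
        if next_failure_time is None:
            return None  # nothing in repair and no failures pending
        # Jump: step size = next failure date minus current date.
        return current_time + (next_failure_time - current_time)
    return current_time + time_step


print(next_simulation_time(0.0, 500.0, 0, 2.0))    # empty repair process: 500.0
print(next_simulation_time(500.0, 900.0, 3, 2.0))  # LRUs in repair: 502.0
```

The jump skips the idle interval in a single update, which is where the wall clock savings over a purely time-stepped loop would come from.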

### 2.2.2 Process Modeling (Process Flow and Steps)

A process flow is a chronological interaction of events used to describe both informational and physical objects [Fishman 1973]. For the model developed in this thesis, the process flow is represented by a list of the process steps in a repair process. Process steps are single server Markovian Queue chains with priority rankings [Ozekici 1990]. Each process step is defined by six properties: the step name, cost, duration, capacity, failure mechanism applicability, and early retirement abilities. These properties, which are inputs to the model, affect how failed LRUs are processed in repair. While each process step is independent with respect to another process step's properties, the position or index in the list of steps is global to the simulation.

Depending on a process step's capacity and assuming that there are no other failed LRUs in the step's queue, an LRU will be immediately repaired. As failed LRUs continue to enter the step's queue, they are placed into repair until the step reaches full capacity. The queue represents the sequential list of LRUs in the step waiting to be repaired. Capacity is the maximum number of LRUs that a step can simultaneously repair. The capacity of five LRUs in the "Field Failure Identification" process step, Table 2.1, can be imagined as having five workers on separate workstations all performing the same tests to identify what caused the LRU to fail. When there are more LRUs in the step than the maximum capacity, the excess LRUs are placed into the waiting pool. After the process step has been completed for the LRUs in repair, they move to the next sequential step. LRUs that have been held in the waiting pool are drawn into the process step's repair section based on a FIFO queuing policy. Figure 2.1 represents the waiting pool and capacity as subsets of the process step's queue.



Figure 2.1: Breakdown of a Process Step's Queue into the Repair Section and Waiting Pool

Consider the example shown in Figure 2.2. When there are more than five LRUs in the step, the excess LRUs go into the waiting pool, where they wait until they can be processed by workers (LRU numbers 6 through 11). The time in step, and total repair time, continues to grow even while an LRU is in the waiting pool. In the case in Figure 2.4, during the third time step there is only one LRU being processed. The LRUs in the queue may be processed regardless of whether the queue is at capacity. Priorities are used to sort the LRUs if requested. The first time step in Figure 2.2 processes all of the urgent and high priority failed LRUs and one of the medium priority LRUs. The second time step, Figure 2.3, processes the final medium priority LRU followed by four of the LRUs with low priority. The third time step, Figure 2.4, processes the final low priority LRU. This step completes the processing of all the failed LRUs waiting to be processed in the step.

Table 2.1: Process Step "Field Failure Identification"

| Process Step                 | Duration (hrs)        | Cost (\$) | Capacity | Branching |
|------------------------------|-----------------------|-----------|----------|-----------|
| Field Failure Identification | 6.0                   | 150.00    | 5        | ALL       |



Figure 2.2: First Time Step of Field Failure Identification Process Step



Figure 2.3: Second Time Step of Field Failure Identification Process Step



Figure 2.4: Third Time Step of Field Failure Identification Process Step
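The three time steps in Figures 2.2 through 2.4 can be reproduced with a minimal Python sketch. The arrival order and the split of priorities (2 urgent, 2 high, 2 medium, 5 low) are assumptions chosen to match the description of the example, and `schedule_time_steps` is a hypothetical name. The sketch also assumes that all eleven LRUs are already in the queue and that each batch occupies the step for exactly one time step.

```python
from collections import namedtuple

# Lower rank = served first. The four level names come from the text.
PRIORITY_RANK = {"urgent": 0, "high": 1, "medium": 2, "low": 3}

LRU = namedtuple("LRU", ["number", "priority"])


def schedule_time_steps(queue, capacity):
    """Split a step's queue into per-time-step batches.

    LRUs are sorted by priority; Python's sort is stable, so LRUs of equal
    priority keep their arrival (FIFO) order. Each batch holds at most
    `capacity` LRUs, i.e. the repair section for that time step.
    """
    ordered = sorted(queue, key=lambda lru: PRIORITY_RANK[lru.priority])
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


# Assumed arrival order and priority mix (2 urgent, 2 high, 2 medium, 5 low).
arrivals = ["low", "urgent", "low", "medium", "high", "low",
            "urgent", "low", "medium", "high", "low"]
queue = [LRU(n, p) for n, p in enumerate(arrivals, start=1)]

batches = schedule_time_steps(queue, capacity=5)
for step, batch in enumerate(batches, start=1):
    print(step, [(lru.number, lru.priority) for lru in batch])
```

With a capacity of five (Table 2.1), the first batch holds the urgent and high priority LRUs plus one medium priority LRU, the second holds the remaining medium priority LRU and four low priority LRUs, and the third holds the last low priority LRU.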

Similar to the structure of the process steps is the list of LRU objects. The LRU list contains an individual object for each LRU. Properties included are the LRU number (unique), introduction date, end of support date (EOS), the next time to failure, and the mechanism that will cause the next failure. The "LRU number" is an index assigned to the LRU to distinguish it from other LRUs within the model.

### 2.2.3 Conversion of Non-Time Based Distributions

Since the model's execution is based on the advancement of time, all model inputs that define events must also be mapped to time. When a non-time based reliability distribution is used, i.e., one that is in cycles (thermal, vibration or other), the model must convert all values into relevant time measures. In order to do this, the failure mechanism must contain the basic reliability distribution parameters, the number of cycles per unit of time, and the units desired for the conversion. In an example case, a failure mechanism asserts that an LRU will experience 1000 cycles per operational year. It is expected, then, that if the reliability distribution were a Weibull distribution containing a location parameter of 4000 cycles, the LRU would fail sometime after four operational years. This calculation is done in the model by converting cycle based reliability distributions from operational years, days, hours, and minutes to operational hours.
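The conversion in the example above can be written out as a short Python sketch. The function name and the `HOURS_PER_UNIT` table are hypothetical. In particular, the number of hours in an operational year or day is an assumption made here for illustration (8760 and 24), since the text does not state the values used by the model.

```python
# Assumed conversion factors to operational hours; the source does not state
# how many hours make up an operational year or day, so these are placeholders.
HOURS_PER_UNIT = {"year": 8760.0, "day": 24.0, "hour": 1.0, "minute": 1.0 / 60.0}


def cycles_to_operational_hours(cycles, cycles_per_unit, unit):
    """Convert a cycle count to operational hours.

    cycles          -- value in cycles (e.g. a Weibull location or scale parameter)
    cycles_per_unit -- cycles accumulated per `unit` of operation
    unit            -- one of the keys of HOURS_PER_UNIT
    """
    return cycles / cycles_per_unit * HOURS_PER_UNIT[unit]


hours = cycles_to_operational_hours(4000, 1000, "year")
print(hours)  # 35040.0 operational hours, i.e. four operational years
```

Only parameters that carry units of cycles (location and scale) would be converted this way; the Weibull shape parameter is dimensionless and is unchanged by a linear rescaling of the time axis.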

### 2.2.4 Sampling

In order to determine the location (in time) of events corresponding to failures of LRU instances, the time to failure distributions associated with the applicable failure mechanisms must be sampled. Since the example case described in Chapter 3 uses 2-parameter Weibull time to failure distributions, Equation (2.1), the sampling procedure for this distribution is explained below.

$$f(x; \beta, \eta) = \frac{\beta}{\eta} \left( \frac{x}{\eta} \right)^{\beta - 1} e^{-\left( \frac{x}{\eta} \right)^{\beta}}$$
 (2.1)

Where:

$\beta$ = shape parameter

$\eta$ = scale parameter

A Monte Carlo method is used in which deviates are obtained from probability distributions through the following process: 1) a random number between 0 and 1 inclusive is chosen; 2) the value of the cumulative distribution function (cdf) is set equal to the random number, the cdf is solved for the corresponding value of $x$, and this value is added to the current time in the model. Adding the sampled value to the current time creates the next time to failure (TTF).
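For the 2-parameter Weibull distribution, the cdf is $F(x) = 1 - e^{-(x/\eta)^{\beta}}$, so setting $F(x) = u$ and solving gives $x = \eta \left( -\ln(1 - u) \right)^{1/\beta}$. The Python sketch below illustrates this inverse-cdf sampling step. The function names are hypothetical and the parameter values are arbitrary; this is an illustration of the technique, not the LFDS code.

```python
import math
import random


def weibull_ttf_from_uniform(u, beta, eta):
    """Invert the 2-parameter Weibull cdf F(x) = 1 - exp(-(x/eta)**beta)."""
    return eta * (-math.log(1.0 - u)) ** (1.0 / beta)


def sample_next_failure_time(current_time, beta, eta, rng=random):
    """Draw a time to failure and add it to the current simulation time."""
    u = rng.random()  # uniform on [0, 1); avoids log(0) at u == 1
    return current_time + weibull_ttf_from_uniform(u, beta, eta)


rng = random.Random(42)  # seeded only so that runs are repeatable
print(sample_next_failure_time(1000.0, beta=2.0, eta=5000.0, rng=rng))
```

Drawing `u` from the half-open interval [0, 1) is a deliberate choice in this sketch: a value of exactly 1 would correspond to an infinite time to failure.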

### 2.3 The Modeling Process

Unique to the model described in this thesis is the ability to track information regarding individual LRUs from introduction to end of support, and to and through a repair facility after failure. A conceptual layout of the model is depicted in Figure 2.5. The model starts at step 1 in Figure 2.5 by determining the earliest introduction date in the population of LRUs. In the preprocessing stage, steps 2 through 4 in Figure 2.5, operational profiles are converted to common units, reliability distributions are sampled, and numerical sorting of data occurs to determine the soonest failure event. Steps 6 through 14 in Figure 2.5 describe the operation of the simulation and the tracking of an LRU to and through the repair process. Important to these steps are the advance of time by variable time step sizes, the repair of LRUs in a process flow determined by specific repair rules, i.e., FIFO, priority, duration, etc., and the resampling of post-repair reliability distributions. An LRU is retired when it reaches its end of support date. If the LRU has not been retired, the model progresses to step 15 of Figure 2.5. The LRU will continue in the cycle of fielding, failure, repair, and re-fielding until its end of support date is reached.



Figure 2.5: LRU Flow through the model from Fielding to End of Support

Important to the single LRU flow is the interaction that exists when multiple LRUs are in the repair process. When there are multiple LRUs in the repair process, a significant amount of queuing and sorting occurs in steps 9-14 in Figure 2.5.

### 2.3.1 Modeling the Queues

Each process step in the repair process has a repair section, limited by the capacity, and a waiting pool. Both are subsets of the queue (Figure 2.1), which individual LRU instances enter in the order in which they arrive at a specific process step (FIFO).

The process step takes LRU instances from the top of the waiting pool and moves them into the repair section as its capacity allows.

The arrival process associated with failing LRUs being fed into the repair queues is not an ordinary Poisson process with rate $\lambda$, although the times between LRU failures are independent of each other [Ozekici 1990]. This makes the model different from the repair models described in the literature review: the time to failure is not generated by a constant failure rate. Within this model, LRU failures are dependent on TTFs generated through Monte Carlo sampling of Weibull distributions.

Once an LRU fails, it enters a repair step and stays there for the specified duration of the step. The time in the step can increase if the LRU has to wait because other LRUs are ahead of it, waiting to be repaired. In the case of the model described here, there are no distributions associated with the process step's duration. The time required for repair is associated with the specific mechanism that caused failure and the current number of LRUs in the process step. The time in the step will always be greater than or equal to the process step's duration. When the process step's repair section is full and LRUs have to pause in the waiting pool, the total time spent in the step will increase.

#### 2.3.1.1 Service Policy (Queuing Discipline)

In order to define the service policy, three key items must be specified. The first item to be identified is the number of servers present in the process.<sup>4</sup> Within the model, there is only one repair process. This repair process, which is synonymous with the process flow, can be modeled as a Markovian queue (the capacity represents the maximum number of LRUs that can be simultaneously processed) with priorities. The second item to be identified is the capacity of the queue, and the policy that dictates what happens when there are more LRUs in the step's queue than can be processed concurrently by the step. The third item to be identified is the service discipline, i.e., first in, first out (FIFO), last in, first out (LIFO), servicing in random order (SIRO) and priority rules (PR).

<sup>4</sup> In this model, a server is defined as a single parallel repair process.

In 1953, Kendall [Kendall 1953] proposed the following notation to classify queues:

$$A \mid B \mid C \mid k$$
 (Service Rule) (2.2)

Where:

A = interarrival distribution

B = service time duration

C = number of servers

k = queue capacity

Within the model, LRUs enter the repair process through a Monte Carlo sampled Weibull distribution that is denoted by M. Each repair step has a fixed duration, denoted by the time in hours. As stated before, there is only one server, as all LRUs must flow through the same repair process. The queue capacity, k, is denoted by the maximum number of LRUs that can be repaired in the step's duration. Example step notations for the model described here are given in Table 2.2.

**Table 2.2: Example Process Step Notation** 



### 2.3.1.2 Markovian Queues with Priorities (M/M/1/k)

Within the model, LRUs are repaired individually rather than being repaired as a batch.

The first in, first out (FIFO) service discipline is often the most commonly chosen procedure for determining the order in which LRUs are repaired. However, this is not the case in many service systems, in which customers are classified according to different priorities. VIP, first-class and economy-class priorities are almost always given to airline passengers. Users of computer systems are routinely given different priority levels to access the system and run their programs [Ozekici 1990].

### 2.3.1.3 Preemptive versus Non-preemptive

LRUs in fielded applications often have different levels of mission importance. In order to incorporate this in the model, priorities were introduced in order to expedite LRUs of higher importance through the repair process.

Within the model there are four priority levels: urgent, high, medium, and low, which are described based on mission criticalness in Figure 2.6. Priority levels urgent, high and medium are preemptive, meaning that if an LRU at one of these levels joins a queue that contains LRUs of lower priority, it will move ahead of them in the order and, when the repair section is at capacity, preempt them from repair. When the queue opens up, the LRU preempted from service may continue from the point of the interruption; this rule is called preemptive-resume [Ozekici 1990].

During the repair process, at the beginning of each time step, the model sorts the LRUs by priority ranking, fills the queue and begins repair. This method of sorting allows a single urgent priority LRU to bypass all queued LRUs within a process step.
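The preemptive-resume behavior can be illustrated with a minimal Python sketch in which each LRU carries the hours of work it still needs. The dictionary layout, the function name `run_time_step`, and the numerical values (capacity of one, 6-hour duration, 2-hour time step) are assumptions made for illustration only.

```python
PRIORITY_RANK = {"urgent": 0, "high": 1, "medium": 2, "low": 3}


def run_time_step(queue, capacity, dt):
    """Advance one process step by one time step of length dt.

    queue is a list of dicts with keys 'number', 'priority' and 'remaining'
    (hours of work left), held in arrival order. Returns the LRUs that finished.
    """
    # Stable sort: priority first, arrival order within a priority level.
    queue.sort(key=lambda lru: PRIORITY_RANK[lru["priority"]])
    for lru in queue[:capacity]:          # repair section
        lru["remaining"] -= dt            # work is kept if later preempted
    finished = [lru for lru in queue if lru["remaining"] <= 0]
    queue[:] = [lru for lru in queue if lru["remaining"] > 0]
    return finished


# Capacity of one, 6-hour step duration, 2-hour time step (illustrative values).
queue = [{"number": 1, "priority": "low", "remaining": 6.0}]
run_time_step(queue, capacity=1, dt=2.0)            # LRU 1 has 4 h left
queue.append({"number": 2, "priority": "urgent", "remaining": 6.0})
done = []
while queue:
    done += [lru["number"] for lru in run_time_step(queue, capacity=1, dt=2.0)]
print(done)  # [2, 1]
```

The low priority LRU is displaced by the urgent arrival but keeps the two hours of work already completed, so it needs only two further time steps once the repair section opens up again.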



Figure 2.6: Priority Levels and Relation to Mission Criticalness

In order to better describe the impact of priority on LRU availability, a small population of 40 LRUs will be observed over a one year period (all of the LRUs are assumed to fail at the same time in this example). For a population this size, the capacities and durations of the repair process steps have been decreased to significantly impact the lead time before repair.

When the model simulates this population of LRUs, there is no change in the average availability between a population that was prioritized and a population that was not because the LRUs were assumed to fail only once. The increase in average repair time when prioritizing will be addressed below. When looking at the distribution of repair times, the impact of prioritizing LRUs is clearly visible. Figure 2.7 plots the individual LRU index number versus its total time in the repair process (sum of time spent either being repaired, or waiting to be repaired).



Figure 2.7: Impact of Priority on Total Repair Time

For this example the average repair time for each group of urgent, high, medium, and low priority LRUs is 8, 13, 18, and 23 hours respectively. The average repair time for all 40 LRUs in this sample is 15.5 hours. By assigning the mission critical LRUs the urgent priority rating, there is a 48.3% reduction in repair time. From Figure 2.7, it is clear that prioritizing LRUs can alter a population's repair time, either decreasing or increasing it based on each LRU's level of mission importance, when lead times are affected.

Figures 2.8 and 2.9 depict the distributions of repair time for an un-prioritized case and a prioritized case respectively. The inputs used to generate this case were a population of 400 LRUs whose failures are dictated by Weibull distribution generated TTFs. When comparing the mean repair time for the un-prioritized case versus the prioritized case, there is an 8.9% increase when assigning repair priorities to LRUs. The average repair time increases because LRUs of higher priority spend less time in repair and are placed in the field more quickly. These LRUs experience a greater amount of time in the field, therefore failing more often than their counterparts waiting in the repair facilities to be repaired. The double distribution shape seen in Figure 2.9 is due to the fact that higher priority LRUs have a decreased repair time (the left population), followed by LRUs of lower priority forming the population on the right. The distribution in Figure 2.8 is single and normal as the LRUs follow a first in, first out (FIFO) repair rule.



[Histogram: probability distribution versus repair time; displayed statistics: Mean = 3722.64, Standard Deviation = 3208.601]

Figure 2.8: No Priority Sorting

**Figure 2.9: Priority Sorting** 

### 2.3.2 Adding Spares/Repairing Process

In a real life situation when an LRU fails in the field, it is removed from the system and sent for repair. A spare is immediately installed in place of the original LRU to continue system operation. Upon repair, the original LRU is reinstalled and the spare is removed and returned to storage. The time to failure clock associated with failure mechanisms in the spare only accumulates the time the original LRU spent in the repair process.

An approximation to the real sparing process is assumed in the model. When an LRU fails and enters the repair process, a spare is assumed to replace it. However, the simulation does not accumulate time against the spare's failure mechanisms unless the spare becomes a permanent replacement for the LRU, i.e., only if the original LRU is retired during repair. The assumption is that the spares do not accumulate appreciable degradation if they are only used while the original LRUs are in repair.

#### 2.3.3 Early Retirement

Oftentimes, an LRU will enter the repair process, pass through one or more steps and be deemed non-repairable. Early retirement is supported in the model by creating specific process steps with the capability to specify a fixed fraction or distribution of LRUs to be retired. When a failed LRU enters one of these specific process steps and is determined to be retired early, the model adds a spare LRU to replace the retired LRU. The failure date of the original LRU becomes the introduction date of the spare LRU. Prior to introduction into the field, the reliability distributions corresponding to all the relevant failure mechanisms are sampled and included in the spare LRU's properties. All other LRU specific properties of the spare, including the end of support date and priority, are the same as those of the originally failed LRU. The spare LRU behaves in the same way as the original LRU, and is modeled with the same metrics. If a spare should fail and not be repairable, it will be replaced by another spare that inherits the properties of its parent. Figure 2.10 represents the relationship of LRU specific properties between the parent and child LRU.



Figure 2.10: Example of Original to Spare LRU Relationship
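The parent to spare relationship described above can be sketched in Python as follows. The `LRU` dataclass, its field names, and the `replace_with_spare` function are hypothetical stand-ins for the LRU object in the model, and `sample_ttf` is a caller-supplied placeholder for sampling the relevant reliability distributions.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class LRU:
    number: int
    introduction_date: float      # hours on the simulation clock
    end_of_support: float
    priority: str
    next_failure_time: float
    parent: Optional[int] = None  # number of the LRU this spare replaced


def replace_with_spare(failed, failure_date, new_number, sample_ttf):
    """Create the spare that replaces an LRU retired early during repair.

    sample_ttf is a caller-supplied function returning a fresh time to failure;
    it stands in for sampling the relevant reliability distributions.
    """
    return replace(
        failed,
        number=new_number,
        introduction_date=failure_date,               # spare is fielded at the failure date
        next_failure_time=failure_date + sample_ttf(),
        parent=failed.number,
    )                                                  # EOS and priority are inherited


original = LRU(7, 0.0, 87600.0, "high", 12000.0)
spare = replace_with_spare(original, 12000.0, 8001, lambda: 9000.0)
print(spare)
```

Using `dataclasses.replace` copies every field that is not explicitly overridden, which mirrors the statement that the end of support date and priority carry over from the original LRU.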

# 2.3.4 Branching

The branched step option provided in the model allows the engineer to explicitly specify the repair path as a function of the failure mechanism (and/or part type) that caused the LRU's failure. In some cases, different failure mechanisms require different repair steps. Figure 2.11 depicts how the process flow of a failed LRU can differ based on which failure mechanism causes the failure: A, B or C.



Figure 2.11: Implementation of Unique and Independent Repair Processes

The mechanism-specific capabilities of each step are stored as part of the process step object's information.

### 2.3.5 How LRUs Get in and Out of the Repair Process Flow

When an LRU fails, it is removed from the field and is placed in the first step of the repair process. This is achieved by increasing the number of LRUs by 1 in either the repair or waiting pool subsection of the queue. If the process step's repair section is under capacity and the LRU is un-prioritized, it will be placed in the next sequential opening in the repair section and will be processed during the time step. If the process step's repair section is full (at capacity) and the LRU is un-prioritized, it will be placed next in line within the waiting pool. The waiting pool consists of a sequential list of LRUs waiting to fill the process step's repair section. If the process step's repair section is under capacity and the LRU is prioritized, the LRU will be placed at the top of its priority type within the repair section. If there are no LRUs of that priority type, it will be placed at the end of the line following the next highest priority rating. If the process step's repair section is full (at capacity) and the LRU is prioritized, it will be placed next in line, following an LRU with equivalent priority rating, within the waiting pool.
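A simplified sketch of this placement logic is given below. It is an illustration under stated assumptions rather than the model's code: the `StepQueue` class is hypothetical, a lower number is assumed to mean a more urgent priority, un-prioritized LRUs are assumed to share one least-urgent class, and only the waiting pool is kept in priority order (the repair section is treated as unordered because every LRU in it is processed during the time step).

```python
from dataclasses import dataclass, field


@dataclass
class StepQueue:
    capacity: int
    in_repair: list = field(default_factory=list)  # (priority, lru_id)
    waiting: list = field(default_factory=list)    # (priority, lru_id), ordered

    def admit(self, lru_id, priority):
        """Place a failed LRU in the repair section or the waiting pool."""
        if len(self.in_repair) < self.capacity:
            self.in_repair.append((priority, lru_id))
            return "repair"
        # Insert behind every waiting LRU of equal or more urgent priority,
        # so LRUs of the same priority keep their arrival order.
        pos = len(self.waiting)
        for i, (p, _) in enumerate(self.waiting):
            if p > priority:
                pos = i
                break
        self.waiting.insert(pos, (priority, lru_id))
        return "waiting"
```

With a capacity of 2, admitting LRUs `a`, `b`, `c` at priority 3, then `d` at priority 1, `e` at priority 3 and `f` at priority 1 leaves `a` and `b` in repair and a waiting pool ordered `d`, `f`, `c`, `e`.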

# 2.3.6 Time Step Selection and Management

The process of determining the time step value (length) is controlled by two factors: the required accuracy of the simulation and the duration (time) of each repair process step. To obtain the best accuracy while minimizing the run-time of the simulation, the size of the time step is set to the greatest common divisor (GCD) of the process step durations while LRUs are in the repair process, and to the difference between the soonest time to failure (TTF) and the current date when there are no LRUs in the repair process. This concept is explained in Figure 2.12, which depicts two process steps with unique repair durations.



Figure 2.12: Example Process Steps with different durations

The simulation clock begins at the "start" time, or earliest LRU introduction date, and advances by the addition of the time step. In the case shown in Figure 2.12 with two process steps, with durations of 1 and 1.5 hours respectively, the user may choose to use a "1 hour" time step as it is the shortest process step duration. In this case the clock advances 1 hour, and the LRU passes through process step A. Entering process step B, the clock must advance two full time steps before the LRU can move out of the step. This time step size therefore increases the total repair time to 3.0 hours versus the correct time of 2.5 hours. To avoid this error, the time step must be the greatest common divisor of the process step durations. In the above example, the GCD of the two process step durations is ½ hour. Upon entering the repair process, the model advances two time steps through process step A and three time steps through process step B, obtaining the correct total repair time of 2.5 hours.
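The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an implementation choice made here so that a duration such as 1.5 hours has a well-defined GCD; the thesis does not state how the model computes it.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def fraction_gcd(a, b):
    """GCD of two positive rationals: gcd of numerators over lcm of denominators."""
    lcm_den = a.denominator * b.denominator // gcd(a.denominator, b.denominator)
    return Fraction(gcd(a.numerator, b.numerator), lcm_den)


def time_step(durations):
    """Largest time step that divides every process step duration exactly."""
    return reduce(fraction_gcd, (Fraction(d) for d in durations))


def repair_time(durations, step):
    """Total repair time when each process step occupies a whole number of time steps."""
    total = Fraction(0)
    for d in durations:
        n_steps = -(-Fraction(d) // step)  # ceiling division
        total += n_steps * step
    return total
```

For durations of 1 and 3/2 hours, `time_step` returns 1/2 hour. `repair_time` gives 3 hours with a 1-hour time step and 5/2 hours with the 1/2-hour time step, matching the 3.0 and 2.5 hour totals in the text.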

While accuracy of the model is always important, it may be traded off against computational speed. However, there are ways of increasing the speed of the simulation by reducing the number of computations without negatively impacting the accuracy. When there are no events occurring in the simulation, i.e., when no LRUs are failing or are in the repair process, there is no need to sort each repair process step queue or the array of LRU TTFs. The simulation determines the gap until the next event, and jumps to the next event. This is the part of the event stepped method that advances time when the discrete events of the model are the LRU failures.

In order to determine the next discrete event, the model must first determine whether there is a LRU in the repair queue. If there is, the step size is set to the GCD of the process step duration by default. If there are no LRUs in the repair process, the simulation calls on a stored array of TTF distributions for each LRU, sorts this array by ascending date and determines the soonest future TTF event. The difference between this TTF date and the current date is the new time step. The size of this step can potentially range from the GCD of the process flow to many years. This time step has the ability to be large because the simulator is jumping to a date when there will be LRUs in the repair queue. When a LRU enters the repair queue, the time step is set to the fixed value, determined by the GCD of the process step durations. This fixed value time represents each individual event in the repair process.
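The time step selection rule in this paragraph can be sketched as below. The function and argument names are hypothetical, and returning `None` when no future failure remains is an assumption added for completeness.

```python
def next_time_step(now, lrus_in_repair, ttf_dates, repair_step):
    """Return the time increment for the next advance of the simulation clock.

    now            -- current simulation date
    lrus_in_repair -- number of LRUs currently anywhere in the repair process
    ttf_dates      -- failure dates of all fielded LRUs (same units as now)
    repair_step    -- fixed step: the GCD of the process step durations
    """
    if lrus_in_repair > 0:
        return repair_step              # fixed step while repairs are active
    future = sorted(t for t in ttf_dates if t > now)
    if not future:
        return None                     # no further failures to simulate
    return future[0] - now              # jump directly to the next failure
```

With the clock at 10.0 and failure dates of 50.0, 12.0 and 4.0, the function returns the fixed step while any LRU is in repair and 2.0 (the gap to the failure at 12.0) when the repair process is empty.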

# 2.3.7 The Impact of Low Capacity Process Steps on the Total Repair Time

The electronic repair process in a discrete event simulator is initiated by a single failed LRU or a group of failed LRUs. The failed LRU is placed into the first step of the repair process, remains there for the step's duration, and is then transferred into the next process step upon completion of the step. This release of the LRU from the first process step is dependent on both the process step's duration and its capacity. The process step's duration is the minimum time that each LRU must spend in that process step. The maximum time spent in the repair process step is dependent on the step's capacity, or capability to repair multiple LRUs simultaneously. The capacity of the process step is the maximum number of LRUs that can be handled in that process step concurrently. Therefore, a process step with a high capacity will only occupy the LRU for the step's defined duration. However, a process step that has a capacity lower than the total number of LRUs entering the repair process will back up LRUs and increase the overall LRU repair time. Figure 2.13 shows an example repair process with three steps, A, B and C, that is used to demonstrate the negative impact of a process step with a low capacity and a small duration.



Figure 2.13: Example Process Steps

Each process step has a unique duration and capacity. For this case, 50 LRUs have just failed, and the user is running a 1-hour time step in the model. After one time step, all 50 LRUs have completed 50% of repair step A. After the second time step, all 50 LRUs have completed 100% of repair step A, and have moved into repair step B. Due to repair step B's small capacity (2 LRUs), after the third time step, LRUs will begin to back up in repair step B's queue. The minimum time needed to complete the entire repair process for a single LRU is 6 hours for the above case. Figure 2.14 illustrates the LRU number versus the repair time for each of the 40 LRUs sampled in this example.



Figure 2.14: Total Repair Time (hours) for LRUs 1-40 in the example process shown in Figure 2.13

As Figure 2.14 shows, a process step with a small capacity can have significant repercussions on the duration an LRU remains in repair. The time spent in the repair process modeled in Figure 2.13 increases rapidly from the minimum time of 6 hours to nearly 25 hours for LRU #40 due to time spent waiting to enter the repair section.
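The backlog effect can be reproduced with a small first-in, first-out sketch. The text only states that step A takes 2 hours, that step B has a capacity of 2 LRUs, and that the minimum total repair time is 6 hours. The remaining values used here (step B lasting 1 hour, step C lasting 3 hours, and a capacity of 50 for steps A and C) are assumptions chosen because they are consistent with the 6 hour and roughly 25 hour figures quoted above. They are not taken from Figure 2.13.

```python
def simulate(steps, n_lrus):
    """Return the completion time (hours) of each LRU, in arrival order.

    steps is a list of (duration_h, capacity) pairs; all LRUs fail at t = 0
    and are served first-in, first-out at every process step.
    """
    ready = [0] * n_lrus                  # time each LRU can enter the next step
    for duration, capacity in steps:
        slot_free = [0] * capacity        # time each repair slot becomes free
        done = []
        for t in ready:
            k = min(range(capacity), key=slot_free.__getitem__)
            start = max(t, slot_free[k])  # wait in the pool if no slot is free
            slot_free[k] = start + duration
            done.append(start + duration)
        ready = done
    return ready


# Assumed step definitions: A (2 h, 50), B (1 h, 2), C (3 h, 50)
times = simulate([(2, 50), (1, 2), (3, 50)], 50)
```

Under these assumptions `times[0]` is 6 hours and `times[39]` is 25 hours: every pair of LRUs after the first waits one additional hour for a free slot in step B.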

#### 2.4 Outputs

# 2.4.1 Average Cost per LRU

In this model, the cost being calculated represents only a subset of the total ownership cost of an LRU. The specific subset being described in this model is the cost of maintaining LRUs in the field, i.e., the cost to repair. Other costs associated with the LRU are not addressed in this model.

The repair cost per LRU is calculated by summing the cost of each repair step that the LRU was processed in. However, the cost of the repair step represented by the value specified in the repair process is the cost of performing the step during the first year. A discount rate, or time value of money is taken into account for repairs that occur past the first year. In Equation 2.3, the cost of the repair step is calculated based on the date (in years) of the repair.

$$\text{Present Value of Repair Step} = \frac{\text{Original Value of Repair Step}}{(1 + \text{Discount Rate})^{\text{Year}}} \tag{2.3}$$

The number of process steps that each LRU enters is dependent on the mechanism that caused failure. An LRU may fail more than once, and therefore be repaired more than once, possibly following a different repair branch each time it is repaired. The possibility for different repair costs suggests calculating an average repair cost per LRU. In Equation 2.4, the average cost per LRU is calculated by summing the individual repair costs per LRU and dividing by the total number of LRUs.

$$\text{Average Repair Cost per LRU} = \frac{\sum \text{Repair Cost per LRU}}{\text{Total Number of LRUs}} \tag{2.4}$$
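Equations 2.3 and 2.4 can be combined in a short sketch. The function names and data layout are hypothetical, and the year index is assumed to be 0 for repairs in the first year so that those repairs are not discounted. The thesis does not state the indexing convention explicitly.

```python
def present_value(cost, year, discount_rate):
    """Equation 2.3: discount a repair step cost back to the first year."""
    return cost / (1.0 + discount_rate) ** year


def average_repair_cost(repairs_by_lru, discount_rate):
    """Equation 2.4: mean discounted repair cost over all LRUs.

    repairs_by_lru holds one list per LRU of (step_cost, year) tuples;
    an LRU that never failed contributes an empty list (zero cost).
    """
    totals = [sum(present_value(c, y, discount_rate) for c, y in repairs)
              for repairs in repairs_by_lru]
    return sum(totals) / len(totals)
```

As an illustration with made-up numbers, a single step costing 110 in year 1 at a 10% discount rate has a present value of 100. Averaged with a second LRU that never failed, the average repair cost per LRU is 50.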

# 2.4.2 Average Repair Time

The repair time represents the time required for an LRU to move through a particular repair process and any extra time spent waiting to enter the repair area. The specific repair process is determined by the mechanism that caused failure and the type of part. Therefore failure caused by vibration in a 2512 resistor is processed differently than a NFF (no fault found) failure in a CTBGA.

The repair time is calculated by summing the individual times the LRU spends in each repair step, whether being repaired or in the waiting pool. In Equation 2.5, the average repair time per LRU is calculated by summing the repair time for each LRU and dividing by the total quantity of LRUs.

$$\text{Average Time In Repair per LRU} = \frac{\sum \text{Time In Repair per LRU}}{\text{Total Number of LRUs}} \tag{2.5}$$

# 2.4.3 Availability

Availability is the probability that an item will be able to function (i.e., not failed or undergoing repair) when called upon to do so. Availability is a function of an item's reliability (how quickly it fails) and its maintainability (how quickly it can be repaired and/or how it is spared). Quantitatively, availability is given by,

$$Availability = \frac{Up \ time}{Up \ time + Down \ time} \tag{2.6}$$

The concept of availability marries reliability and maintainability together and only applies to "repairable" systems.

Within this model, availability is defined as the fraction of time the LRU is available for field use and is calculated on the LRU level versus the system level.<sup>5</sup> From this perspective, availability is only a function of total time in the field and the total repair time. In order to calculate the average LRU availability, the individual LRU availability must first be calculated. The individual LRU availability is calculated in Equation 2.7 by subtracting the total repair time from the total time in the field and dividing the result by the total time in the field.

<sup>5</sup> Availability can be evaluated either for the LRUs or for the "sockets." Sockets are the places in a system where the fielded LRUs are located. In this thesis, only the availability of the LRUs is considered.

$$\text{LRU Availability} = \frac{\sum \text{Time In Field} - \sum \text{Time In Repair}}{\sum \text{Time In Field}} \tag{2.7}$$

To calculate the average availability, Equation 2.8 sums each individual LRU's availability and divides by the total number of LRUs in the system.

$$\text{Average LRU Availability} = \frac{\sum \text{LRU Availability}}{\text{Total Number of LRUs}} \tag{2.8}$$
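Equations 2.7 and 2.8 translate directly into code. The sketch below uses hypothetical function names and assumes that time in the field and time in repair are expressed in the same units.

```python
def lru_availability(time_in_field, time_in_repair):
    """Equation 2.7: fraction of fielded time the LRU was not in repair."""
    return (time_in_field - time_in_repair) / time_in_field


def average_availability(records):
    """Equation 2.8: mean of the individual LRU availabilities.

    records is a list of (time_in_field, time_in_repair) pairs, one per LRU.
    """
    values = [lru_availability(f, r) for f, r in records]
    return sum(values) / len(values)
```

For illustration only, an LRU with 1,000 hours in the field and 25 hours in repair has an availability of 0.975. Averaged with a second LRU that never entered repair, the average availability is 0.9875.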

In comparison, for the example repair process shown in Figure 2.13, it took only 6 hours to repair the first LRU and 25 hours to repair the $40^{\text{th}}$ LRU.

# 2.5 Model Summary

The model described in this chapter communicates the impact of the tin-lead to lead-free electronics conversion in terms of repair cost and LRU availability. These effects are used to quantify and demonstrate the system- and enterprise-level risks posed by the tin-lead to lead-free conversion issue. The procedure for utilizing the model is shown in Figure 2.15. In Chapter 3, detailed test cases are developed and example results from the test cases are discussed.



Figure 2.15: Usage Procedure for the Model

# Chapter 3: Model Test Case

In order to exercise the model developed in Chapter 2, test cases were developed. The test cases implement a range of different electronic components of various sizes and package types and were assessed for both tin-lead and lead-free solder finishes. The objective of the test cases is twofold: 1) to demonstrate the capability of the model, and 2) to assess the cost and availability impact of the conversion from tin-lead to lead-free for a range of conditions.

# 3.1 Test Case Development

The model test cases track 8,000 LRU-level avionics boards from introduction to retirement. Each of the 8,000 LRUs was tracked entirely independently of the others. The test cases require three basic inputs:

- 1) Logistics Inputs: Introduction and retirement schedules for the LRUs (how many are fielded, when they are fielded and when they are retired from the field)
- 2) Relevant failure mechanisms for the LRUs (including reliability distributions)
- 3) The repair process that will be used for the LRUs (process steps including durations and capacities)

The following subsections describe the development of the input data for the test cases.

#### 3.1.1 LRU Introduction and Retirement Schedules

For the test cases, three deployment (manufacturing/fielding) schedules are utilized. The baseline deployment schedule, which is depicted in Figure 3.1, introduces LRUs quarterly over a ten-year period with a smooth introduction rate and an equivalent retirement rate during a ten-year period.



Figure 3.1: Baseline Deployment Schedule

The medium deployment schedule, which is depicted in Figure 3.2, introduces LRUs quarterly over a ten-year period with an increased introduction rate from the baseline deployment schedule. LRUs are introduced at a rate of 250 per quarter for the first 4 years and then ramp down to approximately 133 LRUs per quarter.



Figure 3.2: Medium Deployment Schedule

The rapid deployment schedule, which is depicted in Figure 3.3, introduces LRUs quarterly over a ten-year period with an even more increased introduction rate compared to the baseline model. LRUs are introduced at a rate of 500 per quarter for the first two years, followed by just 125 LRUs introduced per quarter for the next 8 years.



Figure 3.3: Rapid Deployment Schedule
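As a quick consistency check, the rapid schedule can be written out quarter by quarter. The sketch below is illustrative only. The rapid schedule follows the rates stated in the text, while the uniform 200 LRUs per quarter used for the baseline is an assumption about what a "smooth" introduction rate means. The medium schedule is omitted because the text gives only an approximate rate for its later years.

```python
QUARTERS_PER_YEAR = 4


def rapid_schedule():
    """LRUs introduced per quarter: 500 for 2 years, then 125 for 8 years."""
    return [500] * (2 * QUARTERS_PER_YEAR) + [125] * (8 * QUARTERS_PER_YEAR)


def baseline_schedule(total=8000, years=10):
    """Assumed uniform introduction rate over the deployment period."""
    quarters = years * QUARTERS_PER_YEAR
    return [total // quarters] * quarters


rapid = rapid_schedule()
baseline = baseline_schedule()
```

Both lists cover 40 quarters and sum to the 8,000 LRUs tracked in the test cases.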

#### 3.1.2 LRU Operational Profile

GEIA assumes that in most cases, 1,000 operational cycles are sufficient for estimating usage over support life, and this number is considered a standard duration for reliability testing in many companies/organizations [GEIA 2008]. In the test cases discussed in this thesis, each LRU is assumed to have a support life of 30 years, during which it will experience 1,000 operational cycles. This equates to an operational profile of approximately 33 cycles per year.

IPC-9701A (Table 3.1) provides additional guidance for duration values and further information about the number of temperature cycles and their interpretation with respect to service life.

Table 3.1: Temperature Cycling Requirements, Mandated and Preferred Test Parameters within Mandated Conditions [GEIA 2008]

| Test Condition                                     | Mandated Condition                                                                                                      |
|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| Cycle (TC) Condition:                              |                                                                                                                         |
| TC1                                                | 0°C ←→ +100°C (Preferred Reference)                                                                                     |
| TC2                                                | -25°C ←→ +100°C                                                                                                         |
| TC3                                                | -40°C ←→ +125°C                                                                                                         |
| TC4                                                | -55°C ←→ +125°C                                                                                                         |
| TC5                                                | -55°C ←→ +100°C                                                                                                         |
| Test Duration                                      | Whichever condition occurs FIRST: 50% (Preferred 63.2%) cumulative failure (Preferred Reference Test Duration) or       |
| Number of Thermal Cycle (NTC) Requirement:         |                                                                                                                         |
| NTC-A                                              | 200 cycles                                                                                                              |
| NTC-B                                              | 500 cycles                                                                                                              |
| NTC-C                                              | 1,000 cycles (Preferred for TC2, TC3, and TC4)                                                                          |
| NTC-D                                              | 3,000 cycles                                                                                                            |
| NTC-E                                              | 6,000 cycles (Preferred Reference TC1)                                                                                  |
| Low Temperature Dwell                              | 10 minutes                                                                                                              |
| Temperature Tolerance (preferred)                  | +0/-10°C (+0/-5°C) [+0/-18°F (+0/-9°F)]                                                                                 |
| High Temperature Dwell                             | 10 minutes                                                                                                              |
| Temperature Tolerance (preferred)                  | +10/-0°C (+5/-0°C) [+18/-0°F (+9/-0°F)]                                                                                 |
| Temperature Ramp Rate                              | ≤20°C [36°F]/minute                                                                                                     |
| Full Production Sample Size                        | 33 component samples (32 test samples plus one for cross-section, add additional 10 samples for rework, if applicable) |
| Printed Wiring (Circuit) Board (PWB/PCB) Thickness | 2.35 mm [0.093 in]                                                                                                      |
| Package/Die Condition                              | Daisy-Chain Die/Package (see Table 4-2)                                                                                 |
| Test Monitoring                                    | Continuous Monitoring (see Table 4-4, Preferred Reference-Event Detector)                                               |

# 3.1.3 Developing the Failure Mechanism Distributions

Although the inputs to the model are component-specific failure mechanisms, an LRU is capable of failing from multiple different failure mechanisms [Dasgupta et al. 1991]. The failure mechanisms are represented in the model developed in this thesis by time-to-failure (TTF) distributions. The TTF distributions corresponding to specific failure mechanisms can be determined experimentally or from previously developed reliability models (that were determined experimentally). In this thesis, the applicable reliability distributions were determined using the calceFAST simulation tool [calceFAST 2005].

In order to develop a test board with failure parameters similar to experimental boards containing similar components, calceFAST was used as part of an iterative process. The Monte Carlo TTF data generated from calceFAST was fit to a Weibull distribution using Weibull++ [Weibull++ 2003]. This section explains the iterative process using calceFAST to generate 2-parameter Weibull data.

The calceFAST (Failure AsseSsment Toolkit) is a software interface for a collection of analysis models that can be used in the assessment of time to failure of structures found in electronic products and systems [calceFAST 2005]. Within the software, the user can specify a single failure mechanism, the package type, and related parameters. For each package type, a common set of parameters was adjusted in order to obtain a realistic distribution of the Monte Carlo data.

The first step in determining the parameters necessary to produce results similar to an experimental case is to define the package types to be studied. The packages included for this test case were leadless chip carriers (LCC), ball grid arrays (BGA), and column grid arrays (CGA). These types were chosen with the assumption that larger package types have a shorter characteristic life when compared to packages of smaller size.

The second step in generating the test Weibull parameters is to calibrate calceFAST. An experimental case comprising a 228-lead BGA package that experienced 0/100°C thermal cycling with 10-minute dwell times was used to calibrate the Weibull distribution values from calceFAST. Two parameters, the thermal fatigue calibration factor and the interconnect length, were adjusted in calceFAST in order to form distributions similar to that of Figure 3.4 [Al-Momanl et al. 2008].



 $\begin{array}{l} \beta 1{=}8.1772,\, \eta 1{=}3975.2248,\, \rho{=}0.9862\\ \beta 2{=}7.4712,\, \eta 2{=}1211.6521,\, \rho{=}0.9566\\ \beta 3{=}6.5250,\, \eta 3{=}3379.5690,\, \rho{=}0\\ \beta 4{=}8.8210,\, \eta 4{=}1082.0580,\, \rho{=}0 \end{array}$ 

Figure 3.4: Comparing Data Generated Using calceFAST to Experimental TTF Data

First, the thermal fatigue calibration factor was increased from the default value of 1.0 until the mean TTF was close to that of Figure 3.4. Next, a uniform distribution with 10% variation of the interconnect spans in both in-plane directions was applied in order to increase the spread of the TTFs.

The general type of failure mechanism chosen for the above set of package types was a first order thermal fatigue model. Within calceFAST, this failure mechanism has specific selections for the package type being studied, e.g., "First Order Thermal Fatigue Model for Column Grid Array" or "First Order Thermal Fatigue Model for Ball Grid Array". Specific documentation describing each of the different failure mechanisms from calceFAST can be found in Appendix B.

GEIA specifies that the number of temperature cycles (or duration) should be sufficient to evaluate the expected performance of the samples in the required applications. This equates to running the experiment to failure, or >75% failure of all samples, in order to obtain proper statistical metrics [GEIA 2008]. Within calceFAST, each of the test packages is cycled until failure and analyzed using Monte Carlo simulation. The sample size for the Monte Carlo simulation was based on convergence of the Weibull parameters. Figure 3.5 shows that beyond 1,000 samples there is very little variation in either $\eta$ or $\beta$.



Figure 3.5: Convergence of Weibull Parameters by Increasing Sample Size


<sup>6</sup> The maximum sample size allowed in calceFAST limited extending Figure 3.5 to 100,000 samples.
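The convergence check behind Figure 3.5 can be imitated with synthetic data. The sketch below is not the calceFAST/Weibull++ workflow: it draws samples from a known Weibull distribution using Python's standard library and fits them by median rank regression (least squares on the linearized CDF, with Benard's approximation for the median ranks). The thesis does not state which estimation method was used in Weibull++. Rank regression is chosen here because it yields a correlation coefficient analogous to the $\rho$ values reported. The function names and the true parameters in the usage note are illustrative assumptions.

```python
import math
import random


def median_rank_fit(samples):
    """Fit a 2-parameter Weibull by rank regression; return (beta, eta, rho)."""
    xs = sorted(samples)
    n = len(xs)
    X = [math.log(x) for x in xs]
    # Benard's approximation to the median rank of the i-th ordered failure
    Y = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
         for i in range(1, n + 1)]
    mx, my = sum(X) / n, sum(Y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    syy = sum((y - my) ** 2 for y in Y)
    sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    beta = sxy / sxx                  # slope of ln(-ln(1-F)) versus ln(t)
    eta = math.exp(mx - my / beta)    # from intercept = -beta * ln(eta)
    rho = sxy / math.sqrt(sxx * syy)  # correlation coefficient of the fit
    return beta, eta, rho


def convergence(sizes, eta, beta, seed=1):
    """Fit increasingly large synthetic samples drawn from Weibull(eta, beta)."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        data = [rng.weibullvariate(eta, beta) for _ in range(n)]
        results.append((n,) + median_rank_fit(data))
    return results
```

Calling, for example, `convergence([100, 1000, 10000], eta=477.0, beta=17.4)` returns one `(n, beta, eta, rho)` tuple per sample size. The fitted parameters should settle toward the true values as the sample size grows, which is the behavior Figure 3.5 is used to confirm.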

A study composed of three different thermal cycling profiles was developed. The goal of the different profiles was to evaluate a range of thermal cycling parameters where: 1) SAC is more reliable, 2) there should be no difference in reliability between SAC and SnPb, and 3) SnPb is more reliable. Parameters adjusted within the profiles include the dwell time and the maximum, minimum and mean temperatures. The maximum and minimum temperatures represent the upper and lower limits of the thermal cycle. The mean, therefore, is the average of the maximum and minimum temperatures. Dwell time is the length of time that the temperature is held at the maximum of the temperature cycle.

The first case developed exhibits a lower cyclic mean and maximum temperature, where SAC is expected to outperform SnPb [Everhart et al. 2007, McCluskey et al. 2009]. The opposite case consists of a high cyclic mean and maximum temperature, where SnPb outperforms SAC [Everhart et al. 2007, McCluskey et al. 2009]. In order to match conditions that favored these trends, three thermal cases were generated. Case 1 in Table 3.2 exhibits a low maximum temperature of 100°C, a mean temperature of 50°C and a long dwell time of 40 minutes. Case 2 in Table 3.2 exhibits a medium maximum temperature of 110°C, a medium mean temperature of 55°C and a medium dwell time of 10 minutes. Case 3 in Table 3.2 exhibits a higher maximum temperature of 130°C, a higher mean temperature of 65°C and a short dwell time of 0.1 minutes (6 seconds). For these three cases calceFAST predicts that SAC is more reliable than SnPb in Case 1, SAC and SnPb have nearly identical reliabilities in Case 2, and SnPb is more reliable than SAC in Case 3.

Table 3.2: Thermal Cycling Cases 1-3 Used to Compare Solder Reliability

| Case<br># | Max Temp<br>(°C) | Min Temp<br>(°C) | Avg Temp<br>(°C) | Dwell Time<br>(min) |
|-----------|------------------|------------------|------------------|---------------------|
| 1         | 100.0            | 0.0              | 50.0             | 40.0                |
| 2         | 110.0            | 0.0              | 55.0             | 10.0                |
| 3         | 130.0            | 0.0              | 65.0             | 0.1                 |

# 3.1.3.1 Leadless Chip Carrier (LCC)

The attributes listed in Table 3.3 are used to define the LCC package and attach parameters.

Table 3.3: LCC Attributes Defined in calceFAST

**Package Parameters** 

| Parameter                  | Value      | Unit |
|----------------------------|------------|------|
| Interconnect Span (X)      | 4.33       | mm   |
| Interconnect Span (Y)      | 4.33       | mm   |
| Package Material Reference | Ceramic CC |      |

**Attach Properties** 

| Parameter                          | Value            | Unit |
|------------------------------------|------------------|------|
| Solder Material                    | SnPb / SAC       |      |
| Solder Height                      | 0.1              | mm   |
| Board Material Reference           | Epoxy Fiberglass |      |
| Thermal Fatigue Calibration Factor | 1.5              |      |

After running the Monte Carlo simulation, which was described in Section 3.1.3, the cycles-to-failure data was plotted and fit with a Weibull curve using Weibull++. The Weibull plots for the LCC package for thermal cases 1-3 are presented in Figures 3.6 to 3.8, followed by a summary of their parameters in Table 3.4.



Figure 3.6: Case 1, Weibull Plot of LCC Package Cycles to Failure for SnPb and SAC Solders



Figure 3.7: Case 2, Weibull Plot of LCC Package Cycles to Failure for SnPb and SAC Solders



Figure 3.8: Case 3, Weibull Plot of LCC Package Cycles to Failure for Both SnPb and SAC Solders

Table 3.4: Weibull Parameters, LCC Package, for Thermal Cases 1-3

| Solder Type       | SnPb    |          |        | SAC     |          |        |
|-------------------|---------|----------|--------|---------|----------|--------|
| Weibull Parameter | β       | η        | ρ      | β       | η        | ρ      |
| Case 1            | 17.3595 | 477.0349 | 0.9585 | 13.8519 | 586.9142 | 0.9549 |
| Case 2            | 15.5269 | 488.8268 | 0.9541 | 14.0313 | 479.4791 | 0.9582 |
| Case 3            | 14.2055 | 852.7378 | 0.9583 | 12.8109 | 433.1598 | 0.9535 |

# 3.1.3.2 Ball Grid Array (BGA)

The attributes listed below in Table 3.5 are used to define the BGA package and attach parameters.

Table 3.5: BGA Attributes Defined in calceFAST

**Package Parameters**

| Parameter                  | Value       | Unit |
|----------------------------|-------------|------|
| Interconnect Span (X)      | 24.18       | mm   |
| Interconnect Span (Y)      | 24.18       | mm   |
| Package Material Reference | Plastic PEM |      |

**Attach Properties**

| Parameter                          | Value            | Unit |
|------------------------------------|------------------|------|
| Solder Material                    | SnPb / SAC       |      |
| Collapsed Ball Height              | 0.562            | mm   |
| Board Material Reference           | Epoxy Fiberglass |      |
| Thermal Fatigue Calibration Factor | 1.5              |      |

After running the Monte Carlo simulation, which was described in Section 3.1.3, the cycles-to-failure data was plotted and fit with a Weibull curve using Weibull++. The Weibull plots for the BGA package for thermal cases 1-3 are presented in Figures 3.9 to 3.11, followed by a summary of their parameters in Table 3.6.



Figure 3.9: Case 1, Weibull Plot of BGA Package Cycles to Failure for Both SnPb and SAC Solders



Figure 3.10: Case 2, Weibull Plot of BGA Package Cycles to Failure for Both SnPb and SAC Solders



Figure 3.11: Case 3, Weibull Plot of BGA Package Cycles to Failure for Both SnPb and SAC Solders

Table 3.6: Weibull Parameters, BGA Package, for Thermal Cases 1-3

| Solder Type       | SnPb    |           |        | SAC     |          |        |
|-------------------|---------|-----------|--------|---------|----------|--------|
| Weibull Parameter | β       | η         | ρ      | β       | η        | ρ      |
| Case 1            | 16.4404 | 710.1023  | 0.9567 | 13.3239 | 949.4979 | 0.9550 |
| Case 2            | 16.0867 | 733.1301  | 0.9550 | 13.5834 | 777.9479 | 0.9514 |
| Case 3            | 14.3930 | 1349.5196 | 0.9543 | 13.2601 | 710.5656 | 0.9555 |

# 3.1.3.3 Column Grid Array (CGA)

The attributes listed below in Table 3.7 are used to define the CGA package, column, attach, and board parameters.

Table 3.7: CGA Attributes Defined in calceFAST

**Package Parameters**

| Parameter                  | Value      | Unit |
|----------------------------|------------|------|
| Interconnect Span (X)      | 58.4       | mm   |
| Interconnect Span (Y)      | 58.4       | mm   |
| Package Thickness          | 2.4        | mm   |
| Package Material Reference | Ceramic CC |      |
| Package Interconnect Pitch | 2.54       | mm   |

**Column Parameters**

| Parameter                    | Value    | Unit |
|------------------------------|----------|------|
| Interconnect Material (Lead) | Alloy 42 |      |
| Column Height                | 1.7      | mm   |
| Column Diameter              | 0.7      | mm   |

**Attach Properties** 

| Parameter                          | Value      | Unit |
|------------------------------------|------------|------|
| Solder Material                    | SnPb / SAC |      |
| Solder Height                      | 0.1        | mm   |
| Solder Joint Bond Area             | 0.8        | mm^2 |
| Thermal Fatigue Calibration Factor | 1.5        |      |

**Board Parameters**

| Parameter                | Value            | Unit |
|--------------------------|------------------|------|
| Board Thickness          | 2.36             | mm   |
| Board Material Reference | Epoxy Fiberglass |      |

After running the Monte Carlo simulation, which was described in Section 3.1.3, the cycles-to-failure data was plotted and fit with a Weibull curve using Weibull++. The Weibull plots for the CGA package for thermal cases 1-3 are presented in Figures 3.12 to 3.14, followed by a summary of their parameters in Table 3.8.



Figure 3.12: Case 1, Weibull Plot of CGA Package Cycles to Failure for Both SnPb and SAC Solders



Figure 3.13: Case 2, Weibull Plot of CGA Package Cycles to Failure for Both SnPb and SAC Solders



Figure 3.14: Case 3, Weibull Plot of CGA Package Cycles to Failure for Both SnPb and SAC Solders

Table 3.8: Weibull Parameters, CGA Package, for Thermal Cases 1-3

| Solder Type       | SnPb    |           |        | SAC     |          |        |
|-------------------|---------|-----------|--------|---------|----------|--------|
| Weibull Parameter | β       | η         | ρ      | β       | η        | ρ      |
| Case 1            | 18.3160 | 724.4178  | 0.9417 | 15.2505 | 808.0187 | 0.9341 |
| Case 2            | 17.6609 | 736.9550  | 0.9298 | 15.2383 | 645.8529 | 0.9337 |
| Case 3            | 15.2061 | 1312.1411 | 0.9319 | 15.1000 | 560.3206 | 0.9370 |
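As an illustration of how the fitted parameters in Table 3.8 can be used, the following minimal sketch evaluates the two-parameter Weibull reliability function and the mean life, assuming β is the shape parameter and η is the scale parameter in the same units as the cycles-to-failure data. The function and dictionary names are hypothetical and are not part of calceFAST or Weibull++.

```python
import math

# (beta, eta) pairs copied from Table 3.8 for the CGA package.
WEIBULL_CGA = {
    ("SnPb", 1): (18.3160, 724.4178),
    ("SnPb", 2): (17.6609, 736.9550),
    ("SnPb", 3): (15.2061, 1312.1411),
    ("SAC", 1): (15.2505, 808.0187),
    ("SAC", 2): (15.2383, 645.8529),
    ("SAC", 3): (15.1000, 560.3206),
}


def reliability(t, beta, eta):
    """Two-parameter Weibull reliability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def mean_life(beta, eta):
    """Mean of a two-parameter Weibull distribution: eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


if __name__ == "__main__":
    for (solder, case), (beta, eta) in sorted(WEIBULL_CGA.items()):
        print(f"{solder:4s} Case {case}: mean life = {mean_life(beta, eta):8.1f}")
```

By definition, the reliability at t = η is exp(-1), about 36.8%, regardless of β, so η can be read directly as the characteristic life.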

# 3.1.4 Repair Process

The repair process developed in this model, Table 3.9, was formulated based on the NSWC Crane Aviation repair process [Naval Air Systems Command 2006]. The repair process contains a total of 48 independent process steps, indexed 0 through 47. Specific to this repair process is a 10% probability that an LRU is a no-fault-found (NFF), a value estimated by NSWC Crane. If an LRU is determined to be NFF, it continues through the repair process until it reaches step 10. From step 10, the LRU skips steps 11 through 39 and resumes processing at step 40. When step capacities are reduced to study the effect of reduced repair resources, only steps 6 through 39 are affected. Steps 0 through 5 and steps 40 through 47 are considered administrative steps, such as packaging, transit and paperwork. These steps are not specific to the actual repair of the LRU.

Within the repair process, Table 3.9, there are a total of six columns that specify information regarding each individual step and its relationship to the process. The "Index #" column represents the hierarchical order in which the process steps are organized. This is important because LRUs move sequentially from the first to the last step. The "Process Step" column defines the name of the step. The "Duration" column is the minimum time, in calendar hours, required for a step to complete its task. The "Cost" column represents the individual cost assigned to an LRU that is processed in that step. The "Capacity" column represents the maximum number of LRUs that can be simultaneously processed in the step. The "Branched" column specifies the repair path as a function of the failure mechanism (and/or part type) that caused the LRU's failure. In some cases, different failure mechanisms require different repair steps. For detailed information on how LRUs flow through the model repair process, see Section 2.2.2 Process Modeling (Process Flow and Steps).

When determining the time spent in the repair process for each LRU, there is the implicit assumption that the repair process runs 24/7/365. Although this assumption does not accurately reflect a realistic repair process, it reduces the complexity of the model.

**Table 3.9: Baseline NSWC Repair Process** 

| Index # | Process Step              | Duration (hours) | Cost (\$) | Capacity | Branched |
|---------|---------------------------|------------------|-----------|----------|----------|
| 0       | Field Failure ID          |                  | 75.00     | 200      | ALL      |
| 1       | Capture of Resources      |                  | 75.00     | 200      | ALL      |
| 2       | Removal                   |                  | 75.00     | 200      | ALL      |
| 3       | Package For Transit       | 2.00             | 150.00    | 100      | ALL      |
| 4       | Transit                   |                  | 1200.00   | 100000   | ALL      |
| 5       | Receiving                 | 18.00            | 30.00     | 100000   | ALL      |
| 6       | Disassembly to Card Level |                  | 150.00    | 4        | ALL      |
| 7       | Locate Test Program       |                  | 75.00     | 4        | ALL      |
| 8       | Test Prep                 |                  | 75.00     | 4        | ALL      |
| 9       | Run Test                  | 0.50             | 37.50     | 4        | ALL      |
| 10      | Diagnose to Component     | 0.50             | 37.50     | 4        | ALL      |
| 11      | Coating Removal           |                  | 100.00    | 4        | 1, 2, 3, |
| 12      | Remove Part               | 0.30             | 22.50     | 4        | 1, 2, 3, |
| 13      | Clean/Prep the Site       | 0.50             | 100.00    | 4        | 1, 2, 3, |
| 14      | Find Parts                | 0.50             | 37.50     | 4        | 1, 2, 3, |
| 15      | Pull Parts From Supply    |                  | 15.00     | 4        | 1, 2, 3, |
| 16      | Prep Site                 |                  | 50.00     | 4        | 1, 2, 3, |
| 17      | Component Prep            | 0.20             | 15.00     | 4        | 1, 2, 3, |
| 18      | Assemble To Card          |                  | 22.50     | 4        | 1, 2, 3, |
| 19      | Continuity Testing        | 0.20             | 15.00     | 4        | 1, 2, 3, |
| 20      | Coating Replacement       |                  | 150.00    | 10       | 1, 2, 3, |
| 21      | Verify Fault Corre.       | 0.50             | 37.50     | 4        | 1, 2, 3, |
| 22      | Coating Removal           |                  | 500.00    | 4        | 1, 2, 3, |
| 23      | Remove Part               |                  | 50.00     | 4        | 1, 2, 3, |
| 24      | Clean/Prep the Site       | 0.70             | 150.00    | 4        | 1, 2, 3, |
| 25      | Pull Parts From Supply    | 0.10             | 200.00    | 4        | 1, 2, 3, |
| 26      | Prep Site                 | 0.40             | 100.00    | 4        | 1, 2, 3, |
| 27      | Component Prep            | 1.00             | 200.00    | 4        | 1, 2, 3, |
| 28      | Assemble To Card          | 1.00             | 75.00     | 4        | 1, 2, 3, |
| 29      | Continuity Testing        | 1.00             | 75.00     | 4        | 1, 2, 3, |
| 30      | Verify Fault Corre.       | 1.00             | 75.00     | 4        | 1, 2, 3, |
| 31      | Coating Removal           | 0.30             | 200.00    | 4        | 1, 2, 3, |
| 32      | Remove Part               | 0.50             | 37.50     | 4        | 1, 2, 3, |
| 33      | Clean/Prep the Site       | 0.60             | 120.00    | 4        | 1, 2, 3, |
| 34      | Pull Parts From Supply    | 0.10             | 50.00     | 4        | 1, 2, 3, |
| 35      | Prep Site                 | 0.30             | 75.00     | 4        | 1, 2, 3, |
| 36      | Component Prep            | 0.50             | 75.00     | 4        | 1, 2, 3, |
| 37      | Assemble To Card          | 0.50             | 37.50     | 4        | 1, 2, 3, |
| 38      | Continuity Testing        | 0.40             | 40.00     | 4        | 1, 2, 3, |
| 39      | Verify Fault Corre.       | 0.70             | 40.00     | 4        | 1, 2, 3, |
| 40      | Put Box Together          | 2.00             | 150.00    | 4        | ALL      |
| 41      | Complete Paperwork        | 1.00             | 75.00     | 4        | ALL      |
| 42      | Maint. Officer Sort       | 1.00             | 75.00     | 1        | ALL      |
| 43      | Package For Transit       | 2.00             | 150.00    | 4        | ALL      |
| 44      | Transit                   | 60.00            | 1200.00   | 100000   | ALL      |
| 45      | Receiving                 |                  | 30.00     | 100000   | ALL      |
| 46      | Reinstall                 | 1.00             | 75.00     | 200      | ALL      |
| 47      | Verify Fix In System      | 1.00             | 75.00     | 200      | ALL      |
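The NFF routing described in Section 3.1.4 can be sketched as follows. This is a minimal illustration, assuming each step's cost from Table 3.9 is charged once per pass and that an NFF LRU visits steps 0 through 10 and then steps 40 through 47. The names `STEP_COST`, `nff_path` and `path_cost` are hypothetical and do not come from the model itself.

```python
# Costs (in dollars) copied from Table 3.9 for the steps an NFF LRU visits.
STEP_COST = {
    0: 75.00, 1: 75.00, 2: 75.00, 3: 150.00, 4: 1200.00, 5: 30.00,
    6: 150.00, 7: 75.00, 8: 75.00, 9: 37.50, 10: 37.50,
    40: 150.00, 41: 75.00, 42: 75.00, 43: 150.00,
    44: 1200.00, 45: 30.00, 46: 75.00, 47: 75.00,
}


def nff_path(first_skipped=11, last_skipped=39, last_step=47):
    """Step indices visited by an NFF LRU: every step except the skipped repair block."""
    return [s for s in range(last_step + 1)
            if not first_skipped <= s <= last_skipped]


def path_cost(path, step_cost):
    """Sum of the per-step costs along a path, charging each step once."""
    return sum(step_cost[s] for s in path)


if __name__ == "__main__":
    path = nff_path()
    print(f"NFF LRU visits {len(path)} steps, cost per pass = ${path_cost(path, STEP_COST):,.2f}")
```

Under these assumptions an NFF pass costs \$3,810.00, which is why NFF LRUs form a separate low-cost population in the repair cost distributions.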

### 3.2 Analysis Results

In order to study the impact of the conversion to lead-free, the model was run independently for: 1) package types attached with SnPb solder and 2) package types attached with SAC 305 solder. A run is therefore defined as the fielding of 8,000 LRUs, each containing one of each of the following package types: LCC, BGA and CGA, tracked from introduction to end of support.

A total of nine tests were run to study the effects of thermal cycling profile, post-repair reliability, fielding rate, repair process capacity, time step size, repair priority and number of package instances:

- A) Thermal Cycling Case 1, SnPb compared to SAC
- B) Thermal Cycling Case 2, SnPb compared to SAC
- C) Thermal Cycling Case 3, SnPb compared to SAC
- D) Effect of Reduced Post Repair Reliabilities By 20%
- E) Effect of Increased Fielding Rates
- F) Effect of Reduced Repair Process Capacity
- G) Effect of Increasing Time Step Size Greater than GCD on Model Accuracy
- H) Effect of LRU Repair Priorities
- I) Effect of Doubling Package Instances on Test Board

#### 3.2.1 Test A Results

Test A compared SnPb and SAC solder under the thermal cycling profile defined in Case 1, Table 3.10.

**Table 3.10: Parameters for Thermal Cycling Case 1** 

| Case # | Max Temp (°C) | Min Temp (°C) | Avg Temp (°C) | Dwell Time (min) |
|--------|---------------|---------------|---------------|------------------|
| 1      | 100.0         | 0.0           | 50.0          | 40.0             |

LRUs in Test A experienced the baseline introduction defined in Figure 3.1 and were repaired using the baseline NSWC Crane repair process.

Histograms were generated for the distributions of repair cost, availability and repair time. To place multiple data sets on one histogram, a common set of bins was created. First, the data set containing the minimum value is identified and used to generate the initial set of bins and the bin spacing. To cover the remaining data sets, additional bins are created using that same bin spacing. With the original bins plus the extended bins, all data sets can be sorted on the same scale.
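The common-bin procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the initial number of bins (`n_bins`) and the function names are hypothetical, and the bin spacing is taken as the range of the data set holding the overall minimum divided by `n_bins`.

```python
import math


def common_bin_edges(datasets, n_bins=10):
    """Build one set of equally spaced bin edges shared by every data set."""
    # The data set that contains the overall minimum defines the initial bins.
    base = min(datasets, key=min)
    lo, hi = min(base), max(base)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("base data set has zero range")
    # Extend with the same spacing until the largest value of any set is covered.
    overall_max = max(max(d) for d in datasets)
    total_bins = max(n_bins, math.ceil((overall_max - lo) / width))
    return [lo + i * width for i in range(total_bins + 1)]


def histogram(data, edges):
    """Count the values falling in each bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for x in data:
        i = min(int((x - edges[0]) / width), len(counts) - 1)
        counts[i] += 1
    return counts


if __name__ == "__main__":
    a = [0, 1, 2, 3, 4]   # illustrative values only, not thesis data
    b = [2, 6, 8]
    edges = common_bin_edges([a, b], n_bins=4)
    print(edges)
    print(histogram(a, edges))
    print(histogram(b, edges))
```

Because both data sets are counted against the same edges, their frequencies can be overlaid directly on one plot.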

The data were then plotted as frequency versus the metric of interest: repair cost in dollars (Figure 3.15), availability as the fraction of uptime over total time (Figure 3.17), or repair time in days (Figure 3.18).



Figure 3.15: Histogram Comparing Repair Cost for SnPb and SAC, Test A

The small population designated by the number 1 in Figure 3.15 is a result of LRUs deemed NFF. These LRUs are not processed in NSWC Crane repair process steps 11-39, as seen in Table 3.9. Therefore, their corresponding repair cost is significantly lower than that of a standard failed LRU. The percentage of NFF LRUs for the tests discussed in this chapter is 10%. Figure 3.16, however, shows the effect on the distribution of costs when the percentage of NFF LRUs is increased from 0 to 50%.



Figure 3.16: Effect of Increasing NFF Percent on Population Growth

The two large populations of LRUs designated by the numbers 2 and 3 in Figure 3.15 are a result of a varying number of failures per LRU. The LRUs in distribution 2 have failed only once. The LRUs in distribution 3 have failed two or more times, which explains why their repair cost is roughly twice that of the LRUs in distribution 2, or more.



Figure 3.17: Histogram Comparing Availability for SnPb and SAC, Test A



Figure 3.18: Histogram Comparing Repair Time for SnPb and SAC, Test A

Table 3.11 displays the average metrics of Test A for SnPb and SAC solder.

**Table 3.11: Test A Metrics** 

|                                | SnPb       | SAC        |          |
|--------------------------------|------------|------------|----------|
| Total Number of Failures       | 31467      | 22748      |          |
| Average Number of Failures/LRU | 3.9334     | 2.8435     |          |
| Total Cost                     | 65,405,625 | 44,181,939 | \$       |
| Average Cost                   | 8,176      | 5,523      | \$/LRU   |
| Average Availability           | 0.997      | 0.998      |          |
| Average Repair Time            | 34.7       | 25.5       | days/LRU |

The following conclusions can be made from Test A, in which LRUs experienced the thermal cycling profile of Case 1: using SAC solder instead of SnPb resulted in a 27.71% decrease in the number of failures, a 32.45% decrease in cost, a 0.08% increase in availability and a 26.68% decrease in repair time.

To study the effect of the stochastic inputs associated with the reliability of each component in Test A, the standard deviation of each of the above average metrics was calculated over 10 runs. For SnPb solder, the average number of failures varied by $\pm 0.003632$ failures per LRU, the average repair cost varied by $\pm \$6.40$, the average availability did not vary, and the average time in repair varied by $\pm 0.10$ days. For SAC solder, the average number of failures varied by $\pm 0.004283$ failures per LRU, the average repair cost varied by $\pm \$8.81$, the average availability did not vary, and the average time in repair varied by $\pm 0.17$ days. Similar calculations can be completed for Tests B-I by repeating the simulation and taking the standard deviation of the means.
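The run-to-run spread calculation can be sketched as below. The ten per-run averages are hypothetical placeholders chosen only to make the example runnable and are not results from this study. The use of the sample (n - 1) standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import statistics


def spread_of_means(run_means):
    """Sample standard deviation of a metric's per-run averages."""
    return statistics.stdev(run_means)


if __name__ == "__main__":
    # Hypothetical average repair cost ($/LRU) from ten repeated runs (placeholders).
    avg_cost_per_run = [8170.1, 8181.3, 8176.4, 8168.9, 8183.0,
                        8174.2, 8179.8, 8172.5, 8185.6, 8169.7]
    print(f"mean of means: {statistics.mean(avg_cost_per_run):.2f} $/LRU")
    print(f"spread (+/-):  {spread_of_means(avg_cost_per_run):.2f} $/LRU")
```

Each entry in the list stands for the population average from one complete simulation run, so the resulting spread reflects only the stochastic reliability sampling and not LRU-to-LRU variation within a run.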

In addition to the previous plots and final metrics, the individual and average LRU metrics plotted over time can be of interest when evaluating cost tradeoffs. Figure 3.19 plots the highest individual LRU repair cost in the population, the lowest individual LRU repair cost in the population and the average LRU repair cost of the population. As seen in Figure 3.19, some LRUs fail as early as 2019, while others remain failure free until as late as 2036.



Figure 3.19: Individual Lowest and Highest LRU Repair Cost Compared to the Average LRU Repair Cost for SAC Solder

Figure 3.20 plots the highest individual LRU availability, the lowest individual LRU availability and the average LRU availability. The LRU with the lowest availability is also the LRU that has the highest repair costs. The LRU with the highest availability is the LRU that did not fail until 2036.



Figure 3.20: Individual Lowest and Highest LRU Availability Compared To the Average LRU Availability for SAC Solder

Figure 3.21 plots the highest individual LRU repair time, the lowest individual LRU repair time, and the average LRU repair time. The LRU with the highest repair time is also the LRU that has the highest repair costs. The LRU with the lowest repair time is the LRU that did not fail until 2036.



Figure 3.21: Individual Lowest and Highest LRU Repair Time Compared To the Average LRU Repair Time For SAC Solder

# 3.2.2 Test B Results

Test B compared SnPb and SAC solder under the thermal cycle parameters defined in Case 2, Table 3.12.

**Table 3.12: Parameters for Thermal Cycling Case 2** 

| Case # | Max Temp (°C) | Min Temp (°C) | Avg Temp (°C) | Dwell Time (min) |
|--------|---------------|---------------|---------------|------------------|
| 2      | 110.0         | 0.0           | 55.0          | 10.0             |

LRUs in Test B experienced the baseline introduction defined in Figure 3.1 and were repaired using the baseline NSWC Crane repair process.

Figures 3.22 through 3.24 represent the distributions of metrics, i.e., repair cost, availability, and repair time respectively.



Figure 3.22: Histogram Comparing Repair Cost for SnPb and SAC, Test B



Figure 3.23: Histogram Comparing Availability for SnPb and SAC, Test B



Figure 3.24: Histogram Comparing Repair Time for SnPb and SAC, Test B

Table 3.13 displays the average metrics of Test B for SnPb and SAC solder.

**Table 3.13: Test B Metrics** 

|                                | SnPb       | SAC        |          |
|--------------------------------|------------|------------|----------|
| Total Number of Failures       | 30385      | 30953      |          |
| Average Number of Failures/LRU | 3.7981     | 3.8691     |          |
| Total Cost                     | 62,570,552 | 65,310,580 | \$       |
| Average Cost                   | 7,821      | 8,164      | \$/LRU   |
| Average Availability           | 0.997      | 0.997      |          |
| Average Repair Time            | 33.9       | 34.3       | days/LRU |

The following conclusions can be made from Test B in which LRUs experienced the thermal cycling profiles of Case 2: There was a 1.87% increase in the number of failures, a 4.38% increase in cost, no change in availability, and a 1.33% increase in repair time by using SAC solder.

# 3.2.3 Test C Results

Test C compared SnPb and SAC solder under the thermal cycle parameters defined in Case 3, Table 3.14.

Table 3.14: Parameters for Thermal Cycling Case 3

| Case # | Max Temp (°C) | Min Temp (°C) | Avg Temp (°C) | Dwell Time (min) |
|--------|---------------|---------------|---------------|------------------|
| 3      | 130.0         | 0.0           | 65.0          | 0.1              |

LRUs in Test C experienced the baseline introduction, Figure 3.1, and were repaired using the baseline NSWC Crane repair process.

Figures 3.25 through 3.27 represent the distributions of metrics, i.e., repair cost, availability, and repair time respectively.



Figure 3.25: Histogram Comparing Repair Cost for SnPb and SAC, Test C



Figure 3.26: Histogram Comparing Availability for SnPb and SAC, Test C



Figure 3.27: Histogram Comparing Repair Time for SnPb and SAC, Test C

Table 3.15 displays the average metrics of Test C for SnPb and SAC solder.

Table 3.15: Test C Metrics

|                                | SnPb       | SAC        |          |
|--------------------------------|------------|------------|----------|
| Total Number of Failures       | 8183       | 31928      |          |
| Average Number of Failures/LRU | 1.0229     | 3.9910     |          |
| Total Cost                     | 13,937,054 | 72,910,310 | \$       |
| Average Cost                   | 1,742      | 9,114      | \$/LRU   |
| Average Availability           | 0.999      | 0.997      |          |
| Average Repair Time            | 9.6        | 35.6       | days/LRU |

The following conclusions can be made from Test C, in which LRUs experienced the thermal cycling parameters of Case 3: using SAC solder instead of SnPb resulted in a 290.17% increase in the number of failures, a 423.14% increase in cost, a 0.23% decrease in availability and a 272.29% increase in repair time.
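The percent differences quoted for each test follow from the tabulated metrics. The sketch below recomputes them for Test C from the values in Table 3.15, taking SnPb as the baseline; the function name `percent_difference` is hypothetical.

```python
def percent_difference(baseline, value):
    """Percent change of `value` relative to `baseline` (positive = increase)."""
    return (value - baseline) / baseline * 100.0


# (SnPb, SAC) pairs copied from Table 3.15.
TEST_C = {
    "Total Number of Failures": (8183, 31928),
    "Total Cost": (13_937_054, 72_910_310),
    "Average Availability": (0.999, 0.997),
    "Average Repair Time": (9.6, 35.6),
}

if __name__ == "__main__":
    for name, (snpb, sac) in TEST_C.items():
        print(f"{name:26s} {percent_difference(snpb, sac):+9.2f}%")
```

The failure and cost percentages reproduce the quoted 290.17% and 423.14%. The availability and repair time percentages computed this way differ slightly from the quoted values, presumably because the table entries are rounded while the quoted percentages were computed from unrounded data.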

## 3.2.4 Test D Results

Test D compared SAC post-repair reliabilities modeled as "as good as new" with SAC post-repair reliabilities modeled as "not good as new" (a 20% reduction), both under the thermal cycle parameters defined in Case 3, Table 3.14.

LRUs in Test D experienced the baseline introduction defined in Figure 3.1 and were repaired using the baseline NSWC Crane repair process.

Figures 3.28 through 3.30 represent the distributions of the metrics repair cost, availability, and repair time, respectively.



Figure 3.28: Histogram Comparing Repair Cost for Baseline and 20% Reduced Post-repair Reliability, Test D



Figure 3.29: Histogram Comparing Availability for Baseline and 20% Reduced Post-repair Reliability, Test D



Figure 3.30: Histogram Comparing Repair Time for Baseline and 20% Reduced Post-repair Reliability, Test D

**Table 3.16: Test D Metrics** 

|                                | P              | Post Repair Reliability         |          |  |
|--------------------------------|----------------|---------------------------------|----------|--|
|                                | As Good As New | Not Good As New (20% Reduction) |          |  |
| Total Number of Failures       | 31928          | 32605                           |          |  |
| Average Number of Failures/LRU | 3.9910         | 4.0756                          |          |  |
| Total Cost                     | 72,910,310     | 75,491,204                      | \$       |  |
| Average Cost                   | 9,114          | 9,436                           | \$/LRU   |  |
| Average Availability           | 0.997          | 0.997                           |          |  |
| Average Repair Time            | 35.6           | 36.5                            | days/LRU |  |

The following conclusions can be made from Test D, in which LRUs experienced the thermal cycling parameters of Case 3: reducing the post-repair reliabilities by 20% resulted in a 2.12% increase in the number of failures, a 3.54% increase in cost, a 0.01% decrease in availability, and a 2.76% increase in repair time.

# 3.2.5 Test E Results

Test E compared the baseline fielding rate with the increased fielding rates shown in Figure 3.2 and Figure 3.3 of Section 3.1.1, LRU Introduction and Retirement Schedules.

LRUs in Test E experienced thermal cycle parameters defined in Case 3, Table 3.14, and were repaired using the baseline NSWC Crane aviation repair process.

Figures 3.31 through 3.33 represent the distributions of the metrics repair cost, availability, and repair time, respectively.



Figure 3.31: Histogram Comparing Repair Cost for Baseline, Medium and Fast Fielding Rates, Test E



Figure 3.32: Histogram Comparing Availability for Baseline, Medium and Fast Fielding Rates, Test E



Figure 3.33: Histogram Comparing Repair Time for Baseline, Medium and Fast Fielding Rates, Test E

**Table 3.17: Test E Metrics** 

|                                | Baseline   | Medium     | Fast       |          |
|--------------------------------|------------|------------|------------|----------|
| Total Number of Failures       | 31928      | 31927      | 31935      |          |
| Average Number of Failures/LRU | 3.9910     | 3.9909     | 3.9919     |          |
| Total Cost                     | 72,910,310 | 75,889,276 | 80,406,305 | \$       |
| Average Cost                   | 9,114      | 9,486      | 10,051     | \$/LRU   |
| Average Availability           | 0.997      | 0.997      | 0.997      |          |
| Average Repair Time            | 35.6       | 35.4       | 35.7       | days/LRU |

The following conclusions can be made from Test E, in which LRUs experienced the thermal cycling parameters of Case 3: When comparing the baseline introduction rate to the medium introduction rate, there was no change in the number of failures, a 4.09% increase in cost, no change in availability and a 0.46% decrease in repair time. When comparing the baseline introduction rate to the fast introduction rate, there was a 0.02% increase in the number of failures, a 10.28% increase in cost, no change in availability and a 0.31% increase in repair time.

## 3.2.6 Test F Results

Test F compared the NSWC Crane repair process at full and half capacity. The non-administrative steps, steps 6 through 39 in Table 3.9, were affected by the capacity reduction.
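The capacity reduction can be sketched as follows, using the capacities listed in Table 3.9 and the affected range of steps 6 through 39 stated in Section 3.1.4. The function name and the floor of one LRU per step are assumptions made for illustration.

```python
# Capacities for steps 0 through 47, copied from Table 3.9.
BASELINE_CAPACITY = (
    [200] * 3            # steps 0-2
    + [100]              # step 3
    + [100000] * 2       # steps 4-5
    + [4] * 14           # steps 6-19
    + [10]               # step 20
    + [4] * 19           # steps 21-39
    + [4, 4, 1, 4]       # steps 40-43
    + [100000] * 2       # steps 44-45
    + [200] * 2          # steps 46-47
)


def reduce_capacity(capacities, first=6, last=39, factor=0.5):
    """Scale the capacity of steps first..last, never dropping below one LRU."""
    return [max(1, int(c * factor)) if first <= i <= last else c
            for i, c in enumerate(capacities)]


if __name__ == "__main__":
    half = reduce_capacity(BASELINE_CAPACITY)
    for i in (5, 6, 20, 39, 40):
        print(f"step {i:2d}: {BASELINE_CAPACITY[i]:6d} -> {half[i]:6d}")
```

With a factor of 0.5, the repair steps drop from 4 to 2 simultaneous LRUs (10 to 5 for Coating Replacement at step 20), while the administrative steps keep their baseline capacities.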

LRUs in Test F experienced thermal cycle parameters defined in Case 3 and the fast introduction rate displayed in Figure 3.3.

Figures 3.34 through 3.36 represent the distributions of the metrics repair cost, availability, and repair time, respectively.



Figure 3.34: Histogram Comparing Repair Cost for Baseline and Reduced Capacity Process Steps, Test F



Figure 3.35: Histogram Comparing Availability for Baseline and Reduced Capacity Process Steps, Test F



Figure 3.36: Histogram Comparing Repair Time for Baseline and Reduced Capacity Process Steps, Test F

**Table 3.18: Test F Metrics** 

|                                | Full Capacity | Half Capacity               |          |  |
|--------------------------------|---------------|-----------------------------|----------|--|
| Total Number of Failures       | 31935         | 31920                       |          |  |
| Average Number of Failures/LRU | 3.9919        | 3.9900                      |          |  |
| Total Cost                     | 80,406,305    | 80,294,067                  | \$       |  |
| Average Cost                   | 10,051        | 10,037                      | \$/LRU   |  |
| Average Availability           | 0.997         | 0.997                       |          |  |
| Average Repair Time            | 35.7          | 36.6                        | days/LRU |  |

The following conclusions can be made from Test F, in which LRUs experienced the thermal cycling parameters of Case 3: decreasing the repair process capacity by half resulted in a 0.05% decrease in the number of failures, a 0.14% decrease in cost, a 0.01% decrease in availability and a 2.57% increase in repair time.

## 3.2.7 Test G Results

Test G compared time step sizes ranging from the base GCD time step to a 100-hour time step to assess the effect of time step size on the accuracy of the model.

LRUs in Test G experienced the thermal cycle parameters defined in Case 3 and the baseline introduction rate displayed in Figure 3.1, and were repaired using the baseline NSWC Crane repair process.

Histograms were omitted for this test due to a large variance in the data. This variance is explained by the fact that increasing time step size larger than the GCD adds extra time to each process step. See Section 2.3.6 Time Step Selection and Management for further explanation on time step taxonomy.
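The mechanism by which a larger time step adds time to each process step can be illustrated with the sketch below. It assumes that the GCD time step is the greatest common divisor of the step durations and that a step occupies a whole number of time steps, rounded up. Both are assumptions for illustration; Section 2.3.6 gives the actual treatment. The durations are a subset of the values listed in Table 3.9, and all function names are hypothetical.

```python
import math
from functools import reduce

# A few step durations (hours) copied from Table 3.9; the subset is illustrative.
DURATIONS_HR = [2.00, 18.00, 0.50, 0.30, 0.10, 60.00]

TICKS_PER_HOUR = 100  # work in hundredths of an hour to keep the arithmetic exact


def to_ticks(hours):
    """Convert hours to integer hundredths of an hour."""
    return round(hours * TICKS_PER_HOUR)


def gcd_time_step(durations_hr):
    """Largest time step (hours) that divides every duration evenly."""
    return reduce(math.gcd, (to_ticks(d) for d in durations_hr)) / TICKS_PER_HOUR


def padded_duration(duration_hr, step_hr):
    """Duration seen by a model that advances in whole time steps (assumed round-up)."""
    n_steps = math.ceil(to_ticks(duration_hr) / to_ticks(step_hr))
    return n_steps * step_hr


if __name__ == "__main__":
    print("GCD time step (hours):", gcd_time_step(DURATIONS_HR))
    for step in (1.0, 10.0, 100.0):
        total = sum(padded_duration(d, step) for d in DURATIONS_HR)
        print(f"time step {step:5.1f} h -> total duration of these steps {total:7.1f} h")
```

Under these assumptions, a 0.10 hour step occupies a full hour when the time step is 1 hour and a full 100 hours when the time step is 100 hours, which is consistent with the sharp growth in repair time as the time step increases.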

Table 3.19: Test G Metrics

|                                | Time Step Size |            |            |            |          |
|--------------------------------|----------------|------------|------------|------------|----------|
|                                | GCD            | 1 Hour     | 10 Hour    | 100 Hour   |          |
| Total Number of Failures       | 31928          | 31899      | 22481      | 10380      |          |
| Average Number of Failures/LRU | 3.9910         | 3.9874     | 2.8101     | 1.2975     |          |
| Total Cost                     | 72,910,310     | 72,743,240 | 50,917,543 | 17,828,985 | \$       |
| Average Cost                   | 9,114          | 9,093      | 6,365      | 2,229      | \$/LRU   |
| Average Availability           | 0.997          | 0.993      | 0.721      | 0.459      |          |
| Average Repair Time            | 35.6           | 79.1       | 3057.1     | 5923.8     | days/LRU |

The following conclusions can be made from Test G, in which LRUs experienced the thermal cycling parameters of Case 3: When comparing the GCD to the 1 hour time step size, there was a 0.09% decrease in the number of failures, a 0.23% decrease in cost, a 0.40% decrease in availability and a 122.38% increase in repair time. When comparing the GCD to the 10 hour time step size, there was a 29.59% decrease in the number of failures, a 30.16% decrease in cost, a 27.68% decrease in availability and an 8495.49% increase in repair time. When comparing the GCD to the 100 hour time step size, there was a 67.49% decrease in the number of failures, a 75.55% decrease in cost, a 53.95% decrease in availability and a 16555.82% increase in repair time.

# 3.2.8 Test H Results

Test H compared prioritized and un-prioritized LRUs. For the case of prioritized LRUs, half of the population was marked as "urgent" priority, and the other half as "low" priority.
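One way to picture the effect of priority on a capacity-limited step is a priority queue in which urgent LRUs are served before low-priority LRUs, with ties broken by arrival order. The sketch below is an illustrative assumption only; the queue discipline actually used by the model is described in Chapter 2, and the names here are hypothetical.

```python
import heapq

URGENT, LOW = 0, 1  # smaller number is served first (assumed encoding)


def service_order(arrivals):
    """Return LRU ids in the order they would be served.

    `arrivals` is a list of (lru_id, priority) tuples in arrival order.
    """
    heap = []
    for seq, (lru_id, priority) in enumerate(arrivals):
        # seq breaks ties so equal-priority LRUs keep first-come, first-served order.
        heapq.heappush(heap, (priority, seq, lru_id))
    order = []
    while heap:
        _, _, lru_id = heapq.heappop(heap)
        order.append(lru_id)
    return order


if __name__ == "__main__":
    waiting = [("LRU-A", LOW), ("LRU-B", URGENT), ("LRU-C", LOW), ("LRU-D", URGENT)]
    print(service_order(waiting))
```

Reordering only matters when LRUs actually wait in a queue, so a repair process with ample capacity would be expected to show little difference between the prioritized and un-prioritized cases.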

LRUs in Test H experienced the thermal cycle parameters defined in Case 3 and the baseline introduction rate displayed in Figure 3.1, and were repaired using the baseline NSWC Crane repair process.

Figures 3.37 through 3.39 represent the distributions of the metrics repair cost, availability, and repair time, respectively.



Figure 3.37: Histogram Comparing Repair Cost for Prioritized and Un-prioritized LRUs, Test H



Figure 3.38: Histogram Comparing Availability for Prioritized and Un-prioritized LRUs, Test H



Figure 3.39: Histogram Comparing Repair Time for Prioritized and Un-prioritized LRUs, Test H

Table 3.20: Test H Metrics

|                                |                | Priority Constraints       |          |  |
|--------------------------------|----------------|----------------------------|----------|--|
|                                | Un-Prioritized | Prioritized                |          |  |
| Total Number of Failures       | 31928          | 31940                      |          |  |
| Average Number of Failures/LRU | 3.9910         | 3.9925                     |          |  |
| Total Cost                     | 72,910,310     | 73,041,918                 | \$       |  |
| Average Cost                   | 9,114          | 9,130                      | \$/LRU   |  |
| Average Availability           | 0.997          | 0.997                      |          |  |
| Average Repair Time            | 35.6           | 35.4                       | days/LRU |  |

The following conclusions can be made from Test H, in which LRUs experienced the thermal cycling parameters of Case 3: prioritizing LRUs resulted in a 0.04% increase in the number of failures, a 0.18% increase in cost, no change in availability and a 0.54% decrease in repair time.

# 3.2.9 Test I Results

Test I compared the baseline LRU, which has a single instance of each package, with an LRU that has two instances of each package.

LRUs in Test I experienced the thermal cycle parameters defined in Case 3 and the baseline introduction rate displayed in Figure 3.1, and were repaired using the baseline NSWC Crane repair process.

Figures 3.40 through 3.42 represent the distributions of the metrics repair cost, availability, and repair time, respectively.



Figure 3.40: Histogram Comparing Repair Cost for Single and Double Package Instance LRUs, Test I



Figure 3.41: Histogram Comparing Availability for Single and Double Package Instance LRUs, Test I



Figure 3.42: Histogram Comparing Repair Time for Single and Double Package Instance LRUs, Test I

**Table 3.21: Test I Metrics** 

|                                |                 | Package Instances |          |  |
|--------------------------------|-----------------|-------------------|----------|--|
|                                | Single Instance | Double Instance   |          |  |
| Total Number of Failures       | 31928           | 56214             |          |  |
| Average Number of Failures/LRU | 3.9910          | 7.0268            |          |  |
| Total Cost                     | 72,910,310      | 132,453,556       | \$       |  |
| Average Cost                   | 9,114           | 16,557            | \$/LRU   |  |
| Average Availability           | 0.997           | 0.994             |          |  |
| Average Repair Time            | 35.6            | 62.9              | days/LRU |  |

The following conclusions can be made from Test I, in which LRUs experienced the thermal cycling parameters of Case 3: doubling the number of package instances resulted in a 76.06% increase in the number of failures, an 81.67% increase in cost, a 0.25% decrease in availability, and a 76.92% increase in repair time.

# Chapter 4: Conclusions

# 4.1 Conclusions

The tests conducted in Chapter 3 quantified the impact that varying solder type, post-repair reliability, fielding rate, step capacity, time step size, priority, and number of package instances had on the average number of failures, cost, availability and repair time. Table 4.1 lists the impact of each variable in terms of the percent difference from the baseline. A positive percent represents an increase from the baseline, while a negative percent represents a decrease from the baseline.

Table 4.1: Case Study Results, Tests A-I, Percent Differences

|                             | Test A  | Test B | Test C  |
|-----------------------------|---------|--------|---------|
| Comparison Variable         | Solder  | Solder | Solder  |
| Avg. Number of Failures/LRU | -27.71% | 1.87%  | 290.17% |
| Avg. Cost                   | -32.45% | 4.38%  | 423.14% |
| Avg. Availability           | 0.08%   | 0.00%  | -0.23%  |
| Avg. Repair Time            | -26.68% | 1.33%  | 272.29% |

|                             | Test D         | Test E - Fielding Rate |               |
|-----------------------------|----------------|------------------------|---------------|
| Comparison Variable         | Post Rep. Rel. | Baseline-Med.          | Baseline-Fast |
| Avg. Number of Failures/LRU | 2.12%          | 0.00%                  | 0.02%         |
| Avg. Cost                   | 3.54%          | 4.09%                  | 10.28%        |
| Avg. Availability           | -0.01%         | 0.00%                  | 0.00%         |
| Avg. Repair Time            | 2.76%          | -0.46%                 | 0.31%         |

|                             | Test G - Time Step Size |          |           |
|-----------------------------|-------------------------|----------|-----------|
| Comparison Variable         | LCD-1hr                 | LCD-10hr | LCD-100hr |
| Avg. Number of Failures/LRU | -0.09%                  | -29.59%  | -67.49%   |
| Avg. Cost                   | -0.23%                  | -30.16%  | -75.55%   |
| Avg. Availability           | -0.40%                  | -27.68%  | -53.95%   |
| Avg. Repair Time            | 122.38%                 | 8495.49% | 16555.82% |

|                             | Test F        | Test H   | Test I    |
|-----------------------------|---------------|----------|-----------|
| Comparison Variable         | Step Capacity | Priority | Instances |
| Avg. Number of Failures/LRU | -0.05%        | 0.04%    | 76.06%    |
| Avg. Cost                   | -0.14%        | 0.18%    | 81.67%    |
| Avg. Availability           | -0.01%        | 0.00%    | -0.25%    |
| Avg. Repair Time            | 2.57%         | -0.54%   | 76.92%    |

## 4.2 Contributions

In this thesis, a model has been developed that is capable of quantifying the impact of the tin-lead to lead-free electronics conversion in terms of repair cost and LRU availability. Tradeoffs have been made based on solder composition and thermal cycling profiles. The contributions of this research include the following:

- The first documented trade-off analysis conducted on repair cost and availability impacts for SnPb and SAC assemblies.
- Using the trade-off analysis it was determined that:
  - For applications experiencing thermal cycles with long dwell times and low mean and maximum temperatures, the use of SAC solder compared to tin-lead solder decreased the number of LRU failures but had no impact on LRU availability.
  - For applications experiencing thermal cycles with short dwell times and high mean and maximum temperatures, the use of SAC solder compared to tin-lead solder increased the number of LRU failures but had no impact on LRU availability.
- Development of an automated lead-free dynamic simulation model with the following unique capabilities (not in other repair simulation models):
  - Models LRUs that are "early retired".
  - Models no-fault-founds.
  - Specification of a repair process that is specific to a failure mechanism and package type.
  - Inclusion of both time- and cycle-based failure mechanism distribution parameters.
  - Use of a non-Poisson method of determining package failures over time.
    - A failure is determined by sampling multiple Weibull distributions. This includes multiple failure mechanisms (thermal, corrosion or vibration) and multiple instances of the same distribution, each sampled independently of its predecessor, to mimic multiple instances of a package.
    - Post-repair reliabilities that can be different from the original reliabilities.
  - Both time-stepped and event-stepped methods of advancing time.
  - Tracks individual LRUs. This gives the model the ability to track the metrics of cost, availability and repair time versus time.
  - Prioritizing LRUs in repair based on their level of mission criticality.
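The failure-sampling capability listed above can be sketched as follows. This is a minimal illustration rather than the LFDS implementation: the `Mechanism` class, the `sample_lru_ttf` function, and all parameter values are hypothetical, and every mechanism is assumed to be expressed in the same time unit. Each package instance is an independent draw from its Weibull distribution, and the earliest draw determines when the LRU fails.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mechanism:
    """One failure mechanism for one package type (illustrative only)."""
    name: str
    scale: float      # Weibull scale parameter, in a common time unit
    shape: float      # Weibull shape parameter
    instances: int    # number of identical packages on the LRU


def sample_lru_ttf(mechanisms, rng):
    """Return (time to failure, mechanism name) for the earliest sampled failure."""
    earliest = None
    for mech in mechanisms:
        for _ in range(mech.instances):
            # Each instance is an independent draw from the same distribution.
            ttf = rng.weibullvariate(mech.scale, mech.shape)
            if earliest is None or ttf < earliest[0]:
                earliest = (ttf, mech.name)
    if earliest is None:
        raise ValueError("no mechanisms with at least one instance")
    return earliest


rng = random.Random(42)  # fixed seed so the sketch is repeatable
mechanisms = [
    Mechanism("thermal fatigue, BGA", scale=9000.0, shape=2.5, instances=2),
    Mechanism("vibration, TSOP", scale=15000.0, shape=1.8, instances=1),
]
ttf, cause = sample_lru_ttf(mechanisms, rng)
print(f"first failure after {ttf:.0f} time units due to {cause}")
```

Because the LRU fails at the minimum of the sampled values, increasing `instances` shifts the failure date earlier on average, which is the effect that doubling the package instances is meant to capture.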

## 4.3 Future Work

### 4.3.1 Throwaway Applications

The high rate of technology change that characterizes electronic parts, subsystems and software has made the vast majority of electronic products disposable commodities. After all, who would ever consider repairing a flash memory stick? If it fails, it is simply replaced. The idea of disposable (or throwaway) electronics is accepted for consumer products, and as a result the supply chain that supports these products is driven by it. However, a disposable electronics policy at the assembly level would represent a considerable departure from common wisdom for the aerospace industry (e.g., avionics and military electronics). Aerospace adopted an assembly-level repair maintenance culture for a variety of reasons, including technical, business, contractual and legal ones.

However, by doing so they adopted a "culture" (policy) that is orthogonal to the underlying assumptions that their COTS supply chain is based on, thus creating a host of unique (and ultimately very expensive) problems for themselves. It is not out of the question to argue that a significant fraction of the resources expended to manage obsolescence, counterfeit parts risk, lead-free/tin-lead mixing, and configuration control problems would be avoidable in a disposable electronics culture [System Analysis Division, 1984].

The simulator developed in this thesis could be used as the basis for a tradeoff model for electronic systems that allows an assessment of the practicality of treating a module as a throwaway or disposable item.

### 4.3.2 Process Step Durations

Currently the model defines the duration of each process step as a fixed value, i.e., the duration stays constant for all LRUs that enter the step. However, in actual repair processes such as the NSWC Crane Aviation Repair Process used in Chapter 3, process step durations are variable and should be represented as probability distributions. Process step durations are also affected by another issue that is not addressed in this thesis: in some cases a bottleneck occurs in the repair process when the time spent in a step increases due to the reduced availability of replacement parts or other resources (e.g., when a part becomes obsolete, or when a lifetime buy runs out). To model variable step durations, each process step duration could be represented as a distribution. With this methodology, the distribution of process step durations could be varied from time step to time step, allowing for the variability and uncertainty of part availability. A variable repair process step of this kind would be capable of adjusting the duration of the process steps affected by limited replacement parts and resources.
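A minimal sketch of a distribution-based step duration is given below. The triangular distribution, the function name `sample_step_duration`, and the hour values are assumptions chosen only for illustration; the thesis does not prescribe a particular distribution for this future work.

```python
import random


def sample_step_duration(low, mode, high, rng):
    """Draw one step duration (hours) from a triangular distribution."""
    if not low <= mode <= high:
        raise ValueError("expected low <= mode <= high")
    return rng.triangular(low, high, mode)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
# Hypothetical step: nominally 8 hours, stretching to 40 hours when parts are scarce.
durations = [sample_step_duration(6.0, 8.0, 40.0, rng) for _ in range(5)]
print([round(d, 1) for d in durations])
```

Widening `high` over simulated time would be one way to represent a part that is becoming harder to obtain.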

### 4.3.3 Multiple Instances of a Package Type on a Test LRU

While Test I in Chapter 3 studies the impact of doubling package instances on the overall model metrics, it is not application specific. Future development of the test LRU to mimic a real-world application would allow engineers to make more realistic tradeoffs from the model's output metrics. LRUs could be modeled to include increased numbers of packages.

### 4.3.4 Multiple Failures on the Same Date

A special case exists in the simulation when two or more reliability distributions for an LRU share the same sampled TTF date. Currently the simulation treats multiple failures that share the same date as a single LRU failure. Future work could model multiple simultaneous failures differently from both a single failure and multiple separate failures. Certain steps in the repair process are common to the LRU and not specific to the package that failed. As a result, repairing two packages that failed on the same date is not as expensive as repairing two packages that failed on separate dates. Many of the process steps, such as packaging and shipping, can be combined to reduce the cost of multiple repairs.
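The cost argument can be illustrated with a small sketch in which LRU-level steps are charged once per trip through repair and package-level steps are charged per failed package. The function `repair_cost`, the split into shared and package-specific costs, and the dollar values are all hypothetical.

```python
def repair_cost(failed_packages, shared_cost, package_cost):
    """Cost of one trip through repair for the given failed packages."""
    if not failed_packages:
        return 0.0
    # LRU-level steps (e.g., packaging, shipping) are charged once per trip.
    return shared_cost + sum(package_cost[p] for p in failed_packages)


# Hypothetical costs, for illustration only.
shared = 300.0
per_package = {"BGA": 450.0, "TSOP": 200.0}

same_date = repair_cost(["BGA", "TSOP"], shared, per_package)
separate_dates = (repair_cost(["BGA"], shared, per_package)
                  + repair_cost(["TSOP"], shared, per_package))
print(same_date, separate_dates)  # 950.0 1250.0
```

In this sketch the saving from a combined repair equals one set of shared step costs.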

### 4.3.5 Vibration Failure Mechanism

In addition to the thermal failure mechanisms introduced in Chapter 3, vibration failure mechanisms could be introduced specific to the boards being studied. The accuracy of vibration failure mechanisms depends on the board dimensions and the location of the particular packages.

### 4.3.6 Maintenance Data Integration

The simulation developed in this thesis depends on quantitative reliability information in the form of either: a) reliability distributions (in units corresponding to environmental stress history, i.e., operational hours, thermal cycles, etc.), and/or b) repair experience (data from actual repair processes that describes the mixture of problems "resolved"). The future of this simulator will be the integration of real maintenance data, transitioning from the past "bottom-up" approach to the more realistic "top-down" approach.

### 4.3.7 Continuation of Damage During the Repair Process

The current simulation taxonomy models the time spent in the repair process as a continuation of the TTF, i.e., damage continues to be added to the LRU during repair. While some failure mechanisms continue to add damage during the repair process, many stop while the LRU is in repair. In the future the simulation could accommodate an option to specify whether or not a failure mechanism adds damage during repair.
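One possible shape for such an option is sketched below. The `MechanismState` class, its `accrues_in_repair` flag, and the assignment of that flag to particular mechanisms are hypothetical and serve only to show the bookkeeping; they do not describe the current LFDS behavior.

```python
from dataclasses import dataclass


@dataclass
class MechanismState:
    name: str
    ttf: float                # sampled time to failure, in hours
    accrues_in_repair: bool   # hypothetical per-mechanism option
    consumed: float = 0.0     # life consumed so far, in hours

    def advance(self, hours, in_repair):
        """Advance the clock; skip damage if the mechanism pauses during repair."""
        if in_repair and not self.accrues_in_repair:
            return
        self.consumed += hours

    @property
    def remaining(self):
        return max(self.ttf - self.consumed, 0.0)


# Which mechanisms pause during repair is an assumption made for this example.
corrosion = MechanismState("corrosion", ttf=5000.0, accrues_in_repair=True)
thermal = MechanismState("thermal cycling", ttf=5000.0, accrues_in_repair=False)
for mech in (corrosion, thermal):
    mech.advance(1000.0, in_repair=False)  # time in the field
    mech.advance(500.0, in_repair=True)    # time in the repair facility
    print(mech.name, mech.remaining)       # 3500.0 and 4000.0 respectively
```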

# Appendix A – Simulation Details

This Appendix provides documentation regarding the operation of the Lead-Free Dynamic Simulation (LFDS), a Java-based implementation of the repair model described in Chapter 2. Screenshots of each of the control tabs are presented, with corresponding explanations of each.

Multiple steps have been taken in the development of the software. Figure A.1 visualizes the process from developing the repair model to implementation into industry.



Figure A.1: Progression of a Modeling to Implementation

Figure A.2, Tab (1) Welcome, is a welcome screen and the first thing the user sees upon executing the software. It contains a condensed version of the model LRU flow from fielding to end of support.



Figure A.2: Tab (1), Welcome

Figure A.3, Tab (2) Reliability Models, provides the user the capability to define one or more failure mechanisms with numerous probability distributions. Available distribution types include uniform, triangular, Weibull, normal, lognormal, and exponential. Post-repair distributions can be specified under the "Post-Repair TTF" column. Mechanisms can be included in or excluded from a run by changing the "Yes" in the "Include?" column to "No". This feature allows the user to create a library of failure mechanisms and run the simulation for only the mechanism of concern without having to re-enter it into the data structure.



Figure A.3: Tab (2) Reliability Models

Clicking on one of the cells with the text "ADDED" displays the Distribution Input Window shown below, which provides a user interface to specify the distribution type and parameters.



Figure A.3: Distribution Input Window
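To make the distribution choices concrete, the sketch below maps each distribution type named above to a sampler from Python's standard `random` module. It is not the LFDS input code: the `SAMPLERS` table, the `sample_ttf` function, and the parameter names are assumptions for illustration.

```python
import random

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Map each distribution type to a sampler; parameter names are illustrative.
SAMPLERS = {
    "uniform": lambda p: rng.uniform(p["low"], p["high"]),
    "triangular": lambda p: rng.triangular(p["low"], p["high"], p["mode"]),
    "weibull": lambda p: rng.weibullvariate(p["scale"], p["shape"]),
    "normal": lambda p: rng.normalvariate(p["mean"], p["std_dev"]),
    "lognormal": lambda p: rng.lognormvariate(p["mu"], p["sigma"]),
    "exponential": lambda p: rng.expovariate(1.0 / p["mean"]),
}


def sample_ttf(kind, params):
    """Draw one time to failure from the named distribution."""
    try:
        sampler = SAMPLERS[kind]
    except KeyError:
        raise ValueError(f"unsupported distribution: {kind}") from None
    return sampler(params)


print(sample_ttf("weibull", {"scale": 9000.0, "shape": 2.5}))
print(sample_ttf("exponential", {"mean": 12000.0}))
```

Note that a normal distribution can return negative values, so a simulation using it for a TTF would need to truncate or resample such draws.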

In Figure A.4, Tab (3) LRU Specific Inputs, the user can define LRU entrance dates, quantities, end of service dates (EOS), priority level, and whether or not to include priority in the simulation. Four priority levels currently exist in the model: 1) Urgent, 2) High, 3) Medium and 4) Low.



Figure A.4: Tab (3) LRU Specific Inputs
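The interaction between the four priority levels and first-in, first-out processing can be sketched with a heap-based queue. The `RepairQueue` class and the LRU identifiers are hypothetical; the sketch assumes that LRUs of equal priority are served in arrival order and that disabling priority reduces the queue to plain FIFO.

```python
import heapq
from itertools import count

PRIORITY = {"Urgent": 1, "High": 2, "Medium": 3, "Low": 4}


class RepairQueue:
    """Queue for one repair step: priority order, FIFO within a priority."""

    def __init__(self, use_priority=True):
        self.use_priority = use_priority
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def push(self, lru_id, priority):
        rank = PRIORITY[priority] if self.use_priority else 0
        heapq.heappush(self._heap, (rank, next(self._arrival), lru_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = RepairQueue(use_priority=True)
for lru_id, priority in [("LRU-1", "Low"), ("LRU-2", "Urgent"), ("LRU-3", "High")]:
    queue.push(lru_id, priority)
print([queue.pop() for _ in range(len(queue))])  # ['LRU-2', 'LRU-3', 'LRU-1']
```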

In Figure A.5, Tab (4) Process Specific Inputs, the user defines the repair process flow by specifying the step name, duration of the step in hours, cost of the step, capacity of the step, and whether or not it is a branched step. Branching allows for the routing of LRUs with failure models that have specific repair processes. Adding "XX" before the step name gives the step the ability to early retire (or throw away) the LRU. Adding "YY" before the step name gives the step the ability to categorize LRUs as NFF. Clicking on the step name displays a window similar to the Distribution Input Window (Figure A.3), giving the user the ability to specify a fixed percent or distribution of LRUs to be retired early or found as NFF.



Figure A.5: Tab (4) Process Specific Inputs
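The "XX" and "YY" naming convention could be interpreted along the following lines. This is a sketch only: the `ProcessStep` class, the `parse_step_name` function, and the example step names are hypothetical, and it assumes a step carries at most one of the two prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessStep:
    name: str
    can_early_retire: bool   # step name was prefixed with "XX"
    can_flag_nff: bool       # step name was prefixed with "YY"


def parse_step_name(raw):
    """Split a prefixed step name into its flags and display name."""
    text = raw.strip()
    if text.startswith("XX"):
        return ProcessStep(text[2:].strip(), True, False)
    if text.startswith("YY"):
        return ProcessStep(text[2:].strip(), False, True)
    return ProcessStep(text, False, False)


# Hypothetical step names, not taken from the NSWC Crane process.
for raw in ["Receive LRU", "XX Inspect for damage", "YY Functional test"]:
    print(parse_step_name(raw))
```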

In Figure A.6, Tab (5) Runtime Outputs, the user can run, pause, or stop and reset the simulation by clicking on the buttons shown. When the run button is clicked, the computation choice window in Figure A.7 is displayed, giving the user four options for running the simulation. The first is to run the simulation and plot the LRU quantities; this is explained in the "Quantity Plot" section of Appendix A. The second is to run the simulation and export the average metrics of cost, availability and repair time versus calendar date to an Excel file. The third is to run the simulation and display an animation of LRU quantities in the repair process; this is explained in the "Repair Process Step Animation" section of Appendix A. The fourth and final option is to run the simulation with no additional outputs.



Figure A.6: Tab (5) Runtime Outputs



Figure A.7: Computational Choice Window

Tab (6) Cumulative Metrics Output, which is displayed in Figure A.8, provides the output metrics of the simulation. Metrics include totals for the number of failures and repair cost; average values for the number of failures per LRU, availability, and repair time; and distributions of individual LRU cost, availability, and repair time. A histogram of each distribution can be generated by clicking on the corresponding "Plot Dist" button to the left of the metric. Examples of each of the distributions are given in Figures A.9 to A.11.



Figure A.8: Tab (6) Cumulative Metrics Output



Figure A.9: Distribution of Repair Cost



Figure A.10: Distribution of LRU Availability



Figure A.11: Distribution of Repair Time

Tab (7) Solution Control, displayed in Figure A.12, provides background control of simulation taxonomy, the addition of default inputs, and the ability to load and save run data.



Figure A.12: Tab (7) Solution Control

The specific simulation taxonomy that can be controlled from Tab (7) Solution Control is displayed in Figure A.13. Here the user can control time step operation, the discount rate of money, the base year for net present value calculations, a pause for testing purposes, and the option to refresh the text fields at each time step.



Figure A.13: Solution Control Details
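The role of the discount rate and base year can be illustrated with a standard present-value calculation. The sketch assumes annual compounding and whole calendar years, which the appendix does not specify; the function name `present_value` and the cost figures are hypothetical.

```python
def present_value(cost, year, base_year, discount_rate):
    """Discount a cost incurred in `year` back to `base_year` money."""
    if discount_rate <= -1.0:
        raise ValueError("discount rate must be greater than -100%")
    return cost / (1.0 + discount_rate) ** (year - base_year)


# Hypothetical repair costs by calendar year, discounted at 7% to a 2009 base year.
costs = {2009: 1000.0, 2010: 1000.0, 2011: 1000.0}
npv = sum(present_value(c, y, base_year=2009, discount_rate=0.07)
          for y, c in costs.items())
print(round(npv, 2))
```

Costs incurred in the base year are unchanged, while later costs are reduced by a factor of (1 + rate) for each year that has elapsed.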

## Quantity Plot

To visualize trends within the model, a quantity plot (Figure A.14) can be used to depict the changing quantities of LRUs over time. The horizontal axis represents the simulation time, from the earliest introduction date to the latest end of support (retirement) date. The vertical axis represents the quantity of LRUs. Quantities tracked in the model over time include the total number of LRUs manufactured, LRUs in the field, LRUs in the repair facility, LRUs retired, and LRU spares. The line color corresponds to the type of quantity being tracked.



Figure A.14: Quantity Plot

## Repair Process Step Animation

When LRU populations fail and enter the repair process, it is often interesting to track their progress through each repair step. Figure A.15 represents the animation window generated by the model. The text on the left of Figure A.15 is the name of the process step, with the quantity of LRUs waiting to be processed to the right. To the right of the process step name is the quantity of LRUs represented by expanding or contracting colored bars. The colors correspond to the LRU priorities.

The repair process step animation is useful for identifying where a bottleneck may occur within a repair process and for visualizing the flow of priority-processed and FIFO-processed LRUs.



Figure A.15: Process Flow Animation

# Appendix B – calceFAST Failure Mechanism Reference

Appendix B provides a sample of the documentation provided by calceFAST on each of the specific failure mechanisms used in the sampling section of Chapter 3.

For greater explanation of the failure models see the following three references:

Osterman, (2002) "Explanation of the 1st Order Thermal Fatigue Model for Solder Interconnects in Leaded (Gullwing and J-Lead) and Leadless Packages" CALCE EPSC.

Osterman, (2002) "Explanation of the 1st Order Thermal Fatigue Model for Solder Interconnects in Area Array Packages" CALCE EPSC.

Osterman, (2001) "Explanation of the 1st Order Thermal Fatigue Model for 1st Order Thermal Fatigue Model for Leadless Packages" CALCE EPSC.

## First Order Thermal Fatigue Model For Leadless Packages

Failure occurs at the solder joint of an electrical interconnect between the package and PWB.

**Mechanism:** Fatigue

**Results:** CTF

**Description**

The model is suitable for leadless chip carriers. The user may need to modify the calibration constant to obtain more accurate results. Calculates median cycles to failure in solder joint modeled as a simple pillar subjected only to in-plane deformation using calculated average shear strain.


## Glossary

AHP – Aerospace and high performance

BGA – Ball grid array

BOM – Bill of materials

CGA – Column grid array

COTS – Commercial off the shelf

CTBGA – Chip Array Thin Core Ball Grid Array

EEE – Electrical and electronic equipment

FIFO – First in, first out

LCC – Leadless chip carrier

GCD – Greatest common divisor

Lead-free – Solder in which the content of the element lead is <0.1% by weight

Legacy system – an existing system that was produced with tin-lead solder

LFDS – Lead-free dynamic simulator

LRU – Line replaceable unit

NFF – No fault found

PBB – Polybrominated biphenyls

PBDE – Polybrominated diphenyl ethers

PCB – Printed circuit board

RoHS – Restriction of Hazardous Substances (Directive 2002/95/EC of the European Parliament and of the Council of 27 January 2003 on the restriction of the use of certain hazardous substances in electrical and electronic equipment)

SAC 305 – Lead-free solder composed of Sn-3.0Ag-0.5Cu

SnPb – See "Tin-lead"

SRA – Shop replaceable assembly

SRU – Shop replaceable unit

Tin-lead – Solder bearing the elements tin and lead, respectively, in the by weight amounts of 63-37 unless otherwise specified.

TTF – Time to failure

WRA – Weapon replaceable assembly

## References

Al-Momani, E. and Meilunas, M. (2008) Lead-free Thermal Cycle Progress, Unovis.

Birta, L. G., and G. Arbez. (2007) *Modeling and Simulation Exploring Dynamic System Behavior*. New York: Springer.

Boxma, O., and R. Syski. (1988) *Queuing theory and its applications*. Amsterdam: North-Holland, Sole distributors for U.S.A. and Canada, Elsevier Science.

CalceFAST. (2005) Vers. 5.0. College Park, MD: CALCE EPSC.

Casey, P., and M. Pecht. (2002) "Challenges for Adopting Pb-Free Interconnects for "Green" Electronics." *Proc. of IPC/JEDEC International Conference on Lead-Free Electronic Components and Assemblies*, Taiwan, Taipei. 21-32.

Cassidy, C. (2007) *Cambridge Dictionary of American English*. New York: Cambridge UP.

Ciocci, R., and M. Pecht. (2006) "Learning from the Migration to Lead-free Solder." *Solder & Surface Mount Technology* 18.3.

Dasgupta, A., and M. Pecht. (1991) "Material Failure Mechanism and Damage Models." *IEEE Transactions on Reliability* 40.5.

Diaz, A., and M. Fu. (1997) "Models for multi-echelon repairable item inventory systems with limited repair capacity." *European Journal of Operational Research*: 480-92.

European Union, (2002/95/EC), "Directive 2002/95/EC of the European Parliament and of the Council of 27 January 2003 on the Restriction of the Use of Certain Hazardous Substances in Electrical and Electronic Equipment," Official Journal of the European Union, pp. L37/19-L37/23.

European Union, (2002/96/EC), "Directive 2002/96/EC of the European Parliament and of the Council of 27 January 2003 on Waste Electrical and Electronic Equipment," Official Journal of the European Union, pp. L37/24-L37/38.

Eveloy, V., Y. Fukuda, S. Ganesan, J. Wu, and M. Pecht. (2005) "Key Concerns in the Assembly of Lead-free Electronics." IMAPS Taiwan. Proc. of International Technical Symposium. 167-83.

Everhart, L., McCluskey, P., Hansen, P., Vrignaud, C. Lewandowski, P., and Ramminger, S., "PoF Reliability Assessment of an Engine Control Unit," *Proc. of 2<sup>nd</sup> Int'l SIA Conf. on Automotive Power Electronics*, Paris, France, Sept. 2007.

Fishman, G. (1973) Concepts and methods in discrete event digital simulation. New York: Wiley.

Fishman, G. (2001) Discrete-event simulation modeling, programming, and analysis. New York: Springer.

Fishman, G. (1978) Principles of discrete event simulation. New York: Wiley.

Fishwick, P. A. (1995) Simulation model design and execution building digital worlds. Englewood Cliffs, N.J: Prentice Hall.

Ganesan, S., and Pecht, M. (2004), "Lead-free Electronics," CALCE EPSC Press, College Park.

Ganesan, S., J. Wu, M. Pecht, R. Lee, J. Lo, Y. Fu, Y. Li, and M. Xu. (2005) "Assessment of Long-term Reliability in Lead-free Assemblies." *Proc. of International Conference on Asian Green Electronics*.

Ghosh, Sumit, and Tony Lee. (2000) *Modeling and Asynchronous Distributed Simulation Analyzing Complex Systems*. New York: Wiley-IEEE.

GEIA-STD-0005-3 Draft 50, (2008).

Graves (1985), "A multi-echelon inventory model for a repairable item with one-for-one replenishment", *Management Science* 31, 1247-1256.

Guide, D., and R. Srivastava. (1997) "Repairable inventory theory: Models and applications." *European Journal of Operational research* 102: 1-20.

Hillman, C. (2006) "What I Don't Know That I Don't Know: Things to Worry About with the Pb-free Transition." *Hobbs Engineering Newsletter*.

Kendall, D. (1953) "Stochastic processes occurring in the theory of queues and their analysis by the method of the embedded Markov chain." *Ann. Math. Statist.* 24, 338-354.

Kennedy, W., J. Patterson, and L. Fredendall. (2002) "An overview of recent literature on spare parts inventories." *International Journal of Production Economics* 76: 201-15.

Lajos, T. (1982) Introduction to the theory of queues. Westport, Conn: Greenwood.

Ma, J., G. Chen, X. Li, T. Tanaka, and J. Tang. (2006) "Study of New Types Lead-Free Solder Alloys of Sn-Ag-Cu-Al-Ni and Sn-Zn-Bi-In-P." *Proc. of 7th International Conference on Electronics Packaging Technology*.

McCluskey, P., Hansen, P., Lenakakis, C. and Wondrak, W., "Virtual Qualification Using Field Life Temperature Data," *Proc. of ASME InterPACK 2009*, San Francisco, CA, July 2009. IPACK-2009-89184.

Naval Air Systems Command. *Standard Maintenance Practices Miniature/Microminiature (2M) Electronic Assembly Repair*. 2006.

Nie, L., M. Pecht, and R. Ciocci. (2007) "Regulations and market trends in lead-free and halogen-free electronics." *Circuit World* 33.2: 4-9.

Osterman, (2002) "Explanation of the 1st Order Thermal Fatigue Model for Solder Interconnects in Leaded (Gullwing and J-Lead) and Leadless Packages" CALCE EPSC.

Osterman, (2002) "Explanation of the 1st Order Thermal Fatigue Model for Solder Interconnects in Area Array Packages" CALCE EPSC.

Osterman, (2001) "Explanation of the 1st Order Thermal Fatigue Model for 1st Order Thermal Fatigue Model for Leadless Packages" CALCE EPSC.

Ozekici, S. (1990) Queueing theory and applications. New York: Hemisphere Pub. Corp.

Pecht, M., Fukuda, Y., and Subramanian, R., (2005) "The Impact of Lead-free Legislation Exemptions on the Electronics Industry," accepted for publication in the *IEEE Transactions on Electronics Packaging Manufacturing*, Vol. 27, No. 4.

Rappold, J., and B. Van Roo. (2009) "Designing multi-echelon service parts networks with finite repair capacity." *European Journal of Operational Research* 199: 781-92.

Russell, B., D. Fritz, and G. Latta. (2007) "Methodology for Evaluating Data For "Reverse Compatibility" of Solder Joints." *Proc. SMTA International*.

Russell, B., D. Fritz, and J. Tucker. (2008) "Methodology for Evaluating Data For "Reverse Compatibility" of Solder Joints II." *Proc. of SMTA International*.

Schriber, T., and D. Brunner. (1997) "Inside Discrete-Event Simulation Software: How It Works and Why It Matters." *Winter Simulation Conference*.

Sherbrooke, C. (1968), "METRIC: A multi-echelon technique for recoverable item control", *Operations Research* 16, 122-141.

Sleptchenko, A., M. Van der Heijden, and A. Van Harten. (2002) "Effects of finite repair capacity in multi-echelon, multi-indenture service part supply systems." *International Journal of Production Economics* 79: 209-30.

System Analysis Division, Directorate for Plans and Analysis, US Army Missile Command, Redstone Arsenal, Alabama 35898, Discard/Repair Cost Model: Repair versus Throwaway; Printed Circuit Cards and Modules, June, 1984

U.S. Department of Commerce (2009), *U.S. Mission to the European Union*, U.S. Commercial Service, www.buyusa.gov/europeanunion/rohs\_faq.html

Weibull++. (2003) Vers. 6. Reliasoft

Zhu, F., Z. Wang, R. Guan, and H. Zhang. (2005) "Mechanical Properties of a Lead-Free Solder Alloys." *Proc. of International Conference on Asian Green Electronics*.