The following paper was presented at the SEMATECH AEC/APC Workshop VII, November 5-8, 1995 in New Orleans, Louisiana.

Using UPM for Real-Time Multivariate Modeling
of Semiconductor Manufacturing Equipment

Paul J. O'Sullivan
Triant, Nanaimo, BC

Jimmy Martinez, James Durham, Steve Felker
Motorola Inc., Semiconductor Products Sector, Mesa, AZ


Abstract

Some wafer defects are not detected until the end of the manufacturing process. It is reasonable to assume, however, that the health of a piece of equipment is an indicator of the quality of the product it produces, and that adverse changes in health have a negative impact on product quality. Therefore, by monitoring equipment health, it is possible to immediately detect a misprocessed wafer and take corrective action.

Using a new and unique multivariate modeling technique called Universal Process Modeling (UPM), it is possible to accurately determine the health of a piece of equipment in real-time. Equipment-related problems are revealed when the equipment health falls outside a statistically-derived limit. This limit is calculated from in-process data when the equipment is healthy and operating normally. Insight into the nature of the problem may be gained by examining those process variables which have deviated the most from their expected values. This paper discusses the features and benefits of UPM and describes its application in a wafer etch facility.


Introduction

The Challenge in IC Manufacturing


According to the World Semiconductor Trade Statistics (WSTS), the 1995 global semiconductor market is expected to grow 39.7% over 1994. While sales of semiconductors are extremely healthy, they are limited by capacity constraints. The escalating demand for personal computers, communication devices, and automotive and consumer electronics is pushing the need for greater and greater chip capacity. At the end of 1994, the worldwide wafer capacity was 7.8 million wafers per month.

While sales are booming, there are great challenges facing the semiconductor industry. Lost capacity due to equipment downtime, scrapped wafers, and test and quality assurance activities can cost a company tens of millions of dollars per quarter.

In addition, the chip making process is becoming more and more complex as geometries shrink. For example, the 0.25 µm generation of chips is expected to have 4-5 levels of metal, 20-22 mask levels, and 353 process steps. Smaller geometries ultimately mean more dice per wafer, and with 300 mm wafers on the horizon, the investment required to bring all this together is staggering. With the cost of building a new fab currently in the region of $1B, there is an ever-growing demand to shorten the time between "breaking ground" on the fab and delivering production wafers. There is little time for production problems. The cost of production problems becomes even higher with larger wafers and more dice per wafer, as greater value is added to each wafer as it is processed.

What is clear is that semiconductor manufacturers, equipment suppliers, and research organizations will have to develop innovative solutions to solve the technical challenges associated with smaller geometries and larger wafers, and to maximize the use of existing equipment and fabs.


The Need for Advanced (Statistical) Process Control

According to a multi-year study from Berkeley, the final report of which is due this fall, a fab must excel in four basic types of practices to realize excellent manufacturing performance and to provide a competitive advantage:

  • A fab must have computer systems that provide strong process control, excellent data collection and excellent data analysis capabilities. A fab must be able to expeditiously pinpoint the causes of yield loss and wafer throughput loss.
  • A fab must have an organization that not only carries out the manufacturing process, but is also very good at problem recognition and problem solving.
  • A fab must have the technical talent and vendor support to quickly make modifications to product, process, and equipment.
  • A fab must have effective procedures for managing the introduction of new process flows.

Advanced (statistical) process control (A(S)PC) helps a semiconductor manufacturer realize the four practices outlined by the Berkeley study. It will be required to meet the challenges of smaller geometries and larger wafers planned for the next generation of fabs. In these new fabs, process control requirements will be more stringent, and higher yields will require the control of variability at each processing step. This requires an understanding of all the variables which affect the output of a process, with some form of optimization for tighter control.

However, there is also great potential benefit in applying A(S)PC techniques to the fabs and equipment in use today. These techniques can potentially increase productivity by decreasing the use of pilot test wafers, shortening the cycle time, and improving equipment availability. They can also potentially lower manufacturing cost by reducing the amount of scrap wafers.


The Value of Equipment Health Monitoring/Modeling

It is reasonable to assume that the health of a piece of equipment is an indicator of the quality of the product it produces. For example, consider a robotic paint machine at a widget manufacturer. For a given type of paint (viscosity, color, adhesion qualities, etc.) and a desired paint quality (thickness, coverage, uniformity, etc.), the nozzle setting, pressure setting, swath speed, and distance from object must be carefully controlled to balance paint quality against paint wastage and widget throughput. When the machine is healthy, i.e., operating normally, and neglecting external influences, paint quality will be as expected. However, a fault in, say, the pressure controller, can cause poor paint quality through lack of coverage and incorrect atomization of the paint. The machine is not healthy, and the quality of the end product suffers. Therefore, if one could monitor the health of the machine, it would be possible to immediately detect a "misprocessed" widget and take corrective action, without the need to measure directly the quality of the painting. In fact, it may be that the quality of the painting cannot be accurately measured until some time later, for example after the widget has been through a paint-baking process.

If the paint viscosity, nozzle setting, pressure setting, swath speed, and distance to object could all be measured, then it would be fairly simple to detect gross errors in the values of these measurements through simple high/low alarm limits. For example, a severe leak in the air line to the nozzle, which causes the air pressure to be much too low, could trip a low-limit alarm. Similarly, a fault in the viscosity sensor, which gives a viscosity reading equivalent to that of tar, could trip a high-alarm limit. The challenge is to detect not gross defects which are immediately obvious to an operator, but much subtler defects which, if left unchecked, can result in a batch of widgets having to be reworked, or even scrapped. Intuitively, it makes sense that if one could take advantage of all the interactions, or correlations, between the process variables, it should be possible to detect subtle defects which ordinarily go unnoticed.

According to the Collins dictionary of mathematics a model is:

"a fragment of a mathematical or formal theory that reflects some aspect of a particular physical, social, technological, or natural phenomenon or process, and enables predictions to be made about its behavior."

If an accurate real-time model of the robotic paint machine could be established, one that would accurately describe the operation of the machine under all normal operating conditions, a comparison could be made between the actual behavior of the machine and the behavior as predicted by the model. A statistically significant difference in behavior could trigger a machine "health" alarm, which can shut down the robot and prevent defective products from being produced.

In the simple paint machine example, a model is needed to predict the paint quality because it cannot be measured directly. The quality of the paint must be predicted with sufficient accuracy to detect subtle changes.

There are two ways to model a robotic paint machine: from first principles or from empirical data. In developing a first principles model, the underlying physics, mechanics, and fluid-dynamics of the machine have to be understood. The model must accurately relate paint viscosity, nozzle setting, pressure setting, swath speed, etc., to paint quality. In fact, we would also have to precisely define how to measure paint quality, which in itself could be quite a challenge. In the end, a first principles model consists of one or more sets of equations which mathematically describe the paint quality in terms of some set of measured parameters.

An empirical model uses data collected from the machine and fits this data to a function which, for the paint example, relates paint quality (dependent variable) to paint viscosity, nozzle setting, pressure setting, etc. (independent variables). The challenge is to find the appropriate fitting function. If the system is simple, a first-order (linear) polynomial fitting function may be sufficient. If the system is complex, a higher-order, non-linear fitting function may be required. Once a fitting function is chosen, the empirical data can be used to determine the fitting coefficients.
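The first-order case can be sketched in a few lines. This is only an illustration, assuming a single independent variable and made-up data; the function name `fit_line` and the pressure and thickness values are hypothetical and do not come from any real paint machine.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: nozzle pressure (independent) vs. paint thickness (dependent).
pressure = [1.0, 2.0, 3.0, 4.0, 5.0]
thickness = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(pressure, thickness)

# Predict the dependent variable at a pressure not present in the data.
predicted = intercept + slope * 3.5
```

Note that the output of such a model is a single dependent value, and that a bad pressure reading goes straight into the prediction.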

Both of these approaches present several problems. First, if the system is complex, it may be very difficult to develop an accurate model. Second, the dependent variable (paint quality) must be quantifiable, which may be difficult to do. Third, neither approach is generally robust. Any problems with the sensors measuring the paint viscosity, nozzle setting, pressure setting, etc. could result in a wrong prediction of paint quality.

Both these approaches are univariate. That is, the output of the model is a single value for the dependent variable. A multivariate approach, one that predicts the values of both dependent and independent variables, has tremendous advantages over a univariate approach. First, if the assumption that the machine health is related to paint quality is correct, and that machine health is indicated by the relationships between the measured machine parameters, there is no need to directly measure paint quality. Paint quality can be inferred by the values and patterns associated with the machine data. The problem has shifted from measuring paint quality to measuring machine health. Second, a multivariate technique has the potential to validate the independent values and thus be more robust. Third, a multivariate technique provides much more information on a system than a univariate approach.

Universal process modeling (UPM) is an empirical modeling technique that is robust, multivariate, and accurate. UPM employs a unique localized modeling approach. Its fitting function is general enough that it can be applied to simple and complex, linear and non-linear systems. Other multivariate modeling techniques include principal component analysis (PCA) and projection on latent structures (PLS) [1-3], and neural networks [4,5].


Universal Process Modeling

Overview

UPM is a multivariate modeling technique capable of accurately modeling complex non-linear systems in real-time (see Fig. 1). UPM was developed by Triant (formerly TERANET IA Incorporated) during 1990 and 1991, and one of its first applications was modeling the differential core coolant temperature in an experimental fast breeder reactor. Since then, UPM has been successfully applied in situations ranging from validating battery data from electric vehicles to modeling military aircraft flight data.

Figure 1. An Overview of UPM

Any system modeled by UPM must have two very important characteristics. First, the system has to be described numerically. In its simplest sense, this means that numerical data describing how the system "works" must be available. For a machine, this typically consists of appropriate sensor readings. Secondly, the numerical data describing the system must be correlated. That is, the "sensor readings" must be related to each other. It is important to recognize that one does not need to know or understand what the correlations themselves are, simply that they exist.

UPM possesses two very important characteristics which make it well suited to a broad range of applications. UPM is both multivariate and robust. It is multivariate in the sense that it does not differentiate between "input" (independent) variables and "output" (dependent) variables. A predicted value is calculated for each and every variable contained in the data, whether or not the user treats the variable as an "input" to the model or an "output" from the model. UPM is robust in the sense that missing or faulty "input" variables have little effect on the accuracy of the predicted "output" values.

UPM is an inductive, or example-based, technique. A set of data, known as the reference library, describes how the system operates under "normal" conditions. The reference library is usually constructed from system data that has been archived. For example, in the case of an etch machine the simplest form of reference library consists of collecting relevant data from the machine after it has processed a batch of known good wafers. UPM models the etch machine using the reference library as its "knowledge base" on how it normally processes a batch of wafers. If the behavior of the model created by UPM tracks the behavior of the machine, then it is assumed the machine is "healthy" and operating within its normal operating envelope, as described in the reference library. However, if this is not the case, then it is assumed the machine is not "healthy" and some kind of corrective action must be taken.

In UPM, two health-related measurements describe the accuracy of the model output: system health and signal health. System health is a global measure of the accuracy of the model. Signal health is a local measure of the accuracy of each variable. They are both expressed as how many standard deviations the model output is from the "ideal". The "ideal" model is calculated by "profiling" the reference library. This profiling operation is described in more detail in a later section.

Elements and Images

We have adopted a terminology to describe data associated with UPM. An element is a single piece of data akin to a variable, sensor reading, parameter, etc. An image is a collection of associated elements akin to a record, snapshot, etc. We use this "neutral" terminology to avoid confusion when referring to systems which have their own way of describing elements and images.

The reference library, for example, is a collection of two or more images, each image consisting of two or more elements. UPM models a single input image at a time, and generates an output model image. The input image may originate in real-time from say, a piece of equipment, or it may originate from a stored file.

Profiling the Reference Library

The profiling operation uses UPM to model the data in the reference library. Each image in the reference library is modeled using a reference library containing all images except the one being modeled. For each image modeled, the system similarity and the element modeling errors are calculated. Profiling therefore consists of calculating the expected "goodness-of-fit" between the input image and model image, and also the expected modeling error for each element. We refer to the "goodness-of-fit" between the input image and model image as the system similarity. It measures how similar the model image is to the input image. The modeling error for each element is simply the absolute difference between the element input value and the corresponding model output value. The modeling error is often referred to as the residual error, or simply the residual.
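The leave-one-out structure of profiling can be sketched as follows. The UPM algorithm itself is not described in this paper, so the sketch substitutes a simple stand-in model (an inverse-distance-weighted average of the nearest library images). The stand-in, the function names, the reference data, and the formula used to express signal health in standard deviations are all assumptions made for illustration.

```python
import math
import statistics


def model_image(image, library, k=3):
    """Stand-in model (NOT the UPM algorithm): inverse-distance-weighted
    average of the k library images nearest to the input image."""
    nearest = sorted(library, key=lambda ref: math.dist(image, ref))[:k]
    weights = [1.0 / (math.dist(image, ref) + 1e-9) for ref in nearest]
    total = sum(weights)
    return [
        sum(w * ref[i] for w, ref in zip(weights, nearest)) / total
        for i in range(len(image))
    ]


def profile(library, k=3):
    """Model each image against all the OTHER images and return, per element,
    the (mean, standard deviation) of the absolute modeling errors."""
    errors = []
    for idx, image in enumerate(library):
        others = library[:idx] + library[idx + 1:]
        model = model_image(image, others, k)
        errors.append([abs(a - m) for a, m in zip(image, model)])
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*errors)]


def signal_healths(image, library, stats, k=3):
    """Assumed definition: how many standard deviations each element's
    modeling error lies above the error expected from profiling."""
    model = model_image(image, library, k)
    return [
        (abs(a - m) - mean) / std
        for a, m, (mean, std) in zip(image, model, stats)
    ]


# Hypothetical reference library: each image is [pressure, power, flow].
library = [
    [1.0, 2.1, 9.0], [1.5, 2.9, 8.6], [2.0, 4.0, 8.0], [2.5, 5.1, 7.4],
    [3.0, 5.9, 7.1], [3.5, 7.0, 6.5], [4.0, 8.1, 5.9], [4.5, 9.0, 5.6],
]
stats = profile(library)

healthy = signal_healths([2.2, 4.4, 7.8], library, stats)
faulty = signal_healths([2.2, 4.4, 12.0], library, stats)  # flow reads far too high
```

With this data, the flow element of the faulty image lies many standard deviations from its expected error, while every element of the healthy image stays within the limit.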

Please refer to Figure 1 for additional information.

Features and Benefits

Table 1 shows the main features, and corresponding benefits, of UPM.

Table 1. Features and Benefits of UPM

| Feature | Benefit |
| --- | --- |
| Example-based (inductive) | There is no need to describe the system mathematically. All information about how the system should operate is contained in a reference library of past examples. |
| Multivariate | There is no distinction between dependent (output) and independent (input) elements. The algorithm is capable of modeling both types simultaneously, which provides much more useful information than univariate techniques. |
| Fault-Tolerant | The prediction depends on all the elements in the input image. A sensible prediction is made even if one or more elements contain bad or missing data. |
| Localized Modeling | Many complex systems are locally linear. UPM takes advantage of this in modeling complex non-linear systems. |
| Bounded Predictions | The predicted values are always within the bounds of previous experience, i.e. within the range of images in the reference library. This feature is critical in error detection and signal replacement applications. |
| Accurate | Accurate modeling means that small changes in the system can be detected. |
| No Training | UPM requires very little setup. There is no need to normalize or preprocess the data before modeling. New information can be added to the reference library, on-the-fly, without having to retrain the entire system. |

Applications

UPM can be applied to a wide variety of problems and tasks. Some of the more useful applications, which relate to the semiconductor industry, are briefly described.

Signal Validation

Sensor validation is the process of distinguishing a "faulty" sensor from a "faulty" system. For example, if a sensor gives a reading which appears to be 20 percent too high, the challenge is to decide whether the reading is correct, or whether, say, the sensor has drifted out of calibration.

To distinguish between a system fault and a sensor fault, the system health and signal healths are examined. If a single sensor has a low signal health (high standard deviation value), but the system health is normal, it usually means a problem with the sensor itself. This can be confirmed by examining the other signal healths and confirming they are normal. If, on the other hand, the system health is low (high standard deviation value), it usually means a problem with the system.

There is an underlying assumption that a "faulty" system affects many sensor readings simultaneously. It is unusual for a problem in the system to only affect a single sensor reading.
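The decision rule described in this section can be written as a short sketch. The function name `diagnose`, the limit of 3.0 standard deviations, and the returned labels are assumptions made for illustration. The health values are taken as given (in standard deviations, where a larger number means poorer health).

```python
def diagnose(system_health, signal_healths, limit=3.0):
    """Classify a reading as a system fault, a sensor fault, or normal.

    system_health  -- system health in standard deviations
    signal_healths -- dict mapping signal name to health in standard deviations
    """
    suspects = [name for name, sd in signal_healths.items() if sd > limit]
    if system_health > limit:
        # Poor system health: assume the system itself is at fault.
        return "system fault", suspects
    if len(suspects) == 1:
        # One poor signal while the system as a whole looks normal.
        return "sensor fault", suspects
    if suspects:
        # Several poor signals but normal system health: not covered by the
        # rule in the text, so flag it for an engineer to review.
        return "review", suspects
    return "normal", suspects


sensor_case = diagnose(1.2, {"pressure": 0.8, "rf_power": 4.6, "he_flow": 1.1})
system_case = diagnose(4.1, {"pressure": 3.9, "rf_power": 4.6, "he_flow": 3.2})
normal_case = diagnose(0.9, {"pressure": 0.8, "rf_power": 1.0, "he_flow": 1.1})
```

The signal names in the example are hypothetical.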

Fault Detection

In many systems, fault detection is implemented as a simple high/low alarm on one or more signals. This simple type of alarm can often detect gross defects, but leaves more subtle defects unnoticed. This is especially true of dynamic systems, where signals vary over a wide range in normal operation. Another problem with simple alarms is that the correlation between signals is ignored. For example, if signals A and B are inversely related, the fact that both signals, under normal operating conditions, should never simultaneously approach their high alarm limits, goes unnoticed by simple alarms.

UPM is multivariate and it builds a dynamic model of the system. It can detect subtle changes in the operation of the system which may be indicative of a developing fault. The system health indicates the severity of the "fault", and the individual signal healths help localize it.
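The inverse-relationship example can be made concrete with a small sketch. It assumes, purely for illustration, that A + B is roughly constant in normal operation and that this relationship is known in advance. UPM does not require the correlation to be known, so the check below is only a simplified stand-in for a multivariate model. The limits and data are made up.

```python
import statistics

HIGH_LIMIT = 9.0  # hypothetical per-signal alarm limits
LOW_LIMIT = 1.0

# Hypothetical normal-operation data: A and B are inversely related.
reference = [
    (2.0, 8.1), (3.0, 6.9), (4.0, 6.0), (5.0, 5.1),
    (6.0, 3.9), (7.0, 3.0), (8.0, 2.1),
]


def simple_alarm(reading):
    """Univariate check: trips only if a signal leaves its own limits."""
    return any(v > HIGH_LIMIT or v < LOW_LIMIT for v in reading)


# Learn the normal spread of A + B from the reference data.
sums = [a + b for a, b in reference]
mean_sum = statistics.mean(sums)
std_sum = statistics.stdev(sums)


def correlation_alarm(reading, limit=3.0):
    """Trips if A + B is more than `limit` standard deviations from normal."""
    a, b = reading
    return abs((a + b) - mean_sum) / std_sum > limit


suspect = (8.0, 8.0)  # both signals high, yet each within its own limits
normal = (5.0, 5.0)
```

The reading `(8.0, 8.0)` passes both per-signal limits but breaks the relationship between A and B, so only the second check flags it.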

Machine Fingerprinting

Often, it is desirable to know if a piece of equipment operates the same way after an "event" as it did before. For example, after an etch machine has been "cleaned", it is important to establish that the machine operates correctly.

The reference library establishes the "baseline" or "normal" operation of the equipment. If a machine models poorly after, say, a "clean", then it means that the machine is not operating the same way as it did before. As in fault detection, the system health indicates the severity of the "fault", and the individual signal healths help localize it.

Signal Replacement

If a signal is defective, it can be replaced with the UPM value for the defective signal. UPM can identify the defective signal by examining the signal health values, and replace its value using the corresponding signal, or element, from the model image.

A variation on this idea is the concept of a virtual sensor. For example, the reference library may contain wafer temperature data, and other data, from running an instrumented test wafer through, say, an etch process. When production wafers are run through the same process, UPM generates a wafer temperature value, even though the temperature is not directly measured on the production wafer.
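A virtual sensor can be sketched as modeling an image in which one element is missing. As the UPM algorithm is not given in this paper, the sketch uses a stand-in model (an inverse-distance-weighted average of the nearest library images, with distance computed over the known elements only). The stand-in, the names, and the data are assumptions for illustration.

```python
import math


def model_image(image, library, k=3):
    """Stand-in model (NOT the UPM algorithm). Elements set to None are
    treated as missing and ignored when measuring distance."""
    known = [i for i, v in enumerate(image) if v is not None]

    def distance(ref):
        return math.sqrt(sum((image[i] - ref[i]) ** 2 for i in known))

    nearest = sorted(library, key=distance)[:k]
    weights = [1.0 / (distance(ref) + 1e-9) for ref in nearest]
    total = sum(weights)
    return [
        sum(w * ref[i] for w, ref in zip(weights, nearest)) / total
        for i in range(len(image))
    ]


# Hypothetical library from instrumented test wafers:
# each image is [rf_power, pressure, wafer_temperature].
library = [
    [300.0, 50.0, 60.0],
    [320.0, 55.0, 64.0],
    [340.0, 60.0, 68.5],
    [360.0, 65.0, 72.0],
    [380.0, 70.0, 76.5],
]

# Production wafer: the temperature is not measured.
production = [350.0, 62.0, None]
virtual_temperature = model_image(production, library)[2]
```

Because the stand-in forms a weighted average with positive weights, the generated temperature always lies within the range of temperatures in the library, which mirrors the "Bounded Predictions" feature in Table 1.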

Correlation Discovery

UPM is a multivariate modeling technique. It can use correlations in the data that may be unknown or unobvious. UPM can be used to describe correlations in the data, and thus provide an insight into the system, and an understanding of it.

Data Display

A number of different charts are available in UPM to display the model results. Some charts are unique, such as the bull's-eye and thermometer plots, and others are common types of SPC charts.


Figure 2. Bull's-Eye Plot

Bull's-Eye Plot

The bull's-eye plot (see Fig. 2) is a unique way of displaying a lot of information, in a format which is easy for an operator or engineer to interpret. The bull's-eye plot displays the signal healths in a polar plot format. Each signal "owns" a spoke on the polar plot. Each spoke is labeled with the signal name. A circular "bullet" moves radially along the spoke. The distance of the "bullet" from the center of the plot indicates how many standard deviations the signal value is from the expected, or modeled, value. If the "bullet" is blue, the signal value from the equipment is higher than expected. If it is black, it is lower than expected.

The plot also displays the numerical value of the system health. The background color of the plot depends on whether the system health has crossed its user-adjustable limits. Two limits exist, a warning limit and a fault limit. Typically, the warning limit is set to 2.5 standard deviations and the fault limit set to 3.0 standard deviations. If the system is operating normally, the background color of the bull's-eye plot is green. If the system health crosses the warning limit, the background turns yellow. If it crosses the fault limit, it turns red.
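The display rules for the bull's-eye plot reduce to a few comparisons, sketched here. The function names are hypothetical, and treating "crosses the limit" as strictly greater than the limit is an assumption. The default limits are the typical values quoted above.

```python
def background_color(system_health, warning_limit=2.5, fault_limit=3.0):
    """Map the system health (in standard deviations) to a background color."""
    if system_health > fault_limit:
        return "red"
    if system_health > warning_limit:
        return "yellow"
    return "green"


def bullet(input_value, model_value, std_error):
    """Return (distance from center in standard deviations, bullet color).

    Blue means the signal is higher than expected, black means lower.
    An exact match sits at the center; its color is arbitrary (black here).
    """
    distance = abs(input_value - model_value) / std_error
    color = "blue" if input_value > model_value else "black"
    return distance, color


colors = [background_color(h) for h in (1.0, 2.7, 3.4)]
high = bullet(10.5, 10.0, 0.25)
low = bullet(9.0, 10.0, 0.5)
```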

Clicking on the name of a signal brings up its corresponding trend plot.

Thermometer Plot

The thermometer plot is a vertical bar chart which, using a single bar, shows the minimum, maximum, and current system health values. The normal, warning, and fault zones for the system health are color-coded regions of the bar.

System Health Trend Plot

The system health trend plot displays the history of the system health over time. It is useful in identifying the trend pattern of the system health, as well as in identifying cyclical variations.

Model, Residual, and Cusum Trend Plots

Each signal in the system can be displayed in the form of a model, residual, or cusum trend plot. The model trend plot shows the input signal value together with its modeled value. It also shows the control-limits around the modeled value. When the input signal value crosses either of the control limits, it means that the signal has deviated from its normal, or expected, value.

The residual trend plot shows the residual, or modeling, error, which is simply the difference between the input signal value and its modeled value. It also shows the user-set upper and lower control limits as horizontal straight lines. As with the model plot, when the residual value crosses either of the control limits, it means the signal has deviated from its normal, or expected, value.

The cusum trend plot shows the cumulative sum of the residual errors. If modeling was perfect, one would expect modeling errors to be normally distributed around the mean modeling error computed from profiling the reference library. Any systematic error gradually builds up the cusum value, without necessarily triggering a signal health problem. A cusum plot is therefore useful in identifying gradual, but consistent, drifts in the system, which may be indicative of a problem developing.
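A small sketch shows how a cusum exposes a drift that individual residuals do not. It assumes signed residuals accumulated relative to the mean modeling error obtained from profiling. The function name, the control limit of 0.5, and the residual values are made up for illustration.

```python
from itertools import accumulate


def cusum(residuals, mean_error=0.0):
    """Cumulative sum of residuals measured relative to the expected mean."""
    return list(accumulate(r - mean_error for r in residuals))


CONTROL_LIMIT = 0.5  # hypothetical residual control limit

# Hypothetical residuals: each is small, but they lean to the positive side.
residuals = [0.1, -0.1, 0.2, 0.1, 0.2, 0.1, 0.2]
trend = cusum(residuals)

# No single residual crosses the control limit...
any_residual_alarm = any(abs(r) > CONTROL_LIMIT for r in residuals)
# ...yet the cumulative sum keeps climbing, which points to a drift.
final_cusum = trend[-1]
```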

Application to a Wafer Etch Facility

Motorola, with its continuous improvement methodology, quickly realized the potential benefit an advanced modeling technology, such as UPM, could provide. It decided to work with the engineers at Triant to implement a real-time fault detection system for etch tools in one of their production fabs. The site chosen was Motorola's MOS 6 facility, located in Mesa, Arizona. MOS 6 is a sub-micron ASIC/BICMOS fab which started operations in 1988.

MOS 6 has developed a "back-to-basics" approach to tackle scrap reduction. The "back-to-basics" approach focuses on the fundamentals of documentation and training. A complete, in-depth review of specifications, recipes, data collection operations, charts and specification limits is designed to decrease the incidence of scrap.

At the same time, the etch engineers at MOS 6 are nearing completion of the installation of an automatic data collection system for most of their etch tools. Lot information, process data, and logistics information are collected on every wafer that runs through the Lam etchers. The data is analyzed as it is collected to detect any conditions that adversely affect wafer processing.

The basic concepts of the UPM system were demonstrated to the MOS 6 engineers in June of this year. The next step is to install UPM on several Lam 4400 Poly etchers so that its effectiveness at monitoring the health of etch tools can be demonstrated. This is scheduled for October 1995.

The rest of this section briefly describes the system architecture of the UPM software and the various software modules which make up an implementation.

System Architecture

Although the UPM algorithm is important, it is only one component of providing an overall solution for a semiconductor fab. A number of other components are required for successful deployment. Some of these components are software-related, some are hardware-related.

The architecture had to support real-time modeling of semiconductor manufacturing equipment. It had to be scalable to very small and very large fabs. It had to be portable to a wide range of hardware and operating systems. It also had to be modular, so the customer could choose only those components which make technical and economic sense.

The basic system architecture is shown in Figure 3. It shows the typical topology of the hardware and software components for a medium size fab.

Software Components

Currently, six software components comprise the architecture: data collector, model manager, model display, system administrator, data archiver, and model miner.

Data Collector

This component is responsible for collecting data from the equipment. It collects tool data every second and makes it available to the model manager component. There are a variety of methods to collect real-time data from the tool. The method we chose was to use the SECS port.

One of the challenges in using the SECS port is that other systems may also need the SECS port for recipe download, preventive maintenance, etc. Motorola uses software from Brookside Software for data collection, preventive maintenance, and end-point monitoring. Therefore, the data collector module had to work in conjunction with the Brookside software. Triant and Brookside engineers worked together to provide the data collector with real-time access to the machine data. The data collector can also send "stop processing" alarm messages, via the Brookside software, to the tool.

Model Manager

This component is responsible for modeling the real-time tool data using the UPM algorithm. Real-time data from the data collector component are read and modeled, with the results stored in a model results file (MRF). The model manager also creates a model archive file (MAF) which contains all the model results for a single lot. It also supports the sending of alarm messages to the data collector component to instruct the tool to stop processing.

The model manager can service multiple data collector components. Typically, 4-10 tools can be modeled simultaneously using a single model manager. The system architecture also supports multiple model managers. The model manager can run on the same computer as the data collector component, although it is more usual for it to run on a separate computer connected to a network. In this case, data between the data collector and model manager components are exchanged over the network.
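The role of the model manager can be outlined in code. This is a hypothetical sketch of the data flow only, not Triant's implementation: the class, its fields, the fault limit, and the toy health function are assumptions, and in-memory lists stand in for the model results file and the alarm messages.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class ModelManager:
    """Hypothetical outline of the model manager's role."""
    health_fn: Callable[[Sequence[float]], float]  # stand-in for UPM modeling
    fault_limit: float = 3.0
    results: List[Tuple[str, float]] = field(default_factory=list)  # stands in for the MRF
    alarms: List[str] = field(default_factory=list)  # tools told to stop processing

    def on_image(self, tool_id: str, image: Sequence[float]) -> float:
        """Model one image from a data collector and record the result."""
        health = self.health_fn(image)
        self.results.append((tool_id, health))
        if health > self.fault_limit and tool_id not in self.alarms:
            self.alarms.append(tool_id)
        return health


# Toy health function: scaled largest deviation from a nominal value of 1.0.
manager = ModelManager(health_fn=lambda image: max(abs(v - 1.0) for v in image) * 10)

# One manager servicing images from two tools.
manager.on_image("etcher-1", [1.0, 1.1, 0.9])
manager.on_image("etcher-2", [1.0, 1.5, 0.9])
```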

Model Display

This component is responsible for displaying the model results. It reads the model results created by one or more model managers and displays the results using the bull's-eye, thermometer, and trend plot formats. The user can choose any tool for viewing, and can simultaneously display the results from several tools.



Figure 3. System Architecture

The model display can run on the same computer as the model manager component, although it is more usual for it to run on a separate computer connected to a network. In this case, data between the model manager and model display components are exchanged over the network.

System Administrator

This component is responsible for setting and configuring the various components which make up the installation. The system administrator is a protected component requiring authorized access.

Data Archiver

This component, which is optional, is responsible for archiving the model archive files created by one or more model manager components. Each model archive file contains all the modeling information for a lot. For best data mining results, an ORACLE database can be used for archiving.

This component usually runs on a dedicated database server.

Model Miner

This component, which is also optional, mines the real-time tool data in the ORACLE database. It is used to provide a variety of information on the performance and operation of each tool. Typically, it assists in analyzing poor yield and equipment-related problems. It is also used to examine and report operational changes in the tool to assist in tool maintenance.

This component typically runs on a workstation or personal computer with access to the database server.


Summary

The need for A(S)PC is evident as semiconductor manufacturers face the challenges of tomorrow in moving to smaller geometries and larger wafers, and those of today in maximizing the use of existing equipment and fabs.

UPM is an empirical modeling technique that is robust, multivariate, and accurate. It uses a reference library of past data as its "knowledge base" on how a piece of equipment operates under "normal" conditions. UPM calculates a statistical measure of the overall "health" of a piece of equipment, as well as the "health" of the equipment signals monitored. UPM can be used for signal validation, fault detection, machine fingerprinting, signal replacement, and correlation discovery.

UPM's unique bull's-eye plot, thermometer plot, and other trend plots, give operations personnel the tools needed to make informed decisions about the process.

As part of an overall scrap reduction program, Motorola's MOS 6 facility, located in Mesa, Arizona, is working with engineers from Triant to implement a UPM-based real-time fault detection system for etch tools. The system is scheduled to be fully operational in late 1995.


Acknowledgments

The authors would like to acknowledge John Gragg, Carl Aspin, and Mike Clayton, all at Motorola, for their enthusiasm and support of the project at MOS 6.


References

  1. Nomikos, P., MacGregor, J. F., "Monitoring Batch Processes Using Multiway Principal Component Analysis", AIChE Journal, August 1994
  2. Kresta, J. V., MacGregor, J. F., Marlin, T. E., "Multivariate Statistical Monitoring of Process Operating Performance", The Canadian Journal of Chemical Engineering, February 1991
  3. MacGregor, J. F., Jaeckle, C., Kiparissides, C., Koutoudi, M., "Process Monitoring and Diagnosis by Multiblock PLS Methods", AIChE Journal, May 1994
  4. O'Sullivan, P. J., "Application of a New Technique for Modeling System Behavior", ISA Symposium Proceedings, Edmonton, May 1991
  5. Dayal, B. S., MacGregor, J. F., Taylor, P. A., Kildaw, R., Marcikic, S., "Application of Feedforward Neural Networks and Partial Least Squares Regression for Modelling Kappa Number in a Continuous Kamyr Digester", Pulp and Paper Canada, 95:1 (1994)


