Using Fourier Transforms to Characterize Implantation Damage

As an undergraduate at the University of Missouri – St. Louis I had a joint project with the NASA Missouri Space Grant Consortium and my current company, SunEdison Semiconductors.  The aim was to better understand helium and hydrogen ion implantation into silicon before a thermal treatment.  This was an active area of research to help optimize the SmartCut technology for transferring thin films to create Silicon on Insulator (SOI) materials.

Data

Our data set would be gathered via the preparation of cross-sections of implanted silicon for analysis via Transmission Electron Microscopy.  Interestingly, there were highly regular dark bands parallel to the surface and at depths expected for the implantation species and damage.  The bands remained dark regardless of imaging conditions–bright field or darkfield.  This suggested the contrast was the disruption of electron channeling due to displaced atoms in the crystal.  In other words, the darker it was, the more displaced silicon atoms, the more damage.  See the image and intensity profile plot below.  The take-away was:  TEM is a reasonable way to at least see the damage.

Goal and Method

Aside from measuring standard things under different conditions, we also wanted to be able to say something quantitative about the structure in the damage layer and that’s what we’ll focus on here.  For sufficiently thin samples we could see a sort of mottled contrast.  In the image below you can see the thinner part of the sample highlighted as Region 1 shows significantly more structure than the thicker Region 2.

How can we be quantitative about this structure?  One way is to look at the makeup of the spatial frequencies needed to construct the two image regions.  This map can be obtained by taking the magnitude of the Fourier Transform of the image, called the Power Spectrum (PS).  This can be done so that the center of the PS is an intensity called the DC Peak which is the zero frequency and gives a measure of average image intensity.  Further away from this center point corresponds to increasing frequency.  So low-frequency (read:  long range) contrast correlations are near the center and quickly changing intensities at short-range require high frequencies.  If you take the average intensity of a band at a given radius in the PS, this tells you something about the amount of those spatial frequencies required to construct the image.

Power Spectrum/Fourier Transform Intuition

To build intuition, lets think more about this relationship.  Imagine you have an image of a zebra-stripe vertical pattern where there’s 10,000 lines counted from the left to the right of the image.  If you were to make a profile plot of intensity similar to Fig. 1 above, you would see a very fast-changing signal that would look like a high frequency square wave.  If the number of lines were reduced to 10, you would see a similar plot but with a much lower frequency.  Similarly, the PS of the first case would show a bright peak far away from the center (indicating high frequency) and the second image would have a peak much nearer.  The angle of this point gives us information on the orientation of the corresponding spacings in the image.  More specifically, drawing a line in the PS from the center to the bright dot of interest is a line that is perpendicular to the spacing in the image.  So vertical lines would have points to the left and right of the center at angles 0 and 180 degrees.

How do we apply this to our problem?  We simply take a PS in the damaged region and another PS nearby as a baseline and compare the spatial frequencies.  If some particular band of frequencies is strong (indicated by a radial intensity average in a band of the PS) in the damaged area, that tells us something about the size scale of the texture.

Results

To show we were actually getting spatial frequencies as a result of the damaged texture, we compared PS from the regions marked above since the average intensity profiles are similar but in the case of region 2 the texture information is washed out due to the sample thickness.  (Note:  ion milling was used to thin the sample and the presence of the oxide near region 2 indicates it is less thinned than region 1, where the oxide has been milled away).  We normalized the PS intensities to match at the DC peak.

Observe that in the thinner region there’s a significant bump in intensity in the 40 – 80 pixel range.  This says something about our characteristic damage size.  2g measures in pixels can be converted to spatial frequencies if you know the camera constant, magnificantion…etc of your image.  In this case the 40-80 pixel 2g range corresponds to 90-180 Angstrom spacings.

Digital Darkfield

To complete the analysis of this damage layer we wanted to confirm that the regions of the image that were using these frequencies were indeed the regions we were interpreting as implantation damage.  This was achieved by applying a filter to the fourier transform to only select the frequency range of interest, and then inverse-transforming to see our filtered image that only allowed spatial frequencies we let pass.

Specifically, we made a doughnut shape where inside the doughnut we multiplied the FT of the image by 1 and outside (and in the middle) of the doughnut we multiplied by zero.  The width of the doughnut was defined by our 2g measures in the image above and was the 40-80 pixel distance from the center of the PS.

After this multiplication, only the frequencies within the doughnut remained and we then took the inverse transform.  There are so-called “ringing” effects due to this abrupt cut off in frequency, but the image below shows that the damaged region in the thin part of the sample lights up much more brightly than other regions.  It also shows that this texture is washed out as the thickness increases, as our intuition might suggest.

It’s important to note that this result is completely independent of the damaged area being dark compared to the rest of the sample.  This shows that the frequencies required to create the “damaged region” in the original image lie in a particular range and therefore give us information on its characteristic size scale.  The fact that this region lights up when we filter out all other spatial frequencies confirms this.

Acknowledgments

Phil Fraundorf at UMSL for teaching me how to do all of this stuff as well as providing the ideas/tools.

Jeff Libbert, Lu Fei, and MEMC/SunEdison Semiconductors for providing guidance, funding, and materials.

The NASA Space Grant Consortium and UMSL for providing funding and the opportunity to do undergraduate research!

Machine Learning Practice on Breast Cancer Data

I found an open data set on the Machine Learning Repository concerning Breast Cancer and thought I would use it as a way to practice machine learning approaches.  Briefly, the training data indicate whether the instance was Benign or Malignant and the values of a variety of features:  i.e. Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape…etc.

Please go to my GitHub Breast Cancer Repository for the data set and code.

Approach Outline

The data set had 9 features, two classifications and about 700 data points.  I split it so that the training of my model happened on the first 350 data points and the remainder were used for checking the accuracy.

Our prediction equation is a sigmoid and is interpreted as the probability that the tumor is malignant based on the input features.  The input to the sigmoid is a sum of each feature multiplied by a parameter value, i.e. $\vec{\theta}\cdot \vec{X}$ where $\vec{\theta}$ is the parameter vector $[\theta_0, \theta_1, \theta_2, ..., \theta_9]$ and $\vec{X}$ is the feature vector of a given data point i:  $[1, x_{1i}, x_{2i}, x_{3i}, ..., x_{9i}]$.  Explicitly, the input for a set of features $\vec{X}$ is:  $\theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_9x_9$ and the output is the probability of being malignant with the chosen model parameters ($\vec{\theta}$).  My program says if the output is greater than or equal to 0.5 (meaning the probability is greater than 50%), then the prediction is that the tumor is malignant.

The cost function is such that if the sigmoid predicts a 1 (meaning the probability the tumor is malignant is 100%) and the real classification is that it’s benign, then there is an infinite cost.  The same is true if it predicts a 0 (meaning the probability of being malignant is 0%) and the real classification is malignant.  The cost equation ends up being:  $J = -\frac{1}{m}\sum_{i=1}^m (y*log(h) + (1-y)*log(1-h))$ where m is the number of data points, h is the output of the prediction sigmoid and y is the classification of 0 or 1  (benign/malignant).  The idea is that if you’re 100% confident and end up being wrong, you have a very large cost.

The job is then to find the vector theta that minimizes this cost function.  The gradient descent method is one choice, but for the first iteration I tried a minimization function built into Octave ‘fminunc’ with 400 iterations.

Initial Results

The ‘fminunc’ returned a vector of parameters theta that minimized the cost function specified above.  I then used the sigmoid with these values for $\vec{\theta}$ to make predictions on the remaining ~350 data points.  So for each data point there are a set of features given by $\vec{X}$ that were dotted with my parameter vector and fed as the input to the sigmoid function.  If this number was greater than or equal to 0.5, it predicted the tumor was malignant.

The prediction accuracy ends up being 98%!

Next I used a Gradient Descent (GD) minimization algorithm instead of the built-in ‘fminunc’ feature to compare results.  The results when using GD were very sensitive to the learning rate parameter $\alpha$ .  If I used $\alpha = 0.1$ things remained pretty stable and gave reasonable accuracy, but took more iterations to achieve.  The accuracy ended up exceeding that of fminunc very slightly with the best value being 98.6% compared to fminun’s 98.2%.  Essentially it classified one more of the data points correctly.

For $\alpha = 0.5$ the transient behavior resulted became unstable as log functions began returning infinities.  The algorithm ignored these results and continued to minimize and the output would end up being similar to cases where this wasn’t an issue, but the descent was not a smooth march towards the minimum.  See the figures below for a comparison between the methods and an example of an unstable cost output as a function of the number of iterations.

What the Model Tells Us

Once we have parameters that give us high accuracy in prediction, what can we learn from it?  If you look at the sigmoid function it’s clear that when the input argument is positive, the answer tends towards 1 and when it’s negative it tends towards 0.  The more positive/negative the input is, the closer to 1/0 the output is.  So instead of writing out the sigmoid function, let’s just look at the input and see the conditions that make it positive or negative to garner some intuition.

The input is just $\vec{\theta}\cdot \vec{X} = \theta_0 + \theta_1x_1 + \theta_2x_2 + ... + \theta_9x_9$.  The machine learning algorithm outputs the following for the parameter vector:  $\vec{\theta} = [ -8.2, 0.52, -0.10, 0.42, 0.27, 0.11, 0.41, 0.12, 0.09, 0.31 ]$.  Recall that $\vec{X}$ is our vector of features (with $x_0 = 1$ so that we can have a $\theta_0 = const$ term).  The description of each feature is given below:

#  Attribute:  Domain

0. Constant:  1 for all
1. Clump Thickness:  1 – 10
2. Uniformity of Cell Size:  1 – 10
3. Uniformity of Cell Shape:  1 – 10
4. Marginal Adhesion:  1 – 10
5. Single Epithelial Cell Size:  1 – 10
6. Bare Nuclei:  1 – 10
7. Bland Chromatin:  1 – 10
8. Normal Nucleoli:  1 – 10
9. Mitoses:   1 – 10
Now, the first parameter, which merely multiplies the constant 1, is equal to -8.2.  This means that in the absence of any other feature information, it’s most likely that the tumor is benign, not malignant.  Good news!  The next highest parameter value is 0.52, which multiplies the Clump Thickness.  This feature seems to be the most strongly correlated with being malignant.  The third parameter value is less than zero, indicating the higher the Uniformity of Cell Shape, the more negative the argument becomes and therefore the more likely to be benign.

Next Steps

Next we’ll see how the prediction accuracy changes when using more a complicated feature set (e.g. including combinations of features like $x_1x_2x_4^3$).

Acknowledgments

Data came from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg.

Citations

1. O. L. Mangasarian and W. H. Wolberg: “Cancer diagnosis via linear
programming”, SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.

2. William H. Wolberg and O.L. Mangasarian: “Multisurface method of
pattern separation for medical diagnosis applied to breast cytology”,
Proceedings of the National Academy of Sciences, U.S.A., Volume 87,
December 1990, pp 9193-9196.

3. O. L. Mangasarian, R. Setiono, and W.H. Wolberg: “Pattern recognition
via linear programming: Theory and application to medical diagnosis”,
in: “Large-scale numerical optimization”, Thomas F. Coleman and Yuying
Li, editors, SIAM Publications, Philadelphia 1990, pp 22-30.

4. K. P. Bennett & O. L. Mangasarian: “Robust linear programming
discrimination of two linearly inseparable sets”, Optimization Methods
and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).

Synthesis and Characterization of Nanoporous Carbon

As part of a materials research group we were creating highly-ordered nanoporous carbon.  Not only is it a beautiful material to put in the Transmission Electron Microscope (TEM), it can have a very interesting effect on materials when they’re confined in the pores.

The Motivation

Our research group was interested in hydrogen storage materials.  Lithium Borohydride (LiBH4) was a material of promise in the literature that could be heated to release its hydrogen and cooled down in a hydrogen bath to absorb it.  Unfortunately, the temperatures and pressures required for this cycling make it an impractical choice for something like a fuel storage system in a hydrogen-powered car.  However, it was found by confining this material in the nano-pores of a carbon framework, the temperature of hydrogen release from LiBH4 could be greatly reduced.  Characterizing this behavior was our aim.  But first, we needed a way to create consistent nano-pores in a carbon framework!

Creating the Nanoporous Carbon

We ultimately wanted well controlled, hollow tubes that were very long and had a diameter of a few nanometers.  The technique/recipe was based on a paper by Meng, et. al. but can be described as “evaporation induced self-assembly”.  Essentially polymers were chosen that would undergo the self-assembly of different structures as a consequence of concentration gradients.  We chose a range that would result in a hexagonal structure of tubes.  You then bake the structure to “cross-like” and lock everything in place.  You then bake at a higher temperature (600C) to remove the hexagonally ordered polymer, leaving tubes and the matrix. Finally, bake to around 900C to remove all but the carbon atoms.  Voila!  Nanoporous carbon.

Characterization via TEM

This was an extremely tedious task, but one that came with great reward.  The goal was to find an image that was on-edge to show the hollow tubes propagating through the material and also a view looking down the tubes to show them arranged hexagonally.  I had a single-stage tilt and was imaging under bright field conditions.  Finding the lines was a much easier task, but it was still tricky to get a nice image because the only way to get nice contrast was to have several tubes stacked on top of one another.  Kind of like looking at a hand that’s tilted sideways so that the fingers all line up, I needed all the tubes to line up so there was a portion of the sample that would be much less dense than that which surrounded it and would therefore allow more electron penetration and hence a bright spot on the image.  See the left image below for the result.

Finding the “down the pipe” view was much, much harder.  Imagine you have tubes that are 2 inches wide and 20 foot long–if you’re looking down them even slightly off angle you won’t be able to see through the other side.  I needed to find something comparable that happened to land that way on the sample stage by chance, but was also thin enough that the structure could be imaged by transmitting electrons.  Further, to find new chunks to look at you must zoom out, re-focus, find something, zoom in, and set focus.  Far from trivial, especially when you’re turning all those knobs in the dark!  After many hours, I found the “goldilocks” piece and it was rather stunning.  Maybe I’m biased.

Results

After confirming we had the proper material several experiments were conducted to understand the kinetics of the reactions.  Most notably, the temperature for hydrogen release of LiBH4 was reduced from 460C to 220C and the emission of Diborane gas (toxic) as a reaction byproduct was eliminated.  This work resulted in two papers:  Journal of Physical Chemistry C and Chemistry of Materials.

Ellipse Detection

I enjoy learning new algorithms and am interested in computer vision.  It turned out that my company had a use for recognizing small circles in images and defect maps so I thought I’d use it as an opportunity to learn something.  You can find all the code and example data files you need at my GitHub repository on ellipse detection.

Background:

The standard way of finding the signature defect pattern was by human observation.  It’s obviously advantageous to automate this task both for rigorous statistics and for the sake of freeing up peoples’ time.  Further, more robustly understanding the timing and frequency of this defect might allow us to learn something about the root cause.

Method:

I first wanted to prototype a solution in the easiest case, which is when dealing with a defect map with a list of $(x,y)$ coordinates.  I found a paper by Xie, et. al. that reduces the task of finding an ellipse to that of fitting a single parameter by making a few assumptions about the points and using a voting scheme on a one-dimensional accumulator array.  This is probably overkill since my defects are largely circular, but I wanted to understand how they reduced the complexity of the problem and get practice incorporating algorithms from the literature.

The trick to reduce the problem from finding 5 parameters $\{x_0, y_0, \alpha, a, b\}$ (center point, orientation,  length of major/minor axes) down to one is to simply assume the two chosen points are the two endpoints of the major axis.  If this is the case then $\{x_0, y_0, a, \alpha \}$ can all be calculated and only b needs to be voted upon.

So you pick two points and calculate the 4 geometric terms.  Next, iterate over all other points and calculate a minor axis for each and then increment the vote for that choice of minor axis.  Once you’ve iterated over all other points for that pair, find the minor axis that was voted for the most.  If the number of votes is above your threshold, you’ve found an ellipse!  This act of voting in a parameter space is called a Hough transform.

But as they say, the devil’s in the details.  It’s up to you to put in rules about minimum distances to consider and the appropriate value for the vote threshold.

Results:

I found I had far too many points than I needed to get a reasonable fit, so I did some filtering.  The below image shows the original points (blue), the remaining points after filtering (red), and the final calculated ellipse (solid blue line).  Better accuracy can be obtained by changing the resolution of the bins used for grouping the axes, but I was more interested in stability than an accurate fit.

Method criticisms:

Although I was able to get it to work, I had to deviate somewhat from the paper.  For instance, I was not able to get good results by merely deleting points as soon as any type of ellipse was found.  The reason is that a constant value for the minimum vote would sometimes be too high and miss ellipses or would be too low and allow many nearby ellipses fitting the same points and I would sometimes exclude a possible good ellipse by removing points after a bad ellipse was “found”.  I ended up keeping all points in the defect list even if they led to a “found ellipse” because it’s possible that another ellipse using those same points may be a better fit.  I found it better to leave the points in, find all the suggested ellipses, and then choose the best ones by ruling out nearby alternatives with lower vote numbers.

Conclusion:

The prototype is successful in finding ellipses with the small file set I tested, but there is a lot of room for improvement in tuning the various parameters and rules to strike a balance between speed and accuracy.

1 2