INTRODUCTION
As part of its responsibilities, the B.C. Ministry of Environment monitors water quality in the
province’s streams, rivers, and lakes. This work often requires compiling statistics on concentrations of contaminants or other compounds.
The instruments used frequently cannot measure concentrations below certain values; observations that fall below such a value are called non-detects or less thans. Non-detects pose a difficulty when it is
necessary to compute summary statistics such as the mean, the median, and the standard
deviation for a data set, because the way they are handled can affect the quality of any statistics
generated.
Non-detects, or censored data, are found in many fields such as medicine, engineering, biology, and
environmetrics. In such fields, it is often the case that the measurements of interest are below
some threshold. Dealing with non-detects is a significant issue, and statistical tools based on survival
or reliability methods have been developed for this purpose.
There are three approaches for treating data containing censored values:
1. substitution, which gives poor results and is therefore not recommended in the literature;
2. maximum likelihood estimation, which requires an assumption of some distributional form; and
3. nonparametric methods, which assess the shape of the data based on observed percentiles rather than a strict distributional form.
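To make the concern about substitution concrete, the following is a minimal sketch in plain Python, used here only for arithmetic and not taken from the R or JMP workflows this document relies on. The concentrations, the detection limit of 1.0, and the function names are all invented for illustration. It replaces each non-detect with zero, half the detection limit, or the full detection limit, and recomputes the mean and standard deviation each time.

```python
from statistics import mean, stdev

# Hypothetical detection limit and concentrations (illustrative values only).
# None marks a non-detect, i.e. a value known only to be below the limit.
DETECTION_LIMIT = 1.0
observations = [None, None, None, 1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 5.8]


def substitute(values, fill):
    """Return a copy of `values` with every non-detect replaced by `fill`."""
    return [fill if v is None else v for v in values]


def summarize_substitutions(values, limit):
    """Compute (mean, sd) for three common substitution choices."""
    choices = (("zero", 0.0), ("half_limit", limit / 2), ("limit", limit))
    results = {}
    for label, fill in choices:
        filled = substitute(values, fill)
        results[label] = (mean(filled), stdev(filled))
    return results


summary = summarize_substitutions(observations, DETECTION_LIMIT)
for label, (m, s) in summary.items():
    print(f"{label:>10}: mean = {m:.3f}, sd = {s:.3f}")
```

With these made-up values, where 30% of the observations are censored, the mean moves from about 2.00 to 2.15 to 2.30 as the substituted value increases, and the standard deviation shrinks. The summary statistics therefore depend on an arbitrary choice rather than on the data alone.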
This document provides guidance on how to record censored data, and on when and how to use
certain analysis methods when the percentage of censored observations is less than 50%. The
methods presented in this document are:
1. substitution;
2. Kaplan-Meier, which is a nonparametric method;
3. a lognormal model based on maximum likelihood estimation; and
4. robust regression on order statistics, which is a semiparametric method.
Statistical software suitable for survival or reliability analysis is available for dealing with censored
data. This software has been widely used in medical and engineering environments. In this document, methods are illustrated with both the R and JMP software packages where possible. JMP often requires some intermediate steps to obtain summary statistics with most of the methods described in this document. R, with the NADA package, is usually straightforward. The NADA package was developed specifically for computing statistics with non-detects in environmental data, based on Helsel (2005b).
The data used to illustrate the methods described for computing summary statistics for non-detects
are either simulated or based on information acquired from the B.C. Ministry of Environment.
This document draws heavily on the book Nondetects and Data Analysis, written by Dennis R.
Helsel in 2005 (Helsel, 2005b).