2k Table
2k Histogram
2k Raw
2k C Code
60k Table
60k Histogram
60k Raw
60k C Code

A Table of 60,000 Random Data Points
of an Ideal Bell Curve

by Oscar Falconi

This large table ( 60k Table ) contains 10 columns of 6,000 numbers each, for a total of 60,000 numbers. Each number represents the deviation, in sigmas, of a data point from the mean (zero) of a standard Gaussian curve (also referred to as an error curve, normal curve, or bell curve). Thus, we have 60,000 random numbers, with sigmas from -3.57 to +3.57. The beauty of this table is that, in a histogram ( 60k Histogram ) of all 60,000 data points, the length of each of the 715 bars (also known as "bins") differs from the smooth Gaussian curve by less than 1/3 of a data point on average - which never happens with machine-generated random Gaussian numbers.

In the histogram of our "Perfect Table", for instance, there are exactly 240 data points in the bin representing sigma zero (between -0.005 and +0.005). Similarly, there are 120 data points in the bin where the ordinate of the error curve has halved (at sigma 1.17741). There's only one data point at sigma 3.57. The total number of data points in all 715 bins is exactly 60,000, with an equal number of positive and negative sigmas. Taken over the complete range of deviations of the bell curve, from -3.57 to +3.57, the mean of all 60,000 sigmas is exactly zero and the standard deviation (SD) is one to within about 7 parts in a billion (see The Results). The standard deviation and mean of any Perfect Table remain 1 and zero no matter how many data points the table may contain. Because the mean is zero, the SD is merely the RMS (Root Mean Square) of all 60,000 data points.
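The bin contents quoted above can be checked against the ideal curve with a short calculation. This is only a sketch: it approximates each ideal bin content as (number of points) x (bin width) x (Gaussian density at the bin centre), and the names N, BIN_WIDTH, and expected_count are illustrative assumptions, not part of the table's own C code.

```python
import math

N = 60_000        # data points in the large table
BIN_WIDTH = 0.01  # each bin spans sigma - 0.005 to sigma + 0.005

def expected_count(sigma):
    """Approximate ideal bin content: N * bin width * Gaussian density at sigma."""
    density = math.exp(-sigma * sigma / 2) / math.sqrt(2 * math.pi)
    return N * BIN_WIDTH * density

# Sigma at which the ordinate of the curve has fallen to half its peak.
half_height_sigma = math.sqrt(2 * math.log(2))

for sigma in (0.0, half_height_sigma, 3.57):
    print(f"sigma {sigma:.5f}: about {expected_count(sigma):.2f} points")
```

The unrounded values near sigma zero and at the half-height point land close to the 240 and 120 quoted above. At sigma 3.57 the unrounded value is below one point; how the table assigns its outermost points is not something this sketch models.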

Another perfect table of only 2000 data points is included that can occasionally be useful or instructive. In the box to the right, the data of both sets are tabulated in several forms for your own use and downloading convenience.

Use of the Perfect Table - With 2 Examples

In making most measurements, exact answers are not possible. An exact answer can be more closely approached, and accuracy increased, as more readings are taken. These readings generally differ in a random manner and thus must distribute themselves into the familiar Gaussian error curve, discussed in the previous section.

Whatever the causes of measurement errors, there is no reason to believe these causes will lead to errors that are not Gaussian. Each cause introduces random errors and has its own error curve and standard deviation (SD), say S1. The other causes of random errors also have their very own standard deviations, S2, S3, etc. To find S, the standard deviation of the error curve from all causes, one simply convolves all the individual error curves. For independent causes the result is another error curve (a hallmark of this curve and a reason for its importance), and S equals the square root of the sum of S1 squared, S2 squared, S3 squared, etc. Thus if we have 4 independent causes of errors, each with its own error curve, with standard deviations of 0.2, 0.4, 0.5, and 0.6, then S = 0.9. It's gratifying to know it'll always be an error curve we'll be working with when dealing with small measuring errors.
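The root-sum-square rule above is easy to check numerically. A minimal sketch, assuming the causes are independent; the function name combined_sd is illustrative only.

```python
import math

def combined_sd(sds):
    """Root-sum-square of the standard deviations of independent error causes."""
    return math.sqrt(sum(s * s for s in sds))

# The four causes from the text: 0.04 + 0.16 + 0.25 + 0.36 = 0.81, whose root is 0.9.
print(round(combined_sd([0.2, 0.4, 0.5, 0.6]), 6))  # 0.9
```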

The First Example

Now suppose we've spent a great amount of time, money, and effort determining 3 important measurements, say 10, 11, and 15. How should we treat these results? Average them? Choose the middle reading? Average the outliers? These 3 expensive measurements were born of an error curve whose mean and spread are unknown, but which we are trying to determine as best we can. Since the mean lies between the outliers 75% of the time, perhaps we should average the outliers, 10 and 15, resulting in 12.5. (There's a 25% chance that all three data points fall in the same half of the error curve.) Most people, though, trying to make full use of all three expensive measurements, would intuitively average all three numbers, arriving at 12. Many others, skeptical of the number 15, suspect that it is a specious result, arising from a glitch, an accident, a misreading, a typo, or a cerebral event, and that it shouldn't be accepted as a proper statistical result. These folks, therefore, would be quite happy to eliminate both outliers and choose 11, the middle reading.
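The three candidate answers, and the 75% figure, can be reproduced in a few lines. This is a sketch only; the variable names are illustrative, and the probability count assumes a symmetric error curve, so that each reading independently falls on either side of the true mean with probability 1/2.

```python
from itertools import product
from statistics import mean, median

readings = [10, 11, 15]

average_of_all = mean(readings)                          # 12
middle_reading = median(readings)                        # 11
outlier_average = (min(readings) + max(readings)) / 2    # 12.5

# Each reading lies below (-1) or above (+1) the true mean: 8 equally likely patterns.
patterns = list(product((-1, +1), repeat=3))
# The mean lies between the outliers whenever both signs occur in the pattern.
straddling = [p for p in patterns if len(set(p)) == 2]
p_mean_between_outliers = len(straddling) / len(patterns)  # 6/8 = 0.75

print(average_of_all, middle_reading, outlier_average, p_mean_between_outliers)
```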

Enter, now, our "Perfect Table". Knowing that the mean of our 60,000 data points is exactly zero allows us to test which of the three answers is best. Of course we can't test "specious" results, but we'll discuss them later. One question arises: If we don't know the mean or spread of the error curve of our expensive measurements, how can we perform meaningful tests? The answer is that the random numbers chosen from our table come from a set of numbers faithfully representing the area under any error curve, just as the 3 expensive numbers in the example above came from their very own error curve. The chance of having measured a number such as 15, probably an outlier near the tail of its error curve, is the same as the chance of choosing a similar number, near a tail, from our Perfect Table. Learning how to handle 3 random data points from our Perfect Table will teach us how best to combine the 3 expensive measurements from their own error curve. Thus the mean and spread of the three measurements' own error curve are unimportant, since we'll know exactly how to cope with 3 data points.

We have taken 300,000 sets of 3 consecutive numbers from our table (20,000 sets from each of 15 different shuffles) and have found that averaging the 3 readings is about 14% more accurate than choosing the middle reading (the RMS of the 300,000 average-of-3 readings was 14% less than the RMS of the 300,000 middle numbers). Having said that, however, shouldn't we ask if a mere 14% greater accuracy is worth the risk of having an outlier introduce a gross error into the final result? Suppose the 3 measurements were made by 3 different groups in 3 different countries; what say you now? In addition, simply averaging 3 data points is beyond the capability of the many folks who cannot achieve 100.00% perfection in grammar school arithmetic. Another inaccuracy was avoided by using the Perfect Table, thus eliminating the random errors introduced by digitally generating an error curve with nearly all the bins underfilled or overfilled. The Perfect Table has been checked and rechecked for any imperfections and cannot introduce man-made errors.

What about averaging the outliers? I subconsciously rejected this number which usually (75%) emanates from data on both sides of, and often well-removed from, the correct answer. But, for completeness, and to satisfy my curiosity as to the expected accuracy of averaging the outliers, the Perfect Table was once more invoked. Again, using those 300,000 sets-of-three data points, it was found that, surprise of surprises, averaging all 3 was only about 4% more accurate than simply averaging the 2 outliers. This result was completely unexpected.

After more thought, it was clear that accuracy increased as more data was included. Averaging all 3 measurements in a set-of-3 was, on average, the most accurate. Averaging the 2 outliers was less accurate. But relying on just one measurement, even though it was the middle one, intuitively the most accurate, proved, in the end, least accurate. Here, accuracy was decreased by excluding the information contained in the outliers.

So, should we choose the middle number, or average the 2 outliers, or choose the average of all 3 measurements? What we SHALL do, shouldn't necessarily be determined by what our numbers say we SHOULD do. Final decisions should always pass the test of human judgement, such as critiquing the design of the study, how the data was gathered and processed, how the results were interpreted, and so forth. In addition, one should always feel comfortable with any results that were rejected because they were perceived as clearly not part of the error curve. They should be shown to be the result of an arithmetic mistake, or the result of incorrectly copying numbers into a database, or to clearly be the result of a gross human error, and NOT be the result of the rare, but occasional, high-sigma reading that should not be rejected - since it's a valid result that actually does carry useful information.

As I had previously concluded, the very real fear of having included gross human errors, likely resulting in suspicious outliers, caused me to accept a 14% accuracy-decrease, which then allowed me to choose just the single middle measurement - thus:

* eliminating the two outliers as potential sources of error,
* eliminating the possibility of miscalculating an average, and
* eliminating the need to decide if an outlier should be rejected or not.

Similar considerations also caused me to eliminate averaging the two outliers.

In conclusion, I often take the middle measurement. It's already there, and suspicious outliers can be completely ignored.

The Second Example

Blood pressure measurements, which seem to have a mind of their own, can vary over large ranges for many reasons, even if taken on the same patient with the same instrument within several minutes. A single measurement, therefore, is unreliable, and could easily be a specious outlier. If 3 readings are taken, I suggest the middle of the three diastolic numbers, and the middle of the three systolic numbers (that may arise from different readings) form the final result. This result appears to be the most efficient and accurate method to use in the real world where life and death decisions must not be influenced by a specious measurement.
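As a sketch of this suggestion, the middle systolic and middle diastolic values can be picked independently of one another. The three readings below are hypothetical numbers chosen for illustration, not data from this study.

```python
from statistics import median

# Hypothetical (systolic, diastolic) readings in mmHg, for illustration only.
readings = [(128, 84), (141, 79), (133, 90)]

# Take the middle value of each component separately.
systolic = median(s for s, _ in readings)
diastolic = median(d for _, d in readings)

print(f"{systolic}/{diastolic}")  # 133/84
```

Here the middle systolic value comes from the third reading and the middle diastolic value from the first, which is the "different readings" case mentioned above.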

The Method

How have we used the Perfect Table to compare the different methods of combining 3 readings? The Perfect Table provides 60,000 random data points such that they form a Gaussian error "curve" with the correct number of data points in each of 715 bins of a histogram (found at: 60k Histogram). The standard deviation is exactly 1, and the mean is exactly zero. Each of the 60,000 data points can be considered a reading, any 3 of which can constitute the only 3 readings we are given in order to determine the best answer. Of course, we already know the best answer: it's zero, the mean of the Perfect Table - but it takes all 60,000 data points to get close to this answer, and we're only allowed three data points.

What we have done, to determine the best method, is to take all 20,000 consecutive sets-of-3 in our table of 60,000, and compare the results of: (1) averaging all three readings; (2) averaging the two outliers; and (3) choosing only the middle reading. By finding the SD of each method, we'll know how to treat those three precious readings that are difficult or impossible to repeat. The smaller the SD, the more accurate the method.

After extracting all we can from our Perfect Table, we are allowed to shuffle the same 60,000 numbers and obtain a "New Perfect Table". From our original table we could have chosen different sets, such as horizontal sets-of-3 instead of vertical sets-of-3, but some unknown subtle relationships may have affected randomness, which shuffling easily avoids. So shuffling the same numbers allows us to extend and improve the results of the original table, since we'll have completely different sets-of-3 from which to obtain slightly different RMS's for the three measuring methods under study, thus giving us more RMS's to combine with prior RMS's, and so improving the accuracy. Shuffling our original table is indeed a valid procedure, improving our final results - and possibly even our conclusions.
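The procedure just described can be sketched as follows. This is not the author's C code, and it does not use the actual Perfect Table: as a stand-in (an assumption), it builds 60,000 evenly spaced quantiles of the standard Gaussian, which gives a similarly smooth histogram but a wider range than -3.57 to +3.57. The names stand_in_table, rms, and compare_methods are illustrative, and pooling the errors from all shuffles before taking a single RMS is one plausible way of combining them, not necessarily the one used here.

```python
import math
import random
from statistics import NormalDist

N = 60_000

def stand_in_table(n=N):
    """Stand-in for the Perfect Table (assumption): n evenly spaced Gaussian quantiles."""
    gaussian = NormalDist()
    return [gaussian.inv_cdf((i + 0.5) / n) for i in range(n)]

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def compare_methods(table, shuffles=15, seed=1):
    """RMS error of each method over every consecutive set-of-3 of every shuffle."""
    rng = random.Random(seed)
    data = list(table)
    errors = {"average of all 3": [], "average of 2 outliers": [], "middle reading": []}
    for _ in range(shuffles):
        rng.shuffle(data)  # same numbers, new sets-of-3
        for i in range(0, len(data) - 2, 3):
            low, mid, high = sorted(data[i:i + 3])
            errors["average of all 3"].append((low + mid + high) / 3)
            errors["average of 2 outliers"].append((low + high) / 2)
            errors["middle reading"].append(mid)
    # The true answer is zero, so each estimate is its own error.
    return {name: rms(errs) for name, errs in errors.items()}

if __name__ == "__main__":
    for name, value in compare_methods(stand_in_table()).items():
        print(f"{name}: RMS = {value:.4f}")
```

With the stand-in table the three RMS values should come out in the same order as reported in The Results, though the last digits will differ from the published figures because the table and the shuffles are not the same.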

SHUFFLING VERSUS RE-SAMPLING: "Re-sampling" is a procedure purported to improve accuracy by repeating (or adding) some pre-used data, and removing other perfectly good data. In our view the technique is illogical and serves no purpose here: any results would be artificial, merely mathematical distortions, tantamount to falsifying data. Shuffling, however, uses exactly the same numbers, no more, no less. No new data is added, nor is original data removed - yet shuffling allows us to re-use our Perfect Table a vast number of times, improving our results and letting us determine with greater accuracy and confidence how the 3 measuring methods compare. Consider how the same 52 playing cards can deal a lifetime of different hands.
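The mechanical difference between the two operations can be shown on a tiny example. The five-number table below is a made-up stand-in, and drawing with replacement is used here as an assumed reading of what "re-sampling" means in the paragraph above.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

table = [-1.2, -0.4, 0.0, 0.4, 1.2]  # tiny stand-in table, illustration only

shuffled = table[:]
rng.shuffle(shuffled)  # same numbers, new order

resampled = rng.choices(table, k=len(table))  # drawn with replacement

print(sorted(shuffled) == sorted(table))   # True: nothing added, nothing removed
print(sorted(resampled) == sorted(table))  # depends on the draw; repeats and omissions are possible
```

Because a shuffle keeps exactly the same numbers, the mean and RMS of the whole table are unchanged by it; only the grouping into sets-of-3 changes.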

The Results

Our results, however, are still limited by the imperfect match of a Gaussian curve with only 60,000 data points and only 715 bins. If we square each of the 60,000 data points, add the squares, divide by 60,000, and take the square root of the answer, we arrive at the Root-Mean-Square (RMS) of all 60,000 numbers. The RMS, which equals the SD because the mean is zero, should theoretically equal 1, but because of the imperfect match of integral bin contents with the smooth error curve, we actually get SD = 0.9989 - which remains the same even if we reshuffle the 60,000 numbers. But by judiciously adjusting just 52 of the 715 bins by just one data point, we have achieved an RMS of 1 to within about 7 parts in a billion (actually: 0.99999999333...). This RMS is simply the SD of taking only one reading to arrive at the correct answer, zero. Taking all 60,000 numbers would bring the SD down from 1 to 1 divided by the square root of 60,000, or 0.00408. Since we have only one set of 3 data points, averaging them would decrease the SD from 1 to 1/(root 3), or 0.57735. If we used all 20,000 sets-of-3, the final SD would be 0.57735 divided by root 20,000, also 0.00408, as we found just above. Thus averaging all 20,000 sets-of-3 (60,000 total data points) gives the same accuracy as processing all 60,000 as single data points.
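The RMS recipe and the square-root arithmetic in this paragraph can be written out directly. A minimal sketch; the function and variable names are illustrative, and the SD of a single reading is taken as exactly 1.

```python
import math

def rms(values):
    """Square each value, average the squares, take the square root."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))

sd_single = 1.0                                     # one reading from the table
sd_all_points = sd_single / math.sqrt(60_000)       # all 60,000 readings averaged
sd_one_set_of_3 = sd_single / math.sqrt(3)          # one set-of-3 averaged
sd_all_sets = sd_one_set_of_3 / math.sqrt(20_000)   # all 20,000 sets-of-3 averaged

print(f"{sd_all_points:.5f} {sd_one_set_of_3:.5f} {sd_all_sets:.5f}")
# 0.00408 0.57735 0.00408
```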

We have found that averaging any one set of 3 readings is 14% more accurate (SD is 14% less) than using only the middle number of the three, and that averaging the set-of-3 is 4% more accurate than averaging the 2 outliers.

These percentages were found by applying all three methods to 300,000 sets-of-3 from 15 shuffles, each shuffle creating 20,000 brand new sets-of-3 from the same 60,000 data points. The final SD (equal to the RMS) of each measuring method is:

SD of Average of all 3 Readings: 0.576700 (In Theory, the Most Accurate)
SD of Average of the 2 Outliers: 0.600183 (0.576700 is 3.91% smaller than this)
SD of 300,000 Middle Readings: 0.670496 (0.576700 is 13.99% smaller than this)
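The percentages can be recovered from the three SDs listed above. The sketch below assumes each percentage measures how much smaller the average-of-3 SD is, relative to the SD of the other method; the name percent_smaller is illustrative.

```python
sd_average_of_3 = 0.576700
sd_outlier_average = 0.600183
sd_middle_reading = 0.670496

def percent_smaller(reference, other):
    """How much smaller `reference` is than `other`, as a percentage of `other`."""
    return 100 * (1 - reference / other)

print(f"{percent_smaller(sd_average_of_3, sd_outlier_average):.2f}%")  # 3.91%
print(f"{percent_smaller(sd_average_of_3, sd_middle_reading):.2f}%")   # 13.99%
```

Measured the other way round, as a fraction of the average-of-3 SD, the outlier average is about 4% larger and the middle reading about 16% larger.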

In conclusion, if you're absolutely certain all three readings of your set are valid, including both outliers, and are happily accepted as 3 good results, then, by all means, the readings should be averaged, and the result recorded. However, if there is any hesitation about the validity of either outlier, especially after reviewing the details of how the result was obtained, then accepting the middle number should seriously be considered.



Numbers and Text by: Oscar Falconi
Graph and Charts by: Mall-Net web services.
Computer Number-Crunching by Pete Williamson
© 2006 Oscar Falconi, Saratoga, California
Ignore © if " www.nutri.com/random/ " is credited.