Another View of Accuracy Testing
by John E. Leslie III

Copyright 1993 by John E. Leslie III
All Rights Reserved

Premise

In his recent article entitled "Accuracy Testing" (Precision Shooting, August 1993, p. 58), Peter Craig pointed out the problems associated with using three- or five-shot groups to try to determine the best load for a particular firearm. Mr. Craig demonstrated, using a computer simulation, that the laws of probability can cause a randomly chosen shot group fired with less consistent ammunition to be smaller than another random shot group fired with ammunition that was in fact more consistent. This anomaly is much more likely to occur in shot groups containing few shots than in shot groups containing a large number of shots. Mr. Craig showed that, as more shots are fired, the laws of probability catch up with the looser grouping ammunition and show it up for what it really is.

I have been doing research to try to identify the best statistical measure of shot group dispersion, or "tightness," and decided to recreate Mr. Craig's computer simulation and add the statistics which I have been researching.

My Simulation

My simulation, like Mr. Craig's, examined the success rate of the group size statistic at identifying the shot group fired by the tightest grouping load out of four possible choices. Each ammunition "load" was 20% less consistent than the previous load. This selection process was repeated 65,000 times to get an accurate representation of the statistics' success rate for groups containing each number of shots. In addition to group size, I tested four other statistics: the figure of merit, the diagonal, the mean radius, and the radial standard deviation. A graph showing all of these statistics' success rates for correctly determining the tightest grouping load is included as Figure 1. These percentages are not absolute numbers, as we will see later in this article.
Greater or lesser differences between the ammunition loads will change the success rates.

Group Size

Group size, also known as extreme spread, is the most widely used measure of shot group dispersion. It is defined as the maximum distance between any two shots within the group. In my opinion, there are several problems with using group size, most notably the measure's domination by the group's outliers. Outliers are shots which have a low probability of occurrence (otherwise they would not stand out so). Since group size measures the distance between the two most extreme shots, what it really measures is the spread between the shots in the group that are least likely to be repeated. Also, by using data from only two shots within the group, it ignores the data represented by the other (more likely to be repeated) shots. While group size outperformed all of the other measures for the three-shot groups, it was the worst statistic for all groups of four or more shots.

Figure of Merit

The figure of merit (FOM) is the average of the maximum horizontal group spread and the maximum vertical group spread. This measure uses data from at least two shots, but more likely four. Since more data points (shots) are used, the effect of an outlier gets diluted: it now has a 25% influence rather than 50% as with group size. In the simulation, the FOM proved superior to group size at correctly choosing the tightest grouping load for groups of four or more shots. I believe this is due to the use of twice as many data points.

Diagonal

The diagonal statistic uses inputs similar to the FOM. It is calculated by taking the square root of the sum of the maximum horizontal spread squared and the maximum vertical spread squared. The success rates for the diagonal were almost identical to those of the FOM; in fact, these two measures are shown on the same line on the graph in Figure 1. I believe these results reflect the similarity of their inputs.
All of the advantages mentioned above for the FOM also apply to the diagonal.

Mean Radius

The mean radius, as the name implies, is simply the average distance of all the shots in the group from the group center. This measure uses data from every shot, not just two or four shots. Here once again, additional information improved the usefulness of the statistic: the mean radius was a more reliable predictor of the tightest grouping load than either the group size or the FOM/diagonal statistics.

Radial Standard Deviation

The radial standard deviation (RSD) is similar to the standard deviations we are all familiar with, except that it is two-dimensional. It is calculated by taking the square root of the sum of the horizontal variance and the vertical variance. Like the mean radius, this statistic uses all of the available data points of the shot group. The RSD was the most accurate measure I examined for determining which group was from the tightest grouping load.

Different Sized Loads

Having established in the previous simulation that the RSD was superior at selecting the best load, I wanted to determine the effect of varying the size of the differences among the loads. My first simulation used four loads, each 20% less consistent than the previous one. I decided to run the simulation twice more: once using half of that difference between loads (the 10% difference loads) and once using twice the original difference (the 40% difference loads). Both simulations showed the same relative accuracy rankings among the statistics as my first simulation, but the amount of improvement of the RSD (and the other statistics) over group size varied greatly. My comparison of the RSD's accuracy relative to the group size's accuracy is shown in Figure 2. This study showed that the statistics' accuracy was quite sensitive to the amount of variation between the loads.
While the RSD was always more accurate than group size, its advantage shrank when it was faced with identifying the more obviously differing loads (40% differences). However, the RSD was dramatically better than group size at distinguishing among the loads that were more difficult to differentiate (10% differences). When the going got tough, the RSD clearly demonstrated its superiority.

Different Numbers of Loads

A final dimension of the RSD versus group size question that I examined was whether the statistics' ranking would be affected by distinguishing between two loads rather than the four loads used in the other simulations. The results of the two-load simulation were identical, in both relative ranking and magnitude, to the results of the four-load simulation.

Conclusion

This exercise has proven to me that the group size statistic, which we all put so much faith in, is only marginally adequate for the task. The radial standard deviation can distinguish between loads with fewer shots fired and with a higher degree of confidence.

The consequences of this finding are important for all shooters, not just reloaders. Position shooters can not only match their ammunition to their firearm more reliably using the RSD, but they can also use this statistic to judge changes in their position construction. If the RSD of their groups declined after the change, they would know that they should keep the modification. Shooters can also use this measure to evaluate alterations to their equipment: How much of an improvement did I get from fire lapping my barrel? Did my new stock really improve my accuracy? Is it worthwhile for me to separate my rimfire ammunition by rim thickness? Does it matter, from an accuracy point of view, how thoroughly I clean my firearm?

The major drawback to using the RSD is the hassle of calculating it. First you must determine the Cartesian (x and y axes) coordinates of all the shots in the group.
Then you must average all of the x values and all of the y values separately to find the coordinates of the group center. Next you calculate the variances of the group in the x and y directions. Finally, you add the two variances and find the square root of the total. After you have done this a few times you realize why group size is still so popular: it is so much easier to calculate!

Fortunately, the personal computer revolution comes to the rescue. There are several PC programs, including one which I wrote for IBM compatibles named ScorStat, which can help you do some or all of the necessary calculations. I would expect to see additional programs become available as this type of statistical analysis becomes more popular.
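
As a rough illustration of the calculation steps described above, the five statistics discussed in this article could be computed along the following lines in Python. This is only a sketch and is not the ScorStat program: the function names are made up for the example, each shot is assumed to be an (x, y) pair measured in the same units, and the variances are assumed to be sample variances (dividing by n - 1), since the article does not say which form was used.

```python
import math
from itertools import combinations


def group_center(shots):
    """Average the x values and the y values separately."""
    n = len(shots)
    return (sum(x for x, _ in shots) / n, sum(y for _, y in shots) / n)


def group_size(shots):
    """Extreme spread: the largest distance between any two shots."""
    return max(math.dist(a, b) for a, b in combinations(shots, 2))


def spreads(shots):
    """Maximum horizontal and maximum vertical spread of the group."""
    xs = [x for x, _ in shots]
    ys = [y for _, y in shots]
    return max(xs) - min(xs), max(ys) - min(ys)


def figure_of_merit(shots):
    """Average of the maximum horizontal and vertical spreads."""
    h, v = spreads(shots)
    return (h + v) / 2


def diagonal(shots):
    """Square root of (horizontal spread squared + vertical spread squared)."""
    h, v = spreads(shots)
    return math.hypot(h, v)


def mean_radius(shots):
    """Average distance of the shots from the group center."""
    center = group_center(shots)
    return sum(math.dist(s, center) for s in shots) / len(shots)


def radial_sd(shots):
    """Square root of the sum of the horizontal and vertical variances."""
    cx, cy = group_center(shots)
    n = len(shots)
    # Assumption: sample variance (n - 1 divisor); needs at least two shots.
    var_x = sum((x - cx) ** 2 for x, _ in shots) / (n - 1)
    var_y = sum((y - cy) ** 2 for _, y in shots) / (n - 1)
    return math.sqrt(var_x + var_y)


if __name__ == "__main__":
    # Hypothetical four-shot group at the corners of a 4 x 3 rectangle.
    group = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    for stat in (group_size, figure_of_merit, diagonal, mean_radius, radial_sd):
        print(f"{stat.__name__}: {stat(group):.3f}")
```

For the made-up group in the example (four shots at the corners of a 4 by 3 rectangle), group size and diagonal both come to 5, the FOM to 3.5, the mean radius to 2.5, and the RSD to about 2.89. If the variances were instead divided by n, the RSD for this group would be 2.5, so the choice of divisor matters when comparing figures from different programs.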