  Wilson, D. Randall's Publications (detailed list)

THIS PAGE IS NO LONGER MAINTAINED. A newer, more up-to-date publications list is available.


This page contains the titles and abstracts of papers written by D. Randall Wilson, a member of the BYU Neural Networks and Machine Learning (NNML) Research Group. PostScript files are available for most papers. A more concise list is also available.

The entire list is also available on a single page.


The General Inefficiency of Batch Training for Gradient Descent Learning

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Gradient descent training of neural networks can be done in either a batch or on-line manner. A widely held myth in the neural network community is that batch training is as fast or faster and/or more “correct” than on-line training because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training (often orders of magnitude slower), especially on large training sets. The main reason is the ability of on-line training to follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge with fewer iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.
  • Reference: Neural Networks, volume 16 (10), pages 1429–1451, Elsevier Science Ltd., Oxford, UK, 2003.
  • BibTeX
  • Download the file: ps, pdf
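
Since the papers on this page ship no code, a minimal toy sketch of the two update schedules the abstract contrasts may help. The least-squares model, the NumPy setup, and all numbers below are assumptions for illustration, not the paper's experiments: the batch version sums the gradient over the whole training set and takes one step per epoch, while the on-line version updates the weights after every instance.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))                   # toy training set
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=1000)     # noisy linear targets

    def batch_epoch(w, lr):
        # One weight update per epoch, from the gradient summed over all instances.
        grad = X.T @ (X @ w - y) / len(y)
        return w - lr * grad

    def online_epoch(w, lr):
        # One weight update per instance; the weights move throughout the epoch.
        for xi, yi in zip(X, y):
            w = w - lr * (xi @ w - yi) * xi
        return w

    w_batch = np.zeros(5)
    w_online = np.zeros(5)
    for _ in range(5):                               # same small number of epochs
        w_batch = batch_epoch(w_batch, lr=0.1)
        w_online = online_epoch(w_online, lr=0.1)

    print("batch   MSE:", float(np.mean((X @ w_batch - y) ** 2)))
    print("on-line MSE:", float(np.mean((X @ w_online - y) ** 2)))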

The Need for Small Learning Rates on Large Problems

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: In gradient descent learning algorithms such as error backpropagation, the learning rate parameter can have a significant effect on generalization accuracy. In particular, decreasing the learning rate below that which yields the fastest convergence can significantly improve generalization accuracy, especially on large, complex problems. The learning rate also directly affects training speed, but not necessarily in the way that many people expect. Many neural network practitioners currently attempt to use the largest learning rate that still allows for convergence, in order to improve training speed. However, a learning rate that is too large can be as slow as a learning rate that is too small, and a learning rate that is too large or too small can require orders of magnitude more training time than one that is in an appropriate range. This paper illustrates how the learning rate affects training speed and generalization accuracy, and thus gives guidelines on how to efficiently select a learning rate that maximizes generalization accuracy.
  • Reference: In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN’01), pages 115–119, 2001.
  • BibTeX
  • Download the file: pdf
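
As a companion sketch (again a toy least-squares problem in NumPy standing in for a real network, with every name and number below an assumption), this is the kind of learning-rate sweep the paper argues for: try rates on a log scale and record both the epochs needed to reach a target training error and the held-out error, instead of simply taking the largest rate that still converges.

    import numpy as np

    rng = np.random.default_rng(0)
    X_all = rng.normal(size=(400, 10))
    w_true = rng.normal(size=10)
    y_all = X_all @ w_true + 0.1 * rng.normal(size=400)
    X, y = X_all[:300], y_all[:300]                  # training split
    X_val, y_val = X_all[300:], y_all[300:]          # held-out split

    def epochs_to_target(lr, target=0.02, max_epochs=2000):
        w = np.zeros(X.shape[1])
        for epoch in range(1, max_epochs + 1):
            for xi, yi in zip(X, y):                 # on-line updates
                w -= lr * (xi @ w - yi) * xi
            if np.mean((X @ w - y) ** 2) < target:
                return epoch, float(np.mean((X_val @ w - y_val) ** 2))
        return None, float(np.mean((X_val @ w - y_val) ** 2))

    # Epochs to reach the target training error, and held-out error, per rate.
    for lr in (0.0003, 0.001, 0.003, 0.01, 0.03):
        print(lr, epochs_to_target(lr))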

The Inefficiency of Batch Training for Large Training Sets

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Multilayer perceptrons are often trained using error backpropagation (BP). BP training can be done in either a batch or continuous manner. Claims have frequently been made that batch training is faster and/or more “correct” than continuous training because it uses a better approximation of the true gradient for its weight updates. These claims are often supported by empirical evidence on very small data sets. These claims are untrue, however, for large training sets. This paper explains why batch training is much slower than continuous training for large training sets. Various levels of semi-batch training used on a 20,000-instance speech recognition task show a roughly linear increase in training time required with an increase in batch size.
  • Reference: In Proceedings of the International Joint Conference on Neural Networks (IJCNN2000), volume II, pages 113–117, July 2000.
  • BibTeX
  • Download the file: pdf, ps
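
The semi-batch schedule in the abstract reduces to a single batch-size parameter, sketched below on the same kind of toy least-squares problem as above (all of it illustrative, none of it the paper's setup): a batch size of 1 is continuous (on-line) training, a batch size equal to the training-set size is full batch training, and everything in between is semi-batch.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.1 * rng.normal(size=1000)

    def semi_batch_epoch(w, lr, batch_size):
        # Accumulate the gradient over `batch_size` instances, then update.
        for start in range(0, len(X), batch_size):
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            grad = xb.T @ (xb @ w - yb) / len(yb)
            w = w - lr * grad
        return w

    for batch_size in (1, 10, 100, 1000):            # 1 = on-line, 1000 = full batch
        w = np.zeros(5)
        for _ in range(5):
            w = semi_batch_epoch(w, lr=0.1, batch_size=batch_size)
        print(batch_size, round(float(np.mean((X @ w - y) ** 2)), 4))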

Reduction Techniques for Exemplar-Based Learning Algorithms

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called DROP1-5 and DEL that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.
  • Reference: Machine Learning, volume 38 (3), pages 257–286, March 2000.
  • BibTeX
  • Download the file: ps, pdf
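
To make the flavor of these reduction algorithms concrete, here is a crude decremental pruning sketch (NumPy, 1-NN, Euclidean distance, and the removal rule are all illustrative assumptions; the paper's DROP1-5 and DEL rules are more refined): an instance is dropped whenever removing it does not reduce how many of the remaining instances are classified correctly by their nearest neighbor.

    import numpy as np

    def loo_correct(X, y, subset):
        # How many instances are classified correctly by their nearest neighbor
        # drawn from `subset` (excluding the instance itself)?
        correct = 0
        for i in range(len(X)):
            cand = [j for j in subset if j != i]
            if not cand:
                continue
            d = np.linalg.norm(X[cand] - X[i], axis=1)
            correct += y[cand][np.argmin(d)] == y[i]
        return int(correct)

    def prune(X, y):
        subset = list(range(len(X)))
        for i in range(len(X)):
            without = [j for j in subset if j != i]
            # Drop instance i if the data is classified at least as well without it.
            if without and loo_correct(X, y, without) >= loo_correct(X, y, subset):
                subset = without
        return subset

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
    y = np.repeat([0, 1], 40)
    kept = prune(X, y)
    print(f"kept {len(kept)} of {len(X)} instances")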

An Integrated Instance-Based Learning Algorithm

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines these methods into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm (IDIBL) that seeks to reduce storage, improve execution speed, and increase generalization accuracy, when compared to the basic nearest neighbor algorithm and other learning models. IDIBL tunes its own parameters using a new measure of fitness that combines confidence and cross-validation (CVC) accuracy in order to avoid discretization problems with more traditional leave-one-out cross-validation (LCV). In our experiments IDIBL achieves higher generalization accuracy than other less comprehensive instance-based learning algorithms, while requiring less than one fourth the storage of the nearest neighbor algorithm and improving execution speed by a corresponding factor. In experiments on 21 datasets, IDIBL also achieves higher generalization accuracy than those reported for 16 major machine learning and neural network models.
  • Reference: Computational Intelligence, volume 16 (1), pages 1–28, 2000.
  • BibTeX
  • Download the file: ps, pdf

Combining Cross-Validation and Confidence to Measure Fitness

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Neural network and machine learning algorithms often have parameters that must be tuned for good performance on a particular task. Leave-one-out cross-validation (LCV) accuracy is often used to measure the fitness of a set of parameter values. However, small changes in parameters often have no effect on LCV accuracy. Many learning algorithms can measure the confidence of a classification decision, but often confidence alone is an inappropriate measure of fitness. This paper proposes a combined measure of Cross-Validation and Confidence (CVC) for obtaining a continuous measure of fitness for sets of parameters in learning algorithms. This paper also proposes the Refined Instance-Based (RIB) learning algorithm which illustrates the use of CVC in automated parameter tuning. Using CVC provides significant improvement in generalization accuracy on a collection of 31 classification tasks when compared to using LCV.
  • Reference: In Proceedings of the International Joint Conference on Neural Networks (IJCNN’99), paper 163, 1999.
  • BibTeX
  • Download the file: ps, pdf
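
A hedged sketch of a CVC-style fitness measure follows. The kNN-vote confidence and the exact blending formula below are illustrative choices rather than the paper's definitions; the point is only that the discrete leave-one-out signal is combined with a continuous confidence term, so small parameter changes that flip no LOO decision can still move the fitness.

    import numpy as np

    def knn_confidence(X, y, k, i):
        # Leave-one-out: predicted class of X[i] by a k-NN vote (excluding i),
        # plus the fraction of those k neighbors that share X[i]'s true class.
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        nn = np.argsort(d)[:k]
        votes = np.bincount(y[nn], minlength=int(y.max()) + 1)
        return votes[y[i]] / k, int(votes.argmax())

    def cvc_fitness(X, y, k):
        n = len(X)
        total = 0.0
        for i in range(n):
            conf, pred = knn_confidence(X, y, k, i)
            cv = 1.0 if pred == y[i] else 0.0
            # Weight the discrete LOO signal heavily; confidence breaks ties.
            total += (n * cv + conf) / (n + 1)
        return total / n

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2.5, 1, (30, 2))])
    y = np.repeat([0, 1], 30)
    for k in (1, 3, 5):
        print(k, round(cvc_fitness(X, y, k), 4))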

The Robustness of Relaxation Rates in Constraint Satisfaction Networks

  • Authors: D. Randall Wilson and Dan Ventura and Brian Moncur and Tony R. Martinez
  • Abstract: Constraint satisfaction networks contain nodes that receive weighted evidence from external sources and/or other nodes. A relaxation process allows the activation of nodes to affect neighboring nodes, which in turn can affect their neighbors, allowing information to travel through a network. When doing discrete updates (as in a software implementation of a relaxation network), a goal net or goal activation can be computed in response to the net input into a node, and a relaxation rate can then be used to determine how fast the node moves from its current value to its goal value. An open question was whether or not the relaxation rate is a sensitive parameter. This paper shows that the relaxation rate has almost no effect on how information flows through the network as long as it is small enough to avoid large discrete steps and/or oscillation.
  • Reference: In Proceedings of the International Joint Conference on Neural Networks (IJCNN’99), paper 162, 1999.
  • BibTeX
  • Download the file: pdf
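
The update rule the abstract describes is compact enough to sketch directly (the tiny symmetric network, the tanh squashing, and all numbers below are assumptions for illustration): each node computes a goal activation from its weighted net input, then moves a fraction `rate` of the way from its current activation toward that goal on each discrete step.

    import numpy as np

    def relax(W, external, rate, steps=2000):
        a = np.zeros(len(W))                  # current activations
        for _ in range(steps):
            net = W @ a + external            # weighted evidence from neighbors
            goal = np.tanh(net)               # goal activation for each node
            a += rate * (goal - a)            # move a fraction of the way there
        return a

    # A tiny made-up network: three mutually supporting nodes and one inhibitor.
    W = np.array([[ 0.0,  0.8,  0.8, -0.9],
                  [ 0.8,  0.0,  0.8, -0.9],
                  [ 0.8,  0.8,  0.0, -0.9],
                  [-0.9, -0.9, -0.9,  0.0]])
    external = np.array([0.5, 0.0, 0.0, 0.2])

    # Compare the settled activations for several relaxation rates.
    for rate in (0.01, 0.05, 0.2):
        print(rate, np.round(relax(W, external, rate), 3))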

Advances in Instance-Based Learning Algorithms

  • Authors: D. Randall Wilson
  • Abstract: The nearest neighbor algorithm and its derivatives, which are often referred to collectively as instance-based learning algorithms, have been successful on a variety of real-world applications. However, in its basic form, the nearest neighbor algorithm suffers from inadequate distance functions, large storage requirements, slow execution speed, a sensitivity to noise and irrelevant attributes, and an inability to adjust its decision surfaces after storing the data. This dissertation presents a collection of papers that seek to overcome each of these disadvantages. The most successful enhancements are combined into a comprehensive system called the Integrated Decremental Instance-Based Learning algorithm, which in experiments on 44 applications achieves higher generalization accuracy than other instance-based learning algorithms. It also yields higher generalization accuracy than that reported for 16 major machine learning and neural network models.
  • Reference: PhD thesis, Brigham Young University, Computer Science Department, August 1997.
  • BibTeX
  • Download the file: ps, pdf

Improved Center Point Selection for Probabilistic Neural Networks

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Probabilistic Neural Networks (PNN) typically learn more quickly than many neural network models and have had success on a variety of applications. However, in their basic form, they tend to have a large number of hidden nodes. One common solution to this problem is to keep only a randomly-selected subset of the original training data in building the network. This paper presents an algorithm called the Reduced Probabilistic Neural Network (RPNN) that seeks to choose a better-than-random subset of the available instances to use as center points of nodes in the network. The algorithm tends to retain non-noisy border points while removing nodes with instances in regions of the input space that are highly homogeneous. In experiments on 22 datasets, the RPNN had better average generalization accuracy than two other PNN models, while requiring an average of less than one-third the number of nodes.
  • Reference: In Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA’97), pages 514–517, 1997.
  • BibTeX
  • Download the file: ps, pdf
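
For context, a Probabilistic Neural Network is essentially a Parzen-window classifier over a set of stored center points, which is easy to sketch (NumPy, a Gaussian kernel, and the toy data below are assumptions). The random subset here is only a placeholder marking where the paper's better-than-random RPNN selection would plug in.

    import numpy as np

    def pnn_predict(centers_X, centers_y, x, sigma=0.5):
        # Sum a Gaussian kernel over each class's centers; pick the larger sum.
        d2 = np.sum((centers_X - x) ** 2, axis=1)
        k = np.exp(-d2 / (2 * sigma ** 2))
        classes = np.unique(centers_y)
        scores = [k[centers_y == c].sum() for c in classes]
        return classes[int(np.argmax(scores))]

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.repeat([0, 1], 100)

    subset = rng.choice(len(X), size=40, replace=False)   # placeholder selection
    test = np.array([[0.0, 0.0], [3.0, 3.0], [1.5, 1.5]])
    print([pnn_predict(X[subset], y[subset], t) for t in test])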

Instance Pruning Techniques

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
  • Reference: In Fisher, D., editor, Machine Learning: Proceedings of the Fourteenth International Conference (ICML’97), pages 403–411, San Francisco, CA, 1997. Morgan Kaufmann Publishers.
  • BibTeX
  • Download the file: ps, pdf

Bias and the Probability of Generalization

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: In order to be useful, a learning algorithm must be able to generalize well when faced with inputs not previously presented to the system. A bias is necessary for any generalization, and as shown by several researchers in recent years, no bias can lead to strictly better generalization than any other when summed over all possible functions or applications. This paper provides examples to illustrate this fact, but also explains how a bias or learning algorithm can be better than another in practice when the probability of the occurrence of functions is taken into account. It shows how domain knowledge and an understanding of the conditions under which each learning algorithm performs well can be used to increase the probability of accurate generalization, and identifies several of the conditions that should be considered when attempting to select an appropriate bias for a particular problem.
  • Reference: In Proceedings of the 1997 International Conference on Intelligent Information Systems (IIS’97), pages 108–114, 1997.
  • BibTeX
  • Download the file: ps, pdf

Improved Heterogeneous Distance Functions

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Nearest neighbor and instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), Interpolated Value Difference Metric (IVDM) and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher average classification accuracy than previous distance functions on applications with both nominal and continuous attributes.
  • Reference: Journal of Artificial Intelligence Research (JAIR), volume 6, pages 1–34, 1997.
  • BibTeX
  • Download the file: ps, pdf
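
A sketch of a heterogeneous distance in the spirit of the HVDM described above: nominal attributes contribute a value-difference term computed from per-class value frequencies, and continuous attributes contribute a difference normalized by (roughly) the attribute's spread. The exact normalizations in the paper differ in detail; everything below, including the toy data, is illustrative.

    import numpy as np

    def hvdm_factory(X, y, nominal):
        # Return a distance function over rows of X.
        # `nominal` is a boolean mask marking which columns are nominal.
        classes = np.unique(y)
        sigma = X.std(axis=0) + 1e-12

        # P(class c | attribute a has value v), for each nominal attribute.
        cond = {}
        for a in np.where(nominal)[0]:
            for v in np.unique(X[:, a]):
                mask = X[:, a] == v
                cond[(a, v)] = np.array([(y[mask] == c).mean() for c in classes])

        def dist(xi, xj):
            total = 0.0
            for a in range(X.shape[1]):
                if nominal[a]:
                    pi = cond.get((a, xi[a]), np.zeros(len(classes)))
                    pj = cond.get((a, xj[a]), np.zeros(len(classes)))
                    da = np.sqrt(np.sum((pi - pj) ** 2))
                else:
                    da = abs(xi[a] - xj[a]) / (4 * sigma[a])
                total += da ** 2
            return np.sqrt(total)

        return dist

    # Toy data: column 0 continuous, column 1 nominal (coded 0/1/2).
    X = np.array([[0.1, 0], [0.2, 0], [2.0, 1], [2.1, 2], [1.9, 1]], dtype=float)
    y = np.array([0, 0, 1, 1, 1])
    d = hvdm_factory(X, y, nominal=np.array([False, True]))
    print(round(d(X[0], X[2]), 3), round(d(X[2], X[3]), 3))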

Real-Valued Schemata Search Using Statistical Confidence

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Many neural network models must be trained by finding a set of real-valued weights that yield high accuracy on a training set. Other learning models require weights on input attributes that yield high leave-one-out classification accuracy in order to avoid problems associated with irrelevant attributes and high dimensionality. In addition, there are a variety of general problems for which a set of real values must be found which maximize some evaluation function. This paper presents an algorithm for doing a schemata search over a real-valued weight space to find a set of weights (or other real values) for a given evaluation function. The algorithm, called the Real-Valued Schemata Search (RVSS) uses the BRACE statistical technique [Moore & Lee, 1993] to determine when to narrow the search space. This paper details the RVSS approach and gives initial empirical results.
  • Reference: In Proceedings of the 1997 Sian Ka’an International Workshop on Neural Networks and Neurocontrol, 1997.
  • BibTeX
  • Download the file: ps, pdf

Instance-Based Learning with Genetically Derived Attribute Weights

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: This paper presents an inductive learning system called the Genetic Instance-Based Learning (GIBL) system. This system combines instance-based learning approaches with evolutionary computation in order to achieve high accuracy in the presence of irrelevant or redundant attributes. Evolutionary computation is used to find a set of attribute weights that yields a high estimate of classification accuracy. Results of experiments on 16 data sets are shown, and are compared with a non-weighted version of the instance-based learning system. The results indicate that the generalization accuracy of GIBL is somewhat higher than that of the non-weighted system on regular data, and is significantly higher on data with irrelevant or redundant attributes.
  • Reference: In Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks (AIE’96), pages 11–14, August 1996.
  • BibTeX
  • Download the file: ps, pdf
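
The core loop of this approach is small enough to sketch: search for attribute weights that maximize leave-one-out nearest-neighbor accuracy, so irrelevant attributes are driven toward zero weight. The simple mutate-and-keep loop below is only a stand-in for the paper's evolutionary algorithm, and the toy data with deliberately irrelevant attributes is an assumption.

    import numpy as np

    def loo_accuracy(X, y, w):
        # Leave-one-out 1-NN accuracy under attribute weights w.
        Xw = X * w
        correct = 0
        for i in range(len(X)):
            d = np.linalg.norm(Xw - Xw[i], axis=1)
            d[i] = np.inf
            correct += y[np.argmin(d)] == y[i]
        return correct / len(X)

    rng = np.random.default_rng(0)
    # Two informative attributes plus three irrelevant noise attributes.
    n = 120
    y = rng.integers(0, 2, n)
    X = np.column_stack([y + 0.3 * rng.normal(size=n),
                         y + 0.3 * rng.normal(size=n),
                         rng.normal(size=(n, 3))])

    w = np.ones(X.shape[1])
    best = loo_accuracy(X, y, w)
    for _ in range(200):
        cand = np.clip(w + 0.2 * rng.normal(size=len(w)), 0, None)  # mutate
        acc = loo_accuracy(X, y, cand)
        if acc >= best:                                             # keep if not worse
            w, best = cand, acc
    print(np.round(w, 2), float(best))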

Value Difference Metrics for Continuously Valued Attributes

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Nearest neighbor and instance-based learning techniques typically handle continuous and linear input values well, but often do not handle symbolic input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between symbolic attribute values, but it largely ignores continuous attributes, using discretization to map continuous values into symbolic values. This paper presents two heterogeneous distance metrics, called the Interpolated VDM (IVDM) and Windowed VDM (WVDM), that extend the Value Difference Metric to handle continuous attributes more appropriately. In experiments on 21 data sets, the new distance metrics achieve higher classification accuracy in most cases involving continuous attributes.
  • Reference: In Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks (AIE’96), pages 74–78, August 1996.
  • BibTeX
  • Download the file: ps, pdf

Heterogeneous Radial Basis Function Networks

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: Radial Basis Function (RBF) networks typically use a distance function designed for numeric attributes, such as Euclidean or city-block distance. This paper presents a heterogeneous distance function which is appropriate for applications with symbolic attributes, numeric attributes, or both. Empirical results on 30 data sets indicate that the heterogeneous distance metric yields significantly improved generalization accuracy over Euclidean distance in most cases involving symbolic attributes.
  • Reference: In Proceedings of the International Conference on Neural Networks (ICNN’96), volume 2, pages 1263–1267, June 1996.
  • BibTeX
  • Download the file: pdf, ps
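
A compact sketch of the idea in this paper: a radial basis function layer whose hidden units use a heterogeneous distance instead of plain Euclidean distance. The HEOM-like mix below (0/1 overlap for symbolic columns, range-normalized difference for numeric columns), the least-squares output layer, and the toy data are all illustrative assumptions rather than the paper's exact construction.

    import numpy as np

    def heom(xi, xj, nominal, ranges):
        d = np.where(nominal,
                     (xi != xj).astype(float),         # overlap for symbolic
                     np.abs(xi - xj) / ranges)         # normalized for numeric
        return np.sqrt(np.sum(d ** 2))

    def rbf_design(X, centers, nominal, ranges, width=1.0):
        # Gaussian activation of every hidden unit for every instance.
        return np.array([[np.exp(-(heom(x, c, nominal, ranges) / width) ** 2)
                          for c in centers] for x in X])

    # Toy data: column 0 numeric, column 1 symbolic (coded 0/1/2).
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.normal(size=30), rng.integers(0, 3, 30)]).astype(float)
    y = ((X[:, 0] > 0) & (X[:, 1] == 1)).astype(float)
    nominal = np.array([False, True])
    ranges = np.where(nominal, 1.0, X.max(axis=0) - X.min(axis=0) + 1e-12)

    centers = X[rng.choice(len(X), 10, replace=False)]  # hidden-unit centers
    G = rbf_design(X, centers, nominal, ranges)
    w, *_ = np.linalg.lstsq(G, y, rcond=None)           # output weights
    print(np.round(G @ w, 2)[:5], y[:5])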

Prototype Styles of Generalization

  • Authors: D. Randall Wilson
  • Abstract: A learning system can generalize from training set data in many ways. No single style of generalization is likely to solve all problems better than any other style, and different styles work better on some applications than others. Several generalization styles are proposed, including distance metrics, first-order features, voting schemes, and confidence levels. These generalization styles are efficient in terms of time and space and lend themselves to massively parallel architectures. Empirical results of using these generalization styles on several real-world applications are presented. These results indicate that the prototype styles of generalization presented can provide accurate generalization for many applications, and that having several styles of generalization available often permits one to be selected that works well for a particular application.
  • Reference: Master’s thesis, Brigham Young University, August 1994.
  • BibTeX
  • Download the file: pdf, ps

The Potential of Prototype Styles of Generalization

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: There are many ways for a learning system to generalize from training set data. This paper presents several generalization styles using prototypes in an attempt to provide accurate generalization on training set data for a wide variety of applications. These generalization styles are efficient in terms of time and space, and lend themselves well to massively parallel architectures. Empirical results of generalizing on several real-world applications are given, and these results indicate that the prototype styles of generalization presented have potential to provide accurate generalization for many applications.
  • Reference: In The Sixth Australian Joint Conference on Artificial Intelligence (AI ’93), pages 356–361, November 1993.
  • BibTeX
  • Download the file: ps, pdf

The Importance of Using Multiple Styles of Generalization

  • Authors: D. Randall Wilson and Tony R. Martinez
  • Abstract: There are many ways for a learning system to generalize from training set data. There is likely no one style of generalization which will solve all problems better than any other style, for different styles will work better on some applications than others. This paper presents several styles of generalization and uses them to suggest that a collection of such styles can provide more accurate generalization than any one style by itself. Empirical results of generalizing on several real-world applications are given, and comparisons are made on the generalization accuracy of each style of generalization. The empirical results support the hypothesis that using multiple generalization styles can improve generalization accuracy.
  • Reference: In Proceedings of the First New Zealand International Conference on Artificial Neural Networks and Expert Systems (ANNES), pages 54–57, November 1993.
  • BibTeX
  • Download the file: pdf, ps
