Publications

(Newer publications are shown first. The numbering may change each time a new publication is added.)

[1] Smith, Michael R. and Martinez, Tony R. and Giraud-Carrier, Christophe, An Instance Level Analysis of Data Complexity, Machine Learning, 95, 2, 225--256, 2014. [Abstract] [Bibtex]
Most data complexity studies have focused on characterizing the complexity of the entire data set and do not provide information about individual instances. Knowing which instances are misclassified and understanding why they are misclassified and how they contribute to data set complexity can improve the learning process and could guide the future development of learning algorithms and data analysis methods.
The goal of this paper is to better understand the data used in machine learning problems by identifying and analyzing the instances that are frequently misclassified by learning algorithms that have shown utility to date and are commonly used in practice. We identify instances that are hard to classify correctly (instance hardness) by classifying over 190,000 instances from 64 data sets with 9 learning algorithms. We then use a set of hardness measures to understand why some instances are harder to classify correctly than others. We find that class overlap is a principal contributor to instance hardness. We seek to integrate this information into the training process to alleviate the effects of class overlap and present ways that instance hardness can be used to improve learning.
@article{smith.ml2013,
title = {An Instance Level Analysis of Data Complexity},
author = {Smith, Michael R. and Martinez, Tony R. and Giraud-Carrier, Christophe},
journal = {Machine Learning},
pages = {225--256},
year = {2014},
number = {2},
volume = {95},
}
[2] Gashler, Michael S. and Smith, Michael R. and Morris, Richard and Martinez, Tony, Missing Value Imputation With Unsupervised Backpropagation, Computational Intelligence, Accepted, 2014. [Abstract] [Bibtex]
Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) the missing values. In this paper, we present a technique for unsupervised learning called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit to the manifold sampled by a set of observed point-vectors. We evaluate UBP with the task of imputing missing values in datasets, and show that UBP is able to predict missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate with 24 datasets and 9 supervised learning algorithms that classification accuracy is usually higher when randomly-withheld values are imputed using UBP, rather than with other methods.

@article{Gashler2014,
    author = {Gashler, Michael S. and Smith, Michael R. and Morris, Richard and Martinez, Tony},
    title = {Missing Value Imputation With Unsupervised Backpropagation},
    journal = {Computational Intelligence},
    pages = {Accepted},
    year = {2014},
}
[3] Derrall Heath and Dan Ventura, Improving Multi-label Classification by Avoiding Implicit Negativity with Incomplete Data, Computational Intelligence, 30, 3, 535--561, 2014. [Abstract] [Bibtex]
Many real world problems require multi-label classification, in which each training instance is associated with a set of labels. There are many existing learning algorithms for multi-label classification; however, these algorithms assume implicit negativity, where missing labels in the training data are automatically assumed to be negative. Additionally, many of the existing algorithms do not handle incremental learning in which new labels could be encountered later in the learning process. A novel multi-label adaptation of the backpropagation algorithm is proposed that does not assume implicit negativity. In addition, this algorithm can, using a naive Bayesian approach, infer missing labels in the training data. This algorithm can also be trained incrementally as it dynamically considers new labels. This solution is compared with existing multi-label algorithms using data sets from multiple domains and the performance is measured with standard multi-label evaluation metrics. It is shown that our algorithm improves classification performance for all metrics by an overall average of 7.4% when at least 40% of the labels are missing from the training data, and improves by 18.4% when at least 90% of the labels are missing.
@article{heath2014ci,
    author = {Derrall Heath and Dan Ventura},
    title = {Improving Multi-label Classification by Avoiding Implicit Negativity with Incomplete Data},
    journal = {Computational Intelligence},
volume = {30},
number = {3},
    pages = {535--561},
    year = {2014},
}
[4] Smith, Michael R. and Martinez, Tony, A Comparative Evaluation of Curriculum Learning with Filtering and Boosting in Supervised Classification Problems, Computational Intelligence, accepted, 2014, This is the pre-peer reviewed version of the article.. [Abstract] [Bibtex]
Not all instances in a data set are equally beneficial for inferring a model of the data. Some instances (such as outliers) are detrimental to inferring a model of the data. Several machine learning techniques treat instances in a data set differently during training such as curriculum learning, filtering, and boosting. However, an automated method for determining how beneficial an instance is for inferring a model of the data does not exist. In this paper, we present an automated method that orders the instances in a data set by complexity based on the their likelihood of being misclassified (instance hardness). The underlying assumption of this method is that instances with a high likelihood of being misclassified represent more complex concepts in a data set. Ordering the instances in a data set allows a learning algorithm to focus on the most beneficial instances and ignore the detrimental ones. We compare ordering the instances in a data set in curriculum learning, filtering and boosting. We find that ordering the instances significantly increases classification accuracy and that filtering has the largest impact on classification accuracy. On a set of 52 data sets, ordering the instances increases the average accuracy from 81% to 84%.
@article{Smith2014_CL,
author = {Smith, Michael R. and Martinez, Tony},
title = {A Comparative Evaluation of Curriculum Learning with Filtering and Boosting in Supervised Classification Problems},
journal = {Computational Intelligence},
year = {2014},
pages = {accepted},
note = {This is the pre-peer reviewed version of the article.}
}
[5] Rob Smith and Andrew D. Mathis and Dan Ventura and John T. Prince, Proteomics, lipidomics, metabalomics: a mass spectrometry tutorial from a computer scientist's point of view, BMC Bioinformatics, 15, Suppl 7, S9, 2014. [Abstract] [Bibtex]
Background: For decades, mass spectrometry data has been analyzed to investigate a wide array of research interests, including disease diagnostics, biological and chemical theory, genomics, and drug development. Progress towards solving any of these disparate problems depends upon overcoming the common challenge of interpreting the large data sets generated. Despite interim successes, many data interpretation problems in mass spectrometry are still challenging. Further, though these challenges are inherently interdisciplinary in nature, the significant domain-specific knowledge gap between disciplines makes interdisciplinary contributions difficult.

Results: This paper provides an introduction to the burgeoning field of computational mass spectrometry. We illustrate key concepts, vocabulary, and open problems in MS-omics, as well as provide invaluable resources such as open data sets and key search terms and references.

Conclusions: This paper will facilitate contributions from mathematicians, computer scientists, and statisticians to MS-omics that will fundamentally improve results over existing approaches and inform novel algorithmic solutions to open problems.
@article{smith2014biot,
author = {Rob Smith and Andrew D. Mathis and Dan Ventura and John T. Prince},
title = {Proteomics, lipidomics, metabalomics: a mass spectrometry tutorial from a computer scientist's point of view},
journal = {BMC Bioinformatics},
volume = {15},
number = {Suppl 7},
pages = {S9},
year = {2014},
}
[6] Derrall Heath and David Norton and Eric Ringger and Dan Ventura, Semantic Models as a Combination of Free Association Norms and Corpus-based Correlations, Proceedings of the IEEE International Conference on Semantic Computing, 48--55, 2013. [Abstract] [Bibtex]
We present computational models capable of under- standing and conveying concepts based on word asso- ciations. We discover word associations automatically using corpus-based semantic models with Wikipedia as the corpus. The best model effectively combines corpus-based models with preexisting databases of free association norms gathered from human volun- teers. We use this model to play human-directed and computer-directed word guessing games (games with a purpose similar to Catch Phrase or Taboo) and show that this model can measurably convey and understand some aspect of word meaning. The results highlight the fact that human-derived word associations and corpus- derived word associations can play complementary roles in semantic models.
@inproceedings{heath2013icsc,
    author = {Derrall Heath and David Norton and Eric Ringger and Dan Ventura},
    title = {Semantic Models as a Combination of Free Association Norms and Corpus-based Correlations},
    booktitle = {Proceedings of the IEEE International Conference on Semantic Computing},
    pages = {48--55},
    year = {2013},
}
[7] Adam Drake and Dan Ventura, An Empirical Comparison of Spectral Learning Methods for Classification, Proceedings of the International Conference on Machine Learning and Applications, 9--14, 2013. [Abstract] [Bibtex]
In this paper, we explore the problem of how to learn spectral (e.g., Fourier) models for classification problems. Specifically, we consider two sub-problems of spectral learning: (1) how to select the basis functions that will be included in the model and (2) how to assign coefficients to the selected basis functions. Interestingly, empirical results suggest that the most commonly used approach does not perform as well in practice as other approaches, while a method for assigning coefficients based on finding an optimal linear combination of low-order basis functions usually outperforms other approaches.
@inproceedings{drake2013icmla,
    author = {Adam Drake and Dan Ventura},
    title = {An Empirical Comparison of Spectral Learning Methods for Classification},
    booktitle = {Proceedings of the International Conference on Machine Learning and Applications},
    pages = {9--14},
    year = {2013},
}
[8] Rob Smith and Tamil S. Anthonymuthu and Dan Ventura and John T. Prince, Statistical Agglomeration: Peak Summarization for Direct Infusion Lipidomics, Bioinformatics, 29, 19, 2445--2451, 2013. [Abstract] [Bibtex]
Motivation:
Quantification of lipids is a primary goal in lipidomics. In direct infusion/injection (or shotgun) lipidomics, accurate downstream identification and quantitation requires accurate summarization of repetitive peak measurements. Imprecise peak summarization multiplies downstream error by propagating into species identification and intensity estimation. To our knowledge, this is the first analysis of direct infusion peak summarization in the literature.

Results:
We present two novel peak summarization algorithms for direct infusion samples and compare them with an off-machine ad-hoc summarization algorithm as well as with the propriety Xcalibur algorithm. Our statistical agglomeration algorithm reduces peakwise error by 38% (m/z) and 44% (intensity) compared to the ad-hoc method over 3 data sets. Pointwise error is reduced by 23% (m/z). Compared to Xcalibur, our statistical agglomeration algorithm produces 68% less m/z error and 51% less intensity error on average on two comparable data sets.

Availability:
The source code for Statistical Agglomeration and the data sets used are freely available for non-commercial purposes at
https://github.com/optimusmoose/statistical_agglomeration. Modified Bin Aggolmeration is freely available in MSpire, an open source mass spectrometry package at https://github.com/princelab/mspire/.
@article{smith2013bio,
    author = {Rob Smith and Tamil S. Anthonymuthu and Dan Ventura and John T. Prince},
    title = {Statistical Agglomeration: Peak Summarization for Direct Infusion Lipidomics},
    booktitle = {Bioinformatics},
volume = {29},
number = {19},
    pages = {2445--2451},
    year = {2013},
}
[9] Rob Smith and Ryan Williamson and Dan Ventura and John T. Prince, Rubabel: Wrapping OpenBabel with Ruby, Journal of Chemoinformatics, 5, 1, 35--44, 2013. [Abstract] [Bibtex]
Background: The number and diversity of wrappers for chemoinformatic toolkits suggests the diverse needs of the chemoinformatic community. While existing chemoinformatics libraries provide a broad range of utilities, many chemoinformaticians find compiled language libraries intimidating, time-consuming, arcane, and verbose. Although high-level language wrappers have been implemented, more can be done to leverage the intuitiveness of object-orientation, the paradigms of high-level languages, and the extensibility of languages such as Ruby. We introduce Rubabel, an intuitive, object-oriented suite of functionality that substantially increases the accessibily of the tools in the Open Babel chemoinformatics library.

Results: Rubabel requires fewer lines of code than any other actively developed wrapper, providing better object organization and navigation, and more intuitive object behavior than extant solutions. Moreover, Rubabel provides a convenient interface to the many extensions currently available in Ruby, greatly streamlining otherwise onerous tasks such as creating web applications that serve up Rubabel functionality.

Conclusions: Rubabel is powerful, intuitive, concise, freely available, cross-platform, and easy to install. We expect it to be a platform of choice for new users, Ruby users, and some users of current solutions.
@article{smith2013jc,
    author = {Rob Smith and Ryan Williamson and Dan Ventura and John T. Prince},
    title = {Rubabel: Wrapping OpenBabel with Ruby},
    journal = {Journal of Chemoinformatics},
    pages = {35--44},
volume = {5},
number = {1},
    year = {2013},
}
[10] Derrall Heath and David Norton and Dan Ventura, Autonomously Communicating Conceptual Knowledge Through Visual Art, Proceedings of 4th the International Conference on Computational Creativity, 97--104, 2013. [Abstract] [Bibtex]
In visual art, the communication of meaning or intent is an important part of eliciting an aesthetic experience in the viewer. We present a computer system, called DARCI, that is designed to automatically create original images that con- vey meaning. Building on previous work, we present three new components of DARCI that enhances its ability to com- municate concepts through the images it creates. The first component is a model of semantic memory based on word associations that helps to provide meaning to concepts. The second component composes universal icons into a single im- age and then renders the image to match an associated adjec- tive. The third component is a similarity metric that keeps the icons recognizable, but still allows for artistic elements to be discovered during the adjective rendering phase. We use an online survey to show that the system is successful at creating images that communicate concepts to human viewers.
@inproceedings{heath2013iccc,
    author = {Derrall Heath and David Norton and Dan Ventura},
    title = {Autonomously Communicating Conceptual Knowledge Through Visual Art},
    booktitle = {Proceedings of 4th the International Conference on Computational Creativity},
    pages = {97--104},
    year = {2013},
}
[11] Rob Smith and Dan Ventura and John T. Prince, Novel Algorithms and the Benefits of Comparative Validation, Bioinformatics, 29, 12, 1583--1585, 2013. [Abstract] [Bibtex]
n/a
@article{smith2013bio2,
    author = {Rob Smith and Dan Ventura and John T. Prince},
    title = {Novel Algorithms and the Benefits of Comparative Validation},
    journal = {Bioinformatics},
volume = {29},
number = {12},
    pages = {1583--1585},
    year = {2013},
}
[12] Monteith, Kristine and Brown, Bruce, and Ventura, Dan and Martinez, Tony, Automatic Generation of Music for Inducing Physiological Response, Proceedings of the 35th Annual Meeting of the Cognitive Science Society, 3098-3103, 2013. [Abstract] [Bibtex]
Music is known to have a profound impact on human cogni- tive and emotional response, which in turn are strongly cor- related with physiological mechanisms. This paper presents a system that is designed to create original musical compositions that elicit particular physiological responses. The experiments described below demonstrate that the music generated by this system is as effective as human-composed music in effecting changes in skin resistance, skin temperature, breathing rate, and heart rate. The system is particularly adept at composing pieces that elicit target responses in individuals who demon- strated predictable responses to training selections.
@inproceedings{monteith2013cogsci,
    author = {Monteith, Kristine and Brown, Bruce, and Ventura, Dan and Martinez, Tony},
    title = {Automatic Generation of Music for Inducing Physiological Response},
    booktitle = {Proceedings of the 35th Annual Meeting of the Cognitive Science Society},
    pages = {3098-3103},
    year = {2013},
}
[13] Fowers, Spencer G. and Lee, D.J. and Ventura, Dan and Archibald, James K., Nature-Inspired BASIS Feature Descriptor and Its Hardware Implementation, IEEE Transactions on Circuits and Systems for Video Technology, 23, 5, 756--768, 2013. [Abstract] [Bibtex]
This paper presents the development of a color feature descriptor well-suited
for low resource applications such as UAV embedded systems, small
microprocessors, and field programmable gate array (FPGA) fabric, and its
hardware implementation. The BAsis Sparse-coding Inspired Similarity (BASIS)
descriptor utilizes sparse coding to create dictionary images that model the
regions in the human visual cortex. Due to the reduced amount of computation
required for computing BASIS descriptors, reduced descriptor size, and the
ability to create the descriptors without the use of floating point, this approach is an
excellent candidate for FPGA hardware implementation. The BASIS descriptor was
tested on a dataset of real aerial images with the task of calculating a
frame-to-frame homography. Experimental results show that the software and
bit-level accurate BASIS descriptor outperforms SIFT and SURF. The bit-level
accurate hardware version of the BASIS descriptor also results in no loss of
accuracy when compared with the software version. BASIS descriptors are more space efficient
than other descriptors, and can be computed entirely in FPGA fabric, allowing
the descriptor to operate at real-time frame rates on a low power, embedded
platform such as an FPGA.
@article{fowers2012ieeetcsvt,
    author = {Fowers, Spencer G. and Lee, D.J. and Ventura, Dan and Archibald, James K.},
    title = {Nature-Inspired BASIS Feature Descriptor and Its Hardware Implementation},
    booktitle = {IEEE Transactions on Circuits and Systems for Video Technology},
volume = {23},
number = {5},
    pages = {756--768},
    year = {2013},
}
[14] Smith, Robert and Dennis, Aaron and Ventura, Dan, Automatic Composition from Non-musical Inspiration Sources, Proceedings of the 3rd International Conference on Computational Creativity, 160--164, 2012. [Abstract] [Bibtex]
In this paper, we describe a system which creates novel musical compositions inspired by non-musical audio signals. The system processes input audio signals using onset detection and pitch estimation algorithms. Additional musical voices are added to the resulting melody by models of note relationships that are built using machine learning trained with different pieces of music. The system creates interesting compositions, suggesting merit for the idea of computational "inspiration".
@inproceedings{smith2012iccc,
    author = {Smith, Robert and Dennis, Aaron and Ventura, Dan},
    title = {Automatic Composition from Non-musical Inspiration Sources},
    booktitle = {Proceedings of the 3rd International Conference on Computational Creativity},
    pages = {160--164},
    year = {2012},
}
[15] Van Dam, Robert and Langkilde-Geary, Irene and Ventura, Dan, Adapting ADtrees for Improved Performance on Large Datasets with High Arity Features, Knowledge and Information Systems, , , to appear, 2012. [Abstract] [Bibtex]
The ADtree, a data structure useful for caching sufficient statistics, has been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. However, even these modified forms of the ADtree still exhibit inefficiencies in terms of both space usage and query time, particularly on datasets with very high dimensionality and with high arity features. We propose four modifications to the ADtree, each of which can be used to improve size and query time under specific types of datasets and features. These modifications also provide an increased ability to precisely control how an ADtree is built and to tune its size given external memory or speed requirements.
@article{vandam2012kais,
    author = {Van Dam, Robert and Langkilde-Geary, Irene and Ventura, Dan},
    title = {Adapting ADtrees for Improved Performance on Large Datasets with High Arity Features},
    booktitle = {Knowledge and Information Systems},
volume = {},
number = {},
    pages = {to appear},
    year = {2012},
}
[16] Monteith, Kristine and Martinez, Tony, Aggregate Certainty Estimators, Computational Intelligence, 28, 2012. [Bibtex]
@article{ACE_CI,
author = {Monteith, Kristine and Martinez, Tony},
title = {Aggregate Certainty Estimators},
journal = {Computational Intelligence},
volume = {28},
year = {2012},
}
[17] Aaron Dennis and Dan Ventura, Learning the Architecture of Sum-Product Networks Using Clustering on Variables, Neural Information Processing Systems, 2042--2050, 2012. [Abstract] [Bibtex]
The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its per- formance compared to using a previously-proposed static architecture.
@inproceedings{dennis2012nips,
    author = {Aaron Dennis and Dan Ventura},
    title = {Learning the Architecture of Sum-Product Networks Using Clustering on Variables},
    booktitle = {Neural Information Processing Systems},
    pages = {2042--2050},
    year = {2012},
}
[18] Gashler, Mike and Martinez, Tony, Robust manifold learning with CycleCut, Connection Science, 24, 1, 57-69, 2012, 10.1080/09540091.2012.664122, http://www.tandfonline.com/doi/abs/10.1080/09540091.2012.664122. [Abstract] [Bibtex]
Many manifold learning algorithms utilize graphs of local neighborhoods to estimate manifold topology. When neighborhood connections short-circuit between geodesically distant regions of the manifold, poor results are obtained due to the compromises that the manifold learner must make to satisfy the erroneous criteria. Also, existing manifold learning algorithms have difficulty unfolding manifolds with toroidal intrinsic variables without introducing significant distortions to local neighborhoods. An algorithm called CycleCut is presented which prepares data for manifold learning by removing short-circuit connections, and by severing toroidal connections in a manifold.

@article{gashler_cyclecut,
author = {Gashler, Mike and Martinez, Tony},
title = {Robust manifold learning with CycleCut},
journal = {Connection Science},
volume = {24},
number = {1},
pages = {57-69},
year = {2012},
doi = {10.1080/09540091.2012.664122},
URL = {http://www.tandfonline.com/doi/abs/10.1080/09540091.2012.664122},
}
[19] Morris, Richard G and Burton, Scott H. and Bodily, Paul M. and Ventura, Dan, Soup Over Bean of Pure Joy: Culinary Ruminations of an Artificial Chef, Proceedings of the 3rd International Conference on Computational Creativity, 119--125, 2012. [Abstract] [Bibtex]
We introduce a system for generating novel recipes and use that context to examine some current theoretical ideas for computational creativity. Specifically, we have found that the notion of a single inspiring set can be generalized into two separate sets used for generation and evaluation, respectively, with the result being greater novelty as well as system flexibility (and the potential for natural meta-level creativity), that explicitly measuring artefact typicality is not always necessary, and that explicitly defining an artefact hierarchically results in greater novelty.
@inproceedings{morris2012iccc,
    author = {Morris, Richard G and Burton, Scott H. and Bodily, Paul M. and Ventura, Dan},
    title = {Soup Over Bean of Pure Joy: Culinary Ruminations of an Artificial Chef},
    booktitle = {Proceedings of the 3rd International Conference on Computational Creativity},
    pages = {119--125},
    year = {2012},
}
[20] Neo, Toh Koon Charlie and Ventura, Dan, A Direct Boosting Algorithm for the k-nearest Neighbor Classifier via Local Warping of the Distance Metric, Pattern Recognition Letters, 33, 1, 92--102, 2012. [Abstract] [Bibtex]
Though the k-nearest neighbor (k-NN) pattern classifier is an effective learning algorithm, it can result in large model sizes. To compensate, a number of variant algorithms have been developed that condense the model size of the k-NN classifier at the expense of accuracy. To increase the accuracy of these condensed models, we present a direct boosting algorithm for the k-NN classifier that creates an ensemble of models with locally modified distance weighting. An empirical study conducted on 10 standard databases from the UCI repository shows that this new Boosted k-NN algorithm has increased generalization accuracy in the majority of the datasets and never performs worse than standard k-NN.
@article{neo2012patrec,
    author = {Neo, Toh Koon Charlie and Ventura, Dan},
    title = {A Direct Boosting Algorithm for the k-nearest Neighbor Classifier via Local Warping of the Distance Metric},
    journal = {Pattern Recognition Letters},
volume = {33},
number = {1},
    pages = {92--102},
    year = {2012},
}
[21] Ventura, Dan, Rational Irrationality, Game Theory for Security, Sustainability and Health, AAAI 2012 Spring Symposium Series, 2012. [Abstract] [Bibtex]
We present a game-theoretic account of irrational agent behavior and define conditions under which irrational behavior may be considered \emph{quasi-rational}. To do so, we use a 2-player, zero-sum strategic game, parameterize the reward structure and study how the value of the game changes with this parameter. We argue that for any "underdog" agent, there is a point at which the asymmetry of the game will provoke the agent to act irrationally. This implies that the non-"underdog" player must therefore also act irrationally even though he has no incentive (in the reward structure) for doing so, which implies, in turn, a meta-level game.
@inproceedings{ventura2012aaaiss,
    author = {Ventura, Dan},
    title = {Rational Irrationality},
    booktitle = {Game Theory for Security, Sustainability and Health, AAAI 2012 Spring Symposium Series},
    year = {2012},
}
[22] Norton, David and Heath, Derrall and Ventura, Dan, Finding Creativity in an Artificial Artist, Journal of Creative Behavior, , , to appear, 2012. [Abstract] [Bibtex]
Creativity is an important component of human intelligence, and imbuing artificially intelligent systems with creativity is an interesting challenge. In particular, it is difficult to quantify (or even qualify) creativity. Recently, it has been suggested that conditions for attributing creativity to a system include: appreciation, imagination, and skill. We demonstrate and describe an original computer system (called DARCI) that is designed to produce images through creative means. We present methods for evaluating DARCI and other artificially creative systems with respect to appreciation, imagination, and skill, and use these methods to show that DARCI is arguably a creative system.
@article{norton2012jcb,
    author = {Norton, David and Heath, Derrall and Ventura, Dan},
    title = {Finding Creativity in an Artificial Artist},
    booktitle = {Journal of Creative Behavior},
volume = {},
number = {},
    pages = {to appear},
    year = {2012},
}
[23] Heath, Derrall and Norton, David and Ventura, Dan, Conveying Semantics through Visual Metaphor, ACM Transactions on Intelligent Systems and Technology, , , to appear, 2012. [Abstract] [Bibtex]
In the field of visual art, metaphor is a way to communicate meaning to the viewer. We present a computational system for communicating visual metaphor that can identify adjectives for describing an image based on a low-level visual feature representation of the image. We show that the system can use this visual-linguistic association to render source images that convey the meaning of adjectives in a way consistent with human understanding. Our conclusions are based on a detailed analysis of how the system's artifacts cluster, how these clusters correspond to the semantic relationships of adjectives as documented in WordNet, and how these clusters correspond to human opinion.
@article{heath2012acmtist,
    author = {Heath, Derrall and Norton, David and Ventura, Dan},
    title = {Conveying Semantics through Visual Metaphor},
    booktitle = {ACM Transactions on Intelligent Systems and Technology},
    volume ={},
number = {},
pages = {to appear},
    year = {2012},
}
[24] Murray, Skyler and Ventura, Dan, Algorithmically Flexible Style Composition Through Multi-Objective Fitness Functions, Proceedings of the 1st International Workshop on Musical Metacreation, 55--62, 2012. [Abstract] [Bibtex]
Creating a musical fitness function is largely subjective and can be critically affected by the designer's biases. Previous attempts to create such functions for use in genetic algorithms lack scope or are prejudiced to a certain genre of music. They also are limited to producing music strictly in the style determined by the programmer. We show in this paper that \emph{musical feature extractors}, which avoid the challenges of qualitative judgment, enable creation of a multi-objective function for direct music production. The main result is that the multi-objective fitness function enables creation of music with varying identifiable styles. To demonstrate this, we use three different multi-objective fitness functions to create three distinct sets of musical melodies. We then evaluate the distinctness of these sets using three different approaches: a set of traditional computational clustering metrics; a survey of non-musicians; and analysis by three trained musicians.
@inproceedings{murray2012mume,
    author = {Murray, Skyler and Ventura, Dan},
    title = {Algorithmically Flexible Style Composition Through Multi-Objective Fitness Functions},
    booktitle = {Proceedings of the 1st International Workshop on Musical Metacreation},
    pages = {55--62},
    year = {2012},
}
[25] White, Spencer and Martinez, Tony and Rudolph, George, Automatic Algorithm Development Using New Reinforcement Programming Techniques, Computational Intelligence, 28, 2, 176-208, 2012. [Abstract] [Bibtex]
Reinforcement Programming (RP) is a new approach to automatically generating algorithms that uses re- inforcement learning techniques. This paper introduces the RP approach and demonstrates its use to generate a generalized, in-place, iterative sort algorithm. The RP approach improves on earlier results that use genetic pro- gramming (GP). The resulting algorithm is a novel algorithm that is more efficient than comparable sorting routines. RP learns the sort in fewer iterations than GP and with fewer resources. Experiments establish interesting empirical bounds on learning the sort algorithm: A list of size 4 is sufficient to learn the generalized sort algorithm. The training set only requires one element and learning took less than 200,000 iterations. Additionally RP was used to generate three binary addition algorithms: a full adder, a binary incrementer, and a binary adder.
@article{RP-CI,
author = {White, Spencer and Martinez, Tony and Rudolph, George},
title = {Automatic Algorithm Development Using New Reinforcement Programming Techniques},
journal = {Computational Intelligence},
volume = {28},
number = {2},
pages = {176-208},
year = {2012},
}
[26] Monteith, Kristine and Martinez, Tony and Ventura, Dan, Automatic Generation of Melodic Accompaniments for Lyrics, Proceedings of the International Conference on Computational Creativity, 87--94, 2012. [Abstract] [Bibtex]
Music and speech are two realms predominately species-specific to humans, and many human creative endeavors involve these two modalities. The pairing of music and spoken text can heighten the emotional and cognitive impact of both - the complete song be- ing much more compelling than either the lyrics or the accompaniment alone. This work describes a system that is able to automatically generate and evaluate mu- sical accompaniments for a given set of lyrics. It de- rives the rhythm for the melodic accompaniment from the cadence of the text. Pitches are generated through the use of n-gram models constructed from melodies of songs with a similar style. This system is able to gen- erate pleasing melodies that fit well with the text of the lyrics, often doing so at a level similar to that of human ability.
@inproceedings{monteith2012iccc,
    author = {Monteith, Kristine and Martinez, Tony and Ventura, Dan},
    title = {Automatic Generation of Melodic Accompaniments for Lyrics},
    booktitle = {Proceedings of the International Conference on Computational Creativity},
    pages = {87--94},
    year = {2012},
}
[27] Fowers, Spencer G and Lee, D.J. and Ventura, Dan and Wilde, Doran K., A Novel, Efficient, Tree-Based Descriptor and Matching Algorithm, Proceedings of the 21st International Conference on Pattern Recognition, to appear, 2012. [Abstract] [Bibtex]
This paper presents the development of a new feature descriptor derived from the BASIS descriptor that provides improvements in descriptor size, computation speed, matching speed, and accuracy. The TreeBASIS descriptor algorithm utilizes a binary vocabulary tree that is computed off-line using basis dictionary images (BDIs) derived from sparse coding and a test set of feature region images (FRIs), and the resulting tree is stored in memory for on-line searching. During the on-line algorithm, a feature region image (FRI) is binary quantized and the resulting quantized vector is passed into the BASIS tree, where a Hamming distance is computed between the FRI and the effectively descriptive BDI (EDBDI) at the current node to determine the branch taken. The path the FRI takes is saved as the descriptor, and matching is performed by following the paths of two features and iteratively reducing the distance as the path is traversed. Experimental results show that the TreeBASIS descriptor outperforms BASIS, SIFT, and SURF on frame-to-frame aerial feature point matching.
@inproceedings{fowers2012icpr,
    author = {Fowers, Spencer G and Lee, D.J. and Ventura, Dan and Wilde, Doran K.},
    title = {A Novel, Efficient, Tree-Based Descriptor and Matching Algorithm},
    booktitle = {Proceedings of the 21st International Conference on Pattern Recognition},
    pages = {to appear},
    year = {2012},
}
[28] Gashler, Michael S. and Ventura, Dan and Martinez, Tony, Manifold Learning by Graduated Optimization, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, 41, 6, 1458--1470, 2011. [Abstract] [Bibtex]
We present an algorithm for manifold learning called Manifold Sculpting, which utilizes graduated optimization to seek an accurate manifold embedding. Empirical analysis across a wide range of manifold problems indicates that Manifold Sculpting yields more accurate results than a number of existing algorithms, including Isomap, LLE, HLLE, and L-MVU, and is significantly more efficient than HLLE and L-MVU. Manifold Sculpting also has the ability to benefit from prior knowledge about expected results.
@article{gashler2011smc,
    title        ={Manifold Learning by Graduated Optimization},
    author        ={Gashler, Michael S. and Ventura, Dan and Martinez, Tony},
    journal        ={{IEEE} Transactions on Systems, Man, and Cybernetics Part B: Cybernetics},
    year        ={2011},
    volume        ={41},
    number        ={6},
    pages        ={1458--1470},
    }
[29] Gashler, Michael S. and Martinez, Tony, Tangent Space Guided Intelligent Neighbor Finding, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'11, 2617--2624, 2011, IEEE Press, San Jose, California, U.S.A.. [Abstract] [Bibtex]
We present an intelligent neighbor-finding algorithm called SAFFRON that chooses neighboring points while avoiding making connections between points on geodesically distant regions of a manifold. SAFFRON identifies the suitability of points to be neighbors by using a relaxation technique that alternately estimates the tangent space at each point, and measures how well the estimated tangent spaces align with each other. This technique enables SAFFRON to form high-quality local neighborhoods, even on manifolds that pass very close to themselves. SAFFRON is even able to find neighborhoods that correctly follow the manifold topology of certain self-intersecting manifolds.
@incollection{gashler2011ijcnn1,
title ={Tangent Space Guided Intelligent Neighbor Finding},
author ={Gashler, Michael S. and Martinez, Tony},
booktitle ={Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'11},
year ={2011},
pages ={2617--2624},
publisher ={IEEE Press},
location ={San Jose, California, U.S.A.},

}
[30] Gashler, Michael S., Waffles: A Machine Learning Toolkit, Journal of Machine Learning Research, MLOSS 12, 2383--2387, 2011, July, 1532-4435, JMLR.org and Microtome Publishing, http://www.jmlr.org/papers/volume12/gashler11a/gashler11a.pdf. [Abstract] [Bibtex]
We present a breadth-oriented collection of cross-platform command-line tools for researchers in machine learning called Waffles. The Waffles tools are designed to offer a broad spectrum of functionality in a manner that is friendly for scripted automation. All functionality is also available in a C++ class library. Waffles is available under the GNU Lesser General Public License.
@article{gashler2011jmlr,
    title        ={Waffles: A Machine Learning Toolkit},
    author        ={Gashler, Michael S.},
    journal        ={Journal of Machine Learning Research},
    year        ={2011},
    volume        ={MLOSS 12},
    pages        ={2383--2387},
    month        ={July},
    issn        ={1532-4435},
    publisher    ={JMLR.org and Microtome Publishing},
    url        ={http://www.jmlr.org/papers/volume12/gashler11a/gashler11a.pdf}

    }
[31] Smith, Michael R. and Martinez, Tony, Improving Classification Accuracy by Identifying and Removing Instances that Should Be Misclassified, Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2011), 2690--2697, 2011. [Abstract] [Bibtex]
Appropriately handling noise and outliers is an important issue in data mining. In this paper we examine how noise and outliers are handled by learning algorithms. We introduce a filtering method called PRISM that identifies and removes instances that should be misclassified. We refer to the set of removed instances as ISMs (instances that should be misclassified). We examine PRISM and compare it against 3 existing outlier detection methods and 1 noise reduction technique on 48 data sets using 9 learning algorithms. Using PRISM, the classification accuracy increases from 78.5% to 79.8% on a set of 53 data sets and is statistically significant. In addition, the accuracy on the non-outlier instances increases from 82.8% to 84.7%. PRISM achieves a higher classification accuracy than the outlier detection methods and compares favorably with the noise reduction method.
@inproceedings{ smith.ijcnn2011,
    author    = {Smith, Michael R. and Martinez, Tony},
    title        = {Improving Classification Accuracy by Identifying and Removing Instances that Should Be Misclassified},
    booktitle    = {Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2011)},
    pages    = {2690--2697},
    year        = {2011},


    }
[32] Peterson, Adam and Martinez, Tony and Rudolph, George, On the Structure of Algorithm Spaces, Proceedings of the International Joint Conference on Neural Networks IJCNN'2011 , 658--665, 2011. [Abstract] [Bibtex]
Many learning algorithms have been developed to solve various problems. Machine learning practitioners must use their knowledge of the merits of the algorithms they know to decide which to use for each task. This process often raises questions such as: (1) If performance is poor after trying certain algorithms, which should be tried next? (2) Are some learning algorithms the same in terms of actual task classification? (3) Which algorithms are most different from each other? (4) How different? (5) Which algorithms should be tried for a particular problem? This research uses the COD (Classifier Output Difference) distance metric for measuring how similar or different learning algorithms are. The COD quantifies the difference in output behavior between pairs of learning algorithms. We construct a distance matrix from the individual COD values, and use the matrix to show the spectrum of differences among families of learning algorithms. Results show that individual algorithms tend to cluster along family and functional lines. Our focus, however, is on the structure of relationships among algorithm families in the space of algorithms, rather than on individual algorithms. A number of visualizations illustrate these results. The uniform numerical representation of COD data lends itself to human visualization techniques.
@inproceedings{AlgSpacesIJCNN,
    author = {Peterson, Adam and Martinez, Tony and Rudolph, George},
    title = {On the Structure of Algorithm Spaces},
    booktitle = {Proceedings of the International Joint Conference on Neural Networks {IJCNN}'2011 },
    pages = {658--665},
    year = {2011},
}
[33] Norton, David and Heath, Darrell and Ventura, Dan, Autonomously Creating Quality Images, Proceedings of the Second International Conference on Computational Creativity, 10--15, 2011. [Abstract] [Bibtex]
Creativity is an important part of human intelligence, and it is difficult to quantify (or even qualify) creativity in an intelligent system. Recently it has been suggested that quality, novelty, and typicality are essential properties of a creative system. We describe and demonstrate a computational system (called DARCI) that is designed to eventually produce images in a creative manner. In this paper, we focus on quality and show, through experimentation and statistical analysis, that DARCI is beginning to be able to produce images with quality comparable to those produced by humans.
@inproceedings{norton2011iccc,
    author = {Norton, David and Heath, Darrell and Ventura, Dan},
    title = {Autonomously Creating Quality Images},
    booktitle = {Proceedings of the Second International Conference on Computational Creativity},
    pages = {10--15},
    year = {2011},
}
[34] Monteith, Kristine and Francisco, Virginia and Martinez, Tony and Gervas, Pablo and Ventura, Dan, Automatic Generation of Emotionally-Targeted Soundtracks, Proceedings of the Second International Conference on Computational Creativity, 60--62, 2011. [Abstract] [Bibtex]
Music can be used both to direct and enhance the impact a story can have on its listeners. This work makes use of two creative systems to provide emotionally-targeted musical accompaniment for stories. One system assigns emotional labels to text, and the other generates original musical compositions with targeted emotional content. We use these two programs to generate music to accompany audio readings of fairy tales. Results show that music with targeted emotional content makes the stories significantly more enjoyable to listen to and increases listener perception of emotion in the text.

@inproceedings{monteith_2011_iccc,
    author = {Monteith, Kristine and Francisco, Virginia and Martinez, Tony and Gervas, Pablo and Ventura, Dan},
    title = {Automatic Generation of Emotionally-Targeted Soundtracks},
    booktitle = {Proceedings of the Second International Conference on Computational Creativity},
    pages = {60--62},
    year = {2011},
}
[35] Ventura, Dan, No Free Lunch in the Search for Creativity, Proceedings of the Second International Conference on Computational Creativity, 108--110, 2011. [Abstract] [Bibtex]
We consider computational creativity as a search process and give a No Free Lunch result for computational creativity in this context. That is, we show that there is no a priori "best" creative strategy. We discuss some implications of this result and suggest some additional questions to be explored.
@inproceedings{ventura2011iccc,
    author = {Ventura, Dan},
    title = {No Free Lunch in the Search for Creativity},
    booktitle = {Proceedings of the Second International Conference on Computational Creativity},
    pages = {108--110},
    year = {2011},
}
[36] Norton, David and Heath, Darrell and Ventura, Dan, A Collaboration with DARCI, ACM C&C Workshop on Semi-Automated Creativity, 2011. [Abstract] [Bibtex]
We briefly outline several different ways in which DARCI, a budding digital artist, can interact with people in an artistic sense. These include acting as the medium for sociological experiments, acting as artistic juror, acting as artistic collaboorator, and modifying environment to affect mood.
@inproceedings{norton2010ccworkshop,
    author = {Norton, David and Heath, Darrell and Ventura, Dan},
    title = {A Collaboration with DARCI},
    booktitle = {ACM C&C Workshop on Semi-Automated Creativity},
    year = {2011},
}
[37] Norton, David and Heath, Darrell and Ventura, Dan, An Artistic Dialogue with the Artificial, Proceedings of the Eighth ACM Conference on Creativity and Cognition, 31--40, 2011. [Bibtex]
@inproceedings{norton2010cc,
    author = {Norton, David and Heath, Darrell and Ventura, Dan},
    title = {An Artistic Dialogue with the Artificial},
    booktitle = {Proceedings of the Eighth ACM Conference on Creativity and Cognition},
    pages = {31--40},
    year = {2011},
}
[38] Monteith, Kristine and Carroll, James and Seppi, Kevin and Martinez, Tony, Turning Bayesian Model Averaging into Bayesian Model Combination, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'11, 2657--2663, 2011. [Abstract] [Bibtex]
Bayesian methods are theoretically optimal in many situations. Bayesian model averaging is generally con- sidered the standard model for creating ensembles of learners using Bayesian methods, but this technique is often out- performed by more ad hoc methods in empirical studies. The reason for this failure has important theoretical implications for our understanding of why ensembles work. It has been proposed that Bayesian model averaging struggles in practice because it accounts for uncertainty about which model is correct but still operates under the assumption that only one of them is. In order to more effectively access the benefits inherent in ensembles, Bayesian strategies should therefore be directed more towards model combination rather than the model selection implicit in Bayesian model averaging. This work provides empirical verification for this hypothesis using several different Bayesian model combination approaches tested on a wide variety of classification problems. We show that even the most simplistic of Bayesian model combination strategies outperforms the traditional ad hoc techniques of bagging and boosting, as well as outperforming BMA over a wide variety of cases. This suggests that the power of ensembles does not come from their ability to account for model uncertainty, but instead comes from the changes in representational and preferential bias inherent in the process of combining several different models.
@incollection{Kristine.ijcnn2011,
title ={Turning Bayesian Model Averaging into Bayesian Model Combination},
author ={Monteith, Kristine and Carroll, James and Seppi, Kevin and Martinez, Tony},
booktitle ={Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'11},
year ={2011},
pages ={2657--2663},

}
[39] Gashler, Michael S. and Martinez, Tony, Temporal Nonlinear Dimensionality Reduction, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'11, 1959--1966, 2011, IEEE Press, San Jose, California, U.S.A.. [Abstract] [Bibtex]
Existing Nonlinear dimensionality reduction (NLDR) algorithms make the assumption that distances between observations are uniformly scaled. Unfortunately, with many interesting systems, this assumption does not hold. We present a new technique called Temporal NLDR (TNLDR), which is specifically designed for analyzing the high-dimensional observations obtained from random-walks with dynamical systems that have external controls. It uses the additional information implicit in ordered sequences of observations to compensate for non-uniform scaling in observation space. We demonstrate that TNLDR computes more accurate estimates of intrinsic state than regular NLDR, and we show that accurate estimates of state can be used to train accurate models of dynamical systems.
@incollection{gashler2011ijcnn2,
title ={Temporal Nonlinear Dimensionality Reduction},
author ={Gashler, Michael S. and Martinez, Tony},
booktitle ={Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'11},
year ={2011},
pages ={1959--1966},
publisher ={IEEE Press},
location ={San Jose, California, U.S.A.},

}
[40] Kenneth Sundberg and Mark Clement and Quinn Snell and Dan Ventura and Michael Whiting and Keith Crandall, Phylogenetic search through partial tree mixing, BMC Bioinformatics, 13, Suppl 13, S8, 2011. [Abstract] [Bibtex]
Background: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques.

Results: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/ psoda

Conclusions: The use of Partial Tree Mixing in a partition based tree space allows the algorithm to quickly converge on near optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution.
@article{sundberg2011biot,
    author = {Kenneth Sundberg and Mark Clement and Quinn Snell and Dan Ventura and Michael Whiting and Keith Crandall},
    title = {Phylogenetic search through partial tree mixing},
    journal = {BMC Bioinformatics},
volume = {13},
number = {Suppl 13},
    pages = {S8},
    year = {2011},
}
[41] Dickerson, Kyle and Ventura, Dan, A SOM-based Multimodal System for Musical Query-by-Content, Proceedings of the International Joint Conference on Neural Networks, 705--710, 2011. [Abstract] [Bibtex]
The ever-increasing density of computer storage devices has allowed the average user to store enormous quantities of multimedia content, and a large amount of this content is usually music. We present a query-by-content system which searches the actual audio content of the music and supports querying in several styles using a Self-Organizing Map as its basis. Empirical results demonstrate the viability of this approach for musical query-by-content.
@inproceedings{dickerson2011ijcnn,
    author = {Dickerson, Kyle and Ventura, Dan},
    title = {A SOM-based Multimodal System for Musical Query-by-Content},
    booktitle = {Proceedings of the International Joint Conference on Neural Networks},
    pages = {705--710},
    year = {2011},
}
[42] Whiting, Jeffrey S. and Dinerstein, Jonathan and Egbert, Paris K. and Ventura, Dan, Cognitive and Behavioral Model Ensembles for Autonomous Virtual Characters, Computational Intelligence, 26, 2, 142--159, 2010. [Abstract] [Bibtex]
Cognitive and behavioral models have become popular methods for creating autonomous self-animating characters. Creating these models presents the following challenges: (1) Creating a cognitive or behavioral model is a time intensive and complex process that must be done by an expert programmer, (2) The models are created to solve a specific problem in a given environment and because of their specific nature cannot be easily reused. Combining existing models together would allow an animator, without the need for a programmer, to create new characters in less time and to leverage each model's strengths, resulting in an increase in the character's performance and in the creation of new behaviors and animations. This paper provides a framework that can aggregate existing behavioral and cognitive models into an ensemble. An animator has only to rate how appropriately a character performs in a set of scenarios and the system then uses machine learning to determine how the character should act given the current situation. Empirical results from multiple case studies validate the approach.

@article{whiting2010coin,
    author = {Whiting, Jeffrey S. and Dinerstein, Jonathan and Egbert, Paris K. and Ventura, Dan},
    title = {Cognitive and Behavioral Model Ensembles for Autonomous Virtual Characters},
    journal = {Computational Intelligence},
volume = {26},
number = {2},
    pages = {142--159},
    year = {2010},
}
[43] White, Spencer and Martinez, Tony R. and Rudolph, George, Generating Three Binary Addition Algorithms using Reinforcement Programming, Proceedings of ACMSE 2010, 2010. [Bibtex]
@inproceedings{ Spencer.ACMSE10,
    author    = {White, Spencer and Martinez, Tony R. and Rudolph, George},
    title        = {Generating Three Binary Addition Algorithms using Reinforcement Programming},
    booktitle    = {Proceedings of ACMSE 2010},
    year        = {2010},
    }
[44] Norton, David and Heath Darrell and Ventura, Dan, Establishing Appreciation in a Creative System, Proceedings of the First International Conference Computational Creativity, 26--35, 2010. [Abstract] [Bibtex]
Colton discusses three conditions for attributing creativity to a system: appreciation, imagination, and skill. We describe an original computer system (called DARCI) that is designed to eventually produce images through creative means. We show that DARCI has already started gaining appreciation, and has even demonstrated imagination, while skill will come later in her development.
@inproceedings{norton2010iccc,
    author = {Norton, David and Heath Darrell and Ventura, Dan},
    title = {Establishing Appreciation in a Creative System},
    booktitle = {Proceedings of the First International Conference Computational Creativity},
    pages = {26--35},
    year = {2010},
}
[45] Monteith, Kristine and Martinez, Tony R. and Ventura, Dan, Computational Modeling of Emotional Content in Music, Proceedings of International Conference on Cognitive Science, 2356--2361, 2010. [Bibtex]
@inproceedings{ Kristine.Cogsci10,
    author    = {Monteith, Kristine and Martinez, Tony R. and Ventura, Dan},
    title        = {Computational Modeling of Emotional Content in Music},
    booktitle    = {Proceedings of International Conference on Cognitive Science},
    pages    = {2356--2361},
    year        = {2010},
    }
[46] Smith, Michael R. and Clement, Mark and Martinez, Tony and Snell, Quinn, Time Series Gene Expression Prediction using Neural Networks with Hidden Layers, Proceedings of the 7th Annual Biotechnology and Bioinformatics Symposium (BIOT 2010), 67--69, 2010, October. [Abstract] [Bibtex]
A central issue to systems biology is modeling how genes interact with each other. The non-linear relationships between genes and feedback loops in the network makes modeling gene regulatory networks (GRNs) a difficult problem. In this paper, we examine modeling GRNs using neural networks (NNs) with hidden layers to predict gene expression levels. Some assume that single layer NNs are sufficient to model GRNs. However, NNs without hidden layers are only capable of modeling first order correlations. We find that the addition of a hidden layer is better able to model gene expression data than a NN without a hidden layer and that the addition of a hidden layer results in a lower error than previous models on a benchmark data set. Recurrent NNs have also been used to handle the feedback in GRNs and the temporal nature of the data, but they lack a formal training method. To incorporate time without recurrence, we propose a novel approach to combining time series data to generate additional training points. The additional training points also address the problem that generally only a limited number of noisy data points are available to build the model. The additional data points have a smoothing effect on the predictions providing the overall trend of each gene. The error values for models trained using the additional training points are competitive to those from recurrent NNs.
@inproceedings{ smith_2010biot,
    author    = {Smith, Michael R. and Clement, Mark and Martinez, Tony and Snell, Quinn},
    title        = {Time Series Gene Expression Prediction using Neural Networks with Hidden Layers},
    booktitle    = {Proceedings of the 7th Annual Biotechnology and Bioinformatics Symposium (BIOT 2010)},
    pages    = {67--69},
    year        = {2010},
    month        = {October}

    }
[47] Peterson, A. and Martinez, T. R., Using learning algorithm behavior to chart task space: The DICES distance, Intelligent Data Analysis, 14, 3, 355--367, 2010. [Abstract] [Bibtex]

@article{DICES.IDA,
    author = {Peterson, A. and Martinez, T. R.},
    title = {Using learning algorithm behavior to chart task space: The DICES distance},
    journal = {Intelligent Data Analysis},
    pages = {355--367},
volume = {14},
number = {3},
    year = {2010},
}
[48] White, Spencer and Martinez, Tony R. and Rudolph, George, Generating a Novel Sort Algorithm using Reinforcement Programming, Proceedings of Conference on Evolutionary Compution (CEC), 2633--2640, 2010. [Abstract] [Bibtex]
Reinforcement Programming (RP) is a new ap-
proach to automatically generating algorithms, that uses rein-
forcement learning techniques. This paper describes the RP ap-
proach and gives results of experiments using RP to generate a
generalized, in-place, iterative sort algorithm. The RP approach
improves on earlier results that that use genetic programming
(GP). The resulting algorithm is a novel algorithm that is more
efﬁcient than comparable sorting routines. RP learns the sort
in fewer iterations than GP and with fewer resources. Results
establish interesting empirical bounds on learning the sort
algorithm: A list of size 4 is sufﬁcient to learn the generalized
sort algorithm. The training set only requires one element and
learning took less than 200,000 iterations. RP has also been
used to generate three binary addition algorithms: a full adder,
a binary incrementer, and a binary adder.
@inproceedings{ Spencer.CEC10,
    author    = {White, Spencer and Martinez, Tony R. and Rudolph, George},
    title        = {Generating a Novel Sort Algorithm using Reinforcement Programming},
    booktitle    = {Proceedings of Conference on Evolutionary Compution (CEC)},
    pages    = {2633--2640},
    year        = {2010},
    }
[49] Monteith, Kristine and Martinez, Tony R. and Ventura, Dan, Automatic Generation of Music for Inducing Emotive Response, Proceedings of the First International Conference on Computational Creativity, 140--149, 2010. [Abstract] [Bibtex]
We present a system that generates original music designed
to match a target emotion. It creates n-gram models, Hidden Markov
Models, and other statistical distributions based on musical selections
from a corpus representing a given emotion and uses these models to
probabilistically generate new musical selections with similar emotional
content. This system produces unique and often remarkably musical
selections that tend to match a target emotion, performing this task at a
level that approaches human competency for the same task.

@inproceedings{ Kristine.ICCC10,
    author    = {Monteith, Kristine and Martinez, Tony R. and Ventura, Dan},
    title        = {Automatic Generation of Music for Inducing Emotive Response},
    booktitle    = {Proceedings of the First International Conference on Computational Creativity},
    pages    = {140--149},
    year        = {2010},
    }
[50] Dennis, Aaron and Michaels, Andrew D. and Arand, Patti and Ventura, Dan, Noninvasive Diagnosis of Pulmonary Hypertension Using Heart Sound Analysis, Computers in Biology and Medicine, 40, 9, 758--764, 2010. [Abstract] [Bibtex]
Right-heart catheterization is the most accurate method for measuring pulmonary artery pressure (PAP). It is an expensive, invasive procedure, exposes patients to the risk of infection, and is not suited for long-term monitoring situations. Medical researchers have shown that PAP influences the characteristics of heart sounds. This suggests that heart sound analysis is a potential method for the noninvasive diagnosis of pulmonary hypertension. We describe the development of a prototype system, called PHD (pulmonary hypertension diagnoser), that implements this method. PHD uses patient data with machine learning algorithms to build models of how pulmonary hypertension affects heart sounds. Data from 20 patients was used to build the models and data from another 31 patients was used as a validation set. PHD diagnosed pulmonary hypertension in the validation set with 77% accuracy and 0.78 area under the receiver-operating-characteristic curve.
@article{dennis2010cbm,
    author = {Dennis, Aaron and Michaels, Andrew D. and Arand, Patti and Ventura, Dan},
    title = {Noninvasive Diagnosis of Pulmonary Hypertension Using Heart Sound Analysis},
    journal = {Computers in Biology and Medicine},
volume = {40},
number = {9},
    pages = {758--764},
    year = {2010},
}
[51] Norton, David and Ventura, Dan, Improving Liquid State Machines Through Iterative Reﬁnement of the Reservoir, Neurocomputing, 73, 16-18, 2893--2904, 2010. [Abstract] [Bibtex]
Liquid State Machines (LSMs) exploit the power of recurrent spiking neural networks (SNNs) without training the SNN. Instead, LSMs randomly generate this network and then use it as a filter for a generic machine learner. Previous research has shown that LSMs can yield competitive results; however, the process can require numerous time consuming epochs before finding a viable filter. We have developed a method for iteratively refining these randomly generated networks so that the LSM will yield a more effective filter in fewer epochs than the traditional method. We define a new metric for evaluating the quality of a filter before calculating the accuracy of the LSM. The LSM then uses this metric to drive a novel algorithm founded on principals integral to both Hebbian and reinforcement learning. We compare this new method with traditional LSMs across two artificial pattern recognition problems and two simplified problems derived from the TIMIT dataset. Depending on the problem, our method demonstrates improvements in accuracy of from 15 to almost 600%.
@inproceedings{norton2010neuro,
    author = {Norton, David and Ventura, Dan},
    title = {Improving Liquid State Machines Through Iterative Reﬁnement of the Reservoir},
    journal = {Neurocomputing},
volume = {73},
number = {16-18},
    pages = {2893--2904},
    year = {2010},
}
[52] Monteith, Kristine and Martinez, Tony R., Using Multiple Measures to Predict Confidence in Instance Classification, Proceedings of IJCNN 2010, 4192--4199, 2010. [Abstract] [Bibtex]
Selecting an effective method for combining the votes of classifiers in an ensemble can have a significant impact on the ensemble's overall classification accuracy. Some methods cannot even achieve as high a classification accuracy as the most accurate individual classifying component. To address this issue, we present the strategy of Aggregate Confidence Ensembles, which uses multiple measures to estimate a classifier's confidence in its predictions on an instance-by-instance basis. Using these confidence estimators to weight the votes in an ensemble results in an overall average increase in classification accuracy compared to the most accurate classifier in the ensemble. These aggregate measures result in higher classification accuracy than using a collection of single confidence estimates. Aggregate Confidence Ensembles outperform three baseline ensemble creation strategies, as well as the methods of Modified Stacking and Arbitration, both in terms of average classification accuracy and algorithm-by-algorithm comparisons in accuracy over 36 data sets.
@inproceedings{ Kristine.IJCNN10,
    author    = {Monteith, Kristine and Martinez, Tony R.},
    title        = {Using Multiple Measures to Predict Confidence in Instance Classification},
    booktitle    = {Proceedings of IJCNN 2010},
    pages    = {4192--4199},
    year        = {2010},
    }
[53] Toronto, Neil and Morse, Bryan and Seppi, Kevin and Ventura, Dan, Super-Resolution via Recapture and Bayesian Effect Modeling, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2388--2395, 2009, June. [Abstract] [Bibtex]
This paper presents Bayesian edge inference (BEI), a single-frame super-resolution method explicitly grounded in Bayesian inference that addresses issues common to existing methods. Though the best give excellent results at modest magnification factors, they suffer from gradient stepping and boundary coherence problems by factors of 4x. Central to BEI is a causal framework that allows image capture and recapture to be modeled differently, a principled way of undoing downsampling blur, and a technique for incorporating Markov random field potentials arbitrarily into Bayesian networks. Besides addressing gradient and boundary issues, BEI is shown to be competitive with existing methods on published correctness measures. The model and framework are shown to generalize to other reconstruction tasks by demonstrating BEI's effectiveness at CCD demosaicing and inpainting with only trivial changes.
@inproceedings{toronto.cvpr09,
    title        = {Super-Resolution via Recapture and Bayesian Effect Modeling},
    author        = {Toronto, Neil and Morse, Bryan and Seppi, Kevin and Ventura, Dan},
    booktitle    = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
    pages        = {2388--2395},
    month        = {June},
    year        = {2009},

    }
[54] Peterson, Adam H. and Martinez, Tony R., Reducing Decision Tree Ensemble Size using Parallel Decision DAGs, International Journal of Artificial Intelligence Tools, 18, 4, 613--620, 2009. [Bibtex]
@article{ peterson.IJAIT2009,
    author    = {Peterson, Adam H. and Martinez, Tony R.},
    title        = {Reducing Decision Tree Ensemble Size using Parallel Decision DAGs},
    journal    = {International Journal of Artificial Intelligence Tools},
    volume    = {18},
    number    = {4},
    pages    = {613--620},
    year        = {2009},
    }
[55] Drake, Adam and Ventura, Dan, Search Techniques for Fourier-Based Learning, Proceedings of the International Joint Conference on Artificial Intelligence, 1040--1045, 2009, (First appeared in Proceedings of the AAAI Workshop on Search in Artificial Intelligence and Robotics, 2008), July. [Abstract] [Bibtex]
Fourier-based learning algorithms rely on being able to efficiently find the large coefficients of a function's spectral representation. In this paper, we introduce and analyze techniques for finding large coefficients. We show how a previously introduced search technique can be generalized from the Boolean case to the real-valued case, and we apply it in branch-and-bound and beam search algorithms that have significant advantages over the best-first algorithm in which the technique was originally introduced.
@inproceedings{drake2009a,
    author        ={Drake, Adam and Ventura, Dan},
    title        ={Search Techniques for {F}ourier-Based Learning},
    booktitle    ={Proceedings of the International Joint Conference on Artificial Intelligence},
    note        ={(First appeared in Proceedings of the {AAAI} Workshop on Search in Artificial Intelligence and Robotics, 2008)},
    year        ={2009},
    month        ={July},
    pages        ={1040--1045}

    }
[56] Menke, Joshua and Martinez, Tony R., Improving Supervised Learning by Adapting the Problem to the Learner, International Journal of Neural Systems, 19, 1, 1--9, 2009. [Bibtex]
@inproceedings{menke_2009_adapting,
    author        = {Menke, Joshua and Martinez, Tony R.},
    year        = {2009},
    title        = {Improving Supervised Learning by Adapting the Problem to the Learner},
    booktitle    = {International Journal of Neural Systems},
    volume = {19},
    number = {1},
    pages        = {1--9},
    }
[57] Menke, Joshua and Martinez, Tony R., Artificial Neural Network Reduction through Oracle Learning, Intelligent Data Analysis, 13, 1, 135--149, 2009. [Bibtex]
@article{ Menke.OracleJournal,
    author    = {Menke, Joshua and Martinez, Tony R.},
    title        = {Artificial Neural Network Reduction through Oracle Learning},
    journal    = {Intelligent Data Analysis},
    volume    = {13},
    number    = {1},
    pages    = {135--149},
    year        = {2009},
    }
[58] Norton, David and Ventura, Dan, Improving the Separability of a Reservoir Facilitates Learning Transfer, Proceedings of the International Joint Conference on Neural Networks, 2288--2293, 2009. [Abstract] [Bibtex]
We use a type of reservoir computing called the liquid state machine (LSM) to explore learning transfer. The Liquid State Machine (LSM) is a neural network model that uses a reservoir of recurrent spiking neurons as a filter for a readout function. We develop a method of training the reservoir, or liquid, that is not driven by residual error. Instead, the liquid is evaluated based on its ability to separate different classes of input into different spatial patterns of neural activity. Using this method, we train liquids on two qualitatively different types of artificial problems. Resulting liquids are shown to substantially improve performance on either problem regardless of which problem was used to train the liquid, thus demonstrating a significant level of learning transfer.
@article{norton.ijcnn09,
title={Improving the Separability of a Reservoir Facilitates Learning Transfer},
author={Norton, David and Ventura, Dan},
year={2009},
pages={2288--2293},
journal={Proceedings of the International Joint Conference on Neural Networks},

}
[59] Ventura, Dan, A Sub-symbolic Model of the Cognitive Processes of Re-representation and Insight, Proceedings of ACM Creativity and Cognition, 409-410, 2009, October. [Abstract] [Bibtex]
We present a sub-symbolic computational model for effecting knowledge \emph{re-representation} and \emph{insight}. Given a set of data, manifold learning is used to automatically organize the data into one or more representational transformations, which are then learned with a set of neural networks. The result is a set of neural filters that can be applied to new data as re-representation operators.
@inproceedings{ventura.cc09,
    title        = {A Sub-symbolic Model of the Cognitive Processes of Re-representation and Insight},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of ACM Creativity and Cognition},
    pages        = {409-410},
    month        = {October},
    year        = {2009},

    }
[60] Dickerson, Kyle and Ventura, Dan, Using Self-Organizing Maps to Implicitly Model Preference for a Musical Query-by-Content System, Proceedings of the International Joint Conference on Neural Networks, 705--710, 2009, June. [Abstract] [Bibtex]
The ever-increasing density of computer storage devices has allowed the average user to store enormous quantities of multimedia content, and a large amount of this content is usually music. Current search techniques for musical content rely on meta-data tags which describe artist, album, year, genre, etc. Query-by-content systems allow users to search based upon the acoustical content of the songs. Recent systems have mainly depended upon textual representations of the queries and targets in order to apply common string-matching algorithms. However, these methods lose much of the information content of the song and limit the ways in which a user may search. We have created a music recommendation system that uses Self-Organizing Maps to find similarities between songs while preserving more of the original acoustical content. We build on the design of the recommendation system to create a musical query-by-content system. We discuss the weaknesses of the naive solution and then implement a quasi-supervised design and discuss some preliminary results.
@inproceedings{dickerson.ijcnn09,
    title        = {Using Self-Organizing Maps to Implicitly Model Preference for a Musical Query-by-Content System},
    author        = {Dickerson, Kyle and Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {705--710},
    month        = {June},
    year        = {2009},

    }
[61] Raykhel, Ilya and Ventura, Dan, Real-time Automatic Price Prediction for eBay Online Trading, Proceedings of the Innovative Applications of Artificial Intelligence Conference, 135--140, 2009, July. [Abstract] [Bibtex]
We develop a system for attribute-based prediction of final (online) auction pricing, focusing on the eBay laptop category. The system implements a feature-weighted k-NN algorithm, using evolutionary computation to determine feature weights, with prior trades used as training data. The resulting average prediction error is 16%. Mostly automatic trading using the system greatly reduces the time a reseller needs to spend on trading activities, since the bulk of market research is now done automatically with the help of the learned model. The result is a 562% increase in trading efficiency (measured as profit/hour).
@inproceedings{raykhel.iaai09,
    title        = {Real-time Automatic Price Prediction for eBay Online Trading},
    author        = {Raykhel, Ilya and Ventura, Dan},
    booktitle    = {Proceedings of the Innovative Applications of Artificial Intelligence Conference},
    pages        = {135--140},
    month        = {July},
    year        = {2009},

    }
[62] Drake, Adam and Ringger, Eric and Ventura, Dan, Sentiment Regression: Using Real-Valued Scores to Summarize Overall Document Sentiment, Proceedings of the IEEE International Conference on Semantic Computing, 152--157, 2008, August. [Abstract] [Bibtex]
In this paper, we consider a sentiment regression problem: summarizing the overall sentiment of a review with a real-valued score. Empirical results on a set of labeled reviews show that real-valued sentiment modeling is feasible, as several algorithms improve upon baseline performance. We also analyze performance as the granularity of the classification problem moves from two-class (positive vs. negative) towards infinite-class (real-valued).
@inproceedings{drv.icsc2008,
    author        = {Drake, Adam and Ringger, Eric and Ventura, Dan},
    title        = {Sentiment Regression: Using Real-Valued Scores to Summarize Overall Document Sentiment},
    booktitle    = {Proceedings of the {IEEE} International Conference on Semantic Computing},
    year        = {2008},
    month        = {August},
    pages        = {152--157},

    }
[63] Gashler, Michael S. and Ventura, Dan and Martinez, Tony, Iterative Non-linear Dimensionality Reduction with Manifold Sculpting, Advances in Neural Information Processing Systems 20, 513--520, 2008, Platt, J.C. and Koller, D. and Singer, Y. and Roweis, S., MIT Press, Cambridge, MA. [Abstract] [Bibtex]
Many algorithms have been recently developed for reducing dimensionality by projecting data onto an intrinsic non-linear manifold. Unfortunately, existing algorithms often lose significant precision in this transformation. Manifold Sculpting is a new algorithm that iteratively reduces dimensionality by simulating surface tension in local neighborhoods. We present several experiments that show Manifold Sculpting yields more accurate results than existing algorithms with both generated and natural data-sets. Manifold Sculpting is also able to benefit from both prior dimensionality reduction efforts.
@incollection{gashler2007nips,
    title        = {Iterative Non-linear Dimensionality Reduction with Manifold Sculpting},
    author        = {Gashler, Michael S. and Ventura, Dan and Martinez, Tony},
    booktitle    = {Advances in Neural Information Processing Systems 20},
    editor        = {Platt, J.C. and Koller, D. and Singer, Y. and Roweis, S.},
    publisher    = {MIT Press},
    address        = {Cambridge, MA},
    pages        = {513--520},
    year        = {2008},

    }
[64] Dinerstein, Jonathan and Ventura, Dan and Goodrich, Michael and Egbert, Parris, Data-Driven Programming and Behavior for Autonomous Virtual Characters, Proceedings of the Association for the Advancement of Artificial Intelligence, 1450--1451, 2008, July. [Abstract] [Bibtex]
We present a high-level overview of a system for programming autonomous virtual characters by demonstration. The result is a deliberative model of agent behavior that is stylized and effective, as demonstrated in five different cases studies.
@inproceedings{dinerstein.aaai08,
    title        = {Data-Driven Programming and Behavior for Autonomous Virtual Characters},
    author        = {Dinerstein, Jonathan and Ventura, Dan and Goodrich, Michael and Egbert, Parris},
    booktitle    = {Proceedings of the Association for the Advancement of Artificial Intelligence},
    pages        = {1450--1451},
    month        = {July},
    year        = {2008},

    }
[65] Ventura, Dan, Sub-symbolic Re-representation to Facilitate Learning Transfer, Creative Intelligent Systems, AAAI 2008 Spring Symposium Technical Report SS-08-03, 128--134, 2008, March. [Abstract] [Bibtex]
We consider the issue of knowledge (re-)representation in the context of learning transfer and present a sub-
symbolic approach for e?ecting such transfer. Given a set of data, manifold learning is used to automatically
organize the data into one or more representational transformations, which are then learned with a set of
neural networks. The result is a set of neural filters that can be applied to new data as re-representation
operators. Encouraging preliminary empirical results elucidate the approach and demonstrate its feasibility,
suggesting possible implications for the broader field of creativity.

@inproceedings{ventura.aaaiss08,
    title        = {Sub-symbolic Re-representation to Facilitate Learning Transfer},
    author        = {Ventura, Dan},
    booktitle    = {Creative Intelligent Systems, {AAAI} 2008 Spring Symposium Technical Report {SS}-08-03},
    pages        = {128--134},
    month        = {March},
    year        = {2008},
    }
[66] Zeng, Xinchuan and Martinez, Tony R., Using Decision Trees and Soft Labeling to Filter Mislabeled Data, Journal of Intelligent Systems, 17, 4, 331--354, 2008. [Bibtex]
@article{ Zeng.JIS,
    author    = {Zeng, Xinchuan and Martinez, Tony R.},
    title        = {Using Decision Trees and Soft Labeling to Filter Mislabeled Data},
    journal    = {Journal of Intelligent Systems},
    volume    = {17},
    number    = {4},
    pages    = {331--354},
    year        = {2008},
    }
[67] Chan, Heather and Ventura, Dan, Automatic Composition of Themed Mood Pieces, Proceedings of the International Joint Workshop on Computational Creativity, 109--115, 2008, September. [Abstract] [Bibtex]
Musical harmonization of a given melody is a nontrivial problem; slight variations in instrumentation, voicing, texture, and bass rhythm can lead to significant differences in the mood of the resulting piece. This study explores the possibility of automatic musical composition by using machine learning and statistical natural language processing to tailor a piece to a particular mood using an existing melody.
@inproceedings{chan.ijwcc08,
    title        = {Automatic Composition of Themed Mood Pieces},
    author        = {Chan, Heather and Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Workshop on Computational Creativity},
    pages        = {109--115},
    month        = {September},
    year        = {2008},

    }
[68] Menke, Joshua and Martinez, Tony R., A Bradley-Terry Artificial Neural Network Model for Individual Ratings in Group Competitions, Journal of Neural Computing and Applications, 17, 2, 175--186, 2008. [Bibtex]
@article{ Menke.BTJournal,
    author    = {Menke, Joshua and Martinez, Tony R.},
    title        = {A {B}radley-{T}erry Artificial Neural Network Model for Individual Ratings in Group Competitions},
    journal    = {Journal of Neural Computing and Applications},
    volume    = {17},
    number    = {2},
    pages    = {175--186},
    year        = {2008},
    }
[69] Ventura, Dan, A Reductio Ad Absurdum Experiment in Sufficiency for Evaluating (Computational) Creative Systems, Proceedings of the International Joint Workshop on Computational Creativity, 11--19, 2008, September. [Abstract] [Bibtex]
We consider a combination of two recent proposals for characterizing computational creativity and explore the sufficiency of the resultant framework. We do this in the form of a \textit{gedanken} experiment designed to expose the nature of the framework, what it has to say about computational creativity, how it might be improved and what questions this raises.
@inproceedings{ventura.ijwcc08,
    title        = {A Reductio Ad Absurdum Experiment in Sufficiency for Evaluating (Computational) Creative Systems},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Workshop on Computational Creativity},
    pages        = {11--19},
    month        = {September},
    year        = {2008},

    }
[70] Van Dam, Rob and Geary, Irene and Ventura, Dan, Adapting ADtrees for High Arity Features, Proceedings of the Association for the Advancement of Artificial Intelligence, 708--713, 2008, July. [Abstract] [Bibtex]
{AD}trees, a data structure useful for caching sufficient statistics, have been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. For low arity symbolic features, {AD}trees trade a slight increase in query time for a reduction in overall tree size. Unfortunately, for high arity features, the same technique can often result in a very large increase in query time and a nearly negligible tree size reduction. In the dynamic (lazy) version of the tree, both query time and tree size can increase for some applications. Here we present two modifications to the {AD}tree which can be used separately or in combination to achieve the originally intended space-time tradeoff in the {AD}tree when applied to datasets containing very high arity features.
@inproceedings{vandam.aaai08,
    title        = {Adapting {AD}trees for High Arity Features},
    author        = {Van Dam, Rob and Geary, Irene and Ventura, Dan},
    booktitle    = {Proceedings of the Association for the Advancement of Artificial Intelligence},
    pages        = {708--713},
    month        = {July},
    year        = {2008},

    }
[71] Gashler, Michael S. and Giraud-Carrier, Christophe and Martinez, Tony, Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, Seventh International Conference on Machine Learning and Applications, 2008. ICMLA '08., 900--905, 2008, Dec., 10.1109/ICMLA.2008.154. [Abstract] [Bibtex]
Using decision trees that split on randomly selected attributes is one way to increase the diversity within an ensemble of decision trees. Another approach increases diversity by combining multiple tree algorithms. The random forest approach has become popular because it is simple and yields good results with common datasets. We present a technique that combines heterogeneous tree algorithms and contrast it with homogeneous forest algorithms. Our results indicate that random forests do poorly when faced with irrelevant attributes, while our heterogeneous technique handles them robustly. Further, we show that large ensembles of random trees are more susceptible to diminishing returns than our technique. We are able to obtain better results across a large number of common datasets with a significantly smaller ensemble.
@incollection{gashler2008icmla,
    title        ={Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous},
    author        ={Gashler, Michael S. and Giraud-Carrier, Christophe and Martinez, Tony},
    booktitle    ={Seventh International Conference on Machine Learning and Applications, 2008. ICMLA '08.},
    year        ={2008},
    month        ={Dec.},
    pages        ={900--905},
    doi        ={10.1109/ICMLA.2008.154},

    }
[72] Lundell, Jared and Ventura, Dan, A Data-dependent Distance Measure for Transductive Instance-based Learning, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2825--2830, 2007, October. [Abstract] [Bibtex]
We consider learning in a transductive setting using instance-based learning ({k-NN}) and present a method for constructing a data-dependent distance ``metric'' using both labeled training data as well as available unlabeled data (that is to be classified by the model). This new data-driven measure of distance is empirically studied in the context of various instance-based models and is shown to reduce error (compared to traditional models) under certain learning conditions. Generalizations and improvements are suggested.
@inproceedings{lundell.smc07,
    title        = {A Data-dependent Distance Measure for Transductive Instance-based Learning},
    author        = {Lundell, Jared and Ventura, Dan},
    booktitle    = {Proceedings of the {IEEE} International Conference on Systems, Man and Cybernetics},
    pages        = {2825--2830},
    month        = {October},
    year        = {2007},

    }
[73] Merrell, Jake and Ventura, Dan and Morse, Bryan, Clustering Music via the Temporal Similarity of Timbre, IJCAI Workshop on Artificial Intelligence and Music, 153--164, 2007, January. [Abstract] [Bibtex]
We consider the problem of measuring the similarity of streaming music content and present a method for modeling, on the fly, the temporal progression of a song's timbre. Using a minimum distance classification scheme, we give an approach to classifying streaming music sources and present performance results for auto- associative song identification and for content-based clustering of streaming music. We discuss possible extensions to the approach and possible uses for such a system.
@inproceedings{merrell.ijcai07,
    title        = {Clustering Music via the Temporal Similarity of Timbre},
    author        = {Merrell, Jake and Ventura, Dan and Morse, Bryan},
    booktitle    = {{IJCAI} Workshop on Artificial Intelligence and Music},
    pages        = {153--164},
    month        = {January},
    year        = {2007},

    }
[74] Van Dam, Rob and Ventura, Dan, ADtrees for Sequential Data and N-gram Counting, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 492--497, 2007, October. [Abstract] [Bibtex]
We consider the problem of efficiently storing n-gram counts for large n over very large corpora. In such cases, the efficient storage of sufficient statistics can have a dramatic impact on system performance. One popular model for storing such data derived from tabular data sets with many attributes is the {AD}tree. Here, we adapt the {AD}tree to benefit from the sequential structure of corpora-type data. We demonstrate the usefulness of our approach on a portion of the well-known Wall Street Journal corpus from the Penn Treebank and show that our approach is exponentially more efficient than the na{\"\i}ve approach to storing n-grams and is also significantly more efficient than a traditional prefix tree.
@inproceedings{vandam.smc07,
    title        = {{AD}trees for Sequential Data and N-gram Counting},
    author        = {Van Dam, Rob and Ventura, Dan},
    booktitle    = {Proceedings of the {IEEE} International Conference on Systems, Man and Cybernetics},
    pages        = {492--497},
    month        = {October},
    year        = {2007},

    }
[75] Giraud-Carrier, Christophe and Martinez, Tony R., Learning by Discrimination: A Constructive Incremental Approach, Journal of Computers, 2, 7, 49--58, 2007, 1796-203X, September. [Abstract] [Bibtex]
This paper presents i-{AA1}*, a constructive, incremental learning algorithm for a special class of weightless, self-organizing networks. In i-{AA1}*, learning consists of adapting the nodes' functions and the network's overall topology as each new training pattern is presented. Provided the training data is consistent, computational complexity is low and prior factual knowledge may be used to ``prime'' the network and improve its predictive accuracy and/or efficiency. Empirical generalization results on both toy problems and more realistic tasks demonstrate promise.
@article{cgc.jcp2007,
    journal        = {Journal of Computers},
    issn        = {1796-203X},
    volume        = {2},
    number        = {7},
    month        = {September},
    year        = {2007},
    title        = {Learning by Discrimination: A Constructive Incremental Approach},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    pages        = {49--58},

    }
[76] Dinerstein, Jonathan and Egbert, Parris and Ventura, Dan, Learning Policies for Embodied Virtual Agents Through Demonstration, Proceedings of the International Joint Conference on Artificial Intelligence, 1257--1262, 2007, Hyderabad, India, January. [Abstract] [Bibtex]
Although many powerful {AI} and machine learning techniques exist, it remains difficult to quickly create {AI} for embodied virtual agents that produces visually lifelike behavior. This is important for applications (e.g., games, simulators, interactive displays) where an agent must behave in a manner that appears human-like. We present a novel technique for learning reactive policies that mimic demonstrated human behavior. The user demonstrates the desired behavior by dictating the agent's actions during an interactive animation. Later, when the agent is to behave autonomously, the recorded data is generalized to form a continuous state-to-action mapping. Combined with an appropriate animation algorithm (e.g., motion capture), the learned policies realize stylized and natural-looking agent behavior. We empirically demonstrate the efficacy of our technique for quickly producing policies which result in lifelike virtual agent behavior.
@inproceedings{dinerstein.ijcai07,
    author        = {Dinerstein, Jonathan and Egbert, Parris and Ventura, Dan},
    title        = {Learning Policies for Embodied Virtual Agents Through Demonstration},
    booktitle    = {Proceedings of the International Joint Conference on Artificial Intelligence},
    address        = {Hyderabad, India},
    month        = {January},
    year        = {2007},
    pages        = {1257--1262},

    }
[77] Fulda, Nancy and Ventura, Dan, Predicting and Preventing Coordination Problems in Cooperative Learning Systems, Proceedings of the International Joint Conference on Artificial Intelligence, 780--785, 2007, Hyderabad, India, January. [Abstract] [Bibtex]
We present a conceptual framework for creating Q-learning-based algorithms that converge to optimal equilibria in cooperative multiagent settings. This framework includes a set of conditions that are sufficient to guarantee optimal system performance. We demonstrate the efficacy of the framework by using it to analyze several well-known multi-agent learning algorithms and conclude by employing it as a design tool to construct a simple, novel multiagent learning algorithm.
@inproceedings{fulda.ijcai07,
    author        = {Fulda, Nancy and Ventura, Dan},
    title        = {Predicting and Preventing Coordination Problems in Cooperative Learning Systems},
    booktitle    = {Proceedings of the International Joint Conference on Artificial Intelligence},
    address        = {Hyderabad, India},
    month        = {January},
    year        = {2007},
    pages        = {780--785},

    }
[78] Gashler, Michael S., Manifold Sculpting, 2007, Brigham Young University, August. [Abstract] [Bibtex]
Manifold learning algorithms have been shown to be useful for many applications of numerical analysis. Unfortunately, existing algorithms often produce noisy results, do not scale well, and are unable to benefit from prior knowledge about the expected results. We propose a new algorithm that iteratively discovers manifolds by preserving the local structure among neighboring data points while scaling down the values in unwanted dimensions. This algorithm produces less noisy results than existing algorithms, and it scales better when the number of data points is much larger than the number of dimensions. Additionally, this algorithm is able to benefit from existing knowledge by operating in a semi-supervised manner.
@mastersthesis{gashler.thesis2007,
title = {Manifold Sculpting},
author = {Gashler, Michael S.},
school = {Brigham Young University},
month = {August},
year = {2007},

}
[79] Dinerstein, Sabra and Dinerstein, Jon and Ventura, Dan, Robust Multi-Modal Biometric Fusion via SVM Ensemble, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1530--1535, 2007, October. [Abstract] [Bibtex]
Existing learning-based multi-modal biometric fusion techniques typically employ a single static Support Vector Machine ({SVM}). This type of fusion improves the accuracy of biometric classification, but it also has serious limitations because it is based on the assumptions that the set of biometric classifiers to be fused is local, static, and complete. We present a novel multi-{SVM} approach to multi-modal biometric fusion that addresses the limitations of existing fusion techniques and show empirically that our approach retains good classification accuracy even when some of the biometric modalities are unavailable.
@inproceedings{dinerstein.smc07,
    title        = {Robust Multi-Modal Biometric Fusion via {SVM} Ensemble},
    author        = {Dinerstein, Sabra and Dinerstein, Jon and Ventura, Dan},
    booktitle    = {Proceedings of the {IEEE} International Conference on Systems, Man and Cybernetics},
    pages        = {1530--1535},
    month        = {October},
    year        = {2007},

    }
[80] Toronto, Neil and Morse, Bryan and Ventura, Dan and Seppi, Kevin, The Hough Transform's Implicit Bayesian Foundation, Proceedings of the IEEE International Conference on Image Processing, 377--380, 2007, September. [Abstract] [Bibtex]
This paper shows that the basic Hough transform is implicitly a Bayesian process---that it computes an unnormalized posterior distribution over the parameters of a single shape given feature points. The proof motivates a purely Bayesian approach to the problem of finding parameterized shapes in digital images. A proof-of-concept implementation that finds multiple shapes of four parameters is presented. Extensions to the basic model that are made more obvious by the presented reformulation are discussed.
@inproceedings{toronto.icip07,
    title        = {The Hough Transform's Implicit Bayesian Foundation},
    author        = {Toronto, Neil and Morse, Bryan and Ventura, Dan and Seppi, Kevin},
    booktitle    = {Proceedings of the IEEE International Conference on Image Processing},
    pages        = {377--380},
    month        = {September},
    year        = {2007},

    }
[81] Rimer, Michael E. and Martinez, Tony R., Classification-based Objective Functions, Machine Learning, 2006, March. [Bibtex]
@article{rimer.ml2006,
    author        = {Rimer, Michael E. and Martinez, Tony R.},
    title        = {Classification-based Objective Functions},
    journal        = {Machine Learning},
    month        = {March},
    year        = {2006},
    }
[82] Fulda, Nancy and Ventura, Dan, Learning a Rendezvous Task with Dynamic Joint Action Perception, Proceedings of the International Joint Conference on Neural Networks, 627--632, 2006, Vancouver, BC, July. [Abstract] [Bibtex]
Groups of reinforcement learning agents interacting in a common environment often fail to learn optimal behaviors. Poor performance is particularly common in environments where agents must coordinate with each other to receive rewards and where failed coordination attempts are penalized. This paper studies the effectiveness of the Dynamic Joint Action Perception (DJAP) algorithm on a grid-world rendezvous task with this characteristic. The effects of learning rate, exploration strategy, and training time on algorithm effectiveness are discussed. An analysis of the types of tasks for which DJAP learning is appropriate is also presented.
@inproceedings{fulda.ijcnn06,
    author        = {Fulda, Nancy and Ventura, Dan},
    title        = {Learning a Rendezvous Task with Dynamic Joint Action Perception},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    address        = {Vancouver, BC},
    month        = {July},
    year        = {2006},
    pages        = {627--632},

    }
[83] Rimer, Michael E. and Martinez, Tony R., CB3: An Adaptive Error Function for Backpropagation Training, Neural Processing Letters, 24, 1, 81--92, 2006. [Bibtex]
@article{rimer.npl2006,
    author        = {Rimer, Michael E. and Martinez, Tony R.},
    title        = {{CB3}: An Adaptive Error Function for Backpropagation Training},
    journal        = {Neural Processing Letters},
    volume        = {24},
    number        = {1},
    pages        = {81--92},
    year        = {2006},
    }
[84] Toronto, Neil and Ventura, Dan, Learning Quantum Operators From Quantum State Pairs, IEEE World Congress on Computational Intelligence, 2607--2612, 2006, July. [Abstract] [Bibtex]
Developing quantum algorithms has proven to be very difficult. In this paper, the concept of using classical machine learning techniques to derive quantum operators from examples is presented. A gradient descent algorithm for learning unitary operators from quantum state pairs is developed as a starting point to aid in developing quantum algorithms. The algorithm is used to learn the quantum Fourier transform, an underconstrained two-bit function, and Grover's iterate.
@inproceedings{toronto2006,
    author        = {Toronto, Neil and Ventura, Dan},
    title        = {Learning Quantum Operators From Quantum State Pairs},
    booktitle    = {IEEE World Congress on Computational Intelligence},
    year        = {2006},
    month        = {July},
    pages        = {2607--2612},

    }
[85] Giraud-Carrier, Christophe and Martinez, Tony R., A Constructive Incremental Learning Algorithm for Binary Classification Tasks, Proceedings of SMCals/06, 213--218, 2006. [Bibtex]
@inproceedings{cgc_smc2006,
    title        = {A Constructive Incremental Learning Algorithm for Binary Classification Tasks},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    booktitle    = {Proceedings of SMCals/06},
    pages        = {213--218},
    year        = {2006},
    }
[86] Goodman, Eric and Ventura, Dan, Spatiotemporal Pattern Recognition in Liquid State Machines, Proceedings of the International Joint Conference on Neural Networks, 7979--7584, 2006, Vancouver, BC, July. [Abstract] [Bibtex]
The applicability of complex networks of spiking neurons as a general purpose machine learning technique remains open. Building on previous work using macroscopic exploration of the parameter space of an (artificial) neural microcircuit, we investigate the possibility of using a liquid state machine to solve two real-world problems: stockpile surveillance signal alignment and spoken phoneme recognition.
@inproceedings{goodman.ijcnn06,
    author        = {Goodman, Eric and Ventura, Dan},
    title        = {Spatiotemporal Pattern Recognition in Liquid State Machines},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    address        = {Vancouver, BC},
    month        = {July},
    year        = {2006},
    pages        = {7979--7584},

    }
[87] Norton, David and Ventura, Dan, Preparing More Effective Liquid State Machines Using Hebbian Learning, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'06, 8359--8364, 2006. [Abstract] [Bibtex]
In Liquid State Machines, separation is a critical attribute of the liquid--which is traditionally not trained. The effects of using Hebbian learning in the liquid to improve separation are investigated in this paper. When presented with random input, Hebbian learning does not dramatically change separation. However, Hebbian learning does improve separation when presented with real-world speech data.
@inproceedings{norton.ijcnn06,
    author        = {Norton, David and Ventura, Dan},
    title        = {Preparing More Effective Liquid State Machines Using {H}ebbian Learning},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'06},
    pages        = {8359--8364},
    year        = {2006},

    }
[88] Kamali, Kaivan and Ventura, Dan and Garga, Amulya and Kumara, Soundar, Geometric Task Decomposition in a Multi-agent Environment, Applied Artificial Intelligence, 20, 5, 437--456, 2006. [Abstract] [Bibtex]
Task decomposition in a multi-agent environment is often performed online. This paper proposes a method for sub-task allocation that can be performed before the agents are deployed, reducing the need for communication among agents during their mission. The proposed method uses a Voronoi diagram to partition the task-space among team members and includes two phases: static and dynamic. Static decomposition (performed in simulation before the start of the mission) repeatedly partitions the task-space by generating random diagrams and measuring the efficacy of the corresponding sub-task allocation. If necessary, dynamic decomposition (performed in simulation after the start of a mission) modifies the result of a static decomposition (i.e. in case of resource limitations for some agents). Empirical results are reported for the problem of surveillance of an arbitrary region by a team of agents.
@article{Kamali.aai06,
    author        = {Kamali, Kaivan and Ventura, Dan and Garga, Amulya and Kumara, Soundar},
    title        = {Geometric Task Decomposition in a Multi-agent Environment},
    journal        = {Applied Artificial Intelligence},
    year        = {2006},
    volume        = {20},
    number        = {5},
    pages        = {437--456},

    }
[89] Menke, Joshua and Martinez, Tony R., Domain Expert Approximation Through Oracle Learning, Proceedings of the 13th European Symposium on Artificial Neural Networks (ESANN 2005), 205--210, 2005. [Bibtex]
@inproceedings{menke_2005_domain,
    author        = {Menke, Joshua and Martinez, Tony R.},
    year        = {2005},
    title        = {Domain Expert Approximation Through Oracle Learning},
    booktitle    = {Proceedings of the 13th European Symposium on Artificial Neural Networks ({ESANN} 2005)},
    pages        = {205--210},
    }
[90] Drake, Adam and Ventura, Dan, Comparing High-Order Boolean Features, Proceedings of the Joint Conference on Information Sciences, 428--431, 2005, July. [Abstract] [Bibtex]
Many learning algorithms attempt, either explicitly or implicitly, to discover useful high-order features. When considering all possible functions that could be encountered, no particular type of high-order feature should be more useful than any other. However, this paper presents arguments and empirical results that suggest that for the learning problems typically encountered in practice, some high-order features may be more useful than others.
@inproceedings{drake.ventura.jcis2005,
    author        = {Drake, Adam and Ventura, Dan},
    title        = {Comparing High-Order Boolean Features},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    year        = {2005},
    pages        = {428--431},
    month        = {July},

    }
[91] Henderson, Eric and Martinez, Tony R., Constructing Low-Order Discriminant Neural Networks Using Statistical Feature Selection, Journal of Intelligent Systems, 14, 2005. [Bibtex]
@article{henderson.jis2005,
    title        = {Constructing Low-Order Discriminant Neural Networks Using Statistical Feature Selection},
    author        = {Henderson, Eric and Martinez, Tony R.},
    journal        = {Journal of Intelligent Systems},
    volume        = {14},
    year        = {2005},
    }
[92] Peterson, Adam H., COD: Measuring the Similarity of Classifiers, 2005, Brigham Young University, January. [Abstract] [Bibtex]
In the practice of machine learning, one must select a learning algorithm to employ for a problem. Common questions are: Are some algorithms basically the same, or are they fundamentally different? How different? Does an algorithm that solves a particular problem well exist? What algorithms should be tried for a particular problem? Could performance be improved by combining more than one solution? This research presents the {COD} (\emph{Classifier Output Difference}) distance metric for finding similarity between classifiers and classifier families. This metric is a tool which begins to address such questions, in both theoretical and practical aspects. It goes beyond simple accuracy comparisons and provides insights about fundamental differences between learning algorithms and the effectiveness of algorithm variations, fills a niche in meta-learning, may be used to improve the effectiveness of the construction of ensemble classifiers, and can give guidance in research towards hybrid systems. This paper describes how {COD} works and provides examples showing its utility in providing research insights. Results from this research show that there are clearly measurable differences in the behaviors of hypotheses produced by different learning algorithms, as well as clearly measurable differences between learning paradigms.
@mastersthesis{peterson.thesis2004,
    title        = {{COD}: Measuring the Similarity of Classifiers},
    author        = {Peterson, Adam H.},
    school    = {Brigham Young University},
    month        = {January},
    year        = {2005},

    }
[93] Goodman, Eric and Ventura, Dan, Effectively Using Recurrently Connected Spiking Neural Networks, Proceedings of the International Joint Conference on Neural Networks, 1542--1547, 2005, July. [Abstract] [Bibtex]
Recurrently connected spiking neural networks are difficult to use and understand because of the complex nonlinear dynamics of the system. Through empirical studies of spiking networks, we deduce several principles which are critical to success. Network parameters such as synaptic time delays and time constants and the connection probabilities can be adjusted to have a significant impact on accuracy. We show how to adjust these parameters to fit the type of problem.
@inproceedings{goodman.ijcnn05,
    author        = {Goodman, Eric and Ventura, Dan},
    title        = {Effectively Using Recurrently Connected Spiking Neural Networks},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    month        = {July},
    year        = {2005},
    pages        = {1542--1547},

    }
[94] Drake, Adam and Ventura, Dan, A Practical Generalization of Fourier-Based Learning, ICML '05: Proceedings of the 22nd International Conference on Machine Learning, 185--192, 2005, ACM Press, New York, NY, USA. [Abstract] [Bibtex]
This paper presents a search algorithm for finding functions that are highly correlated with an arbitrary set of data. The functions found by the search can be used to approximate the unknown function that generated the data. A special case of this approach is a method for learning Fourier representations. Empirical results demonstrate that on typical real-world problems the most highly correlated functions can be found very quickly, while combinations of these functions provide good approximations of the unknown function.
@inproceedings{drake.ventura.icml2005,
    author        = {Drake, Adam and Ventura, Dan},
    title        = {A Practical Generalization of Fourier-Based Learning},
    booktitle    = {ICML '05: Proceedings of the 22nd International Conference on Machine Learning},
    year        = {2005},
    pages        = {185--192},
    publisher    = {ACM Press},
    address        = {New York, NY, USA},

    }
[95] Toronto, Neil and Ventura, Dan and Morse, Bryan S., Edge Inference for Image Interpolation, International Joint Conference on Neural Networks, 1782--1787, 2005. [Abstract] [Bibtex]
Image interpolation algorithms try to fit a function to a matrix of samples in a "natural-looking" way. This paper presents edge inference, an algorthm that does this by mixing neural network regression with standard image interpolation techniques. Results on gray level images are presented. Extension into RGB color space and additional applications of the algorithm are discussed.
@inproceedings{ntoronto-ijcnn05,
    author        = {Toronto, Neil and Ventura, Dan and Morse, Bryan S.},
    title        = {Edge Inference for Image Interpolation},
    booktitle    = {International Joint Conference on Neural Networks},
    year        = {2005},
    pages        = {1782--1787},

}
[96] Peterson, Adam H. and Martinez, Tony R., Estimating The Potential For Combining Learning Models, Proceedings of the ICML Workshop on Meta-Learning, 68--75, 2005. [Bibtex]
@inproceedings{ peterson.icmlws05,
    author        = {Peterson, Adam H. and Martinez, Tony R.},
    title        = {Estimating The Potential For Combining Learning Models},
    booktitle    = {Proceedings of the ICML Workshop on Meta-Learning},
    year        = {2005},
    pages        = {68--75},
    }
[97] Dinerstein, John and Ventura, Dan and Egbert, Parris, Fast and Robust Incremental Action Prediction for Interactive Agents, Computational Intelligence, 21, 1, 90--110, 2005. [Abstract] [Bibtex]
The ability for a given agent to adapt on-line to better interact with another agent is a difficult and important problem. This problem becomes even more difficult when the agent to interact with is a human, since humans learn quickly and behave nondeterministically. In this paper we present a novel method whereby an agent can incrementally learn to predict the actions of another agent (even a human), and thereby can learn to better interact with that agent. We take a case-based approach, where the behavior of the other agent is learned in the form of state-action pairs. We generalize these cases either through continuous k-nearest neighbor, or a modified bounded minimax search. Through our case studies, our technique is empirically shown to require little storage, learn very quickly, and be fast and robust in practice. It can accurately predict actions several steps into the future. Our case studies include interactive virtual environments involving mixtures of synthetic agents and humans, with cooperative and/or competitive relationships.
@article{dinerstein.ci05,
    author        = {Dinerstein, John and Ventura, Dan and Egbert, Parris},
    title        = {Fast and Robust Incremental Action Prediction for Interactive Agents},
    journal        = {Computational Intelligence},
    volume        = {21},
    number        = {1},
    year        = {2005},
    pages        = {90--110},

    }
[98] Goodman, Eric and Ventura, Dan, Time Invariance and Liquid State Machines, Proceedings of the Joint Conference on Information Sciences, 420--423, 2005, July. [Abstract] [Bibtex]
Time invariant recognition of spatiotemporal patterns is a common task of signal processing. The liquid state machine (LSM) is a paradigm which robustly handles this type of classification. Using an artificial dataset with target pattern lengths ranging from 0.1 to 1.0 seconds, we train an LSM to find the start of the pattern with a mean absolute error of 0.18 seconds. Also, LSMs can be trained to identify spoken digits, 1-9, with an accuracy of 97.6%, even with scaling by factors ranging from 0.5 to 1.5.
@inproceedings{goodman.jcis05,
    author        = {Goodman, Eric and Ventura, Dan},
    title        = {Time Invariance and Liquid State Machines},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    month        = {July},
    year        = {2005},
    pages        = {420--423},

    }
[99] Rimer, Michael E. and Martinez, Tony R., Softprop: Softmax Neural Network Backpropagation Learning, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'04, 979--984, 2004. [Bibtex]
@inproceedings{rimer.ijcnn2004.softprop,
    author        = {Rimer, Michael E. and Martinez, Tony R.},
    title        = {Softprop: Softmax Neural Network Backpropagation Learning},
    booktitle    = {Proceedings of the IEEE International Joint Conference on Neural Networks {IJCNN}'04},
    pages        = {979--984},
    year        = {2004},
    }
[100] Whiting, Stephen and Ventura, Dan, Learning Multiple Correct Classifications from Incomplete Data using Weakened Implicit Negatives, Proceedings of the International Joint Conference on Neural Networks, 2953--2958, 2004, July. [Abstract] [Bibtex]
Classification problems with output class overlap create problems for standard neural network approaches. We present a modification of a simple feedforward neural network that is capable of learning problems with output overlap, including problems exhibiting hierarchical class structures in the output. Our method of applying weakened implicit negatives to address overlap and ambiguity allows the algorithm to learn a large portion of the hierarchical structure from very incomplete data. Our results show an improvement of approximately 58% over a standard backpropagation network on the hierarchical problem.
@inproceedings{whiting.ijcnn04,
    author        = {Whiting, Stephen and Ventura, Dan},
    title        = {Learning Multiple Correct Classifications from Incomplete Data using Weakened Implicit Negatives},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    month        = {July},
    year        = {2004},
    pages        = {2953--2958},

    }
[101] Morring, Brent D. and Martinez, Tony R., Weighted Instance Typicality Search (WITS): A Nearest Neighbor Data Reduction Algorithm, Intelligent Data Analysis, 8, 1, 61--78, 2004. [Abstract] [Bibtex]
Two disadvantages of the standard nearest neighbor algorithm are 1) it must store all the instances of the training set, thus creating a large memory footprint and 2) it must search all the instances of the training set to predict the classification of a new query point, thus it is slow at run time. Much work has been done to remedy these shortcomings. This paper presents a new algorithm {WITS} (Weighted-Instance Typicality Search) and a modified version, Clustered-{WITS} (C-{WITS}), designed to address these issues. Data reduction algorithms address both issues by storing and using only a portion of the available instances. {WITS} is an incremental data reduction algorithm with O(n2) complexity, where n is the training set size. {WITS} uses the concept of Typicality in conjunction with Instance-Weighting to produce minimal nearest neighbor solutions. {WITS} and C-{WITS} are compared to three other state of the art data reduction algorithms on ten real-world datasets. {WITS} achieved the highest average accuracy, showed fewer catastrophic failures, and stored an average of 71\% fewer instances than {DROP}-5, the next most competitive algorithm in terms of accuracy and catastrophic failures. The C-{WITS} algorithm provides a user-defined parameter that gives the user control over the training-time vs. accuracy balance. This modification makes C-{WITS} more suitable for large problems, the very problems data reductions algorithms are designed for. On two large problems (10,992 and 20,000 instances), C-{WITS} stores only a small fraction of the instances (0.88\% and 1.95\% of the training data) while maintaining generalization accuracies comparable to the best accuracies reported for these problems.
@article{MorringIDA,
    title        = {Weighted Instance Typicality Search ({WITS}): A Nearest Neighbor Data Reduction Algorithm},
    author        = {Morring, Brent D. and Martinez, Tony R.},
    journal        = {Intelligent Data Analysis},
    volume        = {8},
    number        = {1},
    year        = {2004},
    pages        = {61--78},

    }
[102] Fulda, Nancy and Ventura, Dan, Incremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests, Proceedings of the International Joint Conference on Neural Networks, 1121--1126, 2004, July. [Abstract] [Bibtex]
We present an equilibrium selection algorithm for reinforcement learning agents that incrementally adjusts the probabilityof executing each action based on the desirability of the outcome obtained in the last time step. The algorithm assumes that at least one coordination equilibrium exists and requires that the agents have a heuristic for determining whether or not the equilibrium was obtained. In deterministic environments with one or more strict coordination equilibria, the algorithm will learn to play an optimal equilibrium as long as the heuristic is accurate. Empirical data demonstrate that the algorithm is also effective in stochastic environments and is able to learn good joint policies when the heuristic's parameters are estimated during learning, rather than known in advance.
@inproceedings{fulda.ijcnn04,
    author        = {Fulda, Nancy and Ventura, Dan},
    title        = {Incremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    month        = {July},
    year        = {2004},
    pages        = {1121--1126},

    }
[103] Zeng, Xinchuan and Martinez, Tony R., Feature Weighting Using Neural Networks, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'04, 1327--1330, 2004. [Abstract] [Bibtex]
In this work we propose a feature weighting method for classification tasks by extracting relevant information from a trained neural network. This method weights an attribute based on strengths (weights) of related links in the neural network, in which an important feature is typically connected to strong links and has more impact on the outputs. This method is applied to feature weighting for the nearest neighbor classifier and is tested on 15 real-world classification tasks. The results show that it can improve the nearest neighbor classifier on 14 of the 15 tested tasks, and also outperforms the neural network on 9 tasks.
@inproceedings{zeng_martinez_ijcnn04,
    title        = {Feature Weighting Using Neural Networks},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'04},
    pages         = {1327--1330},
    year        = {2004},

    }
[104] Menke, Joshua and Martinez, Tony R., Using Permutations Instead of Student's t Distribution for p-values in Paired-Difference Algorithm Comparisons., Proceedings of the 2004 IEEE Joint Conference on Neural Networks IJCNN'04, 2004. [Bibtex]
@inproceedings{menke_2004_permutations,
    author        = {Menke, Joshua and Martinez, Tony R.},
    year        = {2004},
    title        = {Using Permutations Instead of Student's t Distribution for p-values in Paired-Difference Algorithm Comparisons.},
    booktitle    = {Proceedings of the 2004 IEEE Joint Conference on Neural Networks {IJCNN}'04},
    }
[105] Richards, Mark and Ventura, Dan, Choosing a Starting Configuration for Particle Swarm Optimization, Proceedings of the Joint Conference on Neural Networks, 2309--2312, 2004, July. [Abstract] [Bibtex]
The performance of Particle Swarm Optimization can be improved by strategically selecting the starting positions of the particles. This work suggests the use of generators from centroidal Voronoi tessellations as the starting points for the swarm. The performance of swarms initialized with this method is compared with the standard PSO algorithm on several standard test functions. Results suggest that CVT initialization improves PSO performance in high-dimensional spaces.
@article{richards.ijcnn04,
    author        = {Richards, Mark and Ventura, Dan},
    title        = {Choosing a Starting Configuration for Particle Swarm Optimization},
    journal        = {Proceedings of the Joint Conference on Neural Networks},
    month        = {July},
    year        = {2004},
    pages        = {2309--2312},

    }
[106] Richards, Mark and Ventura, Dan, Dynamic Sociometry in Particle Swarm Optimization, Proceedings of the Joint Conference on Information Sciences, 1557--1560, 2003, September. [Abstract] [Bibtex]
The performance of Particle Swarm Optimization is greatly affected by the size and sociometry of the swarm. This research proposes a dynamic sociometry, which is shown to be more effective on some problems than the standard star and ring sociometries. The performance of various combinations of swarm size and sociometry on six different test functions is qualitatively analyzed.
@article{richards.jcis03,
    author        = {Richards, Mark and Ventura, Dan},
    title        = {Dynamic Sociometry in Particle Swarm Optimization},
    journal        = {Proceedings of the Joint Conference on Information Sciences},
    month        = {September},
    year        = {2003},
    pages        = {1557--1560},

    }
[107] Fulda, Nancy and Ventura, Dan, Concurrently Learning Neural Nets: Encouraging Optimal Behavior in Reinforcement Learning Systems., IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement, and Related Applications (SCIMA), 2003, May. [Bibtex]
@incollection{fulda_2003a,
    title        = {Concurrently Learning Neural Nets: Encouraging Optimal Behavior in Reinforcement Learning Systems.},
    author        = {Fulda, Nancy and Ventura, Dan},
    booktitle    = {{IEEE} International Workshop on Soft Computing Techniques in Instrumentation, Measurement, and Related Applications ({SCIMA})},
    month        = {May},
    year        = {2003},
    }
[108] Zeng, Xinchuan and Martinez, Tony R., A Noise Filtering Method Using Neural Networks, Proceedings of the International Workshop of Soft Computing Techniques in Instrumentation, Measurement and Related Applications, 2003. [Abstract] [Bibtex]
During the data collecting and labeling process it is possible for noise to be introduced into a data set. As a result, the quality of the data set degrades and experiments and inferences derived from the data set become less reliable. In this paper we present an algorithm, called {ANR} (automatic noise reduction), as a filtering mechanism to identify and remove noisy data items whose classes have been mislabeled. The underlying mechanism behind {ANR} is based on a framework of multi-layer artificial neural networks. {ANR} assigns each data item a soft class label in the form of a class probability vector, which is initialized to the original class label and can be modified during training. When the noise level is reasonably small (<30%), the non-noisy data is dominant in determining the network architecture and its output, and thus a mechanism for correcting mislabeled data can be provided by aligning class probability vector with the network output. With a learning procedure for class probability vector based on its difference from the network output, the probability of a mislabeled class gradually becomes smaller while that of the correct class becomes larger, which eventually causes a correction of mislabeled data after sufficient training. After training, those data items whose classes have been relabeled are then treated as noisy data and removed from the data set. We evaluate the performance of the {ANR} based on 12 data sets drawn from the {UCI} data repository. The results show that {ANR} is capable of identifying a significant portion of noisy data. An average increase in accuracy of 24.5% can be achieved at a noise level of 25% by using {ANR} as a training data filter for a nearest neighbor classifier, as compared to the one without using {ANR}.
@inproceedings{zeng.scima2003,
    title        = {A Noise Filtering Method Using Neural Networks},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Workshop of Soft Computing Techniques in Instrumentation, Measurement and Related Applications},
    year        = {2003},

    }
[109] Wilson, D. Randall and Martinez, Tony R., The General Inefficiency of Batch Training for Gradient Descent Learning, Neural Networks, 16, 10, 1429--1451, 2003, Elsevier Science Ltd., Oxford, UK, UK, 0893-6080. [Abstract] [Bibtex]
Gradient descent training of neural networks can be done in either a batch or on-line manner. A widely held myth in the neural network community is that batch training is as fast or faster and/or more ``correct'' than on-line training because it supposedly uses a better approximation of the true gradient for its weight updates. This paper explains why batch training is almost always slower than on-line training often orders of magnitude slower especially on large training sets. The main reason is due to the ability of on-line training to follow curves in the error surface throughout each epoch, which allows it to safely use a larger learning rate and thus converge with less iterations through the training data. Empirical results on a large (20,000-instance) speech recognition task and on 26 other learning tasks demonstrate that convergence can be reached significantly faster using on-line training than batch training, with no apparent difference in accuracy.
@article{Wilson.nn03.batch,
    title        = {The General Inefficiency of Batch Training for Gradient Descent Learning},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    journal        = {Neural Networks},
    volume        = {16},
    number        = {10},
    year        = {2003},
    pages        = {1429--1451},
    publisher    = {Elsevier Science Ltd.},
    address        = {Oxford, UK, UK},
    issn        = {0893-6080},

    }
[110] Menke, Joshua and Martinez, Tony R., Simplifying OCR Neural Networks Through Oracle Learning, Proceedings of the 2003 IEEE International Workshop on Soft Computing Techniques in Instrumentation, Measurement, and Related Applications, 2003, Provo, UT, U.S.A., IEEE Press. [Bibtex]
@inproceedings{menke_2003a,
    title        = {Simplifying {OCR} Neural Networks Through Oracle Learning},
    author        = {Menke, Joshua and Martinez, Tony R.},
    booktitle    = {Proceedings of the 2003 {IEEE} International Workshop on Soft Computing Techniques in Instrumentation, Measurement, and Related Applications},
    year        = {2003},
    location    = {Provo, UT, U.S.A.},
    publisher    = {IEEE Press},
    }
[111] Fulda, Nancy and Ventura, Dan, Dynamic Joint Action Perception for Q-Learning Agents., To Appear in Proceedings of the 2003 International Conference on Machine Learning and Applications, Los Angeles, CA, 2003. [Bibtex]
@inproceedings{fulda_2003b,
    title        = {Dynamic Joint Action Perception for Q-Learning Agents.},
    author        = {Fulda, Nancy and Ventura, Dan},
    booktitle    = {To Appear in Proceedings of the 2003 International Conference on Machine Learning and Applications, Los Angeles, {CA}},
    year        = {2003},
    }
[112] Fulda, Nancy and Ventura, Dan, Target Sets: A Tool for Understanding and Predicting the Behavior of Interacting Q-learners, Proceedings of the Joint Conference on Information Sciences, 1549--1552, 2003, September. [Abstract] [Bibtex]
Reinforcement learning agents that interact in a common environment frequently affect each others' perceived transition and reward distributions. This can result in convergence of the agents to a sub-optimal equilibrium or even to a solution that is not an equilibrium at all. Several modifications to the Q-learning algorithm have been proposed which enable agents to converge to optimal equilibria under specified conditions. This paper presents the concept of target sets as an aid to understanding why these modifications have been successful and as a tool to assist in the development of new modifications which are applicable in a wider range of situations.
@inproceedings{fulda.jcis03,
    author        = {Fulda, Nancy and Ventura, Dan},
    title        = {Target Sets: A Tool for Understanding and Predicting the Behavior of Interacting Q-learners},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    month        = {September},
    year        = {2003},
    pages        = {1549--1552},

    }
[113] Ricks, Bob and Ventura, Dan, Training a Quantum Neural Network, Neural Information Processing Systems, 1019--1026, 2003, December. [Abstract] [Bibtex]
Quantum learning holds great promise for the field of machine intelligence. The most studied quantum learning algorithm is the quantum neural network. Many such models have been proposed, yet none has become a standard. In addition, these models usually leave out many details, often excluding how they intend to train their networks. This paper discusses one approach to the problem and what advantages it would have over classical networks.
@inproceedings{ricks.nips03,
    author        = {Ricks, Bob and Ventura, Dan},
    title        = {Training a Quantum Neural Network},
    booktitle    = {Neural Information Processing Systems},
    month        = {December},
    year        = {2003},
    pages        = {1019--1026},

    }
[114] Menke, Joshua and Peterson, Adam H. and Rimer, Michael E. and Martinez, Tony R., Neural Network Simplification Through Oracle Learning, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'02 , 2482--2497, 2002, Honolulu, Hawaii, U.S.A., IEEE Press. [Abstract] [Bibtex]
Often the best artificial neural network to solve a real world problem is relatively complex. However, with the growing popularity of smaller computing devices (handheld computers, cellular telephones, automobile interfaces, etc.), there is a need for simpler models with comparable accuracy. The following research presents evidence that using a larger model as an oracle to train a smaller model on unlabeled data results in 1) a simpler acceptable model and 2) improved results over standard training methods on a similarly sized smaller model. On automated spoken digit recognition, oracle learning resulted in an artificial neural network of half the size that 1) maintained comparable accuracy to the larger neural network, and 2) obtained up to a 25% decrease in error over standard training methods.
@inproceedings{menke_2002b,
    title        = {Neural Network Simplification Through Oracle Learning},
    author        = {Menke, Joshua and Peterson, Adam H. and Rimer, Michael E. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'02 },
    pages        = {2482--2497},
    year        = {2002},
    location    = {Honolulu, Hawaii, U.S.A.},
    publisher    = {IEEE Press},

    }
[115] Rimer, Michael E. and Martinez, Tony R. and Wilson, D. Randall, Improving Speech Recognition Learning through Lazy Training, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'02, 2568--2573, 2002. [Abstract] [Bibtex]
Backpropagation, like most high-order learning algorithms, is prone to overfitting. We present a novel approach, called lazy training, for reducing the overfit in multiple-output networks. Lazy training has been shown to reduce the error of optimized neural networks more than half on a large {OCR} data set and on several problems from the {UCI} machine learning database. Here, lazy training is shown to be effective in a multi-layered adaptive learning system, reducing the error of an optimized backpropagation network in a speech recognition system by 55.0% on the {TIDIGITS} corpus.
@inproceedings{rimer.ijcnn02,
    title        = {Improving Speech Recognition Learning through Lazy Training},
    author        = {Rimer, Michael E. and Martinez, Tony R. and Wilson, D. Randall},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'02},
    pages        = {2568--2573},
    year        = {2002},

    }
[116] Rimer, Michael E., Lazy Training: Interactive Classification Learning, 2002, Brigham Young University, April. [Bibtex]
@mastersthesis{rimer.thesis2002,
    author        = {Rimer, Michael E.},
    title        = {Lazy Training: Interactive Classification Learning},
    school        = {Brigham Young University},
    month        = {April},
    year        = {2002},
    }
[117] Henderson, Eric and Martinez, Tony R., Pair Attribute Learning: Network Construction Using Pair Features, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'02, 2556--2561, 2002. [Abstract] [Bibtex]
We present the Pair Attribute Learning ({PAL}) algorithm for the selection of relevant inputs and network topology. Correlations on training instance pairs are used to drive network construction of a single-hidden layer {MLP}. Results on nine learning problems demonstrate 70% less complexity, on average, without a significant loss of accuracy.
@inproceedings{EricIJCNN,
    title        = {Pair Attribute Learning: Network Construction Using Pair Features},
    author        = {Henderson, Eric and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'02},
    pages        = {2556--2561},
    year        = {2002},

    }
[118] Menke, Joshua, Neural Network Simplification Through Oracle Learning, 2002, Brigham Young University, November. [Bibtex]
@mastersthesis{menke_2002a,
    title        = {Neural Network Simplification Through Oracle Learning},
    author        = {Menke, Joshua},
    school    = {Brigham Young University},
    month        = {November},
    year        = {2002},
    }
[119] Ventura, Dan, Pattern Classification Using a Quantum System, Proceedings of the Joint Conference on Information Sciences, 537--640, 2002, March. [Abstract] [Bibtex]
We consider and compare three approaches to quantum pattern classification, presenting empirical results from simulations.
@inproceedings{ventura.jcis02,
    title        = {Pattern Classification Using a Quantum System},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    pages        = {537--640},
    month        = {March},
    year        = {2002},

    }
[120] Istook, Butch and Martinez, Tony R., Improved Backpropagation Learning in Neural Networks with Windowed Momentum, International Journal of Neural Systems, 3&4, 303--318, 2002. [Abstract] [Bibtex]
Backpropagation, which is frequently used in Neural Network training, often takes a great deal of time to converge on an acceptable solution. Momentum is a standard technique that is used to speed up convergence and maintain generalization performance. In this paper we present the Windowed momentum algorithm, which increases speedup over standard momentum. Windowed momentum is designed to use a fixed width history of recent weight updates for each connection in a neural network. By using this additional information, Windowed momentum gives significant speed-up over a set of applications with same or improved accuracy. Windowed Momentum achieved an average speed-up of 32% in convergence time on 15 data sets, including a large {OCR} data set with over 500,000 samples. In addition to this speedup, we present the consequences of sample presentation order. We show that Windowed momentum is able to overcome these effects that can occur with poor presentation order and still maintain its speed-up advantages.
@article{IstookIJNS,
    title        = {Improved Backpropagation Learning in Neural Networks with Windowed Momentum},
    author        = {Istook, Butch and Martinez, Tony R.},
    journal    = {International Journal of Neural Systems},
    pages        = {303--318},
    volume        = {3&4},
    year        = {2002},

    }
[121] Ventura, Dan, Probabilistic Connections in Relaxation Networks, Proceedings of the International Joint Conference on Neural Networks, 934--938, 2002, May. [Abstract] [Bibtex]
This paper reports results from studying the behavior of Hopfield-type networks with probabilistic connections. As the probabilities decrease, network performance degrades. In order to compensate, two network modifications --- input persistence and a new activation function --- are suggested, and empirical results indicate that the modifications significantly improve network performance.
@inproceedings{ventura.ijcnn02,
    title        = {Probabilistic Connections in Relaxation Networks},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {934--938},
    month        = {May},
    year        = {2002},

    }
[122] Zeng, Xinchuan and Martinez, Tony R., Optimization by Varied Beam Search in Hopfield Networks, Proceedings of the IEEE International Joint Conference on Neural Networks, 913--918, 2002. [Abstract] [Bibtex]
This paper shows that the performance of the \emph{Hopfield network} for solving optimization problems can be improved by a varied beam search algorithm. The algorithm varies the beam search size and beam intensity during the network relaxation process. It consists of two stages: increasing the beam search parameters in the first stage and then decreasing them in the second stage. The purpose of using such a scheme is to provide the network with a better chance to find more and better solutions. A large number of simulation results based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem} demonstrated that it is capable of increasing the percentage of valid tours by 28.3% and reducing the error rate by 40.8%, compared to the original Hopfield network.
@inproceedings{zeng.ijcnn2002,
    title        = {Optimization by Varied Beam Search in Hopfield Networks},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks},
    pages        = {913--918},
    year        = {2002},

    }
[123] Peterson, Todd S. and Owens, Nancy and Carroll, James L., Towards Automatic Shaping in Robot Navigation., Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2001. [Bibtex]
@inproceedings{fulda_2001a,
    title        = {Towards Automatic Shaping in Robot Navigation.},
    author        = {Peterson, Todd S. and Owens, Nancy and Carroll, James L.},
    booktitle    = {Proceedings of the {IEEE} International Conference on Robotics and Automation ({ICRA})},
    year        = {2001},
    }
[124] Carroll, James L. and Peterson, Todd S. and Owens, Nancy, Memory-guided Exploration in Reinforcement Learning., Proceedings of the INNS-IEEE International Joint Conference on Neural Networks (IJCNN), 2001. [Bibtex]
@inproceedings{fulda_2001b,
    title        = {Memory-guided Exploration in Reinforcement Learning.},
    author        = {Carroll, James L. and Peterson, Todd S. and Owens, Nancy},
    booktitle    = {Proceedings of the {INNS}-{IEEE} International Joint Conference on Neural Networks ({IJCNN})},
    year        = {2001},
    }
[125] Clift, Fred and Martinez, Tony R., Improved Hopfield Nets by Training with Noisy Data, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 1138--1143, 2001. [Bibtex]
@inproceedings{Cliftijcnn2001,
    title        = {Improved Hopfield Nets by Training with Noisy Data},
    author        = {Clift, Fred and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {1138--1143},
    year        = {2001},
    }
[126] Wilson, D. Randall and Martinez, Tony R., The Need for Small Learning Rates on Large Problems, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 115--119, 2001. [Abstract] [Bibtex]
In gradient descent learning algorithms such as error backpropagation, the learning rate parameter can have a significant effect on generalization accuracy. In particular, decreasing the learning rate below that which yields the fastest convergence can significantly improve generalization accuracy, especially on large, complex problems. The learning rate also directly affects training speed, but not necessarily in the way that many people expect. Many neural network practitioners currently attempt to use the largest learning rate that still allows for convergence, in order to improve training speed. However, a learning rate that is too large can be as slow as a learning rate that is too small, and a learning rate that is too large or too small can require orders of magnitude more training time than one that is in an appropriate range. This paper illustrates how the learning rate affects training speed and generalization accuracy, and thus gives guidelines on how to efficiently select a learning rate that maximizes generalization accuracy.
@inproceedings{wilson.ijcnn2001,
    title        = {The Need for Small Learning Rates on Large Problems},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {115--119},
    year        = {2001},

    }
[127] Ventura, Dan, A Quantum Analog to Basis Function Networks, Proceedings of the International Conference on Computing Anticipatory Systems, 286--295, 2001, August. [Abstract] [Bibtex]
A Fourier-based quantum computational learning algorithm with similarities to classical basis function networks is developed. Instead of a Gaussian basis, the quantum algorithm uses a discrete Fourier basis with the output being a linear combination of the basis. A set of examples is considered as a quantum system that undergoes unitary transformations to produce learning. The main result of the work is a quantum computational learning algorithm that is unique among quantum algorithms as it does not assume \emph{a priori} knowledge of a function $f$.
@inproceedings{ventura.casys01,
    title        = {A Quantum Analog to Basis Function Networks},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the International Conference on Computing Anticipatory Systems},
    pages        = {286--295},
    month        = {August},
    year        = {2001},

    }
[128] Andersen, Tim L. and Martinez, Tony R., DMP3: A Dynamic Multi-Layer Perceptron Construction Algorithm, International Journal of Neural Systems, 2, 145--166, 2001. [Bibtex]
@article{AndersenIJNS,
    title        = {{DMP3}: A Dynamic Multi-Layer Perceptron Construction Algorithm},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    journal    = {International Journal of Neural Systems},
    pages        = {145--166},
    volume        = {2},
    year        = {2001},
    }
[129] Zeng, Xinchuan and Martinez, Tony R., Improving the Hopfield Network through Beam Search, Proceedings of the IEEE International Joint Conference on Neural Networks, 1162--1167, 2001. [Abstract] [Bibtex]
In this paper we propose a beam search mechanism to improve the performance of the \emph{Hopfield network} for solving optimization problems. The beam search readjusts the top {\bf M (M > 1) } activated neurons to more similar activation levels in the early phase of relaxation, so that the network has the opportunity to explore more alternative, potentially better solutions. We evaluated this approach using a large number of simulations (20,000 for each parameter setting), based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}. The results show that the beam search has the capability of significantly improving the network performance over the original Hopfield network, increasing the percentage of valid tours by 17.0% and reducing error rate by 24.3%.
@inproceedings{zeng.ijcnn2001,
    title        = {Improving the Hopfield Network through Beam Search},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks},
    pages        = {1162--1167},
    year        = {2001},

    }
[130] Rimer, Michael E. and Andersen, Tim L. and Martinez, Tony R., Improving Backpropagation Ensembles through Lazy Training, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 2007--2012, 2001. [Abstract] [Bibtex]
Backpropagation, similar to most high-order learning algorithms, is prone to overfitting. We address this issue by introducing interactive training ({IT}), a logical extension to backpropagation training that employs interaction among multiple networks. This method is based on the theery that centralized control is more effective for learning in deep problem spaces in a multigent paradigm. {IT} methods allow networks to work together to form more complex systems while not restraining their individual ability to specialize. Lazy training, an implementation of {IT} that minimizes misclassification error, is presented. Lazy training discourages overfitting and is conducive to higher accuracy in multiclass problems than standard backpropagation. Experiments on a large, real world {OCR} data set have shown interactive training to significantly increase generalization accuracy, from 97.86% to 99.11%. These results are supported by theoretical and conceptual extensions from algorithmic to interactive training models.
@inproceedings{rimer.ijcnn01a,
    title        = {Improving Backpropagation Ensembles through Lazy Training},
    author        = {Rimer, Michael E. and Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {2007--2012},
    year        = {2001},

    }
[131] Andersen, Tim L. and Rimer, Michael E. and Martinez, Tony R., Optimal Artificial Neural Network Architecture Selection for Voting, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 790--795, 2001. [Abstract] [Bibtex]
This paper studies the performance of standard architecture selection strategies, such as cost/performance and {CV} based strategies, for voting methods such as bagging. It is shown that standard architecture selection strategies are not optimal for voting methods and tend to underestimate the complexity of the optimal network architecture, since they only examine the performance of the network on an individual basis and do not consider the correlation between responses from multiple networks.
@inproceedings{andersen.ijcnn01.oas,
    title        = {Optimal Artificial Neural Network Architecture Selection for Voting},
    author        = {Andersen, Tim L. and Rimer, Michael E. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {790--795},
    year        = {2001},

    }
[132] Andersen, Tim L. and Martinez, Tony R., Optimal Artificial Neural Network Architecture Selection for Bagging, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 790--795, 2001. [Abstract] [Bibtex]
This paper studies the performance of standard architecture selection strategies, such as cost/performance and {CV} based strategies, for voting methods such as bagging. It is shown that standard architecture selection strategies are not optimal for voting methods and tend to underestimate the complexity of the optimal network architecture, since they only examine the performance of the network on an individual basis and do not consider the correlation between responses from multiple networks.
@inproceedings{andersen_2001b,
    title        = {Optimal Artificial Neural Network Architecture Selection for Bagging},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {790--795},
    year        = {2001},

    }
[133] Zeng, Xinchuan and Martinez, Tony R., Graded Rescaling in Hopfield Networks, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, 63--67, 2001. [Abstract] [Bibtex]
In this work we propose a method with the capability of improving the performance of the \emph{Hopfield network} for solving optimization problems by using a \emph{graded rescaling} scheme on the distance matrix of the energy function. This method controls the magnitude of rescaling by adjusting a parameter (scaling factor) in order to explore the optimal range for performance. We have evaluated different scaling factors through 20,000 simulations, based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}. The results show that the graded rescaling can improve the performance significantly for a wide range of scaling factors. It increases the percentage of valid tours by 72.2%, reduces the error rate of tour length by 10.2%, and increases the chance of finding optimal tours by 39.0%, as compared to the original Hopfield network without rescaling.
@inproceedings{zeng.icannga2001,
    title        = {Graded Rescaling in Hopfield Networks},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms},
    pages        = {63--67},
    year        = {2001},

    }
[134] Rimer, Michael E. and Andersen, Tim L. and Martinez, Tony R., Speed Training: Improving Learning Speed for Large Data Sets, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'01, 2662--2666, 2001. [Abstract] [Bibtex]
Artificial neural networks provide an effective empirical predictive model for pattern classification. However, using complex neural networks to learn very large training sets is often problematic, imposing prohibitive time constraints on the training process. We present four practical methods for dramatically decreasing training time through dynamic stochastic sample presentation, a technique we call speed training. These methods are shown to be robust to retaining generalization accuracy over a diverse collection of real world data sets. In particular, the {SET} technique achieves a training speedup of 4278% on a large {OCR} database with no detectable loss in generalization.
@inproceedings{rimer.ijcnn01b,
    title        = {Speed Training: Improving Learning Speed for Large Data Sets},
    author        = {Rimer, Michael E. and Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'01},
    pages        = {2662--2666},
    year        = {2001},

    }
[135] Owens, Nancy and Peterson, Todd S., Using a Reinforcement Learning Controller to Overcome Simulator/Environment Discrepancies., Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, 2001. [Bibtex]
@inproceedings{fulda_2001c,
    title        = {Using a Reinforcement Learning Controller to Overcome Simulator/Environment Discrepancies.},
    author        = {Owens, Nancy and Peterson, Todd S.},
    booktitle    = {Proceedings of the {IEEE} Conference on Systems, Man, and Cybernetics},
    year        = {2001},
    }
[136] Ventura, Dan, On the Utility of Entanglement in Quantum Neural Computing, Proceedings of the International Joint Conference on Neural Networks, 1565--1570, 2001, July. [Abstract] [Bibtex]
Efforts in combining quantum and neural computation are briefly discussed and the concept of entanglement as it applies to this subject is addressed. Entanglement is perhaps the least understood aspect of quantum systems used for computation, yet it is apparently most responsible for their computational power. This paper argues for the importance of understanding and utilizing entanglement in quantum neural computation.
@inproceedings{ventura.ijcnn01,
    title        = {On the Utility of Entanglement in Quantum Neural Computing},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {1565--1570},
    month        = {July},
    year        = {2001},

    }
[137] Zeng, Xinchuan and Martinez, Tony R., An Algorithm for Correcting Mislabeled Data, Intelligent Data Analysis, 5, 491--502, 2001. [Abstract] [Bibtex]
Reliable evaluation for the performance of classifiers depends on the quality of the data sets on which they are tested. During the collecting and recording of a data set, however, some noise may be introduced into the data, especially in various real-world environments, which can degrade the quality of the data set. In this paper, we present a novel approach, called {\bf {ADE}} (automatic data enhancement), to correct mislabeled data in a data set. In addition to using multi-layer neural networks trained by backpropagation as the basic framework, {\bf {ADE}} assigns each training pattern a class probability vector as its class label, in which each component represents the probability of the corresponding class. During training, {\bf {ADE}} constantly updates the probability vector based on its difference from the output of the network. With this updating rule, the probability of a mislabeled class gradually becomes smaller while that of the correct class becomes larger, which eventually causes the correction of mislabeled data after a number of training epochs. We have tested {\bf {ADE}} on a number of data sets drawn from the {\bf {UCI}} data repository for nearest neighbor classifiers. The results show that for most data sets, when there exists mislabeled data, a classifier constructed using a training set corrected by {\bf {ADE}} can achieve significantly higher accuracy than that without using {\bf {ADE}}.
@article{zeng.ida2001,
    title        = {An Algorithm for Correcting Mislabeled Data},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    journal    = {Intelligent Data Analysis},
    pages        = {491--502},
    volume        = {5},
    year        = {2001},

    }
[138] Zeng, Xinchuan and Martinez, Tony R., Using a Neural Network to Approximate an Ensemble of Classifiers, Neural Processing Letters, 12, 225--237, 2000. [Abstract] [Bibtex]
Several methods (e.g., Bagging, Boosting) of constructing and combining an ensemble of classifiers have recently been shown capable of improving accuracy of a class of commonly used classifiers (e.g., decision trees, neural networks). The accuracy gain achieved, however, is at the expense of a higher requirement for storage and computation. This storage and computation overhead can decrease the utility of these methods when applied to real-world situations. In this paper, we propose a learning approach which allows a single neural network to approximate a given ensemble of classifiers. Experiments on a large number of real-world data sets show that this approach can substantially save storage and computation while still maintaining accuracy similar to that of the entire ensemble.
@article{zeng.npl2000,
    title        = {Using a Neural Network to Approximate an Ensemble of Classifiers},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    journal    = {Neural Processing Letters},
    pages        = {225--237},
    volume        = {12},
    year        = {2000},

    }
[139] Wilson, D. Randall and Martinez, Tony R., An Integrated Instance-Based Learning Algorithm, Computational Intelligence, 1, 1--28, 2000. [Abstract] [Bibtex]
The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines these methods into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm ({IDIBL}) that seeks to reduce storage, improve execution speed, and increase generalization accuracy, when compared to the basic nearest neighbor algorithm and other learning models. {IDIBL} tunes its own parameters using a new measure of fitness that combines confidence and cross-validation ({CVC}) accuracy in order to avoid discretization problems with more traditional leave-one-out cross-validation ({LCV}). In our experiments {IDIBL} achieves higher generalization accuracy than other less comprehensive instance-based learning algorithms, while requiring less than one fourth the storage of the nearest neighbor algorithm and improving execution speed by a corresponding factor. In experiments on 21 datasets, {IDIBL} also achieves higher generalization accuracy than those reported for 16 major machine learning and neural network models.
@article{wilson.ci2000.idibl,
    title        = {An Integrated Instance-Based Learning Algorithm},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    journal    = {Computational Intelligence},
    pages        = {1--28},
    volume        = {1},
    year        = {2000},

    }
[140] Wilson, D. Randall and Martinez, Tony R., The Inefficiency of Batch Training for Large Training Sets,, Proceedings of the International Joint Conference on Neural Networks (IJCNN2000), II, 113--117, 2000, July. [Abstract] [Bibtex]
Multilayer perceptrons are often trained using error backpropagation ({BP}). {BP} training can be done in either a batch or continuous manner. Claims have frequently been made that batch training is faster and/or more ``correct'' than continuous training because it uses a better approximation of the true gradient for its weight updates. These claims are often supported by empirical evidence on very small data sets. These claims are untrue, however, for large training sets. This paper explains why batch training is much slower than continuous training for large training sets. Various levels of semi-batch training used on a 20,000-instance speech recognition task show a roughly linear increase in training time required with an increase in batch size.
@inproceedings{wilson.ijcnn2000.batch,
    title        = {The Inefficiency of Batch Training for Large Training Sets,},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks ({IJCNN2000})},
    pages        = {113--117},
    volume        = {{II}},
    month        = {July},
    year        = {2000},

    }
[141] Zeng, Xinchuan and Martinez, Tony R., Rescaling the Energy Function in the Hopfield Network, Proceedings of the IEEE International Joint Conference on Neural Networks, 6, 498--502, 2000. [Abstract] [Bibtex]
In this paper we propose an approach that rescales the distance matrix of the energy function in the \emph{Hopfield network} for solving optimization problems. We rescale the distance matrix by normalizing each row in the matrix and then adjusting the parameter for the distance term. This scheme has the capability of reducing the effects of clustering in data distributions, which is one of main reasons for the formation of invalid solutions. We evaluate this approach through a large number (20,000) simulations based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}. The result shows that, compared to those using the original Hopfield network, rescaling is capable of increasing the percentage of valid tours by 17.6%, reducing the error rate of tour length by 11.9%, and increasing the chance of finding optimal tours by 14.3%.
@inproceedings{zeng.ijcnn2000,
    title        = {Rescaling the Energy Function in the Hopfield Network},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks},
    pages        = {498--502},
    volume        = {6},
    year        = {2000},

    }
[142] Zeng, Xinchuan and Martinez, Tony R., Distribution-Balanced Stratified Cross-Validation for Accuracy Estimation, Journal of Experimental and Theoretical Artificial Intelligence, 12, 1--12, 2000. [Abstract] [Bibtex]
Cross-validation has often been applied in machine learning research for estimating the accuracies of classifiers. In this work, we propose an extension to this method, called \emph{distribution-balanced stratified cross-validation} ({\bf DBSCV}), which improves the estimation quality by providing balanced intraclass distributions when partitioning a data set into multiple folds. We have tested {\bf DBSCV} on nine real-world and three artificial domains using the {\bf C4.5} decision trees classifier. The results show that {\bf DBSCV} performs better (has smaller biases) than the regular stratified cross-validation in most cases, especially when the number of folds is small. The analysis and experiments based on three artificial data sets also reveal that {\bf DBSCV} is particularly effective when multiple intraclass clusters exist in a data set.
@article{zeng.jetai2000,
    title        = {Distribution-Balanced Stratified Cross-Validation for Accuracy Estimation},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    journal    = {Journal of Experimental and Theoretical Artificial Intelligence},
    pages        = {1--12},
    volume        = {12},
    year        = {2000},

    }
[143] Howell, John and Yeazell, John and Ventura, Dan, Optically Simulating a Quantum Associative Memory, Physical Review A, 62, 2000, Article 42303. [Abstract] [Bibtex]
This paper discusses the realization of a quantum associative memory using linear integrated optics. An associative memory produces a full pattern of bits when presented with only a partial pattern. Quantum computers have the potential to store large numbers of patterns and hence have the ability to far surpass any classical neural network realization of an associative memory. In this work two 3-qubit associative memories will be discussed using linear integrated optics. In addition, corrupted, invented and degenerate memories are discussed.
@article{howell.pra00,
    title        = {Optically Simulating a Quantum Associative Memory},
    author        = {Howell, John and Yeazell, John and Ventura, Dan},
    journal    = {Physical Review A},
    volume        = {62},
    year        = {2000},
    note        = {Article 42303},

    }
[144] Ezhov, Alexandr and Nifanova, A. and Ventura, Dan, Distributed Queries for Quantum Associative Memory, Information Sciences , 3-4, 271--293, 2000. [Abstract] [Bibtex]
This paper discusses a model of quantum associative memory which generalizes the completing associative memory proposed by Ventura and Martinez. Similar to this model, our system is based on Grover's well known algorithm for searching an unsorted quantum database. However, the model presented in this paper suggests the use of a distributed query of general form. It is demonstrated that spurious memories form an unavoidable part of the quantum associative memory model; however, the very presence of these spurious states provides the possibility of organizing a controlled process of data retrieval using a specially formed initial state of the quantum database and also of the transformation performed upon it. Concrete examples illustrating the properties of the proposed model are also presented.
@article{ezhov.is00,
    title        = {Distributed Queries for Quantum Associative Memory},
    author        = {Ezhov, Alexandr and Nifanova, A. and Ventura, Dan},
    journal    = {Information Sciences },
    pages        = {271--293},
    volume        = {3-4},
    year        = {2000},

    }
[145] Wilson, D. Randall and Martinez, Tony R., Reduction Techniques for Exemplar-Based Learning Algorithms, Machine Learning, 3, 257--286, 2000, March. [Abstract] [Bibtex]
Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called {DROP1}-5 and {DEL} that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the {DROP} algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.
@article{wilson.ml2000.drop,
    title        = {Reduction Techniques for Exemplar-Based Learning Algorithms},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    journal    = {Machine Learning},
    pages        = {257--286},
    volume        = {3},
    month        = {March},
    year        = {2000},

    }
[146] Ventura, Dan and Martinez, Tony R., Quantum Associative Memory, Information Sciences , 1-4, 273--296, 2000. [Abstract] [Bibtex]
This paper combines quantum computation with classical neural network theory to produce a quantum computational learning algorithm. Quantum computation uses microscopic quantum level effects to perform computational tasks and has produced results that in some cases are exponentially faster than their classical counterparts. The unique characteristics of quantum theory may also be used to create a quantum associative memory with a capacity exponential in the number of neurons. This paper combines two quantum computational algorithms to produce such a quantum associative memory. The result is an exponential increase in the capacity of the memory when compared to traditional associative memories such as the Hopfield network. The paper covers necessary high-level quantum mechanical and quantum computational ideas and introduces a quantum associative memory. Theoretical analysis proves the utility of the memory, and it is noted that a small version should be physically realizable in the near future.
@article{ventura.is00,
    title        = {Quantum Associative Memory},
    author        = {Ventura, Dan and Martinez, Tony R.},
    journal    = {Information Sciences },
    pages        = {273--296},
    volume        = {1-4},
    year        = {2000},

    }
[147] Ventura, Dan, Learning Quantum Operators, Proceedings of the Joint Conference on Information Sciences, 750--752, 2000, March. [Abstract] [Bibtex]
Consider the system $F|x>=|y>$ where $F$ is unknown. We examine the possibility of learning the operator $F$ inductively, drawing analogies with ideas from classical computational learning.
@inproceedings{ventura.jcis00,
    title        = {Learning Quantum Operators},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    pages        = {750--752},
    month        = {March},
    year        = {2000},

    }
[148] Ezhov, Alexandr and Ventura, Dan, Quantum Neural Networks, Future Directions for Intelligent Systems and Information Science 2000, 2000, Kasabov, N., Physica-Verlag. [Abstract] [Bibtex]
This chapter outlines the research, development and perspectives of quantum neural networks -- a burgeoning new field which integrates classical neurocomputing with quantum computation. It is argued that the study of quantum neural networks may give us both new undestanding of brain function as well as unprecedented possibilities in creating new systems for information processing, including solving classically intractable problems, associative memory with exponential capacity and possibly overcoming the limitations posed by the Church-Turing thesis.
@article{ezhov.fdisis00,
    title        = {Quantum Neural Networks},
    author        = {Ezhov, Alexandr and Ventura, Dan},
    editor        = {Kasabov, N.},
    journal    = {Future Directions for Intelligent Systems and Information Science 2000},
    year        = {2000},
    address    = {Physica-Verlag},

    }
[149] Jensen, Lee S. and Martinez, Tony R., Improving Text Classification using Conceptual and Contextual Features, 101--102, 2000, KDD 2000, Text Mining Workshop, Boston.. [Abstract] [Bibtex]
The exponential growth of text available on the Internet has created a critical need for accurate, fast, and general-purpose text classification algorithms. This paper examines the improvement of broadly based text classification by using features that are easily extracted from training documents. These features represent both the conceptual and contextual properties of a target class, and include synonyms, hypernyms, term frequency, and bigrams of nouns, synonyms and hypernyms. Multiple permutations of the features are applied to three different classification models (Coordinate matching, {TF}*{IDF}, and naive Bayes) over three diverse data sets (Reuters, {USENET}, and folk songs). The findings are also compared to previously published results for the rule-based learner Ripper and results obtained by using another naive Bayes classifier, Rainbow. Suggestions are made about how to automatically determine which features to use, based upon the data set in question. The results demonstrate that the introduction of both conceptual and contextual features decreases the error by as much as 33%.
@misc{jensen,
    title        = {Improving Text Classification using Conceptual and Contextual Features},
    author        = {Jensen, Lee S. and Martinez, Tony R.},
    howpublished    = {{KDD} 2000, Text Mining Workshop, Boston.},
    pages        = {101--102},
    year        = {2000},

    }
[150] Ventura, Dan, Implementing Competitive Learning in a Quantum System, Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), paper 513, 1999. [Abstract] [Bibtex]
Ideas from quantum computation are applied to the field of neural networks to produce competitive learning in a quantum system. The resulting quantum competitive learner has a prototype storage capacity that is exponentially greater than that of its classical counterpart. Further, empirical results from simulation of the quantum competitive learning system on real-world data sets demonstrate the quantum system's potential for excellent performance.
@inproceedings{ventura.ijcnn99a,
    title        = {Implementing Competitive Learning in a Quantum System},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks ({IJCNN}'99), paper 513},
    year        = {1999},

    }
[151] Ventura, Dan and Martinez, Tony R., A Quantum Associative Memory Based on Grover's Algorithm, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, 22--27, 1999, April. [Abstract] [Bibtex]
Quantum computation uses microscopic quantum level effects to perform computational tasks and has produced results that in some cases are exponentially faster than their classical counterparts. The unique characteristics of quantum theory may also be used to create a quantum associative memory with a capacity exponential in the number of neurons. This paper combines two quantum computational algorithms to produce a quantum associative memory. The result is an exponential increase in the capacity of the memory when compared to traditional associative memories such as the Hopfield network. This paper covers necessary high- level quantum mechanical ideas and introduces a quantum associative memory, a small verson of which should be physically realizable in the near future.
@inproceedings{ventura.icannga99,
    title        = {A Quantum Associative Memory Based on Grover's Algorithm},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms},
    pages        = {22--27},
    month        = {April},
    year        = {1999},

    }
[152] Wilson, D. Randall and Ventura, Dan and Moncur, Brian and Martinez, Tony R., The Robustness of Relaxation Rates in Constraint Satisfaction Networks, Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), paper 162, 1999. [Abstract] [Bibtex]
Constraint satisfaction networks contain nodes that receive weighted evidence from external sources and/or other nodes. A relaxation process allows the activation of nodes to affect neighboring nodes, which in turn can affect their neighbors, allowing information to travel through a network. When doing discrete updates (as in a software implementation of a relaxation network), a goal net or goal activation can be computed in response to the net input into a node, and a relaxation rate can then be used to determine how fast the node moves from its current value to its goal value. An open question was whether or not the relaxation rate is a sensitive parameter. This paper shows that the relaxation rate has almost no effect on how information flows through the network as long as it is small enough to avoid large discrete steps and/or oscillation.
@inproceedings{wilson.ijcnn99.relax,
    title        = {The Robustness of Relaxation Rates in Constraint Satisfaction Networks},
    author        = {Wilson, D. Randall and Ventura, Dan and Moncur, Brian and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks ({IJCNN}'99), paper 162},
    year        = {1999},

    }
[153] Andersen, Tim L. and Martinez, Tony R., Cross Validation and MLP Architecture Selection, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'99, CD paper #192, 1999. [Bibtex]
@inproceedings{andersen.ijcnn99.cv,
    title        = {Cross Validation and {MLP} Architecture Selection},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'99, {CD} paper #192},
    year        = {1999},
    }
[154] Zeng, Xinchuan and Martinez, Tony R., A New Relaxation Procedure in the Hopfield Network for Solving Optimization Problems, Neural Processing Letters, 10, 1--12, 1999. [Abstract] [Bibtex]
When solving an optimization problem with a Hopfield network, a solution is obtained after the network is relaxed to an equilibrium state. The relaxation process is an important step in achieving a solution. In this paper, a new procedure for the relaxation process is proposed. In the new procedure, the amplified signal received by a neuron from other neurons is treated as the target value for its activation (output) value. The activation of a neuron is updated directly based on the difference between its current activation and the received target value, without using the updating of the input value as an intermediate step. A \emph{relaxation rate} is applied to control the updating scale for a smooth relaxation process. The new procedure is evaluated and compared with the original procedure in the Hopfield network through simulations based on 200 randomly generated instances of the 10-city \emph{traveling salesman problem}. The new procedure reduces the error rate by 34.6% and increases the percentage of valid tours by 194.6% as compared with the original procedure.
@article{zeng.npl99,
    title        = {A New Relaxation Procedure in the Hopfield Network for Solving Optimization Problems},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    journal    = {Neural Processing Letters},
    pages        = {1--12},
    volume        = {10},
    year        = {1999},

    }
[155] Ventura, Dan and Wilson, D. Randall and Moncur, Brian and Martinez, Tony R., A Neural Model of Centered Tri-gram Speech Recognition, Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), paper 2188, 1999. [Abstract] [Bibtex]
A relaxation network model that includes higher order weight connections is introduced. To demonstrate its utility, the model is applied to the speech recognition domain. Traditional speech recognition systems typically consider only that context preceding the word to be recognized. However, intuition suggests that considering following context as well as preceding context should improve recognition accuracy. The work described here tests this hypothesis by applying the higher order relaxation network to consider both precedes and follows context in a speech recognition task. The results demonstrate both the general utility of the higher order relaxation network as well as its improvement over traditional methods on a speech recognition task.
@inproceedings{ventura.ijcnn99b,
    title        = {A Neural Model of Centered Tri-gram Speech Recognition},
    author        = {Ventura, Dan and Wilson, D. Randall and Moncur, Brian and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks ({IJCNN}'99), paper 2188},
    year        = {1999},

    }
[156] Zeng, Xinchuan and Martinez, Tony R., Improving the Performance of the Hopfield Network By Using A Relaxation Rate, Proceedings of the International Conference on Neural Networks and Genetic Algorithms, 73--77, 1999. [Abstract] [Bibtex]
In the \emph{Hopfield network}, a solution of an optimization problem is obtained after the network is relaxed to an equilibrium state. This paper shows that the performance of the Hopfield network can be improved by using a \emph{relaxation rate} to control the relaxation process. Analysis suggests that the relaxation process has an important impact on the quality of a solution. A \emph{relaxation rate} is then introduced to control the relaxation process in order to achieve solutions with better quality. Two types of relaxation rate (\emph{constant} and \emph{dynamic}) are proposed and evaluated through simulations based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}. The result shows that using a relaxation rate can decrease the error rate by 9.87% and increase the percentage of valid tours by 14.0% as compared to those without using a relaxation rate. Using a dynamic relaxation rate can further decrease the error rate by 4.2% and increase the percentage of valid tours by 0.4% as compared to those using a constant relaxation rate.
@inproceedings{zeng.icannga99b,
    title        = {Improving the Performance of the Hopfield Network By Using A Relaxation Rate},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Neural Networks and Genetic Algorithms},
    pages        = {73--77},
    year        = {1999},

    }
[157] Andersen, Tim L. and Martinez, Tony R., The Little Neuron that Could, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'99, CD paper #191, 1999. [Bibtex]
@inproceedings{andersen.ijcnn1999.wag,
    title        = {The Little Neuron that Could},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'99, {CD} paper #191},
    year        = {1999},
    }
[158] Wilson, D. Randall and Martinez, Tony R., Combining Cross-Validation and Confidence to Measure Fitness, Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), paper 163, 1999. [Abstract] [Bibtex]
Neural network and machine learning algorithms often have parameters that must be tuned for good performance on a particular task. Leave-one-out cross-validation ({LCV}) accuracy is often used to measure the fitness of a set of parameter values. However, small changes in parameters often have no effect on {LCV} accuracy. Many learning algorithms can measure the confidence of a classification decision, but often confidence alone is an inappropriate measure of fitness. This paper proposes a combined measure of Cross-Validation and Confidence ({CVC}) for obtaining a continuous measure of fitness for sets of parameters in learning algorithms. This paper also proposes the Refined Instance-Based ({RIB}) learning algorithm which illustrates the use of {CVC} in automated parameter tuning. Using {CVC} provides significant improvement in generalization accuracy on a collection of 31 classification tasks when compared to using {LCV}.
@inproceedings{wilson.ijcnn99.cvc,
    title        = {Combining Cross-Validation and Confidence to Measure Fitness},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks ({IJCNN}'99), paper 163},
    year        = {1999},

    }
[159] Zeng, Xinchuan and Martinez, Tony R., Extending the Power and Capacity of Constraint Satisfaction Networks, Proceedings of the International Joint Conference on Neural Networks, 432--437, 1999. [Abstract] [Bibtex]
This work focuses on improving the \emph{Hopfield network} for solving optimization problems. Although much work has been done in this area, the performance of the Hopfield network is still not satisfactory in terms of valid convergence and quality of solutions. We address this issue in this work by combing a new activation function {\bf EBA} and a new relaxation procedure {\bf CR} in order to improve the performance of the Hopfield network. Each of {\bf EBA} and {\bf CR} has been individually demonstrated capable of substantially improving the performance. The combined approach has been evaluated through 20,000 simulations based on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}. The result shows that combining the two methods is able to further improve the performance. Compared to {\bf CR} without combining with {\bf EBA}, the combined approach increases the percentage of valid tours by 21.0% and decreases the error rate by 46.4%. As compared to the original Hopfield method (using neither {\bf EBA} nor {\bf CR}), the combined approach increases the percentage of valid tours by 245.7% and decreases the error rate by 64.1%.
@inproceedings{zeng.ijcnn99,
    title        = {Extending the Power and Capacity of Constraint Satisfaction Networks},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {432--437},
    year        = {1999},

    }
[160] Ventura, Dan, Quantum Computational Intelligence: Answers and Questions, IEEE Intelligent Systems , 4, 14--16, 1999. [Abstract] [Bibtex]
This is a brief article discussing the interesting possibilities and potential difficulties with combining classical computational intelligence with quantum computation. See \url{http://www.computer.org/intelligent/ex1999/pdf/x4009.pdf} for a copy of the article.
@article{ventura.ieeeis99,
    title        = {Quantum Computational Intelligence: Answers and Questions},
    author        = {Ventura, Dan},
    journal    = {{IEEE} Intelligent Systems },
    pages        = {14--16},
    volume        = {4},
    year        = {1999},

    }
[161] Ventura, Dan and Martinez, Tony R., Initializing the Amplitude Distribution of a Quantum State, Foundations of Physics Letters , 6, 547--559, 1999, December. [Abstract] [Bibtex]
To date, quantum computational algorithms have operated on a superposition of all basis states of a quantum system. Typically, this is because it is assumed that some function f is known and implementable as a unitary evolution. However, what if only some points of the function f are known? It then becomes important to be able to encode only the knowledge that we have about f. This paper presents an algorithm that requires a polynomial number of elementary operations for initializing a quantum system to represent only the m known points of a function f.
@article{ventura.fopl99,
    title        = {Initializing the Amplitude Distribution of a Quantum State},
    author        = {Ventura, Dan and Martinez, Tony R.},
    journal    = {Foundations of Physics Letters },
    pages        = {547--559},
    volume        = {6},
    month        = {December},
    year        = {1999},

    }
[162] Zeng, Xinchuan and Martinez, Tony R., A New Activation Function in the Hopfield Network for Solving Optimization Problems, Proceedings of the International Conference on Neural Networks and Genetic Algorithms, 67--72, 1999. [Abstract] [Bibtex]
This paper shows that the performance of the \emph{Hopfield network} for solving optimization problems can be improved by using a new activation (output) function. The effects of the activation function on the performance of the Hopfield network are analyzed. It is shown that the \emph{sigmoid} activation function in the Hopfield network is sensitive to noise of neurons. The reason is that the \emph{sigmoid} function is most sensitive in the range where noise is most predominant. A new activation function that is more robust against noise is proposed. The new activation function has the capability of amplifying the signals between neurons while suppressing noise. The performance of the new activation function is evaluated through simulation. Compared with the \emph{sigmoid} function, the new activation function reduces the error rate of tour length by 30.6% and increases the percentage of valid tours by 38.6% during simulation on 200 randomly generated city distributions of the 10-city \emph{traveling salesman problem}.
@inproceedings{zeng.icannga99a,
    title        = {A New Activation Function in the Hopfield Network for Solving Optimization Problems},
    author        = {Zeng, Xinchuan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Neural Networks and Genetic Algorithms},
    pages        = {67--72},
    year        = {1999},

    }
[163] Andersen, Tim L. and Martinez, Tony R., Constructing Higher Order Perceptrons with Genetic Algorithms, Proceedings of the IEEE International Joint Conference on Neural Networks IJCNN'98, 1920--1925, 1998. [Bibtex]
@inproceedings{andersen.ijcnn1998,
    title        = {Constructing Higher Order Perceptrons with Genetic Algorithms},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} International Joint Conference on Neural Networks {IJCNN}'98},
    pages        = {1920--1925},
    year        = {1998},
    }
[164] Ventura, Dan, Artificial Associative Memory using Quantum Processes, Proceedings of the Joint Conference on Information Sciences, 2, 218--221, 1998, October. [Abstract] [Bibtex]
This paper discusses an approach to constructing an artificial quantum associative memory ({QuAM}). The {QuAM} makes use of two quantum computational algorithms, one for pattern storage and the other for pattern recall. The result is an exponential increase in the capacity of the memory when compared to traditional associative memories such as the Hopfield network. Further, the paper argues for considering pattern recall as a non-unitary process and demonstrates the utility of non-unitary operators for improving the pattern recall performance of the {QuAM}.
@inproceedings{ventura.jcis98,
    title        = {Artificial Associative Memory using Quantum Processes},
    author        = {Ventura, Dan},
    booktitle    = {Proceedings of the Joint Conference on Information Sciences},
    pages        = {218--221},
    volume        = {2},
    month        = {October},
    year        = {1998},

    }
[165] Zeng, Xinchuan, Improving the Performance of the Hopfield Network for Solving Optimization Problems, 1998, Brigham Young University, Computer Science Department, December. [Bibtex]
@mastersthesis{zeng.thesis98,
    title        = {Improving the Performance of the Hopfield Network for Solving Optimization Problems},
    author        = {Zeng, Xinchuan},
    school    = {Brigham Young University},
    address    = {Computer Science Department},
    month        = {December},
    year        = {1998},
    }
[166] Ventura, Dan and Martinez, Tony R., Quantum Associative Memory with Exponential Capacity, Proceedings of the International Joint Conference on Neural Networks, 509--13, 1998, May. [Abstract] [Bibtex]
Quantum computation uses microscopic quantum level effects to perform computational tasks and has produced results that in some cases are exponentially faster than their classical counterparts by taking advantage of quantum parallelism. The unique characteristics of quantum theory may also be used to create a quantum associative memory with a capacity exponential in the number of neurons. This paper covers necessary high-level quantum mechanical ideas and introduces a simple quantum associative memory. Further, it provides discussion, empirical results and directions for future work.
@inproceedings{ventura.ijcnn98a,
    title        = {Quantum Associative Memory with Exponential Capacity},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {509--13},
    month        = {May},
    year        = {1998},

    }
[167] Ventura, Dan, Quantum and Evolutionary Approaches to Computational Learning, 1998, Brigham Young University, Computer Science Department, August. [Abstract] [Bibtex]
This dissertation presents two methods for attacking the problem of high dimensional spaces inherent in most computational learning problems. The first approach is a hybrid system for combining the thorough search capabilities of evolutionary computation with the speed and generalization of neural computation. This neural/evolutionary hybrid is utilized in three different settings: to address the problem of data acquisition for training a supervised learning system; as a learning optimization system; and as a system for developing neurocontrol. The second approach is the idea of quantum computational learning that overcomes the ``curse of dimensionality'' by taking advantage of the massive state space of quantum systems to process information in a way that is classically impossible. The quantum computational learning approach results in the development of a neuron with quantum mechanical properties, a quantum associative memory and a quantum computational learning system for inductive learning.
@phdthesis{ventura.dissertation,
    title        = {Quantum and Evolutionary Approaches to Computational Learning},
    author        = {Ventura, Dan},
    school    = {Brigham Young University},
    address    = {Computer Science Department},
    month        = {August},
    year        = {1998},

    }
[168] Ventura, Dan and Martinez, Tony R., Optimal Control Using a Neural/Evolutionary Hybrid System, Proceedings of the International Joint Conference on Neural Networks, 1036--41, 1998, May. [Abstract] [Bibtex]
One of the biggest hurdles to developing neurocontrollers is the difficulty in establishing good training data for the neural network. We propose a hybrid approach to the development of neurocontrollers that employs both evolutionary computation ({EC}) and neural networks ({NN}). {EC} is used to discover appropriate control actions for specific plant states. The survivors of the evolutionary process are used to construct a training set for the {NN}. The {NN} learns the training set, is able to generalize to new plant states, and is then used for neurocontrol. Thus the {EC}/{NN} approach combines the broad, parallel search of {EC} with the rapid execution and generalization of {NN} to produce a viable solution to the control problem. This paper presents the {EC}/{NN} hybrid and demonstrates its utility in developing a neurocontroller that demonstrates stability, generalization, and optimality.
@inproceedings{ventura.ijcnn98b,
    title        = {Optimal Control Using a Neural/Evolutionary Hybrid System},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Joint Conference on Neural Networks},
    pages        = {1036--41},
    month        = {May},
    year        = {1998},

    }
[169] Wilson, D. Randall and Martinez, Tony R., Instance Pruning Techniques, Machine Learning: Proceedings of the Fourteenth International Conference (ICML'97), 403--411, 1997, Fisher, D., San Francisco, CA, Morgan Kaufmann Publishers. [Abstract] [Bibtex]
The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
@inproceedings{wilson.icml97.prune,
    title        = {Instance Pruning Techniques},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    editor        = {Fisher, D.},
    booktitle    = {Machine Learning: Proceedings of the Fourteenth International Conference ({ICML}'97)},
    pages        = {403--411},
    year        = {1997},
    address    = {San Francisco, {CA}},
    publisher    = {Morgan Kaufmann Publishers},

    }
[170] Andersen, Tim L. and Martinez, Tony R., Genetic Algorithms and Higher Order Perceptron Networks, Proceedings of the International Workshop on Neural Networks and Neurocontrol, 217--223, 1997. [Bibtex]
@inproceedings{andersen.sian97,
    title        = {Genetic Algorithms and Higher Order Perceptron Networks},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Workshop on Neural Networks and Neurocontrol},
    pages        = {217--223},
    year        = {1997},
    }
[171] Andersen, Tim L. and Martinez, Tony R., Wagging: A learning approach which allows single layer perceptrons to outperform more complex learning algorithms, Submitted to IEEE Transactions on Neural Networks, 1997. [Bibtex]
@inproceedings{andersen.ieee99.wag,
    title        = {Wagging: A learning approach which allows single layer perceptrons to outperform more complex learning algorithms},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Submitted to {IEEE} Transactions on Neural Networks},
    year        = {1997},
    }
[172] Wilson, D. Randall and Martinez, Tony R., Real-Valued Schemata Search Using Statistical Confidence, Proceedings of the 1997 Sian Ka'an International Workshop on Neural Networks and Neurocontrol, 1997. [Abstract] [Bibtex]
Many neural network models must be trained by finding a set of real-valued weights that yield high accuracy on a training set. Other learning models require weights on input attributes that yield high leave-one-out classification accuracy in order to avoid problems associated with irrelevant attributes and high dimensionality. In addition, there are a variety of general problems for which a set of real values must be found which maximize some evaluation function. This paper presents an algorithm for doing a schemata search over a real-valued weight space to find a set of weights (or other real values) for a given evaluation function. The algorithm, called the Real-Valued Schemata Search ({RVSS}) uses the {BRACE} statistical technique [Moore & Lee, 1993] to determine when to narrow the search space. This paper details the {RVSS} approach and gives initial empirical results.
@inproceedings{wilson.sian97.schema,
    title        = {Real-Valued Schemata Search Using Statistical Confidence},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the 1997 Sian Ka'an International Workshop on Neural Networks and Neurocontrol},
    year        = {1997},

    }
[173] Ventura, Dan and Martinez, Tony R., An Artificial Neuron with Quantum Mechanical Properties, Proceedings of the International Conference on Neural Networks and Genetic Algorithms, 482--485, 1997. [Abstract] [Bibtex]
Quantum computation uses microscopic quantum level effects to perform computational tasks and has produced results that in some cases are exponentially faster than their classical counterparts. Choosing the best weights for a neural network is a time consuming problem that makes the harnessing of this quantum parallelism appealing. This paper briefly covers high-level quantum theory and introduces a model for a quantum neuron.
@inproceedings{ventura.icannga97,
    title        = {An Artificial Neuron with Quantum Mechanical Properties},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Neural Networks and Genetic Algorithms},
    pages        = {482--485},
    year        = {1997},

    }
[174] Rudolph, George L. and Martinez, Tony R., A Transformation Strategy for Implementing Distributed Multilayer Feedfoward Networks: Backpropagation Transformation, Future Generation Computer Systems, 6, 547--564, 1997. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and often suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing distributed feed forward networks that use dynamic topologies (dynamic {ANNs}) efficiently in parallel hardware. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. In particular, this paper presents a {LIT} that supports both the standard (static) multilayer backpropagation network, and backpropagation with dynamic extensions. The complexity of both learning and execution algorithms is O(q(N+logM)) for a single pattern, where q is the number of weight layers in the original network, Nis the number of nodes in the widest node layer in the original network, and M is the number of nodes in the transformed network (which is linear in the number hidden nodes in the original network). This paper extends previous work with 2-weight-layer backpropagation networks.
@article{rudolph_97,
    title        = {A Transformation Strategy for Implementing Distributed Multilayer Feedfoward Networks: Backpropagation Transformation},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    journal    = {Future Generation Computer Systems},
    pages        = {547--564},
    volume        = {6},
    year        = {1997},

    }
[175] Wilson, D. Randall and Martinez, Tony R., Improved Heterogeneous Distance Functions, Journal of Artificial Intelligence Research (JAIR), 1, 1--34, 1997. [Abstract] [Bibtex]
Nearest neighbor and instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric ({VDM}) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric ({HVDM}), Interpolated Value Difference Metric ({IVDM}) and the Windowed Value Difference Metric ({WVDM}). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher average classification accuracy than previous distance functions on applications with both nominal and continuous attributes.
@article{wilson.jair97.hvdm,
    title        = {Improved Heterogeneous Distance Functions},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    journal    = {Journal of Artificial Intelligence Research ({JAIR})},
    pages        = {1--34},
    volume        = {1},
    year        = {1997},

    }
[176] Ventura, Dan and Martinez, Tony R., Using Evolutionary Computation to Facilitate Development of Neurocontrol, Proceedings of the International Workshop on Neural Networks and Neurocontrol, 1997, August. [Abstract] [Bibtex]
The field of neurocontrol, in which neural networks are used for control of complex systems, has many potential applications. One of the biggest hurdles to developing neurocontrollers is the difficulty in establishing good training data for the neural network. We propose a hybrid approach to the development of neurocontrollers that employs both evolutionary computation ({EC}) and neural networks ({NN}). The survivors of this evolutionary process are used to construct a training set for the {NN}. The {NN} learns the training set, is able to generalize to new system states, and is then used for neurocontrol. Thus the {EC}/{NN} approach combines the broad, parallel search of {EC} with the rapid execution and generalization of {NN} to produce a viable solution to the control problem. This paper presents the {EC}/{NN} hybrid and demonstrates its utility in developing a neurocontroller for the pole balancing problem.
@inproceedings{ventura.sian97,
    title        = {Using Evolutionary Computation to Facilitate Development of Neurocontrol},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Workshop on Neural Networks and Neurocontrol},
    month        = {August},
    year        = {1997},

    }
[177] Wilson, D. Randall, Advances in Instance-Based Learning Algorithms, 1997, Brigham Young University, Computer Science Department, August. [Abstract] [Bibtex]
The nearest neighbor algorithm and its derivatives, which are often referred to collectively as instance-based learning algorithms, have been successful on a variety of real-world applications. However, in its basic form, the nearest neighbor algorithm suffers from inadequate distance functions, large storage requirements, slow execution speed, a sensitivity to noise and irrelevant attributes, and an inability to adjust its decision surfaces after storing the data. This dissertation presents a collection of papers that seek to overcome each of these disadvantages. The most successful enhancements are combined into a comprehensive system called the Integrated Decremental Instance-Based Learning algorithm, which in experiments on 44 applications achieves higher generalization accuracy than other instance-based learning algorithms. It also yields higher generalization accuracy than that reported for 16 major machine learning and neural network models.
@phdthesis{wilson.phd97,
    title        = {Advances in Instance-Based Learning Algorithms},
    author        = {Wilson, D. Randall},
    school    = {Brigham Young University},
    address    = {Computer Science Department},
    month        = {August},
    year        = {1997},

    }
[178] Wilson, D. Randall and Martinez, Tony R., Bias and the Probability of Generalization, Proceedings of the 1997 International Conference on Intelligent Information Systems (IIS'97), 108--114, 1997. [Abstract] [Bibtex]
In order to be useful, a learning algorithm must be able to generalize well when faced with inputs not previously presented to the system. A bias is necessary for any generalization, and as shown by several researchers in recent years, no bias can lead to strictly better generalization than any other when summed over all possible functions or applications. This paper provides examples to illustrate this fact, but also explains how a bias or learning algorithm can be better than another in practice when the probability of the occurrence of functions is taken into account. It shows how domain knowledge and an understanding of the conditions under which each learning algorithm performs well can be used to increase the probability of accurate generalization, and identifies several of the conditions that should be considered when attempting to select an appropriate bias for a particular problem.
@inproceedings{wilson.iis97.bias,
    title        = {Bias and the Probability of Generalization},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the 1997 International Conference on Intelligent Information Systems ({IIS}'97)},
    pages        = {108--114},
    year        = {1997},

    }
[179] Wilson, D. Randall and Martinez, Tony R., Improved Center Point Selection for Probabilistic Neural Networks, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'97), 514--517, 1997. [Abstract] [Bibtex]
Probabilistic Neural Networks ({PNN}) typically learn more quickly than many neural network models and have had success on a variety of applications. However, in their basic form, they tend to have a large number of hidden nodes. One common solution to this problem is to keep only a randomly-selected subset of the original training data in building the network. This paper presents an algorithm called the Reduced Probabilistic Neural Network ({RPNN}) that seeks to choose a better-than-random subset of the available instances to use as center points of nodes in the network. The algorithm tends to retain non-noisy border points while removing nodes with instances in regions of the input space that are highly homogeneous. In experiments on 22 datasets, the {RPNN} had better average generalization accuracy than two other {PNN} models, while requiring an average of less than one-third the number of nodes.
@inproceedings{wilson.icannga97.rpnn,
    title        = {Improved Center Point Selection for Probabilistic Neural Networks},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms ({ICANNGA}'97)},
    pages        = {514--517},
    year        = {1997},

    }
[180] Wilson, D. Randall and Martinez, Tony R., Instance-Based Learning with Genetically Derived Attribute Weights, Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks (AIE'96), 11--14, 1996, August. [Abstract] [Bibtex]
This paper presents an inductive learning system called the Genetic Instance-Based Learning ({GIBL}) system. This system combines instance-based learning approaches with evolutionary computation in order to achieve high accuracy in the presence of irrelevant or redundant attributes. Evolutionary computation is used to find a set of attribute weights that yields a high estimate of classification accuracy. Results of experiments on 16 data sets are shown, and are compared with a non-weighted version of the instance-based learning system. The results indicate that the generalization accuracy of {GIBL} is somewhat higher than that of the non-weighted system on regular data, and is significantly higher on data with irrelevant or redundant attributes.
@inproceedings{wilson.aie96.gibl,
    title        = {Instance-Based Learning with Genetically Derived Attribute Weights},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks ({AIE}'96)},
    pages        = {11--14},
    month        = {August},
    year        = {1996},

    }
[181] Andersen, Tim L. and Martinez, Tony R., The Effect of Decision Surface Fitness on Dynamic Multi-layer Perceptron Networks, Proceedings of the World Congress on Neural Networks , 177--181, 1996. [Abstract] [Bibtex]
The {DMP1} (Dynamic Multi-layer Perceptron 1) network training method is based upon a divide and conquer approach which builds networks in the form of binary trees, dynamically allocating nodes and layers as needed. This paper introduces the {DMP1} method, and compares the preformance of {DMP1} when using the standard delta rule training method for training individual nodes against the performance of {DMP1} when using a genetic algorithm for training. While the basic model does not require the use of a genetic algorithm for training individual nodes, the results show that the convergence properties of {DMP1} are enhanced by the use of a genetic algorithm with an appropriate fitness function.
@inproceedings{andersen.wcnn96.dmp_ga,
    title        = {The Effect of Decision Surface Fitness on Dynamic Multi-layer Perceptron Networks},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the World Congress on Neural Networks },
    pages        = {177--181},
    year        = {1996},

    }
[182] Ventura, Dan and Martinez, Tony R., A General Evolutionary/Neural Hybrid Approach to Learning Optimization Problems, Proceedings of the World Congress on Neural Networks, 1091--5, 1996. [Abstract] [Bibtex]
A method combining the parallel search capabilities of Evolutionary Computation ({EC}) with the generalization of Neural Networks ({NN}) for solving learning optimization problems is presented. Assuming a fitness function for potential solutions can be found, {EC} can be used to explore the solution space, and the survivors of the evolution can be used as a training set for the {NN} which then generalizes over the entire space. Because the training set is generated by {EC} using a fitness function, this hybrid approach allows explicit control of training set quality.
@inproceedings{ventura.wcnn96,
    title        = {A General Evolutionary/Neural Hybrid Approach to Learning Optimization Problems},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the World Congress on Neural Networks},
    pages        = {1091--5},
    year        = {1996},

    }
[183] Rudolph, George L. and Martinez, Tony R., LIA: A Location-Independent Transformation for ASOCS Adaptive Algorithm 2, International Journal of Neural Systems, 1996. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and often suffer from a number of short-comings as a result. {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. Adaptive Self Organizing Concurrent Systems ({ASOCS}) are a class of learning models with inherently dynamic topologies. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing learning models that use dynamic topologies efficiently in parallel hardware. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. In particular, this paper presents the Location-Independent {ASOCS} ({LIA}) model as a {LIT} for {ASOCS} Adaptive Algorithm 2. The description of {LIA} gives formal definitions for {LIA} algorithms. Because {LIA} implements basic {ASOCS} mechanisms, these definitions provide a formal description of basic {ASOCS} mechanisms in general, in addition to {LIA}.
@article{rudolph.lia96,
    title        = {{LIA}: A Location-Independent Transformation for {ASOCS} Adaptive Algorithm 2},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    journal    = {International Journal of Neural Systems},
    year        = {1996},

    }
[184] Wilson, D. Randall and Martinez, Tony R., Heterogeneous Radial Basis Function Networks, Proceedings of the International Conference on Neural Networks (ICNN'96), 2, 1263--1267, 1996, June. [Abstract] [Bibtex]
Radial Basis Function ({RBF}) networks typically use a distance function designed for numeric attributes, such as Euclidean or city-block distance. This paper presents a heterogeneous distance function which is appropriate for applications with symbolic attributes, numeric attributes, or both. Empirical results on 30 data sets indicate that the heterogeneous distance metric yields significantly improved generalization accuracy over Euclidean distance in most cases involving symbolic attributes.
@inproceedings{wilson.icnn96.hrbf,
    title        = {Heterogeneous Radial Basis Function Networks},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Neural Networks ({ICNN}'96)},
    pages        = {1263--1267},
    volume        = {2},
    month        = {June},
    year        = {1996},

    }
[185] Wilson, D. Randall and Martinez, Tony R., Value Difference Metrics for Continuously Valued Attributes, Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks (AIE'96), 74--78, 1996, August. [Abstract] [Bibtex]
Nearest neighbor and instance-based learning techniques typically handle continuous and linear input values well, but often do not handle symbolic input attributes appropriately. The Value Difference Metric ({VDM}) was designed to find reasonable distance values between symbolic attribute values, but it largely ignores continuous attributes, using discretization to map continuous values into symbolic values. This paper presents two heterogeneous distance metrics, called the Interpolated {VDM} ({IVDM}) and Windowed {VDM} ({WVDM}), that extend the Value Difference Metric to handle continuous attributes more appropriately. In experiments on 21 data sets the new distance metrics achieves higher classification accuracy in most cases involving continuous attributes.
@inproceedings{wilson.aie96.ivdm,
    title        = {Value Difference Metrics for Continuously Valued Attributes},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Intelligence, Expert Systems, and Neural Networks ({AIE}'96)},
    pages        = {74--78},
    month        = {August},
    year        = {1996},

    }
[186] Andersen, Tim L. and Martinez, Tony R., Using Multiple Node Types to Improve the Performance of DMP (Dynamic Multilayer Perceptron), Proceedings of the IASTED International Conference on Artificial Intelligence, Expert Systems and Neural Networks, 249--252, 1996. [Abstract] [Bibtex]
This paper discusses a method for training multi-layer perceptron networks called {DMP2} (Dynamic Multi-layer Perceptron 2). The method is based upon a divide and conquer approach which builds networks in the form of binary trees, dynamically allocating nodes and layers as needed. The focus of this paper is on the effects of using multiple node types within the {DMP} framework. Simulation results show that {DMP2} performs favorably in comparison with other learning algorithms, and that using multiple node types can be beneficial to network performance.
@inproceedings{andersen.iasted96.dmp2,
    title        = {Using Multiple Node Types to Improve the Performance of {DMP} (Dynamic Multilayer Perceptron)},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IASTED} International Conference on Artificial Intelligence, Expert Systems and Neural Networks},
    pages        = {249--252},
    year        = {1996},

    }
[187] Ventura, Dan and Martinez, Tony R., Concerning a General Framework for the Development of Intelligent Systems, Proceedings of the IASTED International Conference on Artificial Intelligence, Expert Systems and Neural Networks, 44--47, 1996. [Abstract] [Bibtex]
There exists on-going debate between Connectionism and Symbolism as to the nature of and approaches to cognition. Many viewpoints exist and various issues seen as important have been raised. This paper suggests that a combination of these methodologies will lead to a better overall model. The paper reviews and assimilates the opinions and viewpoints of these diverse fields and provides a cohesive list of issues thought to be critical to the modeling of intelligence. Further, this list results in a framework for the development of a general, unified theory of cognition.
@inproceedings{ventura.iasted96,
    title        = {Concerning a General Framework for the Development of Intelligent Systems},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IASTED} International Conference on Artificial Intelligence, Expert Systems and Neural Networks},
    pages        = {44--47},
    year        = {1996},

    }
[188] Ventura, Dan and Martinez, Tony R., Robust Optimization Using Training Set Evolution, Proceedings of the International Conference on Neural Networks, 524--8, 1996. [Abstract] [Bibtex]
Training Set Evolution is an eclectic optimization technique that combines evolutionary computation ({EC}) with neural networks ({NN}). The synthesis of {EC} with {NN} provides both initial unsupervised random exploration of the solution space as well as supervised generalization on those initial solutions. An assimilation of a large amount of data obtained over many simulations provides encouraging empirical evidence for the robustness of Evolutionary Training Sets as an optimization technique for feedback and control problems.
@inproceedings{ventura.icnn96,
    title        = {Robust Optimization Using Training Set Evolution},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Neural Networks},
    pages        = {524--8},
    year        = {1996},

    }
[189] Giraud-Carrier, Christophe and Martinez, Tony R., Analysis of the Convergence and Generalization of AA1, Journal of Parallel and Distributed Computing, 26, 125--131, 1995. [Abstract] [Bibtex]
{AA1} is an incremental learning algorithm for Adaptive Self-Organizing Concurrent Systems ({ASOCS}). {ASOCS} are self-organizing, dynamically growing networks of computing nodes. {AA1} learns by discrimination and implements knowledge in a distributed fashion over all the nodes. This paper reviews {AA1} from the perspective of convergence and generalization. A formal proof that {AA1} converges on any arbitrary Boolean instance set is given. A discussion of generalization and other aspects of {AA1}, including the problem of handling inconsistency, follows. Results of simulations with real-world data are presented. They show that {AA1} gives promising generalization.
@article{cgc_95c,
    title        = {Analysis of the Convergence and Generalization of {AA1}},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    journal    = {Journal of Parallel and Distributed Computing},
    pages        = {125--131},
    volume        = {26},
    year        = {1995},

    }
[190] Andersen, Tim L. and Martinez, Tony R., NP-Completeness of Minimum Rule Sets, Proceedings of the 10th International Symposium on Computer and Information Sciences, 411--418, 1995. [Abstract] [Bibtex]
Rule induction systems seek to generate rule sets which are optimal in the complexity of the rule set. This paper develops a formal proof of the {NP}-Completeness of the problem of generating the simplest rule set ({MIN} {RS}) which accurately predicts examples in the training set for a particular type of generalization algorithm and complexity measure. The proof is then informally extended to cover a broader spectrum of complexity measures and learning algorithms.
@inproceedings{andersen_95c,
    title        = {{NP}-Completeness of Minimum Rule Sets},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the 10th International Symposium on Computer and Information Sciences},
    pages        = {411--418},
    year        = {1995},

    }
[191] Giraud-Carrier, Christophe and Martinez, Tony R., AA1*: A Dynamic Incremental Network that Learns by Discrimination, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'95), 45--48, 1995. [Abstract] [Bibtex]
An incremental learning algorithm for a special class of self-organising, dynamic networks is presented. Learning is effected by adapting both the function performed by the nodes and the overall network topology, so that the network grows (or shrinks) over time to fit the problem. Convergence is guaranteed on any arbitrary Boolean dataset and empirical generalisation results demonstrate promise.
@inproceedings{cgc_95b,
    title        = {{AA1}*: A Dynamic Incremental Network that Learns by Discrimination},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms ({ICANNGA}'95)},
    pages        = {45--48},
    year        = {1995},

    }
[192] Giraud-Carrier, Christophe and Martinez, Tony R., An Integrated Framework for Learning and Reasoning, Journal of Artificial Intelligence Research, 3, 147--185, 1995. [Abstract] [Bibtex]
Learning and reasoning are both aspects of what is considered to be intelligence. Their studies within {AI} have been separated historically, learning being the topic of machine learning and neural networks, and reasoning falling under classical (or sym b olic) {AI}. However, learning and reasoning are in many ways interdependent. This paper discusses the nature of some of these interdependencies, and proposes a general framework called {FLARE}, that combines inductive learning using prior knowledge together with reasoning. Several examples are presented that serve as a benchmark to test the framework, including classical induction, several commonsense protocols, and the use of reasoning to discover prior knowledge that can be used as a learning bias for inductive learning.
@article{cgc_95a,
    title        = {An Integrated Framework for Learning and Reasoning},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    journal    = {Journal of Artificial Intelligence Research},
    pages        = {147--185},
    volume        = {3},
    year        = {1995},

    }
[193] Zarndt, Frederick, A Comprehensive Case Study: An Examination of Connectionist and Machine Learning Algorithms, 1995, Brigham Young University, June. [Abstract] [Bibtex]
This case study examines the performance of 16 well-known and widely-used inductive learning algorithms on a comprehensive collection (102) of learning problems. It is distinguished from other, similar studies by the number of learning models used, the number of problems examined, and the rigor with which it has been conducted. The results of future case studies, which are done with the method and tools presented here, with learning problems used in this study, and with the same rigor, should be readily reproducible and directly comparable to the results of this study. An extensive set of tools is offered.
@mastersthesis{Zarndt.thesis95,
    title        = {A Comprehensive Case Study: An Examination of Connectionist and Machine Learning Algorithms},
    author        = {Zarndt, Frederick},
    school    = {Brigham Young University},
    month        = {June},
    year        = {1995},

    }
[194] Andersen, Tim L. and Martinez, Tony R., Learning and Generalization with Bounded Order Rule Sets, Proceedings of the 10th International Symposium on Computer and Information Sciences, 419--426, 1995. [Abstract] [Bibtex]
All current rule-based methods found in the literature use some form of heuristic(s) in order to limit the size of the rule search space examined by the learning algorithm. This paper is an attempt to determine (mainly from an empirical standpoint) how generalization performance is affected when certain areas of the rule search space are ignored, as compared to when the entire search space is considered. This is done by exhaustively generating all rules for several small real-world problems and then determining how accuracy decreases as the size of the search space is iteratively reduced. The results show that higher-order rules are not required to approximate many real world learning problems. In dealing with the above question, several methods for inducing rules and using them for classification of novel examples are tested.
@inproceedings{andersen_95b,
    title        = {Learning and Generalization with Bounded Order Rule Sets},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the 10th International Symposium on Computer and Information Sciences},
    pages        = {419--426},
    year        = {1995},

    }
[195] Ventura, Dan and Martinez, Tony R., Using Multiple Statistical Prototypes to Classify Continuously Valued Data, Proceedings of the International Symposium on Neuroinformatics and Neurocomputers, 238--245, 1995. [Abstract] [Bibtex]
Multiple Statistical Prototypes ({MSP}) is a modification of a standard minimum distance classification scheme that generates multiple prototypes per class using a modified greedy heuristic. Empirical comparison of {MSP} with other well-known learning algorithms shows {MSP} to be a robust algorithm that uses a very simple premise to produce good generalization and achieve parsimonious hypothesis representation.
@inproceedings{ventura.isninc95,
    title        = {Using Multiple Statistical Prototypes to Classify Continuously Valued Data},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Symposium on Neuroinformatics and Neurocomputers},
    pages        = {238--245},
    year        = {1995},

    }
[196] Ventura, Dan and Martinez, Tony R., An Empirical Comparison of Discretization Models, Proceedings of the 10th International Symposium on Computer and Information Sciences, 443--450, 1995. [Abstract] [Bibtex]
Many machine learning and neurally inspired algorithms are limited, at least in their pure form, to working with nominal data. However, for many real-world problems, some provision must be made to support processing of continuously valued data. This paper presents empirical results obtained by using six different discretization methods as preprocessors to three different supervised learners on several real-world problems. No discretization technique clearly outperforms the others. Also, discretization as a preprocessing step is in many cases found to be inferior to direct handling of continuously valued data. These results suggest that machine learning algorithms should be designed to directly handle continuously valued data rather than relying on preprocessing or ad hoc techniques.
@inproceedings{ventura.iscis95,
    title        = {An Empirical Comparison of Discretization Models},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the 10th International Symposium on Computer and Information Sciences},
    pages        = {443--450},
    year        = {1995},

    }
[197] Rudolph, George L., Location-Independent Neural Network Models, 1995, Brigham Young University, Computer Science Department, August. [Abstract] [Bibtex]
Neural networks that use a static topology, i.e. a topology that remains fixed throughout learning, suffer from a number of short-comings. Current research is demonstrating the use of dynamic topologies in overcoming some of these problems. The Location-Independent Transformation ({LIT}) is a general strategy for implementing in hardware neural networks with static and dynamic topologies. {LIT} maps an arbitrary neural network onto a uniform tree topology which uses global broadcast and gather operations for communication. This dissertation formally defined {LITs} with associated execution and learning algorithms for static and dynamic versions of competitive learning, counterpropagation, 2-layer backpropagation, and multilayer feedforward networks. An {LIT} for {ASOCS} {AA2} model (which is only dynamic) is also described. This is a representative set of neural network models, not an exhaustive list, providing potential guidelines for implementing other models of interest.
@phdthesis{rudolph_diss,
    title        = {Location-Independent Neural Network Models},
    author        = {Rudolph, George L.},
    school    = {Brigham Young University},
    address    = {Computer Science Department},
    month        = {August},
    year        = {1995},

    }
[198] Ventura, Dan and Andersen, Tim L. and Martinez, Tony R., Using Evolutionary Computation to Generate Training Set Data for Neural Networks, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, 468--471, 1995. [Abstract] [Bibtex]
Most neural networks require a set of training examples in order to attempt to approximate a problem function. For many real-world problems, however, such a set of examples is unavailable. Such a problem involving feedback optimization of a computer network routing system has motivated a general method of generating artificial training sets using evolutionary computation. This paper describes the method and demonstrates its utility by presenting promising results from applying it to an artificial problem similar to a real-world network routing optimization problem.
@inproceedings{ventura.icannga95,
    title        = {Using Evolutionary Computation to Generate Training Set Data for Neural Networks},
    author        = {Ventura, Dan and Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms},
    pages        = {468--471},
    year        = {1995},

    }
[199] Hoeffgen, Klaus-Uwe and Simon, Hans-Ulrich and Van Horn, Kevin S., Robust Trainability of Single Neurons, Journal of Computer and System Sciences, 50, 1, 114--125, 1995. [Abstract] [Bibtex]
It is well known that (McCulloch-Pitts) neurons are efficiently trainable to learn an unknown halfspace from examples, using linear-programming methods. We want to analyze how the learning performance degrades when the representational power of the neuron is overstrained, i.e., if more complex concepts than just halfspaces are allowed. We show that the problem of learning a probably almost optimal weight vector for a neuron is so difficult that the minimum error cannot even be approximated to within a constant factor in polynomial time (unless {RP} = {NP}); we obtain the same hardness result for several variants of this problem. We considerably strengthen these negative results for neurons with binary weights 0 or 1. We also show that neither heuristical learning nor learning by sigmoidal neurons with a constant reject rate is efficiently possible (unless {RP} = {NP}).
@article{vanhorn_4,
    title        = {Robust Trainability of Single Neurons},
    author        = {Hoeffgen, Klaus-Uwe and Simon, Hans-Ulrich and Van Horn, Kevin S.},
    journal        = {Journal of Computer and System Sciences},
    volume        = {50},
    number        = {1},
    pages        = {114--125},
    year        = {1995},

    }
[200] Andersen, Tim L., Learning and Generalization with Bounded Order Rule Sets, 1995, Brigham Young University, April. [Abstract] [Bibtex]
This thesis deals with the problem of inducing useful rules, or extracting critical, higher-order conjunctions of attributes, from a set of preclassified examples, where little or nothing is known about the underlying functional form of the distribution from which the examples were taken. The approach taken in this thesis differs from that normally used in that it does not limit the size of the rule search space. Rather, every possible conjunction of input attributes is considered by the learning algorithm as a potential rule component. In so doing, this thesis is attempting to determine (mainly from an empirical standpoint) how generalization performance is affected when certain areas of the search space are ignored, as compared to when the entire search space is considered. In dealing with the above question, this thesis studies several methods for inducing rules and using them for classification of novel examples. This thesis also uses results obtained with the C4.5 rule-induction method for comparison purposes, and to support the main points of the thesis. The results show that higher-order rules are not required to approximate many real world learning problems. In addition, the difficulty of generating optimal rule sets is discussed, where the measure of optimality is the complexity or size of the rule set and/or the degree of predictive accuracy, and the problem of {NP}-completeness is discussed in relation to these two optimality measures.
@mastersthesis{andersen_95a,
    title        = {Learning and Generalization with Bounded Order Rule Sets},
    author        = {Andersen, Tim L.},
    school    = {Brigham Young University},
    month        = {April},
    year        = {1995},

    }
[201] Rudolph, George L. and Martinez, Tony R., A Transformation for Implementing Efficient Dynamic Backpropagation Neural Networks, Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms, 41--44, 1995. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and often suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing distributed feed-forward networks that use dynamic topologies (dynamic {ANNs}) efficiently in parallel hardware. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. In particular, this paper presents an {LIT} for standard Backpropagation with two layers of weights, and shows how dynamic extensions to Backpropagation can be supported.
@inproceedings{rudolph_95c,
    title        = {A Transformation for Implementing Efficient Dynamic Backpropagation Neural Networks},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms},
    pages        = {41--44},
    year        = {1995},

    }
[202] Rudolph, George L. and Martinez, Tony R., An Efficient Transformation for Implementing Two-layer Feedforward Neural Networks, Journal of Artificial Neural Networks, 3, 263--282, 1995. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and often suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing distributed feedforward networks that use dynamic topologies (dynamic {ANNs}) efficiently in parallel hardware. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. In particular, this paper presents an {LIT} for dynamic Backpropagation networks with a single hidden layer. The complexity of both learning and execution algorithms is O(n+p+logm) for a single pattern, where n is the number of inputs, p is the number of outputs, and m is the number of hidden nodes in the original network.
@article{rudolph_95a,
    title        = {An Efficient Transformation for Implementing Two-layer Feedforward Neural Networks},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    journal    = {Journal of Artificial Neural Networks},
    pages        = {263--282},
    volume        = {3},
    year        = {1995},

    }
[203] Andersen, Tim L. and Martinez, Tony R., A Provably Convergent Dynamic Training Method for Multilayer Perceptron Networks, Proceedings of the 2nd International Symposium on Neuroinformatics and Neurocomputers, 77--84, 1995. [Abstract] [Bibtex]
This paper presents a new method for training multi-layer perceptron networks called {DMP1} (Dynamic Multi-layer Perceptron 1). The method is based upon a divide and conquer approach which builds networks in the form of binary trees, dynamically allocating nodes and layers as needed. The individual nodes of the network are trained using a gentetic algorithm. The method is capable of handling real-valued inputs and a proof is given concerning its convergence properties of the basic model. Simulation results show that {DMP1} performs favorably in comparison with other learning algorithms.
@inproceedings{andersen_95d,
    title        = {A Provably Convergent Dynamic Training Method for Multilayer Perceptron Networks},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the 2nd International Symposium on Neuroinformatics and Neurocomputers},
    pages        = {77--84},
    year        = {1995},

    }
[204] Ventura, Dan, On Discretization as a Preprocessing Step For Supervised Learning Models, 1995, Brigham Young University, Computer Science Department, April. [Abstract] [Bibtex]
Many machine learning and neurally inspired algorithms are limited, at least in their pure form, to working with nominal data. However, for many real-world problems, some provision must be made to support processing of continuously valued data. {BRACE}, a paradigm for the discretization of continuously valued attributes is introduced, and two algorithmic instantiations of this paradigm, {VALLEY} and {SLICE} are presented. These methods are compared empirically with other discretization techniques on several real-world problems and no algorithm clearly outperforms the others. Also, discretization as a preprocessing step is in many cases found to be inferior to direct handling of continuously valued data. These results suggest that machine learning algorithms should be designed to directly handle continuously valued data rather than relying on preprocessing or ad hoc techniques. To this end statistical prototypes ({SP}/{MSP}) are developed and an empirical comparison with well-known learning algorithms is presented. Encouraging results demonstrate that statistical prototypes have the potential to handle continuously valued data well. However, at this point, they are not suited for handling nominally valued data, which is arguably at least as important as continuously valued data in learning real-world applications. Several areas of ongoing research that aim to provide this ability are presented.
@mastersthesis{ventura.thesis,
    title        = {On Discretization as a Preprocessing Step For Supervised Learning Models},
    author        = {Ventura, Dan},
    school    = {Brigham Young University},
    address    = {Computer Science Department},
    month        = {April},
    year        = {1995},

    }
[205] Giraud-Carrier, Christophe and Martinez, Tony R., An Efficient Metric for Heterogeneous Inductive Learning Applications in the Attribute-Value Language, Intelligent Systems), 1, 341--350, 1995, Yfantis, Evangelos A., Kluwer Academic Publishers. [Abstract] [Bibtex]
Many inductive learning problems can be expressed in the classical attribute-value language. In order to learn and to generalize, learning systems often rely on some measure of similarity between their current knowledge base and new information. The attribute-value language defines a heterogeneous multi-dimensional input space, where some attributes are nominal and others linear. Defining similarity, or proximity, of two points in such input spaces is non trivial. We discuss two representative homogeneous metrics and show examples of why they are limited to their own domains. We then address the issues raised by the design of a heterogeneous metric for inductive learning systems. In particular, we discuss the need for normalization and the impact of don't-care values. We propose a heterogeneous metric and evaluate it empirically on a simplified version of {ILA}.
@article{cgc_94c,
    title        = {An Efficient Metric for Heterogeneous Inductive Learning Applications in the Attribute-Value Language},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    editor        = {Yfantis, Evangelos A.},
    journal    = {Intelligent Systems)},
    pages        = {341--350},
    volume        = {1},
    year        = {1995},
    publisher    = {Kluwer Academic Publishers},

    }
[206] Rudolph, George L. and Martinez, Tony R., A Transformation for Implementing Localist Neural Networks, Neural Parallel and Scientific Computations, 2, 173--188, 1995. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and typically suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for parallel implementation of feedforward networks that use dynamic topologies. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. This paper deals specifically with {LITs} for localist {ANNs}--localist in the sense that ultimately one node is responsible for each output. In particular, this paper presents {LITs} for two {ANNs}: a) the single-layer competitive learning network, and b) the counterpropagation network, which combines elements of supervised learning with competitive learning. The complexity of both learning and execution algorithms for both {ANNs} is linear in the number of inputs and logarithmic in the number of nodes in the original network.
@article{rudolph_95b,
    title        = {A Transformation for Implementing Localist Neural Networks},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    journal    = {Neural Parallel and Scientific Computations},
    pages        = {173--188},
    volume        = {2},
    year        = {1995},

    }
[207] Barker, J. Cory and Martinez, Tony R., Efficient Construction of Networks for Learned Representations with General to Specific Relationships, Intelligent Systems, 1, 617--625, 1995, Yfantis, Evangelos A., Kluwer Academic Publishers. [Abstract] [Bibtex]
Machine learning systems often represent concepts or rules as sets of attribute-value pairs. Many learning algorithms generalize or specialize these concept representations by removing or adding pairs. Thus concepts are created that have general to specific relationships. This paper presents algorithms to connect concepts into a network based on their general to specific relationships. Since any concept can access related concepts quickly, the resulting structure allows increased efficiency in learning and reasoning. The time complexity of one set of learning models improves from O(n log n) to O(log n) (where n is the number of nodes) when using the general to specific structure.
@article{barker_95a,
    title        = {Efficient Construction of Networks for Learned Representations with General to Specific Relationships},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    editor        = {Yfantis, Evangelos A.},
    journal    = {Intelligent Systems},
    pages        = {617--625},
    volume        = {1},
    year        = {1995},
    publisher    = {Kluwer Academic Publishers},

    }
[208] Rudolph, George L. and Martinez, Tony R., A Transformation for Implementing Neural Networks with Localist Properties, Intelligent Systems, 1, 637--645, 1995, Yfantis, Evangelos A., Kluwer Academic Publishers. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and typically suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing feedforward networks that use dynamic topologies. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. In particular, this paper presents {LITs} for the single-layer competitve learning network, and the counterpropagation network, which combines elements of supervised learning with competitive learning. These two networks are localist in the sense that ultimately one node is responsible for each output. {LITs} for other models are presented in other papers.
@article{rudolph_95d,
    title        = {A Transformation for Implementing Neural Networks with Localist Properties},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    editor        = {Yfantis, Evangelos A.},
    journal    = {Intelligent Systems},
    pages        = {637--645},
    volume        = {1},
    year        = {1995},
    publisher    = {Kluwer Academic Publishers},

    }
[209] Van Horn, Kevin S. and Martinez, Tony R., The Minimum Feature Set Problem, Neural Networks , 3, 491--494, 1994. [Abstract] [Bibtex]
One approach to improving the generalization power of a neural net is to try to minimize the number of non-zero weights used. We examine two issues relevant to this approach, for single-layer nets. First we bound the {VC} dimension of the set of linear-threshold functions that have non-zero weights for at most s of n inputs. Second, we show that the problem of minimizing the number of non-zero input weights used (without misclassifying training examples) is both {NP}-hard and difficult to approximate.
@article{vanhorn_3,
    title        = {The Minimum Feature Set Problem},
    author        = {Van Horn, Kevin S. and Martinez, Tony R.},
    journal    = {Neural Networks },
    pages        = {491--494},
    volume        = {3},
    year        = {1994},

    }
[210] Giraud-Carrier, Christophe and Martinez, Tony R., Seven Desirable Properties for Artificial Learning Systems, Proceedings of FLAIRS'94 Florida Artificial Intelligence Research Symposium, 16--20, 1994. [Abstract] [Bibtex]
Much effort has been devoted to understanding learning and reasoning in artificial intelligence, giving rise to a wide collection of models. For the most part, these models focus on some observed characteristic of human learning, such as induction or analogy, in an effort to emulate (and possibly exceed) human abilities. We propose seven desirable properties for artificial learning systems: incrementality, non-monotonicity, inconsistency and conflicting defaults handling, abstraction, self- organization, generalization, and computational tractability. We examine each of these properties in turn and show how their (combined) use can improve learning and reasoning, as well as potentially widen the range of applications of artificial learning systems. An overview of the algorithm {PDL2}, that begins to integrate the above properties, is given as a proof of concept.
@inproceedings{cgc_94a,
    title        = {Seven Desirable Properties for Artificial Learning Systems},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    booktitle    = {Proceedings of {FLAIRS}'94 Florida Artificial Intelligence Research Symposium},
    pages        = {16--20},
    year        = {1994},

    }
[211] Rudolph, George L. and Martinez, Tony R., Location-Independent Transformations: A General Strategy for Implementing Neural Networks, International Journal on Artificial Intelligence Tools, 3, 417--427, 1994. [Abstract] [Bibtex]
Most Artificial Neural Networks ({ANNs}) have a fixed topology during learning, and typically suffer from a number of short-comings as a result. Variations of {ANNs} that use dynamic topologies have shown ability to overcome many of these problems. This paper introduces Location-Independent Transformations ({LITs}) as a general strategy for implementing neural networks that use static and dynamic topologies. A {LIT} creates a set of location-independent nodes, where each node computes its part of the network output independent of other nodes, using local information. This type of transformation allows efficient support for adding and deleting nodes dynamically during learning. Two simple networks, the single-layer competitive learning network, and the counterpropagation network, which combines elements of supervised learning with competitive learning, are used in this paper to illustrate the {LIT} strategy. These two networks are localist in the sense that ultimately one node is responsible for each output. {LITs} for other models are presented in other papers.
@article{rudolph_94a,
    title        = {Location-Independent Transformations: A General Strategy for Implementing Neural Networks},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    journal    = {International Journal on Artificial Intelligence Tools},
    pages        = {417--427},
    volume        = {3},
    year        = {1994},

    }
[212] Giraud-Carrier, Christophe, On Integrating Inductive Learning with Prior Knowledge and Reasoning, 1994, Brigham Young University, December. [Abstract] [Bibtex]
Learning and reasoning are both aspects of what is considered to be intelligence. Their studies within {AI} have been separated historically, learning being the topic of neural networks and machine learning, and reasoning falling under classical (or symbolic) {AI}. However, learning and reasoning share many interdependencies, and the integration of the two may lead to more powerful models. This dissertation examines some of these interdependencies, and describes several models, culminating in a system called {FLARE} (Framework for Learning And {RE}asoning). The proposed models integrate inductive learning with prior knowledge and reasoning. Learning is incremantal, prior knowledge is given by a teacher or deductively obtained by instantiating commonsense knowledge, and reasoning is non-monotonic. Simulation results on several datasets and classical commonsense protocols demonstrate promise.
@phdthesis{cgc_diss,
    title        = {On Integrating Inductive Learning with Prior Knowledge and Reasoning},
    author        = {Giraud-Carrier, Christophe},
    school    = {Brigham Young University},
    month        = {December},
    year        = {1994},

    }
[213] Van Horn, Kevin S. and Martinez, Tony R., Extending Occam's Razor, Proceedings of the Third Golden West International Conference on Intelligent Systems, 1994, June, Las Vegas, Nevada. [Abstract] [Bibtex]
Occam's Razor states that, all other things being equal, the simpler of two possible hypotheses is to be preferred. A quantified version of Occam's Razor has been proven for the {PAC} model of learning, giving sample-complexity bounds for learning using what Blumer \emph{et al.} call an Occam algorithm. We prove an analog of this result for Haussler's more general learning model, which encompasses learning in stochastic situations, learning real-valued functions, etc.
@inproceedings{vanhorn_5,
    title        = {Extending Occam's Razor},
    author        = {Van Horn, Kevin S. and Martinez, Tony R.},
    booktitle    = {Proceedings of the Third Golden West International Conference on Intelligent Systems},
    month        = {June},
    year        = {1994},
    address    = {Las Vegas, Nevada},

    }
[214] Barker, J. Cory and Martinez, Tony R., Generalization by Controlled Expansion of Examples, Proceedings of The Seventh International Symposium on Artificial Intelligence, 142--149, 1994. [Abstract] [Bibtex]
{SG} (Specific to General) is a learning system that derives general rules from specific examples. {SG} learns incrementally with good speed and generalization. The {SG} network is built of many simple nodes that adapt to the problem being learned. Learning is done without requiring user adjustment of sensitive parameters and noise is tolerated with graceful degradation in performance. Nodes learn important features in the input space and then monitor the ability of the features to predict output values. Learning is O(n log n) for each example, where n is the number of nodes in the network, and the number of inputs and output values are treated as constants. An enhanced network topology reduces time complexity to O(log n). Empirical results show that the model gives good generalization and that learning converges in a small number of training passes.
@inproceedings{barker_94b,
    title        = {Generalization by Controlled Expansion of Examples},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    booktitle    = {Proceedings of The Seventh International Symposium on Artificial Intelligence},
    pages        = {142--149},
    year        = {1994},

    }
[215] Barker, J. Cory and Martinez, Tony R., Proof of Correctness for ASOCS AA3 Networks, IEEE Transactions on Systems, Man, and Cybernetics, 3, 503--510, 1994. [Abstract] [Bibtex]
This paper analyzes Adaptive Algorithm 3 ({AA3}) of Adaptive Self-Organizing Concurrent Systems ({ASOCS}) and proves that {AA3} correctly fulfills the rules presented. Several different models for {ASOCS} have been developed. {AA3} uses a distributed mechanism for implementing rules so correctness is not obvious. An {ASOCS} is an adaptive network composed of many simple computing elements operating in parallel. An {ASOCS} operates in one of two modes: learning and processing. In learning mode, rules are presented to the {ASOCS} and incorporated in a self-organizing fashion. In processing mode, the {ASOCS} acts as a parallel hardware circuit that performs the function defined by the learned rules.
@article{barker_94a,
    title        = {Proof of Correctness for {ASOCS} {AA3} Networks},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    journal    = {{IEEE} Transactions on Systems, Man, and Cybernetics},
    pages        = {503--510},
    volume        = {3},
    year        = {1994},

    }
[216] Stout, Matthew and Rudolph, George L. and Martinez, Tony R. and Salmon, Linton, A VLSI Implementation of a Parallel Self-Organizing Learning Model, Proceedings of the 12th International Conference on Pattern Recognition, 3, 373--376, 1994. [Abstract] [Bibtex]
This paper presents a {VLSI} implementation of the Priority Adaptive Self-Organizing Concurrent System ({PASOCS}) learning model that is built using a multi-chip module ({MCM}) substrate. Many current hardware implementations of neural network learning models are direct implementations of classical neural network structures---a large number of simple computing nodes connected by a dense number of weighted links. {PASOCS} is one of a class of {ASOCS} (Adaptive Self-Organizing Concurrent System) connectionist models whose overall goal is the same as classical neural networks models, but whose functional mechanisms differ significantly. This model has potential application in areas such as pattern recognition, robotics, logical inference, and dynamic control.
@inproceedings{stout_icpr,
    title        = {A {VLSI} Implementation of a Parallel Self-Organizing Learning Model},
    author        = {Stout, Matthew and Rudolph, George L. and Martinez, Tony R. and Salmon, Linton},
    booktitle    = {Proceedings of the 12th International Conference on Pattern Recognition},
    pages        = {373--376},
    volume        = {3},
    year        = {1994},

    }
[217] Wilson, D. Randall, Prototype Styles of Generalization, 1994, Brigham Young University, August. [Abstract] [Bibtex]
A learning system can generalize from training set data in many ways. No single style of generalization is likely to solve all problems better than any other style, and different styles work better on some applications than others. Several generalization styles are proposed, including distance metrics, first-order features, voting schemes, and confidence levels. These generalization styles are efficient in terms of time and space and lend themselves to massively parallel architectures. Empirical results of using these generalization styles on several real-world applications are presented. These results indicate that the prototype styles of generalization presented can provide accurate generalization for many applications, and that having several styles of generalization available often permits one to be selected that works well for a particular application.
@mastersthesis{wilson.thesis94,
    title        = {Prototype Styles of Generalization},
    author        = {Wilson, D. Randall},
    school    = {Brigham Young University},
    month        = {August},
    year        = {1994},

    }
[218] Ventura, Dan and Martinez, Tony R., BRACE: A Paradigm for the Discretization of Continuously Valued Data, Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, 117--121, 1994. [Abstract] [Bibtex]
Discretization of continuously valued data is a useful and necessary tool because many learning paradigms assume nominal data. A list of objectives for efficient and effective discretization is presented. A paradigm called {BRACE} (Boundary Ranking And Classification Evaluation) that attempts to meet the objectives is presented along with an algorithm that follows the paradigm. The paradigm meets many of the objectives, with potential for extension to meet the remainder. Empirical results have been promising. For these reasons {BRACE} has potential as an effective and efficient method for discretization of continuously valued data. A further advantage of {BRACE} is that it is general enough to be extended to other types of clustering/unsupervised learning.
@inproceedings{ventura.flairs94,
    title        = {{BRACE}: A Paradigm for the Discretization of Continuously Valued Data},
    author        = {Ventura, Dan and Martinez, Tony R.},
    booktitle    = {Proceedings of the Seventh Florida Artificial Intelligence Research Symposium},
    pages        = {117--121},
    year        = {1994},

    }
[219] Giraud-Carrier, Christophe and Martinez, Tony R., An Incremental Learning Model for Commonsense Reasoning, Proceedings of the Seventh International Symposium on Artificial Intelligence (ISAI'94), 134--141, 1994. [Abstract] [Bibtex]
A self-organizing incremental learning model that attempts to combine inductive learning with prior knowledge and default reasoning is described. The inductive learning scheme accounts for useful generalizations and dynamic priority allocation, and effectively supplements prior knowledge. New rules may be created and existing rules modified, thus allowing the system to evolve over time. By combining the extensional and intensional approaches to learning rules, the model remains self-adaptive, while not having to unnecessarily suffer from poor (or atypical) learning environments. By combining rule-based and similarity-based reasoning, the model effectively deals with many aspects of brittleness.
@inproceedings{cgc_94b,
    title        = {An Incremental Learning Model for Commonsense Reasoning},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    booktitle    = {Proceedings of the Seventh International Symposium on Artificial Intelligence ({ISAI}'94)},
    pages        = {134--141},
    year        = {1994},

    }
[220] Bertelsen, Rick and Martinez, Tony R., Extending ID3 through Discretization of Continuous Inputs, Proceedings of FLAIRS'94 Florida Artificial Intelligence Research Symposium, 122--125, 1994. [Abstract] [Bibtex]
This paper presents a mechanism to extend {ID3} by classifying real valued inputs. Real valued inputs are classified through a neural network model termed the Competitive Classifier ({CC}). The {CC} forwards discrete classification results to the {ID3} system, and accepts feedback from the {ID3} system. Through the use of feedback, the {ID3} system guides the {CC} into improving classifications.
@inproceedings{bertelsen_94,
    title        = {Extending {ID3} through Discretization of Continuous Inputs},
    author        = {Bertelsen, Rick and Martinez, Tony R.},
    booktitle    = {Proceedings of {FLAIRS}'94 Florida Artificial Intelligence Research Symposium},
    pages        = {122--125},
    year        = {1994},

    }
[221] Barker, J. Cory, Eclectic Machine Learning, 1994, Brigham Young University, February. [Abstract] [Bibtex]
This dissertation presents a family of inductive learning systems that derive general rules from specific examples. These systems combine the benefits of neural networks, {ASOCS}, and symbolic learning algorithms. The systems presented here learn incrementally with good speed and generalization. They are based on a parallel architectural model that adapts to the problem being learned. Learning is done without requiring user adjustment of sensitive parameters, and noise is tolerated with graceful degradation in performance. The systems described in this work are based on features. Features are subsets of the input space. One group of learning algorithms begins with general features and specializes those features to match the problem that is being learned. Another group creates specific features and then generalizes those features. The final group combines the approaches used in the first two groups to gain the benefits of both. The algorithms are O(m log m), where m is the number of nodes in the network, and the number of inputs and output values are treated as constants. An enhanced network topology reduces time complexity to O(log m). Empirical results show that the algorithms give good generalization and that learning converges in a small number of training passes.
@phdthesis{barker_diss,
    title        = {Eclectic Machine Learning},
    author        = {Barker, J. Cory},
    school    = {Brigham Young University},
    month        = {February},
    year        = {1994},

    }
[222] Bertelsen, Rick, Automatic Feature Extraction in Machine Learning, 1994, Brigham Young University. [Abstract] [Bibtex]
This thesis presents a machine learning model capable of extracting discrete classes out of continuous valued input features. This is done using a neurally inspired novel competitive classifier ({CC}) which feeds the discrete classifications forward to a supervised machine learning model. The supervised learning model uses the discrete classifications and perhaps other information available to solve a problem. The supervised learner then generates feedback to guide the {CC} into potentially more useful classifications of the continuous valued input features. Two supervised learning models are combined with the {CC} creating {ASOCS}-{AFE} and {ID3}-{AFE}. Both models are simulated and the results are analyzed. Based on these results, several areas of future research are proposed.
@mastersthesis{bertelsen_th,
    title        = {Automatic Feature Extraction in Machine Learning},
    author        = {Bertelsen, Rick},
    school    = {Brigham Young University},
    year        = {1994},

    }
[223] Stout, Matthew and Salmon, Linton and Rudolph, George L. and Martinez, Tony R., A Multi-Chip Module Implementation of a Neural Network, Proceedings of the IEEE Multi-Chip Module Conference MCMC-94, 20--25, 1994. [Abstract] [Bibtex]
The requirement for dense interconnect in artificial neural network systems has led researchers to seek high-density interconnect technologies. This paper reports an implementation using multi-chip modules ({MCMs}) as the interconnect medium. The specific system described is a self-organizing, parallel, and dynamic learning model which requires a dense interconnect technology for effective implementation; this requirement is fulfilled by exploiting {MCM} technology. The ideas presented in this paper regarding an {MCM} implementation of artificial neural networks are versatile and can be adapted to apply to other neural network and connectionist models.
@inproceedings{stout_mcm,
    title        = {A Multi-Chip Module Implementation of a Neural Network},
    author        = {Stout, Matthew and Salmon, Linton and Rudolph, George L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IEEE} Multi-Chip Module Conference {MCMC}-94},
    pages        = {20--25},
    year        = {1994},

    }
[224] Van Horn, Kevin S., Learning as Optimization, 1994, Brigham Young University, August. [Bibtex]
@phdthesis{vanhorn_6,
    title        = {Learning as Optimization},
    author        = {Van Horn, Kevin S.},
    school    = {Brigham Young University},
    month        = {August},
    year        = {1994},
    }
[225] Martinez, Tony R. and Hughes, Brent W. and Campbell, Douglas M., Priority ASOCS, Journal of Artificial Neural Networks , 3, 403--429, 1994. [Abstract] [Bibtex]
This paper presents an {ASOCS} (Adaptive Self-Organizing Concurrent System) model for massively parallel processing of incrementally defined rule systems in such areas as adaptive logic, robotics, logical inference, and dynamic control. An {ASOCS} is an adaptive network composed of many simple computing elements operating asynchronously and in parallel. An {ASOCS} can operate in either a data processing mode or a learning mode. During data processing mode, an {ASOCS} acts as a parallel hardware circuit. During learning mode, an {ASOCS} incorporates a rule expressed as a Boolean conjunction in a distributed fashion in time logarithmic in the number of rules. This paper proposes a learning algorithm and architecture for Priority {ASOCS}. This new {ASOCS} model uses rules with priorities. The new model has significant learning time and space complexity improvements over previous models.
@article{martinez_94a,
    title        = {Priority {ASOCS}},
    author        = {Martinez, Tony R. and Hughes, Brent W. and Campbell, Douglas M.},
    journal        = {Journal of Artificial Neural Networks },
    pages        = {403--429},
    volume        = {3},
    year        = {1994},

    }
[226] Andersen, Tim L. and Martinez, Tony R., Learning and Generalization with Bounded Order Critical Feature Sets, Proceedings of the AI'93 Australian Joint Conference on Artificial Intelligence, 450, 1993. [Abstract] [Bibtex]
It is the case that many real world learning problems exhibit a great deal of regularity. It is likely that the solutions to learning problems which exhibit such regularity can be approximated utilizing only simple (low-order) features gathered from analysis of pre-classified examples. However, little specific work has been done to demonstrate the utility of low-order features. This paper presents methods for gathering low order features from an existing data set of preclassified examples, and using these features for classification of novel patterns from the problem domain. It then conducts experiments using the methods presented on several real world classification problems and reports the results. The results show that pattern classification methods involving low-order feature sets have promise and warrant further research.
@inproceedings{andersen_93a,
    title        = {Learning and Generalization with Bounded Order Critical Feature Sets},
    author        = {Andersen, Tim L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {AI}'93 Australian Joint Conference on Artificial Intelligence},
    pages        = {450},
    year        = {1993},

    }
[227] Giraud-Carrier, Christophe, A Precept-Driven Learning Algorithm, 1993, Brigham Young University, April. [Abstract] [Bibtex]
Machine learning is an attempt at devising mechanisms that machines can use to learn, rather than being explicitly programmed for, real-world applications. The goal of learning systems is to generalize. Generalization is based on the set of critical features the system has available. Training set learners typically extract critical features from a random set of examples drawn from experimentation. This approach can beneficially be extended by endowing the system with some a priori knowledge, in the form of precepts. Advantages of the augmented system include speed-up, improved generalization and greater parsimony. This thesis presents a precept-driven learning algorithm. The main characteristics of the algorithm include: 1) neurally inspired architecture, 2) bounded learning and execution times, and 3) ability to handle both correct and incorrect precepts. Results of simulations on real-world data demonstrate promise.
@mastersthesis{cgc_th,
    title        = {A Precept-Driven Learning Algorithm},
    author        = {Giraud-Carrier, Christophe},
    school    = {Brigham Young University},
    month        = {April},
    year        = {1993},

    }
[228] Giraud-Carrier, Christophe and Martinez, Tony R., Using Precepts to Augment Training Set Learning, Proceedings of the First New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems (ANNES'93), 46--51, 1993, November. [Abstract] [Bibtex]
The goal of learning systems is to generalize. Generalization is commonly based on the set of criticalfeatures the system has available. Training set learners typically extract critical features from a random set of examples. While this approach is attractive, it suffers from the exponential growth of the number of features to be searched. We propose to extend it by endowing the system with some a priori knowledge, in the form of precepts. Advantages of the augmented system are speed-up, improved generalization, and greater parsimony. This paper presents a precept-driven learning algorithm. Its main features include: 1) distributed implementation, 2) bounded learning and execution times, and 3) ability to handle both correct and incorrect precepts. Results of simulations on real-world data demonstrate promise.
@inproceedings{cgc_93a,
    title        = {Using Precepts to Augment Training Set Learning},
    author        = {Giraud-Carrier, Christophe and Martinez, Tony R.},
    booktitle    = {Proceedings of the First New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems ({ANNES}'93)},
    pages        = {46--51},
    month        = {November},
    year        = {1993},

    }
[229] Barker, J. Cory and Martinez, Tony R., GS: A Network that Learns Important Features, Proceedings of The World Congress on Neural Networks, 3, 376--380, 1993, July. [Abstract] [Bibtex]
{GS} is a network for supervised inductive learning from examples that uses ideas from neural networks and symbolic inductive learning to gain benefits of both methods. The network is built of many simple nodes that learn important features in the input space and then monitor the ability of the features to predict output values. The network avoids the exponential nature of the number of features by using information gained by general features to guide the creation of more specific features. Empirical evaluation of the model on real world data has shown that the network provides good generalization performance. Convergence is accomplished within a small number of training passes. The network provides these benefits while automatically allocating and deleting nodes and without requiring user adjustment of any parameters. The network learns incrementally and operates in a parallel fashion.
@inproceedings{barker_93c,
    title        = {{GS}: A Network that Learns Important Features},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    booktitle    = {Proceedings of The World Congress on Neural Networks},
    pages        = {376--380},
    volume        = {3},
    month        = {July},
    year        = {1993},

    }
[230] Barker, J. Cory and Martinez, Tony R., Learning and Generalization Controlled by Contradiction, Proceedings of The International Conference on Artificial Neural Networks, 1993. [Abstract] [Bibtex]
One page overview of the {SG} (Specific to General) learning model.
@inproceedings{barker_93b,
    title        = {Learning and Generalization Controlled by Contradiction},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    booktitle    = {Proceedings of The International Conference on Artificial Neural Networks},
    year        = {1993},

    }
[231] Wilson, D. Randall and Martinez, Tony R., The Importance of Using Multiple Styles of Generalization, Proceedings of the First New Zealand International Conference on Artificial Neural Networks and Expert Systems (ANNES), 54--57, 1993, November. [Abstract] [Bibtex]
There are many ways for a learning system to generalize from training set data. There is likely no one style of generalization which will solve all problems better than any other style, for different styles will work better on some applications than others. This paper presents several styles of generalization and uses them to suggest that a collection of such styles can provide more accurate generalization than any one style by itself. Empirical results of generalizing on several real-world applications are given, and comparisons are made on the generalization accuracy of each style of generalization. The empirical results support the hypothesis that using multiple generalization styles can improve generalization accuracy.
@inproceedings{wilson.annes93.mult,
    title        = {The Importance of Using Multiple Styles of Generalization},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {Proceedings of the First New Zealand International Conference on Artificial Neural Networks and Expert Systems ({ANNES})},
    pages        = {54--57},
    month        = {November},
    year        = {1993},

    }
[232] Van Horn, Kevin S. and Martinez, Tony R., The BBG Rule Induction Algorithm, Proceedings of the 6th Australian Joint Conference on Artificial Intelligence, 348--355, 1993, November, Melbourne, Australia. [Abstract] [Bibtex]
We present an algorithm ({BBG}) for inductive learning from examples that outputs a rule list. {BBG} uses a combination of greedy and branch-and-bound techniques, and naturally handles noisy or stochastic learning situations. We also present the results of an empirical study comparing {BBG} with Quinlan's C4.5 on 1050 synthetic data sets. We find that {BBG} greatly outperforms C4.5 on rule-oriented problems, and equals or exceeds C4.5's performance on tree-oriented problems.
@inproceedings{vanhorn_1,
    title        = {The {BBG} Rule Induction Algorithm},
    author        = {Van Horn, Kevin S. and Martinez, Tony R.},
    booktitle    = {Proceedings of the 6th Australian Joint Conference on Artificial Intelligence},
    pages        = {348--355},
    month        = {November},
    year        = {1993},
    address    = {Melbourne, Australia},

    }
[233] Van Horn, Kevin S. and Martinez, Tony R., The Design and Evaluation of a Rule Induction Algorithm, 1993, Technical Report BYU-CS-93-11, November. [Abstract] [Bibtex]
This technical report expands on and fills in some of the details of the paper ``The {BBG} Rule Induction Algorithm''
@techreport{vanhorn_2,
    title        = {The Design and Evaluation of a Rule Induction Algorithm},
    author        = {Van Horn, Kevin S. and Martinez, Tony R.},
    institution    = {Technical Report {BYU}-{CS}-93-11},
    month        = {November},
    year        = {1993},

    }
[234] Martinez, Tony R. and Hughes, Brent W., Towards a General Distributed Platform for Learning and Generalization, Proceedings of the Conference on Artificial Neural Networks and Expert Systems ANNES'93, 216--219, 1993. [Abstract] [Bibtex]
Different learning models employ different styles of generalization on novel inputs. This paper proposes the need for multiple styles of generalization to support a broad application base. The Priority {ASOCS} model (Priority Adaptive Self-Organizing Concurrent System) is overviewed and presented as a potential platform which can support multiple generalization styles. {PASOCS} is an adaptive network composed of many simple computing elements operating asynchronously and in parallel. The {PASOCS} can operate in either a data processing mode or a learning mode. During data processing mode, the system acts as a parallel hardware circuit. During learning mode, the {PASOCS} incorporates rules, with attached priorities, which represent the application being learned. Learning is accomplished in a distributed fashion in time logarithmic in the number of rules. The new model has significant learning time and space complexity improvements over previous models.
@inproceedings{martinez_93b,
    title        = {Towards a General Distributed Platform for Learning and Generalization},
    author        = {Martinez, Tony R. and Hughes, Brent W.},
    booktitle    = {Proceedings of the Conference on Artificial Neural Networks and Expert Systems {ANNES}'93},
    pages        = {216--219},
    year        = {1993},

    }
[235] Martinez, Tony R. and Rudolph, George L., A Learning Model for Adaptive Network Routing, Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications IWANNT'93, 183--187, 1993. [Abstract] [Bibtex]
Increasing size, complexity, and dynamics of networks require adaptive routing mechanisms. This paper proposes initial concepts towards a learning and generalization mechanism to support adaptive real-time routing. An {ASOCS} learning model is employed as the basic adaptive router. Generalization of routing is based not only on source/destination address, but also on such factors as packet size, priority, privacy, network congestion, etc. Mechanisms involving continual adaptation based on feedback are presented. Extensions to conventional addressing which can support learning and generalization are proposed.
@inproceedings{martinez_93c,
    title        = {A Learning Model for Adaptive Network Routing},
    author        = {Martinez, Tony R. and Rudolph, George L.},
    booktitle    = {Proceedings of the International Workshop on Applications of Neural Networks to Telecommunications {IWANNT}'93},
    pages        = {183--187},
    year        = {1993},

    }
[236] Martinez, Tony R. and Barker, J. Cory and Giraud-Carrier, Christophe, A Generalizing Adaptive Discriminant Network, Proceedings of the World Congress on Neural Networks, 1, 613--616, 1993. [Abstract] [Bibtex]
This paper overviews the {AA1} (Adaptive Algorithm 1) model of {ASOCS} the (Adaptive Self-Organizing Concurrent Systems) approach. It also presents promising empirical generalization results of {AA1} with actual data. {AA1} is a topologically dynamic network which grows to fit the problem being learned. {AA1} generalizes in a self-organizing fashion to a network which seeks to find features which discriminate between concepts. Convergence to a training set is both guaranteed and bounded linearly in time.
@inproceedings{martinez_93a,
    title        = {A Generalizing Adaptive Discriminant Network},
    author        = {Martinez, Tony R. and Barker, J. Cory and Giraud-Carrier, Christophe},
    booktitle    = {Proceedings of the World Congress on Neural Networks},
    pages        = {613--616},
    volume        = {1},
    year        = {1993},

    }
[237] Wilson, D. Randall and Martinez, Tony R., The Potential of Prototype Styles of Generalization, The Sixth Australian Joint Conference on Artificial Intelligence (AI '93), 356--361, 1993, November. [Abstract] [Bibtex]
There are many ways for a learning system to generalize from training set data. This paper presents several generalization styles using prototypes in an attempt to provide accurate generalization on training set data for a wide variety of applications. These general- ization styles are efficient in terms of time and space, and lend themselves well to massively parallel architectures. Empirical results of generalizing on several real-world applications are given, and these results indicate that the prototype styles of generalization presented have potential to provide accurate generalization for many applications.
@inproceedings{wilson.ai93.proto,
    title        = {The Potential of Prototype Styles of Generalization},
    author        = {Wilson, D. Randall and Martinez, Tony R.},
    booktitle    = {The Sixth Australian Joint Conference on Artificial Intelligence ({AI} '93)},
    pages        = {356--361},
    month        = {November},
    year        = {1993},

    }
[238] Barker, J. Cory and Martinez, Tony R., Generalization by Controlled Intersection of Examples, Proceedings of The Sixth Australian Joint Conference on Artificial Intelligence, 323--327, 1993. [Abstract] [Bibtex]
{SG} (Specific to General) is a network for supervised inductive learning from examples that uses ideas from neural networks and symbolic inductive learning to gain benefits of both methods. The network is built of many simple nodes that learn important features in the input space and then monitor the ability of the features to predict output values. The network avoids the exponential nature of the number of features by creating specific features for each example and then expanding those features; making them more general. Expansion of a feature terminates when it encounters another feature with contradicting outputs. Empirical evaluation of the model on real-world data has shown that the network provides good generalization performance. Convergence is accomplished within a small number of training passes. The network provides these benefits while automatically allocating and deleting nodes and without requiring user adjustment of any parameters. The network learns incrementally and operates in a parallel fashion.
@inproceedings{barker_93a,
    title        = {Generalization by Controlled Intersection of Examples},
    author        = {Barker, J. Cory and Martinez, Tony R.},
    booktitle    = {Proceedings of The Sixth Australian Joint Conference on Artificial Intelligence},
    pages        = {323--327},
    year        = {1993},

    }
[239] Kemsley, David and Martinez, Tony R. and Campbell, Douglas M., A Survey of Neural Network Research and Fielded Applications, International Journal of Neural Networks, 2/3/4, 123--133, 1992. [Abstract] [Bibtex]
This paper gives a tabular presentation of approximately one hundred current neural network applications at different levels of maturity, from research to fielded products. The goal of this paper is not to be exhaustive, but to give a sampling overview demonstrating the diversity and amount of current application effort in different areas. The paper should aid both researchers and implementors to understand the diverse and potential impact of neural networks in real world applications. Tabular information is given regarding different features of neural network application efforts including model used, types of input and output data, accuracy, and research status. An extended bibliography allows a mechanism for further study into promising areas.
@article{kemsley_92,
    title        = {A Survey of Neural Network Research and Fielded Applications},
    author        = {Kemsley, David and Martinez, Tony R. and Campbell, Douglas M.},
    journal    = {International Journal of Neural Networks},
    pages        = {123--133},
    volume        = {2/3/4},
    year        = {1992},

    }
[240] Hart, Edward F., Extending ASOCS to Training-Set-Style Data, 1992, Brigham Young University, August. [Abstract] [Bibtex]
This thesis studies extensions to help the basic {ASOCS} neural network handle training set data. Background is discussed on the need for neural networks. Instance set math for the {ASOCS} model is presented. Six data sets are gathered and mapped into the {ASOCS} neural net. Tests are done with the data sets to assist in the discovery of several problems. Some of these problems have solutions suggested in this thesis.\\\\ Mutual exclusion, arbitrary discretization, One Difference minimization, noisy and missing data, generalization using a Hamming distance metric are all discussed during the research section. As a result of these extensions to the basic {ASOCS} model, training-set-style data is recognized much better than without the extensions. Further research using large test sets, other generalization techniques, and different discretization techniques is suggested.
@mastersthesis{hart_th,
    title        = {Extending {ASOCS} to Training-Set-Style Data},
    author        = {Hart, Edward F.},
    school    = {Brigham Young University},
    month        = {August},
    year        = {1992},

    }
[241] Martinez, Tony R., ASOCS: Towards Bridging Neural Network and Artificial Intelligence Learning, Proceedings of the 2nd Government Neural Network Workshop, 1991. [Abstract] [Bibtex]
A new class of connectionist architectures is presented called {ASOCS} (Adaptive Self-Organizing Concurrent Systems) [3,4]. {ASOCS} models support efficient computation through self-organized learning and parallel execution. Learning is done through the incremental presentation of rules and/or examples. Data types include Boolean and multi-state variables; recent models support analog variables. The model incorporates rules into an adaptive logic network in a parallel and self organizing fashion. The system itself resolves inconsistencies and generalizes as the rules are presented. After an introduction to the {ASOCS} paradigm, the abstract introduces current research thrusts which significantly increase the power and applicability of {ASOCS} models. For simplicity, we discuss only boolean mappings in the {ASOCS} overview.
@inproceedings{martinez_91c,
    title        = {{ASOCS}: Towards Bridging Neural Network and Artificial Intelligence Learning},
    author        = {Martinez, Tony R.},
    booktitle    = {Proceedings of the 2nd Government Neural Network Workshop},
    year        = {1991},

    }
[242] Rudolph, George L. and Martinez, Tony R., An Efficient Static Topology for Modeling ASOCS, Artificial Neural Networks, 729--734, 1991, Kohonen, et. al., Elsevier Science Publishers. [Abstract] [Bibtex]
{ASOCS} (Adaptive Self-Organizing Concurrent Systems) are a class of connectionist computational models which feature self-organized learning and parallel execution. One area of {ASOCS} research is the development of an efficient implementation of {ASOCS} using current technology. A result of this research is the Location-Independent {ASOCS} model ({LIA}). {LIA} uses a parallel, asynchronous network of independent nodes, which leads to an efficient physical realization using current technology. This paper reviews the behavior of the {LIA} model, and shows how a static binary tree topology efficiently supports that behavior. The binary tree topology allows for O(log($n$)) learning and execution times, where n is the number of nodes in the network.
@inproceedings{rudolph_91a,
    title        = {An Efficient Static Topology for Modeling {ASOCS}},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    editor        = {Kohonen, et. al.},
    booktitle    = {Artificial Neural Networks},
    pages        = {729--734},
    year        = {1991},
    publisher    = {Elsevier Science Publishers},

    }
[243] Martinez, Tony R. and Campbell, Douglas M., A Self-Adjusting Dynamic Logic Module, Journal of Parallel and Distributed Computing, 4, 303--313, 1991. [Abstract] [Bibtex]
This paper presents an {ASOCS} (Adaptive Self-Organizing Concurrent System) model for massively parallel processing of incrementally defined rule systems in such areas as adaptive logic, robotics, logical inference, and dynamic control. An {ASOCS} is an adaptive network composed of many simple computing elements operating asynchronously and in parallel. This paper focuses on Adaptive Algorithm 2 ({AA2}) and details its architecture and learning algorithm. {AA2} has significant memory and knowledge maintenance advantages over previous {ASOCS} models. An {ASOCS} can operate in either a data processing mode or a learning mode. During learning mode, the {ASOCS} is given a new rule expressed as a boolean conjunction. The {AA2} learning algorithm incorporates the new rule in a distributed fashion in a short, bounded time. During data processing mode, the {ASOCS} acts as a parallel hardware circuit.
@article{martinez_91a,
    title        = {A Self-Adjusting Dynamic Logic Module},
    author        = {Martinez, Tony R. and Campbell, Douglas M.},
    journal    = {Journal of Parallel and Distributed Computing},
    pages        = {303--313},
    volume        = {4},
    year        = {1991},

    }
[244] Martinez, Tony R. and Campbell, Douglas M., A Self-Organizing Binary Decision Tree for Incrementally Defined Rule Based Systems, IEEE Transactions on Systems, Man, and Cybernetics, 5, 1231--1238, 1991. [Abstract] [Bibtex]
This paper presents an {ASOCS} (adaptive self-organizing concurrent system) model for massively parallel processing of incrementally defined rule systems in such areas as adaptive logic, robotics, logical inference, and dynamic control. An {ASOCS} is an adaptive network composed of many simple computing elements operating asynchronously and in parallel. This paper focuses on adaptive algorithm 3 ({AA3}) and details its architecture and learning algorithm. It has advantages over previous {ASOCS} models in simplicity, implementability, and cost. An {ASOCS} can operate in either a data processing mode or a learning mode. During the data processing mode, an {ASOCS} acts as a parallel hardware circuit. In learning mode, rules expressed as boolean conjunctions are incrementally presented to the {ASOCS}. All {ASOCS} learning algorithms incorporate a new rule in a distributed fashion in a short, bounded time.
@inproceedings{martinez_91b,
    title        = {A Self-Organizing Binary Decision Tree for Incrementally Defined Rule Based Systems},
    author        = {Martinez, Tony R. and Campbell, Douglas M.},
    booktitle    = {{IEEE} Transactions on Systems, Man, and Cybernetics},
    pages        = {1231--1238},
    volume        = {5},
    year        = {1991},

    }
[245] Rudolph, George L., A Location-Independent ASOCS Model, 1991, Brigham Young University, June. [Abstract] [Bibtex]
This thesis proposes a new Adaptive Self-Organizing Concurrent System ({ASOCS}) model called Location-Independent {ASOCS} ({LIA}). The location-independent properties of {LIA} give it improved potential for implementation using current technology over previous {ASOCS} models. This thesis focuses on an architectural model and execution as well as on learning algorithms for {LIA}. Theoretical background, an analysis of {LIA}, and a brief comparison with other {ASOCS} models also are presented. The thesis does not discuss details of a hardware implementation of {LIA} beyond the model level.
@mastersthesis{rudolph_th,
    title        = {A Location-Independent {ASOCS} Model},
    author        = {Rudolph, George L.},
    school    = {Brigham Young University},
    month        = {June},
    year        = {1991},

    }
[246] McDonald, Kelly C. and Martinez, Tony R. and Campbell, Douglas M., A Connectionist Method for Adaptive Real-Time Network Routing, Proceedings of the 4th International Symposium on Artificial Intelligence, 371--377, 1991. [Abstract] [Bibtex]
This paper proposes a connectionist mechanism to support adaptive real-time routing in computer networks. In particular, an Adaptive Self-Organizing Concurrent System ({ASOCS}) model is used as the basic network router. {ASOCS} are connectionist models which achieve learning and processing in a parallel and self-organizing fashion. By exploiting parallel processing the {ASOCS} network router addresses the increased speed and complexity in computer networks. By using the {ASOCS} adaptive learning paradigm, a network router can utilize more flexible routing algorithms.
@inproceedings{mcdonald_91,
    title        = {A Connectionist Method for Adaptive Real-Time Network Routing},
    author        = {McDonald, Kelly C. and Martinez, Tony R. and Campbell, Douglas M.},
    booktitle    = {Proceedings of the 4th International Symposium on Artificial Intelligence},
    pages        = {371--377},
    year        = {1991},

    }
[247] Martinez, Tony R., Smart Memory: The Memory Processor Model, IFIP International Conference, 1990. In Modeling the Innovation: Communications, Automation and Information Systems, Carnevale, Lucertini, and Nicosia (Eds), North-Holland, 481--488, 1990. [Abstract] [Bibtex]
This paper overviews and proposes a class of smart memory devices called the memory processor model. Smart memory entails the tight coupling of memory and logic. The model seeks to alleviate the von Neumann bottleneck, take advantage of technology trends, improve overall system speed, and add encapsulation advantages. Speed is increased through locality of processing, communication savings, higher-level functionality, and parallelism. Parallelism is exploited at both the micro and macro levels. Data objects are accessed through descriptors, which give the memory a meta-knowledge concerning the objects, allowing for nontraditional access mechanisms. Both data types and operations are programmable. Innovative processing schemes, coupled with emerging technology densities, allow for substantial speed-up in traditional and novel memory operations. Three important paradigms introduced are descriptor processing, where operations are accomplished without access to the actual data, associative descriptor processing, supporting highly parallel access and processing, and the single-program multiple-data method, allowing parallelism by simultaneous processing of data objects distributed amongst multiple smart memories. Examples of specific operations are presented. This papers presents initial studies into the smart memory mechanism with the goal of describing its potential and stimulating further work.
@inproceedings{martinez_90a,
    title        = {Smart Memory: The Memory Processor Model},
    author        = {Martinez, Tony R.},
    booktitle    = {{IFIP} International Conference, 1990. In Modeling the Innovation: Communications, Automation and Information Systems, Carnevale, Lucertini, and Nicosia (Eds), North-Holland},
    pages        = {481--488},
    year        = {1990},

    }
[248] Martinez, Tony R., Progress in Neural Networks, ch. 5, 1, 105--126, 1990, Adaptive Self-Organizing Concurrent Systems, Omidvar, Omid, Ablex Publishing. [Abstract] [Bibtex]
(Book chapter that overviews the first three {ASOCS} models.)
@inbook{martinez_90c,
    chapter        = {Adaptive Self-Organizing Concurrent Systems},
    author        = {Martinez, Tony R.},
    editor        = {Omidvar, Omid},
    title    = {Progress in Neural Networks, ch. 5},
    pages        = {105--126},
    volume        = {1},
    year        = {1990},
    note        = {Ablex Publishing},

    }
[249] Martinez, Tony R., Consistency and Generalization of Incrementally Trained Connectionist Models, Proceedings of the International Symposium on Circuits and Systems, 706--709, 1990. [Abstract] [Bibtex]
This paper discusses aspects of consistency and generalization in connectionist networks which learn through incremental training by examples or rules. Differences between training set learning and incremental rule or example learning are presented. Generalization, the ability to output reasonable mappings when presented with novel input patterns, is discussed in light of the above learning methods. In particular, the contrast between hamming distance generalization and generalizing by high order combinations of critical variables is overviewed. Examples of detailed rules for an incremental learning model are presented for both consistency and generalization constraints.
@inproceedings{martinez_90b,
    title        = {Consistency and Generalization of Incrementally Trained Connectionist Models},
    author        = {Martinez, Tony R.},
    booktitle    = {Proceedings of the International Symposium on Circuits and Systems},
    pages        = {706--709},
    year        = {1990},

    }
[250] Martinez, Tony R., Smart Memory Architecture and Methods, Future Generation Computer Systems, 6, 145--162, 1990. [Abstract] [Bibtex]
This paper discusses potential functionalities of smart memories. Smart memory entails the tight coupling of memory and logic. A specific architecture called the memory processor model is proposed. The model seeks to alleviate the von Neumann bottleneck, take advantage of technology trends, improve overall system speed, and add encapsulation advantages. Speed is increased through locality of processing, communication savings, higher-level functionality, and parallelism. Data objects are accessed through descriptors, which give the memory a meta-knowledge concerning the objects, allowing for nontraditional access mechanisms. Both data types and operations are programmable, and the model is streamlined for memory operations and services. Innovative processing schemes, coupled with emerging technology densities, allow for substantial fine-grain parallelism in traditional and novel memory operations. Three important paradigms introduced are descriptor processing, where operations are accomplished without access to the actual data, associative descriptor processing, supporting highly parallel access and processing, and the single-program multiple-data method, allowing parallelism by simultaneous processing of data objects distributed amongst multiple smart memories. Examples of specific operations are presented. This paper presents initial studies into the smart memory mechanism with the goal of describing its potential and stimulating further work.
@article{martinez_90d,
    title        = {Smart Memory Architecture and Methods},
    author        = {Martinez, Tony R.},
    journal    = {Future Generation Computer Systems},
    pages        = {145--162},
    volume        = {6},
    year        = {1990},

    }
[251] Rudolph, George L. and Martinez, Tony R., DNA: A New ASOCS Model With Improved Implementation Potential, Proceedings of the IASTED International Symposium on Expert Systems and Neural Networks, 12--15, 1989. [Abstract] [Bibtex]
A new class of high-speed, self-adaptive, massively parallel computing models called {ASOCS} (Adaptive Self-Organizing Concurrent Systems) has been proposed. Current analysis suggests that there may be problems implementing {ASOCS} models in {VLSI} using the hierarchical network structures originally proposed. The problems are not inherent in the models, but rather in the technology used to implement them. This has led to the development of a new {ASOCS} model called {DNA} (Discriminant-Node {ASOCS}) that does not depend on a hierarchical node structure for success. Three areas of the {DNA} model are briefly discussed in this paper: {DNA}'s flexible nodes, how {DNA} overcomes problems other models have allocating unused nodes, and how {DNA} operates during processing and learning.
@inproceedings{rudolph_89a,
    title        = {{DNA}: A New {ASOCS} Model With Improved Implementation Potential},
    author        = {Rudolph, George L. and Martinez, Tony R.},
    booktitle    = {Proceedings of the {IASTED} International Symposium on Expert Systems and Neural Networks},
    pages        = {12--15},
    year        = {1989},

    }
[252] Hughes, Brent W., Prioritized Rule Systems, 1989, Brigham Young University, November. [Abstract] [Bibtex]
Non-von Neumann architectures attempt to overcome the ``word-at-a-time'' bottleneck of traditional computing systems. Neural nets are a class of non-von architectures whose goal is, among other things, to learn input-output mappings using highly distributed processing and memory. A neural net consists of many simple processing elements (nodes) with modifiable links between them, allowing for a high degree of parallelism. A typical neural net has fixed a topology. It learns by modifying the ``weights'' or ``conductances'' of the links between nodes.\\\\ Another model with similar goals, called {ASOCS}, learns by modifying its topology. Unlike a typical neural net, {ASOCS} is guaranteed to be able to represent and learn any desired mapping and to do so efficiently. This thesis presents an extension of {ASOCS} called Prioritized Rules Systems. The {PRS} abstract model provides a foundation for various architectural models that have a number of advantages over other {ASOCS} models. One example of such an architectural model is presented in the thesis along with a description of a program written to simulate that model. The processing and learning algorithms of the architectural model are based on theorems proved in the thesis.
@mastersthesis{hughes_th,
    title        = {Prioritized Rule Systems},
    author        = {Hughes, Brent W.},
    school    = {Brigham Young University},
    month        = {November},
    year        = {1989},

    }
[253] Martinez, Tony R. and Lindsey, M., On the Pseudo Multilayer Learning of Backpropagation, Proceedings of the IEEE Symposium on Parallel and Distributed Processing, 308--315, 1989. [Abstract] [Bibtex]
Rosenblatt's convergence theorem for the simple perceptron initiated much excitement about iterative weight modifying neural networks. However, this convergence only holds for the class of linearly separable functions, which is vanishingly small compared to arbitrary functions. With multilayer networks of nonlinear units it is possible, though not guaranteed, to solve arbitrary functions. Backpropagation is a method of training multilayer networks to converge to the solution of arbitrary functions. This paper describes how classification takes place in single and multilayer networks using threshold or sigmoid nodes. It then shows that the current backpropagation method can only do effective learning on one layer of a network at a time.
@inproceedings{martinez_89a,
    title        = {On the Pseudo Multilayer Learning of Backpropagation},
    author        = {Martinez, Tony R. and Lindsey, M.},
    booktitle    = {Proceedings of the {IEEE} Symposium on Parallel and Distributed Processing},
    pages        = {308--315},
    year        = {1989},

    }
[254] Martinez, Tony R., Neural Network Applicability: Classifying the Problem Space, Proceedings of the IASTED International Symposium on Expert Systems and Neural Networks, 41--44, 1989. [Abstract] [Bibtex]
The tremendous current effort to propose neurally inspired methods of computation forces closer scrutiny of real world application potential of these models. This paper categorizes applications into classes and particularly discusses features of applications which make them efficiently amenable to neural network methods. Computational machines do deterministic mappings of inputs to outputs and many computational mechanisms have been proposed for problem solutions. Neural network features include parallel execution, adaptive learning, generalization, and fault tolerance. Often, much effort is given to a model and applications which can already be implemented in a much more efficient way with an alternate technology. Neural networks are potentially powerful devices for many classes of applications, but not all. However, it is proposed that the class of applications for which neural networks are efficient is both large and commonly occurring in nature. Comparison of supervised, unsupervised, and generalizing systems is also included.
@inproceedings{martinez_89b,
    title        = {Neural Network Applicability: Classifying the Problem Space},
    author        = {Martinez, Tony R.},
    booktitle    = {Proceedings of the {IASTED} International Symposium on Expert Systems and Neural Networks},
    pages        = {41--44},
    year        = {1989},

    }
[255] Martinez, Tony R. and Vidal, J. J., Adaptive Parallel Logic Networks, Journal of Parallel and Distributed Computing, 1, 26--58, 1988, February. [Abstract] [Bibtex]
This paper presents a novel class of special purpose processors referred to as ASOCS (adaptive self-organizing concurrent systems). Intended applications include adaptive logic devices, robotics, process control, system malfunction management, and in general, applications of logic reasoning. ASOCS combines massive parallelism with self-organization to attain a distributed mechanism for adaptation. The ASOCS approach is based on an adaptive network composed of many simple computing elements (nodes) which operate in a combinational and asynchronous fashion. Problem specification (programming) is obtained by presenting to the system if-then rules expressed as Boolean conjunctions. New rules are added incrementally. In the current model, when conflicts occur, precedence is given to the most recent inputs. With each rule, desired network response is simply presented to the system, following which the network adjusts itself to maintain consistency and parsimony of representation. Data processing and adaptation form two separate phases of operation. During processing, the network acts as a parallel hardware circuit. Control of the adaptive process is distributed among the network nodes and efficiently exploits parallelism.
@article{martinez_88d,
    title        = {Adaptive Parallel Logic Networks},
    author        = {Martinez, Tony R. and Vidal, J. J.},
    journal    = {Journal of Parallel and Distributed Computing},
    pages        = {26--58},
    volume        = {1},
    month        = {February},
    year        = {1988},
    }
[256] Martinez, Tony R., ASOCS: A Multilayered Connectionist Network with Guaranteed Learning of Arbitrary Mappings, 2nd IEEE International Conference on Neural Networks, August, 1988, August. [Abstract] [Bibtex]
This paper reviews features of a new class of multilayer connectionist architectures known as {ASOCS} (Adaptive Self-Organizing Concurrent Systems). {ASOCS} is similar to most decision-making neural network models in that it attempts to learn an adaptive set of arbitrary vector mappings. However, it differs dramatically in its mechanisms. {ASOCS} is based on networks of adaptive digital elements which self-modify using local information. Function specification is entered incrementally by use of rules, rather than complete input-output vectors, such that a processing network is able to extract critical features from a large environment and give output in a parallel fashion. Learning also uses parallelism and self-organization such that a new rule is completely learned in time linear with the depth of the network. The model guarantees learning of any arbitrary mapping of boolean input-output vectors. The model is also stable in that learning does not erase any previously learned mappings except those explicitly contradicted.
@inproceedings{martinez_88c,
    title        = {{ASOCS}: A Multilayered Connectionist Network with Guaranteed Learning of Arbitrary Mappings},
    author        = {Martinez, Tony R.},
    booktitle    = {2nd {IEEE} International Conference on Neural Networks, August},
    month        = {August},
    year        = {1988},

    }
[257] Martinez, Tony R., On the Expedient Use of Neural Networks, 1, 1988, Neural Networks, S1, p. 552, Presented at the 1st Meeting of the International Neural Network Society. [Abstract] [Bibtex]
(Single Page Paper)
@misc{martinez_88a,
    title        = {On the Expedient Use of Neural Networks},
    author        = {Martinez, Tony R.},
    howpublished    = {Neural Networks, S1, p. 552, Presented at the 1st Meeting of the International Neural Network Society},
    volume        = {1},
    year        = {1988},

    }
[258] Martinez, Tony R., Digital Neural Networks, Proceedings of the 1988 IEEE Systems, Man, and Cybernetics Conference, 681--684, 1988. [Abstract] [Bibtex]
Demands for applications requiring massive parallelism in symbolic environments have given rebirth to research in models labeled as neural networks. These models are made up of many simple nodes which are highly interconnected such that computation takes place as data flows amongst the nodes of the network. To present, most models have proposed nodes based on simple analog functions, where inputs are multiplied by weights and summed, the total then optionally being transformed by an arbitrary function at the node. Learning in these systems is accomplished by adjusting the weights on the input lines. This paper discusses the use of digital (boolean) nodes as a primitive building block in connectionist systems. Digital nodes naturally engender new paradigms and mechanisms for learning and processing in connectionist networks. The digital nodes are used as the basic building block of a class of models called {ASOCS} (Adaptive Self-Organizing Concurrent Systems). These models combine massive parallelism with the ability to adapt in a self-organizing fashion. Basic features of standard neural network learning algorithms and those proposed using digital nodes are compared and contrasted. The latter mechanisms can lead to vastly improved efficiency for many applications.
@inproceedings{martinez_88b,
    title        = {Digital Neural Networks},
    author        = {Martinez, Tony R.},
    booktitle    = {Proceedings of the 1988 {IEEE} Systems, Man, and Cybernetics Conference},
    pages        = {681--684},
    year        = {1988},

    }
[259] Martinez, Tony R., Models of Parallel Adaptive Logic, Proceedings of the 1987 IEEE Systems, Man, and Cybernetics Conference, 290--296, 1987. [Abstract] [Bibtex]
This paper overviews a proposed architecture for adaptive parallel logic referred to as {ASOCS} (Adaptive Self-Organizing Concurrent System). The {ASOCS} approach is based on an adaptive network composed of many simple computing elements which operate in a parallel asynchronous fashion. Problem specification is given to the system by presenting if-then rules in the form of boolean conjunctions. Rules are added incrementally and the system adapts to the changing rule-base. Adaptation and data processing form two separate phases of operation. During processing the system acts as a parallel hardware circuit. The adaptation process is distributed amongst the computing elements and efficiently exploits parallelism. Adaptation is done in a self-organizing fashion and takes place in time linear with the depth of the network. This paper summarizes the overall {ASOCS} concept and overviews three specific architectures.
@inproceedings{martinez_87a,
    title        = {Models of Parallel Adaptive Logic},
    author        = {Martinez, Tony R.},
    booktitle    = {Proceedings of the 1987 {IEEE} Systems, Man, and Cybernetics Conference},
    pages        = {290--296},
    year        = {1987},

    }
[260] Martinez, Tony R., Adaptive Self-Organizing Logic Networks, 1986, UCLA Technical Report - CSD 860093, June, Ph.D. Dissertation. [Abstract] [Bibtex]
Along with the development of contemporary computer science the
limitations of sequential ``von Neumann'' machines have become more
apparent. It is now becoming clear that to handle projected needs
in speed and throughput, massively parallel architectures will be
needed. In this dissertation we propose a special purpose
architectural model that satisfies a general class of propositional
logic problems in a totally distributed and concurrent fashion. The
architectural model is identified as {ASOCS} (Adaptive Self-Organizing
Concurrent System). Problem specification is incremental and takes
the form of if-then rules (instances) expressed as Boolean conjunctions.
Possible applications include symbolic decision systems, propositional
production systems, digital pattern recognition and real-time process
control. The approach is based on an adaptive network composed of many
simple computing elements (nodes) which operate in a combinational and
asynchronous fashion. Control and processing in the network is
distributed amongst the network nodes. Adaptation and data processing
form two separate phases of operation. During processing, the network
acts as a parallel network of Boolean gates. Inputs and outputs of the
network are also Boolean. During adaptation the network structure and
the node functions can change to update the overall network function
as specified. As new rules are added to the rule base, the network
independently reconfigures to a logic circuit that remains both
minimal and consistent with the rule base. Thus, there is no explicit
programming. Desired network response is simply presented to the
system, following which the network adjusts itself accordingly.
Although the functionality of the network can be observed from
the outside, the internal network structure is unknown. The control
of the adaptive process is almost completely distributed and
efficiently exploits parallelism. Most communication takes place
between neighboring nodes with only minimal need for centralized
processing. The network modification is performed with considerable
concurrency and the adaptation time grows only linearly with the depth
of the network.
@techreport{martinez_diss,
    title        = {Adaptive Self-Organizing Logic Networks},
    author        = {Martinez, Tony R.},
    institution    = {{UCLA} Technical Report - {CSD} 860093},
    month        = {June},
    year        = {1986},
    note        = {Ph.D. Dissertation},
    }

Edit this publication list