Van Dam, Rob's Publications (detailed list)
THIS PAGE IS NO LONGER MAINTAINED. A newer, more up-to-date publications list is available.
This page contains the titles and abstracts of papers written by Rob Van Dam, a member of the BYU Neural Networks and Machine Learning (NNML) Research Group. PostScript files are available for most papers. A more concise list is also available.
Adapting ADtrees for High Arity Features
- Authors: Rob Van Dam, Irene Geary, and Dan Ventura
- Abstract:
ADtrees, a data structure useful for caching sufficient statistics, have been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. For low arity symbolic features, ADtrees trade a slight increase in query time for a reduction in overall tree size. Unfortunately, for high arity features, the same technique can often result in a very large increase in query time and a nearly negligible tree size reduction. In the dynamic (lazy) version of the tree, both query time and tree size can increase for some applications. Here we present two modifications to the ADtree which can be used separately or in combination to achieve the originally intended space-time tradeoff in the ADtree when applied to datasets containing very high arity features.
- Reference: In Proceedings of the Association for the Advancement of Artificial Intelligence, pages 708–713, July 2008.
ADtrees for Sequential Data and N-gram Counting
- Authors: Rob Van Dam and Dan Ventura
- Abstract:
We consider the problem of efficiently storing n-gram counts for large n over very large corpora. In such cases, the efficient storage of sufficient statistics can have a dramatic impact on system performance. One popular model for storing such data derived from tabular data sets with many attributes is the ADtree. Here, we adapt the ADtree to benefit from the sequential structure of corpora-type data. We demonstrate the usefulness of our approach on a portion of the well-known Wall Street Journal corpus from the Penn Treebank and show that our approach is exponentially more efficient than the naïve approach to storing n-grams and is also significantly more efficient than a traditional prefix tree.
- Reference: In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pages 492–497, October 2007.
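For readers unfamiliar with the baseline mentioned in the abstract above, the following is a minimal sketch of a traditional prefix tree for n-gram counting. It is not the paper's ADtree adaptation; the class name `NGramTrie`, its methods, and the `max_n` parameter are hypothetical names chosen for illustration only.

```python
class NGramTrie:
    """Prefix tree holding counts for every n-gram of length 1..max_n.

    Illustrative baseline only; this is NOT the ADtree variant from the paper.
    """

    def __init__(self, max_n):
        self.max_n = max_n
        # Each node is a dict with a count and a map of child tokens.
        self.root = {"count": 0, "children": {}}

    def add_sequence(self, tokens):
        """Insert every n-gram (up to max_n tokens) that starts at each position."""
        for start in range(len(tokens)):
            node = self.root
            for token in tokens[start:start + self.max_n]:
                node = node["children"].setdefault(
                    token, {"count": 0, "children": {}}
                )
                # The path from the root to this node spells one n-gram.
                node["count"] += 1

    def count(self, ngram):
        """Return how often the given token sequence was seen (0 if never)."""
        node = self.root
        for token in ngram:
            node = node["children"].get(token)
            if node is None:
                return 0
        return node["count"]


trie = NGramTrie(max_n=3)
trie.add_sequence("the cat sat on the mat".split())
print(trie.count(["the"]))
print(trie.count(["the", "cat", "sat"]))
```

N-grams that share a prefix share nodes, which is where a prefix tree saves space compared with storing each n-gram as a separate table entry.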