On Greedy Search for Decision Tree Classifiers,
their application in Astronomy (and a bit about clustering)
Usama Fayyad, Microsoft Research
I'll cover two topics in tree generation: greedy selection
measures and discretization of numeric-valued attributes.
In selection measures, I'll argue that the widely adopted
class of impurity measures (e.g., information entropy and the
Gini index as used in CART) is in fact not the right class
of measures for use in classification. I'll present an alternative
class and show results demonstrating the improvement. I'll also present results
on extending the binary discretization scheme used in CART
to a multi-interval method. I'll use an application
in astronomy* to illustrate how such techniques can provide powerful
advantages over traditional statistical analysis methods. Finally, I'll
wrap up with recent work on the challenge of extending the
application to clustering over very large datasets to discover rare
classes (e.g., finding new high-redshift quasars in the universe).
*The astronomy application was performed at the Jet Propulsion Laboratory,
California Institute of Technology in collaboration with Caltech
Astronomy/Palomar Observatory.
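
For readers unfamiliar with the baseline discussed above, here is a minimal sketch of impurity-based binary discretization: it scores each candidate cut point on a numeric attribute by the weighted impurity of the two resulting partitions and keeps the lowest. This illustrates only the conventional approach that the talk argues against; it is not the alternative class of measures or the multi-interval method, neither of which is specified in this abstract. The function names (entropy, gini, best_binary_cut) and the midpoint convention for cut values are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_cut(values, labels, impurity=entropy):
    """Return (cut, score) for the binary cut with the lowest weighted impurity.

    Candidate cuts are midpoints between adjacent distinct sorted values
    (an illustrative convention). Returns (None, inf) if no cut is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        # Skip positions where the attribute value does not change.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


if __name__ == "__main__":
    # Toy data: the two classes separate cleanly between 2 and 3.
    print(best_binary_cut([1, 2, 3, 4], ["a", "a", "b", "b"]))
    print(best_binary_cut([1, 2, 3, 4], ["a", "a", "b", "b"], impurity=gini))
```

Passing the impurity function as a parameter makes it easy to swap in a different selection measure and compare the cuts it chooses on the same data.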