On Greedy Search for Decision Tree Classifiers, their application in Astronomy (and a bit about clustering)

Usama Fayyad, Microsoft Research

I'll cover two topics in tree generation: greedy selection measures and discretization of numeric-valued attributes. On selection measures, I'll argue that the widely adopted class of impurity measures (e.g., information entropy and the Gini index as used in CART) is in fact not the right class of measures for use in classification. I'll present an alternative class and show results demonstrating the improvement. I'll also present results on extending the binary discretization scheme used in CART to a multi-interval method. I'll use an application in astronomy* to illustrate how such techniques can provide powerful advantages over traditional statistical analysis methods. Finally, I'll wrap up with recent work on the challenge of extending the application to clustering over very large datasets to discover rare classes (e.g., finding new high-redshift quasars in the universe).

*The astronomy application was performed at the Jet Propulsion Laboratory, California Institute of Technology, in collaboration with Caltech Astronomy/Palomar Observatory.
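
For readers unfamiliar with the terms in the abstract, here is a minimal sketch of the baseline being discussed: the two impurity measures named above (entropy and the Gini index) and a greedy binary cut on one numeric attribute. It is an illustration under stated assumptions, not the speaker's alternative measure or multi-interval method, neither of which is specified in this abstract. The function names (`entropy`, `gini`, `best_binary_cut`) and the choice of midpoints between adjacent distinct values as candidate cuts are hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Information entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_cut(values, labels, impurity=entropy):
    """Greedy binary discretization of one numeric attribute.

    Tries a cut at the midpoint between each pair of adjacent distinct
    values (an assumption of this sketch) and returns the pair
    (cut_point, weighted_impurity) with the lowest weighted impurity,
    or None if the attribute takes only one distinct value.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point lies between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
        if best is None or score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = ["star", "star", "galaxy", "galaxy"]
    print(best_binary_cut(xs, ys, impurity=entropy))
    print(best_binary_cut(xs, ys, impurity=gini))
```

A multi-interval scheme would produce several cut points per attribute rather than one; how the talk selects them is not stated here, so the sketch stops at the binary case.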