dTcmd is a command-line program that exploits some of the features of the YaDT C++ classes to build decision trees. It takes a metadata table and a training table as input and constructs a decision tree. Command-line options specify the minimum number of cases required to split a node and the confidence limits used when pruning the tree. An optional test table and an optional scoring table may also be specified. Tables can be stored as comma-separated text files, gzipped text files, or in an internal binary format. Built trees can be saved as PMML-compliant XML documents, as text files, or in binary format.
Command line arguments:
> dTcmd32 <input options> <tree options> <output options>
dTcmd64 is the 64 bit compiled version of dTcmd32. It runs 10-15% faster than dTcmd32 both on Windows and Linux.
Command line options:
Input data options
Input data options for dTcmd:
The option -f <file> is a shorthand for -fm <file>.names -fd <file>.data
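The shorthand can be pictured as a simple rewrite of the argument list. The sketch below is only an illustration of that equivalence, not dTcmd's actual argument parser; the function name expand_f_shorthand is hypothetical.

```python
def expand_f_shorthand(args):
    """Rewrite every '-f <file>' pair as '-fm <file>.names -fd <file>.data'.

    Illustration only: this mirrors the documented equivalence, it is not
    taken from the dTcmd sources.
    """
    expanded = []
    i = 0
    while i < len(args):
        if args[i] == "-f" and i + 1 < len(args):
            stem = args[i + 1]
            expanded += ["-fm", stem + ".names", "-fd", stem + ".data"]
            i += 2  # skip the option and its value
        else:
            expanded.append(args[i])
            i += 1
    return expanded


print(expand_f_shorthand(["-f", "golf"]))
# prints ['-fm', 'golf.names', '-fd', 'golf.data']
```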
Tables are represented either:
Text files and gzipped text files can be mixed (e.g., the metadata in a gzipped text file while the training data is in a plain text file).
Tree construction options
The following parameters affect the tree construction algorithm:
Output options
The following options affect the outputs of dTcmd:
Zero, one or more of these options can be specified.
Text files
Text files encode tables in comma-separated format. To change the separator to the character c, use the option -sep <c>. For instance, -sep " " switches to space-separated columns. Also, the special string "?" represents unknown/null values.
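As a rough illustration of this format, the following Python sketch splits one row on a configurable separator and maps "?" to a null value. It assumes no quoting and no trimming of whitespace around fields, which the text above does not specify; parse_row is a hypothetical helper, not part of YaDT.

```python
UNKNOWN = "?"  # special string for unknown/null values, as described above


def parse_row(line, sep=","):
    """Split one text-table row on `sep`; unknown values become None.

    Assumption: fields are not quoted and are not whitespace-trimmed.
    """
    fields = line.rstrip("\r\n").split(sep)
    return [None if field == UNKNOWN else field for field in fields]


print(parse_row("sunny,?,85,false"))           # comma-separated (the default)
print(parse_row("sunny ? 85 false", sep=" "))  # space-separated, as with -sep " "
```

Both calls yield the same four fields, with None in place of the unknown temperature.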
Gzipped text files
Gzipped text files are files with suffix .gz obtained by compressing text files with gzip.
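When preparing or inspecting such tables outside dTcmd, the suffix rule can be mirrored with Python's standard gzip module. This is a minimal sketch; open_table is a hypothetical helper and UTF-8 encoding is an assumption.

```python
import gzip


def open_table(path):
    """Open a table file for reading as text.

    Files whose name ends in .gz are decompressed on the fly; anything else
    is opened as a plain text file. UTF-8 is assumed for both.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

With this helper, open_table("golf.data.gz") and open_table("golf.data") return file objects that can be read line by line in the same way.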
Metadata table
Metadata tables have three columns, which in order represent:
For instance, the file golf.names
outlook,string,discrete
temperature,integer,continuous
humidity,integer,continuous
windy,string,discrete
toPlay,string,class
describes training data consisting of the following columns:
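The three-column layout of golf.names can be read with a few lines of Python. The sketch below is illustrative only: the field names name, data_type and column_type are labels chosen here, not identifiers from YaDT, and parse_metadata is a hypothetical helper.

```python
from typing import NamedTuple


class Column(NamedTuple):
    # Field names are illustrative labels for the three metadata columns.
    name: str
    data_type: str
    column_type: str


# Contents of golf.names as listed above.
GOLF_NAMES = """\
outlook,string,discrete
temperature,integer,continuous
humidity,integer,continuous
windy,string,discrete
toPlay,string,class
"""


def parse_metadata(text, sep=","):
    """Turn each non-empty metadata row into a Column record."""
    columns = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, data_type, column_type = line.split(sep)
        columns.append(Column(name, data_type, column_type))
    return columns


for col in parse_metadata(GOLF_NAMES):
    print(col.name, col.data_type, col.column_type)
```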
Training data table
The number of columns in a training data table is determined by the metadata table. The order of the columns must be consistent with the order of the metadata table rows. Unknown values are not admitted when the column type is weights or class. Here is the golf.data training data file:
sunny,85,85,false,1,Don't Play
sunny,80,90,true,1,Don't Play
overcast,83,78,false,1.5,Play
rain,70,96,false,0.8,Play
rain,68,80,false,2,Play
rain,65,70,true,1,Don't Play
overcast,64,65,true,2.5,Play
sunny,72,95,false,1,Don't Play
sunny,69,70,false,1,Play
rain,75,80,false,1.5,Play
sunny,75,70,true,3,Play
overcast,72,90,true,1.5,Play
overcast,81,75,false,1,Play
rain,71,80,true,1,Don't Play
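The two constraints stated for training tables (a column count fixed by the metadata, and no unknown values in weights or class columns) can be checked before running dTcmd. The following Python sketch is one way to do so. It assumes that the fifth column of golf.data is a weights column; that assignment, the COLUMN_TYPES list and the check_row helper are assumptions made for illustration.

```python
# Column types, one per data column, in metadata order.
# ASSUMPTION: the fifth column of golf.data is a weights column.
COLUMN_TYPES = ["discrete", "continuous", "continuous", "discrete", "weights", "class"]


def check_row(fields, column_types=COLUMN_TYPES):
    """Return a list of problems found in one training row (empty if none)."""
    problems = []
    if len(fields) != len(column_types):
        problems.append(f"expected {len(column_types)} columns, got {len(fields)}")
        return problems
    for position, (value, ctype) in enumerate(zip(fields, column_types), start=1):
        # "?" marks an unknown value; not admitted for weights or class columns.
        if value == "?" and ctype in ("weights", "class"):
            problems.append(f"column {position} ({ctype}) must not be unknown")
    return problems


print(check_row("sunny,85,85,false,1,Don't Play".split(",")))
# prints []
print(check_row("sunny,?,85,false,1,?".split(",")))
# prints ['column 6 (class) must not be unknown']
```

An unknown value in a discrete or continuous column, such as the temperature in the second call, is accepted; only the class column is reported.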
Binary data table
dTcmd may save and load a binary file containing a binary representation of a metadata table and a training table (see the options -bd <file> and -db <file>). Binary input/output is faster, and binary files are much smaller than text files. However, binary files are not guaranteed to be readable by future or past versions of YaDT!
Binary tree
dTcmd may save and load a binary file containing a binary representation of a decision tree (see the options -bt <file> and -tb <file>). Binary files are not guaranteed to be readable by future or past versions of YaDT!
XML tree
dTcmd may save a PMML-compliant XML representation of the built tree to a file or to standard output (see the options -x <file> and -xstd).
Confusion matrix and text trees
dTcmd may save a text representation of the built tree and of the confusion matrix over training and test data to a file or to standard output (see the options -t <file> and -tstd).
Verbose log
dTcmd may save a verbose log of the computation in progress to a file or to standard output (see the options -l <file> and -lstd).
Test data table
A test data table has exactly the same format as a training data table.
Score data table
A score data table has the same format as a training data table, with the following exceptions:
An example score file for the golf example is the following:
overcast,80,75,false,1
rain,90,75,true,2
sunny,98,82,false,3
sunny,80,75,true,4
overcast,90,75,false,5
rain,78,82,false,6
Scored data table
Scoring a score data table with a tree produces a scored data table as a text file. For each row, in the same order as the score data table, it contains:
An example scored file for the golf score data table is the following:
1,Play,1
2,Don't Play,1
3,Don't Play,0.8
4,Play,1
5,Play,0.9
6,Play,1