Changes in version 30.0.
1. Removed one prompt from multiple or longitudinal response option, requiring jump to version 30.
2. Splits on categorical variables with more than 11 levels changed to use splits on all LDA variables, with best split being the one minimizing total deviance.
3. Reduced the use of Wilson-Hilferty approximation.
4. Changed constant term to 0 at root node for proportional hazards
models.
5. Changed LaTeX trees to show treatment effects or hazard ratios when there is a treatment variable.
Changes in version 29.7.
1. If chosen-SE tree has no splits, the 0-SE tree is output.
2. Added number of training observations in caption of LaTeX diagram.
3. Added sorting of variable columns by name before CV subsetting.
4. Changed number of missing value column in summary output table to refer only to the observations used for training.
Changes in version 29.6.
1. Corrected a bug to do with piecewise simple polynomial least squares models.
2. Added output of proportions of observations in each arm when there is a treatment (R) variable.
Changes in version 29.5.
1. Ensured that the default split point selection for importance scoring is exhaustive search.
2. Improved variable selection procedure to look at splitting on next most significant variable if the current one does not yield admissible subnodes.
3. Changed default minimum node sample size to max(2,n/100).
Changes in version 29.4.
1. Corrected a bug caused by M variables in non-least squares models.
Changes in version 29.3.
1. Corrected a bug to do with missing values in linear splits for classification.
2. Fixed a problem to do with splits on categorical variables with more than 11 levels.
Changes in version 29.2.
1. Corrected some bugs to do with M variables in proportional hazards models.
2. Corrected some errors and typos in text and LaTeX output.
Changes in version 29.1.
1. Corrected a bug in splits on M variables.
2. Corrected some bugs in R and LaTeX output to do with M variables.
3. Corrected some bugs to do with P variables.
Changes in version 29.0.
1. Changed default SE pruning to 0-SE for stepwise regression.
2. Improved split point selection for polynomial models by using local mean imputation.
3. Introduced indicator ("I") and periodic ("P") variables.
4. Corrected a bug in LaTeX output.
Changes in version 28.1.
1. Introduced missing-value flag ("M") variables.
2. Made many cosmetic changes to text, LaTeX and R outputs.
3. Changed random seed for cross-validation pruning to yield the same results whether observations with zero weight are included or excluded.
4. For quantile regression, changed chi-squared test so that zero-residuals join the smaller of positive or negative residual categories.
5. Changed the default proportion of split points for importance scoring to 100 percent (exhaustive search).
6. Changed default to exhaustive search if number of observations with positive weight is not greater than 1 million.
Changes in version 28.0.
1. Removed the option for vertical vs sideways tree diagrams. As a result, all previous input files need regeneration.
2. Made exhaustive search the default if number of observations with positive weights is < 1 million.
3. Corrected display of "Node MSE" in output.
4. Changed colors in LaTeX diagrams to be more color-blind friendly.
5. Added a sign (positive or negative) in front of each fit variable to indicate its slope in LaTeX diagrams.
6. Improved spacing of labels on sides of nodes in LaTeX output.
Changes in version 27.9.
1. Corrected a bug that occurred when there is an R variable and only one C variable.
Changes in version 27.8.
1. Corrected a bug in Gi stepwise and multiple regression when there are B variables.
Changes in version 27.7.
1. Corrected a recently introduced bug in Gi when a node has only one splittable N variable and no splittable S and C variables.
2. Added the number of terminal nodes in 0-SE tree to output.
Changes in version 27.6.
1. Corrected a mistake in multiple and stepwise linear options in Gi method.
Changes in version 27.5.
1. Changed "Classprior" to "Posterior" in output column label for classification trees.
2. Added minimum node sample size, minimum treatment sample size (if applicable), and maximum number of split levels to LaTeX figure caption.
3. Corrected a bug to do with simple linear prognostic control with least squares and Gi method.
Changes in version 27.4
1. Reverted to original treatment of categorical variables for Gi method (no merging of categories).
2. Added columns of regression coefficients for treatment indicators in fitted value file when R variable is present.
3. Added columns of class proportions in fitted value file for classification.
4. Made 0-SE the default for Gi method with multiple linear option.
Changes in version 27.3
1. Added code to let program exit gracefully with error message if priors or misclassification costs files are incorrect.
2. Corrected a bug to do with writing fitted values for multiresponse variables.
Changes in version 27.2
1. Corrected some errors in R code when there is a treatment variable.
Changes in version 27.1
1. Added columns of class sample sizes to file of predicted values for classification with simple node models.
2. Improved trimming of terminal nodes to ensure no siblings have same predictions in classification.
3. Improved display of node information in LaTeX classification trees.
Changes in version 27.0
1. Added class sizes to root node of LaTeX diagrams for classification.
2. Corrected some bugs to do with test-sample pruning.
Changes in version 26.9
1. Removed a trap in SELECT that caused program to abort when all total costs are infinite in EXHAUSTQ.
2. Corrected some formatting errors in output.
3. For Gi method, categorical predictor levels merged to 4 levels.
4. Corrected R output for multiresponse option.
Changes in version 26.8
1. Corrected some bugs in R output files.
2. Corrected a bug to do with minimum node size.
Changes in version 26.7
1. Increased range of minimum node sizes for non-default option.
2. Made default SE=0 for propensity scoring.
3. Changed minimum number of each treatment to 1 in each node for propensity scoring.
4. Added output line "Run GUIDE with the command: guide < ..." after data file creation.
5. Removed unused variables from appearing in R code function.
6. Always ask to write R code for prediction.
Changes in version 26.6
1. Corrected a bug in LaTeX caption when there is a treatment and an uncensored response variable.
Changes in version 26.5
1. Removed t-statistics and p-values from constant terms in proportional hazards models.
2. Added name of censored survival time in LaTeX tree diagrams.
3. Corrected a bug that surfaced when a treatment variable has more than 2 values.
Changes in version 26.4
1. Set scale factor to 1.1 for chisquare scores of categorical variables with 2-3 levels.
Changes in version 26.3
1. Corrected a bug that affected importance scoring when there are no categorical variables.
Changes in version 26.2
1. Corrected a bug that concerns best polynomial regression with censored response (linear prognostic control).
2. Corrected a bug that gave the wrong 2nd best split variable.
3. Added level names of R variable in output.
Changes in version 26.1
1. Changed cell boundaries for interaction tests to 0.33 and 0.67 quantiles.
Changes in version 26.0
1. Corrected an error to do with linear prognostic control in subgroup identification (R variable).
2. If there are N and F variables and an R variable, multiple linear regression is disallowed.
3. Increased default number of split points searched in N and S variables for importance scoring.
Changes in version 25.4
1. Corrected a bug that affected situations with only 1 S variable and no C and N variables.
Changes in version 25.3
1. Corrected a bug with split point selection for N variables in SELECT.
Changes in version 25.2
1. Corrected a bug to do with candidate split point calculation in SELECT.
2. Added version number in captions of latex tree diagrams.
Changes in version 25.1
1. Corrected a bug to do with chisquare p-values that are too small when prop. hazards model is used.
2. Made improvements in bootstrap selection bias reduction.
3. Reduced default minimum node sample size to 2 (from 3).
Changes in version 25.0
1. Corrected a bug to do with minimum treatment sample sizes when R variable is present.
2. Made cosmetic changes to LaTeX diagrams.
Changes in version 24.9
1. Corrected a bug that occurred when there are R and N variables and a split did not have all the R levels.
Changes in version 24.8
1. Updated LAPACK to 3.7.0 (except for Windows Intel version, which still uses 3.6.1).
2. Changed how GUIDE deals with the non-default option of fitting separate regression node models when there are missing values. Previously, it terminated with a suggestion to use another missing data option. Now it automatically switches to the default option.
3. Corrected a bug in LaTeX tree diagram when there is an R variable, the linear prognostic control option is chosen, and variable names have more than 10 characters.
4. Added a change that removes node numbers in LaTeX codes of trees with more than 20 terminal nodes.
5. Added a change that reduces font size of LaTeX trees according to their size.
Changes in version 24.7
1. For regression trees, kept an entry in the input file for storing regressor names, in case GUIDE defaults to piecewise-constant fitting with a non-default missing value option.
2. Fixed a bug in split point selection that causes the Mac NAG version to seg fault.
3. Corrected a bug introduced in 24.6 to do with split set selection.
4. Ensured that split points for ordinal variables are midpoints between data values.
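The midpoint rule in item 4 of version 24.7 can be illustrated with a short sketch. This is not GUIDE's own code; the function name candidate_split_points is hypothetical, and it assumes midpoints are taken between successive distinct sorted values.

```python
def candidate_split_points(values):
    """Return midpoints between successive distinct sorted data values."""
    distinct = sorted(set(values))  # drop duplicates, then order the values
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


print(candidate_split_points([3, 1, 2, 2, 5]))
```

A split at a midpoint separates the training values the same way as a split at the lower data value, but leaves a margin on both sides for new observations.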
Changes in version 24.6
1. Corrected a bug with split point selection.
Changes in version 24.5
1. Restored linear splits in classification trees (with and without missing values) but not forests.
2. Cosmetic changes to latex output: categorical values that are too long are abbreviated and extra space beside node numbers is removed.
3. Arranged categorical splits so that fewer categorical values go to left node.
Changes in version 24.4
1. Corrected a bug in importance scoring.
Changes in version 24.3
1. Corrected a bug in GUIDE forest classification when all predictor variables are categorical.
2. Turned off LDA in GUIDE forest classification.
3. Disallowed linear splits in classification if any S variable has missing values.
Changes in version 24.2
1. Added a space following node number in LaTeX files.
2. Improved formatting of categorical variable split values.
3. Corrected a bug to do with subgroup identification with R variable.
4. Ensured that all split points are midpoints between successive ordered data values.
Changes in version 24.1
1. Increased length of data values to 200 characters.
2. Corrected a bug concerning node colors in latex diagrams when R variable is present.
3. Removed sideways latex tree option.
Changes in version 24.0
1. Added option to change default minimum no. cases per treatment (requires changes to input files).
2. Reduced number of options at start of program.
3. Corrected an error in comment line of R code.
4. Removed pruning sequence in the output.
Changes in version 23.6
1. Corrected an error in coloring of nodes for subgroup identification.
Changes in version 23.5
1. Corrected an obscure bug.
2. Made cosmetic changes to improve manual.
Changes in version 23.4
1. Cosmetic changes due to revision of manual.
Changes in version 23.3
1. Removed the option of "0" for type of D variable in data conversion.
2. Corrected a bug that occurs if no latex output is requested.
3. Added a hint for model choice if subgroup identification is desired.
Changes in version 23.2
1. Corrected a bug in Intel version with regard to reading and writing double precision infinity.
2. Upgraded to Lapack 3.6.1.
Changes in version 23.1
1. Corrected a bug to do with non-latex option.
Changes in version 23.0
1. Corrected a bug in variable selection for splitting.
2. Added a catch to prevent splits that result in nodes without prognostic linear predictor when there is a treatment variable and polynomial model is fitted.
Changes in version 22.3
1. Modified structure of printed output in regressor name file for best polynomial model with treatment variable.
Changes in version 22.2
1. Corrected a bug to do with not splitting a node for least squares with N variables.
Changes in version 22.1
1. Corrected a bug that prevented subgroup identification with multiresponse data.
2. Updated Lapack to 3.6.1.
Changes in version 22.0
1. Fixed a bug in split variable selection when there is only one splittable N.
2. Added option for linear prognostic control in randomized experiments.
3. Made font improvements to LaTeX output.
4. Made several small bug fixes.
Changes in version 21.6:
1. Changed to using quartiles to discretize ordinal variables for chi-squared tests in split variable selection (previously, means and SDs were used).
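As a rough illustration of the quartile discretization in version 21.6, the sketch below bins an ordinal variable into four groups that could then form the columns of a contingency table for a chi-squared test. The function name quartile_groups and the tie handling (values equal to a quartile go to the lower group) are assumptions, not details taken from GUIDE.

```python
import bisect
import statistics


def quartile_groups(values):
    """Assign each value to group 0..3 according to the sample quartiles."""
    cuts = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    # bisect_left counts the cut points strictly below v, which is the group index
    return [bisect.bisect_left(cuts, v) for v in values]


print(quartile_groups([1, 2, 3, 4, 5, 6, 7, 8]))
```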
Changes in version 21.5:
1. Improved unbiasedness of importance scores when there is a mix of ordinal and categorical variables.
2. Corrected an error in converting data files to other formats when there are header lines.
3. Corrected a bug in interaction splits.
Changes in version 21.4:
1. Corrected some errors in importance scores for highly non-uniform data.
2. Changed ranks of importance scores to midranks in case of ties.
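The midrank convention in item 2 of version 21.4 gives tied scores the average of the ranks they would otherwise occupy. A minimal sketch follows; the function name midranks is hypothetical, and ranking the largest score first is an assumption about how importance ranks are ordered.

```python
def midranks(scores):
    """Rank scores in descending order; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the run while the next score ties with the current one
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mid = (pos + 1 + end + 1) / 2  # average of the 1-based ranks in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = mid
        pos = end + 1
    return ranks


print(midranks([5.0, 3.0, 5.0, 1.0]))
```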
Changes in version 21.3:
1. Modified Wilson-Hilferty approximation to increase accuracy.
Changes in version 21.2:
1. Corrected a bug that affected multiresponse option when one or more D variables is completely missing in a node.
2. Changed sample size text in latex files to italics.
Changes in version 21.1:
1. Stopped NAG compiler from printing underflow warnings.
2. Updated Lapack to 3.6.0.
3. Corrected a bug in output of min and max values for variables with all values missing.
4. Changed a default option for multiresponse models.
Changes in version 21.0:
1. Allowed data files to contain header lines; requires new input and description files.
2. Corrected a bug in importance scores when there are missing values.
3. Abbreviated variable names longer than 10 characters in latex output.
4. Corrected error in predicted values of training samples in ensemble methods.
Changes in version 20.5:
1. Added line for no pruning in output file if this option is chosen.
2. Corrected a bug to do with piecewise polynomial option when variable names are too long.
3. Increased output lengths of variable names from 20 to 60 characters.
Changes in version 20.4:
1. Fixed a bug in least squares fit when dependent variable values are constant.
2. Added output on pruning alpha sequence.
3. Added compiler info in output.
Changes in version 20.3:
1. Added a cosmetic change to batch input log.
2. Changed default option for missing values to mean imputation for
least-squares, quantile, Poisson and prop. hazards models.
3. Added some output statements to show progress.
4. Fixed an old bug in trim_nodes (only for classification with plurality rule).
Changes in version 20.2:
1. Corrected a bug to do with splitting on categorical variables with
many levels.
2. Corrected a bug to do with Poisson regression when there are
numerous zero response values.
Changes in version 20.1:
1. Corrected an I/O bug when a weight variable is present.
Changes in version 20.0:
1. Upgraded from Lapack 3.4.2 to 3.5.0.
2. Corrected a bug with input file for multiple responses when none is
missing.
3. Added ability to fit piecewise multiple linear proportional hazards
models with treatment variable.
4. Added ability to fit piecewise simple ANCOVA models with treatment
variable.
5. Added option to use mean imputation in quantile, Poisson, and
proportional hazards models.
6. Added option for propensity score grouping and causal modeling.
Changes in version 19.0:
1. Corrected a bug that potentially affected all applications with more than one categorical predictor variable.
2. Corrected a bug that affected applications with multiresponse data that do not use exhaustive search for split points.
3. Modified the procedure for DIF identification when p-values are 0.
Changes in version 18.7:
1. Corrected an error in linear interpolation of baseline cumulative hazard function.
Changes in version 18.6:
1. Corrected a bug to do with best simple polynomial model when there are missing values.
2. Corrected a bug to do with missing values in created products and powers.
3. Allowed tabular output to adapt to length of variable and class names.
4. Added values of log baseline cumulative hazard and median survival time to optional output for proportional hazards models.
Changes in version 18.5:
1. Corrected an error in a prompt for importance scoring.
2. Enabled variable names to use non-alphanumeric characters as long as 1st character is alphabetical. The characters #, %, {, }, and space (blank) are automatically replaced by dots.
3. Enabled any character to appear in a data value as long as it is enclosed in quotes.
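The variable naming rule in item 2 of version 18.5 can be sketched as follows. The function name normalize_name is hypothetical, and raising an error for a non-alphabetical first character is an assumption; the changelog does not say how GUIDE reports such names.

```python
REPLACED_CHARS = "#%{} "  # characters the changelog says become dots


def normalize_name(name):
    """Replace #, %, {, } and blanks in a variable name with dots."""
    if not name or not name[0].isalpha():
        raise ValueError("variable name must start with a letter")
    return "".join("." if ch in REPLACED_CHARS else ch for ch in name)


print(normalize_name("blood pressure#1"))
```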
Changes in version 18.4:
1. Removed a twice repeated prompt for default options.
2. Made outputting a file with fitted values the default.
Changes in version 18.3:
1. Corrected a bug to do with LDA for multi-response with treatment data.
Changes in version 18.2:
1. Corrected a bug that occurs when all observations on a N or F variable are missing in a node.
Changes in version 18.1:
1. Added option to perform LDA in each node for multiresponse data
with a treatment variable.
2. Removed option to choose max proportion of variance for PCA. Now it
is fixed at 0.95.
3. Corrected a bug in piecewise simple linear regression when there
are missing values.
4. Changed a call to lapack gesdd to gesvd.
Changes in version 18.0:
1. Added default options to all models to reduce number of prompts.
2. Corrected a bug in boundary values for split variable selection.
3. Corrected a bug in channeling of missing values when there are none
in training sample and split is due to an interaction.
4. Made numerous aesthetic improvements to latex output.
5. Added capability to perform differential item functioning.
Changes in version 17.11:
1. Corrected a bug in direction of missing values when splitting is
due to an interaction.
Changes in version 17.10:
1. Corrected a bug affecting classification and non-multiresponse
problems inadvertently introduced in previous revision.
2. Corrected caption in latex output.
Changes in version 17.9:
1. Added option to allow missing dependent variables in multiresponse regression.
Changes in version 17.8:
1. Corrected a mistaken trap that disallowed R variables for survival data.
Changes in version 17.7:
1. Corrected a bug in latex output when there are many values in a split set.
Changes in version 17.6:
1. Corrected a bug in the 64-bit linux version when there are more than 2 treatments.
2. Corrected a recently introduced bug in latex output.
Changes in version 17.5:
1. Increased the node size for multiresponse and longitudinal data when their number is small.
2. Changed default option for linear regression to impute with means if there are missing values.
3. Allowed stepwise regression to continue if number of variables exceeds sample size.
4. Changed some defaults for forest: #variables selected = #variables/3, mindat = max(5,n/200).
Changes in version 17.4:
1. Allowed choice of alternative models for least-squares non-constant fits when training data have no missing values.
2. Corrected a bug in LDA for categorical splits with more than 12 levels.
3. Changed routine for computation of F cdf to fcdf to avoid difficulties with large dfs.
Changes in version 17.3:
1. Changed DIF scores to p-values.
Changes in version 17.2:
1. Corrected a bug to do with reading double precision numbers using ltxunit.
2. Allowed program to switch automatically to fitting constant models when the number of complete cases is too small.
3. Allowed periods in variable names.
4. Corrected a bug to do with mean/mode imputation of S variables.
5. Revised method of computing DIF scores.
Changes in version 17.1:
1. Added default option to use PCA for variable selection in multiresponse regression.
2. Cleaned up R code output for multiresponse and longitudinal data options (5 & 6).
3. Corrected a bug that affected data set creation (option 3) when there is more than one dependent variable.
4. Removed option to normalize D variables for longitudinal data (option 6).
5. Corrected a bug that got into the previous version.
Changes in version 17.0:
1. Added option for subgroup identification with multiple dependent variables.
2. Corrected some bugs to do with split point and split set selection.
Changes in version 16.4:
1. Corrected some bugs that disabled pruning with test samples for classification.
2. Corrected a bug affecting variable importance scoring.
Changes in version 16.3:
1. Ensured that categorical values not present in a node are not shown in splits in latex and text diagrams.
2. Removed option to overwrite existing files in data conversion.
Changes in version 16.2:
1. Cleaned up text, latex and R codes for tree structures to remove redundancies due to missing values.
2. Allowed automatic switching to piecewise-constant models if number of complete cases is too small.
3. Removed quotes in last instruction for using batch file.
Changes in version 16.1:
1. Added 3 options for missing values in N and F variables in least squares regression.
2. Changed splits on categorical variables: if a node is split on a C variable, the smaller subnode is placed on the left side. This causes all unseen categories to go to the larger (right) subnode.
3. For option 1, where a constant is fitted to obs with missing values, the constant is changed to the mean of the missing obs.
4. Corrected a bug in R code for prediction function.
5. Added a column of observed response values to predicted value output file for ensemble methods and changed the predicted value column heading from "Predicted" to "predicted".
6. Added double quotes around values of categorical variables in outputs.
Changes in version 16.0:
1. Simplified dialog when there is a weight variable. This necessitates input files to be recreated.
Changes in version 15.15:
1. Made cosmetic changes to node numbers in LaTeX output.
2. Corrected some labeling errors in subgroup identification options.
Changes in version 15.14:
1. Corrected a bug to do with importance ranking batch file creation for non least squares problems.
2. Added a default option to switch to piecewise constant trees if there are missing regressor values in linear regression.
Changes in version 15.13:
1. Corrected a bug that concerns least-squares simple polynomial models.
2. Reduced the number of available models to Gs and Gi for subgroup identification.
3. Added quotes around character strings for nominal attributes in
ARFF formatted data files. If the total length of the values of an
attribute is more than 200 characters (including commas and quotes),
it is declared as string; otherwise it is declared as nominal.
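The ARFF rule in item 3 of version 15.13 might look like the sketch below. The function name arff_attribute is hypothetical, and measuring the length over the quoted, comma-separated values only (without the enclosing braces) is an assumption about what the 200-character limit counts.

```python
def arff_attribute(name, levels, max_len=200):
    """Declare an ARFF attribute as nominal, or as string if its values are too long."""
    quoted = ",".join(f'"{v}"' for v in levels)  # quotes and commas count toward the length
    if len(quoted) > max_len:
        return f"@attribute {name} string"
    return f"@attribute {name} {{{quoted}}}"


print(arff_attribute("color", ["red", "blue"]))
```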
Changes in version 15.12:
1. Corrected a bug that affected stepwise regression.
Changes in version 15.11:
1. Corrected a bug with missing censored survival times.
Changes in version 15.10:
1. Corrected a bug with smallest uncensored survival time.
2. For importance scoring, changed the default expected number of
noise variables found important to be 0.05 of the total number of
noise variables.
3. Corrected an error in figure caption of latex tree for proportional hazards models.
Changes in version 15.9:
1. Added option to print out regression coefficients to a file for
proportional hazards models.
2. Corrected a bug in computation of smallest uncensored time when weights are present.
3. Corrected a bug in generation of input files for importance scoring.
Changes in version 15.8:
1. Added a check that number of observations does not exceed 2^32.
2. Corrected a bug with no crossing of treatment and factors in PRELS.
Changes in version 15.7:
1. Corrected a bug in sideways LaTeX output.
2. Increased horizontal spacing of nodes in LaTeX trees.
3. Corrected some bugs in differential treatment options.
Changes in version 15.6.1:
1. Corrected a bug in multiresponse option when there is only one numeric variable.
Changes in version 15.6:
1. Corrected a bug in batch mode when non-default number of CV folds is used.
2. Increased default values of min_dat (constant regression and simple
classification) and lev_splits (single trees).
3. Changed minimum value from Wilson-Hilferty output to 0.001
(previously 0) to avoid problems with importance ranking when all
variables are insignificant.
4. Allowed continued splitting for importance scoring if interaction
tests fail (previously, the node is made terminal).
5. Changed the default values for max number of split levels and min
number of observations in each node.
6. Added capability for item response data.
Changes in version 15.5:
1. Corrected a bug in LaTeX output.
2. Reinstated choice of univariate or bivariate kernel and nearest-neighbor fits.
3. Added weights to multiresponse option.
4. Improved linear split algorithm to switch to univariate splits when
crimcoords are too large or too small.
Changes in version 15.4:
1. Added the Li-Martin method to approximate an F quantile with a
chi-square quantile (used only in Gi method).
2. Corrected a bug when df=0 in Gi method.
Changes in version 15.3:
1. Corrected some bugs and changed the format of LaTeX trees.
Changes in version 15.2:
1. Fixed a bug concerning spaces in values of treatment variables.
2. Added a trap to prevent building classification trees if a
treatment (R) variable is present.
Changes in version 15.1:
1. Improved linear split option by searching over all pairs.
2. Added a linear split option to bagged GUIDE.
3. Corrected a bug in pruned tree for least median of squares regression.
4. Added an option to show #misclassified/sample size in LaTeX trees.
5. Corrected some bugs in display of LaTeX tree diagrams.
Changes in version 15.0:
1. Improved splits on missing values.
2. Corrected a bug in Ancova models.
3. Corrected a bug in R code.
4. Added checks to ensure that all treatment values are present in all splits.
5. Added more information in LaTeX tree diagrams.
Changes in version 14.2:
1. Added a restriction to at most 10 groups for multiresponse and longitudinal data.
2. Corrected a bug in R code for multiresponse data.
Changes in version 14.1:
1. Increased length of character strings in output file of predicted values.
2. Corrected a bug in latex output when categorical values contain spaces.
3. Added node sample sizes to latex trees.
4. Corrected a bug in R code when B variables have no missing values.
Changes in version 14.0:
1. Revised options for applications with treatment variables.
2. Revised latex output.
3. Added option to produce R code for prediction of future cases.
Changes in version 13.4:
1. Modified the LaTeX output to include class sizes in each terminal
node of a classification tree and to say whether the model uses
estimated, equal or specified priors and unit or unequal
misclassification costs.
2. Fixed a bug in data reformatting option.
3. Added a new subgroup identification method.
Changes in version 13.3.2:
1. Removed a redundant and erroneous prompt for regression with weights.
Changes in version 13.3:
1. Corrected a bug in stepwise regression option that was introduced in a prior version.
2. Added header to fitted probability file for kernel density option in classification.
3. Added "pstree[treemode=D]"
Changes in version 13.2:
1. Corrected a bug that caused an infinite loop in the stepwise regression option.
Changes in version 13.1:
1. Corrected the algorithm for linear splits.
2. Lowered default value of mindat to 2% of sample size.
3. Made some cosmetic changes to the output.
Changes in version 13.0:
1. Added a classification option for bivariate linear splits.
2. Corrected a bug with linear splits on missing value.
3. Corrected a mistake in output description file for importance scores.
Changes in version 12.6:
1. Allowed N and F variables with R variable.
Changes in version 12.5:
1. Changed value of contab chi-squared statistic to 0 (instead of 1) in case of errors.
2. Avoided computation of p-values; used Wilson-Hilferty in all cases.
3. If an R variable is present, made sure that each subnode from a
split has at least two R levels.
4. Contab chi-squared tests are computed for each level of the R variable, if present.
5. For the Gi (option 2) R method, the test statistic is the maximum
of exponential quantiles.
Changes in version 12.4:
1. Corrected a bug that affected survival data.
Changes in version 12.3:
1. Corrected a bug involving option 3 for treatment variable.
Changes in version 12.2:
1. Corrected several bugs that were introduced around version 12.0.
2. Corrected a bug in lsdev in poisson.f90 regarding NFIT for stepwise regression.
Changes in version 12.1:
1. Broke ties in chi-squared values and raised the ceiling for max
value for split variable selection.
2. Corrected an error in output of intermediate node sample means.
3. Added output info about missing values in splits.
Changes in version 12.0:
1. Corrected several bugs in Poisson and proportional hazards models.
2. Disallowed "N", "F" and "B" variables when an "R" variable is present.
3. Changed default in LaTeX trees to not print node numbers.
4. Changed search for splits on categorical variables to exhaustive
search for 9 or fewer categories for all except classification and
quantile regression.
Changes in version 11.7:
1. Skipped bootstrap calibration if there are only B and no S variables.
2. Skipped bootstrap calibration for constant models.
3. Corrected a bug introduced in 11.6.
4. Reverted to default choice of "1" for "r" variables.
5. Updated lapack and lapack95 libraries to 3.4.1 and 3.0, resp.
Changes in version 11.6:
1. Corrected several bugs that affected Poisson and proportional hazards models.
2. Corrected bug that left out bootstrap calibration.
3. Changed the maximum length of character data entries to 50.
4. Added several traps for floating-point overflows and underflows.
Changes in version 11.5:
1. Fixed a bug about file name of predicted probabilities for kernel classification.
2. Fixed a bug in batch file creation for longitudinal data.
Changes in version 11.4:
1. Made non-exhaustive search the default.
2. Corrected some inconsistencies with regard to R variables.
3. Changed the license to BSD.
Changes in version 11.3:
1. Corrected a bug that affected missing D values in multiresponse data.
2. Changed LaTeX output to indicate where missing values go.
3. Changed LaTeX output for multiresponse and longitudinal data.
4. Added more options for treatment (R) variables.
Changes in version 11.2:
1. Corrected a bug involving names of variables that are too long in
output for polynomial models.
Changes in version 11.1:
1. Re-introduced choice of mean or median based CV pruning.
2. Added ability to deal with unbalanced observation times in longitudinal models.
3. Corrected some bugs in proportional hazards models when there are
cases with censored survival times less than smallest uncensored
survival time.
4. Made median (instead of mean) CV estimate the default for pruning
for proportional hazards models.
5. Numerous cosmetic changes.
Changes in version 10.6:
1. Corrected a bug that mixed itpos with izpos in proportional hazards models.
2. Required data for fitting proportional hazards models to have some
censored and some uncensored data.
3. Automatically discounted cases with censored survival times less
than smallest uncensored time.
Changes in version 10.5:
1. Stopped R variables from being used to split the nodes.
2. Made a node terminal during LDA splits if the LDA routine returns an error.
Changes in version 10.4:
1. Changed the default grouping for multiple dependent variables to be
ungrouped; up to 20 groups are allowed.
2. Corrected a mistake in the LaTeX figure caption.
3. Added two-sided p-values beside t-statistics in output.
Changes in version 10.3:
1. Allowed use of N variables in importance score option.
Changes in version 10.2:
1. Changed the output of importance scores.
2. Corrected a consistency problem in importance scores for quantile regression.
3. Added printing of D variable name(s) in output file.
Changes in version 10.1:
1. Corrected a bug in computation of linear discriminant splits.
Changes in version 10.0:
1. Corrected a bug in computation of chi-squared test statistic.
2. Corrected a bug in computation of linear splits.
3. Added "R" variable type to indicate treatment variables.
4. Added option for multi-response and longitudinal data.
5. Added option to draw LaTeX trees sideways.
Changes in version 9.4:
1. Corrected another bug concerning splits on missing predictor values.
Changes in version 9.3:
1. Corrected a bug concerning splits on missing predictor values.
2. Allowed importance scoring for piecewise-linear regression models.
Changes in version 9.2:
1. Corrected a bug that occurs with missing values in the D variable for classification.
Changes in version 9.1:
1. Corrected a bug that occurs with missing values in the D variable for regression.
Changes in version 9.0:
1. Changed the way missing values for ordered variables are
handled. Now if there are missing values in a split variable, a split
on missingness is one of the splits considered. If a split is on a
non-missing value, observations with missing values are channeled
through the split by replacing them with the mean of the non-missing
values in the split variable.
2. Added an option for multi-response dependent variables.
3. Increased maximum length of character strings to 80.
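The missing-value rule in item 1 can be illustrated with a minimal Python sketch. It is not GUIDE's code: the function names, the convention that "x <= split point" goes to the left child, the choice to send missing values left in a missingness split, and the use of the node's own observations to compute the mean are all assumptions made for illustration.

```python
import math

def is_missing(x):
    """Treat None and NaN as missing (an assumed encoding)."""
    return x is None or (isinstance(x, float) and math.isnan(x))

def mean_of_observed(values):
    """Mean of the non-missing values; assumes at least one is observed."""
    observed = [v for v in values if not is_missing(v)]
    return sum(observed) / len(observed)

def goes_left_on_missingness(x):
    """Split on missingness: missing values go left (assumed direction)."""
    return is_missing(x)

def goes_left_on_value(x, split_point, node_values):
    """Split on a non-missing value: a missing x is replaced by the mean
    of the non-missing values of the split variable before comparison."""
    if is_missing(x):
        x = mean_of_observed(node_values)
    return x <= split_point
```

With node values 1, 2, missing, and 6, the non-missing mean is 3, so an observation with a missing value goes right for a split at 2.5 and left for a split at 3.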
Changes in version 8.4:
1. Corrected a bug (introduced in version 8.2) in calculation of
chi-square probabilities. The Wilson-Hilferty approximation is used
for very small p-values.
2. Fixed a bug in interaction splits on one categorical and one ordered variable.
Changes in version 8.3:
1. Fixed some bugs with batch file creation and execution.
Changes in version 8.2:
1. Corrected a bug that affected batch operation for importance scoring.
2. Changed to direct chi-square conversion instead of Wilson-Hilferty
approximation if df < 10.
3. Corrected a bug in printing of importance scores.
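For reference, the Wilson-Hilferty approximation mentioned in item 2 maps a chi-square value x with df degrees of freedom to an approximate standard normal deviate, z = ((x/df)^(1/3) - (1 - 2/(9 df))) / sqrt(2/(9 df)). The Python sketch below shows the standard formula only; the function names are hypothetical and it does not reproduce GUIDE's routine or its rule for choosing between the approximation and direct conversion.

```python
import math

def wilson_hilferty_z(x, df):
    """Approximate standard normal deviate for a chi-square value x
    with df degrees of freedom (Wilson-Hilferty cube-root transform)."""
    c = 2.0 / (9.0 * df)
    return ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)

def normal_upper_tail(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_upper_tail_wh(x, df):
    """Approximate chi-square upper-tail probability via Wilson-Hilferty."""
    return normal_upper_tail(wilson_hilferty_z(x, df))
```

The approximation is generally less accurate at small degrees of freedom, which is consistent with using direct conversion when df < 10.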
Changes in version 8.1:
1. Corrected a bug triggered by missing values in the dependent variable.
Changes in version 8.0:
1. Added option for bagging and random forest ensembles.
2. Changed method of dealing with missing values. Missing values in
ordered variables used for splitting are treated as negative
infinity and observations are predicted with node sample mean if
there are missing values in regressors.
3. Added an option to obtain importance scores in two applications of GUIDE.
4. Added linear splits for classification trees.
5. Better control of interaction and linear split searches.
6. Improved split point and value set selection in interaction splits.
7. Fixed many small bugs.
Changes in version 7.9:
1. Added kernel and nearest-neighbor models for classification trees
Changes in version 7.8:
1. Corrected more bugs in polynomial regression option
Changes in version 7.7:
1. Corrected a bug in polynomial regression option
Changes in version 7.6:
1. Added the option to output a separate file containing scaled
importance scores and variable names.
2. Implemented an improved variable split selection method for data
with missing values.
Changes in version 7.5:
1. Corrected a bug in option #3 when there is no D variable in description file
2. Corrected a bug that affects formatting of ARFF data (option 3).
Changes in version 7.4:
1. Corrected a bug in linear splits when some observations have
missing values in the dependent variable
2. Added the option to draw LaTeX classification trees without node colors
Changes in version 7.3:
1. Require at least one B variable for stepwise simple ANCOVA
Changes in version 7.2:
1. Added checks to prevent over-writing of files
2. Changed default SE-rule to 0 for proportional hazards models
3. Fixed a (cosmetic) bug in the output of split values for
categorical variables with many values
4. Changed node information in LaTeX figures for classification trees
5. Eliminated printout of variable roles in importance ranking output
6. Increased the number of colored leaf nodes to 18 for classification
tree LaTeX diagrams
Changes in version 7.1:
1. Fixed a bug introduced in 7.0 that affected stepwise and polynomial models
2. Fixed a bug involving importance score cut-off
Changes in version 7.0:
1. Added kernel and nearest-neighbor models for classification trees
2. Added splits on linear combinations of two variables for classification
3. Improved the way variables are selected for splits and the way
split points are selected
4. Added coloring and other cosmetic features to LaTeX output
5. Improved the importance ranking method
Changes in version 6.2:
1. Changed algorithm for splits on categorical variables in classification
2. Added new data formats: C4.5 and ARFF
Changes in version 6.1:
1. Fixed a bug that wrote messages to fort file during batch file creation
2. Fixed a bug in calculation of estimated class priors when there is
a weight variable
Changes in version 6.0.1:
1. Fixed a bug that miscalculated test sample misclassification cost
when some classes are absent in the test sample
Changes in version 6.0:
1. Added classification tree capability
2. Added random forest capability
3. Made some changes to split selection algorithms
Changes in version 5.3:
1. Corrected a bug that affected LaTeX files when there are no
categorical variables
Changes in version 5.2:
1. Changed variable selection for interaction tests to use two levels
of splits
2. Reverted to true stepwise and ANCOVA fitting for split selection
for these options
Changes in version 5.1:
1. Fixed a bug caused by missing values while reading data
Changes in version 5.0:
1. Improved approach to interaction tests to account for their number
2. Changed default SE for constant fit to 0
3. Added option for variable importance scores and for identification
of unimportant variables
Changes in version 4.4:
1. Fixed a bug in split variable selection routine that affected "s"
variables
Changes in version 4.3:
1. Made the program output progress after each CV iteration.
2. Extended the length of variable names from 8 to 10 for data
conversion to SAS.
3. Added code for PROC GLM and PROC REG if SAS output is selected.
4. Added a suggestion to use white or yellow colors if leaf node
numbers are selected.
5. Allowed execution to continue if an excluded variable contains values
longer than 20 characters.
6. Allowed option 3 (data conversion) to proceed if there are data
values longer than 20 characters, with a warning that they are
truncated.
Changes in version 4.2:
1. Corrected a bug that affects relative risk regression when some D
or T variables have missing values.
2. Added an option for weighted or unweighted error estimation when a
weight variable exists. The default is unweighted.
3. Changed from zero-truncated normal to 1-df chi-square statistic for
split variable selection.
Changes in version 4.1:
1. Corrected a bug in output for truncation type 2.
2. Reverted default value of mindat to 0 for stepwise option.
Changes in version 4.0:
1. Added an option for least median of squares (robust) regression for
multiple and best simple linear fitting.
2. Increased amount of information output to (optional) file containing
names and regression coefficients in leaf nodes.
3. Changed absolute z to truncated z for variable selection and
bootstrap bias correction.
4. Added an option to save multiple regression coefficients in a separate file.
5. Added an option to fit piecewise least-squares multiple linear
regression without intercept terms.
6. Added an option to not truncate, truncate fitted values, or
truncate x-values before prediction.
7. Added option to drop insignificant leading powers in polynomial models.
8. Added option for stepwise simple linear ANCOVA.
9. Allowed stepwise linear option to use "c" or "b" categorical variables.
10. Fixed a bug that affected datasets with missing values but without
weight variables.
11. Fixed a bug in split point selection to use total mean deviance
instead of total deviance.
12. Added option for all subsets regression.
13. Improved search over split points for non-exhaustive search.
14. Changed option for non-exhaustive search from fraction to number.
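The three truncation choices in item 6 can be pictured with a short Python sketch for a simple linear fit in one node. This is an interpretation, not GUIDE's implementation: it assumes that truncation means clipping to a range observed in the training data, and the function names, the mode labels, and the range arguments are all hypothetical.

```python
def clip(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def predict(x, intercept, slope, x_range, y_range, mode="none"):
    """Predict from a simple linear fit under an assumed truncation mode.

    mode = "none":   no truncation
    mode = "fitted": clip the fitted value to y_range
    mode = "x":      clip x to x_range before computing the fitted value
    """
    if mode == "x":
        x = clip(x, x_range[0], x_range[1])
    fitted = intercept + slope * x
    if mode == "fitted":
        fitted = clip(fitted, y_range[0], y_range[1])
    return fitted
```

For a fit 1 + 2x with an assumed training x-range of (0, 5) and y-range of (0, 8), predicting at x = 10 gives 21 without truncation, 8 when the fitted value is truncated, and 11 when x is truncated first.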
Changes in version 3.1:
1. For stepwise and polynomial regression, added option to write the
leaf node number and the selected regressors into a separate file.
2. Added option for colored leaf nodes and improved font sizes in
LaTeX tree diagrams.
3. Added a column to the optional file of node IDs and fitted values to
indicate training observations.
4. Added R-squared value for tree model (least-squares fit only).
5. Fixed a bug in the interaction test. Now preference for "c" variables is
given on to multiple linear regression.
6. Increased number of trees in pruning sequence.