Custom Split Decision Tree (CSDT)

Flexible Decision Tree Framework

Introduction

The Custom Split Decision Tree (CSDT) is a decision tree algorithm that lets users define their own criteria for splitting data and for making predictions. Where traditional decision trees fix these two steps, CSDT exposes them as user-supplied functions, so the tree can be tailored to a variety of problem types.

Advantages of CSDT

Components of CSDT

1. Node Class

A node is the fundamental unit of a decision tree. The tree structure is built upon these nodes.
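
To make the idea of a node concrete, here is a minimal sketch of what a binary tree node could look like. The attribute names (`feature`, `threshold`, `left`, `right`, `prediction`) and the `route` helper are hypothetical and chosen for illustration only; they are not taken from the CSDT source.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """Hypothetical tree node; field names are illustrative, not CSDT's."""
    depth: int = 0
    feature: Optional[int] = None      # index of the feature used to split
    threshold: Optional[float] = None  # samples with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Any = None             # value returned when this node is a leaf

    @property
    def is_leaf(self) -> bool:
        # A node without children is a leaf.
        return self.left is None and self.right is None


def route(node: Node, sample) -> Any:
    """Walk from the given node down to a leaf and return its prediction."""
    while not node.is_leaf:
        go_left = sample[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.prediction


# A depth-1 tree: split on feature 0 at 1.5.
root = Node(
    feature=0,
    threshold=1.5,
    left=Node(depth=1, prediction=0.0),
    right=Node(depth=1, prediction=10.0),
)
low = route(root, [1.0])
high = route(root, [2.0])
```

Storing the prediction on the node itself keeps lookup at prediction time to a single walk from root to leaf.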

Node Attributes:

2. CSDT Class

The CSDT class manages the entire decision tree. It handles training, prediction, and visualization of the tree.
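
As a rough illustration of how training with pluggable functions might work, the sketch below runs an exhaustive split search that takes the prediction function and the error criterion as arguments. This is a toy under stated assumptions, not CSDT's actual implementation: `score_node`, `best_split`, and the size-weighted cost are all hypothetical choices.

```python
import numpy as np


def mean_predictor(y, x):
    # Constant prediction for a node: per-target mean of its labels.
    return y.mean(axis=0)


def mse(y, predictions, initial_solution=None):
    # Mean squared error over all samples and targets.
    return float(np.mean((y - predictions) ** 2))


def score_node(y, x, pred, criterion):
    """Predict one constant for the node with `pred`, then score it with `criterion`."""
    leaf_value = pred(y, x)
    predictions = np.tile(leaf_value, (len(y), 1))
    return criterion(y, predictions, None)


def best_split(x, y, pred, criterion, min_samples_leaf=1):
    """Try every (feature, threshold) pair and keep the lowest size-weighted child error."""
    best = (None, None, np.inf)
    for feature in range(x.shape[1]):
        # Dropping the largest value guarantees a non-empty right child.
        for threshold in np.unique(x[:, feature])[:-1]:
            left = x[:, feature] <= threshold
            right = ~left
            if left.sum() < min_samples_leaf or right.sum() < min_samples_leaf:
                continue
            cost = (
                left.sum() * score_node(y[left], x[left], pred, criterion)
                + right.sum() * score_node(y[right], x[right], pred, criterion)
            ) / len(y)
            if cost < best[2]:
                best = (feature, float(threshold), cost)
    return best


# Two features, two targets; feature 0 separates the targets cleanly at 1.0.
x = np.array([[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]])
y = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
feature, threshold, cost = best_split(x, y, mean_predictor, mse)
```

Because `pred` and `criterion` are plain arguments, swapping in a different predictor or error measure requires no change to the search itself.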

Attributes:

3. User-Defined Split and Prediction Functions

The standout feature of CSDT is the ability for users to define custom split and prediction functions.

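    from sklearn.metrics import mean_squared_error

    # Split criterion: error of the predictions for the samples in a node.
    # initial_solution is part of the expected signature but unused here.
    def calculate_mse(y, predictions, initial_solution):
        return mean_squared_error(y, predictions)

    # Prediction function: per-target mean of the labels in a node.
    def return_mean(y, x):
        return y.mean(axis=0)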
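
The same two signatures can carry other metrics. As a sketch, the pair below swaps in mean absolute error and a per-target median, which is less sensitive to outliers. The names `calculate_mae` and `return_median` are hypothetical, and the sketch assumes CSDT only requires the `(y, predictions, initial_solution)` and `(y, x)` signatures shown above.

```python
import numpy as np


def calculate_mae(y, predictions, initial_solution):
    # Mean absolute error over all samples and targets.
    # initial_solution is accepted only to match the signature; it is unused.
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(predictions))))


def return_median(y, x):
    # Per-target median of the labels in a node; x is unused.
    return np.median(np.asarray(y), axis=0)


# Three samples, two targets, with one outlier in the first target.
y = np.array([[1.0, 0.0], [2.0, 0.0], [10.0, 3.0]])
leaf_value = return_median(y, x=None)
predictions = np.tile(leaf_value, (len(y), 1))
score = calculate_mae(y, predictions, initial_solution=None)
```

Here the median of the first target is 2.0 rather than the mean of about 4.33, so the single outlier has less influence on the leaf value.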
    

CSDT in Practice

    tree = CSDT(
        max_depth=10,
        min_samples_leaf=5,
        min_samples_split=10,
        # Wrap the helper so it uses the custom prediction and criterion functions.
        split_criteria=lambda y, x, initial_solutions: split_criteria_with_methods(
            y,
            x,
            pred=return_mean,
            split_criteria=calculate_mse,
            initial_solutions=initial_solutions,
        ),
        use_hashmaps=True,
        use_initial_solution=False,
    )
    tree.fit(features_df, labels_df)
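
The snippet above does not define `features_df` and `labels_df`. A minimal sketch of suitable inputs for a multi-target regression task follows, assuming both are pandas DataFrames with one row per sample. The column names and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 200

# Three numeric features, one row per sample.
features_df = pd.DataFrame(
    rng.normal(size=(n_samples, 3)),
    columns=["f0", "f1", "f2"],
)

# Two regression targets that depend on different features, plus noise.
labels_df = pd.DataFrame(
    {
        "t0": 2.0 * features_df["f0"] + rng.normal(scale=0.1, size=n_samples),
        "t1": (features_df["f1"] > 0).astype(float)
        + rng.normal(scale=0.1, size=n_samples),
    }
)

# With a mean-based prediction function, the root prediction would be
# one mean per target column.
root_prediction = labels_df.mean(axis=0)
```

Keeping the targets as columns of a single DataFrame means a column-wise operation such as `y.mean(axis=0)` yields one value per target.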
    

Conclusion

CSDT stands apart from traditional decision trees by offering users full control over data splitting and prediction processes. This flexibility makes it a powerful tool for specialized tasks, such as multi-target regression, custom error metrics, and domain-specific applications. Beyond being a machine learning model, CSDT serves as a platform for developing custom solutions tailored to unique datasets and problem types.