ForestFire

ForestFire is a tree-learning library with a Rust core and a Python API.

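It is built around three ideas: a unified training surface, a Table abstraction for training data, and an explicit intermediate representation (IR) for trained models.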

Those choices are deliberate:

the unified training surface keeps the public API stable even as the library grows from single trees into forests and boosting
the Table abstraction centralizes preprocessing, binning, sparse handling, and canary generation so the learners do not each re-implement data plumbing
the explicit IR separates model semantics from trainer internals, which is what makes optimized inference, introspection, and serialization all line up with the same underlying meaning

What exists today

ForestFire is intentionally opinionated in a few places:

it prefers a small number of strong public concepts over many learner-specific entrypoints
it treats training-time preprocessing as part of the model contract, not an invisible side effect
it uses canaries as an in-training stopping signal instead of relying on post-hoc pruning
it exposes optimized inference as a separate runtime view rather than pretending the training structure is automatically the best scoring structure
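
The canary idea can be illustrated with a small, library-independent sketch. It assumes a canary is a shuffled copy of a real feature column, and that growth stops as soon as the best available split lands on a canary. ForestFire's actual canary generation and stopping rules may differ (the Canary Strategy page is the reference). The helper names `add_canaries`, `best_split`, and `canary_won` are hypothetical and are not part of the ForestFire API.

```python
import random


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def add_canaries(columns, rng):
    """Append one shuffled copy of each real column (assumed canary scheme)."""
    canaries = []
    for col in columns:
        shuffled = list(col)
        rng.shuffle(shuffled)  # breaks any relationship with the target
        canaries.append(shuffled)
    return columns + canaries


def best_split(columns, y):
    """Return (feature index, score) of the split with the lowest total SSE."""
    best_idx, best_score = None, float("inf")
    for idx, col in enumerate(columns):
        order = sorted(range(len(y)), key=lambda i: col[i])
        sorted_y = [y[i] for i in order]
        for cut in range(1, len(y)):
            if col[order[cut - 1]] == col[order[cut]]:
                continue  # cannot split between equal feature values
            score = sse(sorted_y[:cut]) + sse(sorted_y[cut:])
            if score < best_score:  # strict: ties keep the earlier feature
                best_idx, best_score = idx, score
    return best_idx, best_score


def canary_won(best_idx, n_real):
    """True when the winning feature is one of the appended canaries."""
    return best_idx is not None and best_idx >= n_real


rng = random.Random(0)
n = 40
x0 = [i / n for i in range(n)]              # informative feature
x1 = [rng.random() for _ in range(n)]       # uninformative feature
y = [1.0 if v >= 0.5 else 0.0 for v in x0]  # target depends only on x0

real = [x0, x1]
table = add_canaries(real, rng)
best_idx, _ = best_split(table, y)
stop = canary_won(best_idx, len(real))
print(best_idx, stop)
```

In this sketch the stopping decision is made while the tree is being grown, from the same split search that chooses features, which is the contrast with pruning a fully grown tree afterwards.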

Getting Started: install and first training runs
Design And Architecture: the core abstractions and why they exist
Canary Strategy: why canaries exist, what they replace, and how they differ across DT, RF, and GBM
Runtime And IR: inference lowering, serialization, and execution design
Intermediate Representation: the semantic model package, schema, and portability story
Python API: Python surface and input handling
Rust API: Rust crates and training entrypoints
Examples: end-to-end workflows from training through reload and batch scoring
Training: algorithms, parameters, and stopping behavior
Builders:
  Lookahead Builder: shortlist-based future-aware split ranking
  Beam Builder: width-limited continuation search for split ranking
  Optimal Builder: exhaustive subtree search with canary-driven stopping
Categorical Strategies: dummy, target, and fisher categorical handling through the native training API
Oblique Splits: pairwise linear splits, weight computation, candidate competition, and when to use them
Models And Introspection: prediction, optimization, serialization, and tree inspection
Benchmarks: benchmark tasks and artifact locations
Next Steps: forward-looking implementation and optimization notes
Releasing: Python and Cargo release flows