Rust API

The Rust side is split across crates:

  • forestfire-data
  • forestfire-core
  • forestfire-inference

Core training entrypoint

use forestfire_core::{train, TrainConfig};

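// `table` is a training table built through forestfire-data (see the Data crate section below).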
let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;

TrainConfig::histogram_bins controls the numeric histogram resolution used during fitting. Leave it as None to reuse the incoming table bins, or set Some(NumericBins::Auto) / Some(NumericBins::Fixed(...)) to rebin the training view before split search.

When constructing TrainConfig explicitly, prefer starting from ..TrainConfig::default() unless you really need to spell out every field. That keeps examples resilient as new configuration fields are added.
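
A minimal sketch of that pattern; it assumes NumericBins is reachable from forestfire_core and that NumericBins::Fixed takes a bin count:

use forestfire_core::{NumericBins, TrainConfig};

// Rebin the training view to 64 fixed bins before split search; every
// other field keeps its default, so the snippet stays valid as new
// configuration fields are added.
let config = TrainConfig {
    histogram_bins: Some(NumericBins::Fixed(64)),
    ..TrainConfig::default()
};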

The intended Rust lifecycle (sketched in code after this list) is:

  1. build a training table through forestfire-data
  2. train a semantic Model
  3. use the semantic model for introspection and canonical serialization
  4. derive an OptimizedModel when prediction speed matters
  5. optionally snapshot the optimized runtime as a compiled artifact
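
A compressed sketch of that lifecycle using only the methods documented on this page; x_rows and targets are placeholders for data prepared elsewhere:

use forestfire_core::{train, OptimizedModel, TrainConfig};
use forestfire_data::Table;

// 1. build a training table
let table = Table::new(x_rows, targets)?;
// 2. train a semantic Model
let model = train(&table, TrainConfig::default())?;
// 3. introspect the semantic model
println!("{} features used", model.used_feature_count());
// 4. derive an OptimizedModel for fast prediction
let optimized = model.optimize_inference(Some(1))?;
// 5. snapshot the lowered runtime as a compiled artifact
let bytes = optimized.serialize_compiled()?;
let restored = OptimizedModel::deserialize_compiled(&bytes, Some(1))?;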

Important core types

  • TrainConfig
  • TrainAlgorithm
  • Task
  • TreeType
  • SplitStrategy
  • BuilderStrategy
  • Criterion
  • Model
  • OptimizedModel

TrainConfig::split_strategy selects the split family:

  • SplitStrategy::AxisAligned
  • SplitStrategy::Oblique

Current support:

  • AxisAligned: all supported tree families
  • Oblique: dt, rf, and gbm when tree_type is Cart or Randomized
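
For example, opting into oblique splits with everything else left at its default (a sketch; it assumes the default algorithm and tree type fall inside the supported combinations above):

use forestfire_core::{SplitStrategy, TrainConfig};

// Oblique splits are only supported for dt, rf, and gbm when
// tree_type is Cart or Randomized.
let config = TrainConfig {
    split_strategy: SplitStrategy::Oblique,
    ..TrainConfig::default()
};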

TrainConfig::builder selects the tree-construction strategy:

  • BuilderStrategy::Greedy
  • BuilderStrategy::Lookahead
  • BuilderStrategy::Beam
  • BuilderStrategy::Optimal

Related TrainConfig fields:

  • lookahead_depth
  • lookahead_top_k
  • lookahead_weight
  • beam_width

Those fields control the Lookahead and Beam strategies; Optimal ignores them and is instead bounded by the normal tree limits plus canary filtering.
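
A sketch of a lookahead configuration; the concrete values, and the assumption that the lookahead fields take plain integers, are illustrative only:

use forestfire_core::{BuilderStrategy, TrainConfig};

// lookahead_depth, lookahead_top_k, and lookahead_weight are consulted
// by Lookahead, beam_width by Beam; Optimal ignores all four.
let config = TrainConfig {
    builder: BuilderStrategy::Lookahead,
    lookahead_depth: 2,
    lookahead_top_k: 4,
    ..TrainConfig::default()
};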

For gradient boosting specifically:

  • TrainConfig::canary_filter controls the ordinary root-stage canary window
  • TrainConfig::boosting_first_stage_retry_filter controls an optional retry window for stage 0
  • the retry filter defaults to Some(CanaryFilter::TopN(1)), which preserves strict top-1 behavior unless you widen it explicitly
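
A sketch that widens the stage-0 retry window; it assumes CanaryFilter is reachable from forestfire_core and, as the Some(...) default suggests, that the field is an Option:

use forestfire_core::{CanaryFilter, TrainConfig};

// Allow the top 3 root candidates on the stage-0 retry instead of the
// strict top-1 default; the ordinary canary_filter stays untouched.
let config = TrainConfig {
    boosting_first_stage_retry_filter: Some(CanaryFilter::TopN(3)),
    ..TrainConfig::default()
};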

Core capabilities

forestfire-core currently provides:

  • unified training dispatch
  • decision trees, random forests, and gradient boosting
  • per-row sample weights for all algorithms and tasks
  • multi-target regression (2-D y with n_targets > 1 target columns; dt only)
  • MDI feature importances on Model and OptimizedModel
  • optimized inference runtimes
  • JSON IR serialization and deserialization
  • tree introspection metadata
  • compiled optimized runtime artifacts
  • used-feature introspection for semantic and optimized models

Useful runtime-oriented methods:

  • Model::used_feature_indices()
  • Model::used_feature_count()
  • Model::optimize_inference(...)
  • Model::optimize_inference_with_missing_features(...)
  • OptimizedModel::used_feature_indices()
  • OptimizedModel::used_feature_count()
  • OptimizedModel::serialize_compiled()
  • OptimizedModel::deserialize_compiled(...)

Optimized models still accept the full semantic feature space on input, but they lower the runtime into a compact projected feature space internally so batch preprocessing only touches the columns that appear in splits.

Inference inputs can also contain missing values through the normal raw-input paths, including floating-point NaN values and, when the polars feature is enabled, polars nulls.

That means there are really three layers to keep in mind:

  • Model: semantic meaning
  • OptimizedModel: lowered runtime
  • compiled artifact: serialized lowered runtime plus semantic IR

Example: semantic model vs optimized runtime

use forestfire_core::{train, TrainConfig};
use forestfire_data::Table;

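// Four rows with three features each; the third feature is constant
// (10.0), so no split can gain anything from it. The second argument
// is the target column.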
let table = Table::new(
    vec![
        vec![0.0, 0.0, 10.0],
        vec![0.0, 1.0, 10.0],
        vec![1.0, 0.0, 10.0],
        vec![1.0, 1.0, 10.0],
    ],
    vec![0.0, 0.0, 0.0, 1.0],
)?;

let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;

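// Both report semantic feature indices; the constant third column
// (index 2) should be absent from both lists.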
println!("{:?}", model.used_feature_indices());
println!("{:?}", optimized.used_feature_indices());

Those used-feature methods reflect the semantic split structure, and the optimized runtime uses them to project inference input before scoring.

Example: compiled optimized artifact

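// Serialize the lowered runtime (plus its semantic IR) and restore it
// without re-lowering from the semantic model.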
let optimized = model.optimize_inference(Some(1))?;
let bytes = optimized.serialize_compiled()?;
let restored = forestfire_core::OptimizedModel::deserialize_compiled(&bytes, Some(1))?;

Use this when you want to preserve the lowered runtime layout across reloads instead of recomputing it from the semantic model each time.

Example: selective missing checks in the optimized runtime

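// Keep missing-value checks only for features 0, 3, and 7; all other
// used features are assumed to always be present at inference time.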
let optimized = model.optimize_inference_with_missing_features(
    Some(1),
    Some(vec![0, 3, 7]),
)?;

Pass None to preserve missing-aware behavior for every used feature. Pass an explicit feature-index list only when your inference pipeline guarantees that other columns will never be missing.

Data crate

The forestfire-data crate provides the training-table abstractions and preprocessing/binned storage used by the learners.

Key types:

  • Table — primary training container (dense or sparse)
  • WeightedTable<'a> — wraps any &dyn TableAccess and overrides sample_weight with per-row weights; it composes with any other table adapter
  • MultiTargetDenseTable<'a> — wraps a base table and attaches extra target columns; overrides n_targets() and target_value_at() so the regressor produces multi-target leaves

Example:

use forestfire_data::{Table, WeightedTable, MultiTargetDenseTable};

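// x_rows, first_target_col, weights, and extra_targets stand in for
// data prepared elsewhere.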
let base = Table::new(x_rows, first_target_col)?;
let weighted = WeightedTable::new(&base, weights);
let mt = MultiTargetDenseTable::new(&base, extra_targets);

Important point:

  • Table is the training-side abstraction
  • inference can use raw rows, named columns, sparse binary columns, and polars data directly through forestfire-core

That mirrors the Python surface: training normalizes through tables, while prediction accepts user-facing inference inputs directly.

Inference crate

The forestfire-inference crate contains inference-focused runtime utilities on top of the model IR and compiled runtimes.

Rust usage notes

Use the core and data crates directly from the workspace today. The library is still early-stage, so the repository state should be treated as the source of truth for the public surface.

Publishing order

The crates should be published in dependency order:

  1. forestfire-data
  2. forestfire-core

forestfire-inference and the example crate stay workspace-local and are not published.