Rust API

The Rust side is split across crates:

  • forestfire-data
  • forestfire-core
  • forestfire-inference

Core training entrypoint

use forestfire_core::{train, TrainConfig};

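// `table` is a training table built through forestfire-data (see the Data crate section below).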
let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;

TrainConfig::histogram_bins controls the numeric histogram resolution used during fitting. Leave it as None to reuse the incoming table bins, or set Some(NumericBins::Auto) / Some(NumericBins::Fixed(...)) to rebin the training view before split search.

When constructing TrainConfig explicitly, prefer starting from ..TrainConfig::default() unless you really need to spell out every field. That keeps examples resilient as new configuration fields are added.
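
A minimal sketch of that pattern; it assumes NumericBins is reachable from forestfire_core and that NumericBins::Fixed takes a bin count:

use forestfire_core::{NumericBins, TrainConfig};

// Rebin the training view to 64 fixed bins before split search; every
// other field keeps its default, so the snippet stays valid as new
// configuration fields are added.
let config = TrainConfig {
    histogram_bins: Some(NumericBins::Fixed(64)),
    ..TrainConfig::default()
};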

The intended Rust lifecycle (sketched in code after this list) is:

  1. build a training table through forestfire-data
  2. train a semantic Model
  3. use the semantic model for introspection and canonical serialization
  4. derive an OptimizedModel when prediction speed matters
  5. optionally snapshot the optimized runtime as a compiled artifact
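
A compressed sketch of that lifecycle using only the methods documented on this page; x_rows and targets are placeholders for data prepared elsewhere:

use forestfire_core::{train, OptimizedModel, TrainConfig};
use forestfire_data::Table;

// 1. build a training table
let table = Table::new(x_rows, targets)?;
// 2. train a semantic Model
let model = train(&table, TrainConfig::default())?;
// 3. introspect the semantic model
println!("{} features used", model.used_feature_count());
// 4. derive an OptimizedModel for fast prediction
let optimized = model.optimize_inference(Some(1))?;
// 5. snapshot the lowered runtime as a compiled artifact
let bytes = optimized.serialize_compiled()?;
let restored = OptimizedModel::deserialize_compiled(&bytes, Some(1))?;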

Important core types

  • TrainConfig
  • TrainAlgorithm
  • Task
  • TreeType
  • SplitStrategy
  • BuilderStrategy
  • Criterion
  • Model
  • OptimizedModel

TrainConfig::split_strategy selects the split family:

  • SplitStrategy::AxisAligned
  • SplitStrategy::Oblique

Current support:

  • AxisAligned: all supported tree families
  • Oblique: dt, rf, and gbm when tree_type is Cart or Randomized
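
For example, opting into oblique splits with everything else left at its default (a sketch; it assumes the default algorithm and tree type fall inside the supported combinations above):

use forestfire_core::{SplitStrategy, TrainConfig};

// Oblique splits are only supported for dt, rf, and gbm when
// tree_type is Cart or Randomized.
let config = TrainConfig {
    split_strategy: SplitStrategy::Oblique,
    ..TrainConfig::default()
};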

TrainConfig::builder selects the tree-construction strategy:

  • BuilderStrategy::Greedy
  • BuilderStrategy::Lookahead
  • BuilderStrategy::Beam
  • BuilderStrategy::Optimal

Related TrainConfig fields:

  • lookahead_depth
  • lookahead_top_k
  • lookahead_weight
  • beam_width

Those fields control the Lookahead and Beam strategies; Optimal ignores them and is instead bounded by the normal tree limits plus canary filtering.
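
A sketch of a lookahead configuration; the concrete values, and the assumption that the lookahead fields take plain integers, are illustrative only:

use forestfire_core::{BuilderStrategy, TrainConfig};

// lookahead_depth, lookahead_top_k, and lookahead_weight are consulted
// by Lookahead, beam_width by Beam; Optimal ignores all four.
let config = TrainConfig {
    builder: BuilderStrategy::Lookahead,
    lookahead_depth: 2,
    lookahead_top_k: 4,
    ..TrainConfig::default()
};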

For gradient boosting specifically:

  • TrainConfig::canary_filter controls the ordinary root-stage canary window
  • TrainConfig::boosting_first_stage_retry_filter controls an optional retry window for stage 0
  • the retry filter defaults to Some(CanaryFilter::TopN(1)), which preserves strict top-1 behavior unless you widen it explicitly
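
A sketch that widens the stage-0 retry window; it assumes CanaryFilter is reachable from forestfire_core and, as the Some(...) default suggests, that the field is an Option:

use forestfire_core::{CanaryFilter, TrainConfig};

// Allow the top 3 root candidates on the stage-0 retry instead of the
// strict top-1 default; the ordinary canary_filter stays untouched.
let config = TrainConfig {
    boosting_first_stage_retry_filter: Some(CanaryFilter::TopN(3)),
    ..TrainConfig::default()
};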

Core capabilities

forestfire-core currently provides:

  • unified training dispatch
  • decision trees, random forests, and gradient boosting
  • per-row sample weights for all algorithms and tasks
  • multi-target regression (2-D y with n_targets > 1 target columns; dt only)
  • MDI feature importances on Model and OptimizedModel
  • optimized inference runtimes
  • JSON IR serialization and deserialization
  • tree introspection metadata
  • compiled optimized runtime artifacts
  • used-feature introspection for semantic and optimized models

Useful runtime-oriented methods:

  • Model::used_feature_indices()
  • Model::used_feature_count()
  • Model::optimize_inference(...)
  • Model::optimize_inference_with_missing_features(...)
  • OptimizedModel::used_feature_indices()
  • OptimizedModel::used_feature_count()
  • OptimizedModel::serialize_compiled()
  • OptimizedModel::deserialize_compiled(...)

Optimized models still accept the full semantic feature space on input, but they lower the runtime into a compact projected feature space internally so batch preprocessing only touches the columns that appear in splits.

Inference inputs can also contain missing values through the normal raw-input paths, including floating-point NaN values and, when the polars feature is enabled, polars nulls.

That means there are really three layers to keep in mind:

  • Model: semantic meaning
  • OptimizedModel: lowered runtime
  • compiled artifact: serialized lowered runtime plus semantic IR

Example: semantic model vs optimized runtime

use forestfire_core::{train, TrainConfig};
use forestfire_data::Table;

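// Four rows with three features each; the third feature is constant
// (10.0), so no split can gain anything from it. The second argument
// is the target column.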
let table = Table::new(
    vec![
        vec![0.0, 0.0, 10.0],
        vec![0.0, 1.0, 10.0],
        vec![1.0, 0.0, 10.0],
        vec![1.0, 1.0, 10.0],
    ],
    vec![0.0, 0.0, 0.0, 1.0],
)?;

let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;

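// Both report semantic feature indices; the constant third column
// (index 2) should be absent from both lists.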
println!("{:?}", model.used_feature_indices());
println!("{:?}", optimized.used_feature_indices());

Those used-feature methods reflect the semantic split structure, and the optimized runtime uses them to project inference input before scoring.

Example: compiled optimized artifact

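// Serialize the lowered runtime (plus its semantic IR) and restore it
// without re-lowering from the semantic model.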
let optimized = model.optimize_inference(Some(1))?;
let bytes = optimized.serialize_compiled()?;
let restored = forestfire_core::OptimizedModel::deserialize_compiled(&bytes, Some(1))?;

Use this when you want to preserve the lowered runtime layout across reloads instead of recomputing it from the semantic model each time.

Example: selective missing checks in the optimized runtime

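// Keep missing-value checks only for features 0, 3, and 7; all other
// used features are assumed to always be present at inference time.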
let optimized = model.optimize_inference_with_missing_features(
    Some(1),
    Some(vec![0, 3, 7]),
)?;

Pass None to preserve missing-aware behavior for every used feature. Pass an explicit feature-index list only when your inference pipeline guarantees that other columns will never be missing.

Data crate

The forestfire-data crate provides the training-table abstractions and preprocessing/binned storage used by the learners.

Key types:

  • Table — primary training container (dense or sparse)
  • WeightedTable<'a> — wraps any &dyn TableAccess and overrides sample_weight with per-row weights; it composes with any other table adapter
  • MultiTargetDenseTable<'a> — wraps a base table and attaches extra target columns; overrides n_targets() and target_value_at() so the regressor produces multi-target leaves

Example:

use forestfire_data::{Table, WeightedTable, MultiTargetDenseTable};

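// x_rows, first_target_col, weights, and extra_targets stand in for
// data prepared elsewhere.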
let base = Table::new(x_rows, first_target_col)?;
let weighted = WeightedTable::new(&base, weights);
let mt = MultiTargetDenseTable::new(&base, extra_targets);

Important point:

  • Table is the training-side abstraction
  • inference can use raw rows, named columns, sparse binary columns, and polars data directly through forestfire-core

That mirrors the Python surface: training normalizes through tables, while prediction accepts user-facing inference inputs directly.

Inference crate

The forestfire-inference crate contains inference-focused runtime utilities on top of the model IR and compiled runtimes.

Rust usage notes

Use the core and data crates directly from the workspace today. The library is still early-stage, so the repository state should be treated as the source of truth for the public surface.

Publishing order

The crates should be published in dependency order:

  1. forestfire-data
  2. forestfire-core

forestfire-inference and the example crate stay workspace-local and are not published.