# Rust API
The Rust side is split across crates:
- `forestfire-data`
- `forestfire-core`
- `forestfire-inference`
## Core training entrypoint
```rust
use forestfire_core::{train, TrainConfig};

let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;
```
`TrainConfig::histogram_bins` controls the numeric histogram resolution used during fitting. Leave it as `None` to reuse the incoming table bins, or set `Some(NumericBins::Auto)` / `Some(NumericBins::Fixed(...))` to rebin the training view before split search.
When constructing `TrainConfig` explicitly, prefer starting from `..TrainConfig::default()` unless you really need to spell out every field. That keeps examples resilient as new configuration fields are added.
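A minimal sketch combining both points, assuming `NumericBins` is exported from `forestfire_core` and that `Fixed` takes a bin count (the value `64` is illustrative):

```rust
use forestfire_core::{train, NumericBins, TrainConfig};

// Rebin numeric features to a fixed resolution before split search.
let config = TrainConfig {
    histogram_bins: Some(NumericBins::Fixed(64)), // bin count is illustrative
    ..TrainConfig::default() // leave every other field at its default
};
let model = train(&table, config)?;
```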
The intended Rust lifecycle is (sketched in code after this list):

- build a training table through `forestfire-data`
- train a semantic `Model`
- use the semantic model for introspection and canonical serialization
- derive an `OptimizedModel` when prediction speed matters
- optionally snapshot the optimized runtime as a compiled artifact
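The same lifecycle in code, using only calls shown on this page:

```rust
use forestfire_core::{train, TrainConfig};

// 1. `table` is built through forestfire-data (see the Data crate section).
let model = train(&table, TrainConfig::default())?; // semantic Model
// 2. Derive a lowered runtime when prediction speed matters.
let optimized = model.optimize_inference(Some(1))?;
// 3. Optionally snapshot the lowered runtime as a compiled artifact.
let bytes = optimized.serialize_compiled()?;
```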
## Important core types
- `TrainConfig`
- `TrainAlgorithm`
- `Task`
- `TreeType`
- `SplitStrategy`
- `BuilderStrategy`
- `Criterion`
- `Model`
- `OptimizedModel`
`TrainConfig::split_strategy` selects the split family:

- `SplitStrategy::AxisAligned`
- `SplitStrategy::Oblique`

Current support (see the sketch after this list):

- `AxisAligned`: all supported tree families
- `Oblique`: `dt`, `rf`, and `gbm` when `tree_type` is `Cart` or `Randomized`
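As a sketch, requesting oblique splits through the documented field (this assumes `SplitStrategy::Oblique` is a unit variant; any payload it might carry is not shown on this page):

```rust
use forestfire_core::{train, SplitStrategy, TrainConfig};

// Oblique splits are supported for dt, rf, and gbm with Cart or
// Randomized tree types.
let config = TrainConfig {
    split_strategy: SplitStrategy::Oblique,
    ..TrainConfig::default()
};
let model = train(&table, config)?;
```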
`TrainConfig::builder` selects the tree-construction strategy:

- `BuilderStrategy::Greedy`
- `BuilderStrategy::Lookahead`
- `BuilderStrategy::Beam`
- `BuilderStrategy::Optimal`

Related `TrainConfig` fields:

- `lookahead_depth`
- `lookahead_top_k`
- `lookahead_weight`
- `beam_width`
Those fields control the lookahead and beam builders; `Optimal` ignores them and is instead bounded by the normal tree limits plus canary filtering.
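A hedged sketch of selecting the beam builder; the page does not show the field types, so treating `beam_width` as a plain integer is an assumption:

```rust
use forestfire_core::{train, BuilderStrategy, TrainConfig};

// Beam search over candidate subtrees; the lookahead_* knobs are left
// at their defaults since they only apply to BuilderStrategy::Lookahead.
let config = TrainConfig {
    builder: BuilderStrategy::Beam,
    beam_width: 4, // assumed integer width; value is illustrative
    ..TrainConfig::default()
};
let model = train(&table, config)?;
```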
For gradient boosting specifically:
- `TrainConfig::canary_filter` controls the ordinary root-stage canary window
- `TrainConfig::boosting_first_stage_retry_filter` controls an optional retry window for stage 0; its default is `Some(CanaryFilter::TopN(1))`, which preserves strict top-1 behavior unless you widen it explicitly (example below)
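For example, widening the stage-0 retry window (the `TopN(3)` value is illustrative, and the sketch assumes `CanaryFilter` is exported from `forestfire_core`):

```rust
use forestfire_core::{train, CanaryFilter, TrainConfig};

// Let stage 0 retry across the top three root candidates instead of
// the strict top-1 default.
let config = TrainConfig {
    boosting_first_stage_retry_filter: Some(CanaryFilter::TopN(3)),
    ..TrainConfig::default()
};
let model = train(&table, config)?;
```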
## Core capabilities
`forestfire-core` currently provides:
- unified training dispatch
- decision trees, random forests, and gradient boosting
- per-row sample weights for all algorithms and tasks (see the sketch after this list)
- multi-target regression (2-D `y` with `n_targets > 1` columns, `dt` only)
- MDI feature importances on `Model` and `OptimizedModel`
- optimized inference runtimes
- JSON IR serialization and deserialization
- tree introspection metadata
- compiled optimized runtime artifacts
- used-feature introspection for semantic and optimized models
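Here is a hedged sketch of weighted training end to end, wrapping a base table with `WeightedTable` from `forestfire-data` (described below); whether `train` accepts the wrapper directly depends on the table-access trait, which this page does not spell out:

```rust
use forestfire_core::{train, TrainConfig};
use forestfire_data::{Table, WeightedTable};

let base = Table::new(
    vec![vec![0.0, 1.0], vec![1.0, 0.0], vec![1.0, 1.0]],
    vec![0.0, 1.0, 1.0],
)?;
// Upweight the last row 3x; assumes train() takes any table adapter.
let weighted = WeightedTable::new(&base, vec![1.0, 1.0, 3.0]);
let model = train(&weighted, TrainConfig::default())?;
```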
Useful runtime-oriented methods:
- `Model::used_feature_indices()`
- `Model::used_feature_count()`
- `Model::optimize_inference(...)`
- `Model::optimize_inference_with_missing_features(...)`
- `OptimizedModel::used_feature_indices()`
- `OptimizedModel::used_feature_count()`
- `OptimizedModel::serialize_compiled()`
- `OptimizedModel::deserialize_compiled(...)`
Optimized models still accept the full semantic feature space on input, but they lower the runtime into a compact projected feature space internally so batch preprocessing only touches the columns that appear in splits.
Inference inputs can also contain missing values through the normal raw-input paths, including floating-point `NaN` values and `polars` nulls when the `polars` integration is available.
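The prediction entry points are not named on this page, so the scoring call below is only a placeholder; the point is that raw rows may carry `f64::NAN` where a value is missing:

```rust
// A row whose second feature is missing, expressed as NaN in raw input.
let row = vec![1.0, f64::NAN, 10.0];
// `predict_row` is a hypothetical name standing in for the real entry point:
// let score = optimized.predict_row(&row)?;
```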
That means there are really three layers to keep in mind:
- `Model`: semantic meaning
- `OptimizedModel`: lowered runtime
- compiled artifact: serialized lowered runtime plus semantic IR
## Example: semantic model vs optimized runtime
```rust
use forestfire_core::{train, TrainConfig};
use forestfire_data::Table;

let table = Table::new(
    vec![
        vec![0.0, 0.0, 10.0],
        vec![0.0, 1.0, 10.0],
        vec![1.0, 0.0, 10.0],
        vec![1.0, 1.0, 10.0],
    ],
    vec![0.0, 0.0, 0.0, 1.0],
)?;

let model = train(&table, TrainConfig::default())?;
let optimized = model.optimize_inference(Some(1))?;

println!("{:?}", model.used_feature_indices());
println!("{:?}", optimized.used_feature_indices());
```
Those used-feature methods reflect the semantic split structure, and the optimized runtime uses them to project inference input before scoring.
## Example: compiled optimized artifact
```rust
let optimized = model.optimize_inference(Some(1))?;
let bytes = optimized.serialize_compiled()?;
let restored = forestfire_core::OptimizedModel::deserialize_compiled(&bytes, Some(1))?;
```
Use this when you want to preserve the lowered runtime layout across reloads instead of recomputing it from the semantic model each time.
## Example: selective missing checks in the optimized runtime
```rust
let optimized = model.optimize_inference_with_missing_features(
    Some(1),
    Some(vec![0, 3, 7]),
)?;
```
Pass `None` to preserve missing-aware behavior for every used feature. Pass an explicit feature-index list only when your inference pipeline guarantees that the other columns will never be missing.
## Data crate
The `forestfire-data` crate provides the training-table abstractions and the preprocessing/binned storage used by the learners.
Key types:
- `Table` — primary training container (dense or sparse)
- `WeightedTable<'a>` — wraps any `&dyn TableAccess` and overrides `sample_weight` with per-row weights; compose with any other table adapter
- `MultiTargetDenseTable<'a>` — wraps a base table and attaches extra target columns; overrides `n_targets()` and `target_value_at()` so the regressor produces multi-target leaves
Example:
```rust
use forestfire_data::{MultiTargetDenseTable, Table, WeightedTable};

// `x_rows`, `first_target_col`, `weights`, and `extra_targets` are
// caller-provided Vecs (features, first target, per-row weights, extra
// target columns).
let base = Table::new(x_rows, first_target_col)?;
let weighted = WeightedTable::new(&base, weights);
let mt = MultiTargetDenseTable::new(&base, extra_targets);
```
Important points:

- `Table` is the training-side abstraction
- inference can use raw rows, named columns, sparse binary columns, and `polars` data directly through `forestfire-core`
That mirrors the Python surface: training normalizes through tables, while prediction accepts user-facing inference inputs directly.
## Inference crate
The `forestfire-inference` crate contains inference-focused runtime utilities on top of the model IR and compiled runtimes.
## Rust usage notes
Use the core and data crates directly from the workspace today. The library is still early-stage, so the repository state should be treated as the source of truth for the public surface.
## Publishing order
The crates should be published in dependency order:
1. `forestfire-data`
2. `forestfire-core`
`forestfire-inference` and the example crate stay workspace-local and are not published.