ocannl

OCANNL is sponsored by Ahrefs! Visit the Ahrefs website.

OCANNL -- OCaml Compiles Algorithms for Neural Networks Learning

Usage

The CUDA backend requires at least CUDA version 12.8. The Metal backend requires at least MSL version 3.1.

API documentation entry point.

A possible route to learning OCANNL:

  1. Read the introductory slides.
  2. Read: shapes and the generalized einsum beginner-to-advanced slides.
  3. Upcoming in v0.7: slides about Context.
  4. Read the migration guide.
  5. Read the syntax extensions documentation docs/syntax_extensions.md.
  6. Read the NN building blocks file lib/nn_blocks.ml and the training recipes lib/train.ml.
  7. Read the introductory part of the shape inference documentation docs/shape_inference.md.
  8. Skim the configuration documentation ocannl_config.example.
  9. Improve your understanding by reading or skimming the framework internals: tensor/shape.mli, tensor/tensor.mli, tensor/operation.ml, arrayjit/lib/context.mli.
  10. Read the implementation overview:
     - The various tests.
     - Shape inference details docs/shape_inference.md.
     - Backend-independent optimizations docs/lowering_and_inlining.md -- lowering means translating (compiling) from the high-level representation (as assignments) to the low-level representation.

Using the tracing debugger with CUDA computations

To use the debugging provided by configuring Utils.settings.debug_log_from_routines <- true with the CUDA backend, you need to wrap the code that schedules tasks and synchronizes CUDA devices with Utils.capture_stdout_logs. The reason is that CUDA kernels are allowed to use printf, but not fprintf -- the driver dumps a device's printing buffer to stdout at certain times (e.g. when synchronizing the device). For an example, see the implementation of Train.example_train_loop. Specifically, it wraps two sections: the call to Train.parallel_update, and the body of the returned infer_callback.
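The pattern above can be sketched as follows. This is a minimal illustration, not code from the repository: routine and device stand for values produced by your own setup code, and Backend.sync_device is a hypothetical stand-in for whatever synchronization call your backend exposes; only Utils.settings.debug_log_from_routines, Utils.capture_stdout_logs, and Train.run mirror names mentioned in this README.

```ocaml
(* Enable tracing of logs emitted from compiled routines. *)
let () = Utils.settings.debug_log_from_routines <- true

let () =
  (* Wrap both task scheduling and device synchronization, so that the CUDA
     printf buffer is flushed while stdout is still being intercepted. *)
  Utils.capture_stdout_logs (fun () ->
      Train.run routine;
      (* Hypothetical synchronization call: triggers the driver's dump of
         the device's printf buffer to the captured stdout. *)
      Backend.sync_device device)
```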

NOTE: debug logging from CUDA in complex settings is a bit tricky: it involves another thread (domain) intercepting and filtering stdout. If you face issues, try the setting never_capture_stdout=true (see ocannl_config.example).

Upcoming milestones

This is very tentative.

Releases

For more details, see CHANGES.

Why not just use OWL?

OCANNL makes different design choices than OWL. For example:

Installation

Although the project is called ocannl, the main package is called neural_nets_lib, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that ocannl is composed of arrayjit and neural_nets_lib.

The dependency on cudajit is optional, so to enable the CUDA backend you have to install cudajit yourself. The dependency on metal is macOS-specific and is pulled in automatically.
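Assuming an opam-based setup (both package names appear in this README; the exact sequence may vary with your switch configuration), enabling the CUDA backend might look like:

```shell
# Install the optional CUDA bindings first, so the CUDA backend gets built.
opam install cudajit
# Then install the main package (named neural_nets_lib, not ocannl).
opam install neural_nets_lib
```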

Code Organization

The codebase is organized to separate user-facing recipes from framework internals:

Development

NOTE TO POTENTIAL CONTRIBUTORS: while I might slowly be moving toward PRs on separate branches rather than just a stream of commits on the main branch, design migrations will be broken into small PRs to avoid staleness of the main (master) branch, and many changes will still land as commits directly on the main branch. We allow failing tests on the main branch, although going forward this should hopefully happen less. Tagged, i.e. released, versions of the code are guaranteed to work as well as the given stage of the project permitted. The policy is that for releases, all tests must pass with the sync_cc backend, and with all other backends they must exhibit the behavior expected of a backend. We try to minimize discrepancies across backends, but prefer more stringent tests even if some backends only pass them "in spirit" rather than matching the exact expectations of the sync_cc backend.

OCANNL uses ppx_minidebug for debugging. We have migrated to a per-file opt-in scheme for enabling ppx_minidebug at compile time (via environment variables; see the top of the .ml files in question), combined with a unified log-level setting (ocannl_log_level) for tuning logging at runtime. Because the per-file settings take effect at compile time, run dune clean after setting/exporting one of these environment variables.
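For the runtime side, a configuration fragment might look like the following. This is an illustrative sketch: the key names ocannl_log_level and never_capture_stdout appear in this README and in ocannl_config.example, but the level value shown here is just an example, not a recommendation.

```
# Runtime log verbosity (illustrative value); the per-file ppx_minidebug
# toggles are compile-time and require a `dune clean` after changing them.
ocannl_log_level=2
# If stdout capture interferes with other tooling (see the NOTE about
# CUDA debug logging above):
never_capture_stdout=true
```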