ocannl

OCANNL is sponsored by Ahrefs! Visit the Ahrefs website.

OCANNL -- OCaml Compiles Algorithms for Neural Networks Learning

Usage

Starting from OCANNL 0.5.2, the CUDA backend requires at least CUDA version 12.8. The Metal backend requires at least MSL version 3.1.

API documentation entry point.

A possible route to learning OCANNL:

  1. Read the introductory slides.
  2. Get some basic grasp of the aims and design of the project by reading or skimming files in test/ and bin/.
  3. Read the syntax extensions documentation lib/syntax_extensions.md.
  4. Read the introductory part of the shape inference documentation lib/shape_inference.md.
  5. Read the configuration documentation ocannl_config.example.
  6. Improve your understanding by reading or skimming: lib/shape.mli, lib/tensor.mli, lib/operation.ml, arrayjit/lib/backend_intf.ml, lib/train.ml, and lib/nn_blocks.ml.
  7. Read arrayjit/lib/anatomy_of_a_backend.md.
  8. Read the implementation overview:

    1. Shape inference details lib/shape_inference.md.
    2. Backend-independent optimizations arrayjit/lib/lowering_and_inlining.md -- lowering means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
    3. More documentation to come.

Using the tracing debugger with CUDA computations

To use the debugging provided by setting Utils.settings.debug_log_from_routines <- true with the CUDA backend, you need to wrap the code that schedules tasks and synchronizes CUDA devices in Utils.capture_stdout_logs. The reason is that CUDA kernels are allowed to use printf but not fprintf -- the driver dumps a device's printing buffer to stdout at certain times (e.g. when synchronizing the device). For an example, see the implementation of Train.example_train_loop. Specifically, it wraps two sections: the call to Train.parallel_update, and the body of the returned infer_callback.
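The wrapping can be sketched as follows. This is illustrative only: Utils.settings.debug_log_from_routines and Utils.capture_stdout_logs come from this document, while the routine, device, and the Train/Backend function names used below are hypothetical placeholders -- Train.example_train_loop remains the authoritative example.

```ocaml
(* Sketch, not a verified API usage: [my_routine], [my_device],
   [Train.run], and [Backend.sync] are illustrative names. *)
let () =
  Utils.settings.debug_log_from_routines <- true;
  (* Wrap BOTH scheduling and device synchronization, so the logs the
     CUDA driver flushes to stdout are intercepted and filtered. *)
  Utils.capture_stdout_logs (fun () ->
      Train.run my_routine;
      Backend.sync my_device)
```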

NOTE: debug logging from CUDA in complex settings is a bit tricky: it involves another thread (domain) intercepting and filtering stdout. If you face issues, try the setting never_capture_stdout=true (see ocannl_config.example).
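For instance, a minimal configuration fragment might look like the following; only never_capture_stdout is quoted from this document, so consult ocannl_config.example for the authoritative key names and syntax.

```
# Disable the stdout-capturing thread if log interception causes issues.
never_capture_stdout=true
```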

Upcoming milestones

This is very tentative.

Releases

For more details, see CHANGES.

Why not just use OWL?

OCANNL follows different design choices than OWL. For example:

Installation

Although the project is called ocannl, the main package is called neural_nets_lib, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that ocannl is composed of arrayjit and neural_nets_lib.

The dependencies on cudajit and metal are optional, so to enable the CUDA or Apple Metal backend you have to install the corresponding package first.
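For example, assuming cudajit and metal are the opam package names (a plausible but unverified assumption here):

```shell
# Install the optional backend dependencies before (re)installing the
# library, so the corresponding backends get enabled.
opam install cudajit          # enables the CUDA backend
opam install metal            # enables the Apple Metal backend (macOS)
opam install neural_nets_lib  # the main OCANNL package
```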

Development

NOTE TO POTENTIAL CONTRIBUTORS: while I might be slowly starting to work with PRs in separate branches rather than just a stream of commits on the main branch, design migrations will be broken into small PRs to avoid main (master) branch staleness, and many changes will still be commits on the main branch. We allow failing tests on the main branch, although going forward this should happen less often. Tagged, i.e. released, versions of the code are guaranteed to work as well as the given stage of the project permitted. The policy is that for a release, all tests must pass with the sync_cc backend, and all other backends must have the behavior expected of a backend. We try to minimize discrepancies across backends, but prefer more stringent tests even if some backends only pass them "in spirit" rather than matching the exact expectations of the sync_cc backend.

OCANNL uses ppx_minidebug for debugging. We have migrated to a per-file opt-in scheme for enabling ppx_minidebug at compile time (via environment variables; see the top of the .ml files in question), combined with a unified log-level setting (ocannl_log_level) for tuning logging at runtime. Because the per-file settings are compile-time, run dune clean after setting/exporting one of these environment variables.
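The workflow above can be sketched as a shell session. The environment variable name below is hypothetical -- the actual per-file variables are listed at the top of the .ml files that use them.

```shell
# Hypothetical variable name, for illustration only.
export OCANNL_DEBUG_SOME_FILE=1
# Required: the per-file opt-in is a compile-time setting, so stale
# build artifacts would otherwise keep the old configuration.
dune clean
dune build
# Runtime verbosity is tuned separately, via the ocannl_log_level
# setting in the ocannl_config file.
```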