ocannl

NOTE TO POTENTIAL CONTRIBUTORS: reach out so that I can adjust my work style, e.g. start using branches for refactorings. Otherwise you may face frustration, because the code on the main branch might be broken mid-refactoring. Tagged versions of the code are guaranteed to work as well as the given stage of the project permitted.

OCANNL is sponsored by Ahrefs! Visit the Ahrefs website.

OCANNL -- OCaml Compiles Algorithms for Neural Networks Learning

Usage

Starting from OCANNL 0.5.2, the CUDA backend requires at least CUDA version 12.8.

API documentation entry point.

A possible route to learning OCANNL:

  1. Read the introductory slides.
  2. Get some basic grasp of the aims and design of the project by reading or skimming files in test/ and bin/.
  3. Read the syntax extensions documentation lib/syntax_extensions.md.
  4. Read the introductory part of the shape inference documentation lib/shape_inference.md.
  5. Read the configuration documentation ocannl_config.example.
  6. Improve your understanding by reading or skimming: lib/shape.mli, lib/tensor.mli, lib/operation.ml, arrayjit/lib/backend_intf.ml, lib/train.ml, and lib/nn_blocks.ml.
  7. Read arrayjit/lib/anatomy_of_a_backend.md.
  8. Read the implementation overview:

    1. Shape inference details lib/shape_inference.md.
    2. Backend-independent optimizations arrayjit/lib/lowering_and_inlining.md -- lowering means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
    3. More documentation to come.

Using the tracing debugger with CUDA computations

To use the debugging provided by configuring Utils.settings.debug_log_from_routines <- true with the CUDA backend, you need to wrap the code that schedules tasks and synchronizes CUDA devices with Utils.capture_stdout_logs. The reason is that CUDA kernels are allowed to use printf but not fprintf: the driver dumps a device's printing buffer to stdout at certain times (e.g. when synchronizing the device). For an example, see the implementation of Train.example_train_loop. Specifically, it wraps two sections: the call to Train.parallel_update, and the body of the returned infer_callback.
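The wrapping described above can be sketched as follows. This is a hypothetical illustration, not code from Train.example_train_loop: the helper run_with_cuda_logs and the parameters schedule_tasks and synchronize_device are made-up names standing in for whatever scheduling and synchronization your program performs, and it assumes Utils.capture_stdout_logs takes a thunk.

```ocaml
(* Sketch only: [schedule_tasks] and [synchronize_device] are placeholders
   for your own scheduling and device-synchronization code. *)
let run_with_cuda_logs ~schedule_tasks ~synchronize_device =
  (* Enable log capture from within compiled routines. *)
  Utils.settings.debug_log_from_routines <- true;
  Utils.capture_stdout_logs (fun () ->
      schedule_tasks ();
      (* Synchronizing flushes the device's printf buffer to stdout,
         which capture_stdout_logs redirects into the debug logs. *)
      synchronize_device ())
```

The key point is that both the scheduling and the synchronization happen inside the capture_stdout_logs scope, so the driver's stdout dumps are attributed to the tracing debugger rather than lost.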

IMPORTANT: debug logging from CUDA in complex settings currently only works as intended for very small computation sizes. If you face issues, try the setting never_capture_stdout=true (see ocannl_config.example).
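For reference, such settings are plain key=value lines in the configuration file (see ocannl_config.example for the authoritative list of keys; only never_capture_stdout is taken from the text above):

```
never_capture_stdout=true
```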

Upcoming milestones

This is very tentative.

Releases

For more details, see CHANGES.

Why not just use OWL?

OCANNL makes different design choices from OWL. For example:

Installation

Although the project is called ocannl, the main package is called neural_nets_lib, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that ocannl is composed of arrayjit and neural_nets_lib.

The dependency on ocaml-cudajit is optional, so to enable the CUDA backend you have to install ocaml-cudajit first.
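A typical installation might therefore look like the following. This is a sketch: neural_nets_lib is the package name stated above, while cudajit is assumed here to be the opam package name corresponding to ocaml-cudajit.

```shell
# Optional: install the CUDA bindings first so the CUDA backend gets built.
opam install cudajit
# Then install the main package (the project is published as neural_nets_lib).
opam install neural_nets_lib
```

Installing cudajit afterwards instead would require reinstalling neural_nets_lib for the optional dependency to take effect.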