Package patricia-tree

This library contains a single module: PatriciaTree.

This is version 0.11.0 of the library. It is known to work with OCaml versions ranging from 4.14 to 5.3.

This is an OCaml library that implements sets and maps as Patricia Trees, as described in Okasaki and Gill's 1998 paper Fast mergeable integer maps. It is a space-efficient prefix trie over the big-endian representation of the key's integer identifier.

The source code of this library is available on Github under an LGPL-2.1 license.

This library was written by Matthieu Lemerre, then further improved by Dorian Lesbre, as part of the Codex semantics library, developed at CEA List.

Installation

This library can be installed with opam:

opam install patricia-tree

Alternatively, you can clone the source repository and install with dune:

git clone git@github.com:codex-semantics-library/patricia-tree.git
cd patricia-tree
opan install . --deps-only
dune build -p patricia-tree
dune install
# To build documentation
opam install . --deps-only --with-doc
dune build @doc

See the examples to jump right into using this library.

Features

Quick overview

Functors

This library contains a single module, PatriciaTree. The functors used to build maps and sets are the following:

Interfaces

Here is a brief overview of the various module types of our library:

Examples

To use this library, install it and add the following to your dune files:

(executable ; or library
  ...
  (libraries patricia-tree ...)
)

Homogeneous map

Here is a small example of a non-generic map:

  1. Start by creating a key module. We use type int for keys in this example, but you can use any type, so long as it supports an efficient and injective to_int function.

    module IntKey : PatriciaTree.KEY with type t = int = struct
      type t = int
      let to_int x = x (* to_int must be injective and fast *)
    end
  2. Use it to instanciate the map/set functors:

    module IMap : PatriciaTree.MAP with type key = int
                = PatriciaTree.MakeMap(IntKey)
    
    module ISet : PatriciaTree.SET with type elt = int
                = PatriciaTree.MakeSet(IntKey)
  3. You can now use it as you would any other map, most of the interface is shared with the standard library's Map and Set (some functions have been renamed to highlight their differing requirements).

    # let map =
      IMap.empty |>
      IMap.add 1 "hello" |>
      IMap.add 2 "world" |>
      IMap.add 3 "how do you do?";;
    val map : string IMap.t = <abstr>

    (We also have of_list and of_seq functions for quick initialization)

    # IMap.find 1 map;;
    - : string = "hello"
    
    # IMap.cardinal map;;
    - : int = 3
  4. The strength of Patricia Tree is the speedup of operations on multiple maps with common subtrees. For example, in the following, the idempotent_inter_filter function will skip recursive calls to physically equal subtrees (kept as-is in the intersection). This allows faster than O(n) intersections.

    # let map2 =
        IMap.idempotent_inter_filter (fun _key _l _r -> None)
          (IMap.add 4 "something" map)
          (IMap.add 5 "something else" map);;
    val map2 : string IMap.t = <abstr>
    
    # map == map2;;
    - : bool = true

    Physical equality is preserved as much as possible, although some intersections may need to build new nodes and won't be fully physically equal, they will still share some subtrees.

    # let str = IMap.find 1 map;;
    val str : string = "hello"
    
    # IMap.add 1 str map == map (* already present *);;
    - : bool = true
    
    # IMap.add 1 "hello" map == map
      (* new string copy isn't physically equal to the old one *);;
    - : bool = false

    Note that physical equality isn't preserved when creating new copies of values (the newly created string "hello" isn't physically equal to str). It can also fail when maps have the same bindings but were created differently:

      # let map3 = IMap.remove 2 map;;
      val map3 : string IMap.t = <abstr>
    
      # IMap.add 2 (IMap.find 2 map) map3 == map;;
      - : bool = false

    If you want to maintain full physical equality (and thus get cheap equality test between maps), use the provided hash-consed maps and sets.

  5. Our library also allows cross map/set operations through the WithForeign functors:

      module CrossOperations = IMap.WithForeign(ISet.BaseMap)

    For example, you can only keep the bindings of map whose keys are in a given set:

      # let set = ISet.of_list [1; 3];;
      val set : ISet.t = <abstr>
    
      # let restricted_map = CrossOperations.nonidempotent_inter
        { f = fun _key value () -> value } map set;;
      val restricted_map : string IMap.t = <abstr>
    
      # IMap.to_list map;;
      - : (int * string) list = [(1, "hello"); (2, "world"); (3, "how do you do?")]
    
      # IMap.to_list restricted_map;;
      - : (int * string) list = [(1, "hello"); (3, "how do you do?")]

Heterogeneous map

Heterogeneous maps work very similarly to homogeneous ones, but come with extra liberty of having a generic type as a key.

  1. Here is a GADT example to use for our keys: a small typed expression language.

      type 'a expr =
        | G_Const_Int : int -> int expr
        | G_Const_Bool : bool -> bool expr
        | G_Addition : int expr * int expr -> int expr
        | G_Equal : 'a expr * 'a expr -> bool expr

    We can create our HETEROGENEOUS_KEY functor parameter using this type as follows:

      module Expr : PatriciaTree.HETEROGENEOUS_KEY with type 'a t = 'a expr = struct
        type 'a t = 'a expr
    
        (** Injective, so long as expressions are small enough
            (encodes the constructor discriminant in two lowest bits).
            Ideally, use a hash-consed type, to_int needs to be fast *)
        let rec to_int : type a. a expr -> int = function
          | G_Const_Int i ->   0 + 4*i
          | G_Const_Bool b ->  1 + 4*(if b then 1 else 0)
          | G_Addition(l,r) -> 2 + 4*(to_int l mod 10000 + 10000*(to_int r))
          | G_Equal(l,r) ->    3 + 4*(to_int l mod 10000 + 10000*(to_int r))
    
        (** Full polymorphic equality, requires annotation to type properly *)
        let rec polyeq : type a b. a expr -> b expr -> (a, b) PatriciaTree.cmp =
          fun l r -> match l, r with
          | G_Const_Int l, G_Const_Int r -> if l = r then Eq else Diff
          | G_Const_Bool l, G_Const_Bool r -> if l = r then Eq else Diff
          | G_Addition(ll, lr), G_Addition(rl, rr) -> (
              match polyeq ll rl with
              | Eq -> polyeq lr rr
              | Diff -> Diff)
          | G_Equal(ll, lr), G_Equal(rl, rr) ->    (
              match polyeq ll rl with
              | Eq -> (* this match is no-op, but it is required to typecheck *)
                      (match polyeq lr rr with Eq -> Eq | Diff -> Diff)
              | Diff -> Diff)
          | _ -> Diff
      end

    Note the full polymorphic equality, that returns a GADT term PatriciaTree.cmp which, when equal (Eq), is a proof of type equality between the type parameters.

  2. We can now instanciate our map functor. Note that in the heterogeneous case, we must also specify the value type (second functor argument) and how it depends on the key type (first parameter) and the map type (second parameter). Here the value only depends on the type of the key, not that of the map

      module EMap = PatriciaTree.MakeHeterogeneousMap
                      (Expr)
                      (struct type ('key, 'map) t = 'key end)
  3. You can now use this as you would any other dependent map:

      # let map : unit EMap.t =
        EMap.empty |>
        EMap.add (G_Const_Bool false) false |>
        EMap.add (G_Const_Int 5) 5 |>
        EMap.add (G_Addition (G_Const_Int 3, G_Const_Int 6)) 9 |>
        EMap.add (G_Equal (G_Const_Bool false, G_Equal (G_Const_Int 5, G_Const_Int 7))) true
      val map : unit EMap.t = <abstr>
    
      # EMap.find (G_Const_Bool false) map;;
      - : bool = false
    
      # EMap.find (G_Const_Int 5) map;;
      - : int = 5
    
      # EMap.cardinal map;;
      - : int = 4
  4. Physical equality preservation allows fast operations on multiple maps with common ancestors. In the heterogeneous case, these functions are a bit more complex since OCaml requires that first-order polymorphic functions be wrapped in records:

      # let map2 = EMap.idempotent_inter_filter
          { f = fun _key _l _r -> None } (* polymorphic 1rst order functions are wrapped in records *)
          (EMap.add (G_Const_Int 0) 8 map)
          (EMap.add (G_Const_Int 0) 9 map)
      val map2 : unit EMap.t = <abstr>

    Even though map and map2 have the same elements, they may not always be physically equal:

      # map == map2;;
      - : bool = false

    This is because they were created through different processes. They will still share subtrees. If you want to maintain full physical equality (and thus get cheap equality test between maps), use the provided hash-consed maps and sets.

Release status

This should be close to a stable release. It is already being used as part of a larger project successfully, and this usage as helped us mature the interface. As is, we believe the project is usable, and we don't anticipate any major change before 1.0.0. We didn't commit to a stable release straight away as we would like a bit more time using this library before doing so.

Known issues

There is a bug in the OCaml typechecker which prevents us from directly defining non-generic maps as instances of generic maps. To avoid this, non-generic maps use a separate value type ('a, 'b) snd (instead of just using 'b)

type (_, 'b) snd = Snd of 'b [@@unboxed]

It should not incur any extra performance cost as it is unboxed, but can appear when manipulating non-generic maps.

For more details about this issue, see the OCaml discourse discussion or the github issue.

Comparison to other OCaml libraries

ptmap and ptset

There are other implementations of Patricia Tree in OCaml, namely ptmap and ptset, both by J.C. Filliatre. These are smaller and closer to OCaml's built-in Map and Set, however:

dmap

Additionally, there is a dependent map library: dmap, which gave us the idea of making our PatriciaTree dependent. It allows creating type safe dependent maps similar to our heterogeneous maps. However, its maps aren't Patricia trees. They are binary trees build using a (polymorphic) comparison function, similarly to the maps of the standard library.

Another difference is that the type of values in the map is independent from the type of the keys, allowing keys to be associated with different values in different maps. i.e. we map 'a key to any ('a, 'b) value type, whereas dmap only maps 'a key to 'a or 'a value.

dmap also works with OCaml >= 4.12, whereas we require OCaml >= 4.14.

Contributions and bug reports

Any contributions are welcome!

You can report any bug, issues, or desired features using the Github issue tracker. Please include OCaml, dune, and library version information in you bug reports.

If you want to contribute code, feel free to fork the repository on Github and open a pull request. By doing so you agree to release your code under this project's license (LGPL-2.1).

There is no imposed coding style for this repository, here are just a few guidelines and conventions: