`How_to_run.remote` now accepts an optional `assert_binary_hash` parameter.
Added a `connection_timeout` parameter in `Map_reduce.Config.create`, with the same semantics as in `with_spawn_args`.

`Async_rpc`'s direct pipe rpcs.

Added a `complete_subcommands` pass-through argument to `start_app` for `Command.run`.

`Remote_executable.run` now accepts an optional `assert_binary_hash` parameter.
Allow skipping the binary hash check for a `Remote_executable.t` by setting `assert_binary_hash` to false.

Added `create_reverse_direct_pipe` to `Parallel_intf`.

`@@deriving bin_io` for managed workers.

Introduce a `spawn_in_foreground` function that returns a `Process.t` along with the worker.
Also use this opportunity to clean up the handling of file descriptors in the spawn case.
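A minimal sketch of how `spawn_in_foreground` might be used. `Sum_worker` is a made-up worker module, and the argument names and return type assumed here are illustrative rather than the documented signature; the point is that the caller gets the `Process.t`, so it can watch the worker's stderr and observe how the process exits.

```ocaml
(* Sketch only: [Sum_worker] and the exact shape of [spawn_in_foreground]
   are assumptions for illustration. *)
open Core.Std
open Async.Std

let run_in_foreground () =
  Sum_worker.spawn_in_foreground ~on_failure:Error.raise ()
  >>= function
  | Error err -> Error.raise err
  | Ok (_worker, process) ->
    (* Holding the [Process.t] lets us forward the worker's stderr to our own
       stderr instead of losing it... *)
    don't_wait_for
      (Reader.transfer (Process.stderr process)
         (Writer.pipe (Lazy.force Writer.stderr)));
    (* ...and lets us see exactly how the worker process exits. *)
    Process.wait process
    >>| fun exit_or_signal ->
    eprintf "worker exited: %s\n"
      (Sexp.to_string (Unix.Exit_or_signal.sexp_of_t exit_or_signal))
```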
Add a couple of features to `Rpc_parallel` to make debugging connection issues easier:
- `Worker.Connection.close_reason`
- `Worker.Connection.sexp_of_t`

Add some extra security around making Rpc calls to workers.
Because we do not expose the port of a worker, you have to go through `Rpc_parallel` to run rpcs against it unless you do really hacky things. When `Rpc_parallel` connects to a worker, it initializes a connection state (which includes the worker state). This initialization would raise if the worker did not have a server listening on the port that the client was talking to. Add some security by enforcing that `worker_id`s match instead of ports (which will be reused by the OS).
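For illustration, here is a sketch of how the two debugging additions above might be used when diagnosing a connection that disappears. `My_worker` is hypothetical, and the assumption that `close_reason` becomes determined with an `Info.t` is mine, not a documented signature.

```ocaml
(* Sketch only: [My_worker] is hypothetical, and [close_reason] is assumed to
   become determined with an [Info.t] when the connection closes. *)
open Core.Std
open Async.Std

let trace_connection (conn : My_worker.Connection.t) =
  (* [sexp_of_t] makes it possible to include the connection in log lines. *)
  eprintf "connected to worker: %s\n"
    (Sexp.to_string_hum (My_worker.Connection.sexp_of_t conn));
  (* [close_reason] reports why the connection eventually went away, which is
     the first thing to look at when a worker seems to have vanished. *)
  don't_wait_for
    (My_worker.Connection.close_reason conn
     >>| fun reason ->
     eprintf "worker connection closed: %s\n" (Info.to_string_hum reason))
```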
Added `redirect_stderr` and `redirect_stdout` to `Map_reduce` so it is easier to debug your workers. Also removed `spawn_exn` in favor of only exposing `spawn_config_exn`; if you want to spawn a single worker, make a config of one worker.

Cleans up the implementation-side interface for aborting `Pipe_rpc`s.
Summary
The `aborted` `Deferred.t` that got passed to `Pipe_rpc` implementations is gone. The situations where it would have been determined now close the reading end of the user-supplied pipe instead.
Details
Previously, when an RPC dispatcher decided to abort a query, the RPC implementation would get its `aborted` `Deferred.t` filled in, but would remain free to write some final elements to the pipe.
This is a little bit more flexible than the new interface, but it's also problematic in that the implementer could easily just not pay attention to `aborted`. (They're not obligated to pay attention to when the pipe is closed, either, but at least they can't keep writing to it.) We don't think the extra flexibility was used at all.
In the future, we may also simplify the client side to remove the `abort` function on the dispatch side (at least when not using `dispatch_iter`). For the time being it remains, but closing the received pipe is the preferred way of aborting the query.
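As an illustration of the new implementation-side interface, here is a sketch of a `Pipe_rpc` implementation that stops producing once the dispatcher closes its end of the pipe. The rpc itself (`counter_rpc`) is invented for the example.

```ocaml
(* Sketch: an implementation with no [aborted] argument; it watches the pipe
   it hands back and stops as soon as that pipe is closed. *)
open Core.Std
open Async.Std

let counter_rpc =
  Rpc.Pipe_rpc.create ~name:"counter" ~version:0
    ~bin_query:Int.bin_t ~bin_response:Int.bin_t ~bin_error:Error.bin_t ()

let counter_impl =
  Rpc.Pipe_rpc.implement counter_rpc (fun () up_to ->
    let reader, writer = Pipe.create () in
    don't_wait_for
      (Deferred.repeat_until_finished 0 (fun i ->
         (* [Pipe.is_closed writer] becomes true once the dispatcher aborts
            the query by closing its reading end. *)
         if i >= up_to || Pipe.is_closed writer
         then (Pipe.close writer; return (`Finished ()))
         else Pipe.write writer i >>| fun () -> `Repeat (i + 1)));
    return (Ok reader))
```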
There are a couple of known ways this might have changed behavior from before. Both of these appear not to cause problems in the jane repo.
- Implementations can no longer write final elements to the pipe after the client aborts the query (i.e. calls `abort`), since the implementor's pipe will also be closed in this case. Doing this was already unsafe, though, since the pipe was closed if the RPC connection was closed.

- `aborted` was only determined if a client aborted the query or the connection was closed. The new alternative, `Pipe.closed` called on the returned pipe, will also be determined if the implementation closes the pipe itself. This is unlikely to cause serious issues but could potentially cause some confusing logging.

This feature completely restructured `Rpc_parallel` to make it more transparent in what it does and doesn't do. Connection management and cleanup responsibilities are now offloaded to the user.
Previously, `run` was called on the worker type (i.e. a host and port) because its implementation would get a connection (using an existing one or doing an `Rpc.Connection.client`). Now, `run` is called on a connection type, so you have to manage connections yourself (see the sketch below). There is a `Managed` module that does connection management for you, and a `Heartbeater` type that can be used in spawned workers to connect to their master and handle cleanup.

There was a race condition where `spawn` would return a `Worker.t` even though the worker had not daemonized yet. This manifested itself in a spurious test case failure.
This also allowed for running the worker initialization code before we daemonize.
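A sketch of the resulting flow under the restructured interface: the caller now obtains, uses, and closes the connection explicitly. `Sum_worker` and the function names used here (`spawn`, `Connection.client`, `Connection.run_exn`, `Connection.close`) are assumptions meant to show the shape, not the exact API.

```ocaml
(* Sketch only: [Sum_worker] and these signatures are assumed for
   illustration of explicit connection management. *)
open Core.Std
open Async.Std

let sum_on_worker () =
  Sum_worker.spawn ~on_failure:Error.raise ()
  >>| Or_error.ok_exn
  >>= fun worker ->
  (* The caller, not [run], is responsible for the connection's lifetime. *)
  Sum_worker.Connection.client worker ()
  >>| Or_error.ok_exn
  >>= fun conn ->
  Sum_worker.Connection.run_exn conn ~f:Sum_worker.functions.sum ~arg:100
  >>= fun total ->
  Sum_worker.Connection.close conn
  >>| fun () -> printf "sum = %d\n" total
```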
`Heartbeater` module.

Make all global state and RPC servers lazy.
Namely:
`Rpc_parallel`

Fixed an `Rpc_parallel` bug that caused a Master to Worker call to never return.

Add new functions `spawn_and_connect` and `spawn_and_connect_exn`, which return the worker and a connection to the worker.
Also, move all the example code off of the `Managed` module.
Some refactoring:
- Moved `serve` out of the `Connection` module. It shouldn't be in there because it does nothing with the `Connection.t` type.
- `Connection.t`s: e.g. no longer expose `close_reason` because all the exception handling will be done with the registered exn handlers.
- `Rpc_settings` module
- Split `global_state` into `as_worker` and `as_master`.
- `Init_connection_state_rpc.query`

Expose the `connection_timeout` argument in `Rpc_parallel`. This argument exists in `Rpc_parallel_core.Parallel`, but it is not exposed in `Rpc_parallel.Parallel`.

`Parallel.Make_worker()` functor application top-level.

Make errors/exceptions in `Rpc_parallel` more observable:
- `Rpc_parallel`
- `Monitor.main` (e.g. `Async_log` does this)
- `Rpc_parallel` logic and start RPC servers for every command

Fixed a file-descriptor leak.
There was a file descriptor leak when killing workers: their stdin, stdout, and stderr remained open. We now close them after the worker process exits.
Added a `Parallel.State.get` function, to check whether `Rpc_parallel` has been initialized correctly.

Added a `Map_reduce` module, which is an easy-to-use parallel map/reduce library.
It can be used to map/fold over a list while utilizing multiple cores on multiple machines.
Also added support for workers to keep their own inner state.
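A rough sketch of the sort of usage this enables, squaring a list of ints across several local worker processes. The module path, functor, and argument labels below (`Rpc_parallel.Map_reduce`, `Make_map_function`, `Config.create ~local`, `map ~m ~param`) are my reading of the interface and should be treated as approximate.

```ocaml
(* Approximate sketch: names and argument labels are assumptions, not a
   verified API. *)
open Core.Std
open Async.Std
module Map_reduce = Rpc_parallel.Map_reduce

module Square_function = Map_reduce.Make_map_function (struct
  module Input  = Int
  module Output = Int
  let map x = return (x * x)
end)

let squares () =
  (* [local] says how many worker processes to spawn on this machine. *)
  let config = Map_reduce.Config.create ~local:4 () in
  Map_reduce.map config
    (Pipe.of_list (List.init 1_000 ~f:Fn.id))
    ~m:(module Square_function)
    ~param:()
  >>= Pipe.to_list
  >>| fun results -> printf "computed %d squares\n" (List.length results)
```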
Fixed a bug in which a zombie process was created per spawned worker.

Also fixed shutdown on remote workers.
Initial import.