This is a benchmark for reinforcement learning algorithms.  To support
the great variety of reinforcement learning problems, benchmarks are
accessed through an i/o interface.

Reinforcement learning algorithms differ in the way that they must
access (a simulator of) the world.  We support 3 (or effectively 4)
interfaces:
	-deterministic generative model
		The learner repeatedly proposes a (state, random seed, action)
		triple used to draw the next (observation, reward, state) triple.
	-generative model
		The learner repeatedly proposes a (state, action)
		pair used to draw the next (observation, reward, state) triple.
	-trace model
		The learner repeatedly proposes an action used to draw the next
		(observation, reward) pair.  (There is implicit state that the
		learner has no control over.)
	-direct experience (only one run of the trace model)
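
The relationship between the interfaces listed above can be sketched as
plain function wrappers.  This is only an illustration in Python, not the
code of the metaprograms described in this file: the names
generative_from_deterministic and trace_from_generative, and the idea of
a model as a Python function, are assumptions made for the example.

```python
import random


def generative_from_deterministic(det_step, rng=None):
    """Wrap det_step(state, seed, action) so the caller supplies no seed."""
    rng = rng or random.Random()

    def step(state, action):
        # Assumption: any integer is an acceptable seed for det_step.
        seed = rng.randrange(2**31)
        return det_step(state, seed, action)

    return step


def trace_from_generative(gen_step, initial_state):
    """Wrap gen_step(state, action) so the state is held implicitly."""
    state = initial_state

    def step(action):
        nonlocal state
        # The wrapper remembers the new state; the caller never sees it.
        observation, reward, state = gen_step(state, action)
        return observation, reward

    return step
```

Each wrapper hides one piece of information from the learner (first the
seed, then the state), which is why the most explicit interface can
stand in for the others.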

Ideally, benchmarks are written as deterministic generative models,
since a deterministic generative model can be adapted to every other
interface.  We provide two metaprograms which transform models.  The
syntax for invoking the first one is:

gfdg <deterministic_generative_model>

The resulting process presents a generative model interface to the
deterministic generative model.

The syntax for invoking the second one is:

tfg <generative_model>

The resulting process presents a trace model interface to the
generative model.

To chain both, presenting a trace model interface to a deterministic
generative model, use:

tfg +gfdg <deterministic_generative_model>

Each of these outputs a description of the action space on startup.
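
A learner written in Python might start one of these processes roughly
as follows.  This is a sketch under assumptions: that gfdg and tfg are on
the PATH, that the model argument is a path to an executable, and that
the process talks over its standard input and output.  The interface
names used as keys are hypothetical and exist only in this example.

```python
import subprocess


def build_command(interface, model):
    """Return the argv list for the requested interface (hypothetical keys)."""
    if interface == "generative":
        # model must be a deterministic generative model
        return ["gfdg", model]
    if interface == "trace":
        # model must be a generative model
        return ["tfg", model]
    if interface == "trace_from_deterministic":
        # double invocation; model must be a deterministic generative model
        return ["tfg", "+gfdg", model]
    raise ValueError(f"unknown interface: {interface}")


def launch(interface, model):
    """Start the wrapped model with line-buffered text pipes."""
    return subprocess.Popen(
        build_command(interface, model),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
```

The format of the action space description is not specified here, so the
sketch leaves reading it from the process output to the caller.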

------------------------------
deterministic generative model
------------------------------

The syntax for interacting with a deterministic generative model is simple.

Receive: (one per line)
<observation>
<reward>
<state>
Send: (one per line, don't forget to flush)
<state>
<seed>
<action>

An empty state is interpreted as the initial state (the action is
ignored), and an empty seed means that a random seed is chosen.
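
One exchange with a deterministic generative model could be written in
Python as below.  The helper names send_query and read_response are made
up for this sketch, the streams stand for the pipes to and from the model
process, and every field is kept as a raw string because the encoding of
observations, rewards and states is not specified here.

```python
import io


def send_query(stream, state, seed, action):
    """Write one (state, seed, action) query, one field per line, and flush."""
    # An empty string for state asks for the initial state;
    # an empty string for seed asks for a random seed.
    stream.write(f"{state}\n{seed}\n{action}\n")
    stream.flush()


def read_response(stream):
    """Read one (observation, reward, state) response, one field per line."""
    observation = stream.readline().rstrip("\n")
    reward = stream.readline().rstrip("\n")
    state = stream.readline().rstrip("\n")
    return observation, reward, state
```

Sending the same non-empty (state, seed, action) triple twice should
produce the same response, which is what makes the model deterministic.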

------------------------------
generative model
------------------------------

The syntax for interacting with a generative model is simple.

Receive: (one per line)
<observation>
<reward>
<state>
Send: (one per line, don't forget to flush)
<state>
<action>

An empty state is interpreted as the initial state (the action is
ignored).
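
Because the learner chooses the state, a generative model can be asked
for several independent draws from the same (state, action) pair.  The
Python sketch below shows this under assumptions: sample_transitions is a
hypothetical helper, the streams stand for the pipes to and from the
model process, and each query is assumed to be sent before its response
is read.

```python
import io


def sample_transitions(to_model, from_model, state, action, n):
    """Draw n (observation, reward, state) triples from one (state, action)."""
    samples = []
    for _ in range(n):
        # Send the query, one field per line, and flush.
        to_model.write(f"{state}\n{action}\n")
        to_model.flush()
        # Read the three response lines as raw strings.
        observation = from_model.readline().rstrip("\n")
        reward = from_model.readline().rstrip("\n")
        next_state = from_model.readline().rstrip("\n")
        samples.append((observation, reward, next_state))
    return samples
```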

------------------------------
trace model
------------------------------

The syntax for interacting with a trace model is simple.

Receive: (one per line)
<observation>
<reward>
Send: (one per line, don't forget to flush)
<action>

(All state is implicit.)
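
A learner loop against a trace model could look like the Python sketch
below.  It rests on several assumptions: run_trace and policy are
hypothetical names, the reward line is assumed to parse as a float, each
action is assumed to be sent before its (observation, reward) pair is
read, and the streams stand for the pipes to and from the model process.

```python
import io


def run_trace(to_model, from_model, policy, steps):
    """Interact for a fixed number of steps and return the summed reward."""
    total = 0.0
    observation = None  # nothing has been observed before the first action
    for _ in range(steps):
        # Choose an action from the last observation and send it.
        to_model.write(f"{policy(observation)}\n")
        to_model.flush()
        # Read the resulting (observation, reward) pair.
        observation = from_model.readline().rstrip("\n")
        total += float(from_model.readline())
    return total
```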
