We will demonstrate below how to use our benchmark runner pipeline.
Installation and reference imports
```
!pip install google-vizier[jax,algorithms]
```
```python
from vizier import algorithms as vza
from vizier import benchmarks as vzb
from vizier.algorithms import designers
from vizier.benchmarks import experimenters
```
An example experimenter and designer factory, which we will use later:
```python
experimenter = experimenters.NumpyExperimenter(
    experimenters.bbob.Sphere,
    experimenters.bbob.DefaultBBOBProblemStatement(5),
)
designer_factory = designers.GridSearchDesigner.from_problem
```
Algorithms and Experimenters
Every study can be seen conceptually as a simple loop between an algorithm and an objective. In terms of code, the algorithm corresponds to a `Designer` or `Policy`, and the objective to an `Experimenter`.
Below is a simple sequential loop.
```python
designer = designer_factory(experimenter.problem_statement())
for _ in range(100):
  suggestion = designer.suggest(count=1)[0]  # suggest() returns a sequence.
  trial = suggestion.to_trial()
  experimenter.evaluate([trial])
  completed_trials = vza.CompletedTrials([trial])
  designer.update(completed_trials, vza.ActiveTrials())
```
One modification we can make to the loop above is to suggest and evaluate with variable batch sizes, rather than only one-by-one. More generally, several implementation questions arise:
- How many parallel suggestions should the algorithm generate?
- How many suggestions can be evaluated at once?
- Should we use early stopping on certain unpromising trials?
- Should we use a custom stopping condition instead of a fixed for-loop?
- Can we swap in a different algorithm mid-loop?
- Can we swap in a different objective mid-loop?
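For instance, the first two questions concern batching. Below is a minimal sketch of a batched suggest/evaluate loop, using a toy objective and random suggestions as hypothetical stand-ins for the Vizier API (none of these names come from the library):

```python
import random

# Toy stand-ins (hypothetical, not the Vizier API) showing how the sequential
# loop generalizes to batches: suggest several points per iteration and
# evaluate them together.

def evaluate_batch(xs):
    """Toy Sphere-like objective evaluated on a whole batch at once."""
    return [x * x for x in xs]

rng = random.Random(0)
history = []  # (suggestion, objective value) pairs
for _ in range(10):                                      # 10 iterations...
    batch = [rng.uniform(-5.0, 5.0) for _ in range(4)]   # ...of 4 suggestions
    values = evaluate_batch(batch)
    history.extend(zip(batch, values))

best_x, best_value = min(history, key=lambda pair: pair[1])
print(len(history))  # → 40
```

Varying the batch size per iteration, or stopping once `best_value` crosses a threshold, are exactly the kinds of loop variations the benchmark API below is designed to express.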
The code flexibility needed to simulate these real-life scenarios can cause complications, as the evaluation benchmark may no longer be stateless. To broadly cover such scenarios, our API introduces the `BenchmarkSubroutine`:
```python
class BenchmarkSubroutine(Protocol):
  """Abstraction for core benchmark routines.

  Benchmark protocols are modular alterations of BenchmarkState by reference.
  """

  def run(self, state: BenchmarkState) -> None:
    """Abstraction to alter BenchmarkState by reference."""
```
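To make the protocol pattern concrete, here is a self-contained toy analogue (the names are hypothetical, not the Vizier classes): any object with a conforming `run(state)` method counts as a subroutine, and subroutines mutate a shared state by reference:

```python
from typing import Protocol

class ToyState:
    """Toy stand-in for BenchmarkState: plain mutable state."""
    def __init__(self) -> None:
        self.num_suggestions = 0

class ToySubroutine(Protocol):
    """Structural protocol: anything with run(state) conforms."""
    def run(self, state: ToyState) -> None:
        ...

class GenerateOneSuggestion:
    """A conforming subroutine that alters the state in place."""
    def run(self, state: ToyState) -> None:
        state.num_suggestions += 1

state = ToyState()
GenerateOneSuggestion().run(state)
print(state.num_suggestions)  # → 1
```

Because the state is shared and mutated in place, stateful scenarios (algorithm swaps, early stopping, custom stopping conditions) become just different subroutines applied to the same state.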
All routines use, and potentially modify, a `BenchmarkState`, which holds information about the objective via an `Experimenter` and the algorithm itself wrapped by a `PolicySuggester`:
```python
class BenchmarkState:
  """State of a benchmark run. It is altered via benchmark protocols."""

  experimenter: Experimenter
  algorithm: PolicySuggester
```
To wrap multiple `BenchmarkSubroutine`s together, we can use the `BenchmarkRunner`:
```python
class BenchmarkRunner(BenchmarkSubroutine):
  """Run a sequence of subroutines, all repeated for a few iterations."""

  # A sequence of benchmark subroutines that alter BenchmarkState.
  benchmark_subroutines: Sequence[BenchmarkSubroutine]
  # Number of times to repeat applying benchmark_subroutines.
  num_repeats: int

  def run(self, state: BenchmarkState) -> None:
    """Run algorithm with benchmark subroutines with repetitions."""
```
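The intended `run()` semantics can be sketched with a self-contained toy analogue (hypothetical names, not the Vizier implementation): each repeat applies every subroutine in order, all mutating one shared state:

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

ToyState = List[str]  # toy shared state: a log of executed steps

@dataclass
class ToyRunner:
    """Toy analogue of a runner: repeat a sequence of subroutines."""
    subroutines: Sequence[Callable[[ToyState], None]]
    num_repeats: int

    def run(self, state: ToyState) -> None:
        for _ in range(self.num_repeats):
            for subroutine in self.subroutines:
                subroutine(state)  # each call mutates state by reference

runner = ToyRunner(
    subroutines=[lambda s: s.append("suggest"),
                 lambda s: s.append("evaluate")],
    num_repeats=3,
)
state: ToyState = []
runner.run(state)
print(state)  # → ['suggest', 'evaluate', 'suggest', 'evaluate', 'suggest', 'evaluate']
```

Since the runner is itself a subroutine, runners can be nested to compose more elaborate benchmark loops.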
Below is a typical example of simple suggestion and evaluation:
```python
runner = vzb.BenchmarkRunner(
    benchmark_subroutines=[
        vzb.GenerateSuggestions(),
        vzb.EvaluateActiveTrials(),
    ],
    num_repeats=100,
)
benchmark_state_factory = vzb.DesignerBenchmarkStateFactory(
    experimenter=experimenter,
    designer_factory=designer_factory,
)
benchmark_state = benchmark_state_factory()
runner.run(benchmark_state)
```
We may obtain the evaluated trials via the `benchmark_state`, which contains a `PolicySupporter` via its `algorithm` field:
```python
all_trials = benchmark_state.algorithm.supporter.trials
print(all_trials)
```
Note that this design is maximally informative about everything that has happened so far in the study. For instance, we may also query incomplete/unused suggestions via the `PolicySupporter`.
Further reference details on Benchmark Runners can be found here.