{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "wEvqTaRNtS5z" }, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/vizier/blob/main/docs/guides/benchmarks/running_benchmarks.ipynb)\n", "\n", "# Running Benchmarks\n", "We will demonstrate below how to use our benchmark runner pipeline." ] }, { "cell_type": "markdown", "metadata": { "id": "Qi88APk7Qy4d" }, "source": [ "## Installation and reference imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zHUVrnl9wnhO" }, "outputs": [], "source": [ "!pip install google-vizier[jax,algorithms]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "eGzQYe6ZcP7z" }, "outputs": [], "source": [ "from vizier import algorithms as vza\n", "from vizier import benchmarks as vzb\n", "from vizier.algorithms import designers\n", "from vizier.benchmarks import experimenters" ] }, { "cell_type": "markdown", "metadata": { "id": "BEuMSlNlc_FX" }, "source": [ "Example experimenter and designer factory which we will use later." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "AIwgqoH_dAte" }, "outputs": [], "source": [ "experimenter = experimenters.NumpyExperimenter(\n", " experimenters.bbob.Sphere, experimenters.bbob.DefaultBBOBProblemStatement(5)\n", ")\n", "\n", "designer_factory = designers.GridSearchDesigner.from_problem" ] }, { "cell_type": "markdown", "metadata": { "id": "Erk6WFp7Q1Y4" }, "source": [ "## Algorithms and Experimenters\n", "Every study can be seen conceptually as a simple loop between an algorithm and objective. In terms of code, the algorithm corresponds to a `Designer`/`Policy` and objective to an `Experimenter`.\n", "\n", "Below is a simple sequential loop." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sXEOi4Vhl7qL" }, "outputs": [], "source": [ "designer = designer_factory(experimenter.problem_statement())\n", "\n", "for _ in range(100):\n", " suggestion = designer.suggest()[0]\n", " trial = suggestion.to_trial()\n", " experimenter.evaluate([trial])\n", " completed_trials = vza.CompletedTrials([trial])\n", " designer.update(completed_trials, vza.ActiveTrials())" ] }, { "cell_type": "markdown", "metadata": { "id": "wZlArEz1l4Kt" }, "source": [ "As seen above however, one modification we can make is to use variable batch\n", "sizes, rather than only suggesting and evaluating one-by-one. More generally,\n", "certain implementation details may arise:\n", "\n", "* How many parallel suggestions should the algorithm generate?\n", "* How many suggestions can be evaluated at once?\n", "* Should we use early stopping on certain unpromising trials?\n", "* Should we use a custom stopping condition instead of a fixed for-loop?\n", "* Can we swap in a different algorithm mid-loop?\n", "* Can we swap in a different objective mid-loop?" ] }, { "cell_type": "markdown", "metadata": { "id": "tkgb2C0hPsvG" }, "source": [ "## API\n", "The code flexibility needed to simulate these real-life scenarios may cause\n", "complications as the evaluation benchmark may no longer be stateless. In order\n", "to broadly cover such scenarios, our [API](https://github.com/google/vizier/blob/main/vizier/benchmarks/__init__.py) introduces the `BenchmarkSubroutine`:\n", "\n", "```python\n", "class BenchmarkSubroutine(Protocol):\n", " \"\"\"Abstraction for core benchmark routines.\n", "\n", " Benchmark protocols are modular alterations of BenchmarkState by reference.\n", " \"\"\"\n", "\n", " def run(self, state: BenchmarkState) -\u003e None:\n", " \"\"\"Abstraction to alter BenchmarkState by reference.\"\"\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "-QUBdrhgqFlB" }, "source": [ "All routines use and potentially modify a `BenchmarkState`, which holds information about the objective via an `Experimenter` and the algorithm itself wrapped by a `PolicySuggester`.\n", "\n", "```python\n", "class BenchmarkState:\n", " \"\"\"State of a benchmark run. It is altered via benchmark protocols.\"\"\"\n", "\n", " experimenter: Experimenter\n", " algorithm: PolicySuggester\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "zyOVOvGtqPrf" }, "source": [ "To wrap multiple `BenchmarkSubRoutines` together, we can use the `BenchmarkRunner`:\n", "\n", "```python\n", "class BenchmarkRunner(BenchmarkSubroutine):\n", " \"\"\"Run a sequence of subroutines, all repeated for a few iterations.\"\"\"\n", "\n", " # A sequence of benchmark subroutines that alter BenchmarkState.\n", " benchmark_subroutines: Sequence[BenchmarkSubroutine]\n", " # Number of times to repeat applying benchmark_subroutines.\n", " num_repeats: int\n", "\n", " def run(self, state: BenchmarkState) -\u003e None:\n", " \"\"\"Run algorithm with benchmark subroutines with repetitions.\"\"\"\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "CVU_X3Wxo-8e" }, "source": [ "## Example usage\n", "Below is a typical example of simple suggestion and evaluation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sNZv5Bj6ou6n" }, "outputs": [], "source": [ "runner = vzb.BenchmarkRunner(\n", " benchmark_subroutines=[\n", " vzb.GenerateSuggestions(),\n", " vzb.EvaluateActiveTrials(),\n", " ],\n", " num_repeats=100,\n", ")\n", "\n", "benchmark_state_factory = vzb.DesignerBenchmarkStateFactory(\n", " experimenter=experimenter, designer_factory=designer_factory\n", ")\n", "benchmark_state = benchmark_state_factory()\n", "\n", "runner.run(benchmark_state)" ] }, { "cell_type": "markdown", "metadata": { "id": "T4A24qM1rEM5" }, "source": [ "We may obtain the evaluated trials via the `benchmark_state`, which contains a\n", "`PolicySupporter` via its `algorithm` field:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DkJ801PPrNLK" }, "outputs": [], "source": [ "all_trials = benchmark_state.algorithm.supporter.trials\n", "print(all_trials)" ] }, { "cell_type": "markdown", "metadata": { "id": "D01yeY4XseNb" }, "source": [ "Note that this design is maximally informative on everything that has happened\n", "so far in the study. For instance, we may also query incomplete/unused\n", "suggestions using the `PolicySupporter`." ] }, { "cell_type": "markdown", "metadata": { "id": "eu-zL7_Bs9Kt" }, "source": [ "## References\n", "* Benchmark Runners can be found [here](https://github.com/google/vizier/tree/main/vizier/_src/benchmarks/runners).\n", "\n" ] } ], "metadata": { "colab": { "name": "Running Benchmarks.ipynb", "private_outputs": true, "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }