Benchmarking with Ray

We provide a brief guide below on the Vizier + Ray integration, and how to benchmark with all publicly available algorithms on Ray Tune. Notably, Tune integrates with a wide range of additional hyperparameter optimization tools, including Ax, BayesOpt, BOHB, Dragonfly, FLAML, HEBO, Hyperopt, Nevergrad, Optuna, SigOpt, skopt, and ZOOpt.

Initial Installation

!pip install google-vizier[jax]
!pip install -U "ray[default]"

Algorithm and Experimenter Factories

As mentioned in previous guides, since we want to compare algorithms across multiple benchmarks, we first create a set of relevant benchmark experimenters. To do so, we use SerializableExperimenterFactory from our Experimenters API to modularize the construction of multiple benchmark components.

For example, here we can create a diverse set of BBOB functions with different dimensions via the BBOBExperimenterFactory. Then, we can print out the full serialization of the benchmarks that we have created.

import itertools
import numpy as np
from vizier.benchmarks import experimenters

function_names = [
    'Sphere',
    'BentCigar',
    'Katsuura',
]
dimensions = [4, 8]
product_list = list(itertools.product(function_names, dimensions))

experimenter_factories = []
for product in product_list:
  name, dim = product
  bbob_factory = experimenters.BBOBExperimenterFactory(name=name, dim=dim)
  experimenter_factory = experimenters.SingleObjectiveExperimenterFactory(
      bbob_factory,
      shift=np.random.uniform(low=-2, high=2, size=dim),
      noise_type='LIGHT_ADDITIVE_GAUSSIAN',
  )
  experimenter_factories.append(experimenter_factory)
  print(experimenter_factory.dump())

Next, we need to define our algorithms by installing the relevant packages and importing the corresponding search algorithms. For simplicity, we only compare against a subset of the algorithms that Ray supports.

NOTE: We provide the VizierSearch class in our own libraries, which can directly use the Searcher API in Ray. The imports are given below.

!pip install ax-platform scikit-optimize hyperopt optuna bayesian-optimization
from ray import tune
from ray.tune.search.ax import AxSearch
from ray.tune.search.bayesopt import BayesOptSearch
from ray.tune.search.hyperopt import HyperOptSearch
from ray.tune.search.optuna import OptunaSearch
from ray.tune.search.skopt import SkOptSearch
from vizier import raytune as vzr
from vizier._src.raytune.vizier_search import VizierSearch

algorithm_factories = {
    'ray': lambda: None,  # None falls back to Ray Tune's default variant search.
    'vizier': VizierSearch,
    'ax': AxSearch,
    'bayesopt': BayesOptSearch,
    'optuna': OptunaSearch,
    'hyperopt': HyperOptSearch,
    'skopt': SkOptSearch,
}
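
Each of these factories yields a standard Ray Tune Searcher, so it can also be plugged into tune.Tuner directly, outside of the Vizier benchmarking utilities. The sketch below is a minimal, standalone illustration; the toy objective, search space, and 'loss' metric name are assumptions for demonstration and are not part of this guide's benchmarks.

def toy_objective(config):
  # Returning a dict from a function trainable reports the metric to Ray Tune.
  return {'loss': (config['x'] - 1.0) ** 2 + (config['y'] + 2.0) ** 2}


tuner = tune.Tuner(
    toy_objective,
    param_space={
        'x': tune.uniform(-5.0, 5.0),
        'y': tune.uniform(-5.0, 5.0),
    },
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(),
        metric='loss',
        mode='min',
        num_samples=10,
    ),
)
toy_results = tuner.fit()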

Running RayTune

Running Ray Tune from an ExperimenterFactory is easy with our utility libraries, which take any factory together with a TuneConfig and run the algorithm on the corresponding benchmark. Let us first run one algorithm on the first benchmark and see the results that we get.

NOTE: This uses a local Ray instance.

ALGORITHM_NAME = 'ray'  # @param str
experimenter_factory = experimenter_factories[0]
factory = algorithm_factories[ALGORITHM_NAME]
tune_config = tune.TuneConfig(
    search_alg=factory(),
    num_samples=4,
    max_concurrent_trials=1,
)
results = vzr.run_tune.run_tune_from_factory(experimenter_factory, tune_config)
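
As a quick sanity check, we can peek at the metrics reported during this run. This is a minimal sketch assuming the returned results object exposes the same get_dataframe() accessor used in the analysis section below.

# Peek at the per-trial metrics reported during the run (assumes the results
# object exposes get_dataframe(), as used in the analysis section below).
print(results.get_dataframe().head())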

Now, we repeat our runs for each ExperimenterFactory and each algorithm, converting the results into PlotElements for easy plotting and comparison.

from vizier.benchmarks import analyzers

NUM_REPEATS = 3  # @param
NUM_ITERATIONS = 50  # @param


def results_to_element(results_list):
  curves = []
  for results in results_list:
    raw_ys = np.array(results.get_dataframe()['bbob_eval_before_noise'])
    # Convert raw objective values into a best-so-far curve (minimization).
    ys = np.minimum.accumulate(raw_ys)
    curve = analyzers.ConvergenceCurve(
        xs=np.arange(1, len(ys) + 1),
        ys=ys.reshape((1, len(ys))),
        trend=analyzers.ConvergenceCurve.YTrend.DECREASING,
    )
    curves.append(curve)
  all_curves = analyzers.ConvergenceCurve.align_xs(curves)
  ele = analyzers.PlotElement(curve=all_curves[0], yscale='symlog')
  return ele


all_records = []
for experimenter_factory in experimenter_factories:
  for algorithm, factory in algorithm_factories.items():
    results = []
    for _ in range(NUM_REPEATS):
      tune_config = tune.TuneConfig(
          search_alg=factory(),
          num_samples=NUM_ITERATIONS,
          max_concurrent_trials=1,
      )
      results.append(
          vzr.run_tune.run_tune_from_factory(experimenter_factory, tune_config)
      )
    ele = results_to_element(results)
    record = analyzers.BenchmarkRecord(
        algorithm=algorithm,
        experimenter_metadata=experimenter_factory.dump(),
        plot_elements={'objective': ele},
    )
    all_records.append(record)
analyzed_records = analyzers.BenchmarkRecordAnalyzer.add_comparison_metrics(
    records=all_records, baseline_algo='ray'
)
analyzers.plot_from_records(analyzed_records)

Running Parallelized Ray

In the previous example, we used a local Ray instance and ran each benchmark sequentially, which can take minutes. When there is a large number of benchmarks, or the benchmark runs are computationally intensive, distributing work across each (algorithm, benchmark) pair is crucial for reasonable benchmarking turnaround. We recommend using the Ray Jobs API to distribute work across clusters.
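
For example, a driver could submit one job per (algorithm, benchmark) pair via the Ray Jobs API. The sketch below assumes a cluster head reachable at the default dashboard address, and run_benchmark.py with its flags are hypothetical placeholders for a script wrapping the run_tune_from_factory loop above.

# Sketch: submit one Ray job per (algorithm, benchmark) pair.
# `run_benchmark.py` and its flags are hypothetical placeholders for a wrapper
# around the run_tune_from_factory loop shown earlier in this guide.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient('http://127.0.0.1:8265')  # Ray cluster head (dashboard address).

job_ids = []
for benchmark_index in range(len(experimenter_factories)):
  for algorithm in algorithm_factories:
    job_ids.append(
        client.submit_job(
            entrypoint=(
                f'python run_benchmark.py '
                f'--algorithm={algorithm} --benchmark_index={benchmark_index}'
            ),
            runtime_env={'working_dir': './'},
        )
    )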