Open in Colab

OSS Vizier as a Backend

We demonstrate how OSS Vizier can be used as a distributed backend for PyGlove-based tuning tasks.

This assumes the user is already familiar with PyGlove primitives.

Installation and reference imports

!pip install google-vizier
!pip install pyglove
import multiprocessing
import multiprocessing.pool
import os

import pyglove as pg
from vizier import pyglove as pg_vizier
from vizier.service import servers

Preliminaries

In the original PyGlove setting, one can normally perform evolutionary computation, for example:

search_space = pg.Dict(x=pg.floatv(0.0, 1.0), y=pg.floatv(0.0, 1.0))
algorithm = pg.evolution.regularized_evolution()
num_trials = 100


def evaluator(value: pg.Dict):
  return value.x**2 - value.y**2


for value, feedback in pg.sample(
    search_space,
    algorithm=algorithm,
    num_examples=num_trials,
    name='basic_run',
):
  reward = evaluator(value)
  feedback(reward=reward)

However, in many real-world scenarios, the evaluator can be much more expensive. In neural architecture search applications, for example, each evaluation may require running an entire neural network training pipeline.

This leads to the need for a backend, in order to:

  1. Distribute the evaluations over multiple workers.

  2. Store the valuable results reliably and handle worker faults.

Initializing the OSS Vizier backend

The main initializer to call is vizier.pyglove.init(...), which should only be called once per process (not thread). This function sets process-global variables that determine values such as:

  1. Prefix for study names.

  2. Endpoint of the VizierService for storing data and handling requests.

  3. Port for the PythiaService for computing suggestions.

In the local case, this can be called as-is:

pg_vizier.init('my_study')

Alternatively, if using a remote server, the endpoint can be specified as well:

server = servers.DefaultVizierServer()  # Normally hosted on a remote machine.
pg_vizier.init('my_study', vizier_endpoint=server.endpoint)

Parallelization

With the OSS Vizier backend, all workers can conveniently run exactly the same evaluation loop against a shared study:

NUM_WORKERS = 10


def work_fn(worker_id):
  print(f"Worker ID: {worker_id}")
  for value, feedback in pg.sample(
      search_space,
      algorithm=algorithm,
      num_examples=num_trials // NUM_WORKERS,
      name="worker_run",
  ):
    reward = evaluator(value)
    feedback(reward=reward)

There are three common forms of parallelization over the evaluation computation:

  1. Multiple threads, single process.

  2. Multiple processes, single machine.

  3. Multiple machines.

Each case defines the “worker” as a thread, a process, or a machine, respectively. We demonstrate every type of parallelization below.

Multiple threads, single process

with multiprocessing.pool.ThreadPool(NUM_WORKERS) as pool:
  pool.map(work_fn, range(NUM_WORKERS))
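For readers who want to try the threading pattern without a running Vizier backend, here is a stdlib-only sketch: `dummy_work_fn` is a hypothetical stand-in for the Vizier-backed loop above, so the pool mechanics can be exercised anywhere.

```python
import multiprocessing.pool

NUM_WORKERS = 10


def dummy_work_fn(worker_id):
  # In the real loop, this would iterate pg.sample(...) against the shared
  # study and report rewards via feedback(...).
  return worker_id * worker_id


# Threads share the process, so return values can be collected directly.
with multiprocessing.pool.ThreadPool(NUM_WORKERS) as pool:
  results = pool.map(dummy_work_fn, range(NUM_WORKERS))
```

Threads are the lightest-weight option and work well here because the per-trial work is dominated by RPCs to the Vizier server, which release the GIL.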

Multiple processes, single machine

processes = []
for worker_id in range(NUM_WORKERS):
  p = multiprocessing.Process(target=work_fn, args=(worker_id,))
  p.start()
  processes.append(p)

for p in processes:
  p.join()

Multiple machines

# Server Machine
server = servers.DefaultVizierServer()
# Worker Machine
worker_id = os.uname()[1]
pg_vizier.init('my_study', vizier_endpoint=server.endpoint)
work_fn(worker_id)