Metadata

This guide covers common developer uses of the Metadata primitive.

OSS Vizier can store Metadata in both the ProblemStatement and each TrialSuggestion/Trial. Common use cases include:

  • Storing additional information that falls outside the standard parameter types.

  • Allowing user code to store small amounts of state information inside OSS Vizier, attached to the OSS Vizier study.

  • Wrapping search spaces and corresponding algorithms that are naturally incompatible with OSS Vizier’s default API, so that they can still use OSS Vizier as a distributed backend service.

Installation and reference imports

!pip install google-vizier
from vizier import pyvizier as vz
from google.protobuf import any_pb2

Metadata basics

The Metadata is a key-value store, where:

  • Keys are UTF-8 strings.

  • Values can be strings or protocol buffers.

Values of other types, such as int, float, or more complex objects, can also be stored, but the developer is responsible for serializing and deserializing them.

metadata = vz.Metadata()
metadata['proto'] = any_pb2.Any(...)
metadata['string'] = 'hello'
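
Since non-string values must be serialized by the developer, one possible approach is to encode them with the standard json module. The following is only a minimal sketch: a plain dict stands in for vz.Metadata so that it runs without OSS Vizier installed, and the key names are hypothetical.

```python
import json

# Assumption: a plain dict stands in for vz.Metadata in this sketch.
metadata = {}

# Serialize non-string values to strings before storing them.
metadata['num_epochs'] = json.dumps(30)
metadata['learning_rate'] = json.dumps(0.001)
metadata['layer_sizes'] = json.dumps([64, 128, 256])

# Deserialize the strings when reading the values back.
num_epochs = json.loads(metadata['num_epochs'])
learning_rate = json.loads(metadata['learning_rate'])
layer_sizes = json.loads(metadata['layer_sizes'])
```

JSON preserves the distinction between ints, floats, and lists on a round trip, which a plain str() conversion would not.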

Additionally, Metadata can act as a “dictionary of dictionaries”, i.e. a hierarchy of dictionaries, through its namespace functionality. Calling .ns() returns another Metadata that shares data with the original.

child_metadata = metadata.ns('child')

grandchild_metadata = child_metadata.ns('child')
grandchild_metadata['string'] = 'goodbye'

assert metadata.ns('child').ns('child')['string'] == 'goodbye'

ProblemStatement Metadata

The ProblemStatement object contains a metadata attribute, intended for storing global metadata related to the study. Note that Metadata is not used in the optimization process unless a custom algorithm is configured to use it.

Below is a usage example when training an image classifier, where one may wish to store training-related attributes in Metadata.

problem_statement = vz.ProblemStatement()
problem_statement.metadata['dataset'] = 'cifar10'
problem_statement.metadata['architecture'] = 'resnet_18'

Trial Metadata

TrialSuggestion and its subclass Trial also contain a metadata attribute. In contrast, this attribute should be used to store metadata related to the specific Trial.

In the image classification case, examples would be the type of GPU used for training and whether the training worker has been preempted.

trial = vz.Trial()
trial.metadata['gpu_used'] = 'P100'
trial.metadata['preempted'] = 'True'
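
Because the preemption flag above is stored as the string 'True', code that reads it back has to parse it explicitly; Python's bool('False') evaluates to True, since any non-empty string is truthy. The sketch below shows one way to do this. The parse_bool helper is hypothetical and not part of OSS Vizier, and a plain dict stands in for trial.metadata.

```python
def parse_bool(value: str) -> bool:
  """Parses a 'True'/'False' string (hypothetical helper, not a Vizier API)."""
  normalized = value.strip().lower()
  if normalized == 'true':
    return True
  if normalized == 'false':
    return False
  raise ValueError(f'Not a boolean string: {value!r}')


# Assumption: a plain dict stands in for trial.metadata in this sketch.
trial_metadata = {'gpu_used': 'P100', 'preempted': 'True'}
was_preempted = parse_bool(trial_metadata['preempted'])
```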

OSS Vizier as a backend via Metadata

As an advanced developer use case, one may extend OSS Vizier’s search space capabilities using Metadata. Custom algorithms can provide full freedom in expressing more complex search spaces (e.g. graphs) using Metadata.

Example use cases:

  • Combinatorial optimization, where the search space may consist of graphs or multiple selection (e.g. \({N \choose K}\)) primitives. Algorithms commonly include evolutionary methods, which also require custom mutation operations.

  • Free-form textual data used for suggestions (and maybe even evaluation metrics!), as common with language-based applications.

# Setup combinatorial search space.
choose_problem = vz.ProblemStatement()
choose_problem.metadata = vz.Metadata({'N': '10', 'K': '3'})

# Example of a suggestion proposed by a custom algorithm.
suggestion = vz.TrialSuggestion()
suggestion.metadata['chosen_indices'] = '[0, 3, 7]'
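
On the evaluation side, the string-encoded suggestion has to be decoded and checked against the problem's N and K. The following is a minimal sketch under stated assumptions: the indices are JSON-encoded as in the example above, plain dicts stand in for the two Metadata objects, and decode_choice is a hypothetical helper.

```python
import json


def decode_choice(problem_metadata, suggestion_metadata):
  """Decodes and validates a K-of-N selection (hypothetical helper)."""
  n = int(problem_metadata['N'])
  k = int(problem_metadata['K'])
  indices = json.loads(suggestion_metadata['chosen_indices'])
  # Exactly K distinct indices must be chosen.
  if len(indices) != k or len(set(indices)) != k:
    raise ValueError(f'Expected {k} distinct indices, got {indices}.')
  # Every index must lie in the range [0, N).
  if any(not 0 <= i < n for i in indices):
    raise ValueError(f'Indices must lie in [0, {n}), got {indices}.')
  return sorted(indices)


# Assumption: plain dicts stand in for the Metadata objects above.
chosen = decode_choice({'N': '10', 'K': '3'}, {'chosen_indices': '[0, 3, 7]'})
```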

The algorithm behavior can even be changed mid-optimization with Metadata using a client! This is in fact used extensively in our integrations with PyGlove to allow a running Pythia policy to change search spaces or mutations online.

# Original mutation rate.
mutation_problem = vz.ProblemStatement()
mutation_problem.metadata = vz.Metadata({'mutation_rate': '0.1'})

# ...
# Assume algorithm started running in the Pythia service.
# ...

# Set new mutation rate.
study_metadata = vz.Metadata({'mutation_rate': '0.2'})

# Prevent this trial from being used in the population.
trial_metadata = vz.Metadata({'use_in_population': 'False'})
trial_id = 1

# Create unit of metadata update.
metadata_delta = vz.MetadataDelta(
    on_study=study_metadata, on_trials={trial_id: trial_metadata})

Once we have a client, we can commit the metadata update:

client.update_metadata(metadata_delta)
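
For the update to take effect, the running algorithm has to read these values back. The sketch below illustrates how a custom algorithm might interpret them. It is not the Pythia policy interface: plain dicts stand in for the study and trial Metadata, both helper functions are hypothetical, and the defaults (a rate of 0.1, and trials included unless marked 'False') are assumptions.

```python
def read_mutation_rate(study_metadata, default=0.1):
  """Returns the current mutation rate, or a default if unset (hypothetical)."""
  return float(study_metadata.get('mutation_rate', default))


def population_trial_ids(trial_metadata_by_id):
  """Returns ids of trials not excluded from the population (hypothetical)."""
  return [
      trial_id
      for trial_id, trial_metadata in sorted(trial_metadata_by_id.items())
      if trial_metadata.get('use_in_population', 'True') != 'False'
  ]


# Assumption: state as the algorithm might see it after the update above.
study_metadata = {'mutation_rate': '0.2'}
trial_metadata_by_id = {1: {'use_in_population': 'False'}, 2: {}, 3: {}}

mutation_rate = read_mutation_rate(study_metadata)
population = population_trial_ids(trial_metadata_by_id)
```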