{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "AtQDdpyoaigd" }, "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/vizier/blob/main/docs/guides/developer/metadata.ipynb)\n", "\n", "# Metadata\n", "This guide covers common developer uses of the `Metadata`\n", "primitive.\n", "\n", "OSS Vizier can store `Metadata` in both the `ProblemStatement` and each\n", "`TrialSuggestion`/`Trial`. Common use cases include:\n", "\n", "* Holding additional information that falls outside the standard parameter types.\n", "* Allowing user code to store small amounts of state information inside OSS Vizier, attached to the OSS Vizier study.\n", "* Wrapping search spaces and corresponding algorithms that are naturally incompatible with OSS Vizier's default API, while still allowing a distributed backend service." ] }, { "cell_type": "markdown", "metadata": { "id": "8pKXXZWQ7zsG" }, "source": [ "## Installation and reference imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0gKN_IPlv_iA" }, "outputs": [], "source": [ "!pip install google-vizier" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ynnlYcq_eUCM" }, "outputs": [], "source": [ "from vizier import pyvizier as vz\n", "from google.protobuf import any_pb2" ] }, { "cell_type": "markdown", "metadata": { "id": "22d8H8vib3xZ" }, "source": [ "## Metadata basics\n", "[`Metadata`](https://github.com/google/vizier/blob/main/vizier/pyvizier/__init__.py) is a key-value store, where:\n", "\n", "* Keys are UTF-8 strings.\n", "* Values can be strings or protocol buffers.\n", "\n", "While values of type `int`, `float`, and more complex objects can also be\n", "stored, **the developer is responsible for serializing and deserializing them.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Dc9CEZNTdsb-" }, "outputs": [], "source": [ "metadata = vz.Metadata()\n", 
"metadata['proto'] = any_pb2.Any(...)\n", "metadata['string'] = 'hello'" ] }, { "cell_type": "markdown", "metadata": { "id": "MjlvWueweje4" }, "source": [ "Additionally, `Metadata` can act as a \"dictionary of dictionaries\", i.e. a hierarchy of dictionaries, through its `Namespace` functionality: calling `.ns()` creates another `Metadata` that shares data with the original." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hA2kzmOUetV3" }, "outputs": [], "source": [ "child_metadata = metadata.ns('child')\n", "\n", "grandchild_metadata = child_metadata.ns('child')\n", "grandchild_metadata['string'] = 'goodbye'\n", "\n", "assert metadata.ns('child').ns('child')['string'] == 'goodbye'" ] }, { "cell_type": "markdown", "metadata": { "id": "p82qL-rCf5nR" }, "source": [ "## ProblemStatement Metadata\n", "The `ProblemStatement` object contains a `metadata` attribute, intended for storing global metadata related to the study. Note that `Metadata` will not be used in the optimization process unless a custom algorithm is configured to use it.\n", "\n", "Below is a usage example for training an image classifier, where one may wish to store training-related attributes in `Metadata`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZpvrnMgQgN3e" }, "outputs": [], "source": [ "problem_statement = vz.ProblemStatement()\n", "problem_statement.metadata['dataset'] = 'cifar10'\n", "problem_statement.metadata['architecture'] = 'resnet_18'" ] }, { "cell_type": "markdown", "metadata": { "id": "qJzLE6KigZMP" }, "source": [ "## Trial Metadata\n", "`TrialSuggestion` and its subclass `Trial` also contain a `metadata` attribute. This, in contrast, should be used to store metadata related to the specific trial.\n", "\n", "In the image classification case, examples would be the type of GPU used for training and whether the training worker has been preempted." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SbAynY0nizB1" }, "outputs": [], "source": [ "trial = vz.Trial()\n", "trial.metadata['gpu_used'] = 'P100'\n", "trial.metadata['preempted'] = 'True'" ] }, { "cell_type": "markdown", "metadata": { "id": "SgIVHrzlaoVY" }, "source": [ "## OSS Vizier as a backend via `Metadata`\n", "As an advanced developer use case, one may extend OSS Vizier's search space capabilities using `Metadata`. Custom algorithms have full freedom to express more complex search spaces (e.g. graphs) using `Metadata`.\n", "\n", "Example use cases:\n", "\n", "* Combinatorial optimization, where the search space may consist of graphs or multiple-selection (e.g. ${N \\choose K}$) primitives. Algorithms commonly include evolutionary methods, which also require custom mutation operations.\n", "* Free-form textual data used for suggestions (and maybe even evaluation metrics!), as is common in language-based applications." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QomOE_0Faqm3" }, "outputs": [], "source": [ "# Set up a combinatorial search space.\n", "choose_problem = vz.ProblemStatement()\n", "choose_problem.metadata = vz.Metadata({'N': '10', 'K': '3'})\n", "\n", "# Example of a suggestion proposed by a custom algorithm.\n", "suggestion = vz.TrialSuggestion()\n", "suggestion.metadata['chosen_indices'] = '[0, 3, 7]'" ] }, { "cell_type": "markdown", "metadata": { "id": "G6utMx_xPQK8" }, "source": [ "A client can even change the algorithm's behavior mid-optimization through `Metadata`! This is in fact used extensively in our integrations with [PyGlove](https://github.com/google/pyglove) to allow a running Pythia policy to change search spaces or mutations online." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1QAyB_nzPW-0" }, "outputs": [], "source": [ "# Original mutation rate.\n", "mutation_problem = vz.ProblemStatement()\n", "mutation_problem.metadata = vz.Metadata({'mutation_rate': '0.1'})\n", "\n", "# ...\n", "# Assume algorithm started running in the Pythia service.\n", "# ...\n", "\n", "# Set new mutation rate.\n", "study_metadata = vz.Metadata({'mutation_rate': '0.2'})\n", "\n", "# Prevent this trial from being used in the population.\n", "trial_metadata = vz.Metadata({'use_in_population': 'False'})\n", "trial_id = 1\n", "\n", "# Create unit of metadata update.\n", "metadata_delta = vz.MetadataDelta(\n", "    on_study=study_metadata, on_trials={trial_id: trial_metadata})" ] }, { "cell_type": "markdown", "metadata": { "id": "FmXsDxApMaBA" }, "source": [ "Once we have a client, we can commit the metadata update:\n", "\n", "```python\n", "client.update_metadata(metadata_delta)\n", "```" ] } ], "metadata": { "colab": { "name": "Metadata.ipynb", "private_outputs": true, "provenance": [] }, "gpuClass": "standard", "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }