Comet Logging and Visualization Integration (#9232)

* add comet to logger interface
* add comet logger
* add support for updated parameters
* clean up offline logger creation
* update callback args for comet logger
* add comet optimizer
* add optimizer config
* add comet README
* update tutorial notebook with Comet section
* add option to log class level metrics
* add support for class level metrics and confusion matrix
* handle errors when adding files to artifacts
* fix typo
* clean resume workflow
* updates for HPO
* update comet README
* fix typo in comet README
* update code snippets in comet README
* update comet links in tutorial
* updated links
* change optimizer batch size param and update comet README image
* update comet section in tutorial
* use preexisting cmd line flags to configure logger
* update artifact upload/download flow
* remove comet logger specific cmd line args
* move downloading weights into comet logger code
* remove extra argparse
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* change checkpoint logging flow to follow offline logger
* update resume flow
* add comet logger to remote dataset property
* update cmd line args in hpo
* set types for integer/float env variables
* update README
* fix typo in README
* default to always logging model predictions
* Update tutorial.ipynb
* Update train.py
* Add Comet to Integrations table
* Update README.md
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

903b239f · Dhruv Nair · GitHub · 5a134e06
--- a/README.md
+++ b/README.md
@@ -160,46 +160,31 @@ python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 12

 </details>

-## <div align="center">Environments</div>
-
-Get started in seconds with our verified environments. Click each icon below for details.
-
-<div align="center">
-  <a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb">
-    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-colab-small.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
-  <a href="https://www.kaggle.com/ultralytics/yolov5">
-    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-kaggle-small.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
-  <a href="https://hub.docker.com/r/ultralytics/yolov5">
-    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-docker-small.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
-  <a href="https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart">
-    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-aws-small.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
-  <a href="https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart">
-    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-gcp-small.png" width="10%" /></a>
-</div>

 ## <div align="center">Integrations</div>

+<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/image-integrations-loop.png" width="100%" />
+
 <div align="center">
+  <a href="https://www.comet.com/site/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-comet.png" width="10%" /></a>
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
  <a href="https://bit.ly/yolov5-deci-platform">
    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-deci.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
  <a href="https://cutt.ly/yolov5-readme-clearml">
    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-clearml.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
  <a href="https://roboflow.com/?ref=ultralytics">
    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow.png" width="10%" /></a>
-  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
  <a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb.png" width="10%" /></a>
 </div>

-|Deci ⭐ NEW|ClearML ⭐ NEW|Roboflow|Weights & Biases
-|:-:|:-:|:-:|:-:|
-|Automatically compile and quantize YOLOv5 for better inference performance in one click at [Deci](https://bit.ly/yolov5-deci-platform)|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)
+|Comet ⭐ NEW|Deci ⭐ NEW|ClearML ⭐ NEW|Roboflow|Weights & Biases
+|:-:|:-:|:-:|:-:|:-:|
+|Visualize model metrics and predictions and upload models and datasets in real time with [Comet](https://www.comet.com/site/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration)|Automatically compile and quantize YOLOv5 for better inference performance in one click at [Deci](https://bit.ly/yolov5-deci-platform)|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)


 ## <div align="center">Why YOLOv5</div>
@@ -323,6 +308,28 @@ python export.py --weights yolov5s-cls.pt resnet50.pt efficientnet_b0.pt --inclu
 </details>


+## <div align="center">Environments</div>
+
+Get started in seconds with our verified environments. Click each icon below for details.
+
+<div align="center">
+  <a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-colab-small.png" width="10%" /></a>
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
+  <a href="https://www.kaggle.com/ultralytics/yolov5">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-kaggle-small.png" width="10%" /></a>
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
+  <a href="https://hub.docker.com/r/ultralytics/yolov5">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-docker-small.png" width="10%" /></a>
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
+  <a href="https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-aws-small.png" width="10%" /></a>
+  <img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
+  <a href="https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart">
+    <img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-gcp-small.png" width="10%" /></a>
+</div>
+
+
 ## <div align="center">Contribute</div>

 We love your input! We want to make contributing to YOLOv5 as easy and transparent as possible. Please see our [Contributing Guide](CONTRIBUTING.md) to get started, and fill out the [YOLOv5 Survey](https://ultralytics.com/survey?utm_source=github&utm_medium=social&utm_campaign=Survey) to send us feedback on your experiences. Thank you to all our contributors!

--- a/train.py
+++ b/train.py
@@ -52,6 +52,7 @@ from utils.general import (LOGGER, check_amp, check_dataset, check_file, check_g
                           init_seeds, intersect_dicts, labels_to_class_weights, labels_to_image_weights, methods,
                           one_cycle, print_args, print_mutation, strip_optimizer, yaml_save)
 from utils.loggers import Loggers
+from utils.loggers.comet.comet_utils import check_comet_resume
 from utils.loggers.wandb.wandb_utils import check_wandb_resume
 from utils.loss import ComputeLoss
 from utils.metrics import fitness
@@ -330,7 +331,7 @@ def train(hyp, opt, device, callbacks):  # hyp is path/to/hyp.yaml or hyp dictio
                mem = f'{torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0:.3g}G'  # (GB)
                pbar.set_description(('%11s' * 2 + '%11.4g' * 5) %
                                     (f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1]))
-                callbacks.run('on_train_batch_end', model, ni, imgs, targets, paths)
+                callbacks.run('on_train_batch_end', model, ni, imgs, targets, paths, list(mloss))
                if callbacks.stop_training:
                    return
            # end batch ------------------------------------------------------------------------------------------------
@@ -465,11 +466,11 @@ def parse_opt(known=False):
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')

-    # Weights & Biases arguments
-    parser.add_argument('--entity', default=None, help='W&B: Entity')
-    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
-    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
-    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')
+    # Logger arguments
+    parser.add_argument('--entity', default=None, help='Entity')
+    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, "val" option')
+    parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval')
+    parser.add_argument('--artifact_alias', type=str, default='latest', help='Version of dataset artifact to use')

    return parser.parse_known_args()[0] if known else parser.parse_args()

@@ -481,8 +482,8 @@ def main(opt, callbacks=Callbacks()):
        check_git_status()
        check_requirements()

-    # Resume
-    if opt.resume and not (check_wandb_resume(opt) or opt.evolve):  # resume from specified or most recent last.pt
+    # Resume (from specified or most recent last.pt)
+    if opt.resume and not (check_wandb_resume(opt) or check_comet_resume(opt) or opt.evolve):
        last = Path(check_file(opt.resume) if isinstance(opt.resume, str) else get_latest_run())
        opt_yaml = last.parent.parent / 'opt.yaml'  # train options yaml
        opt_data = opt.data  # original dataset
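The resume guard relies on Python's operator precedence: `not` binds tighter than `and`, and `and` binds tighter than `or`. Without parentheses, `a and not b and not c or d` therefore parses as `(a and not b and not c) or d`, which is true whenever `d` (here `opt.evolve`) is true; grouping the skip conditions, as the pre-change guard did with `not (... or opt.evolve)`, keeps `--evolve` out of the resume branch. The sketch below checks this exhaustively with plain booleans; `guard_ungrouped` and `guard_grouped` are hypothetical helpers standing in for the inline condition, not functions from `train.py`.

```python
from itertools import product


def guard_ungrouped(resume, wandb_resumed, comet_resumed, evolve):
    # Parses as: (resume and not wandb_resumed and not comet_resumed) or evolve
    return resume and not wandb_resumed and not comet_resumed or evolve


def guard_grouped(resume, wandb_resumed, comet_resumed, evolve):
    # Local resume only if no remote logger handled it and we are not evolving
    return bool(resume) and not (wandb_resumed or comet_resumed or evolve)


# Enumerate all 16 flag combinations and keep those where the two forms disagree
mismatches = [
    combo for combo in product([False, True], repeat=4)
    if bool(guard_ungrouped(*combo)) != guard_grouped(*combo)
]
for resume, wandb_resumed, comet_resumed, evolve in mismatches:
    print(f"resume={resume} wandb={wandb_resumed} comet={comet_resumed} evolve={evolve}")
```

Every disagreement printed has `evolve=True`: the two forms agree whenever `--evolve` is off, so the parentheses only matter for hyperparameter evolution runs.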

--- a/tutorial.ipynb
+++ b/tutorial.ipynb
@@ -413,7 +413,7 @@
        "import utils\n",
        "display = utils.notebook_init()  # checks"
      ],
-      "execution_count": 1,
+      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
@@ -465,7 +465,7 @@
        "!python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images\n",
        "# display.Image(filename='runs/detect/exp/zidane.jpg', width=600)"
      ],
-      "execution_count": 2,
+      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
@@ -535,7 +535,7 @@
        "torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip')  # download (780M - 5000 images)\n",
        "!unzip -q tmp.zip -d ../datasets && rm tmp.zip  # unzip"
      ],
-      "execution_count": 3,
+      "execution_count": null,
      "outputs": [
        {
          "output_type": "display_data",
@@ -566,7 +566,7 @@
        "# Validate YOLOv5s on COCO val\n",
        "!python val.py --weights yolov5s.pt --data coco.yaml --img 640 --half"
      ],
-      "execution_count": 4,
+      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
@@ -653,11 +653,14 @@
      "cell_type": "code",
      "source": [
        "#@title Select YOLOv5 🚀 logger {run: 'auto'}\n",
-        "logger = 'TensorBoard' #@param ['TensorBoard', 'ClearML', 'W&B']\n",
+        "logger = 'TensorBoard' #@param ['TensorBoard', 'Comet', 'ClearML', 'W&B']\n",
        "\n",
        "if logger == 'TensorBoard':\n",
        "  %load_ext tensorboard\n",
        "  %tensorboard --logdir runs/train\n",
+        "elif logger == 'Comet':\n",
+        "  %pip install -q comet_ml\n",
+        "  import comet_ml; comet_ml.init()\n",
        "elif logger == 'ClearML':\n",
        "  %pip install -q clearml && clearml-init\n",
        "elif logger == 'W&B':\n",
@@ -683,7 +686,7 @@
        "# Train YOLOv5s on COCO128 for 3 epochs\n",
        "!python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache"
      ],
-      "execution_count": 5,
+      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
@@ -857,6 +860,28 @@
        "# 4. Visualize"
      ]
    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Comet Logging and Visualization 🌟 NEW\n",
+        "[Comet](https://www.comet.com/site/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration) is now fully integrated with YOLOv5. Track and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with [Comet Custom Panels](https://www.comet.com/docs/v2/guides/comet-dashboard/code-panels/about-panels/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration)! Comet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes! \n",
+        "\n",
+        "Getting started is easy:\n",
+        "```shell\n",
+        "pip install comet_ml  # 1. install\n",
+        "export COMET_API_KEY=<Your API Key>  # 2. paste API key\n",
+        "python train.py --img 640 --epochs 3 --data coco128.yaml --weights yolov5s.pt  # 3. train\n",
+        "```\n",
+        "\n",
+        "To learn more about all of the supported Comet features for this integration, check out the [Comet Tutorial](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/comet). If you'd like to learn more about Comet, head over to our [documentation](https://www.comet.com/docs/v2/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration). Get started by trying out the Comet Colab Notebook:\n",
+        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RG0WOQyxlDlo5Km8GogJpIEJlg_5lyYO?usp=sharing)\n",
+        "\n",
+        "<img width=\"1920\" alt=\"yolo-ui\" src=\"https://user-images.githubusercontent.com/7529846/187608607-ff89c3d5-1b8b-4743-a974-9275301b0524.png\">"
+      ],
+      "metadata": {
+        "id": "nWOsI5wJR1o3"
+      }
+    },
    {
      "cell_type": "markdown",
      "source": [
@@ -1096,4 +1121,4 @@
      "outputs": []
    }
  ]
-}
\ No newline at end of file
+}
--- a/utils/loggers/__init__.py
+++ b/utils/loggers/__init__.py
@@ -17,7 +17,7 @@ from utils.loggers.wandb.wandb_utils import WandbLogger
 from utils.plots import plot_images, plot_labels, plot_results
 from utils.torch_utils import de_parallel

-LOGGERS = ('csv', 'tb', 'wandb', 'clearml')  # *.csv, TensorBoard, Weights & Biases, ClearML
+LOGGERS = ('csv', 'tb', 'wandb', 'clearml', 'comet')  # *.csv, TensorBoard, Weights & Biases, ClearML, Comet
 RANK = int(os.getenv('RANK', -1))

 try:
@@ -41,6 +41,18 @@ try:
 except (ImportError, AssertionError):
    clearml = None

+try:
+    if RANK not in [0, -1]:
+        comet_ml = None
+    else:
+        import comet_ml
+
+        assert hasattr(comet_ml, '__version__')  # verify package import not local dir
+        from utils.loggers.comet import CometLogger
+
+except (ModuleNotFoundError, ImportError, AssertionError):
+    comet_ml = None
+

 class Loggers():
    # YOLOv5 Loggers class
@@ -80,7 +92,10 @@ class Loggers():
            prefix = colorstr('ClearML: ')
            s = f"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML"
            self.logger.info(s)
-
+        if not comet_ml:
+            prefix = colorstr('Comet: ')
+            s = f"{prefix}run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet"
+            self.logger.info(s)
        # TensorBoard
        s = self.save_dir
        if 'tb' in self.include and not self.opt.evolve:
@@ -107,6 +122,18 @@ class Loggers():
        else:
            self.clearml = None

+        # Comet
+        if comet_ml and 'comet' in self.include:
+            if isinstance(self.opt.resume, str) and self.opt.resume.startswith("comet://"):
+                run_id = self.opt.resume.split("/")[-1]
+                self.comet_logger = CometLogger(self.opt, self.hyp, run_id=run_id)
+
+            else:
+                self.comet_logger = CometLogger(self.opt, self.hyp)
+
+        else:
+            self.comet_logger = None
+
    @property
    def remote_dataset(self):
        # Get data_dict if custom dataset artifact link is provided
@@ -115,12 +142,18 @@ class Loggers():
            data_dict = self.clearml.data_dict
        if self.wandb:
            data_dict = self.wandb.data_dict
+        if self.comet_logger:
+            data_dict = self.comet_logger.data_dict

        return data_dict

    def on_train_start(self):
-        # Callback runs on train start
-        pass
+        # Callback runs on train start
+        if self.comet_logger:
+            self.comet_logger.on_train_start()
+
+    def on_pretrain_routine_start(self):
+        # Callback runs on pre-train routine start
+        if self.comet_logger:
+            self.comet_logger.on_pretrain_routine_start()

    def on_pretrain_routine_end(self, labels, names):
        # Callback runs on pre-train routine end
@@ -131,8 +164,11 @@ class Loggers():
                self.wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in paths]})
            # if self.clearml:
            #    pass  # ClearML saves these images automatically using hooks
+            if self.comet_logger:
+                self.comet_logger.on_pretrain_routine_end(paths)

-    def on_train_batch_end(self, model, ni, imgs, targets, paths):
+    def on_train_batch_end(self, model, ni, imgs, targets, paths, vals):
+        log_dict = dict(zip(self.keys[0:3], vals))
        # Callback runs on train batch end
        # ni: number integrated batches (since train start)
        if self.plots:
@@ -148,11 +184,21 @@ class Loggers():
                if self.clearml:
                    self.clearml.log_debug_samples(files, title='Mosaics')

+        if self.comet_logger:
+            self.comet_logger.on_train_batch_end(log_dict, step=ni)
+
    def on_train_epoch_end(self, epoch):
        # Callback runs on train epoch end
        if self.wandb:
            self.wandb.current_epoch = epoch + 1

+        if self.comet_logger:
+            self.comet_logger.on_train_epoch_end(epoch)
+
+    def on_val_start(self):
+        # Callback runs on val start
+        if self.comet_logger:
+            self.comet_logger.on_val_start()
+
    def on_val_image_end(self, pred, predn, path, names, im):
        # Callback runs on val image end
        if self.wandb:
@@ -160,7 +206,11 @@ class Loggers():
        if self.clearml:
            self.clearml.log_image_with_boxes(path, pred, names, im)

-    def on_val_end(self):
+    def on_val_batch_end(self, batch_i, im, targets, paths, shapes, out):
+        # Callback runs on val batch end
+        if self.comet_logger:
+            self.comet_logger.on_val_batch_end(batch_i, im, targets, paths, shapes, out)
+
+    def on_val_end(self, nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix):
        # Callback runs on val end
        if self.wandb or self.clearml:
            files = sorted(self.save_dir.glob('val*.jpg'))
@@ -169,6 +219,9 @@ class Loggers():
            if self.clearml:
                self.clearml.log_debug_samples(files, title='Validation')

+        if self.comet_logger:
+            self.comet_logger.on_val_end(nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)
+
    def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
        # Callback runs at the end of each fit (train+val) epoch
        x = dict(zip(self.keys, vals))
@@ -199,6 +252,9 @@ class Loggers():
            self.clearml.current_epoch_logged_images = set()  # reset epoch image limit
            self.clearml.current_epoch += 1

+        if self.comet_logger:
+            self.comet_logger.on_fit_epoch_end(x, epoch=epoch)
+
    def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
        # Callback runs on model save event
        if (epoch + 1) % self.opt.save_period == 0 and not final_epoch and self.opt.save_period != -1:
@@ -209,6 +265,9 @@ class Loggers():
                                                      model_name='Latest Model',
                                                      auto_delete_file=False)

+        if self.comet_logger:
+            self.comet_logger.on_model_save(last, epoch, final_epoch, best_fitness, fi)
+
    def on_train_end(self, last, best, epoch, results):
        # Callback runs on training end, i.e. saving best model
        if self.plots:
@@ -237,10 +296,16 @@ class Loggers():
                                                  name='Best Model',
                                                  auto_delete_file=False)

+        if self.comet_logger:
+            final_results = dict(zip(self.keys[3:10], results))
+            self.comet_logger.on_train_end(files, self.save_dir, last, best, epoch, final_results)
+
    def on_params_update(self, params: dict):
        # Update hyperparams or configs of the experiment
        if self.wandb:
            self.wandb.wandb_run.config.update(params, allow_val_change=True)
+        if self.comet_logger:
+            self.comet_logger.on_params_update(params)


 class GenericLogger:
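`on_train_batch_end` now receives the running mean losses (`list(mloss)`) and turns them into a metrics dict by zipping them with the first three entries of `self.keys`; `on_train_end` does the same with `self.keys[3:10]` and the final `results` tuple. `self.keys` is not part of this diff, so the key names below are an assumption modelled on YOLOv5's results columns, and the numbers are made up. The sketch is a simplified version of the dispatch pattern: `MiniLoggers` and `RecordingCometLogger` are hypothetical classes, and a backend that failed to import stays `None`, so each callback checks it before forwarding.

```python
# Assumed layout of self.keys (defined outside this diff): 3 train losses,
# 4 validation metrics, 3 validation losses, then 3 learning rates.
KEYS = [
    "train/box_loss", "train/obj_loss", "train/cls_loss",
    "metrics/precision", "metrics/recall", "metrics/mAP_0.5", "metrics/mAP_0.5:0.95",
    "val/box_loss", "val/obj_loss", "val/cls_loss",
    "x/lr0", "x/lr1", "x/lr2",
]


class RecordingCometLogger:
    """Hypothetical stand-in for CometLogger that only records what it is sent."""

    def __init__(self):
        self.batch_logs = []

    def on_train_batch_end(self, log_dict, step):
        self.batch_logs.append((step, log_dict))


class MiniLoggers:
    """Simplified sketch of the Loggers dispatch pattern used in this diff."""

    def __init__(self, comet_logger=None):
        self.keys = KEYS
        self.comet_logger = comet_logger  # stays None when comet_ml is unavailable

    def on_train_batch_end(self, ni, vals):
        # vals: running mean (box, obj, cls) train losses, i.e. list(mloss)
        log_dict = dict(zip(self.keys[0:3], vals))
        if self.comet_logger:  # forward only if the backend was created
            self.comet_logger.on_train_batch_end(log_dict, step=ni)
        return log_dict

    def final_results(self, results):
        # results (assumed): P, R, mAP@0.5, mAP@0.5:0.95, val box/obj/cls loss
        return dict(zip(self.keys[3:10], results))


recorder = RecordingCometLogger()
loggers = MiniLoggers(comet_logger=recorder)
loggers.on_train_batch_end(ni=7, vals=[0.05, 0.03, 0.01])
MiniLoggers().on_train_batch_end(ni=8, vals=[0.04, 0.02, 0.01])  # no backend: nothing forwarded
print(recorder.batch_logs)
print(loggers.final_results([0.7, 0.6, 0.65, 0.45, 0.04, 0.03, 0.01]))
```

Because `zip` stops at the shorter input, the slice bounds are what keep the learning-rate keys out of both dicts.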

--- a/utils/loggers/comet/README.md
+++ b/utils/loggers/comet/README.md
--- a/utils/loggers/comet/__init__.py
+++ b/utils/loggers/comet/__init__.py
--- a/utils/loggers/comet/comet_utils.py
+++ b/utils/loggers/comet/comet_utils.py
+import logging
+import os
+from urllib.parse import urlparse
+
+try:
+    import comet_ml
+except (ModuleNotFoundError, ImportError):
+    comet_ml = None
+
+import yaml
+
+logger = logging.getLogger(__name__)
+
+COMET_PREFIX = "comet://"
+COMET_MODEL_NAME = os.getenv("COMET_MODEL_NAME", "yolov5")
+COMET_DEFAULT_CHECKPOINT_FILENAME = os.getenv("COMET_DEFAULT_CHECKPOINT_FILENAME", "last.pt")
+
+
+def download_model_checkpoint(opt, experiment):
+    model_dir = f"{opt.project}/{experiment.name}"
+    os.makedirs(model_dir, exist_ok=True)
+
+    model_name = COMET_MODEL_NAME
+    model_asset_list = experiment.get_model_asset_list(model_name)
+
+    if len(model_asset_list) == 0:
+        logger.error(f"COMET ERROR: No checkpoints found for model name: {model_name}")
+        return
+
+    model_asset_list = sorted(
+        model_asset_list,
+        key=lambda x: x["step"],
+        reverse=True,
+    )
+    logged_checkpoint_map = {asset["fileName"]: asset["assetId"] for asset in model_asset_list}
+
+    resource_url = urlparse(opt.weights)
+    checkpoint_filename = resource_url.query
+
+    if checkpoint_filename:
+        asset_id = logged_checkpoint_map.get(checkpoint_filename)
+    else:
+        asset_id = logged_checkpoint_map.get(COMET_DEFAULT_CHECKPOINT_FILENAME)
+        checkpoint_filename = COMET_DEFAULT_CHECKPOINT_FILENAME
+
+    if asset_id is None:
+        logger.error(f"COMET ERROR: Checkpoint {checkpoint_filename} not found in the given Experiment")
+        return
+
+    try:
+        logger.info(f"COMET INFO: Downloading checkpoint {checkpoint_filename}")
+        asset_filename = checkpoint_filename
+
+        model_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
+        model_download_path = f"{model_dir}/{asset_filename}"
+        with open(model_download_path, "wb") as f:
+            f.write(model_binary)
+
+        opt.weights = model_download_path
+
+    except Exception as e:
+        logger.warning("COMET WARNING: Unable to download checkpoint from Comet")
+        logger.exception(e)
+
+
+def set_opt_parameters(opt, experiment):
+    """Update the opts Namespace with parameters
+    from Comet's ExistingExperiment when resuming a run
+
+    Args:
+        opt (argparse.Namespace): Namespace of command line options
+        experiment (comet_ml.APIExperiment): Comet API Experiment object
+    """
+    asset_list = experiment.get_asset_list()
+    resume_string = opt.resume
+
+    for asset in asset_list:
+        if asset["fileName"] == "opt.yaml":
+            asset_id = asset["assetId"]
+            asset_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
+            opt_dict = yaml.safe_load(asset_binary)
+            for key, value in opt_dict.items():
+                setattr(opt, key, value)
+            opt.resume = resume_string
+
+    # Save hyperparameters to YAML file
+    # Necessary to pass checks in training script
+    save_dir = f"{opt.project}/{experiment.name}"
+    os.makedirs(save_dir, exist_ok=True)
+
+    hyp_yaml_path = f"{save_dir}/hyp.yaml"
+    with open(hyp_yaml_path, "w") as f:
+        yaml.dump(opt.hyp, f)
+    opt.hyp = hyp_yaml_path
+
+
+def check_comet_weights(opt):
+    """Downloads model weights from Comet and updates the
+    weights path to point to saved weights location
+
+    Args:
+        opt (argparse.Namespace): Command Line arguments passed
+            to YOLOv5 training script
+
+    Returns:
+        None/bool: Return True if opt.weights is a Comet URI and a
+            download was attempted, else return None
+    """
+    if comet_ml is None:
+        return
+
+    if isinstance(opt.weights, str):
+        if opt.weights.startswith(COMET_PREFIX):
+            api = comet_ml.API()
+            resource = urlparse(opt.weights)
+            experiment_path = f"{resource.netloc}{resource.path}"
+            experiment = api.get(experiment_path)
+            download_model_checkpoint(opt, experiment)
+            return True
+
+    return None
+
+
+def check_comet_resume(opt):
+    """Restores run parameters to their original state based on the model checkpoint
+    and logged Experiment parameters.
+
+    Args:
+        opt (argparse.Namespace): Command Line arguments passed
+            to YOLOv5 training script
+
+    Returns:
+        None/bool: Return True if opt.resume is a Comet URI and the
+            restore was attempted, else return None
+    """
+    if comet_ml is None:
+        return
+
+    if isinstance(opt.resume, str):
+        if opt.resume.startswith(COMET_PREFIX):
+            api = comet_ml.API()
+            resource = urlparse(opt.resume)
+            experiment_path = f"{resource.netloc}{resource.path}"
+            experiment = api.get(experiment_path)
+            set_opt_parameters(opt, experiment)
+            download_model_checkpoint(opt, experiment)
+
+            return True
+
+    return None
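`check_comet_resume` and `check_comet_weights` accept `comet://` URIs: `urlparse` places the workspace in `netloc` and the remainder in `path`, so `f"{resource.netloc}{resource.path}"` rebuilds `workspace/project/experiment_key`. For `--weights`, `download_model_checkpoint` reads an optional query string as the checkpoint file name and otherwise falls back to `COMET_DEFAULT_CHECKPOINT_FILENAME` (`last.pt` by default). The sketch below replays that lookup offline; `parse_comet_uri` and `select_checkpoint` are hypothetical helpers, the workspace, project, and asset values are made up, the asset dicts only imitate the `fileName`/`assetId`/`step` fields the code reads, and no Comet API is called.

```python
from urllib.parse import urlparse

COMET_PREFIX = "comet://"
DEFAULT_CHECKPOINT = "last.pt"  # mirrors the COMET_DEFAULT_CHECKPOINT_FILENAME default


def parse_comet_uri(uri):
    """Split a comet:// URI into (experiment_path, checkpoint_filename or None)."""
    if not uri.startswith(COMET_PREFIX):
        raise ValueError(f"not a Comet URI: {uri}")
    resource = urlparse(uri)
    experiment_path = f"{resource.netloc}{resource.path}"  # workspace/project/key
    return experiment_path, resource.query or None


def select_checkpoint(model_assets, checkpoint_filename=None):
    """Return (filename, assetId) using the same lookup order as download_model_checkpoint."""
    ordered = sorted(model_assets, key=lambda a: a["step"], reverse=True)
    # For duplicate file names, later entries overwrite earlier ones in the dict
    checkpoint_map = {a["fileName"]: a["assetId"] for a in ordered}
    name = checkpoint_filename or DEFAULT_CHECKPOINT
    return name, checkpoint_map.get(name)  # assetId is None if the file is missing


path, ckpt = parse_comet_uri("comet://my-workspace/yolov5/abc123?best.pt")
print(path, ckpt)

assets = [
    {"fileName": "last.pt", "assetId": "a-last", "step": 3},
    {"fileName": "best.pt", "assetId": "a-best", "step": 2},
]
print(select_checkpoint(assets, ckpt))
print(select_checkpoint(assets))          # no query string: default last.pt
print(select_checkpoint(assets, "x.pt"))  # unknown file: asset id None
```

One consequence of the dict comprehension is worth knowing: if an experiment holds several assets with the same file name, the entry iterated last wins, and after the descending sort on `step` that is the asset with the lowest step.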
--- a/utils/loggers/comet/hpo.py
+++ b/utils/loggers/comet/hpo.py
+import argparse
+import json
+import logging
+import os
+import sys
+from pathlib import Path
+
+import comet_ml
+
+logger = logging.getLogger(__name__)
+
+FILE = Path(__file__).resolve()
+ROOT = FILE.parents[3]  # YOLOv5 root directory
+if str(ROOT) not in sys.path:
+    sys.path.append(str(ROOT))  # add ROOT to PATH
+
+from train import parse_opt, train
+from utils.callbacks import Callbacks
+from utils.general import increment_path
+from utils.torch_utils import select_device
+
+# Project Configuration
+config = comet_ml.config.get_config()
+COMET_PROJECT_NAME = config.get_string(os.getenv("COMET_PROJECT_NAME"), "comet.project_name", default="yolov5")
+
+
+def get_args(known=False):
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='initial weights path')
+    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
+    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')
+    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
+    parser.add_argument('--epochs', type=int, default=300, help='total training epochs')
+    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
+    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
+    parser.add_argument('--rect', action='store_true', help='rectangular training')
+    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
+    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
+    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
+    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
+    parser.add_argument('--noplots', action='store_true', help='save no plot files')
+    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
+    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
+    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
+    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
+    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
+    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
+    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
+    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
+    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
+    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
+    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
+    parser.add_argument('--name', default='exp', help='save to project/name')
+    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
+    parser.add_argument('--quad', action='store_true', help='quad dataloader')
+    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
+    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
+    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
+    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
+    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
+    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
+    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')
+
+    # Weights & Biases arguments
+    parser.add_argument('--entity', default=None, help='W&B: Entity')
+    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
+    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
+    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')
+
+    # Comet arguments
+    parser.add_argument("--comet_optimizer_config", type=str, help="Comet: path to a Comet Optimizer config file.")
+    parser.add_argument("--comet_optimizer_id", type=str, help="Comet: ID of the Comet Optimizer sweep.")
+    parser.add_argument("--comet_optimizer_objective", type=str, help="Comet: set to 'minimize' or 'maximize'.")
+    parser.add_argument("--comet_optimizer_metric", type=str, help="Comet: metric to optimize.")
+    parser.add_argument("--comet_optimizer_workers",
+                        type=int,
+                        default=1,
+                        help="Comet: number of parallel workers to use with the Comet Optimizer.")
+
+    return parser.parse_known_args()[0] if known else parser.parse_args()
+
+
+def run(parameters, opt):
+    # Split the sampled trial: epochs and batch_size are run options, everything else is a hyperparameter
+    hyp_dict = {k: v for k, v in parameters.items() if k not in ["epochs", "batch_size"]}
+
+    # Give every trial its own incremented save directory, then apply the run options
+    opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok or opt.evolve))
+    opt.batch_size = parameters.get("batch_size")
+    opt.epochs = parameters.get("epochs")
+
+    device = select_device(opt.device, batch_size=opt.batch_size)
+    train(hyp_dict, opt, device, callbacks=Callbacks())
+
+
+if __name__ == "__main__":
+    opt = get_args(known=True)
+
+    opt.weights = str(opt.weights)
+    opt.cfg = str(opt.cfg)
+    opt.data = str(opt.data)
+    opt.project = str(opt.project)
+
+    # Reuse an existing sweep when COMET_OPTIMIZER_ID is set; otherwise create one from the JSON config file
+    optimizer_id = os.getenv("COMET_OPTIMIZER_ID")
+    if optimizer_id is None:
+        with open(opt.comet_optimizer_config) as f:
+            optimizer_config = json.load(f)
+        optimizer = comet_ml.Optimizer(optimizer_config)
+    else:
+        optimizer = comet_ml.Optimizer(optimizer_id)
+
+    # Copy the sweep ID and its objective/metric onto opt so they travel with every trial
+    opt.comet_optimizer_id = optimizer.id
+    status = optimizer.status()
+
+    opt.comet_optimizer_objective = status["spec"]["objective"]
+    opt.comet_optimizer_metric = status["spec"]["metric"]
+
+    logger.info("COMET INFO: Starting Hyperparameter Sweep")
+    for parameter in optimizer.get_parameters():
+        run(parameter["parameters"], opt)
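The sweep loop in `hpo.py` pulls one parameter set per trial and `run()` splits it: `epochs` and `batch_size` are set on `opt`, and every other key goes into the hyperparameter dict passed to `train()`. With `run()` as written, that includes the categorical `optimizer` entry from `optimizer_config.json`, which lands in the hyperparameter dict rather than on `opt.optimizer`. The following is a minimal sketch of that split that runs without a Comet account. `FakeOptimizer` and `split_parameters` are hypothetical stand-ins, and the assumption that each item yielded by `get_parameters()` is a dict with a `"parameters"` key is taken from how the loop above indexes it.

```python
from types import SimpleNamespace

# Keys that the sweep routes onto the run options instead of the hyperparameter dict
RUN_KEYS = ("epochs", "batch_size")


def split_parameters(parameters, opt):
    """Split one sampled parameter set into (hyp_dict, opt), mirroring run() in hpo.py."""
    hyp_dict = {k: v for k, v in parameters.items() if k not in RUN_KEYS}
    opt.batch_size = parameters.get("batch_size")
    opt.epochs = parameters.get("epochs")
    return hyp_dict, opt


class FakeOptimizer:
    """Hypothetical stand-in for comet_ml.Optimizer that yields pre-baked trials."""

    def __init__(self, trials):
        self._trials = trials

    def get_parameters(self):
        # Wrap each trial the way the hpo.py loop expects to read it
        for params in self._trials:
            yield {"parameters": params}


if __name__ == "__main__":
    optimizer = FakeOptimizer([
        {"lr0": 0.1, "optimizer": "SGD", "batch_size": 16, "epochs": 5},
        {"lr0": 1e-05, "optimizer": "AdamW", "batch_size": 64, "epochs": 5},
    ])
    for trial in optimizer.get_parameters():
        hyp, opt = split_parameters(trial["parameters"], SimpleNamespace())
        print(hyp, opt.batch_size, opt.epochs)
```

Keeping the split in a small pure function like this makes it easy to check which sampled keys reach `train()` before launching a real sweep.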
--- a/utils/loggers/comet/optimizer_config.json
+++ b/utils/loggers/comet/optimizer_config.json
+{
+  "algorithm": "random",
+  "parameters": {
+    "anchor_t": {
+      "type": "discrete",
+      "values": [
+        2,
+        8
+      ]
+    },
+    "batch_size": {
+      "type": "discrete",
+      "values": [
+        16,
+        32,
+        64
+      ]
+    },
+    "box": {
+      "type": "discrete",
+      "values": [
+        0.02,
+        0.2
+      ]
+    },
+    "cls": {
+      "type": "discrete",
+      "values": [
+        0.2
+      ]
+    },
+    "cls_pw": {
+      "type": "discrete",
+      "values": [
+        0.5
+      ]
+    },
+    "copy_paste": {
+      "type": "discrete",
+      "values": [
+        1
+      ]
+    },
+    "degrees": {
+      "type": "discrete",
+      "values": [
+        0,
+        45
+      ]
+    },
+    "epochs": {
+      "type": "discrete",
+      "values": [
+        5
+      ]
+    },
+    "fl_gamma": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "fliplr": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "flipud": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "hsv_h": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "hsv_s": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "hsv_v": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "iou_t": {
+      "type": "discrete",
+      "values": [
+        0.7
+      ]
+    },
+    "lr0": {
+      "type": "discrete",
+      "values": [
+        1e-05,
+        0.1
+      ]
+    },
+    "lrf": {
+      "type": "discrete",
+      "values": [
+        0.01,
+        1
+      ]
+    },
+    "mixup": {
+      "type": "discrete",
+      "values": [
+        1
+      ]
+    },
+    "momentum": {
+      "type": "discrete",
+      "values": [
+        0.6
+      ]
+    },
+    "mosaic": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "obj": {
+      "type": "discrete",
+      "values": [
+        0.2
+      ]
+    },
+    "obj_pw": {
+      "type": "discrete",
+      "values": [
+        0.5
+      ]
+    },
+    "optimizer": {
+      "type": "categorical",
+      "values": [
+        "SGD",
+        "Adam",
+        "AdamW"
+      ]
+    },
+    "perspective": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "scale": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "shear": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "translate": {
+      "type": "discrete",
+      "values": [
+        0
+      ]
+    },
+    "warmup_bias_lr": {
+      "type": "discrete",
+      "values": [
+        0,
+        0.2
+      ]
+    },
+    "warmup_epochs": {
+      "type": "discrete",
+      "values": [
+        5
+      ]
+    },
+    "warmup_momentum": {
+      "type": "discrete",
+      "values": [
+        0,
+        0.95
+      ]
+    },
+    "weight_decay": {
+      "type": "discrete",
+      "values": [
+        0,
+        0.001
+      ]
+    }
+  },
+  "spec": {
+    "maxCombo": 0,
+    "metric": "metrics/mAP_0.5",
+    "objective": "maximize"
+  },
+  "trials": 1
+}
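Every parameter in this config is a `discrete` or `categorical` list, so the search space is finite and its size is the product of the list lengths. The sketch below counts that product and checks that `spec` carries the `metric` and `objective` fields that `hpo.py` later reads back from `optimizer.status()`. Both helpers (`search_space_size`, `validate_spec`) are hypothetical illustrations rather than part of the Comet API, and they assume every parameter is given as an explicit `values` list, as in the file above.

```python
import json
import sys
from math import prod


def search_space_size(config):
    """Count the combinations spanned by an optimizer config's parameter value lists.

    Assumes every parameter is "discrete" or "categorical" with an explicit "values" list,
    as in optimizer_config.json; anything else raises ValueError.
    """
    sizes = []
    for name, spec in config["parameters"].items():
        if spec.get("type") not in ("discrete", "categorical") or "values" not in spec:
            raise ValueError(f"{name}: expected a discrete/categorical 'values' list")
        sizes.append(len(spec["values"]))
    return prod(sizes)


def validate_spec(config):
    """Check that "spec" carries the metric and objective that hpo.py reads back."""
    spec = config.get("spec", {})
    for key in ("metric", "objective"):
        if key not in spec:
            raise KeyError(f"spec.{key} is missing")
    if spec["objective"] not in ("minimize", "maximize"):
        raise ValueError("spec.objective must be 'minimize' or 'maximize'")
    return config


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script> utils/loggers/comet/optimizer_config.json
    with open(sys.argv[1]) as f:
        cfg = validate_spec(json.load(f))
    print(f"{search_space_size(cfg)} combinations")
```

Applied to the file above, the product is 2^8 × 3^2 = 2,304 combinations. Eight parameters have two values each, `batch_size` and `optimizer` have three, and the rest are fixed. The `random` algorithm samples from this space rather than enumerating all of it.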
--- a/val.py
+++ b/val.py
@@ -259,7 +259,7 @@ def run(
            plot_images(im, targets, paths, save_dir / f'val_batch{batch_i}_labels.jpg', names)  # labels
            plot_images(im, output_to_target(out), paths, save_dir / f'val_batch{batch_i}_pred.jpg', names)  # pred

-        callbacks.run('on_val_batch_end')
+        callbacks.run('on_val_batch_end', batch_i, im, targets, paths, shapes, out)

    # Compute metrics
    stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]  # to numpy
@@ -289,7 +289,7 @@ def run(
    # Plots
    if plots:
        confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
-        callbacks.run('on_val_end')
+        callbacks.run('on_val_end', nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)

    # Save JSON
    if save_json and len(jdict):
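The `val.py` hunks change `on_val_batch_end` and `on_val_end` from argument-free calls into calls that forward the batch (`batch_i, im, targets, paths, shapes, out`) and the final metrics. Because these arguments are positional, any registered logger must accept them in exactly that order. The sketch below shows the convention with a minimal registry. `MiniCallbacks` and `BatchRecorder` are hypothetical illustrations, not the YOLOv5 `Callbacks` class, and the registry's forwarding behavior is an assumption modeled on how the calls above are written.

```python
from collections import defaultdict


class MiniCallbacks:
    """Illustrative hook registry: run(hook, *args) forwards args to each registered callback."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook, callback):
        self._hooks[hook].append(callback)

    def run(self, hook, *args, **kwargs):
        # Hooks with no registered callbacks are a no-op
        for callback in self._hooks[hook]:
            callback(*args, **kwargs)


class BatchRecorder:
    """Hypothetical logger whose signature matches the new on_val_batch_end call in val.py."""

    def __init__(self):
        self.seen = []

    def on_val_batch_end(self, batch_i, im, targets, paths, shapes, out):
        # Record the batch index and how many image paths arrived with it
        self.seen.append((batch_i, len(paths)))


if __name__ == "__main__":
    callbacks = MiniCallbacks()
    recorder = BatchRecorder()
    callbacks.register("on_val_batch_end", recorder.on_val_batch_end)
    # None placeholders stand in for the tensors val.py would pass
    callbacks.run("on_val_batch_end", 0, None, None, ["a.jpg", "b.jpg"], None, None)
    print(recorder.seen)  # [(0, 2)]
```

Forwarding positionally keeps the hook call sites short. The trade-off is that reordering the arguments in `val.py` silently breaks every logger that relies on them.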