Unverified commit 903b239f, authored by Dhruv Nair, committed by GitHub

Comet Logging and Visualization Integration (#9232)

* add comet to logger interface
* add comet logger
* add support for updated parameters
* clean up offline logger creation
* update callback args for comet logger
* add comet optimizer
* add optimizer config
* add comet README
* update tutorial notebook with Comet section
* add option to log class level metrics
* add support for class level metrics and confusion matrix
* handle errors when adding files to artifacts
* fix typo
* clean resume workflow
* updates for HPO
* update comet README
* fix typo in comet README
* update code snippets in comet README
* update comet links in tutorial
* updated links
* change optimizer batch size param and update comet README image
* update comet section in tutorial
* use prexisting cmd line flags to configure logger
* update artifact upload/download flow
* remove come remove comet logger specific cmd line args
* move downloading weights into comet logger code
* remove extra argparse
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* change checkpoint logging flow to follow offline logger
* update resume flow
* add comet logger to remote dataset property
* update cmd line args in hpo
* set types for integer/float env variables
* update README
* fix typo in README
* default to always logging model predictions
* Update tutorial.ipynb
* Update train.py
* Add Comet to Integrations table
* Update README.md
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Parent 5a134e06
......@@ -160,46 +160,31 @@ python train.py --data coco.yaml --cfg yolov5n.yaml --weights '' --batch-size 12
</details>
## <div align="center">Environments</div>
Get started in seconds with our verified environments. Click each icon below for details.
<div align="center">
<a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-colab-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://www.kaggle.com/ultralytics/yolov5">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-kaggle-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://hub.docker.com/r/ultralytics/yolov5">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-docker-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-aws-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-gcp-small.png" width="10%" /></a>
</div>
## <div align="center">Integrations</div>
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/image-integrations-loop.png" width="100%" />
<div align="center">
<a href="https://bit.ly/yolov5-deci-platform">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-comet.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
<a href="https://bit.ly/yolov5-deci-platform">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-deci.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
<a href="https://cutt.ly/yolov5-readme-clearml">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-clearml.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
<a href="https://roboflow.com/?ref=ultralytics">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-roboflow.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="14%" height="0" alt="" />
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="9%" height="0" alt="" />
<a href="https://wandb.ai/site?utm_campaign=repo_yolo_readme">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-wb.png" width="10%" /></a>
</div>
|Deci ⭐ NEW|ClearML ⭐ NEW|Roboflow|Weights & Biases
|:-:|:-:|:-:|:-:|
|Automatically compile and quantize YOLOv5 for better inference performance in one click at [Deci](https://bit.ly/yolov5-deci-platform)|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)
|Comet ⭐ NEW|Deci ⭐ NEW|ClearML ⭐ NEW|Roboflow|Weights & Biases
|:-:|:-:|:-:|:-:|:-:|
|Visualize model metrics and predictions and upload models and datasets in realtime with [Comet](https://www.comet.com/site/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration)|Automatically compile and quantize YOLOv5 for better inference performance in one click at [Deci](https://bit.ly/yolov5-deci-platform)|Automatically track, visualize and even remotely train YOLOv5 using [ClearML](https://cutt.ly/yolov5-readme-clearml) (open-source!)|Label and export your custom datasets directly to YOLOv5 for training with [Roboflow](https://roboflow.com/?ref=ultralytics) |Automatically track and visualize all your YOLOv5 training runs in the cloud with [Weights & Biases](https://wandb.ai/site?utm_campaign=repo_yolo_readme)
## <div align="center">Why YOLOv5</div>
......@@ -323,6 +308,28 @@ python export.py --weights yolov5s-cls.pt resnet50.pt efficientnet_b0.pt --inclu
</details>
## <div align="center">Environments</div>
Get started in seconds with our verified environments. Click each icon below for details.
<div align="center">
<a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-colab-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://www.kaggle.com/ultralytics/yolov5">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-kaggle-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://hub.docker.com/r/ultralytics/yolov5">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-docker-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-aws-small.png" width="10%" /></a>
<img src="https://github.com/ultralytics/assets/raw/master/social/logo-transparent.png" width="5%" alt="" />
<a href="https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart">
<img src="https://github.com/ultralytics/yolov5/releases/download/v1.0/logo-gcp-small.png" width="10%" /></a>
</div>
## <div align="center">Contribute</div>
We love your input! We want to make contributing to YOLOv5 as easy and transparent as possible. Please see our [Contributing Guide](CONTRIBUTING.md) to get started, and fill out the [YOLOv5 Survey](https://ultralytics.com/survey?utm_source=github&utm_medium=social&utm_campaign=Survey) to send us feedback on your experiences. Thank you to all our contributors!
......
......@@ -52,6 +52,7 @@ from utils.general import (LOGGER, check_amp, check_dataset, check_file, check_g
init_seeds, intersect_dicts, labels_to_class_weights, labels_to_image_weights, methods,
one_cycle, print_args, print_mutation, strip_optimizer, yaml_save)
from utils.loggers import Loggers
from utils.loggers.comet.comet_utils import check_comet_resume
from utils.loggers.wandb.wandb_utils import check_wandb_resume
from utils.loss import ComputeLoss
from utils.metrics import fitness
......@@ -330,7 +331,7 @@ def train(hyp, opt, device, callbacks): # hyp is path/to/hyp.yaml or hyp dictio
mem = f'{torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0:.3g}G' # (GB)
pbar.set_description(('%11s' * 2 + '%11.4g' * 5) %
(f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1]))
callbacks.run('on_train_batch_end', model, ni, imgs, targets, paths)
callbacks.run('on_train_batch_end', model, ni, imgs, targets, paths, list(mloss))
if callbacks.stop_training:
return
# end batch ------------------------------------------------------------------------------------------------
......@@ -465,11 +466,11 @@ def parse_opt(known=False):
parser.add_argument('--seed', type=int, default=0, help='Global training seed')
parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')
# Weights & Biases arguments
parser.add_argument('--entity', default=None, help='W&B: Entity')
parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')
# Logger arguments
parser.add_argument('--entity', default=None, help='Entity')
parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, "val" option')
parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval')
parser.add_argument('--artifact_alias', type=str, default='latest', help='Version of dataset artifact to use')
return parser.parse_known_args()[0] if known else parser.parse_args()
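The renamed logger flags keep their argparse behaviour, so `--upload_dataset` can still be omitted, passed bare, or passed with a value. A minimal sketch of how `nargs='?'` with `const=True` resolves those three cases, using only two of the flags from the hunk above; the parser here is a standalone stand-in, not the full `parse_opt`:

```python
import argparse

# Stand-in parser holding two of the logger flags shown in the hunk above.
parser = argparse.ArgumentParser()
parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, "val" option')
parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval')

absent = parser.parse_args([])                           # flag omitted -> default
bare = parser.parse_args(['--upload_dataset'])           # flag without value -> const
valued = parser.parse_args(['--upload_dataset', 'val'])  # flag with value -> the string

print(absent.upload_dataset, bare.upload_dataset, valued.upload_dataset)
```

Because the help strings no longer carry the `W&B:` prefix, the same flags can be read by any logger that chooses to honour them.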
......@@ -481,8 +482,8 @@ def main(opt, callbacks=Callbacks()):
check_git_status()
check_requirements()
# Resume
if opt.resume and not (check_wandb_resume(opt) or opt.evolve): # resume from specified or most recent last.pt
# Resume (from specified or most recent last.pt)
if opt.resume and not check_wandb_resume(opt) and not check_comet_resume(opt) or opt.evolve:
last = Path(check_file(opt.resume) if isinstance(opt.resume, str) else get_latest_run())
opt_yaml = last.parent.parent / 'opt.yaml' # train options yaml
opt_data = opt.data # original dataset
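The rewritten resume condition groups differently from the one it replaces, because `and` binds tighter than `or` in Python. The sketch below isolates the two boolean expressions with plain flags in place of the real `check_wandb_resume` and `check_comet_resume` calls (the function names `old_condition` and `new_condition` are hypothetical); whether the new grouping is intended is not stated in the commit.

```python
def old_condition(resume, wandb_resumed, evolve):
    # Old form: resume and not (wandb_resumed or evolve)
    return bool(resume and not (wandb_resumed or evolve))


def new_condition(resume, wandb_resumed, comet_resumed, evolve):
    # New form parses as: (resume and not wandb_resumed and not comet_resumed) or evolve
    return bool(resume and not wandb_resumed and not comet_resumed or evolve)


# With --evolve set and no --resume, the two forms disagree.
print(old_condition(resume=False, wandb_resumed=False, evolve=True))
print(new_condition(resume=False, wandb_resumed=False, comet_resumed=False, evolve=True))
```

In this sketch the old form evaluates to False and the new form to True whenever `evolve` is truthy, regardless of `resume`.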
......
......@@ -413,7 +413,7 @@
"import utils\n",
"display = utils.notebook_init() # checks"
],
"execution_count": 1,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
......@@ -465,7 +465,7 @@
"!python detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images\n",
"# display.Image(filename='runs/detect/exp/zidane.jpg', width=600)"
],
"execution_count": 2,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
......@@ -535,7 +535,7 @@
"torch.hub.download_url_to_file('https://ultralytics.com/assets/coco2017val.zip', 'tmp.zip') # download (780M - 5000 images)\n",
"!unzip -q tmp.zip -d ../datasets && rm tmp.zip # unzip"
],
"execution_count": 3,
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
......@@ -566,7 +566,7 @@
"# Validate YOLOv5s on COCO val\n",
"!python val.py --weights yolov5s.pt --data coco.yaml --img 640 --half"
],
"execution_count": 4,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
......@@ -653,11 +653,14 @@
"cell_type": "code",
"source": [
"#@title Select YOLOv5 🚀 logger {run: 'auto'}\n",
"logger = 'TensorBoard' #@param ['TensorBoard', 'ClearML', 'W&B']\n",
"logger = 'TensorBoard' #@param ['TensorBoard', 'Comet', 'ClearML', 'W&B']\n",
"\n",
"if logger == 'TensorBoard':\n",
" %load_ext tensorboard\n",
" %tensorboard --logdir runs/train\n",
"elif logger == 'Comet':\n",
" %pip install -q comet_ml\n",
" import comet_ml; comet_ml.init()\n",
"elif logger == 'ClearML':\n",
" %pip install -q clearml && clearml-init\n",
"elif logger == 'W&B':\n",
......@@ -683,7 +686,7 @@
"# Train YOLOv5s on COCO128 for 3 epochs\n",
"!python train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache"
],
"execution_count": 5,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
......@@ -857,6 +860,28 @@
"# 4. Visualize"
]
},
{
"cell_type": "markdown",
"source": [
"## Comet Logging and Visualization 🌟 NEW\n",
"[Comet](https://www.comet.com/site/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration) is now fully integrated with YOLOv5. Track and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with [Comet Custom Panels](https://www.comet.com/docs/v2/guides/comet-dashboard/code-panels/about-panels/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration)! Comet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes! \n",
"\n",
"Getting started is easy:\n",
"```shell\n",
"pip install comet_ml # 1. install\n",
"export COMET_API_KEY=<Your API Key> # 2. paste API key\n",
"python train.py --img 640 --epochs 3 --data coco128.yaml --weights yolov5s.pt # 3. train\n",
"```\n",
"\n",
"To learn more about all of the supported Comet features for this integration, check out the [Comet Tutorial](https://github.com/ultralytics/yolov5/tree/master/utils/loggers/comet). If you'd like to learn more about Comet, head over to our [documentation](https://www.comet.com/docs/v2/?ref=yolov5&utm_source=yolov5&utm_medium=affilliate&utm_campaign=yolov5_comet_integration). Get started by trying out the Comet Colab Notebook:\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RG0WOQyxlDlo5Km8GogJpIEJlg_5lyYO?usp=sharing)\n",
"\n",
"<img width=\"1920\" alt=\"yolo-ui\" src=\"https://user-images.githubusercontent.com/7529846/187608607-ff89c3d5-1b8b-4743-a974-9275301b0524.png\">"
],
"metadata": {
"id": "nWOsI5wJR1o3"
}
},
{
"cell_type": "markdown",
"source": [
......@@ -1096,4 +1121,4 @@
"outputs": []
}
]
}
\ No newline at end of file
}
......@@ -17,7 +17,7 @@ from utils.loggers.wandb.wandb_utils import WandbLogger
from utils.plots import plot_images, plot_labels, plot_results
from utils.torch_utils import de_parallel
LOGGERS = ('csv', 'tb', 'wandb', 'clearml') # *.csv, TensorBoard, Weights & Biases, ClearML
LOGGERS = ('csv', 'tb', 'wandb', 'clearml', 'comet') # *.csv, TensorBoard, Weights & Biases, ClearML
RANK = int(os.getenv('RANK', -1))
try:
......@@ -41,6 +41,18 @@ try:
except (ImportError, AssertionError):
clearml = None
try:
if RANK not in [0, -1]:
comet_ml = None
else:
import comet_ml
assert hasattr(comet_ml, '__version__') # verify package import not local dir
from utils.loggers.comet import CometLogger
except (ModuleNotFoundError, ImportError, AssertionError):
comet_ml = None
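The guarded import above follows the same pattern as the other loggers: a missing or shadowed package degrades to `None` instead of raising. A generic sketch of that pattern, assuming a hypothetical helper `optional_import` that is not part of YOLOv5:

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is missing or lacks __version__."""
    try:
        module = importlib.import_module(name)
        # Same idea as the diff: a local directory that shadows the package
        # would import successfully but have no __version__ attribute.
        assert hasattr(module, '__version__')
    except (ImportError, AssertionError):  # ModuleNotFoundError is a subclass of ImportError
        return None
    return module


missing = optional_import('definitely_not_an_installed_package_xyz')
print(missing)
```

Callers can then test the result for truthiness, as `Loggers` does with `if not comet_ml:`.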
class Loggers():
# YOLOv5 Loggers class
......@@ -80,7 +92,10 @@ class Loggers():
prefix = colorstr('ClearML: ')
s = f"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML"
self.logger.info(s)
if not comet_ml:
prefix = colorstr('Comet: ')
s = f"{prefix}run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet"
self.logger.info(s)
# TensorBoard
s = self.save_dir
if 'tb' in self.include and not self.opt.evolve:
......@@ -107,6 +122,18 @@ class Loggers():
else:
self.clearml = None
# Comet
if comet_ml and 'comet' in self.include:
if isinstance(self.opt.resume, str) and self.opt.resume.startswith("comet://"):
run_id = self.opt.resume.split("/")[-1]
self.comet_logger = CometLogger(self.opt, self.hyp, run_id=run_id)
else:
self.comet_logger = CometLogger(self.opt, self.hyp)
else:
self.comet_logger = None
@property
def remote_dataset(self):
# Get data_dict if custom dataset artifact link is provided
......@@ -115,12 +142,18 @@ class Loggers():
data_dict = self.clearml.data_dict
if self.wandb:
data_dict = self.wandb.data_dict
if self.comet_logger:
data_dict = self.comet_logger.data_dict
return data_dict
def on_train_start(self):
# Callback runs on train start
pass
if self.comet_logger:
self.comet_logger.on_train_start()
def on_pretrain_routine_start(self):
if self.comet_logger:
self.comet_logger.on_pretrain_routine_start()
def on_pretrain_routine_end(self, labels, names):
# Callback runs on pre-train routine end
......@@ -131,8 +164,11 @@ class Loggers():
self.wandb.log({"Labels": [wandb.Image(str(x), caption=x.name) for x in paths]})
# if self.clearml:
# pass # ClearML saves these images automatically using hooks
if self.comet_logger:
self.comet_logger.on_pretrain_routine_end(paths)
def on_train_batch_end(self, model, ni, imgs, targets, paths):
def on_train_batch_end(self, model, ni, imgs, targets, paths, vals):
log_dict = dict(zip(self.keys[0:3], vals))
# Callback runs on train batch end
# ni: number integrated batches (since train start)
if self.plots:
......@@ -148,11 +184,21 @@ class Loggers():
if self.clearml:
self.clearml.log_debug_samples(files, title='Mosaics')
if self.comet_logger:
self.comet_logger.on_train_batch_end(log_dict, step=ni)
def on_train_epoch_end(self, epoch):
# Callback runs on train epoch end
if self.wandb:
self.wandb.current_epoch = epoch + 1
if self.comet_logger:
self.comet_logger.on_train_epoch_end(epoch)
def on_val_start(self):
if self.comet_logger:
self.comet_logger.on_val_start()
def on_val_image_end(self, pred, predn, path, names, im):
# Callback runs on val image end
if self.wandb:
......@@ -160,7 +206,11 @@ class Loggers():
if self.clearml:
self.clearml.log_image_with_boxes(path, pred, names, im)
def on_val_end(self):
def on_val_batch_end(self, batch_i, im, targets, paths, shapes, out):
if self.comet_logger:
self.comet_logger.on_val_batch_end(batch_i, im, targets, paths, shapes, out)
def on_val_end(self, nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix):
# Callback runs on val end
if self.wandb or self.clearml:
files = sorted(self.save_dir.glob('val*.jpg'))
......@@ -169,6 +219,9 @@ class Loggers():
if self.clearml:
self.clearml.log_debug_samples(files, title='Validation')
if self.comet_logger:
self.comet_logger.on_val_end(nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)
def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):
# Callback runs at the end of each fit (train+val) epoch
x = dict(zip(self.keys, vals))
......@@ -199,6 +252,9 @@ class Loggers():
self.clearml.current_epoch_logged_images = set() # reset epoch image limit
self.clearml.current_epoch += 1
if self.comet_logger:
self.comet_logger.on_fit_epoch_end(x, epoch=epoch)
def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
# Callback runs on model save event
if (epoch + 1) % self.opt.save_period == 0 and not final_epoch and self.opt.save_period != -1:
......@@ -209,6 +265,9 @@ class Loggers():
model_name='Latest Model',
auto_delete_file=False)
if self.comet_logger:
self.comet_logger.on_model_save(last, epoch, final_epoch, best_fitness, fi)
def on_train_end(self, last, best, epoch, results):
# Callback runs on training end, i.e. saving best model
if self.plots:
......@@ -237,10 +296,16 @@ class Loggers():
name='Best Model',
auto_delete_file=False)
if self.comet_logger:
final_results = dict(zip(self.keys[3:10], results))
self.comet_logger.on_train_end(files, self.save_dir, last, best, epoch, final_results)
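Both `log_dict` in `on_train_batch_end` and `final_results` here are built by zipping a slice of `self.keys` with a sequence of values. The sketch below shows the shape of those dictionaries. The `KEYS` list is an assumption about the contents of `Loggers.keys`, which this diff does not show, and the numbers are made up for illustration.

```python
# Assumed contents of Loggers.keys (not shown in this diff).
KEYS = [
    'train/box_loss', 'train/obj_loss', 'train/cls_loss',                       # [0:3] train losses
    'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95',
    'val/box_loss', 'val/obj_loss', 'val/cls_loss',                             # [3:10] metrics + val losses
    'x/lr0', 'x/lr1', 'x/lr2',
]

# Per-batch logging: three mean losses paired with the first three keys.
mloss = [0.05, 0.03, 0.01]  # illustrative values
batch_log = dict(zip(KEYS[0:3], mloss))

# End of training: seven result values paired with keys 3..9.
results = (0.71, 0.62, 0.68, 0.45, 0.04, 0.03, 0.01)  # illustrative values
final_results = dict(zip(KEYS[3:10], results))

print(batch_log)
print(list(final_results))
```

Note that `zip` stops at the shorter input, so a `results` tuple with fewer than seven entries would silently yield a shorter dictionary.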
def on_params_update(self, params: dict):
# Update hyperparams or configs of the experiment
if self.wandb:
self.wandb.wandb_run.config.update(params, allow_val_change=True)
if self.comet_logger:
self.comet_logger.on_params_update(params)
class GenericLogger:
......
Diff collapsed.
Diff collapsed.
import logging
import os
from urllib.parse import urlparse

try:
    import comet_ml
except (ModuleNotFoundError, ImportError):
    comet_ml = None

import yaml

logger = logging.getLogger(__name__)

COMET_PREFIX = "comet://"
COMET_MODEL_NAME = os.getenv("COMET_MODEL_NAME", "yolov5")
COMET_DEFAULT_CHECKPOINT_FILENAME = os.getenv("COMET_DEFAULT_CHECKPOINT_FILENAME", "last.pt")


def download_model_checkpoint(opt, experiment):
    model_dir = f"{opt.project}/{experiment.name}"
    os.makedirs(model_dir, exist_ok=True)

    model_name = COMET_MODEL_NAME
    model_asset_list = experiment.get_model_asset_list(model_name)

    if len(model_asset_list) == 0:
        logger.error(f"COMET ERROR: No checkpoints found for model name : {model_name}")
        return

    model_asset_list = sorted(
        model_asset_list,
        key=lambda x: x["step"],
        reverse=True,
    )
    logged_checkpoint_map = {asset["fileName"]: asset["assetId"] for asset in model_asset_list}

    resource_url = urlparse(opt.weights)
    checkpoint_filename = resource_url.query

    if checkpoint_filename:
        asset_id = logged_checkpoint_map.get(checkpoint_filename)
    else:
        asset_id = logged_checkpoint_map.get(COMET_DEFAULT_CHECKPOINT_FILENAME)
        checkpoint_filename = COMET_DEFAULT_CHECKPOINT_FILENAME

    if asset_id is None:
        logger.error(f"COMET ERROR: Checkpoint {checkpoint_filename} not found in the given Experiment")
        return

    try:
        logger.info(f"COMET INFO: Downloading checkpoint {checkpoint_filename}")
        asset_filename = checkpoint_filename

        model_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
        model_download_path = f"{model_dir}/{asset_filename}"
        with open(model_download_path, "wb") as f:
            f.write(model_binary)

        opt.weights = model_download_path

    except Exception as e:
        logger.warning("COMET WARNING: Unable to download checkpoint from Comet")
        logger.exception(e)


def set_opt_parameters(opt, experiment):
    """Update the opts Namespace with parameters
    from Comet's ExistingExperiment when resuming a run

    Args:
        opt (argparse.Namespace): Namespace of command line options
        experiment (comet_ml.APIExperiment): Comet API Experiment object
    """
    asset_list = experiment.get_asset_list()
    resume_string = opt.resume

    for asset in asset_list:
        if asset["fileName"] == "opt.yaml":
            asset_id = asset["assetId"]
            asset_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
            opt_dict = yaml.safe_load(asset_binary)
            for key, value in opt_dict.items():
                setattr(opt, key, value)
            opt.resume = resume_string

    # Save hyperparameters to YAML file
    # Necessary to pass checks in training script
    save_dir = f"{opt.project}/{experiment.name}"
    os.makedirs(save_dir, exist_ok=True)

    hyp_yaml_path = f"{save_dir}/hyp.yaml"
    with open(hyp_yaml_path, "w") as f:
        yaml.dump(opt.hyp, f)
    opt.hyp = hyp_yaml_path


def check_comet_weights(opt):
    """Downloads model weights from Comet and updates the
    weights path to point to saved weights location

    Args:
        opt (argparse.Namespace): Command Line arguments passed
            to YOLOv5 training script

    Returns:
        None/bool: Return True if weights are successfully downloaded
            else return None
    """
    if comet_ml is None:
        return

    if isinstance(opt.weights, str):
        if opt.weights.startswith(COMET_PREFIX):
            api = comet_ml.API()
            resource = urlparse(opt.weights)
            experiment_path = f"{resource.netloc}{resource.path}"
            experiment = api.get(experiment_path)
            download_model_checkpoint(opt, experiment)
            return True

    return None


def check_comet_resume(opt):
    """Restores run parameters to its original state based on the model checkpoint
    and logged Experiment parameters.

    Args:
        opt (argparse.Namespace): Command Line arguments passed
            to YOLOv5 training script

    Returns:
        None/bool: Return True if the run is restored successfully
            else return None
    """
    if comet_ml is None:
        return

    if isinstance(opt.resume, str):
        if opt.resume.startswith(COMET_PREFIX):
            api = comet_ml.API()
            resource = urlparse(opt.resume)
            experiment_path = f"{resource.netloc}{resource.path}"
            experiment = api.get(experiment_path)
            set_opt_parameters(opt, experiment)
            download_model_checkpoint(opt, experiment)

            return True

    return None
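The helpers above lean on `urlparse` to split a `comet://` string into an experiment path and an optional checkpoint filename carried in the query part. A standalone sketch of that splitting, using only the standard library; the workspace, project and experiment names are invented, and `split_comet_url` is a hypothetical helper rather than part of the integration:

```python
from urllib.parse import urlparse

DEFAULT_CHECKPOINT = "last.pt"  # mirrors COMET_DEFAULT_CHECKPOINT_FILENAME's default


def split_comet_url(url):
    """Return (experiment_path, checkpoint_filename) for a comet:// URL."""
    resource = urlparse(url)
    experiment_path = f"{resource.netloc}{resource.path}"  # workspace/project/experiment
    checkpoint = resource.query or DEFAULT_CHECKPOINT      # text after '?' if present
    return experiment_path, checkpoint


print(split_comet_url("comet://my-workspace/my-project/abc123"))
print(split_comet_url("comet://my-workspace/my-project/abc123?best.pt"))

# Loggers derives the run id from the last path segment of --resume.
resume = "comet://my-workspace/my-project/abc123"
run_id = resume.split("/")[-1]
print(run_id)
```

One consequence of the `split("/")[-1]` approach is that a query suffix, if present on the resume string, would stay attached to the run id.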
import argparse
import json
import logging
import os
import sys
from pathlib import Path

import comet_ml

logger = logging.getLogger(__name__)

FILE = Path(__file__).resolve()
ROOT = FILE.parents[3]  # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))  # add ROOT to PATH

from train import parse_opt, train
from utils.callbacks import Callbacks
from utils.general import increment_path
from utils.torch_utils import select_device

# Project Configuration
config = comet_ml.config.get_config()
COMET_PROJECT_NAME = config.get_string(os.getenv("COMET_PROJECT_NAME"), "comet.project_name", default="yolov5")


def get_args(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')
    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=300, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')

    # Weights & Biases arguments
    parser.add_argument('--entity', default=None, help='W&B: Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')

    # Comet Arguments
    parser.add_argument("--comet_optimizer_config", type=str, help="Comet: Path to a Comet Optimizer Config File.")
    parser.add_argument("--comet_optimizer_id", type=str, help="Comet: ID of the Comet Optimizer sweep.")
    parser.add_argument("--comet_optimizer_objective", type=str, help="Comet: Set to 'minimize' or 'maximize'.")
    parser.add_argument("--comet_optimizer_metric", type=str, help="Comet: Metric to Optimize.")
    parser.add_argument("--comet_optimizer_workers",
                        type=int,
                        default=1,
                        help="Comet: Number of Parallel Workers to use with the Comet Optimizer.")

    return parser.parse_known_args()[0] if known else parser.parse_args()


def run(parameters, opt):
    hyp_dict = {k: v for k, v in parameters.items() if k not in ["epochs", "batch_size"]}

    opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok or opt.evolve))
    opt.batch_size = parameters.get("batch_size")
    opt.epochs = parameters.get("epochs")

    device = select_device(opt.device, batch_size=opt.batch_size)
    train(hyp_dict, opt, device, callbacks=Callbacks())


if __name__ == "__main__":
    opt = get_args(known=True)

    opt.weights = str(opt.weights)
    opt.cfg = str(opt.cfg)
    opt.data = str(opt.data)
    opt.project = str(opt.project)

    optimizer_id = os.getenv("COMET_OPTIMIZER_ID")
    if optimizer_id is None:
        with open(opt.comet_optimizer_config) as f:
            optimizer_config = json.load(f)
        optimizer = comet_ml.Optimizer(optimizer_config)
    else:
        optimizer = comet_ml.Optimizer(optimizer_id)

    opt.comet_optimizer_id = optimizer.id
    status = optimizer.status()

    opt.comet_optimizer_objective = status["spec"]["objective"]
    opt.comet_optimizer_metric = status["spec"]["metric"]

    logger.info("COMET INFO: Starting Hyperparameter Sweep")
    for parameter in optimizer.get_parameters():
        run(parameter["parameters"], opt)
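In `run`, each parameter set suggested by the optimizer is split in two: `epochs` and `batch_size` go onto `opt`, and everything else is passed to `train` as the hyperparameter dictionary. A minimal sketch of that split on a plain dictionary, with invented values and a hypothetical helper name `split_parameters`:

```python
def split_parameters(parameters):
    """Separate training-loop settings from model hyperparameters."""
    hyp = {k: v for k, v in parameters.items() if k not in ("epochs", "batch_size")}
    return hyp, parameters.get("batch_size"), parameters.get("epochs")


# Illustrative suggestion shaped like one entry of the sweep config.
suggestion = {"lr0": 0.1, "momentum": 0.6, "batch_size": 32, "epochs": 5}
hyp, batch_size, epochs = split_parameters(suggestion)

print(hyp)
print(batch_size, epochs)
```

Since `dict.get` returns `None` for a missing key, a sweep config that omits `epochs` or `batch_size` would overwrite the command-line value on `opt` with `None` in the original `run`.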
{
"algorithm": "random",
"parameters": {
"anchor_t": {
"type": "discrete",
"values": [
2,
8
]
},
"batch_size": {
"type": "discrete",
"values": [
16,
32,
64
]
},
"box": {
"type": "discrete",
"values": [
0.02,
0.2
]
},
"cls": {
"type": "discrete",
"values": [
0.2
]
},
"cls_pw": {
"type": "discrete",
"values": [
0.5
]
},
"copy_paste": {
"type": "discrete",
"values": [
1
]
},
"degrees": {
"type": "discrete",
"values": [
0,
45
]
},
"epochs": {
"type": "discrete",
"values": [
5
]
},
"fl_gamma": {
"type": "discrete",
"values": [
0
]
},
"fliplr": {
"type": "discrete",
"values": [
0
]
},
"flipud": {
"type": "discrete",
"values": [
0
]
},
"hsv_h": {
"type": "discrete",
"values": [
0
]
},
"hsv_s": {
"type": "discrete",
"values": [
0
]
},
"hsv_v": {
"type": "discrete",
"values": [
0
]
},
"iou_t": {
"type": "discrete",
"values": [
0.7
]
},
"lr0": {
"type": "discrete",
"values": [
1e-05,
0.1
]
},
"lrf": {
"type": "discrete",
"values": [
0.01,
1
]
},
"mixup": {
"type": "discrete",
"values": [
1
]
},
"momentum": {
"type": "discrete",
"values": [
0.6
]
},
"mosaic": {
"type": "discrete",
"values": [
0
]
},
"obj": {
"type": "discrete",
"values": [
0.2
]
},
"obj_pw": {
"type": "discrete",
"values": [
0.5
]
},
"optimizer": {
"type": "categorical",
"values": [
"SGD",
"Adam",
"AdamW"
]
},
"perspective": {
"type": "discrete",
"values": [
0
]
},
"scale": {
"type": "discrete",
"values": [
0
]
},
"shear": {
"type": "discrete",
"values": [
0
]
},
"translate": {
"type": "discrete",
"values": [
0
]
},
"warmup_bias_lr": {
"type": "discrete",
"values": [
0,
0.2
]
},
"warmup_epochs": {
"type": "discrete",
"values": [
5
]
},
"warmup_momentum": {
"type": "discrete",
"values": [
0,
0.95
]
},
"weight_decay": {
"type": "discrete",
"values": [
0,
0.001
]
}
},
"spec": {
"maxCombo": 0,
"metric": "metrics/mAP_0.5",
"objective": "maximize"
},
"trials": 1
}
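Most parameters in this config list a single value, so the search space is smaller than the file's length suggests. The sketch below counts the distinct combinations in a config of the same shape. It uses an abbreviated three-parameter excerpt rather than the full file, and it says nothing about how Comet's optimizer actually samples from that space.

```python
import json
from math import prod

# Abbreviated excerpt with the same structure as the config above.
config_text = """{
  "algorithm": "random",
  "parameters": {
    "batch_size": {"type": "discrete", "values": [16, 32, 64]},
    "lr0": {"type": "discrete", "values": [1e-05, 0.1]},
    "optimizer": {"type": "categorical", "values": ["SGD", "Adam", "AdamW"]}
  },
  "spec": {"maxCombo": 0, "metric": "metrics/mAP_0.5", "objective": "maximize"},
  "trials": 1
}"""

config = json.loads(config_text)

# Number of candidate values per parameter, then the size of the full grid.
choices = {name: len(spec["values"]) for name, spec in config["parameters"].items()}
total = prod(choices.values())

print(choices)
print(total)
```

Running the same count on the full file is a quick way to see which parameters are actually being varied.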
......@@ -259,7 +259,7 @@ def run(
plot_images(im, targets, paths, save_dir / f'val_batch{batch_i}_labels.jpg', names) # labels
plot_images(im, output_to_target(out), paths, save_dir / f'val_batch{batch_i}_pred.jpg', names) # pred
callbacks.run('on_val_batch_end')
callbacks.run('on_val_batch_end', batch_i, im, targets, paths, shapes, out)
# Compute metrics
stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)] # to numpy
......@@ -289,7 +289,7 @@ def run(
# Plots
if plots:
confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
callbacks.run('on_val_end')
callbacks.run('on_val_end', nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)
# Save JSON
if save_json and len(jdict):
......
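The `val.py` changes pass positional arguments through `callbacks.run`, so every registered handler for `on_val_batch_end` and `on_val_end` has to accept the new parameters. A toy dispatcher illustrating that contract; `MiniCallbacks` is a hypothetical stand-in written for this sketch, not YOLOv5's `utils.callbacks.Callbacks`:

```python
from collections import defaultdict


class MiniCallbacks:
    """Toy hook registry that forwards arguments to every registered handler."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, hook, fn):
        self._hooks[hook].append(fn)

    def run(self, hook, *args, **kwargs):
        for fn in self._hooks[hook]:
            fn(*args, **kwargs)


seen = []


def on_val_batch_end(batch_i, im, targets, paths, shapes, out):
    # Handler signature matches the arguments val.py now passes.
    seen.append((batch_i, len(paths)))


callbacks = MiniCallbacks()
callbacks.register('on_val_batch_end', on_val_batch_end)
callbacks.run('on_val_batch_end', 0, None, None, ['a.jpg', 'b.jpg'], None, None)

print(seen)
```

A handler still written with the old zero-argument signature would raise a `TypeError` under this dispatcher, which is why the `Loggers` methods were updated in the same commit.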