%pip install --upgrade "comet_ml>=3.44.0"
import comet_ml
comet_ml.login(project_name="remote-artifacts")
COMET INFO: Comet API key is valid
For this guide, we'll use the DOTA dataset, a collection of aerial images gathered from different sensors and platforms.
The dataset has been uploaded to an S3 bucket. First, let's download the metadata for this dataset from that bucket.
!wget https://cdn.comet.ml/dota_split/DOTA_1.0.json
--2023-01-10 23:10:20-- https://cdn.comet.ml/dota_split/DOTA_1.0.json
Resolving cdn.comet.ml (cdn.comet.ml)... 65.9.112.9, 65.9.112.7, 65.9.112.41, ...
Connecting to cdn.comet.ml (cdn.comet.ml)|65.9.112.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13289256 (13M) [application/json]
Saving to: ‘DOTA_1.0.json’

DOTA_1.0.json 100%[===================>] 12,67M 2,28MB/s in 5,8s

2023-01-10 23:10:27 (2,20 MB/s) - ‘DOTA_1.0.json’ saved [13289256/13289256]
Now let's define the class names present in this dataset.
LABEL_CLASS_NAMES = [
    "plane",
    "baseball-diamond",
    "bridge",
    "ground-track-field",
    "small-vehicle",
    "large-vehicle",
    "ship",
    "tennis-court",
    "basketball-court",
    "storage-tank",
    "soccer-ball-field",
    "roundabout",
    "harbor",
    "swimming-pool",
    "helicopter",
]
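Since the class names are stored as artifact metadata, it can help to check the list for duplicates and build lookup tables before logging it. The sketch below is only an illustration: `CLASS_TO_INDEX` and `INDEX_TO_CLASS` are hypothetical helper names, and whether the category ids in the DOTA metadata file actually line up with these list positions is an assumption you should verify against the file.

```python
# Class names copied from the list defined above.
LABEL_CLASS_NAMES = [
    "plane",
    "baseball-diamond",
    "bridge",
    "ground-track-field",
    "small-vehicle",
    "large-vehicle",
    "ship",
    "tennis-court",
    "basketball-court",
    "storage-tank",
    "soccer-ball-field",
    "roundabout",
    "harbor",
    "swimming-pool",
    "helicopter",
]

# Guard against accidental duplicates in the list.
if len(set(LABEL_CLASS_NAMES)) != len(LABEL_CLASS_NAMES):
    raise ValueError("LABEL_CLASS_NAMES contains duplicate entries")

# Hypothetical lookup tables based purely on list position.
# Assumption: position in the list is a meaningful index; the metadata
# file may use a different numbering, so check before relying on it.
CLASS_TO_INDEX = {name: index for index, name in enumerate(LABEL_CLASS_NAMES)}
INDEX_TO_CLASS = dict(enumerate(LABEL_CLASS_NAMES))

print(len(LABEL_CLASS_NAMES), CLASS_TO_INDEX["ship"], INDEX_TO_CLASS[0])
```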
Next, we're going to load in the metadata file that we've downloaded from our S3 bucket and format it in a way that allows us to track the URLs for the individual image assets in a Remote Artifact. We will also track the annotations as asset metadata.
import json
base_url = "https://cdn.comet.ml/dota_split"
metadata_file = "./DOTA_1.0.json"
with open(metadata_file, "r") as f:
    dota_metadata = json.load(f)
# Group the annotations by the image they belong to
annotation_map = {}
for annotation in dota_metadata["annotations"]:
    img_id = annotation["image_id"]
    annotation_map.setdefault(img_id, []).append(annotation)
artifact = comet_ml.Artifact(
    name="DOTA", artifact_type="dataset", metadata={"class_names": LABEL_CLASS_NAMES}
)
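# Register each image URL as a remote asset, with its annotations as metadata
for image in dota_metadata["images"]:
    try:
        annotations = annotation_map[image["id"]]
        artifact.add_remote(
            f"{base_url}/images/{image['file_name']}",
            metadata={"annotations": annotations},
        )
    except Exception:
        # Skip images that have no annotations or that fail to register
        continue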
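To make the grouping step easier to follow, here is a minimal sketch that runs the same logic on a tiny hand-written sample instead of the real metadata file. The sample file names, the `category_id` field, and the `build_remote_entries` helper are all hypothetical; only the `images`/`annotations` keys and the `id`, `file_name`, and `image_id` fields come from the code above. It produces the (URL, metadata) pairs that would be passed to `artifact.add_remote`, without calling Comet.

```python
from collections import defaultdict

# Hypothetical sample mimicking the layout of the metadata file.
sample_metadata = {
    "images": [
        {"id": 1, "file_name": "sample_a.png"},
        {"id": 2, "file_name": "sample_b.png"},
        {"id": 3, "file_name": "sample_c.png"},  # has no annotations
    ],
    "annotations": [
        {"image_id": 1, "category_id": 0},  # category_id is an assumed field
        {"image_id": 1, "category_id": 4},
        {"image_id": 2, "category_id": 6},
    ],
}


def build_remote_entries(metadata, base_url):
    """Return (url, asset_metadata) pairs for every annotated image."""
    # Group annotations by the image they belong to.
    grouped = defaultdict(list)
    for annotation in metadata["annotations"]:
        grouped[annotation["image_id"]].append(annotation)

    entries = []
    for image in metadata["images"]:
        annotations = grouped.get(image["id"])
        if not annotations:
            # Images without annotations are skipped, as in the loop above.
            continue
        url = f"{base_url}/images/{image['file_name']}"
        entries.append((url, {"annotations": annotations}))
    return entries


entries = build_remote_entries(sample_metadata, "https://cdn.comet.ml/dota_split")
for url, asset_metadata in entries:
    print(url, len(asset_metadata["annotations"]))
```

Separating the pairing logic from the `add_remote` call like this also makes it possible to inspect what will be tracked before anything is sent to Comet.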
experiment = comet_ml.start()
experiment.log_artifact(artifact)
experiment.end()
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET INFO: Experiment is live on comet.com https://www.comet.com/lothiraldan/remote-artifacts/6293676561fc4b07a83a496aa0c3a31e
COMET INFO: Artifact 'DOTA' version 1.0.0 created
COMET INFO: Scheduling the upload of 3628 assets for a size of 224.13 KB, this can take some time
COMET INFO: Still scheduling the upload of 1475 assets, remaining size 91.06 KB
COMET INFO: Artifact 'lothiraldan/DOTA:1.0.0' has started uploading asynchronously
COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url : https://www.comet.com/lothiraldan/remote-artifacts/6293676561fc4b07a83a496aa0c3a31e
COMET INFO:   Uploads:
COMET INFO:     artifact assets : 3628 (224.13 KB)
COMET INFO:     artifacts : 1
COMET INFO:     environment details : 1
COMET INFO:     filename : 1
COMET INFO:     git metadata : 1
COMET INFO:     installed packages : 1
COMET INFO:     notebook : 1
COMET INFO:     source_code : 1
COMET INFO: ---------------------------
COMET INFO: Uploading metrics, params, and assets to Comet before program termination (may take several seconds)
COMET INFO: The Python SDK has 3600 seconds to finish before aborting...
COMET INFO: Waiting for completion of the file uploads (may take several seconds)
COMET INFO: The Python SDK has 10800 seconds to finish before aborting...
COMET INFO: Still uploading 3060 file(s), remaining 198.66 KB/1.90 MB
COMET INFO: Still uploading 1371 file(s), remaining 102.85 KB/7.59 MB, Throughput 6.12 KB/s, ETA ~17s
COMET INFO: Artifact 'lothiraldan/DOTA:1.0.0' has been fully uploaded successfully
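Once the artifact is logged, each remote asset carries its URL and the annotations attached as metadata. The sketch below shows one way that information could be turned into a URL-to-annotations index. `index_remote_assets` is a hypothetical helper, and the stand-in objects only imitate the `link` and `metadata` attributes we assume logged assets expose; the commented lines at the end show how it might be wired to a live experiment, which is also an assumption to check against the Comet SDK reference.

```python
from types import SimpleNamespace


def index_remote_assets(assets):
    """Map each asset's link to the annotation list stored in its metadata."""
    index = {}
    for asset in assets:
        metadata = asset.metadata or {}
        index[asset.link] = metadata.get("annotations", [])
    return index


# Stand-ins for logged remote assets (hypothetical file names and values).
fake_assets = [
    SimpleNamespace(
        link="https://cdn.comet.ml/dota_split/images/sample_a.png",
        metadata={"annotations": [{"image_id": 1}, {"image_id": 1}]},
    ),
    SimpleNamespace(
        link="https://cdn.comet.ml/dota_split/images/sample_b.png",
        metadata=None,  # an asset logged without metadata
    ),
]

index = index_remote_assets(fake_assets)
for link, annotations in index.items():
    print(link, len(annotations))

# Assumed wiring with a live (new) experiment, not executed here:
# logged_artifact = experiment.get_artifact("DOTA")
# index = index_remote_assets(logged_artifact.remote_assets)
```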