The datasets that cptac distributes are still being actively worked on by the teams that generated them. We also periodically improve the cptac package itself. As a result, we regularly release new versions of both the data and the package. This tutorial covers how to access both data and package updates.
Note: In this tutorial, we intentionally make cptac generate the errors and warnings it gives when your data or package is out of date, so you can see what they look like. The tutorial is not broken.
Each time you import cptac into a Python environment, it automatically checks whether you have the most recent release of the package. If you don't, it prints a warning like this:
import cptac
Warning: Your version of cptac (0.6.2) is out-of-date. Latest is 0.6.3. Please run 'pip install --upgrade cptac' to update it. (/home/caleb/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/ipykernel_launcher.py, line 1)
As the warning directs, run pip install --upgrade cptac to get the latest version of the package. This ensures that you have all the latest package functionality and that you can access the latest versions of all the datasets.
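cptac runs this check for you on import. If you want the same comparison in your own script, for example in a scheduled job, here is a minimal sketch. It assumes two things. First, the installed version can be read with the standard-library importlib.metadata (Python 3.8+). Second, the latest release can be read from PyPI's JSON endpoint at https://pypi.org/pypi/cptac/json. This is not necessarily how cptac performs its own check, and the helper names are hypothetical. The version parsing only handles plain numeric versions such as 0.6.3.

```python
import json
import urllib.request
from importlib import metadata


def parse_version(text):
    """Turn a plain numeric version like '0.6.3' into (0, 6, 3).

    Assumption: versions are dot-separated integers; pre-release
    tags such as 'rc1' are not handled and will raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(package="cptac"):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def latest_pypi_version(package="cptac"):
    """Fetch the newest released version from PyPI's JSON API (needs network)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]


def is_outdated(installed, latest):
    """True if the installed version is older than the latest release."""
    # Compare as integer tuples so that 0.10.0 counts as newer than 0.9.0.
    return parse_version(installed) < parse_version(latest)


def check_for_update(package="cptac"):
    """Return a short status message comparing installed and released versions."""
    current = installed_version(package)
    if current is None:
        return f"{package} is not installed."
    latest = latest_pypi_version(package)
    if is_outdated(current, latest):
        return (f"{package} {current} is out of date (latest is {latest}). "
                f"Run 'pip install --upgrade {package}'.")
    return f"{package} {current} is up to date."
```

Calling print(check_for_update()) prints a one-line status message. The versions are compared as integer tuples rather than strings so that 0.10.0 is correctly treated as newer than 0.9.0.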
Each time we release a new version of the package, we publish it on PyPI and also post a release page on GitHub. You can use GitHub's "Watch" feature to get an email whenever we do this. Log in to GitHub, browse to the main page of our repository, click the "Watch" button in the upper right corner of the page, and select the "Releases only" option from the drop-down menu. You will then get an email every time we release a new version of the package.
Periodically, we release data updates for individual datasets. cptac checks for these automatically whenever you load a dataset. Suppose you don't specify a version when loading, and the newest data version you have installed is not the latest released version. In that case, cptac raises an exception. The error message tells you how to download the new data version.
Note: The error information below is rather long. This is because Jupyter Notebooks automatically print the entire stack trace that accompanies an error. The informative error message is at the bottom. If you were using cptac on the command line or in a script, only the informative error message at the bottom would be printed.
gb = cptac.Gbm()
---------------------------------------------------------------------------
AmbiguousLatestError                      Traceback (most recent call last)
<ipython-input-2-c077f699f23e> in <module>
----> 1 gb = cptac.Gbm()

~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/gbm.py in __init__(self, version)
     57 }
     58
---> 59 super().__init__(cancer_type="gbm", version=version, valid_versions=valid_versions, data_files=data_files)
     60
     61 # Load the data into dataframes in the self._data dict

~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/dataset.py in __init__(self, cancer_type, version, valid_versions, data_files)
     43
     44 # Validate the version
---> 45 self._version = validate_version(version, self._cancer_type, use_context="init", valid_versions=valid_versions)
     46
     47 # Get the paths to the data files

~/anaconda3/envs/cptac-dev/lib/python3.7/site-packages/cptac/file_tools.py in validate_version(version, dataset, use_context, valid_versions)
     67 return_version = index_latest
     68 elif use_context == "init":
---> 69 raise AmbiguousLatestError(f"You requested to load the {dataset} dataset. Latest version is {index_latest}, which is not installed locally. To download it, run \"cptac.download(dataset='{dataset}')\". You will then be able to load the latest version of the dataset. To skip this and instead load the older version that is already installed, call \"cptac.{dataset.title()}(version='{latest_installed}')\".")
     70 else:
     71 raise InvalidParameterError(f"{version} is an invalid version for the {dataset} dataset. Valid versions: {', '.join(index.keys())}")

AmbiguousLatestError: You requested to load the gbm dataset. Latest version is 2.0, which is not installed locally. To download it, run "cptac.download(dataset='gbm')". You will then be able to load the latest version of the dataset. To skip this and instead load the older version that is already installed, call "cptac.Gbm(version='1.0')".
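To make the rule concrete, here is a simplified model of the decision that produced this error. It is loosely based on the validate_version logic visible in the traceback above. However, the function resolve_version, its parameters, and the exception class defined here are hypothetical stand-ins, not cptac's actual code.

```python
class AmbiguousLatestError(Exception):
    """Hypothetical stand-in for the error cptac raises when newer data exists."""


def resolve_version(requested, released, installed):
    """Decide which data version to load (simplified model, not cptac's code).

    requested: a version string, or None if the user did not specify one
    released:  list of all released versions, oldest first
    installed: set of versions already downloaded locally
    """
    latest = released[-1]
    if requested is None:
        if latest in installed:
            return latest
        # Newer data exists but is not downloaded: refuse to silently
        # load an older version, and point the user at their options.
        local = [v for v in released if v in installed]
        newest_local = local[-1] if local else None
        raise AmbiguousLatestError(
            f"Latest version is {latest}, which is not installed locally. "
            f"Download it, or explicitly request version {newest_local!r}."
        )
    if requested not in released:
        raise ValueError(
            f"{requested} is not a valid version. Valid versions: {', '.join(released)}"
        )
    if requested not in installed:
        raise FileNotFoundError(f"Version {requested} is not installed locally.")
    return requested
```

The key design point is that an unspecified version never silently falls back to older data. Either the latest version is loaded, or the user must make an explicit choice.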
To download the new data version, run the cptac.download function as the error message directs. cptac will notify you that it is downloading new data.
cptac.download(dataset="gbm", version="latest")
True
You can then load the dataset, and cptac will automatically load the latest data version.
gb = cptac.Gbm()
gb.version()
cptac warning: The GBM dataset is under publication embargo until March 01, 2021. CPTAC is a community resource project and data are made available rapidly after generation for community research use. The embargo allows exploring and utilizing the data, but analysis may not be published until after the embargo date. Please see https://proteomics.cancer.gov/data-portal/about/data-use-agreement or enter cptac.embargo() to open the webpage for more details. (C:\Users\humbe\miniconda3\lib\site-packages\ipykernel_launcher.py, line 1)
'3.0'
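In a script, you may prefer to handle a data update automatically rather than reading the error and running cptac.download by hand. The helper below is a hypothetical sketch, not part of cptac. It tries to load a dataset. If loading fails with the "newer data available" error, it downloads the latest data and tries again. The caller passes in the error class. In the run above that class is named AmbiguousLatestError, but check where your installed cptac version defines it before importing it.

```python
def load_latest(dataset_cls, dataset_name, download, newer_data_error):
    """Load a dataset, downloading the latest data version first if needed.

    Hypothetical helper. Expected arguments, by example:
    dataset_cls:      cptac.Gbm
    dataset_name:     "gbm" (the name cptac.download expects)
    download:         cptac.download
    newer_data_error: the exception class raised when newer data exists
    """
    try:
        return dataset_cls()
    except newer_data_error as err:
        # Only the "newer data available" error triggers a download;
        # any other exception propagates unchanged.
        print(f"Newer data found, downloading: {err}")
        download(dataset=dataset_name, version="latest")
        return dataset_cls()
```

For example, load_latest(cptac.Gbm, "gbm", cptac.download, AmbiguousLatestError) returns a loaded Gbm object with the newest data. The caller passes in the error class rather than the helper catching every Exception. That way, unrelated failures, such as a corrupted data file, do not trigger a download.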
After you update a dataset, you can still access old versions of the data. This is helpful if, for example, you want to compare your analyses across data versions. To load an older version of the data, pass the desired version number to the version parameter when loading the dataset:
gb = cptac.Gbm(version="1.0")
gb.version()
Loading gbm v1.0...
cptac warning: Old gbm data version. Latest is 3.0. This is 1.0. (C:\Users\humbe\miniconda3\lib\site-packages\ipykernel_launcher.py, line 1)
cptac warning: The GBM dataset is under publication embargo until March 01, 2021. CPTAC is a community resource project and data are made available rapidly after generation for community research use. The embargo allows exploring and utilizing the data, but analysis may not be published until after the embargo date. Please see https://proteomics.cancer.gov/data-portal/about/data-use-agreement or enter cptac.embargo() to open the webpage for more details. (C:\Users\humbe\miniconda3\lib\site-packages\ipykernel_launcher.py, line 1)
'1.0'
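Because old versions stay loadable, you can check how a data update changed your inputs before rerunning an analysis. The sketch below compares which samples appear in a table across two loaded versions. It assumes each dataset object has a getter such as get_proteomics() that returns a pandas DataFrame indexed by sample ID. compare_samples itself is a hypothetical helper, not part of cptac.

```python
def compare_samples(old, new, getter="get_proteomics"):
    """Report samples added or removed between two versions of a dataset.

    Hypothetical helper.
    old, new: dataset objects, e.g. cptac.Gbm(version="1.0") and cptac.Gbm()
    getter:   name of a method returning a table whose index holds sample IDs
    """
    old_ids = set(getattr(old, getter)().index)
    new_ids = set(getattr(new, getter)().index)
    return {
        "added": sorted(new_ids - old_ids),    # only in the newer version
        "removed": sorted(old_ids - new_ids),  # only in the older version
        "shared": len(old_ids & new_ids),      # present in both
    }
```

For example, compare_samples(cptac.Gbm(version="1.0"), cptac.Gbm()) reports which samples changed between version 1.0 and the latest installed version. Loading the old version prints the same "Old gbm data version" warning shown above.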