Check out more notebooks at our Community Notebooks Repository!
Title: How to work with cloud storage
Author: David L Gibbs
Created: 2019-07-17
Purpose: Demonstrate how to move files in and out of Google Cloud Storage (GCS).
Documentation at: https://cloud.google.com/storage/docs/listing-objects
# with gcloud, we can authenticate ourselves
!gcloud auth login
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&prompt=select_account&response_type=code&client_id=32555940559.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&access_type=offline
Opening in existing browser session.
WARNING: `gcloud auth login` no longer writes application default credentials.
If you need to use ADC, see:
gcloud auth application-default --help
You are now logged in as [dgibbs@systemsbiology.org].
Your current project is [isb-cgc-02-0001]. You can change this setting by running:
$ gcloud config set project PROJECT_ID
# and we can set our project (substitute your own project ID for the PROJECT_ID placeholder)
!gcloud config set project PROJECT_ID
ERROR: (gcloud.config.set) The project property must be set to a valid project ID, not the project name [PROJECT_ID]
To set your project, run:
$ gcloud config set project PROJECT_ID
or to unset it, run:
$ gcloud config unset project
Using a 'bang', or the exclamation point (!), we can run command-line commands from a notebook cell. This includes file operations like 'ls', 'mv', and 'cp', as well as Google's tools such as 'gsutil'.
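Outside a notebook, the same shell commands can be run from plain Python with the standard library; a minimal sketch using subprocess (equivalent to `!ls -lha` below):

```python
import subprocess

# Run `ls -lha` and capture its output, as `!ls -lha` does in a notebook cell.
result = subprocess.run(["ls", "-lha"], capture_output=True, text=True, check=True)
print(result.stdout)
```

`check=True` raises an exception if the command exits non-zero, which is stricter than the notebook's `!` (which silently ignores failures).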
# here we get a list of files stored *locally*
!ls -lha
total 2.1M
drwxr-xr-x 3 davidgibbs davidgibbs 4.0K Apr 30 12:13 .
drwxr-xr-x 8 davidgibbs davidgibbs 4.0K Apr 30 10:07 ..
-rw-r--r-- 1 davidgibbs davidgibbs 7.9K Apr 30 10:08 'BCGSC microRNA expression.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  19K Apr 30 11:06 'BRAF-V600 study using CCLE data.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs 174K Apr 30 10:08 'Copy Number segments.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs 111K Apr 30 10:45 'Creating TCGA cohorts -- part 1.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  26K Apr 30 10:08 'Creating TCGA cohorts -- part 2.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs 130K Apr 30 10:08 'Creating TCGA cohorts -- part 3.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  65K Apr 30 10:08 'DNA Methylation.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  15K Apr 30 12:13 how_to_move_files.ipynb
drwxr-xr-x 2 davidgibbs davidgibbs 4.0K Apr 30 11:08 .ipynb_checkpoints
-rw-r--r-- 1 davidgibbs davidgibbs 362K Apr 30 10:08 isb_cgc_bam_slicing_with_pysam.ipynb
-rw-r--r-- 1 davidgibbs davidgibbs 362K Apr 30 10:08 ISB_cgc_bam_slicing_with_pysam.ipynb
-rw-r--r-- 1 davidgibbs davidgibbs 116K Apr 30 10:08 ISB_CGC_Query_of_the_Month_November_2018.ipynb
-rw-r--r-- 1 davidgibbs davidgibbs  22K Apr 30 10:08 'Protein expression.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs 2.5K Apr 30 10:08 README.md
-rw-r--r-- 1 davidgibbs davidgibbs 382K Apr 30 10:08 RegulomeExplorer_1_Gexpr_CNV.ipynb
-rw-r--r-- 1 davidgibbs davidgibbs  28K Apr 30 10:46 renamed_test.bam
-rw-r--r-- 1 davidgibbs davidgibbs 106K Apr 30 10:08 'Somatic Mutations.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  40K Apr 30 10:08 'TCGA Annotations.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  15K Apr 30 10:08 'The ISB-CGC open-access TCGA tables in BigQuery.ipynb'
-rw-r--r-- 1 davidgibbs davidgibbs  51K Apr 30 10:08 'UNC HiSeq mRNAseq gene expression.ipynb'
# Now we use gsutil to list files in our bucket.
!gsutil ls gs://bam_bucket_1/
gs://bam_bucket_1/renamed_test.bam
gs://bam_bucket_1/test.bam
gs://bam_bucket_1/test_2.bam
gs://bam_bucket_1/test_3.bam
# then we can copy a file to our local environment
!gsutil cp gs://bam_bucket_1/test.bam test_dl.bam
Copying gs://bam_bucket_1/test.bam...
/ [1 files][ 27.5 KiB/ 27.5 KiB]
Operation completed over 1 objects/27.5 KiB.
# and did it make it?
!ls -lha | grep test
-rw-r--r-- 1 davidgibbs davidgibbs 28K Apr 30 10:46 renamed_test.bam
-rw-r--r-- 1 davidgibbs davidgibbs 28K May  1 10:47 test_dl.bam
# rename it, then copy it back to our bucket
!mv test_dl.bam renamed_test.bam
!gsutil cp renamed_test.bam gs://bam_bucket_1/renamed_test.bam
Copying file://renamed_test.bam [Content-Type=application/octet-stream]...
/ [1 files][ 27.5 KiB/ 27.5 KiB]
Operation completed over 1 objects/27.5 KiB.
# to install the google-cloud-storage client library (uncomment if needed); it is imported further below
#!pip3 install --upgrade --user google-cloud-storage
!gcloud auth application-default login
Your browser has been opened to visit:
https://accounts.google.com/o/oauth2/auth?redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&prompt=select_account&response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&access_type=offline
Opening in existing browser session.
Credentials saved to file: [/home/davidgibbs/.config/gcloud/application_default_credentials.json]
These credentials will be used by any library that requests Application Default Credentials.
To generate an access token for other uses, run:
gcloud auth application-default print-access-token
# note: `!export ...` runs in a throwaway subshell and does NOT affect the notebook's
# Python kernel; set the environment variable from Python instead:
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/davidgibbs/.config/gcloud/application_default_credentials.json"
import google.auth
import google.cloud.storage as storage
credentials, project = google.auth.default()
/home/davidgibbs/.local/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
# connection to our project
storage_client = storage.Client()
/home/davidgibbs/.local/lib/python3.6/site-packages/google/auth/_default.py:66: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
for b in storage_client.list_buckets():
print(b)
<Bucket: artifacts.isb-cgc-02-0001.appspot.com>
<Bucket: bam_bucket_1>
<Bucket: dataproc-6b064c10-086c-44db-b3b5-f14e410e0c13-us>
<Bucket: dave_scratch_cosmic_v86>
<Bucket: daves-cromwell-bucket>
<Bucket: gibbs_bucket_nov162016>
<Bucket: isb-cgc-02-0001>
<Bucket: isb-cgc-02-0001-datalab>
<Bucket: isb-cgc-02-0001-scratch>
<Bucket: isb-cgc-02-0001-workflows>
<Bucket: isb_dataproc_oct28>
<Bucket: may_2018_qotm>
<Bucket: pancan_staging>
<Bucket: public_bucket_for_data_file_lists>
<Bucket: qotm_nov>
<Bucket: qotm_oct_2018>
<Bucket: qotm_oct_20182018-10-29-23-47-53>
<Bucket: qotm_oct_20182018-10-31-23-34-51>
<Bucket: smr-workspace-mlengine>
<Bucket: test_bucket_888>
<Bucket: us.artifacts.isb-cgc-02-0001.appspot.com>
<Bucket: vm-config.isb-cgc-02-0001.appspot.com>
<Bucket: vm-containers.isb-cgc-02-0001.appspot.com>
<Bucket: wild_new_bucket>
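Bucket names are globally unique across all of GCS, so creating a bucket fails if the name is taken by anyone, not just by you. A minimal sketch for checking availability first, assuming Application Default Credentials are configured (the helper name is ours; the import is deferred so the sketch parses without the library installed):

```python
def bucket_is_available(name):
    """Return True if no bucket with this (globally unique) name exists."""
    from google.cloud import storage  # deferred: only needed when actually called
    client = storage.Client()
    # Bucket.exists() issues a GET against the bucket resource.
    return not client.bucket(name).exists()
```

Note that even if this returns True, a subsequent create can still race with someone else claiming the name, so `create_bucket` should be wrapped in error handling in real code.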
# here we'll create a bucket
bucket = storage_client.create_bucket('wild_new_bucket_2000')
print('Bucket {} created'.format(bucket.name))
Bucket wild_new_bucket_2000 created
# and then upload a file to it
bucket = storage_client.get_bucket('wild_new_bucket_2000')
blob = bucket.blob('test_dl_upload.bam')
blob.upload_from_filename('renamed_test.bam')
# and check that it made it
blobs = bucket.list_blobs()
for blob in blobs:
print(blob.name)
test_dl_upload.bam
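The reverse direction works the same way through the client library. A minimal sketch for downloading the object back and cleaning up the demo bucket, assuming Application Default Credentials are configured (the helper names are ours; imports are deferred so the sketch parses without the library installed):

```python
def download_blob(bucket_name, blob_name, dest_path):
    """Copy one object out of GCS to a local file."""
    from google.cloud import storage  # deferred: only needed when actually called
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).download_to_filename(dest_path)

def delete_demo_bucket(bucket_name):
    """Delete every object in the bucket, then the bucket itself
    (a bucket must be empty before it can be deleted)."""
    from google.cloud import storage
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    for blob in bucket.list_blobs():
        blob.delete()
    bucket.delete()

# usage (requires credentials and the bucket created above):
# download_blob('wild_new_bucket_2000', 'test_dl_upload.bam', 'test_roundtrip.bam')
# delete_demo_bucket('wild_new_bucket_2000')
```

Cleaning up demo buckets when you are done avoids surprise storage charges.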
# end of notebook