The DataCollections class queries for datasets (collections in NASA terminology) using a variety of criteria. The basic ones are the spatiotemporal parameters, but we can also search by data center (DAAC), by dataset version, or by whether the data is cloud hosted.
This notebook provides some examples of how to search for datasets using different parameters.
Collection search parameters
dataset origin and location
spatio temporal parameters
dataset metadata parameters
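Once the query has been formed with one or more search parameters, we can get the results by using either hits() or get(). hits() returns the number of collections that match the query, and get() returns their metadata. We can pass a limit, as in get(10); if we do not specify one, the default number is 2000.
from earthaccess import DataCollections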
# We only need to specify the DAAC and whether we're looking for cloud hosted data
query = DataCollections().daac("LPDAAC").cloud_hosted(False)
# we use hits to get a count for the collections that match our query
query.hits()
# Now we get the collections' metadata
collections = query.get(10)
# let's print only the first collection, uncomment the next line
# collections[0]
# We can print a small summary of the dataset, here for the first 10 collections
summaries = [collection.summary() for collection in collections]
summaries
Note: Some DAACs don't have cloud hosted collections yet; others have cloud collections but do not allow direct access.
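Because get() can return up to 2000 collections by default, it can help to look at hits() before fetching anything. The following is a minimal sketch of that pattern. `fetch_with_cap` and its `cap` parameter are hypothetical names introduced here for illustration and are not part of earthaccess; the function only assumes that the query object exposes hits() and get(limit) as used above.

```python
def fetch_with_cap(query, cap=10):
    """Return (total_matches, results), fetching at most `cap` results.

    `query` is assumed to be any object exposing hits() and get(limit),
    such as the DataCollections queries built in this notebook.
    `fetch_with_cap` itself is a hypothetical helper, not an earthaccess API.
    """
    total = query.hits()  # number of collections matching the query
    if total == 0:
        # Nothing matched, so skip the metadata request entirely
        return total, []
    # Never ask for more results than exist or than the cap allows
    limit = min(total, cap)
    return total, query.get(limit)
```

With a query like the one above, `total, collections = fetch_with_cap(query, cap=10)` would give the match count together with at most ten collections.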
# Now let's search using a wildcard keyword and a DAAC
# from earthaccess import DataCollections
query = DataCollections().keyword("fi*e").daac("LPDAAC")
# we use hits to get a count for the collections that match our query
query.hits()
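The `*` in "fi*e" acts as a wildcard. As a rough local analogy only, Python's standard fnmatch module applies a similar glob-style rule. This is not how the search service evaluates keywords, and the word list below is made up for illustration.

```python
from fnmatch import fnmatchcase

# Illustrative words only; these are not results from any search
words = ["fire", "file", "fine", "fires", "five", "flame"]

pattern = "fi*e"  # same pattern as in the keyword query above

# Keep the words that start with "fi" and end with "e"
matches = [w for w in words if fnmatchcase(w, pattern)]
print(matches)
```

A wildcard keyword can match more terms than the plain keyword, so its hits() count may be larger than the count for "fire" alone.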
# Now let's run the same search with the plain keyword, without the wildcard
query = DataCollections().keyword("fire").daac("LPDAAC")
# we use hits to get a count for the collections that match our query
query.hits()
# Let's get only the info on the first 10 collections and filter the fields
collections = query.get(10)
# We could print just the first collection, but do you really want to look at all the metadata?
# We can print a small summary of the dataset, here for the first 10 collections again
summaries = [collection.summary() for collection in collections]
summaries
query = DataCollections().cloud_hosted(True).bounding_box(-25.31, 63.23, -11.95, 66.65)
query.hits()
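The four numbers passed to bounding_box() above appear to be the lower-left longitude and latitude followed by the upper-right longitude and latitude, which here gives a box roughly around Iceland. A small check before building the query can catch swapped or out-of-range coordinates. `check_bbox` is a hypothetical helper written for this sketch, not part of earthaccess, and the argument order it assumes should be verified against the earthaccess documentation.

```python
def check_bbox(lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat):
    """Validate a bounding box and return it as a tuple.

    Assumes the order (west, south, east, north) in decimal degrees.
    Hypothetical helper for illustration; not an earthaccess function.
    """
    for lon in (lower_left_lon, upper_right_lon):
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
    for lat in (lower_left_lat, upper_right_lat):
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude out of range: {lat}")
    if lower_left_lat > upper_right_lat:
        raise ValueError("southern latitude is greater than northern latitude")
    # Longitude order is not checked, so that a box crossing the
    # antimeridian (west > east) is not rejected by this sketch.
    return (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)


# Same coordinates as in the query above
bbox = check_bbox(-25.31, 63.23, -11.95, 66.65)
```

The validated tuple could then be unpacked into the query, for example `DataCollections().cloud_hosted(True).bounding_box(*bbox)`.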
query = (
    DataCollections()
    .cloud_hosted(True)
    .short_name("ECCO_L4_GMSL_TIME_SERIES_MONTHLY_V4R4")
)
for c in query.get(40):
    print(c.summary(), "\n")