This module defines the basic `DataBunch` object that is used inside `Learner` to train a model. This is the generic class, which can take any kind of fastai `Dataset` or `DataLoader`. You'll find helper functions in the data module of every application that create this `DataBunch` directly for you.
from fastai.gen_doc.nbdoc import *
from fastai.basic_data import *
show_doc(DataBunch, doc_string=False)
`class DataBunch(train_dl:DataLoader, valid_dl:DataLoader, test_dl:Optional[DataLoader]=None, device:device=None, tfms:Optional[Collection[Callable]]=None, path:PathOrStr='.', collate_fn:Callable='data_collate')`
Binds together a `train_dl`, a `valid_dl` and optionally a `test_dl`, ensures they are on `device`, and applies `tfms` to them as batches are drawn. `path` is used internally to store temporary files. `collate_fn` is passed to the PyTorch `DataLoader` (replacing the one already there) to specify how to collate the samples picked for a batch. By default, it applies `data` to the objects sent (see `vision.image` for why this can be important).
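To make the role of a collate function concrete, here is a minimal pure-Python sketch. It is not fastai's `data_collate`: `ToyItem`, `to_raw` and `toy_collate` are hypothetical names, and reading "applies `data`" as "unwrap each object's `.data` attribute" is an assumption made for illustration.

```python
from typing import Any, List, Sequence, Tuple


class ToyItem:
    """Hypothetical wrapper object that keeps its raw values in `.data`."""

    def __init__(self, data: Any) -> None:
        self.data = data


def to_raw(obj: Any) -> Any:
    # Assumption: wrapped objects expose raw values via `.data`;
    # anything else (e.g. an int label) is passed through unchanged.
    return obj.data if isinstance(obj, ToyItem) else obj


def toy_collate(samples: Sequence[Tuple[Any, Any]]) -> Tuple[List[Any], List[Any]]:
    """Turn a list of (input, target) samples into (inputs, targets)."""
    xs = [to_raw(x) for x, _ in samples]
    ys = [to_raw(y) for _, y in samples]
    return xs, ys


samples = [(ToyItem([0.1, 0.2]), 0), (ToyItem([0.3, 0.4]), 1)]
xs, ys = toy_collate(samples)
```

A real collate function would typically also stack the unwrapped values into tensors; that step is left out here to keep the sketch dependency-free.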
An example of `tfms` to pass is normalization. `train_dl`, `valid_dl` and optionally `test_dl` will be wrapped in `DeviceDataLoader`.
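The idea of applying `tfms` such as normalization as each batch is drawn can be sketched without any library. `ToyTfmLoader` and `make_normalize` are hypothetical names, batches are plain lists of floats, and device placement is deliberately left out, so this only illustrates the wrapping pattern rather than the actual `DeviceDataLoader`.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Batch = List[float]


class ToyTfmLoader:
    """Hypothetical stand-in: wraps an iterable of batches and applies tfms lazily."""

    def __init__(self, batches: Iterable[Batch],
                 tfms: Sequence[Callable[[Batch], Batch]] = ()) -> None:
        self.batches = batches
        self.tfms = list(tfms)

    def __iter__(self) -> Iterator[Batch]:
        for batch in self.batches:
            # Each transform runs only when the batch is actually drawn.
            for tfm in self.tfms:
                batch = tfm(batch)
            yield batch


def make_normalize(mean: float, std: float) -> Callable[[Batch], Batch]:
    """Build a batch transform that subtracts `mean` and divides by `std`."""
    def _normalize(batch: Batch) -> Batch:
        return [(x - mean) / std for x in batch]
    return _normalize


train_batches = [[1.0, 3.0], [5.0, 7.0]]
loader = ToyTfmLoader(train_batches, tfms=[make_normalize(mean=4.0, std=2.0)])
normalized = list(loader)  # the underlying batches are left untouched
```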
show_doc(DataBunch.create, doc_string=False)
Create a `DataBunch` from `train_ds`, `valid_ds` and optionally `test_ds`, with batch size `bs` and `num_workers` worker processes. `tfms` and `device` are passed to the init method.
show_doc(DataBunch.dl)
`dl(ds_type:DatasetType=<DatasetType.Valid: 2>) → DeviceDataLoader`
Returns the appropriate `DeviceDataLoader` for validation, training, or test (`ds_type`).
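Selecting a loader by `ds_type` can be sketched as a simple dispatch on an enum. `ToyDatasetType` and `ToyBunch` are hypothetical stand-ins; only `Valid = 2` is taken from the signature above, while the values of `Train` and `Test` are assumptions based on their ordering.

```python
from enum import Enum
from typing import Any, Optional


class ToyDatasetType(Enum):
    Train = 1  # assumed value
    Valid = 2  # matches the default shown in the signature
    Test = 3   # assumed value


class ToyBunch:
    """Hypothetical container holding one loader per dataset type."""

    def __init__(self, train_dl: Any, valid_dl: Any, test_dl: Optional[Any] = None) -> None:
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def dl(self, ds_type: ToyDatasetType = ToyDatasetType.Valid) -> Any:
        # The validation loader is the default, mirroring the signature.
        if ds_type == ToyDatasetType.Train:
            return self.train_dl
        if ds_type == ToyDatasetType.Test:
            return self.test_dl
        return self.valid_dl


bunch = ToyBunch(train_dl=["train batch"], valid_dl=["valid batch"])
```

In this sketch a missing test loader simply comes back as `None`; how the real class handles that case is not stated in the text above.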
show_doc(DataBunch.add_tfm)
`add_tfm(tfm:Callable)`
Adds a transform to all dataloaders.
show_doc(DeviceDataLoader, doc_string=False)
`class DeviceDataLoader(dl:DataLoader, device:device, tfms:List[Callable]=None, collate_fn:Callable='data_collate', skip_size1:bool=False)`
Puts the batches of `dl` on `device` after applying an optional list of `tfms`. `collate_fn` replaces the one in `dl`. All dataloaders of a `DataBunch` are of this type.
show_doc(DeviceDataLoader.create, doc_string=False)
Create a `DeviceDataLoader` on `device` from a `dataset` with batch size `bs`, `num_workers` processes and a given `collate_fn`. The dataloader will `shuffle` the data if that flag is set to True, and `tfms` are passed to the init method. All `kwargs` are passed to the PyTorch `DataLoader` class initialization.
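The effect of `bs` and `shuffle` can be illustrated with a small, dependency-free sketch. `toy_create` is a hypothetical helper and its `seed` argument is an addition for reproducibility, not a parameter of the method described above; worker processes, `collate_fn` and device placement are not modelled.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def toy_create(dataset: Sequence[T], bs: int, shuffle: bool = False,
               seed: Optional[int] = None) -> List[List[T]]:
    """Split `dataset` into batches of at most `bs` items, optionally shuffled."""
    if bs < 1:
        raise ValueError("bs must be a positive integer")
    idxs = list(range(len(dataset)))
    if shuffle:
        # Shuffle indices rather than the data so the dataset itself is untouched.
        random.Random(seed).shuffle(idxs)
    return [[dataset[i] for i in idxs[start:start + bs]]
            for start in range(0, len(idxs), bs)]


dataset = list("abcdef")
ordered = toy_create(dataset, bs=4)
shuffled = toy_create(dataset, bs=4, shuffle=True, seed=0)
```

Shuffling changes only the order in which items land in batches: every item still appears exactly once, and the last batch may be smaller than `bs`.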
show_doc(DeviceDataLoader.one_batch)
show_doc(DeviceDataLoader.add_tfm)
`add_tfm(tfm:Callable)`
Add a transform (i.e. same as `self.tfms.append(tfm)`).
show_doc(DeviceDataLoader.remove_tfm)
`remove_tfm(tfm:Callable)`
Remove a transform.
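Adding and removing transforms amounts to editing a list. The sketch below uses a hypothetical `ToyLoader`; `add_tfm` mirrors the `self.tfms.append(tfm)` behaviour stated above, whereas ignoring a transform that is not present in `remove_tfm` is an assumption.

```python
from typing import Callable, List, Optional

Batch = List[int]
Tfm = Callable[[Batch], Batch]


class ToyLoader:
    """Hypothetical loader that only tracks its list of transforms."""

    def __init__(self, tfms: Optional[List[Tfm]] = None) -> None:
        self.tfms: List[Tfm] = list(tfms or [])

    def add_tfm(self, tfm: Tfm) -> None:
        self.tfms.append(tfm)

    def remove_tfm(self, tfm: Tfm) -> None:
        # Assumption: removing a transform that was never added is a no-op.
        if tfm in self.tfms:
            self.tfms.remove(tfm)


def double(batch: Batch) -> Batch:
    return [2 * x for x in batch]


def add_one(batch: Batch) -> Batch:
    return [x + 1 for x in batch]


loader = ToyLoader()
loader.add_tfm(double)
loader.add_tfm(add_one)
names_before = [t.__name__ for t in loader.tfms]
loader.remove_tfm(double)
names_after = [t.__name__ for t in loader.tfms]
```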
show_doc(DatasetType, doc_string=False)
`Enum` = [Train, Valid, Test]
Internal enumeration used to name the training, validation and test datasets/dataloaders.
show_doc(DatasetBase, title_level=3)
show_doc(LabelDataset, title_level=3)
`class LabelDataset(classes:Collection, class2idx:Dict[Any,int]=None) :: DatasetBase`
Base class for fastai datasets that do classification, mapped according to `classes`.
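The relationship between `classes` and `class2idx` can be sketched as follows. `build_class2idx` is a hypothetical helper, and deriving the mapping from the position of each class when `class2idx` is `None` is an assumption about the default, not something stated above.

```python
from typing import Any, Collection, Dict, Optional


def build_class2idx(classes: Collection[Any],
                    class2idx: Optional[Dict[Any, int]] = None) -> Dict[Any, int]:
    """Return a class -> index mapping, honouring an explicit one if given."""
    if class2idx is not None:
        return dict(class2idx)
    # Assumed default: index each class by its position in `classes`.
    return {c: i for i, c in enumerate(classes)}


classes = ["cat", "dog", "fish"]
default_map = build_class2idx(classes)
custom_map = build_class2idx(classes, {"cat": 2, "dog": 0, "fish": 1})
labels = [default_map[c] for c in ["dog", "dog", "cat"]]
```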
show_doc(SingleClassificationDataset, title_level=3)
`class SingleClassificationDataset(classes:StrList) :: DatasetBase`
A `Dataset` that contains no data, only `classes`, mainly used for inference with `set_item`.
show_doc(DeviceDataLoader.proc_batch)
show_doc(DeviceDataLoader.collate_fn)