Shark Data Containers Quick Reference
Relevant Types
Data, UnlabeledData, LabeledData (also the typedefs ClassificationDataset, CompressedClassificationDataset, RegressionDataset), DataView, DataDistribution, LabeledDataDistribution, CVFolds.
Container / View Creation
Data<T>() | create empty data container | Dataset.h |
Data<T>(data) | create shallow copy with content sharing | Dataset.h |
Data<T>(N) | create new data container with N batches | Dataset.h |
Data<T>(N, elem) | create new data container with N elements, with blueprint elem | Dataset.h |
UnlabeledData<T>() | create empty data container | Dataset.h |
UnlabeledData<T>(data) | create shallow copy with content sharing | Dataset.h |
UnlabeledData<T>(N) | create new data container with N batches | Dataset.h |
UnlabeledData<T>(N, elem) | create new data container with N elements, with blueprint elem | Dataset.h |
LabeledData<I,L>() | create empty data container | Dataset.h |
LabeledData<I,L>(input, labels) | create shallow copy with content sharing | Dataset.h |
LabeledData<I,L>(N) | create new data container with N batches | Dataset.h |
LabeledData<I,L>(N, elem) | create new data container with N elements, with blueprint elem | Dataset.h |
DataView<DatasetType>(data) | create view of data for fast random access to elements | DataView.h |
createDataFromRange() | create from begin+end iterators, e.g., from std::vector | Dataset.h |
createLabeledDataFromRange() | create from two ranges for inputs and labels | Dataset.h |
toDataset() | create data container from view | DataView.h |
Batch Access
data.empty() | true iff data.numberOfBatches() == 0 | Dataset.h |
data.numberOfBatches() | number of batches in the container | Dataset.h |
data.batch(i) | (reference to) the i-th batch | Dataset.h |
data.batches() | STL-compliant access to batches as a range | Dataset.h |
Element Access
Warning
Random access to elements is a linear-time operation! Never iterate over elements by index. Consider employing a DataView for random access.
data.numberOfElements() | number of elements in the container | Dataset.h |
data.element(i) | (proxy to) the i-th element | Dataset.h |
data.elements() | STL-compliant access to (proxies to) elements as a range | Dataset.h |
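The warning about element access can be illustrated with a toy model. The sketch below does not use Shark: ToyBatchedData and ToyView are hypothetical stand-ins, written under the assumption that a batched container must walk its batches to locate an element, whereas a view can build a lookup table once and then answer each query in constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a batched container; NOT Shark's Data<T>.
struct ToyBatchedData {
    std::vector<std::vector<double>> batches;

    std::size_t numberOfElements() const {
        std::size_t n = 0;
        for (const auto& b : batches) n += b.size();
        return n;
    }

    // Walks the batches until the one holding element i is found,
    // so the cost of a single lookup grows with the number of batches.
    double element(std::size_t i) const {
        for (const auto& b : batches) {
            if (i < b.size()) return b[i];
            i -= b.size();
        }
        assert(false && "index out of range");
        return 0.0;
    }
};

// Toy stand-in for a view; NOT Shark's DataView.
// Builds a (batch, offset) table once, then each lookup is constant time.
struct ToyView {
    const ToyBatchedData& data;
    std::vector<std::pair<std::size_t, std::size_t>> index;

    explicit ToyView(const ToyBatchedData& d) : data(d) {
        for (std::size_t b = 0; b < d.batches.size(); ++b)
            for (std::size_t k = 0; k < d.batches[b].size(); ++k)
                index.emplace_back(b, k);
    }

    std::size_t size() const { return index.size(); }

    double operator[](std::size_t i) const {
        assert(i < index.size());
        return data.batches[index[i].first][index[i].second];
    }
};
```

In this model, a loop calling element(i) for every i repeats the batch walk on each call, which is why iterating over the elements range or going through a view is the better choice.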
Further Methods
swap() | swap container contents (constant time) | Dataset.h |
makeIndependent() | make sure data is not shared with other containers | Dataset.h |
shuffle() | randomly reorder elements (not only batches) | Dataset.h |
append(data) | concatenate containers | Dataset.h |
LabeledData::inputs() | underlying container of inputs | Dataset.h |
LabeledData::labels() | underlying container of labels | Dataset.h |
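The relationship between a shallow copy with content sharing and makeIndependent() can be sketched with a small toy class. This is not Shark code: ToySharedData and sharesStorageWith are hypothetical names, and the shared_ptr-based storage is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy container with shared storage; NOT Shark's Data<T>.
class ToySharedData {
public:
    explicit ToySharedData(std::vector<double> values)
        : m_storage(std::make_shared<std::vector<double>>(std::move(values))) {}

    // The implicit copy constructor copies only the pointer,
    // so a copy shares its content with the original.

    // Give this object a private copy of the storage if it is shared.
    void makeIndependent() {
        if (m_storage.use_count() > 1)
            m_storage = std::make_shared<std::vector<double>>(*m_storage);
    }

    bool sharesStorageWith(const ToySharedData& other) const {
        return m_storage == other.m_storage;
    }

    double& operator[](std::size_t i) {
        assert(i < m_storage->size());
        return (*m_storage)[i];
    }

private:
    std::shared_ptr<std::vector<double>> m_storage;
};
```

In this toy model, a write through one copy is visible through the other until makeIndependent() is called; afterwards the two objects no longer affect each other.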
Sizes and Dimensions
numberOfClasses() | number of classes (maximal class label + 1) | Dataset.h |
classSizes() | vector of class sizes | Dataset.h |
dataDimension() | dimension of vectors in the data set | Dataset.h |
inputDimension() | dimension of input vectors in the data set | Dataset.h |
labelDimension() | dimension of label vectors in the data set | Dataset.h |
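The definition "maximal class label + 1" has a consequence worth spelling out: a label value that never occurs still counts as a class, with size zero. The sketch below shows this on a plain vector of labels. toyNumberOfClasses and toyClassSizes are hypothetical helpers, not Shark functions, and returning 0 for an empty label set is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of classes = largest label + 1.
// Returns 0 for an empty label set (an assumption of this sketch).
std::size_t toyNumberOfClasses(const std::vector<unsigned int>& labels) {
    if (labels.empty()) return 0;
    return *std::max_element(labels.begin(), labels.end()) + 1u;
}

// Hypothetical helper: how many elements carry each label.
std::vector<std::size_t> toyClassSizes(const std::vector<unsigned int>& labels) {
    std::vector<std::size_t> sizes(toyNumberOfClasses(labels), 0);
    for (unsigned int y : labels) {
        assert(y < sizes.size());
        ++sizes[y];
    }
    return sizes;
}
```

For the labels 0, 2, 2, 3 this gives four classes, and the entry for the unused label 1 is zero.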
Subset Creation and Folds for Cross-validation
splitAtElement() | split data into front and back part (often training and test) | Dataset.h |
subset() | create indexed subset from DataView | DataView.h |
createCVIID() | create folds by i.i.d. assignment of elements to folds | CVDatasetTools.h |
createCVSameSize() | create folds of roughly equal size | CVDatasetTools.h |
createCVSameSizeBalanced() | create folds of roughly equal size, stratifying classes | CVDatasetTools.h |
createCVIndexed() | create folds explicitly by index | CVDatasetTools.h |
createCVFullyIndexed() | create folds explicitly by index with reordering | CVDatasetTools.h |
Data::splice() | split data at batch boundaries (inverse of append) | Dataset.h |
indexedSubset() | obtain subset of batches from indices | Dataset.h |
rangeSubset() | obtain subset of batches from range | Dataset.h |
selectFeatures() | filter out a subset of features from Data | Dataset.h |
selectInputFeatures() | filter out a subset of features from LabeledData | Dataset.h |
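What "folds of roughly equal size" can mean is sketched below: fold sizes that differ by at most one element. This illustrates the idea only and makes no claim about how createCVSameSize() is implemented. toyFoldSizes and toyAssignFolds are hypothetical helpers, and the assignment of consecutive elements to the same fold is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes of numFolds folds over numElements elements,
// chosen so that any two folds differ in size by at most one.
std::vector<std::size_t> toyFoldSizes(std::size_t numElements, std::size_t numFolds) {
    assert(numFolds > 0);
    std::vector<std::size_t> sizes(numFolds, numElements / numFolds);
    // Hand the leftover elements to the first folds, one each.
    for (std::size_t f = 0; f < numElements % numFolds; ++f) ++sizes[f];
    return sizes;
}

// Hypothetical helper: fold index for each element, filling folds in order.
std::vector<std::size_t> toyAssignFolds(std::size_t numElements, std::size_t numFolds) {
    const std::vector<std::size_t> sizes = toyFoldSizes(numElements, numFolds);
    std::vector<std::size_t> fold;
    fold.reserve(numElements);
    for (std::size_t f = 0; f < numFolds; ++f)
        for (std::size_t k = 0; k < sizes[f]; ++k)
            fold.push_back(f);
    return fold;
}
```

An i.i.d. assignment would instead draw each element's fold at random, so fold sizes can vary more; a stratified variant would apply the same-size scheme within each class.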
Import / Export
importCSV() | import from comma separated values (CSV) file | Csv.h |
exportCSV() | export to comma separated values (CSV) file | Csv.h |
importSparseData() | import from sparse vector (libSVM) format | SparseData.h |
exportSparseData() | export to sparse vector (libSVM) format | SparseData.h |
importHDF5() | import from HDF5 file used by mldata.org | HDF5.h |
importPGM() | import single PGM image | Pgm.h |
importPGMDir() | import directory of PGM images | Pgm.h |
importPGMSet() | import set of PGM images | Pgm.h |
exportPGM() | export single PGM image | Pgm.h |