Shark Data Containers Quick Reference

Relevant Types

Data, UnlabeledData, LabeledData (also the typedefs ClassificationDataset, CompressedClassificationDataset, RegressionDataset), DataView, DataDistribution, LabeledDataDistribution, CVFolds.

Container / View Creation

Data<T>()                        create empty data container                                     Dataset.h
Data<T>(data)                    create shallow copy with content sharing                        Dataset.h
Data<T>(N)                       create new data container with N batches                        Dataset.h
Data<T>(N, elem)                 create new data container with N elements, with blueprint elem  Dataset.h
UnlabeledData<T>()               create empty data container                                     Dataset.h
UnlabeledData<T>(data)           create shallow copy with content sharing                        Dataset.h
UnlabeledData<T>(N)              create new data container with N batches                        Dataset.h
UnlabeledData<T>(N, elem)        create new data container with N elements, with blueprint elem  Dataset.h
LabeledData<I,L>()               create empty data container                                     Dataset.h
LabeledData<I,L>(input, labels)  create shallow copy with content sharing                        Dataset.h
LabeledData<I,L>(N)              create new data container with N batches                        Dataset.h
LabeledData<I,L>(N, elem)        create new data container with N elements, with blueprint elem  Dataset.h
DataView<DatasetType>(data)      create view of data for fast random access to elements          DataView.h
createDataFromRange()            create from begin+end iterators, e.g., from std::vector         Dataset.h
createLabeledDataFromRange()     create from two ranges for inputs and labels                    Dataset.h
toDataset()                      create data container from view                                 DataView.h

Batch Access

data.empty()            true iff data.numberOfBatches() == 0        Dataset.h
data.numberOfBatches()  number of batches in the container          Dataset.h
data.batch(i)           (reference to) the i-th batch               Dataset.h
data.batches()          STL-compliant access to batches as a range  Dataset.h

Element Access

Warning

Random access to elements is a linear-time operation! Never iterate over elements by index; use data.elements() for sequential access, and consider a DataView when random access is needed.

data.numberOfElements()  number of elements in the container                       Dataset.h
data.element(i)          (proxy to) the i-th element                               Dataset.h
data.elements()          STL-compliant access to (proxies to) elements as a range  Dataset.h
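
The cost difference behind the warning above can be illustrated with a small stand-alone model. The sketch below does not use Shark: ToyBatchedData and ToyView are hypothetical names, and the layout (a vector of batches plus a precomputed index) is only an assumption about why lookup by element index is linear in a batched container while lookup through a view is constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a batched container; NOT Shark's Data<T>.
struct ToyBatchedData {
    std::vector<std::vector<double>> batches;

    std::size_t numberOfElements() const {
        std::size_t n = 0;
        for (const auto& b : batches) n += b.size();
        return n;
    }

    // Linear in the number of batches: walk the batches until i falls inside one.
    double element(std::size_t i) const {
        for (const auto& b : batches) {
            if (i < b.size()) return b[i];
            i -= b.size();
        }
        assert(false && "element index out of range");
        return 0.0;
    }
};

// Toy stand-in for a view (assumed design): build a (batch, offset) table once,
// then answer every lookup with two array accesses.
struct ToyView {
    const ToyBatchedData* data;
    std::vector<std::pair<std::size_t, std::size_t>> index;

    explicit ToyView(const ToyBatchedData& d) : data(&d) {
        for (std::size_t b = 0; b < d.batches.size(); ++b)
            for (std::size_t j = 0; j < d.batches[b].size(); ++j)
                index.emplace_back(b, j);
    }

    std::size_t size() const { return index.size(); }

    double operator[](std::size_t i) const {
        assert(i < index.size());
        return data->batches[index[i].first][index[i].second];
    }
};
```

In this model, a loop calling element(i) for every i repeats the batch walk on each call, whereas ToyView pays the linear cost once at construction. The view holds a pointer to the container, so the container must outlive it.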

Further Methods

swap()                 swap container contents (constant time)             Dataset.h
makeIndependent()      make sure data is not shared with other containers  Dataset.h
shuffle()              randomly reorder elements (not only batches)        Dataset.h
append(data)           concatenate containers                              Dataset.h
LabeledData::inputs()  underlying container of inputs                      Dataset.h
LabeledData::labels()  underlying container of labels                      Dataset.h
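
To make "shallow copy with content sharing" and makeIndependent() more concrete, here is a minimal stand-alone model. It does not use Shark. ToySharedData is a hypothetical type, and the use of std::shared_ptr per batch is an assumption chosen for illustration, not a statement about how the library stores its batches.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model of content sharing between containers; NOT Shark's implementation.
struct ToySharedData {
    using Batch = std::vector<double>;

    // Copying this struct copies only the pointers, so the copy and the
    // original refer to the same underlying batches.
    std::vector<std::shared_ptr<Batch>> batches;

    // True if batch i is also referenced by some other container.
    bool isShared(std::size_t i) const {
        assert(i < batches.size());
        return batches[i].use_count() > 1;
    }

    // Replace every shared batch by a private deep copy.
    void makeIndependent() {
        for (auto& b : batches)
            if (b.use_count() > 1) b = std::make_shared<Batch>(*b);
    }
};
```

In this model a write through a shallow copy is visible in the original until makeIndependent() is called on one of the two; afterwards each container owns its batches and further writes no longer leak across.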

Sizes and Dimensions

numberOfClasses()  number of classes (maximal class label + 1)  Dataset.h
classSizes()       vector of class sizes                        Dataset.h
dataDimension()    dimension of vectors in the data set         Dataset.h
inputDimension()   dimension of input vectors in the data set   Dataset.h
labelDimension()   dimension of label vectors in the data set   Dataset.h
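
The convention "number of classes = maximal class label + 1" has a consequence worth spelling out: a label value that never occurs still counts as a class, with size zero. The stand-alone sketch below illustrates this on a plain std::vector of labels. The helper names toyNumberOfClasses and toyClassSizes are hypothetical and are not the Shark functions listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest label + 1, or 0 for an empty label set.
std::size_t toyNumberOfClasses(const std::vector<unsigned int>& labels) {
    if (labels.empty()) return 0;
    return static_cast<std::size_t>(*std::max_element(labels.begin(), labels.end())) + 1;
}

// Hypothetical helper: sizes[c] is the number of elements carrying label c.
std::vector<std::size_t> toyClassSizes(const std::vector<unsigned int>& labels) {
    std::vector<std::size_t> sizes(toyNumberOfClasses(labels), 0);
    for (unsigned int y : labels) ++sizes[y];
    return sizes;
}
```

For the labels {0, 2, 2, 3, 0, 2} this model reports four classes with sizes 2, 0, 3 and 1. Labels are therefore best kept contiguous, starting at zero.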

Subset Creation and Folds for Cross-validation

splitAtElement()            split data into front and back part (often training and test)  Dataset.h
subset()                    create indexed subset from DataView                            DataView.h
createCVIID()               create folds by i.i.d. assignment of elements to folds         CVDatasetTools.h
createCVSameSize()          create folds of roughly equal size                             CVDatasetTools.h
createCVSameSizeBalanced()  create folds of roughly equal size, stratifying classes        CVDatasetTools.h
createCVIndexed()           create folds explicitly by index                               CVDatasetTools.h
createCVFullyIndexed()      create folds explicitly by index with reordering               CVDatasetTools.h
Data::splice()              split data at batch boundaries (inverse of append)             Dataset.h
indexedSubset()             obtain subset of batches from indices                          Dataset.h
rangeSubset()               obtain subset of batches from range                            Dataset.h
selectFeatures()            select a subset of features from Data                          Dataset.h
selectInputFeatures()       select a subset of input features from LabeledData             Dataset.h
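
As an illustration of "folds of roughly equal size", the following stand-alone sketch partitions n element indices into k folds whose sizes differ by at most one, and derives the training and validation indices for one fold. It does not call Shark. toyFoldSizes, ToyFold and toyFold are hypothetical names, and assigning contiguous index ranges to folds is an assumption made for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes of k folds over n elements, differing by at most one.
std::vector<std::size_t> toyFoldSizes(std::size_t n, std::size_t k) {
    assert(k > 0);
    std::vector<std::size_t> sizes(k, n / k);
    for (std::size_t i = 0; i < n % k; ++i) ++sizes[i];  // spread the remainder
    return sizes;
}

// Index sets for a single fold.
struct ToyFold {
    std::vector<std::size_t> training;
    std::vector<std::size_t> validation;
};

// Hypothetical helper: fold f validates on its own contiguous index range
// and trains on all remaining indices.
ToyFold toyFold(std::size_t n, std::size_t k, std::size_t f) {
    assert(f < k);
    const std::vector<std::size_t> sizes = toyFoldSizes(n, k);
    std::size_t begin = 0;
    for (std::size_t i = 0; i < f; ++i) begin += sizes[i];
    const std::size_t end = begin + sizes[f];

    ToyFold fold;
    for (std::size_t i = 0; i < n; ++i)
        (i >= begin && i < end ? fold.validation : fold.training).push_back(i);
    return fold;
}
```

With n = 10 and k = 3 this model yields fold sizes 4, 3 and 3. Contiguous ranges keep whatever order the data already has, so if the elements are sorted by class, shuffling first or using a class-stratified scheme avoids folds that miss a class.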

Import / Export

importCSV()         import from comma-separated values (CSV) file  Csv.h
exportCSV()         export to comma-separated values (CSV) file    Csv.h
importSparseData()  import from sparse vector (libSVM) format      SparseData.h
exportSparseData()  export to sparse vector (libSVM) format        SparseData.h
importHDF5()        import from HDF5 file used by mldata.org       HDF5.h
importPGM()         import single PGM image                        Pgm.h
importPGMDir()      import directory of PGM images                 Pgm.h
importPGMSet()      import set of PGM images                       Pgm.h
exportPGM()         export single PGM image                        Pgm.h