Budgeted stochastic gradient descent training for kernel-based models.

#include <shark/Algorithms/Trainers/Budgeted/KernelBudgetedSGDTrainer.h>


Public Types
enum	preInitializationMethod { NONE, RANDOM }
	preinitialization methods

typedef AbstractKernelFunction< InputType >	KernelType

typedef KernelClassifier< InputType >	ClassifierType

typedef KernelExpansion< InputType >	ModelType

typedef AbstractLoss< unsigned int, RealVector >	LossType

typedef ConstProxyReference< typename Batch< InputType >::type const >::type	ConstBatchInputReference

typedef CacheType	QpFloatType

typedef LabeledData< InputType, unsigned int >::element_type	ElementType

typedef KernelMatrix< InputType, QpFloatType >	KernelMatrixType

typedef PartlyPrecomputedMatrix< KernelMatrixType >	PartlyPrecomputedMatrixType

Public Types inherited from shark::AbstractTrainer< KernelClassifier< InputType > >
typedef KernelClassifier< InputType >	ModelType

typedef ModelType::InputType	InputType

typedef typename KernelClassifier< InputType > ::OutputType	LabelType

typedef LabeledData< InputType, LabelType >	DatasetType

Public Types inherited from shark::IParameterizable<>
typedef RealVector	ParameterVectorType

Public Member Functions
	KernelBudgetedSGDTrainer (KernelType *kernel, const LossType *loss, double C, bool offset, bool unconstrained=false, size_t budgetSize=500, AbstractBudgetMaintenanceStrategy< InputType > *budgetMaintenanceStrategy=NULL, size_t epochs=1, size_t preInitializationMethod=NONE, double minMargin=1.0f)
	Constructor. Note that no cache size is involved: merging vectors always creates new ones, which makes caching largely obsolete.

size_t	budgetSize () const

void	setBudgetSize (std::size_t budgetSize)

AbstractBudgetMaintenanceStrategy< InputType > *	budgetMaintenanceStrategy () const

void	setBudgetMaintenanceStrategy (AbstractBudgetMaintenanceStrategy< InputType > *budgetMaintenanceStrategy)

double	minMargin () const

void	setMinMargin (double minMargin)

std::string	name () const
	From INameable: return the class name.

void	train (ClassifierType &classifier, const LabeledData< InputType, unsigned int > &dataset)

std::size_t	epochs () const

void	setEpochs (std::size_t value)

KernelType *	kernel ()
	get the kernel function

const KernelType *	kernel () const
	get the kernel function

void	setKernel (KernelType *kernel)
	set the kernel function

bool	isUnconstrained () const

double	C () const
	return the value of the regularization parameter

void	setC (double value)
	set the value of the regularization parameter (must be positive)

bool	trainOffset () const
	check whether the model to be trained should include an offset term

RealVector	parameterVector () const
	Returns the vector of hyper-parameters.

void	setParameterVector (RealVector const &newParameters)
	Sets the vector of hyper-parameters.

size_t	numberOfParameters () const
	Returns the number of hyper-parameters.

Public Member Functions inherited from shark::AbstractTrainer< KernelClassifier< InputType > >
virtual void	train (ModelType &model, DatasetType const &dataset)=0
	Core of the Trainer interface.

Public Member Functions inherited from shark::INameable
virtual	~INameable ()

Public Member Functions inherited from shark::ISerializable
virtual	~ISerializable ()
	Virtual d'tor.

virtual void	read (InArchive &archive)
	Read the component from the supplied archive.

virtual void	write (OutArchive &archive) const
	Write the component to the supplied archive.

void	load (InArchive &archive, unsigned int version)
	Versioned loading of components, calls read(...).

void	save (OutArchive &archive, unsigned int version) const
	Versioned storing of components, calls write(...).

	BOOST_SERIALIZATION_SPLIT_MEMBER ()

Public Member Functions inherited from shark::IParameterizable<>
virtual	~IParameterizable ()

Protected Attributes
KernelType *	m_kernel
	pointer to kernel function

const LossType *	m_loss
	pointer to loss function

double	m_C
	regularization parameter

bool	m_offset
	should the resulting model have an offset term?

bool	m_unconstrained
	should C be stored as log(C) as a parameter?

std::size_t	m_budgetSize

AbstractBudgetMaintenanceStrategy< InputType > *	m_budgetMaintenanceStrategy

std::size_t	m_epochs
	number of training epochs (sweeps over the data), or 0 for default = max(10, C)

std::size_t	m_preInitializationMethod

double	m_minMargin

Detailed Description

template<class InputType, class CacheType = float>
class shark::KernelBudgetedSGDTrainer< InputType, CacheType >

Budgeted stochastic gradient descent training for kernel-based models.

This is an implementation of the BSGD algorithm developed by Wang, Crammer and Vucetic: Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale SVM training, JMLR 2012. In essence this is Pegasos, so it is similar to a perceptron. The main difference is that the sparsity of the weight vector is restricted to a (currently predefined) number of budget vectors. Whenever this limit is reached, the trainer has to decide how to add a new vector to the model without exceeding it. Several methods have been proposed for this. The main insight of Wang et al. is to merge two budget vectors (i.e. two vectors in the model): if the first vector is chosen by the norm of its alpha coefficient, the second one can be found by solving a small optimization problem, yielding a roughly optimal pair. Merging this pair frees space in the budget for the new vector. Such strategies are called budget maintenance strategies.

This implementation owes much to the reference implementation in the BudgetedSVM software.

For the documentation of the basic SGD algorithm, please refer to KernelSGDTrainer.h. Note that the special alpha scaling used in that class was not adopted here, so this class may be numerically less robust than KernelSGDTrainer.
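The BSGD loop described above can be sketched in a few dozen lines. The following is a minimal, self-contained illustration, not Shark's implementation. It assumes binary labels in {-1, +1}, a Gaussian kernel with a hypothetical bandwidth parameter gamma, and a Pegasos regularization constant lambda in place of C (the mapping between the two is not shown). It also swaps in the simpler removal strategy (drop the budget vector with the smallest |alpha|) instead of the merging strategy of Wang et al. All names (BudgetVector, trainBSGD, decision, removeSmallest) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy types and helpers; this is not Shark's API.
struct BudgetVector {
    std::vector<double> x;  // stored input point
    double alpha;           // expansion coefficient (its sign encodes the label)
};

// Gaussian kernel exp(-gamma * ||a - b||^2).
double gaussianKernel(const std::vector<double>& a, const std::vector<double>& b, double gamma) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double diff = a[i] - b[i];
        d2 += diff * diff;
    }
    return std::exp(-gamma * d2);
}

// Decision function f(x) = sum_j alpha_j * k(x_j, x) of the budgeted expansion.
double decision(const std::vector<BudgetVector>& model, const std::vector<double>& x, double gamma) {
    double f = 0.0;
    for (const BudgetVector& v : model) f += v.alpha * gaussianKernel(v.x, x, gamma);
    return f;
}

// Removal strategy (swapped in for merging): drop the vector with the smallest |alpha|.
void removeSmallest(std::vector<BudgetVector>& model) {
    auto it = std::min_element(model.begin(), model.end(),
        [](const BudgetVector& a, const BudgetVector& b) { return std::fabs(a.alpha) < std::fabs(b.alpha); });
    model.erase(it);
}

// Pegasos-style kernel SGD with a hard limit on the number of expansion vectors.
std::vector<BudgetVector> trainBSGD(const std::vector<std::vector<double>>& X, const std::vector<int>& y,
                                    double lambda, double gamma, std::size_t budget,
                                    std::size_t iterations, double minMargin, unsigned seed) {
    assert(!X.empty() && X.size() == y.size() && lambda > 0.0);
    std::vector<BudgetVector> model;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    for (std::size_t t = 1; t <= iterations; ++t) {
        std::size_t i = pick(rng);
        double eta = 1.0 / (lambda * static_cast<double>(t));  // step size 1/(lambda t)
        double margin = y[i] * decision(model, X[i], gamma);  // margin under the current model
        for (BudgetVector& v : model) v.alpha *= 1.0 - eta * lambda;  // shrink by (1 - 1/t)
        if (margin < minMargin) {  // hinge loss is active: add a new budget vector
            model.push_back({X[i], eta * y[i]});
            if (model.size() > budget) removeSmallest(model);  // budget maintenance
        }
    }
    return model;
}
```

In this sketch a vector added at step s has coefficient 1/(lambda s) and is then shrunk by the factor s/t up to step t, so all coefficients share the magnitude 1/(lambda t). Removal by smallest |alpha| therefore has almost no information to go on, which illustrates why smarter strategies such as merging are attractive.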

Definition at line 97 of file KernelBudgetedSGDTrainer.h.

Member Typedef Documentation

◆ ClassifierType

template<class InputType , class CacheType = float>

typedef KernelClassifier<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ClassifierType

Definition at line 102 of file KernelBudgetedSGDTrainer.h.

◆ ConstBatchInputReference

template<class InputType , class CacheType = float>

typedef ConstProxyReference<typename Batch<InputType>::type const>::type shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ConstBatchInputReference

Definition at line 105 of file KernelBudgetedSGDTrainer.h.

◆ ElementType

template<class InputType , class CacheType = float>

typedef LabeledData<InputType, unsigned int>::element_type shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ElementType

Definition at line 107 of file KernelBudgetedSGDTrainer.h.

◆ KernelMatrixType

template<class InputType , class CacheType = float>

typedef KernelMatrix<InputType, QpFloatType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelMatrixType

Definition at line 109 of file KernelBudgetedSGDTrainer.h.

◆ KernelType

template<class InputType , class CacheType = float>

typedef AbstractKernelFunction<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelType

Definition at line 101 of file KernelBudgetedSGDTrainer.h.

◆ LossType

template<class InputType , class CacheType = float>

typedef AbstractLoss<unsigned int, RealVector> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::LossType

Definition at line 104 of file KernelBudgetedSGDTrainer.h.

◆ ModelType

template<class InputType , class CacheType = float>

typedef KernelExpansion<InputType> shark::KernelBudgetedSGDTrainer< InputType, CacheType >::ModelType

Definition at line 103 of file KernelBudgetedSGDTrainer.h.

◆ PartlyPrecomputedMatrixType

template<class InputType , class CacheType = float>

typedef PartlyPrecomputedMatrix< KernelMatrixType > shark::KernelBudgetedSGDTrainer< InputType, CacheType >::PartlyPrecomputedMatrixType

Definition at line 110 of file KernelBudgetedSGDTrainer.h.

◆ QpFloatType

template<class InputType , class CacheType = float>

typedef CacheType shark::KernelBudgetedSGDTrainer< InputType, CacheType >::QpFloatType

Definition at line 106 of file KernelBudgetedSGDTrainer.h.

Member Enumeration Documentation

◆ preInitializationMethod

template<class InputType , class CacheType = float>

enum shark::KernelBudgetedSGDTrainer::preInitializationMethod

preinitialization methods

Enumerator
NONE
RANDOM

Definition at line 115 of file KernelBudgetedSGDTrainer.h.

Constructor & Destructor Documentation

◆ KernelBudgetedSGDTrainer()

template<class InputType , class CacheType = float>

shark::KernelBudgetedSGDTrainer< InputType, CacheType >::KernelBudgetedSGDTrainer	(	KernelType *	kernel,
		const LossType *	loss,
		double	C,
		bool	offset,
		bool	unconstrained = `false`,
		size_t	budgetSize = `500`,
		AbstractBudgetMaintenanceStrategy< InputType > *	budgetMaintenanceStrategy = `NULL`,
		size_t	epochs = `1`,
		size_t	preInitializationMethod = `NONE`,
		double	minMargin = `1.0f`
	)

inline

Constructor. Note that no cache size is involved: merging vectors always creates new ones, which makes caching largely obsolete.

Parameters

[in]	kernel	kernel function to use for training and prediction
[in]	loss	(sub-)differentiable loss function
[in]	C	regularization parameter; always the 'true' value of C, even when unconstrained is set
[in]	offset	whether to train with an offset/bias parameter or not
[in]	unconstrained	if true, a C value given via setParameterVector is interpreted as log(C), i.e. it is piped through the exp function before the solver uses it
[in]	budgetSize	maximum number of budget vectors in the final model; the actual model may be smaller.
[in]	budgetMaintenanceStrategy	object that contains the logic for maintaining the budget size.
[in]	epochs	number of epochs the SGD solver should run. If zero is given, the number of iterations is max(10 * datasetsize, C * datasetsize), i.e. max(10, C) epochs.
[in]	preInitializationMethod	the method used to preinitialize the budget (NONE or RANDOM).
[in]	minMargin	the minimum margin every vector has to satisfy; usually 1.

Definition at line 134 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_loss, and SHARK_RUNTIME_CHECK.

Member Function Documentation

◆ budgetMaintenanceStrategy()

template<class InputType , class CacheType = float>

AbstractBudgetMaintenanceStrategy<InputType>* shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetMaintenanceStrategy ( ) const

inline

return pointer to the budget maintenance strategy

Returns: pointer to the budget maintenance strategy.

Definition at line 185 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy.

Referenced by shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetMaintenanceStrategy().

◆ budgetSize()

template<class InputType , class CacheType = float>

size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetSize ( ) const

inline

get budget size

Returns: budget size

Definition at line 167 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetSize.

Referenced by shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetSize().

◆ C()

template<class InputType , class CacheType = float>

double shark::KernelBudgetedSGDTrainer< InputType, CacheType >::C ( ) const

inline

return the value of the regularization parameter

Definition at line 437 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C.

◆ epochs()

template<class InputType , class CacheType = float>

std::size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::epochs ( ) const

inline

Return the number of training epochs. A value of 0 indicates that the default of max(10, C) should be used.

Definition at line 400 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs.
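The epochs convention (0 meaning max(10, C) epochs, i.e. max(10 * n, C * n) SGD iterations for n training points, as stated in the constructor documentation) can be illustrated with a small helper. The name sgdIterations and the rounding up of C * n for non-integer C are assumptions of this sketch, not taken from the header.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: number of SGD iterations for a dataset of n points.
std::size_t sgdIterations(std::size_t epochs, double C, std::size_t n) {
    assert(C > 0.0);  // C must be positive
    if (epochs > 0) return epochs * n;  // explicit number of sweeps over the data
    // epochs == 0: default of max(10, C) epochs, i.e. max(10 * n, C * n) iterations
    double defaultEpochs = std::max(10.0, C);
    return static_cast<std::size_t>(std::ceil(defaultEpochs * static_cast<double>(n)));
}
```

With this convention a large C automatically buys more iterations, which matches the intuition that weaker regularization needs longer training.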

◆ isUnconstrained()

template<class InputType , class CacheType = float>

bool shark::KernelBudgetedSGDTrainer< InputType, CacheType >::isUnconstrained ( ) const

inline

check whether the parameter C is represented in the parameter vector as log(C), i.e. in a form suitable for unconstrained optimization

Definition at line 431 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_unconstrained.

◆ kernel() [1/2]

template<class InputType , class CacheType = float>

KernelType* shark::KernelBudgetedSGDTrainer< InputType, CacheType >::kernel ( )

inline

get the kernel function

Definition at line 413 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel.

Referenced by shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setKernel().

◆ kernel() [2/2]

template<class InputType , class CacheType = float>

const KernelType* shark::KernelBudgetedSGDTrainer< InputType, CacheType >::kernel ( ) const

inline

get the kernel function

Definition at line 418 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel.

◆ minMargin()

template<class InputType , class CacheType = float>

double shark::KernelBudgetedSGDTrainer< InputType, CacheType >::minMargin ( ) const

inline

return min margin

Returns: current min margin

Definition at line 203 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_minMargin.

Referenced by shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setMinMargin().
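The role of minMargin can be illustrated for the binary case with a hinge-type loss: an example (x, y) with y in {-1, +1} only triggers an update when its functional margin y * f(x) falls below minMargin. This sketch assumes that interpretation. The trainer itself works with an AbstractLoss over unsigned int labels and RealVector outputs, which is not reproduced here, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical binary-case helpers; not Shark's loss interface.
// True if the example violates the required margin and therefore triggers an update.
bool violatesMargin(int y, double f, double minMargin) {
    assert(y == 1 || y == -1);
    return y * f < minMargin;
}

// Hinge loss with an adjustable margin: max(0, minMargin - y * f).
double hingeLoss(int y, double f, double minMargin) {
    assert(y == 1 || y == -1);
    return std::max(0.0, minMargin - y * f);
}
```

With minMargin = 1 this is the standard SVM hinge loss; the loss is positive exactly when violatesMargin returns true.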

◆ name()

template<class InputType , class CacheType = float>

std::string shark::KernelBudgetedSGDTrainer< InputType, CacheType >::name ( ) const

inline virtual

From INameable: return the class name.

Reimplemented from shark::INameable.

Definition at line 219 of file KernelBudgetedSGDTrainer.h.

◆ numberOfParameters()

template<class InputType , class CacheType = float>

size_t shark::KernelBudgetedSGDTrainer< InputType, CacheType >::numberOfParameters ( ) const

inline virtual

Returns the number of hyper-parameters.

Reimplemented from shark::IParameterizable<>.

Definition at line 474 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, and shark::IParameterizable< VectorType >::numberOfParameters().

◆ parameterVector()

template<class InputType , class CacheType = float>

RealVector shark::KernelBudgetedSGDTrainer< InputType, CacheType >::parameterVector ( ) const

inline virtual

Returns the vector of hyper-parameters.

Reimplemented from shark::IParameterizable<>.

Definition at line 456 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_unconstrained, and shark::IParameterizable< VectorType >::parameterVector().
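The hyper-parameter handling described by parameterVector(), setParameterVector(), numberOfParameters(), isUnconstrained() and setC() can be sketched with a hypothetical stand-in struct. The layout assumed here (kernel parameters first, C last, stored as log(C) when unconstrained) is an assumption for illustration and should be checked against the header before relying on it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the trainer's hyper-parameter handling (not Shark's API).
struct HyperParams {
    std::vector<double> kernelParams;  // e.g. the bandwidth of a Gaussian kernel
    double C = 1.0;                    // always the 'true' value of C
    bool unconstrained = false;        // store C as log(C) in the parameter vector?

    std::size_t numberOfParameters() const { return kernelParams.size() + 1; }

    // Assumed layout: kernel parameters followed by C (or log(C)).
    std::vector<double> parameterVector() const {
        std::vector<double> p = kernelParams;
        p.push_back(unconstrained ? std::log(C) : C);
        return p;
    }

    void setParameterVector(const std::vector<double>& p) {
        assert(p.size() == numberOfParameters());
        kernelParams.assign(p.begin(), p.end() - 1);
        double c = p.back();
        C = unconstrained ? std::exp(c) : c;  // exp keeps C positive automatically
        assert(C > 0.0);                      // C must be positive
    }
};
```

Storing log(C) lets an unconstrained optimizer (for example a model-selection search) move freely over the real line while C itself stays positive.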

◆ setBudgetMaintenanceStrategy()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetMaintenanceStrategy ( AbstractBudgetMaintenanceStrategy< InputType > * budgetMaintenanceStrategy )

inline

set budget maintenance strategy

Parameters

[in] budgetMaintenanceStrategy set strategy to given object.

Definition at line 194 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetMaintenanceStrategy(), and shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetMaintenanceStrategy.
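For a Gaussian kernel k(x, z) = exp(-gamma ||x - z||^2), merging two budget vectors x_m and x_n with coefficients a_m, a_n into z = h x_m + (1 - h) x_n reduces to a one-dimensional problem in h, because k(x_m, z) = k_mn^((1-h)^2) and k(x_n, z) = k_mn^(h^2) with k_mn = k(x_m, x_n). The sketch below maximizes a_m k_mn^((1-h)^2) + a_n k_mn^(h^2) over h in [0, 1] and uses that value as the new coefficient. It follows the merging idea of Wang et al. as summarized in the class description but is not Shark's AbstractBudgetMaintenanceStrategy interface; the function name mergeGaussian, the equal-sign restriction and the plain grid search are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Merged {
    std::vector<double> z;  // merged point
    double alpha;           // coefficient of the merged point
};

// Hypothetical: merge two budget vectors of a Gaussian-kernel expansion into one.
Merged mergeGaussian(const std::vector<double>& xm, double am,
                     const std::vector<double>& xn, double an, double gamma) {
    assert(xm.size() == xn.size());
    assert(am * an > 0.0);  // this sketch only merges coefficients of equal sign
    double d2 = 0.0;
    for (std::size_t i = 0; i < xm.size(); ++i) {
        double d = xm[i] - xn[i];
        d2 += d * d;
    }
    double kmn = std::exp(-gamma * d2);  // k(x_m, x_n)
    double sign = am > 0.0 ? 1.0 : -1.0;
    double bestH = 0.0, bestVal = -1.0;
    const int steps = 1000;  // grid search over h in [0, 1] for simplicity
    for (int s = 0; s <= steps; ++s) {
        double h = static_cast<double>(s) / steps;
        double val = sign * (am * std::pow(kmn, (1 - h) * (1 - h)) + an * std::pow(kmn, h * h));
        if (val > bestVal) { bestVal = val; bestH = h; }
    }
    Merged out;
    out.z.resize(xm.size());
    for (std::size_t i = 0; i < xm.size(); ++i) out.z[i] = bestH * xm[i] + (1.0 - bestH) * xn[i];
    // Best coefficient for z: a_m k(x_m, z) + a_n k(x_n, z), since k(z, z) = 1.
    out.alpha = am * std::pow(kmn, (1 - bestH) * (1 - bestH)) + an * std::pow(kmn, bestH * bestH);
    return out;
}
```

For example, merging two vectors with coefficient 1 at distance 1 with gamma = 1 yields the midpoint with coefficient 2 exp(-0.25), about 1.56 rather than 2; this shortfall is the weight degradation that a good choice of merge partner tries to keep small.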

◆ setBudgetSize()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setBudgetSize ( std::size_t budgetSize )

inline

set budget size

Parameters

[in] budgetSize size of budget.

Definition at line 176 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::budgetSize(), and shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_budgetSize.

◆ setC()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setC ( double value )

inline

set the value of the regularization parameter (must be positive)

Definition at line 443 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_C, and RANGE_CHECK.

◆ setEpochs()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setEpochs ( std::size_t value )

inline

Set the number of training epochs. A value of 0 indicates that the default of max(10, C) should be used.

Definition at line 407 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_epochs.

◆ setKernel()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setKernel ( KernelType * kernel )

inline

set the kernel function

Definition at line 423 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::kernel(), and shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel.

◆ setMinMargin()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setMinMargin ( double minMargin )

inline

set min margin

Parameters

[in] minMargin new min margin.

Definition at line 212 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_minMargin, and shark::KernelBudgetedSGDTrainer< InputType, CacheType >::minMargin().

◆ setParameterVector()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::setParameterVector ( RealVector const & newParameters )

inline virtual

Sets the vector of hyper-parameters.

Reimplemented from shark::IParameterizable<>.

Definition at line 464 of file KernelBudgetedSGDTrainer.h.

References shark::KernelBudgetedSGDTrainer< InputType, CacheType >::m_kernel, shark::IParameterizable< VectorType >::numberOfParameters(), shark::IParameterizable< VectorType >::setParameterVector(), and SHARK_ASSERT.

◆ train()

template<class InputType , class CacheType = float>

void shark::KernelBudgetedSGDTrainer< InputType, CacheType >::train	(	ClassifierType &	classifier,
		const LabeledData< InputType, unsigned int > &	dataset
	)