public class Crossfolder extends Object
Partitions a data set for cross-validation.
The resulting data is placed in an output directory with the following files:
datasets.yaml
- a manifest file listing all the data sets
partNN.train.csv
- a CSV file containing the train data for part NN
partNN.train.yaml
- a YAML manifest for the training data for part NN
partNN.test.csv
- a CSV file containing the test data for part NN
partNN.test.yaml
- a YAML manifest for the test data for part NN
Modifier and Type | Field and Description |
---|---|
static String | ITEM_FILE_NAME |
Constructor and Description |
---|
Crossfolder() |
Crossfolder(String n) |
Modifier and Type | Method and Description |
---|---|
void | execute(): Run the crossfold command. |
List<DataSet> | getDataSets(): Get the train-test splits as data sets. |
EntityType | getEntityType(): Get the entity type that this crossfolder will crossfold. |
CrossfoldMethod | getMethod(): Get the method to be used for crossfolding. |
String | getName(): Get the visible name of this crossfold split. |
Path | getOutputDir(): Get the output directory. |
OutputFormat | getOutputFormat(): Get the output format for the crossfolder. |
int | getPartitionCount(): Get the partition count. |
boolean | getSkipIfUpToDate() |
StaticDataSource | getSource(): Get the data source backing this crossfold manager. |
boolean | getWriteTimestamps(): Query whether timestamps will be written. |
void | setEntityType(EntityType entityType): Set the entity type that this crossfolder will crossfold. |
Crossfolder | setMethod(CrossfoldMethod meth): Set the method to be used by the crossfolder. |
Crossfolder | setName(String n): Set a name for this crossfolder. |
Crossfolder | setOutputDir(File dir): Set the output directory for this crossfold operation. |
Crossfolder | setOutputDir(Path dir): Set the output directory for this crossfold operation. |
Crossfolder | setOutputDir(String dir): Set the output directory for this crossfold operation. |
Crossfolder | setOutputFormat(OutputFormat format): Set the output format for the crossfolder. |
Crossfolder | setPartitionCount(int partition): Set the number of partitions to generate. |
Crossfolder | setSkipIfUpToDate(boolean skip): Set whether the crossfolder should skip if all files are up to date. |
Crossfolder | setSource(StaticDataSource src): Set the data source. |
Crossfolder | setWriteTimestamps(boolean pack): Configure whether to include timestamps in the output file. |
String | toString() |
public static final String ITEM_FILE_NAME
public Crossfolder()
public Crossfolder(String n)
public EntityType getEntityType()
Get the entity type that this crossfolder will crossfold.
public void setEntityType(EntityType entityType)
Set the entity type that this crossfolder will crossfold.
entityType
- The entity type to crossfold.
public Crossfolder setPartitionCount(int partition)
Set the number of partitions to generate.
partition
- The number of partitions.
public int getPartitionCount()
Get the partition count.
public Crossfolder setOutputFormat(OutputFormat format)
Set the output format for the crossfolder.
format
- The output format.
public OutputFormat getOutputFormat()
Get the output format for the crossfolder.
public Crossfolder setOutputDir(Path dir)
Set the output directory for this crossfold operation.
dir
- The output directory.
public Crossfolder setOutputDir(File dir)
Set the output directory for this crossfold operation.
dir
- The output directory.
public Crossfolder setOutputDir(String dir)
Set the output directory for this crossfold operation.
dir
- The output directory.
public Path getOutputDir()
Get the output directory.
public Crossfolder setSource(StaticDataSource src)
Set the data source.
src
- The data source.
public Crossfolder setMethod(CrossfoldMethod meth)
Set the method to be used by the crossfolder.
meth
- The method to use.
public CrossfoldMethod getMethod()
Get the method to be used for crossfolding.
public Crossfolder setWriteTimestamps(boolean pack)
Configure whether to include timestamps in the output file.
pack
- true to include timestamps (the default), false otherwise.
public boolean getWriteTimestamps()
Query whether timestamps will be written.
true if output will include timestamps.
public String getName()
Get the visible name of this crossfold split.
public Crossfolder setName(String n)
Set a name for this crossfolder. The name is used, for example, to generate the names of the individual data sets.
n
- The crossfolder name.
public StaticDataSource getSource()
Get the data source backing this crossfold manager.
public Crossfolder setSkipIfUpToDate(boolean skip)
Set whether the crossfolder should skip if all files are up to date. The default is to always re-crossfold, even if the files are up to date.
skip
- true to skip crossfolding if files are up to date.
public boolean getSkipIfUpToDate()
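The documentation for setSkipIfUpToDate does not say how "up to date" is determined. As a rough illustration only, the sketch below assumes a modification-time comparison: the outputs count as up to date when every output file exists and is at least as new as the source file. The class UpToDateCheck and its method are hypothetical names and are not part of Crossfolder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

// Hypothetical helper, not part of the Crossfolder API.
final class UpToDateCheck {
    private UpToDateCheck() {}

    /**
     * Assumed rule: outputs are up to date if each one exists and was
     * modified no earlier than the source file.
     */
    static boolean isUpToDate(Path source, List<Path> outputs) throws IOException {
        FileTime sourceTime = Files.getLastModifiedTime(source);
        for (Path out : outputs) {
            if (!Files.exists(out)) {
                return false; // a missing output always forces a re-run
            }
            if (Files.getLastModifiedTime(out).compareTo(sourceTime) < 0) {
                return false; // output is older than the source
            }
        }
        return true;
    }
}
```

Under this assumption, a caller would run the crossfold only when the check returns false or when skipping is disabled.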
public void execute()
Run the crossfold command. Reads the source data and writes the partition files to disk.
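To make the output layout described in the class overview concrete, here is a small sketch that lists the file names a run with a given partition count would be expected to produce. It assumes that NN is a 1-based, zero-padded two-digit partition number and that the CSV output format is in use; the overview does not state either, and the class CrossfoldFileNames is a hypothetical helper, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the Crossfolder API.
final class CrossfoldFileNames {
    private CrossfoldFileNames() {}

    // Assumption: NN is 1-based and zero-padded to two digits.
    static String fileName(int part, String role, String ext) {
        return String.format(Locale.ROOT, "part%02d.%s.%s", part, role, ext);
    }

    /** Lists datasets.yaml plus the four per-partition files. */
    static List<String> outputFiles(int partitionCount) {
        List<String> names = new ArrayList<>();
        names.add("datasets.yaml");
        for (int p = 1; p <= partitionCount; p++) {
            for (String role : new String[] {"train", "test"}) {
                names.add(fileName(p, role, "csv"));
                names.add(fileName(p, role, "yaml"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        outputFiles(2).forEach(System.out::println);
    }
}
```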
public List<DataSet> getDataSets()
Get the train-test splits as data sets.
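For readers unfamiliar with the idea of train-test splits, the following is a generic k-fold sketch: the data is shuffled once, each item lands in exactly one test set, and the matching training set holds everything else. It illustrates the general technique only. The actual behavior of Crossfolder depends on the configured CrossfoldMethod and entity type, which this page does not describe, and the names KFoldSketch and Split are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Generic k-fold illustration; not the library's implementation.
final class KFoldSketch {
    private KFoldSketch() {}

    /** One train-test split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Splits data into partitionCount folds using a seeded shuffle. */
    static <T> List<Split<T>> split(List<T> data, int partitionCount, long seed) {
        if (partitionCount < 2) {
            throw new IllegalArgumentException("need at least 2 partitions");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));

        List<Split<T>> splits = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            List<T> train = new ArrayList<>();
            List<T> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                // Item i is held out in fold (i mod partitionCount).
                if (i % partitionCount == p) {
                    test.add(shuffled.get(i));
                } else {
                    train.add(shuffled.get(i));
                }
            }
            splits.add(new Split<>(train, test));
        }
        return splits;
    }
}
```

Fixing the seed keeps the folds reproducible across runs, which matters when results from different algorithms are compared on the same splits.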