public class TrainTestExperiment extends Object
Sets up and runs train-test evaluations. This class can be used directly, but it will usually be controlled from the train-test command-line tool, which is in turn driven by a Gradle script. For a simpler way to programmatically run an evaluation, see SimpleEvaluator, which provides a simplified interface to train-test evaluations with cross-validation.
A train-test experiment consists of three things: the algorithms to evaluate, the data sets to evaluate them on, and the evaluation tasks to run.
Global output is aggregated into a CSV file; individual tasks or metrics may produce additional files.
Constructor and Description |
---|
TrainTestExperiment() |
Modifier and Type | Method and Description |
---|---|
void | addAlgorithm(AlgorithmInstance algo) Add an algorithm to the experiment. |
void | addAlgorithm(String name, groovy.lang.Closure<?> block) Add an algorithm configured by a Groovy closure. |
void | addAlgorithm(String name, Path file) Add one or more algorithms by loading a config file. |
void | addAlgorithms(List<AlgorithmInstance> algos) Add multiple algorithm instances. |
void | addAlgorithms(Path file) Add one or more algorithms from a configuration file. |
void | addDataSet(DataSet ds) Add a data set. |
void | addDataSets(List<DataSet> dss) Add several data sets. |
void | addTask(EvalTask task) Add an evaluation task. |
Table | execute() Run the experiment. |
List<AlgorithmInstance> | getAlgorithms() Get the algorithm instances. |
Path | getCacheDirectory() Get the cache directory for model components. |
ClassLoader | getClassLoader() Get the class loader for this experiment. |
List<DataSet> | getDataSets() Get the list of data sets to use. |
Path | getOutputFile() Get the primary output file. |
ExperimentOutputLayout | getOutputLayout() |
boolean | getShareModelComponents() Query whether this experiment will cache and share components. |
List<EvalTask> | getTasks() Get the eval tasks to be used in this experiment. |
int | getThreadCount() Get the number of threads that the experiment may use. |
Path | getUserOutputFile() Get the per-user output file. |
static TrainTestExperiment | load(Path file) Load a train-test experiment from a YAML file. |
void | setCacheDirectory(Path dir) Set the cache directory for model components. |
void | setClassLoader(ClassLoader loader) Set the class loader for this experiment. |
void | setOutputFile(Path out) Set the primary output file. |
void | setShareModelComponents(boolean shares) Control whether model components will be shared. |
void | setThreadCount(int tc) Set the number of threads the experiment may use. |
void | setUserOutputFile(Path file) Set the per-user output file. |
public void setOutputFile(Path out)
Set the primary output file.
out - The file where the primary aggregate output should go.
public Path getOutputFile()
Get the primary output file.
public Path getUserOutputFile()
Get the per-user output file.
public void setUserOutputFile(Path file)
Set the per-user output file.
file - The file for per-user measurements.
public List<AlgorithmInstance> getAlgorithms()
Get the algorithm instances.
public void addAlgorithm(AlgorithmInstance algo)
Add an algorithm to the experiment.
algo - The algorithm to add.
public void addAlgorithms(List<AlgorithmInstance> algos)
Add multiple algorithm instances.
algos - The algorithm instances to add.
public void addAlgorithm(String name, groovy.lang.Closure<?> block)
Add an algorithm configured by a Groovy closure. Mostly useful for testing.
name - The algorithm name.
block - The algorithm configuration block.
public void addAlgorithm(String name, Path file)
Add one or more algorithms by loading a config file.
name - The algorithm name.
file - The config file to load.
public void addAlgorithms(Path file)
Add one or more algorithms from a configuration file.
file - The configuration file.
public List<DataSet> getDataSets()
Get the list of data sets to use.
public void addDataSet(DataSet ds)
Add a data set.
ds - The data set to add.
public void addDataSets(List<DataSet> dss)
Add several data sets.
dss - The data sets to add.
public boolean getShareModelComponents()
Query whether this experiment will cache and share components.
Returns true if model components will be shared.
See also: setShareModelComponents(boolean)
public void setShareModelComponents(boolean shares)
Control whether model components will be shared. If a cache directory is also set with setCacheDirectory(Path), components will be cached on disk; otherwise, they will be opportunistically shared in memory.
Caching and sharing components improves throughput and memory use, but makes build times effectively meaningless. It is turned on by default; turn it off if you want to measure recommender build times.
shares - true to enable caching of shared model components.
public Path getCacheDirectory()
Get the cache directory for model components.
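The interaction between setShareModelComponents(boolean) and setCacheDirectory(Path) can be pictured with a small standalone sketch. It does not use any LensKit classes: ComponentSharingSketch, Mode, sharingMode, getOrBuild, and buildCount are hypothetical names, and the plain map is only an assumed simplification of whatever in-memory sharing the library performs. The sketch also suggests why sharing makes build times meaningless: only the first request for a component pays the build cost.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only; not part of the LensKit API.
final class ComponentSharingSketch {
    enum Mode { NONE, MEMORY, DISK }

    // Decide how components would be shared, following the documented rule:
    // sharing off -> nothing shared; sharing on with a cache directory -> disk;
    // sharing on without a cache directory -> in-memory sharing.
    static Mode sharingMode(boolean shareModelComponents, Path cacheDirectory) {
        if (!shareModelComponents) {
            return Mode.NONE;
        }
        return cacheDirectory != null ? Mode.DISK : Mode.MEMORY;
    }

    // Assumed stand-in for an in-memory component cache.
    private final Map<String, Object> shared = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    // Return the shared component for a key, building it only on first request.
    Object getOrBuild(String key, Supplier<Object> builder) {
        return shared.computeIfAbsent(key, k -> {
            builds.incrementAndGet();
            return builder.get();
        });
    }

    // Number of times a component was actually built.
    int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        ComponentSharingSketch cache = new ComponentSharingSketch();
        Object first = cache.getOrBuild("model", Object::new);
        Object second = cache.getOrBuild("model", Object::new);
        System.out.println(sharingMode(true, null));
        System.out.println(first == second);
        System.out.println(cache.buildCount());
    }
}
```

In this sketch the second request reuses the first component, so timing it would measure a map lookup rather than a build.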
public void setCacheDirectory(Path dir)
Set the cache directory for model components.
dir - The directory where model components will be cached.
public int getThreadCount()
Get the number of threads that the experiment may use.
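The fallback order documented for setThreadCount(int) (explicit value, then the lenskit.eval.threadCount property, then the processor count) can be sketched in plain Java. This is an illustration under stated assumptions rather than the library's code: ThreadCountSketch and resolveThreadCount are hypothetical names, the property is assumed to hold a decimal integer, and any value other than a positive count is treated here like the default of 0.

```java
import java.util.Properties;

// Hypothetical illustration only; not part of the LensKit API.
final class ThreadCountSketch {
    // Property name taken from the setThreadCount(int) documentation.
    static final String THREAD_COUNT_PROPERTY = "lenskit.eval.threadCount";

    // Resolve the effective thread count:
    // 1. a positive configured value wins;
    // 2. otherwise the property is consulted, if set;
    // 3. otherwise the number of available processors is used.
    static int resolveThreadCount(int configured, Properties props, int availableProcessors) {
        if (configured > 0) {
            return configured;
        }
        String value = props.getProperty(THREAD_COUNT_PROPERTY);
        if (value != null) {
            // Assumption: the property contains a plain decimal integer.
            return Integer.parseInt(value.trim());
        }
        return availableProcessors;
    }

    public static void main(String[] args) {
        int threads = resolveThreadCount(
                0,
                System.getProperties(),
                Runtime.getRuntime().availableProcessors());
        System.out.println("effective thread count: " + threads);
    }
}
```

Passing the properties and processor count as arguments keeps the sketch deterministic; a caller would normally supply System.getProperties() and Runtime.getRuntime().availableProcessors().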
public void setThreadCount(int tc)
Set the number of threads the experiment may use.
tc - The number of threads that the experiment may use. If 0 (the default), consults the property lenskit.eval.threadCount, and if that is unset, uses as many threads as there are available processors according to Runtime.availableProcessors().
public ClassLoader getClassLoader()
Get the class loader for this experiment.
public void setClassLoader(ClassLoader loader)
Set the class loader for this experiment.
loader - The class loader to use.
public List<EvalTask> getTasks()
Get the eval tasks to be used in this experiment.
public void addTask(EvalTask task)
Add an evaluation task.
task - An evaluation task to run.
public Table execute()
Run the experiment.
public ExperimentOutputLayout getOutputLayout()
public static TrainTestExperiment load(Path file) throws IOException
Load a train-test experiment from a YAML file.
file - The file to load.
Throws: IOException