Getting Started with the Evaluator
The LensKit evaluator lets you train algorithms over data sets, measure their performance, and cross-validate the results for robustness. This page describes how to get started using the evaluator for a simple experiment.
LensKit’s evaluation capabilities are exposed as a plugin for Gradle, a Java-based tool for automating builds, tests, and other processes.
We’ve provided a template to help you get started. This template contains an example Gradle file, algorithm definitions, and the bootstrap scripts to run Gradle. You can get the template either by cloning it with Git, or by downloading the source as a zip archive from GitHub.
In addition to the template, you will need:
Java 7 or later.
A tool for analyzing the results. We provide an example IPython notebook that uses Pandas and matplotlib. If you do not yet have a scientific Python environment, Anaconda Python is an easy-to-install distribution that includes all the necessary pieces.
A LensKit experiment has several files and directories:
This file controls the entire build and evaluation process.
The configuration file for the algorithm to test. You’ll often have more than one of these.
This directory contains the Java sources for your custom recommender components, just like in standard Gradle and Maven projects.
This directory contains the Java tests for your custom components.
This directory contains the input data. In the quickstart template, the MovieLens data set is automatically downloaded and placed in this directory.
This directory is created by the Gradle build process and contains your compiled class files and the evaluator’s output.
Running the Evaluation
You can run the quickstart now, by running
If you are on Windows:
This will download the data, compile the custom Java code, and run the LensKit experiment.
You can then see the results by running
which will open a web browser, where you can choose the analyze-output.ipynb notebook. After running the analysis, it should look something like this.
What It Does
This experiment does a 5-fold cross-validation of three algorithms over the MovieLens 100K data set:
User-item average baseline
A custom re-implementation of user-item average baseline (to show you how to include LensKit code in the experiment project)
The cross-validation is done by partitioning users into 5 sets. Each set is used to produce a train-test pair of rating files; the test rating set contains 20% of each test user’s ratings, while the training set contains the remainder of their ratings along with all the ratings for the non-test users (the users in the other 4 partitions).
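The partitioning described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LensKit's own crossfolder: the function name crossfold_users, its parameters, the round-robin assignment of users to partitions, and the rule that every test user contributes at least one test rating are all assumptions made for the example.

```python
import random
from collections import defaultdict


def crossfold_users(ratings, n_folds=5, holdout_fraction=0.2, seed=42):
    """Split (user, item, rating) tuples into n_folds train/test pairs.

    Hypothetical helper for illustration; not part of LensKit.
    """
    rng = random.Random(seed)

    # Group the ratings by user so each user can be handled as a unit.
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    # Shuffle the users and deal them round-robin into n_folds partitions.
    users = sorted(by_user)
    rng.shuffle(users)
    partitions = [users[i::n_folds] for i in range(n_folds)]

    folds = []
    for test_users in partitions:
        test_set = set(test_users)
        train, test = [], []
        for user, rows in by_user.items():
            if user not in test_set:
                # Non-test users contribute all of their ratings to training.
                train.extend(rows)
                continue
            # Test users: hold out a fraction of their ratings for testing
            # (at least one, an assumption of this sketch).
            rows = rows[:]
            rng.shuffle(rows)
            n_test = max(1, round(len(rows) * holdout_fraction))
            test.extend(rows[:n_test])
            train.extend(rows[n_test:])
        folds.append((train, test))
    return folds
```

Each user lands in the test set of exactly one fold, so every user's held-out ratings are evaluated once across the whole experiment.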
The following metrics are computed:
RMSE of rating predictions
nDCG of rating predictions (only considers test items, measuring the rank effectiveness of the recommender)
Mean reciprocal rank with 10-item recommendation lists, considering an item ‘relevant’ if the user rated it.
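To make the three metrics concrete, here is a minimal Python sketch of each. The function names are hypothetical, and the nDCG discount of log2(rank + 1) is the common textbook form, assumed here for illustration; LensKit's own implementation may discount ranks differently.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over paired predicted and actual ratings."""
    errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(errors) / len(errors))


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def predict_ndcg(predicted, actual):
    """nDCG for one user's test items, ranked by predicted rating."""
    # Order the test items by predicted rating, best first.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    ranked_gains = [actual[i] for i in order]
    # The ideal ordering ranks items by their actual ratings.
    ideal = dcg(sorted(actual, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


def reciprocal_rank(recommended, relevant, n=10):
    """1 / rank of the first relevant item in the top n, or 0 if none."""
    for rank, item in enumerate(recommended[:n], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rec_lists, relevant_sets, n=10):
    """Average the reciprocal rank over all users."""
    ranks = [reciprocal_rank(r, s, n) for r, s in zip(rec_lists, relevant_sets)]
    return sum(ranks) / len(ranks)
```

For mean reciprocal rank, relevant_sets would hold, for each user, the set of items that user rated in the test data, matching the relevance definition above.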
The example project defines a number of Gradle tasks. You have already run the evaluate task. Some of the other tasks you can run are:
Compile all source code, including unit tests.
Compile the code and run the unit tests for the custom LensKit components.
Run the evaluation and then process the analysis notebook into a static HTML file, build/analysis.html, containing the results of the experiment, with charts.
Delete all compiled code, temporary data files, and evaluation results, but not the downloaded data files.
Delete the downloaded data files.
Gradle automatically checks the dependencies of each task, and skips it if nothing affecting its operation has changed since the last time it was run.
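The idea behind this kind of up-to-date check can be illustrated with a small Python sketch. It is a conceptual model only, not Gradle's implementation: the helpers fingerprint and run_if_changed are hypothetical, the state dict stands in for a persistent build cache, and the sketch tracks input files only, whereas Gradle also tracks task outputs.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of the given input files."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(str(path).encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_if_changed(name, inputs, action, state):
    """Run action unless the inputs match the fingerprint from the last run.

    Returns True if the action ran and False if it was skipped.
    """
    current = fingerprint(inputs)
    if state.get(name) == current:
        return False  # nothing changed since the last run, so skip the task
    action()
    state[name] = current  # remember the inputs this run was based on
    return True
```

Under this model, rerunning a task with unchanged inputs is a no-op, while editing any input file causes the task to run again on the next invocation.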