Trainset class
-
class surprise.Trainset(ur, ir, n_users, n_items, n_ratings, rating_scale, raw2inner_id_users, raw2inner_id_items)
A trainset contains all the useful data that constitute a training set.
It is used by the fit() method of every prediction algorithm. You should not try to build such an object on your own, but rather use the Dataset.folds() method or the DatasetAutoFolds.build_full_trainset() method.
Trainsets are different from Datasets. You can think of a Dataset as the raw data, and of Trainsets as higher-level data where useful methods are defined. Also, a Dataset may consist of multiple Trainsets (e.g. when doing cross-validation).
-
ur
The users' ratings. This is a dictionary containing lists of tuples of the form (item_inner_id, rating). The keys are user inner ids.
- Type
defaultdict of list
-
ir
The items' ratings. This is a dictionary containing lists of tuples of the form (user_inner_id, rating). The keys are item inner ids.
- Type
defaultdict of list
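The following stdlib-only sketch illustrates the layout of ur and ir, together with the raw-to-inner id mappings that the constructor receives as raw2inner_id_users and raw2inner_id_items. It is a hypothetical re-implementation for illustration, not Surprise's own indexing code: the sample ratings and variable names are made up, and assigning inner ids in order of first appearance is an assumption of the sketch.

```python
from collections import defaultdict

# Hypothetical raw ratings: (raw user id, raw item id, rating).
raw_ratings = [
    ("alice", "matrix", 5.0),
    ("alice", "up", 3.0),
    ("bob", "matrix", 4.0),
]

raw2inner_users = {}    # raw user id -> inner user id
raw2inner_items = {}    # raw item id -> inner item id
ur = defaultdict(list)  # user inner id -> [(item inner id, rating), ...]
ir = defaultdict(list)  # item inner id -> [(user inner id, rating), ...]

for ruid, riid, r in raw_ratings:
    # setdefault hands out the next free inner id the first time a raw id is seen.
    uid = raw2inner_users.setdefault(ruid, len(raw2inner_users))
    iid = raw2inner_items.setdefault(riid, len(raw2inner_items))
    ur[uid].append((iid, r))
    ir[iid].append((uid, r))

print(dict(ur))  # {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]}
print(dict(ir))  # {0: [(0, 5.0), (1, 4.0)], 1: [(0, 3.0)]}

# ur answers "what did user u rate?", ir answers "who rated item i?".
user_means = {u: sum(r for _, r in rs) / len(rs) for u, rs in ur.items()}
print(user_means)  # {0: 4.0, 1: 4.0}
```

Because ur and ir are defaultdicts, indexing an unknown id such as ur[42] silently returns (and stores) an empty list instead of raising, so a membership test with `in` is the safer way to check whether an id is present.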
-
n_users
Total number of users \(|U|\).
-
n_items
Total number of items \(|I|\).
-
n_ratings
Total number of ratings \(|R_{train}|\).
-
rating_scale
The minimum and maximum ratings of the rating scale.
- Type
tuple
-
global_mean
The mean of all ratings \(\mu\).
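As a worked example of the counts above and of \(\mu = \frac{1}{|R_{train}|}\sum_{r_{ui} \in R_{train}} r_{ui}\), here they are computed from a small, hypothetical ur dictionary. This is a plain-Python sketch, not the library's implementation.

```python
from collections import defaultdict

# Hypothetical toy trainset in the ur layout: user inner id -> [(item inner id, rating)].
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)], 2: [(1, 2.0)]})

n_users = len(ur)                                                  # |U|
n_items = len({i for ratings in ur.values() for i, _ in ratings})  # |I|
n_ratings = sum(len(ratings) for ratings in ur.values())           # |R_train|

# mu: plain arithmetic mean over every rating in the trainset.
global_mean = sum(r for ratings in ur.values() for _, r in ratings) / n_ratings

print(n_users, n_items, n_ratings, global_mean)  # 3 2 4 3.5
```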
-
all_ratings()
Generator function to iterate over all ratings.
- Yields
A tuple (uid, iid, rating) where ids are inner ids (see the note on raw and inner ids).
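A minimal sketch of such a generator, assuming only a ur dictionary in the documented layout. The standalone function all_ratings(ur) is hypothetical; the real method takes no arguments and reads the trainset's own data.

```python
from collections import defaultdict

# Hypothetical toy data in the ur layout.
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]})

def all_ratings(ur):
    """Yield (uid, iid, rating) for every rating, using inner ids."""
    for u, u_ratings in ur.items():
        for i, r in u_ratings:
            yield u, i, r

print(list(all_ratings(ur)))  # [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
```

On a real trainset the equivalent loop is `for uid, iid, r in trainset.all_ratings(): ...`.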
-
build_anti_testset(fill=None)
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are not in the trainset, i.e. all the ratings \(r_{ui}\) where the user \(u\) is known, the item \(i\) is known, but the rating \(r_{ui}\) is not in the trainset. As \(r_{ui}\) is unknown, it is either replaced by the fill value or assumed to be equal to the mean of all ratings, global_mean.
- Parameters
fill (float) – The value used to fill unknown ratings. If None, the global mean of all ratings, global_mean, will be used.
- Returns
A list of tuples (uid, iid, fill) where ids are raw ids.
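The anti-testset construction can be sketched in plain Python as follows. The reverse id maps inner2raw_users and inner2raw_items, the function signature and the toy data are hypothetical stand-ins for what a real Trainset stores; the logic follows the description above: pair every known user with every known item they have not rated, using fill or, by default, the global mean.

```python
from collections import defaultdict

# Hypothetical toy trainset pieces.
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]})
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "matrix", 1: "up"}

def build_anti_testset(ur, inner2raw_users, inner2raw_items, fill=None):
    """Return (raw uid, raw iid, fill) for every known user/item pair not rated."""
    ratings = [r for u_ratings in ur.values() for _, r in u_ratings]
    # Fall back to the global mean when no fill value is given.
    fill = sum(ratings) / len(ratings) if fill is None else float(fill)
    anti = []
    for u, u_ratings in ur.items():
        rated = {i for i, _ in u_ratings}
        for i in inner2raw_items:  # every known item
            if i not in rated:
                anti.append((inner2raw_users[u], inner2raw_items[i], fill))
    return anti

print(build_anti_testset(ur, inner2raw_users, inner2raw_items))
# [('bob', 'up', 4.0)]
```

In Surprise itself, passing trainset.build_anti_testset() to an algorithm's test() method yields predictions for every unseen user-item pair, a common starting point for top-N recommendation. The list has \(|U| \times |I| - |R_{train}|\) entries, so it can get large.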
-
build_testset()
Return a list of ratings that can be used as a testset in the test() method.
The ratings are all the ratings that are in the trainset, i.e. all the ratings returned by the all_ratings() generator. This is useful in cases where you want to test your algorithm on the trainset.
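A sketch of the same idea in plain Python. It assumes, as for the anti-testset, that inner ids are converted back to raw ids so each entry has the (uid, iid, rating) shape that test() consumes; the helper function and reverse id maps are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy trainset pieces.
ur = defaultdict(list, {0: [(0, 5.0), (1, 3.0)], 1: [(0, 4.0)]})
inner2raw_users = {0: "alice", 1: "bob"}
inner2raw_items = {0: "matrix", 1: "up"}

def build_testset(ur, inner2raw_users, inner2raw_items):
    """Every rating already in the trainset, with inner ids mapped back to raw ids."""
    return [(inner2raw_users[u], inner2raw_items[i], r)
            for u, u_ratings in ur.items() for i, r in u_ratings]

print(build_testset(ur, inner2raw_users, inner2raw_items))
# [('alice', 'matrix', 5.0), ('alice', 'up', 3.0), ('bob', 'matrix', 4.0)]
```

Scoring an algorithm on this list measures how well it fits the data it was trained on, which is typically an optimistic estimate of its accuracy on unseen ratings.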
-
knows_item(iid)
Indicate if the item is part of the trainset.
An item is part of the trainset if the item was rated at least once.
- Parameters
iid (int) – The (inner) item id. See the note on raw and inner ids.
- Returns
True if the item is part of the trainset, else False.
-
knows_user(uid)
Indicate if the user is part of the trainset.
A user is part of the trainset if the user has at least one rating.
- Parameters
uid (int) – The (inner) user id. See the note on raw and inner ids.
- Returns
True if the user is part of the trainset, else False.
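Both checks reduce to a membership test on inner ids. The sketch below assumes the ur/ir layout described earlier and uses hypothetical standalone functions; it relies on `in` so that a failed lookup does not insert an empty entry into the defaultdict.

```python
from collections import defaultdict

# Hypothetical toy data: users 0 and 1 exist, only item 0 has been rated.
ur = defaultdict(list, {0: [(0, 5.0)], 1: [(0, 4.0)]})
ir = defaultdict(list, {0: [(0, 5.0), (1, 4.0)]})

def knows_user(ur, uid):
    # `in` does not create a key, unlike ur[uid] on a defaultdict.
    return uid in ur

def knows_item(ir, iid):
    return iid in ir

print(knows_user(ur, 1), knows_user(ur, 7))  # True False
print(knows_item(ir, 0), knows_item(ir, 3))  # True False
print(len(ur))  # 2: the failed lookup did not add user 7
```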
-
to_inner_iid(riid)
Convert an item raw id to an inner id.
See the note on raw and inner ids.
- Parameters
riid (str) – The item raw id.
- Returns
The item inner id.
- Return type
int
- Raises
ValueError – When the item is not part of the trainset.
-
to_inner_uid(ruid)
Convert a user raw id to an inner id.
See the note on raw and inner ids.
- Parameters
ruid (str) – The user raw id.
- Returns
The user inner id.
- Return type
int
- Raises
ValueError – When the user is not part of the trainset.
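A minimal sketch of the raw-to-inner lookup and its documented failure mode, assuming a plain dict such as the raw2inner_id_users mapping passed to the constructor. The helper shown is hypothetical and covers users only; to_inner_iid works the same way with the item mapping.

```python
# Hypothetical raw -> inner user id mapping.
raw2inner_users = {"alice": 0, "bob": 1}

def to_inner_uid(raw2inner_users, ruid):
    """Hypothetical helper: map a raw user id to its inner id."""
    try:
        return raw2inner_users[ruid]
    except KeyError:
        # Turn the dict miss into the ValueError documented for to_inner_uid.
        raise ValueError(f"User {ruid} is not part of the trainset.") from None

print(to_inner_uid(raw2inner_users, "bob"))  # 1
try:
    to_inner_uid(raw2inner_users, "carol")
except ValueError as exc:
    print(exc)  # User carol is not part of the trainset.
```

Since raw ids are typed as str, a raw id read from a file must be passed as a string: in a mapping keyed by "196", a lookup with the int 196 fails.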