Metadata-Version: 2.1
Name: recsys_slates_dataset
Version: 0.0.4
Summary: Recommender Systems Dataset from FINN.no containing the presented items and whether and what the user clicked on.
Home-page: https://github.com/finn-no/recsys_slates_dataset/tree/main/
Author: simen eide
Author-email: simeneide@gmail.com
License: License to be decided
Description: # FINN.no Slate Dataset for Recommender Systems
        > Data and helper functions for FINN.no slate dataset containing both viewed items and clicks from the FINN.no second hand marketplace.
        
        
        We release the *FINN.no slate dataset* to improve recommender systems research.
        The dataset includes both search and recommendation interactions between users and the platform over a 30 day period.
        The dataset has logged both exposures and clicks, *including interactions where the user did not click on any of the items in the slate*.
        To our knowledge there exist no such large-scale dataset, and we hope this contribution can help researchers constructing improved models and improve offline evaluation metrics.
        
        ![A visualization of a presented slate to the user on the frontpage of FINN.no](finn-frontpage.png)
        
        For each user u and interaction step t we recorded all items in the visible slate ![equ](https://latex.codecogs.com/gif.latex?a_t^u(s_t^u) ) (up to the scroll length ![equ](https://latex.codecogs.com/gif.latex?s_t^u)), and the user's click response ![equ](https://latex.codecogs.com/gif.latex?c_t^u).
        The dataset consists of 37.4 million interactions, |U| ≈ 2.3) million  users and |I| ≈ 1.3 million items that belong to one of G = 290 item groups. For a detailed description of the data please see the [paper](https://arxiv.org/abs/2104.15046).
        
        ![A visualization of a presented slate to the user on the frontpage of FINN.no](interaction_illustration.png)
        
        FINN.no is the leading marketplace in the Norwegian classifieds market and provides users with a platform to buy and sell general merchandise, cars, real estate, as well as house rentals and job offerings.
        For questions, email simen.eide@finn.no or file an issue.
        
        ## Install
        
        `pip install recsys_slates_dataset`
        
        ## How to use
        
        To download the generic numpy data files:
        
        ```
        from recsys_slates_dataset import datahelper
        datahelper.download_data_files(data_dir="data")
        ```
        
        Download and prepare data into ready-to-use pytorch dataloaders:
        
        ``` python
        from recsys_slates_dataset import dataset_torch
        ind2val, itemattr, dataloaders = dataset_torch.load_dataloaders(data_dir="data")
        ```
        
        ## Organization
        The repository is organized as follows:
        - The dataset is placed in `data/` and stored using git-lfs. We also provide an automatic download function in the pip package (preferred usage).
        - The code open sourced from the article ["Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling"](https://arxiv.org/abs/2104.15046) is found in (`code_eide_et_al21/`). However, we are in the process of making the data more generally available which makes the code incompatible with the current (newer) version of the data. Please use [the v1.0 release of the repository](https://github.com/finn-no/recsys-slates-dataset/tree/v1.0) for a compatible version of the code and dataset.
        
        ## Quickstart dataset [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/finn-no/recsys-slates-dataset/blob/master/examples/quickstart-finn-recsys-slate-data.ipynb)
        We provide a quickstart jupyter notebook that runs on Google Colab (quickstart-finn-recsys-slate-data.ipynb) which includes all necessary steps above.
        It gives a quick introduction to how to use the dataset.
        
        ## Citations
        This repository accompany the paper ["Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling"](https://arxiv.org/abs/2104.15046) by Simen Eide, David S. Leslie and Arnoldo Frigessi.
        The article is under review, and the pre-print can be obtained [here](https://arxiv.org/abs/2104.15046).
        
        If you use either the code, data or paper, please consider citing the paper.
        
        ```
        @article{eide2021dynamic,
              title={Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling}, 
              author={Simen Eide and David S. Leslie and Arnoldo Frigessi},
              year={2021},
              eprint={2104.15046},
              archivePrefix={arXiv},
              primaryClass={stat.ML}
        }
        ```
        
        ## Todo
        This repository is currently *work in progress*, and we will provide descriptions and tutorials. Suggestions and contributions to make the material more available is welcome.
        There are some features of the repository that we are working on:
        
        - [x] Release the dataset as numpy objects instead of pytorch arrays. This will help non-pytorch users to more easily utilize the data
        - [x] Maintain a pytorch dataset for easy usage
        - [x] Create a pip package for easier installation and usage. the package should download the dataset using a function.
        - [x] Make the quickstart guide compatible with the pip package and numpy format.
        - [ ] The git lfs is currently broken by removing some lines in .gitattributes that is in conflict with nbdev. The dataset is still usable using the building download functions as they use a different source. However, we should fix this. An issue is [posted on nbdev](https://github.com/fastai/nbdev/issues/506).
        - [ ] Add easily useable functions that compute relevant metrics such as hitrate, log-likelihood etc.
        - [ ] Distribute the data on other platforms such as kaggle.
        - [ ] Add a short description of the data in the readme.md directly.
        
        As the current state is in early stage, it makes sense to allow the above changes non-backward compatible. 
        However, this should be completed within the next couple of months.
        
Keywords: recommender systems,dataset,slates,recsys
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown
