{"id":1143,"date":"2023-10-22T14:06:28","date_gmt":"2023-10-22T14:06:28","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=1143"},"modified":"2026-05-04T04:20:07","modified_gmt":"2026-05-04T04:20:07","slug":"how-should-you-evaluate-session-based-recommendations","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2023\/10\/22\/how-should-you-evaluate-session-based-recommendations\/","title":{"rendered":"How should you evaluate session-based recommendations?"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Session-based recommendation engine in Python<\/p>\n<cite>part 2<\/cite><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Previous parts<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><a href=\"https:\/\/ml-gis-service.com\/index.php\/2023\/09\/23\/which-movie-should-you-recommend-next-session-based-recommendation-engine-in-python-part-1\/\" data-type=\"link\" data-id=\"https:\/\/ml-gis-service.com\/index.php\/2023\/09\/23\/which-movie-should-you-recommend-next-session-based-recommendation-engine-in-python-part-1\/\">Which movie should you recommend next?<\/a><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1-1024x576.jpg\" alt=\"\" class=\"wp-image-1145\" style=\"aspect-ratio:1.7777777777777777;width:1064px;height:auto\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1-1024x576.jpg 1024w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1-300x169.jpg 300w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1-768x432.jpg 768w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1-1536x864.jpg 1536w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2023\/10\/img1.jpg 1920w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Measuring model performance can be tricky even when you work with a single output, but what should you do if a model creates a sequence of items? A session-based recommender may return a single recommendation, but it will likely be irrelevant to the user. You may notice that almost all recommenders return a sequence of items. The reason is simple: when a recommender returns, for example, five items, there is a much better chance that at least one of them is relevant.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Recommendation systems metrics<\/h2>\n\n\n\n<p>The basic metrics for recommender systems are <strong>Precision<\/strong>, <strong>Recall<\/strong>, and <strong>Mean Reciprocal Rank<\/strong> (<em>MRR<\/em>). Precision and Recall are known from classifiers. MRR is a metric explicitly designed for sequential outputs. To underline that a metric is calculated for recommendations, you append the <code>@k<\/code> suffix, where <code>k<\/code> is the number of items returned. 
<code>Precision@5<\/code> is the precision calculated for the top five recommendations, and <code>MRR@20<\/code> means that twenty recommendations are evaluated.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example<\/h3>\n\n\n\n<p>A recommender created these outputs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>r1: [banana, cherry, tomato, avocado, strawberry]<\/li>\n\n\n\n<li>r2: [mango, banana, apple, blueberry, lemon]<\/li>\n\n\n\n<li>r3: [apple, watermelon, orange, pear, cherry]<\/li>\n<\/ul>\n\n\n\n<p>And the real fruits bought later by customers are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>c1: [avocado, tomato, lemon, orange]<\/li>\n\n\n\n<li>c2: [mango, grapes, watermelon, coconut, papaya, pineapple]<\/li>\n\n\n\n<li>c3: [apple, orange, cherry, banana, kiwi, grapefruit, banana, coconut, blueberry]<\/li>\n<\/ul>\n\n\n\n<p>With this information, it is possible to calculate the session-based metrics.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Precision<\/h3>\n\n\n\n<p><strong>How many relevant items are present in the top-k recommendations?<\/strong> In other words, <em>precision is the fraction of recommended items that are relevant<\/em>:<\/p>\n\n\n\n<p>$Precision@k = RelevantRecommendations \/ AllRecommendations$<\/p>\n\n\n\n<p>Based on the examples, <code>Precision@5<\/code> is:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prec1: 0.4 (2\/5 &#8211; tomato and avocado)<\/li>\n\n\n\n<li>Prec2: 0.2 (1\/5 &#8211; mango)<\/li>\n\n\n\n<li>Prec3: 0.6 (3\/5 &#8211; apple, orange, and cherry)<\/li>\n<\/ul>\n\n\n\n<p>Usually, you are interested not in a single reading but in the average, so the final <code>Precision@5<\/code> is equal to <code>0.4<\/code>.<\/p>\n\n\n\n<p><strong>Important! <\/strong>Precision doesn\u2019t depend on the position of relevant items in a sequence. It only tells how many of the recommended items are relevant. 
Think about it this way: what will <code>Precision@inf<\/code> be when the recommender returns all products or all movies from a database?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Recall<\/h3>\n\n\n\n<p><strong>How many relevant items are returned from all possible relevant items for a user<\/strong>? Recall is the <em>fraction of ALL relevant items that appear in the recommendation<\/em>:<\/p>\n\n\n\n<p>$Recall@k = RelevantRecommendations \/ AllRelevantItems$<\/p>\n\n\n\n<p>Based on the examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Rec1: 0.5 (2\/4)<\/li>\n\n\n\n<li>Rec2: 0.17 (1\/6)<\/li>\n\n\n\n<li>Rec3: 0.33 (3\/9 &#8211; c3 lists banana twice, so nine purchases are counted)<\/li>\n<\/ul>\n\n\n\n<p>The average <code>Recall@5<\/code> is equal to <code>0.33<\/code>.<\/p>\n\n\n\n<p><strong>Important!<\/strong> Recall doesn\u2019t depend on the position of relevant items in a sequence. Because a recommendation contains a limited number of items, Recall may never get close to 1. Why? Consider a scenario: you set your system for five recommendations, but the average customer buys ten or more products. Even if all five items are relevant, there are still at least five products that the recommender didn\u2019t show. Thus, the maximum recall for a customer who buys ten products is 0.5, and it is even lower for bigger baskets. This metric may not be useful with a large product space.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Mean Reciprocal Rank<\/h3>\n\n\n\n<p>This metric is positional. It tells how early the first relevant product occurs in a recommended sequence. Reciprocal Rank is calculated as the inverse of the position of the first relevant item in a sequence, and the mean is taken over multiple tested sequences from user actions.<\/p>\n\n\n\n<p>Based on the examples, the reciprocal ranks are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RR1: 0.33 (1\/3 &#8211; first relevant item on position 3)<\/li>\n\n\n\n<li>RR2: 1 (1\/1)<\/li>\n\n\n\n<li>RR3: 1 (1\/1)<\/li>\n<\/ul>\n\n\n\n<p>The average <strong>MRR@5<\/strong> is equal to (1\/3 + 1 + 1) \/ 3 = <strong>0.78<\/strong><\/p>\n\n\n\n<p><strong>Important! 
<\/strong>Only the first occurrence is counted. MRR doesn\u2019t count all relevant items from the recommendation. A researcher might consider using <code>Precision@k<\/code> with <code>MRR@k<\/code> to cover more system properties.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scoring recommendations<\/h2>\n\n\n\n<p>Scoring recommendations is fairly easy when you use built-in functions from the session-based recommendation engines. The <code>WSKNN<\/code> package has the function <code>score_model()<\/code>. It calculates MRR, Precision, and Recall.<\/p>\n\n\n\n<p>As you may recall from <a href=\"https:\/\/ml-gis-service.com\/index.php\/2023\/09\/23\/which-movie-should-you-recommend-next-session-based-recommendation-engine-in-python-part-1\/\" data-type=\"link\" data-id=\"https:\/\/ml-gis-service.com\/index.php\/2023\/09\/23\/which-movie-should-you-recommend-next-session-based-recommendation-engine-in-python-part-1\/\">the previous blog post<\/a>, a fitted model takes multiple parameters. The number of recommendations is usually fixed because it is dictated by business logic. Most of the time we control:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the number of the closest neighbors<\/li>\n\n\n\n<li>the possible neighbors sampling strategy from <code>['common_items', 'recent', 'random']<\/code><\/li>\n\n\n\n<li>the possible neighbors sample size<\/li>\n\n\n\n<li>the session weighting strategy from <code>['linear', 'log', 'quadratic']<\/code><\/li>\n\n\n\n<li>the session\u2019s items ranking strategy from <code>['inv', 'linear', 'log', 'quadratic']<\/code><\/li>\n<\/ul>\n\n\n\n<p>Thus, there are five parameters: two are numeric, and the other three take one of a fixed set of values related to the sampling and ranking logic. It is not easy to pick the best set of parameters for our model without evaluation. 
In this guide you will learn how you can do it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Step-by-step Coding Guide<\/h2>\n\n\n\n<p>If you didn\u2019t do it before, set up your environment as in steps 1 to 3. If you have done it before, follow the code from step 4.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1<\/h3>\n\n\n\n<p>Download the MovieLens dataset (MovieLens 100k). You can get data from the tutorial\u2019s repository here: [1].<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2<\/h3>\n\n\n\n<p>Create a <code>mamba<\/code> environment or a virtual environment.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mamba create -n movie-recommender python=\"3.10\"<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3<\/h3>\n\n\n\n<p>Activate the environment, install <code>pip<\/code> and <code>notebook<\/code> from <code>mamba<\/code>, and then install <code>wsknn<\/code> from <code>pip<\/code>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"bash\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">mamba activate movie-recommender\n(movie-recommender) mamba install pip notebook\n(movie-recommender) pip install wsknn<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4<\/h3>\n\n\n\n<p>Open Jupyter Notebook and create a new Python3 notebook.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5<\/h3>\n\n\n\n<p>In the first 
cell, import the required packages and functions.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from typing import Dict, List, Union\n\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\nfrom wsknn import fit\nfrom wsknn.evaluate import score_model\nfrom wsknn.preprocessing.parse_static import parse_flat_file\n<\/pre>\n\n\n\n<p>The <code>typing<\/code> package\u2019s objects are used for type hinting in custom functions. <code>NumPy<\/code> and <code>pandas<\/code> are packages for data transformations, and <code>tqdm<\/code> shows a progress bar when multiple models are tested and scored.<\/p>\n\n\n\n<p>The <code>WSKNN<\/code> functions <code>fit()<\/code> and <code>parse_flat_file()<\/code> were covered in the last article. The core function here is <code>score_model()<\/code>. It takes the trained model and a validation dataset, and calculates MRR, Precision, and Recall. 
But first, you need to prepare the data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6<\/h3>\n\n\n\n<p>Read and prepare training and validation datasets.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def train_validate_samples(set_of_sessions):\n    # Hold out 10% of the sessions for validation\n    sessions_keys = list(set_of_sessions.keys())\n    n_sessions = int(0.1 * len(sessions_keys))\n    # replace=False: each session is sampled at most once\n    key_sample = np.random.choice(sessions_keys, n_sessions, replace=False)\n    \n    training_set = {_key: set_of_sessions[_key] for _key in sessions_keys if _key not in key_sample}\n    validation_set = [set_of_sessions[_key] for _key in key_sample]\n    \n    return training_set, validation_set\n\n\n# Columns of u.data: user id (session), movie id (product), rating, timestamp\nfpath = 'ml-100k\/u.data'\nds = parse_flat_file(fpath, sep='\\t', session_index=0, product_index=1, time_index=3, time_to_numeric=True)\n\ntraining_ds, validation_ds = train_validate_samples(ds[1].session_items_actions_map)\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7<\/h3>\n\n\n\n<p>A convenient way to perform experiments and store their results is to wrap them in a Python class. 
A basic implementation can be:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">class TestModels:\n\n    def __init__(self, training_set: Dict, test_set: List, psets: List):\n        self.training_set = training_set\n        self.test_set = test_set\n        self.psets = psets\n        self.scoring_results = self.get_scoring()\n\n    def get_scoring(self):\n        \"\"\"\n        Method scores multiple different models\n        \"\"\"\n        scorings = []\n        for params in tqdm(self.psets):\n            model = fit(sessions=self.training_set, **params)\n            scores = score_model(sessions=self.test_set, trained_model=model, k=5)\n            scores.update(params)\n            scorings.append(scores)\n\n        scoring_results = pd.DataFrame(scorings)\n        return scoring_results\n\n    def scores(self):\n        return self.scoring_results\n<\/pre>\n\n\n\n<p>Next, you can test the class implementation with a small set of models. Define a list of two or three parameter sets and pass it to the <code>TestModels<\/code> class along with the training and test datasets. 
Here are dictionaries with fixed parameters:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Neighbors are most recent sessions\n# Items are weighted and ranked by log function - newest items in the session are most important\n\nparameter_set_recent_log_log = {\n    'number_of_recommendations': 5,\n    'number_of_neighbors': 10,\n    'sampling_strategy': 'recent',\n    'sample_size': 50,\n    'weighting_func': 'log',\n    'ranking_strategy': 'log',\n    'return_events_from_session': False,\n    'recommend_any': False\n}\n\n# Neighbors are sampled based on the common items\n# Items are weighted and ranked by linear function\n\nparameter_set_common_lin_lin = {\n    'number_of_recommendations': 5,\n    'number_of_neighbors': 10,\n    'sampling_strategy': 'common_items',\n    'sample_size': 50,\n    'weighting_func': 'linear',\n    'ranking_strategy': 'linear',\n    'return_events_from_session': False,\n    'recommend_any': False\n}\n\n# Neighbors are sampled randomly\n# Items are weighted by log function and then ranked by their inverted position in a sequence (1\/i)\n\nparameter_set_random_log_inv = {\n    'number_of_recommendations': 5,\n    'number_of_neighbors': 10,\n    'sampling_strategy': 'random',\n    'sample_size': 50,\n    'weighting_func': 'log',\n    'ranking_strategy': 'inv',\n    'return_events_from_session': False,\n    'recommend_any': False\n}\n<\/pre>\n\n\n\n<p>To get scores pass those dictionaries into <code>TestModels<\/code> instance:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">scorer = TestModels(training_ds,\n                   validation_ds,\n   
                [\n                       parameter_set_recent_log_log,\n                       parameter_set_common_lin_lin,\n                       parameter_set_random_log_inv\n                   ])\n\ndf = scorer.scores()\n\nprint(df.head())\n<\/pre>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><\/td><td><strong>MRR<\/strong><\/td><td><strong>Precision<\/strong><\/td><td><strong>Recall<\/strong><\/td><td><strong>sampling_strategy<\/strong><\/td><td><strong>weighting_func<\/strong><\/td><td><strong>ranking_strategy&nbsp;<\/strong><\/td><\/tr><tr><td><strong>0<\/strong><\/td><td>0.796099<\/td><td>0.610638<\/td><td>0.048660&nbsp;<\/td><td>recent&nbsp;<\/td><td>log<\/td><td>log<\/td><\/tr><tr><td><strong>1<\/strong><\/td><td>0.714184&nbsp;<\/td><td>0.497872<\/td><td>0.040337&nbsp;<\/td><td>common_items&nbsp;<\/td><td>linear<\/td><td>linear<\/td><\/tr><tr><td><strong>2<\/strong><\/td><td>0.751241<\/td><td>0.623404&nbsp;<\/td><td>0.052864&nbsp;<\/td><td>random<\/td><td>log<\/td><td>inv<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">Results of the initial class check.<\/figcaption><\/figure>\n\n\n\n<p>Columns <code>number_of_recommendations<\/code>, <code>number_of_neighbors<\/code>, <code>sample_size<\/code>, <code>return_events_from_session<\/code>, and <code>recommend_any<\/code> are hidden in the table above because those parameters are fixed. Scoring differences are mostly noticeable when you control the <code>sampling_strategy<\/code>, <code>weighting_func<\/code>, and <code>ranking_strategy<\/code> parameters. The class works as expected, so a bigger parameter space can be analyzed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 8<\/h3>\n\n\n\n<p>Writing each possible dictionary manually would be a tedious task. 
You can define a function that will create a number of configurations to try.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def generate_parameter_sets(number_of_recommendations: Union[List, int] = 5,\n                            number_of_neighbors: Union[List, int] = 10,\n                            sample_size: Union[List, int] = 100,\n                            return_events_from_session: bool = False,\n                            required_sampling_event = None,\n                            required_sampling_event_index: int = None,\n                            sampling_str_event_weights_index: int = None,\n                            recommend_any: bool = False):\n    \"\"\"\n    Function generates multiple parameter sets.\n    \"\"\"\n    if isinstance(number_of_recommendations, int):\n        number_of_recommendations = [number_of_recommendations]\n\n    if isinstance(number_of_neighbors, int):\n        number_of_neighbors = [number_of_neighbors]\n\n    if isinstance(sample_size, int):\n        sample_size = [sample_size]\n\n    sampling_strategies = ['common_items', 'recent', 'random']\n    weighting_funcs = ['linear', 'log', 'quadratic']\n    ranking_strategies = ['inv', 'linear', 'log', 'quadratic']\n\n    parameters_sets = []\n\n    for n_recs in number_of_recommendations:\n        for n_neighb in number_of_neighbors:\n            for s_size in sample_size:\n                for s_strategy in sampling_strategies:\n                    for weight_f in weighting_funcs:\n                        for rank_s in ranking_strategies:\n                            d = {\n                                'number_of_recommendations': n_recs,\n                                'number_of_neighbors': n_neighb,\n                                'sampling_strategy': 
s_strategy,\n                                'sample_size': s_size,\n                                'weighting_func': weight_f,\n                                'ranking_strategy': rank_s,\n                                'return_events_from_session': return_events_from_session,\n                                'recommend_any': recommend_any,\n                                'required_sampling_event': required_sampling_event,\n                                'required_sampling_event_index': required_sampling_event_index,\n                                'sampling_str_event_weights_index': sampling_str_event_weights_index\n                            }\n                            parameters_sets.append(d)\n    return parameters_sets\n\npgrid = generate_parameter_sets(number_of_neighbors=[10, 20, 50], sample_size=[100, 200, 500])\n\nprint(len(pgrid))\n<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>> 324<\/pre>\n\n\n\n<p>There are 324 model configurations with a mixed number of the closest neighbors, possible neighbors sample sizes, sampling strategies, weighting functions, and ranking strategies. It will take some time to check every model configuration by the <code>TestModels<\/code> class. 
It has a progress bar, so you will know how long it takes to get the results.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">scorer = TestModels(training_ds,\n                   validation_ds,\n                   pgrid)\n\ndf = scorer.scores()\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 9<\/h3>\n\n\n\n<p>The last step is to check the scores and find which configurations are the best for specific metrics. It is unlikely that all three metrics will be the highest possible for a single configuration.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Optimal configuration &#8211; MRR<\/h4>\n\n\n\n<p>Which configuration returns the relevant items in the best positions in a recommended sequence?<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df.sort_values('MRR', ascending=False).head(1)\n<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MRR: 0.855674<\/li>\n\n\n\n<li>Precision: 0.691489<\/li>\n\n\n\n<li>Recall: 0.059915<\/li>\n\n\n\n<li>The number of closest neighbors (<code>number_of_neighbors<\/code>): 50<\/li>\n\n\n\n<li>Sampling strategy: recent<\/li>\n\n\n\n<li>Possible neighbors sample size: 500<\/li>\n\n\n\n<li>Weighting function: log<\/li>\n\n\n\n<li>Ranking Strategy: inv<\/li>\n<\/ul>\n\n\n\n<p>The configuration can be translated into natural language as follows: the optimal MRR is achieved when you sample 500 of the possible neighbors based on the recency of their actions. Neighbor similarity is calculated based on the assumption that the newest elements in a sequence have the highest weights. 
Similarly, the final recommendations weighting takes into account the position of an item in a sequence (the first movie has the highest score).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Optimal configuration &#8211; Precision<\/h4>\n\n\n\n<p>Which configuration returns the highest ratio of the relevant items to the sequence items?<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df.sort_values('Precision', ascending=False).head(1)\n<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MRR: 0.825532<\/li>\n\n\n\n<li>Precision: 0.702128<\/li>\n\n\n\n<li>Recall: 0.061024<\/li>\n\n\n\n<li>The number of closest neighbors (<code>number_of_neighbors<\/code>): 50<\/li>\n\n\n\n<li>Sampling strategy: random<\/li>\n\n\n\n<li>Possible neighbors sample size: 500<\/li>\n\n\n\n<li>Weighting function: log<\/li>\n\n\n\n<li>Ranking Strategy: quadratic<\/li>\n<\/ul>\n\n\n\n<p>The optimal Precision is achieved when you take a random sample of 500 possible neighbors. Neighbor similarity is calculated based on the assumption that the newest elements in a sequence have higher weights than older elements. 
Similarly, the final recommendations weighting uses the quadratic weighting function that assigns large weights to the first elements in a sequence and very small weights to the last elements.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Optimal configuration &#8211; Recall<\/h4>\n\n\n\n<p>Which configuration returns the highest ratio of the relevant recommended items to all relevant items for a user?<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df.sort_values('Recall', ascending=False).head(1)\n<\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MRR: 0.825532<\/li>\n\n\n\n<li>Precision: 0.702128<\/li>\n\n\n\n<li>Recall: 0.061024<\/li>\n\n\n\n<li>The number of closest neighbors (<code>number_of_neighbors<\/code>): 50<\/li>\n\n\n\n<li>Sampling strategy: random<\/li>\n\n\n\n<li>Possible neighbors sample size: 500<\/li>\n\n\n\n<li>Weighting function: log<\/li>\n\n\n\n<li>Ranking Strategy: quadratic<\/li>\n<\/ul>\n\n\n\n<p>It is the same configuration as for Precision (and the reason is simple: we use a fixed-length window for testing relevant items). More interesting is the fact that Recall is very low. It means that the five recommendations cover only a tiny fraction of all relevant items; users usually watch many more than five movies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>In this article, you have learned how to score session-based recommendations with <strong>theoretical metrics<\/strong>. 
Be warned that those metrics shouldn\u2019t be the only reason to choose one model configuration over another. Sometimes, you must put business logic first and decide to push different parameters into production. And in real-world recommendation systems, the most valuable metrics are those from business analytics: monetization or click-through rates.<\/p>\n\n\n\n<p>In the next chapter, you will tweak a system to force it to follow business logic. The session events (watched movies) will take custom weights.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bibliography<\/h2>\n\n\n\n<p>[1] https:\/\/github.com\/SimonMolinsky\/blog-post-wsknn-movielens<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scoring recommender systems<\/p>\n","protected":false},"author":1,"featured_media":1160,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,251,19,3,249,250],"tags":[161,261,262,258,259,7,260,257,256,252,254],"class_list":["post-1143","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-e-commerce","category-machine-learning","category-python","category-recommendation-engine","category-wsknn","tag-e-commerce","tag-map","tag-mar","tag-mrr","tag-precisionk","tag-python","tag-recallk","tag-recommender-engine","tag-recommender-system","tag-session-based-recommendations","tag-wsknn"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1143","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=1143"}],"version-hi
story":[{"count":17,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1143\/revisions"}],"predecessor-version":[{"id":1256,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1143\/revisions\/1256"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/1160"}],"wp:attachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=1143"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=1143"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=1143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}