{"id":203,"date":"2021-05-20T21:51:29","date_gmt":"2021-05-20T21:51:29","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=203"},"modified":"2021-07-18T09:50:50","modified_gmt":"2021-07-18T09:50:50","slug":"spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2021\/05\/20\/spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python\/","title":{"rendered":"Spatial Interpolation 101: Interpolation of Mercury Concentrations in the Mediterranean Sea with IDW and Python"},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><strong>Build your first spatial interpolation algorithm in Python<\/strong><\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">&lt;&lt; Previous part: <a href=\"https:\/\/ml-gis-service.com\/index.php\/2020\/11\/05\/spatial-interpolation-101-introduction-to-inverse-distance-weighting-interpolation-technique\/\" data-type=\"post\" data-id=\"64\">Spatial Interpolation 101: Introduction to Inverse Distance Weighting Interpolation Technique<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Spatial interpolation techniques are very useful tools for environmental protection cases. Imagine that we have to perform analysis of the soil composition near open pit mine. It could be very expensive to sample each square meter of this area. We&#8217;re not able to take thousands of samples, but few dozens is within our reach. With those hundred available we can perform interpolation to estimate area\u2019s contamination levels at unsampled locations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can use multiple interpolation techniques for this task. <em>Inverse Distance Weighting<\/em> (IDW) is one of them and we are going to uncover it in this lesson. It is the simplest and good for building vanilla model for further tuning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this article we\u2019ll use IDW for<strong> interpolation of mercury concentrations near the coast of the Mediterranean Sea.<\/strong> <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We\u2019ll learn how to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Implement Inverse Distance Weighting algorithm in Python and&nbsp;Numpy,<\/li><li>Transform spatial data for modeling purposes.<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data Sources<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>Mercury concentrations dataset in this article comes from: Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; \u017divkovi\u0107, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. PANGAEA,&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a><\/li><li>And I\u2019ve found this dataset and article by the <strong>GEOSS search engine<\/strong>: Geoportal:&nbsp;<a href=\"https:\/\/www.geoportal.org\/\">https:\/\/www.geoportal.org<\/a><\/li><li>You may check lesson\u2019s code, notebooks, data and shapefiles here: <a href=\"https:\/\/github.com\/szymon-datalions\/geoprocessing\/tree\/master\/data%20analysis\/notebooks\/spatial_101\/0102_spatial_analysis\">Github repository<\/a><\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Theoretical Background<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Mercury is a neurotoxin. It accumulates in sea organisms and indirectly in our bodies as a <em>methylmercury<\/em> (MeHg). U.S. EPA guidelines tells us that <em>methylmercury<\/em> in high concentrations:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>affects our sight (loss of peripheral vision),<\/li><li>affects our muscles (weakness of muscle, impaired coordination, problems with speech, hearing and walking),<\/li><li>changes our neural responses and creates anxious sensation of \u201cpins and needles\u201d, usually in the hands, feet, and around the mouth,<\/li><li>is very dangerous for unborn infants\u2019 brains and nervous systems [1].<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">There are some publications that MeHg may cause tumors, but there are others which show no significant effects of exposure to MeHg and cancer development. We should be cautious with conclusions on this topic; however, the risk of the central nervous system damage in adults and extremely high threat for the fetus\u2019 brain and neural development is enough to monitor mercury concentration in sea organisms which, for some world populations, are the main food source.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data modeling pipeline and programming environment preparation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data modeling pipeline for each data science problem is very similar. Steps from programming environment preparation up to the model development are as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>(Environment Preparation) <strong>Anaconda setup<\/strong>.<\/li><li>(Environment Preparation) <strong>Installation of Python Packages<\/strong>.<\/li><li>(Data) <strong>Data download.<\/strong><\/li><li>(Data) <strong>Data processing and transformation.<\/strong><\/li><li>(Data) <strong>Data exploration.<\/strong><\/li><li>(Model) <strong>Model Development.<\/strong><\/li><li>(Model) <strong>Model Implementation.<\/strong><\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">In this tutorial we cover points 4-6. I expect that you can set up Anaconda and it\u2019s environment and install necessary Python packages. (If I\u2019m wrong please tell me in a comment section, then I\u2019ll prepare short article how to set up environment and install Anaconda in MacOS and\/or Linux systems).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Our programming environment has five Python packages:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>numpy<\/strong>&nbsp;for scientific computation,<\/li><li><strong>pandas<\/strong>&nbsp;for tabular data processing and analysis,<\/li><li><strong>geopandas<\/strong>&nbsp;for spatial data processing and analysis,<\/li><li><strong>matplotlib<\/strong>&nbsp;for data visualization,<\/li><li><strong>seaborn<\/strong>&nbsp;for data visualization.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Install them from the&nbsp;<code>conda-forge<\/code>&nbsp;channel with Python of version from 3.6 or above. Then download data from publication (csv file)&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a>&nbsp;along with Mediterranean coastline shapefile from my <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-05\/idw-medi\">Github<\/a> (<strong>medi_coastline.shp<\/strong>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We are ready to go!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data preparation and exploration<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">Code listings from this part are included in the&nbsp;<strong>idw-data-exploration.ipynb<\/strong>&nbsp;notebook.<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Before we start modeling it is a good idea to look into a dataset. We read csv file with&nbsp;Pandas, slice dataset and store it as a new csv file for model development. We don\u2019t need all information provided by authors for our purposes. Listing 1 shows how to read data and look into it.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 1: Data exploration: read csv file<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\n\ndf = pd.read_csv('190516_m2b_marine_biota.csv',\n                 sep=';', encoding = \"ISO-8859-1\",\n                 index_col='id')\n\nprint(df.head())<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Our DataFrame has 36 columns. What columns are avaliable? And how many rows has our dataset? Let\u2019s check it (Listing 2).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 2: Data exploration: check columns and number of records in a dataset<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">print('Columns in dataframe:')\nfor idx, col in enumerate(df.columns):\n    print(idx+1, col)\n\nprint('Dataframe length:', len(df))<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">From those columns we are interested in:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>lat<\/strong>&nbsp;(latitude),<\/li><li><strong>lon<\/strong>&nbsp;(longitude),<\/li><li><strong>mea_ug_kg_orig<\/strong>&nbsp;(mean Hg concentrations in a sampled organisms).<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">But still, we have more than 24 thousand rows of data\u2026 It is a good idea to take only a part of&nbsp;<code>DataFrame<\/code>&nbsp;for a model development. We rather not use entire data set when we are building model from scratch. Why? Two main reasons are:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>New code is (usually) not optimized from the computational perspective and if you test its output and logic you need to do it as fast as possible,<\/li><li>Sometimes data is flawed in some way and it is hard to find which records are corrupted. It is easier to find those corrupted records in small dataset than a big one.<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">How do we divide our dataset? We can do it by the date of sampling and limit records to those sampled in the 21<sup>st<\/sup>&nbsp;century. The icing on the cake here is that measurements taken in the less distant past from today should be considered as closer to the current state of <strong>Hg<\/strong> concentrations and more important from the policy-making perspective. Listing 3 shows process of data division by time with&nbsp;Pandas&nbsp;time-indexing. In the first step we create new column with sampling dates formatted as&nbsp;<code>datetime<\/code> pandas object. Next we check how many records are available after 2000s and in the last step we create the new&nbsp;<code>DataFrame<\/code>&nbsp;with limited number of records (Listing 3).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 3: Data exploration: sampling records by the date of observation<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">df['samp_date'] = pd.to_datetime(df['samp_date'],\n                                 format='%Y-%M-%d')\nupdated_df = df[df['samp_date'] > '2000-01-01']\n\nprint(len(df['samp_date'][df['samp_date'] > '2000-01-01']))<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Before we export <code>updated_df<\/code>&nbsp;to the csv file we are going to slice it, and we&#8217;ll leave only&nbsp;<strong>lat<\/strong>,&nbsp;<strong>lon<\/strong>&nbsp;and&nbsp;<strong>mea_ug_kg_orig<\/strong>&nbsp;columns. Then we save our final&nbsp;<code>DataFrame<\/code>&nbsp;(Listing 4).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 4: Data exploration: save prepared DataFrame to csv<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">ndf = updated_df[['lat', 'lon', 'mea_ug_kg_orig']]\nndf.to_csv('prepared_data_mercury_concentrations.csv',\n            encoding='utf-8', sep=',')\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Well done! Data is prepared, time for modeling!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">IDW model<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">Code listings from this part are included in the&nbsp;<strong>idw-function.ipynb<\/strong>&nbsp;notebook.<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Input data visualization<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Open a new&nbsp;<em>Jupyter Notebook<\/em>&nbsp;file and import:&nbsp;<code>numpy<\/code>,&nbsp;<code>pandas<\/code>,&nbsp;<code>geopandas<\/code>,&nbsp;<code>matplotlib.pyplot<\/code>&nbsp;and&nbsp;<code>seaborn<\/code>. Read prepared&nbsp;<code>DataFrame<\/code>&nbsp;and limit it to the records with zero and positive value of the mean concentration of mercury (Listing 5).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 5: Import and data preprocessing<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import numpy as np\nimport pandas as pd\nimport geopandas as gpd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\ndf = pd.read_csv('prepared_data_mercury_concentrations.csv',\n                 index_col='id')\ndf = df[df['mea_ug_kg_orig'] >= 0] <\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">We use&nbsp;seaborn\u2019s&nbsp;<code>scatterplot()<\/code>&nbsp;to plot recorded samples and their respective values. Scatter plot function takes many arguments, and we use few of them:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>x<\/strong>&nbsp;\u2013 longitude coordinates of samples,<\/li><li><strong>y<\/strong>&nbsp;\u2013 latitude coordinates of samples,<\/li><li><strong>hue<\/strong>&nbsp;\u2013 mean mercury concentrations (scatter plot\u2019s points are colored due to the hue),<\/li><li><strong>hue_norm&nbsp;<\/strong>\u2013 custom hue limits of numerical data, we use this to emphasize places where mean mercury concentrations exceed 460 micro-grams per kilogram in a fish tissue [2],<\/li><li><strong>palette&nbsp;<\/strong>\u2013 color scheme for hue,<\/li><li><strong>alpha<\/strong>&nbsp;\u2013 parameter for transparency control for better visual presentation (0.7),<\/li><li><strong>data<\/strong>&nbsp;\u2013 baseline <code>DataFrame<\/code>.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The code is presented in the Listing 6 and output image is presented in the Figure 1.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 6: Scatterplot of <em>HgMe<\/em> concentrations<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Norm from EPA guidance\nepa_norm = 460\nhue_norm_custom = (0, epa_norm)\n\nplt.figure(figsize=(12 ,6))\nsns.scatterplot(x='lon', y='lat', hue='mea_ug_kg_orig',\n                hue_norm=hue_norm_custom,\n                palette='coolwarm', alpha=.7,\n                data=df)\nplt.show()<\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"362\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/baseline-700.png\" alt=\"\" class=\"wp-image-223\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/baseline-700.png 700w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/baseline-700-300x155.png 300w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><figcaption>Scatterplot of methylmercury concentrations nearby shores of the Mediterranean Sea.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Mercury concentrations near Italy\u2019s coastline exceeds EPA limits and at the same time the largest density of measurements can be seen near the Italy. We shouldn\u2019t conclude anything from it, but it is a good starting point for further analysis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Shapefile Preparation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>We need a canvas to perform interpolation<\/strong>. Or, in other words, a set of points for which we interpolate values. It can be a&nbsp;<code>numpy<\/code> array, but in our geospatial world, vector files are more useful, and we use <code>shapefile<\/code> with points along Mediterranean Sea coastline (without borders of isles). File&nbsp;<strong><code><a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-05\/idw-medi\">medi_coastline.shp<\/a><\/code><\/strong>&nbsp;contains prepared points. Let\u2019s read it with&nbsp;<code>Geopandas<\/code>&nbsp;and look into a spatial&nbsp;DataFrame. Listing 7 is a read operation of a&nbsp;<code>GeoDataFrame<\/code>&nbsp;object and its coordinate reference system. The main difference between&nbsp;<code>GeoDataFrame<\/code>&nbsp;and&nbsp;<code>DataFrame<\/code>&nbsp;is that the first one has additional geometry column and the <em>coordinate reference system<\/em> (<em>crs<\/em>) property.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 7: IDW model: read shapefile<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">gdf = gpd.read_file('medi_coastline.shp')\nprint(gdf.head())\nprint(gdf.crs)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Geometry in the&nbsp;<code>gdf<\/code>&nbsp;is described as the&nbsp;<em>MultiPoint<\/em>&nbsp;object from&nbsp;shapely&nbsp;package. We can calculate distance between&nbsp;<em>MultiPoint<\/em>&nbsp;objects and our latitude \/ longitude pairs of points from the non-spatial <code>DataFrame<\/code>\u2026 but it will make distance calculation more complex. Instead of writing a single complicated function which is transforming&nbsp;<em>MultiPoint<\/em>&nbsp;into&nbsp;<em>floats,<\/em>&nbsp;and calculating distance between those points, we are going to create separate function which transform <strong>geometry<\/strong> field of&nbsp;<code>GeoDataFrame<\/code>&nbsp;into latitude \/ longitude <em>floats<\/em>. Listing 8 is an example of implementation. Note that we get lat and lon as&nbsp;<em>float<\/em>&nbsp;point values with the&nbsp;<strong><code>.x<\/code><\/strong>&nbsp;and the&nbsp;<strong><code>.y<\/code><\/strong>&nbsp;attributes of <em>Point<\/em>&nbsp;and&nbsp;<em>MultiPoint<\/em>&nbsp;classes.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 8: IDW model: transform MultiPoint object coordinates into floats<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def set_coordinate(point_geometry, coordinate: str):\n    \"\"\"\n    Function returns coordinate x or y from multipoint geometry in geodataframe.\n    :param point_geometry: MultiPoint or Point geometry,\n    :param coordinate: x (longitude) or y (latitude).\n    \"\"\"\n    \n    if coordinate == 'x':\n        try:\n            x_coo = point_geometry.x\n            return x_coo\n        except AttributeError:\n            x_coo = point_geometry[0].x\n            return x_coo\n    elif coordinate == 'y':\n        try:\n            y_coo = point_geometry.y\n            return y_coo\n        except AttributeError:\n            y_coo = point_geometry[0].y\n            return y_coo\n    else:\n        raise KeyError('Available coordinates: \"x\" for longitude or \"y\" for latitude')\n\ngdf['lat'] = gdf['geometry'].apply(set_coordinate, args=('y'))\ngdf['lon'] = gdf['geometry'].apply(set_coordinate, args=('x'))\nprint(gdf.head())<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">At this moment we have spatial data prepared and we are able to move to the modeling step.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">IDW algorithm implementation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s recall <a href=\"https:\/\/ml-gis-service.com\/index.php\/2020\/11\/05\/spatial-interpolation-101-introduction-to-inverse-distance-weighting-interpolation-technique\/\">IDW equation<\/a> (1):<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$f(m) = \\frac{\\sum_{i}\\lambda_{i}*f(m_{i})}{\\sum_{i}\\lambda{i}}; (1)$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>$f(m)$ is a value at unknown location,<\/li><li>$i$ is i-th known location,<\/li><li>$f(m_{i})$ is a value at known location,<\/li><li>$\\lambda$ is a weight assigned to the known location.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">We must assign specific weight to get proper results. And weight depends on the distance between known point and unknown point (2).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\lambda_{i} = \\frac{1}{d_{i}^{p}}; (2)$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>$d$ is a distance from known point&nbsp;$i$ to the unknown point,<\/li><li>$p$ is a hyperparameter which controls how strong is a relationship between known point and unknown point. You may set large&nbsp;$p$ if you want to show strong relationship between closest point and very weak influence of distant points. On the other hand, you may set small&nbsp; $p$&nbsp;to emphasize fact that points are influencing each other with the same power irrespectively of their distance.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">There is a&nbsp;<em>hidden<\/em>&nbsp;function, because <strong>distance&nbsp;$d$&nbsp;between points must be calculated<\/strong> and the most obvious way to do it is to use <em>Euclidean distance<\/em> between points:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$d=\\sqrt{(x_{i}-x_{-})^{2}+(y_{i}-y_{-})^{2}}; (3)$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">where $x_{-}$ and $y_{-}$ are coordinates of a point where we want to estimate unknown value.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We have two functions. One is for distance calculation and second is for inverse distance weighting of unknown values. Distance calculation is really simple and we can do it with numpy. Listing 9 is its implementation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 9: IDW model: distance calculation function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def calculate_distances(all_points, unknown_point):\n    # Calculate distances\n    d_lat = (all_points[:, 0] - unknown_point[0])**2\n    d_lon = (all_points[:, 1] - unknown_point[1])**2\n    dists = np.sqrt(d_lon + d_lat)\n    return dists<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">IDW algorithm is a little trickier to implement. We should pass into a function:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>unknown point coordinates (one pair),<\/li><li>known points\u2019 coordinates and values (multiple pairs),<\/li><li>power of distance (single value),<\/li><li>and number of closest distances to the unknown point which are affecting it but this is optional parameter. We may use all available points to estimate value at specific location.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The latter is not included in the equation (1), but it is an optimization for our calculations, and it allows us to control how many neighbors can influence specific point value. It may be very unlikely from the physical perspective that measurements taken near Egyptian coast are correlated with measurements near Croatia, that\u2019s why we are controlling number of neighbors. Figure 3 is a block diagram of IDW algorithm. It seems to be complex algorithm but in reality it is a simple waterfall-style diagram which requires few lines of code (Listing 10).<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"700\" height=\"963\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/idw_algorithm-700.png\" alt=\"\" class=\"wp-image-225\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/idw_algorithm-700.png 700w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/idw_algorithm-700-218x300.png 218w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><figcaption>Figure 3: Block diagram of IDW estimation algorithm.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">In summary IDW algoirithm works as follow:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>Calculate distance between point with not-known value and all known points.<\/li><li>Append calculated distances as a new column to the&nbsp;numpy array&nbsp;with known distances.<\/li><li>Sort this array by the last column.<\/li><li>Take&nbsp;<code>ndist<\/code>&nbsp;number of records for analysis. It may be easily done with&nbsp;numpy&nbsp;indexing.<\/li><li>Check if the first distance from a sorted list is equal to 0. If yes, then return point value from this position and end calculations. If no, then calculate weights.<\/li><li>Single weight is a ratio of 1 and a distance raised to the n-th <code>power<\/code>. Calculate weights for each element in the array.<\/li><li>Calculate numerator of equation (1). It is a product of weights and known points values.<\/li><li>Estimate interpolated point value as a ratio of a sum of all weight-value products and a sum of weights.<\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Listing 10 is an implementation of this procedure. We are using this function with a <code>GeoDataFrame<\/code>&nbsp;which we have prepared earlier. We use&nbsp;<em>lambda expression<\/em>&nbsp;to iterate calculations through all points of the <code>gdf<\/code> and we set&nbsp;<code>power<\/code> parameter to 3 and&nbsp;<code>ndist<\/code>&nbsp;parameter to 4.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 10: IDW model: algorithm implementation<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def inverse_distance_weighting(unknown_point, points, power, ndist=10):\n    \"\"\"\n    Function estimates values in unknown points with with inverse\n    weighted interpolation technique.\n    \n    INPUT:\n    :param unknown_point: lat, lon coordinates of unknown point,\n    :param points: (array) list of points [lat, lon, val],\n    :param power: (float) constant used to calculate IDW weight\n                  -> weight = 1\/(distance**power),\n    :param ndist: (int) how many closest distances are included\n                  in weighting,\n    \n    OUTPUT:\n    :return interpolated_value: (float) interpolated value\n                                by IDW method.\n    \n    Inverse distance weighted interpolation is:\n    \n    est = SUM(WEIGHTS * KNOWN VALS) \/ SUM(WEIGHTS)\n    and\n    WEIGHTS = 1 \/ (DISTANCE TO UNKNOWN**power)\n    \n    where:\n    power is a constant hyperparameter which tells how much point \n    is influenced by other points. \n    \"\"\"\n\n    distances = calculate_distances(points, unknown_point)\n    points_and_dists = np.c_[points, distances]\n    \n    # Sort and get only 10 values\n    points_and_dists = points_and_dists[points_and_dists[:, -1].argsort()]\n    vals_for_idw = points_and_dists[:ndist, :]\n    \n    # Check if first distance is 0\n    if vals_for_idw[0, -1] == 0:\n        return vals_for_idw[0, 2]\n    else:\n        # If its not perform calculations\n        weights = 1 \/ (vals_for_idw[:, -1]**power)\n        numerator = weights * vals_for_idw[:, 2]\n        interpolated_value = np.sum(numerator) \/ np.sum(weights)\n        return interpolated_value\n\n# Interpolate points\n\nknown_points_array = df.to_numpy()\npower = 3\nnd = 4\ngdf['est'] = gdf.apply(lambda col: inverse_distance_weighting(\n    [col['lat'], col['lon']],\n    known_points_array, power, ndist=nd), axis=1)<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Yes, and that\u2019s all! We can now check our interpolated data (Listing 11) and show it as a scatterplot.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 11: IDW model: output visualization<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">plt.figure(figsize=(15 ,9))\nsns.scatterplot(x='lon', y='lat', hue='est', alpha=0.7,\n                palette='coolwarm',\n                hue_norm=hue_norm_custom, data=gdf)\nplt.show()<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Final visualization, for the whole coastline of the Mediterranean Sea is presented in the Figure 4. You may see how values are grouped together and create clusters along with some parts of the Mediterranean Sea coastline.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/interpolated-700.png\" alt=\"\" class=\"wp-image-227\" width=\"700\" height=\"422\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/interpolated-700.png 700w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/05\/interpolated-700-300x181.png 300w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><figcaption>Figure 4: Interpolated Hg concentrations values along Mediterranean Sea coastline<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Exercises<\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>Use different shapefile for analysis provided <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-05\/idw-medi\">here<\/a> named&nbsp;<code>exercise_points.shp<\/code>. What do you think about the output?<\/li><li>Check what happens when power value is set between 0 and 1, or below 0.<\/li><li>Check what happens if you use all points for the estimation.<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Next part<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">>> <a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/07\/18\/spatial-interpolation-101-interpolation-in-three-dimensions-with-python-and-idw-algorithm-the-mercury-concentration-in-mediterranean-sea\/\">Three-dimensional interpolation of Mercury Concentrations in the Mediterranean Sea with Python<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bibliography<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">[1] Health Effects of Exposures to Mercury. United States Environmental Protection Agency. Link:&nbsp;<a href=\"https:\/\/www.epa.gov\/mercury\/health-effects-exposures-mercury\">https:\/\/www.epa.gov\/mercury\/health-effects-exposures-mercury<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[2] EPA-FDA Fish Advice: Technical Information. United States Environmental Protection Agency. Link:&nbsp;<a href=\"https:\/\/www.epa.gov\/fish-tech\/epa-fda-fish-advice-technical-information\">https:\/\/www.epa.gov\/fish-tech\/epa-fda-fish-advice-technical-information<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[3] <a href=\"https:\/\/github.com\/szymon-datalions\/geoprocessing\/tree\/master\/data%20analysis\/notebooks\/spatial_101\/0102_spatial_analysis\">Github with article\u2019s notebooks and shapefiles<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">[4]&nbsp;Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; \u017divkovi\u0107, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. PANGAEA,&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Changelog<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>2021-07-18: Update of the link to the next lesson.<\/li><li>2021-05-21: First release of the article.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Inverse Distance Weighting of mercury concentrations in the Mediterranean Sea with Python<\/p>\n","protected":false},"author":1,"featured_media":229,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,3,30,31],"tags":[47,44,35,46,32,42,45,7,33,34,43],"class_list":["post-203","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-python","category-spatial-statistics","category-tutorials","tag-data-science","tag-environment-protection","tag-how-to-calculate-inverse-distance-weighting","tag-idw","tag-inverse-distance-weighting","tag-mediterranean-sea","tag-mercury","tag-python","tag-spatial-interpolation","tag-spatial-statistics","tag-toxic"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=203"}],"version-history":[{"count":13,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/203\/revisions"}],"predecessor-version":[{"id":358,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/203\/revisions\/358"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/229"}],"wp:attachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}