{"id":329,"date":"2021-07-18T09:45:36","date_gmt":"2021-07-18T09:45:36","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=329"},"modified":"2021-10-03T19:21:38","modified_gmt":"2021-10-03T19:21:38","slug":"spatial-interpolation-101-interpolation-in-three-dimensions-with-python-and-idw-algorithm-the-mercury-concentration-in-mediterranean-sea","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2021\/07\/18\/spatial-interpolation-101-interpolation-in-three-dimensions-with-python-and-idw-algorithm-the-mercury-concentration-in-mediterranean-sea\/","title":{"rendered":"Spatial Interpolation 101: Interpolation in Three Dimensions with Python and IDW algorithm. The Mercury Concentration in Mediterranean Sea."},"content":{"rendered":"\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Move from 2D interpolation to 3D interpolation with the Inverse Distance Weighting algorithm.<\/p><\/blockquote>\n\n\n\n<p>&lt;&lt; Previous part: <a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/05\/20\/spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python\/\" data-type=\"post\" data-id=\"203\">Spatial Interpolation 101: Interpolation of Mercury Concentrations in the Mediterranean Sea with IDW and Python<\/a><\/p>\n\n\n\n<p><em>If this is your first article in the series, I recommend reading the previous lesson linked above.<\/em><\/p>\n\n\n\n<p>The transition from 2D to 3D is not as hard as it seems. We can easily adapt the algorithm written for 2-dimensional IDW to perform the same operation in <em>n<\/em>-dimensional space. In this article <strong>we are going to adapt our 2D algorithm to the new case of volumetric interpolation<\/strong>, and we will take a look at <strong>different styles of visualization of 3-dimensional data<\/strong>. Those are mostly research problems, but we will tackle engineering issues too. 
We will write our first <strong>unit test to debug the newly created algorithm<\/strong>. We continue our analysis of the mercury contamination in the Mediterranean Sea and build a volumetric profile of the <em>methylmercury<\/em> (<em>MeHg<\/em>) concentrations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Sources<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>The Mercury Concentrations Dataset in this article comes from: Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; \u017divkovi\u0107, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. PANGAEA,&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a><\/li><li>I found this dataset and the article via the <strong>GEOSS search engine<\/strong>: Geoportal:&nbsp;<a href=\"https:\/\/www.geoportal.org\/\">https:\/\/www.geoportal.org<\/a><\/li><li>You can find the lesson\u2019s code, notebooks, data and shapefiles here: <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-07\/idw-3d\">Github repository<\/a><\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Environment Preparation<\/h2>\n\n\n\n<p>You should create a working environment with four packages:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><code>numpy<\/code> for the scientific computation,<\/li><li><code>pandas<\/code> for the tabular data processing and analysis,<\/li><li><code>geopandas<\/code> for the spatial data processing and analysis,<\/li><li><code>matplotlib<\/code> for the data visualization.<\/li><\/ul>\n\n\n\n<p>Install them from the&nbsp;<code>conda-forge<\/code>&nbsp;channel with Python version <em>3.6<\/em> or higher. 
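Before moving on, you may want to confirm that the environment is ready. This is a hypothetical sanity check, not part of the original notebooks; <code>geopandas<\/code> is omitted from the snippet only to keep it short and is imported the same way.

```python
# Hypothetical environment check (not from the article's notebooks).
# geopandas is also required for this lesson; import it the same way.
import sys

import matplotlib
import numpy
import pandas

assert sys.version_info >= (3, 6), 'Python 3.6 or higher is required'
print('numpy', numpy.__version__)
print('pandas', pandas.__version__)
print('matplotlib', matplotlib.__version__)
```

If any import fails, install the missing package from the <code>conda-forge<\/code> channel as described above.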
Then download the data from the publication (<code>csv<\/code> file)&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a>&nbsp;along with the Mediterranean coastline <code>shapefile<\/code> from my&nbsp;<a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-07\/idw-3d\/0103_spatial_analysis\">Github<\/a>&nbsp;(<strong>input\/italy_coastline.shp<\/strong>).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Algorithm Development<\/h2>\n\n\n\n<p><strong>! Code listings in this part come from the <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/blob\/main\/2021-07\/idw-3d\/0103_spatial_analysis\/idw-3D-function.ipynb\">idw-3d-function.ipynb<\/a> notebook<\/strong>.<\/p>\n\n\n\n<p>Let\u2019s imagine that you are a Data Scientist and your boss has delegated to you the implementation of an algorithm which is not available in the existing Open Source packages. How should you start? Let\u2019s begin with something we shouldn&#8217;t do: we should never develop new statistical algorithms against real-world data! Real observations are messy, and they rarely follow an easy-to-understand pattern (or distribution, if you are more into statistics). Data preparation and data cleaning can be tedious, and they steal precious time which could be devoted to the core algorithm. The right path is to create artificial data with well-known properties. It is easier to debug and to cover our code with tests based on a fixed dataset which is always the same, or has the same statistical properties every time we use it. Artificial data can be created on the fly with <code>numpy<\/code>, the same package which is used for the math operations. 
Moreover, we are going to visualize our input data with the <code>matplotlib<\/code> package.<\/p>\n\n\n\n<p>Our work has seven steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\" type=\"1\"><li>Generate a set of random coordinates in a 3-dimensional space.<\/li><li>Plot the artificial points.<\/li><li>Write a function which calculates the Euclidean distance between the created points and a sampled location with an <em>unknown<\/em> value.<\/li><li>Test the distance calculation function.<\/li><li>Rewrite the IDW function from the <a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/05\/20\/spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python\/\" data-type=\"post\" data-id=\"203\">previous tutorial<\/a>.<\/li><li>Test the IDW function with the prepared data.<\/li><li>Test the IDW with a generated point cloud, then interpolate unknown values and plot everything to check the result.<\/li><\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1 &#8211; Step 2: Generate data for the algorithm development. Plot Points in the 3-Dimensional Space<\/h3>\n\n\n\n<p>How do we generate artificial data? You could do it manually, but that would be a very time-consuming task. A good and simple idea is to use a random number generator. With it we are able to create multiple 3D sets of points with random values to test and build our algorithm. Creating a random set of coordinates in an <em>n<\/em>-dimensional space is very easy&#8230; if we utilize <code>numpy<\/code> and its <code>random.randint()<\/code> function. The arguments of this function are the lowest value which can be picked (inclusive), the highest value (exclusive), and the size of the output array. The function returns an array of randomly chosen integers. <strong>Listing 1<\/strong> shows the <code>random.randint()<\/code> function in action. We generate an array of 100 random coordinates in the form [x, y, z]. 
Then we show the created points with the <code>scatter3D()<\/code> function implemented in the <code>matplotlib<\/code> package.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 1: Algorithm development: random points generation and visualization<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># import packages\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Generate 100 random points with 3 integer coordinates between 0 and 999\n# (the high bound of randint is exclusive)\nrandom_points = np.random.randint(low=0, high=1000, size=(100, 3))\n\n# Show generated points in 3D scatterplot\nfigure = plt.figure(figsize=(8, 8), facecolor='w')\nax = figure.add_subplot(111, projection='3d')\nax.scatter3D(random_points[:, 0], random_points[:, 1], random_points[:, 2])\nplt.show()<\/pre>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"487\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig1-spatial_int_101_0103.png\" alt=\"\" class=\"wp-image-333\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig1-spatial_int_101_0103.png 500w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig1-spatial_int_101_0103-300x292.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><figcaption>Figure 1: The Random Points Visualized in the 3-Dimensional Space.<\/figcaption><\/figure><\/div>\n\n\n\n<p><em>Figure 1<\/em> represents the output of the <code>numpy.random.randint()<\/code> function. Your realization is probably different because we didn\u2019t set the <em>random seed<\/em>, which would make the result reproducible. 
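As a small illustration of the reproducibility point (the seed value 0 is an arbitrary choice, not something used in the article), seeding the generator makes consecutive realizations identical:

```python
import numpy as np

np.random.seed(0)  # arbitrary constant seed
first = np.random.randint(low=0, high=1000, size=(100, 3))

np.random.seed(0)  # reset to the same seed...
second = np.random.randint(low=0, high=1000, size=(100, 3))

# ...and the draw repeats exactly
assert (first == second).all()
```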
However, if you want to obtain the same points on every run of the program, set the random seed to a constant value with <code>numpy.random.seed(NUMBER)<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3 &#8211; Step 4: Write the function which calculates the Euclidean distance between the set of known points and the location with an unknown value. Test this function<\/h3>\n\n\n\n<p>We could calculate the distance between any set of points in any dimension with nested <code>for<\/code> loops, but why bother when the powerful <code>numpy<\/code> package makes the calculations faster? The idea is to use <em>array indexing<\/em>. Each column in the array represents a different dimension: we subtract the unknown point\u2019s coordinate from each element in the corresponding column, square the differences, sum them across dimensions, and take the square root of those sums. Figure 2 shows this operation graphically and Listing 2 is an implementation of this process.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"889\" height=\"500\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig2-spatial_int_101_0103.jpg\" alt=\"\" class=\"wp-image-336\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig2-spatial_int_101_0103.jpg 889w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig2-spatial_int_101_0103-300x169.jpg 300w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig2-spatial_int_101_0103-768x432.jpg 768w\" sizes=\"auto, (max-width: 889px) 100vw, 889px\" \/><figcaption>Figure 2: Algorithm of distance calculation with <code>numpy<\/code> and the array indexing process.<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 2: Euclidean distance calculation in n-dimensional space with <code>numpy<\/code><\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def calculate_distances(coordinates, unknown_point):\n    # Get coordinates dimension\n    # Number of columns in numpy array \/ Number of dimensions\n    coordinates_dim = coordinates.shape[1]\n    \n    distances_between_dims = []\n    \n    for i in range(coordinates_dim):\n        # Squared difference along dimension i\n        distance = (coordinates[:, i] - unknown_point[i])**2\n        distances_between_dims.append(distance)\n    \n    dists_array = np.sqrt(np.sum(distances_between_dims, axis=0))\n    \n    return dists_array<\/pre>\n\n\n\n<p>The variable <code>coordinates_dim<\/code> controls the range of the <code>for<\/code> loop and is equal to the number of dimensions of our dataset. Within the loop we calculate the squared differences between a specific dimension of all <em>known<\/em> points and the given <em>unknown<\/em> point &#8211; we obtain a list of arrays with squared differences per dimension. Outside the loop those values are summed across dimensions (<code>axis=0<\/code>) and the square root of each sum is the distance to a specific point.<\/p>\n\n\n\n<p>It is a good idea to write a <em>unit test<\/em> to check if our function works as expected. A unit test requires hard-coded values or special files to monitor how our script behaves. Unit tests are very useful when you develop a new package and changes in one part of it may affect functions in another. To quickly find what went wrong, we run the unit tests and check which one has failed. Listing 3 is a simple unit test of the output of our function. We have a hard-coded 3\u00d73 identity matrix, a 3-element array of 2\u2019s and the expected output, which is a 3-element array of 3\u2019s. We expect that the distance from each row of <code>test_arr<\/code> to <code>test_unknown_point<\/code> is always equal to 3. 
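The per-dimension loop from Listing 2 can also be cross-checked against <code>numpy<\/code> broadcasting and <code>np.linalg.norm<\/code>. This is only a verification sketch, not a replacement used later in the article:

```python
import numpy as np

def calculate_distances(coordinates, unknown_point):
    # Per-dimension loop, as in Listing 2
    coordinates_dim = coordinates.shape[1]
    distances_between_dims = []
    for i in range(coordinates_dim):
        distances_between_dims.append((coordinates[:, i] - unknown_point[i]) ** 2)
    return np.sqrt(np.sum(distances_between_dims, axis=0))

coords = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
unknown = [2, 2, 2]

# Broadcasting alternative: subtract the point from every row at once,
# then take the Euclidean norm along each row
expected = np.linalg.norm(coords - np.array(unknown), axis=1)

assert np.allclose(calculate_distances(coords, unknown), expected)
```

Both approaches give the same distances; the loop version simply makes the per-dimension arithmetic explicit.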
(You can check it manually for the first row: $\\sqrt{1 + 4 + 4} = 3$).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 3: Unit test of the distance calculation function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Test your algorithm with prepared array and unknown point\n\ntest_arr = np.array([\n    [1, 0, 0],\n    [0, 1, 0],\n    [0, 0, 1]\n])\n\ntest_unknown_point = [2, 2, 2]\noutput_distances = np.array([3, 3, 3])\n\ndistances = calculate_distances(coordinates=test_arr, unknown_point=test_unknown_point)\n\nassert (distances == output_distances).all()  # Here is the test<\/pre>\n\n\n\n<p>Our function has passed the test and we are able to move to the next step of the algorithm development.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Rewrite IDW function from previous tutorial<\/h3>\n\n\n\n<p><a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/05\/20\/spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python\/\" data-type=\"post\" data-id=\"203\">In the previous lesson<\/a> our IDW function was designed only for two-dimensional data. 
Let\u2019s recall it (Listing 4).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 4: The two-dimensional IDW function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def inverse_distance_weighting(unknown_point, points, power, ndist=10):\n\n    distances = calculate_distances(points, unknown_point)\n    points_and_dists = np.c_[points, distances]\n    \n    # Sort and keep only the ndist closest points\n    points_and_dists = points_and_dists[points_and_dists[:, -1].argsort()]\n    vals_for_idw = points_and_dists[:ndist, :]\n    \n    # Check if first distance is 0\n    if vals_for_idw[0, -1] == 0:\n        return vals_for_idw[0, 2]\n    else:\n        # If it's not, perform calculations\n        weights = 1 \/ (vals_for_idw[:, -1]**power)\n        numerator = weights * vals_for_idw[:, 2]\n        interpolated_value = np.sum(numerator) \/ np.sum(weights)\n        return interpolated_value<\/pre>\n\n\n\n<p>What must be changed to make this function work for <strong>any<\/strong> number of dimensions?<\/p>\n\n\n\n<p>We assume that the function argument <code>points<\/code> is an array of coordinates and their respective measurements, for example: <code>[LAT, LON, DEPTH, MERCURY CONCENTRATION]<\/code>. We cannot pass all the values to the distance calculation, thus we use <code>numpy<\/code> indexing to pass only the geographical coordinates and the depth. 
Instead of:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">distances = calculate_distances(points, unknown_point)<\/pre>\n\n\n\n<p>We are going to write:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">distances = calculate_distances(points[:, :-1], unknown_point)<\/pre>\n\n\n\n<p>And instead of:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">points_and_dists = np.c_[points, distances]<\/pre>\n\n\n\n<p>We change it to:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">vals_and_dists = np.c_[points[:, -1], distances]<\/pre>\n\n\n\n<p>In the first change we pass only the coordinates to the <code>calculate_distances()<\/code> function. The second change is analogous to the first one, but this time we pass <strong>all rows and only the last column<\/strong> into <strong>a new array of the observed values and their distances to the point with the unknown value<\/strong>.<\/p>\n\n\n\n<p>Finally we get an array whose rows have two elements: <code>[known point value, distance between known point and unknown point]<\/code>. As you may have noticed, we\u2019ve changed the variable name from <code>points_and_dists<\/code> to <code>vals_and_dists<\/code>. 
It is a clean-code naming convention: the variable name should be descriptive and it definitely shouldn&#8217;t be misleading.<\/p>\n\n\n\n<p>The sorting part doesn\u2019t change. We must still check if the first value in the sorted list is different from 0. From this point the main difference is related to the array indexing, which you may see in Listing 5 with a full implementation of the function. The core of the problem is the distance calculation function: if we build it to work with n-dimensional data, then our IDW function will work without any problems.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 5: Multidimensional IDW function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def inverse_distance_weighting(unknown_point, points, power, ndist=10):\n    \"\"\"\n    Function estimates values at unknown points with the inverse distance weighting technique.\n    \n    INPUT:\n    :param unknown_point: coordinates of the unknown point,\n    :param points: (array) list of points and their values [dim 1, ..., dim n, val],\n    :param power: (float) constant used to calculate IDW weight -> weight = 1\/(distance**power),\n    :param ndist: (int) how many closest distances are included in weighting,\n    \n    OUTPUT:\n    :return interpolated_value: (float) value interpolated by the IDW method.\n    \n    Inverse distance weighted interpolation is:\n    \n    est = SUM(WEIGHTS * KNOWN VALS) \/ SUM(WEIGHTS)\n    and\n    WEIGHTS = 1 \/ (DISTANCE TO UNKNOWN**power)\n    \n    where:\n    power is a constant hyperparameter which controls how strongly the estimate is influenced by distant points. 
\n    \"\"\"\n\n    distances = calculate_distances(points[:, :-1], unknown_point)\n    vals_and_dists = np.c_[points[:, -1], distances]\n    \n    # Sort and keep only the ndist closest points\n    vals_and_dists = vals_and_dists[vals_and_dists[:, 1].argsort()]\n    vals_for_idw = vals_and_dists[:ndist, :]\n    \n    # Check if first distance is 0\n    if vals_for_idw[0, 1] == 0:\n        # If first distance is 0 then return first value\n        return vals_for_idw[0, 0]\n    else:\n        # If it's not, perform calculations\n        weights = 1 \/ (vals_for_idw[:, 1]**power)\n        numerator = weights * vals_for_idw[:, 0]\n        interpolated_value = np.sum(numerator) \/ np.sum(weights)\n        return interpolated_value<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Test IDW function<\/h3>\n\n\n\n<p>As we\u2019ve tested the distance calculation, we are going to test the IDW algorithm too. The idea is the same: we create hard-coded arrays and check if the output matches the result of a manual calculation. Listing 6 is the test.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 6: Unit test of Inverse Distance Weighting function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">test_arr = np.array([\n    [1, 0, 0, 1],\n    [0, 1, 0, 1],\n    [0, 0, 1, 1]\n])\n\ntest_unknown_point = [2, 2, 2]\n\noutput_est = 1\n\nestimated_value = inverse_distance_weighting(test_unknown_point, test_arr, 2)\n\nassert output_est == estimated_value<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Test IDW with artificial data<\/h3>\n\n\n\n<p>To close the function development part, we will test it with randomly generated points. First we create an evenly spaced set of points in the 3-dimensional space (Figure 3). 
We can achieve it within <code>numpy<\/code> with two operations: the <em>coordinates generation<\/em> and their <em>positioning<\/em> (Listing 7).<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"487\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig3-spatial_int_101_0103.jpg\" alt=\"\" class=\"wp-image-341\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig3-spatial_int_101_0103.jpg 500w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig3-spatial_int_101_0103-300x292.jpg 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><figcaption>Figure 3: 3-dimensional set of points for interpolation &#8211; the interpolation grid.<\/figcaption><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 7: Function for creation of an evenly spaced point cloud<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def generate_coordinates(lower, upper, step_size, dimension):\n    coordinates = [np.arange(lower, upper, step_size) for i in range(0, dimension)]\n    mesh = np.hstack(np.meshgrid(*coordinates)).swapaxes(0, 1).reshape(dimension, -1).T\n    return mesh\n\ncoords = generate_coordinates(0, 101, 10, 3)<\/pre>\n\n\n\n<p>The method from Listing 7 uses the <code>numpy.arange()<\/code> function to generate an array of coordinates from the lower to the upper limit, and then it uses <code>numpy.meshgrid()<\/code> to create all possible combinations (the Cartesian product) of those coordinates in a space whose size is controlled by the given dimension (Figure 3). Unfortunately for us, all of those points are only coordinates without any value. 
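The meshgrid construction in Listing 7 can be hard to parse at first glance. An equivalent sketch (<code>generate_grid<\/code> is a hypothetical name, not from the notebook) builds the same Cartesian product with <code>np.meshgrid(..., indexing='ij')<\/code> plus <code>stack<\/code> and <code>reshape<\/code>, and confirms that for lower=0, upper=101, step_size=10 and dimension=3 the grid holds 11**3 = 1331 points:

```python
import numpy as np

def generate_grid(lower, upper, step_size, dimension):
    # One tick array per axis, then the Cartesian product of all axes
    axes = [np.arange(lower, upper, step_size) for _ in range(dimension)]
    mesh = np.meshgrid(*axes, indexing='ij')
    return np.stack(mesh, axis=-1).reshape(-1, dimension)

grid = generate_grid(0, 101, 10, 3)
assert grid.shape == (11 ** 3, 3)  # 11 ticks per axis -> 1331 points
```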
We must generate a sample of random points in the same space, but this time with assigned values. This can be done with the function from Listing 1. The only difference is the number of values per row: before it was three, <code>(x, y, z)<\/code>, and now we need four, <code>(x, y, z, value)<\/code>. One realization of this function is presented in Listing 8 and Figure 4 shows the generated set of points.<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"475\" height=\"487\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig4-spatial_int_101_0103.jpg\" alt=\"\" class=\"wp-image-343\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig4-spatial_int_101_0103.jpg 475w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig4-spatial_int_101_0103-293x300.jpg 293w\" sizes=\"auto, (max-width: 475px) 100vw, 475px\" \/><figcaption>Figure 4: Random set of points with assigned values.<\/figcaption><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 8: Random sample of values in 3D space.<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Generate points with values\n\nknown_points = np.random.randint(0, 101, size=(20, 4))\n\n# Show known points\n\nfigure = plt.figure(figsize=(8, 8), facecolor='w')\nax = figure.add_subplot(111, projection='3d')\np = ax.scatter3D(known_points[:, 0], known_points[:, 1], known_points[:, 2],\n                 c=known_points[:, -1], cmap='viridis')\nfigure.colorbar(p)\nplt.show()<\/pre>\n\n\n\n<p>The result of our work up to this moment is a 3D canvas of evenly spaced points and a few samples for interpolation. Now we are able to perform the interpolation and fill the canvas with values. 
We will store the output as a new <code>numpy<\/code> array of rows of the form <code>[x, y, z, interpolated value]<\/code>, and to do so we are going to iterate through each record of the empty canvas. Listing 9 shows a function which performs this process and Figure 5 presents the resulting grid.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 9: Interpolate values from canvas.<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def interpolate_values(array_of_coordinates, array_of_known_points):\n    output_arr = []\n    \n    for row in array_of_coordinates:\n        interpolated = inverse_distance_weighting(row, array_of_known_points, power=3, ndist=5)\n        interpol_arr = np.zeros(shape=(1, len(row) + 1))\n        interpol_arr[:, :-1] = row\n        interpol_arr[:, -1] = interpolated\n        output_arr.append(interpol_arr[0])\n        \n    return np.array(output_arr)\n\ninterp = interpolate_values(coords, known_points)\n\n# Show interpolation results\n\nfigure = plt.figure(figsize=(12, 12), facecolor='w')\nax = figure.add_subplot(111, projection='3d')\np = ax.scatter3D(interp[:, 0], interp[:, 1], interp[:, 2],\n                 c=interp[:, -1], cmap='viridis')\nfigure.colorbar(p)\nplt.show()<\/pre>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"465\" height=\"487\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig5-spatial_int_101_0103.jpg\" alt=\"\" class=\"wp-image-344\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig5-spatial_int_101_0103.jpg 465w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig5-spatial_int_101_0103-286x300.jpg 286w\" sizes=\"auto, (max-width: 465px) 100vw, 465px\" \/><figcaption>Figure 5: A 
set of values interpolated with the multidimensional IDW function.<\/figcaption><\/figure><\/div>\n\n\n\n<p>Congratulations! We\u2019ve created an IDW algorithm for the interpolation of point values in 3D space! Finally, we are going to work with real-world data!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Preparation<\/h2>\n\n\n\n<p><strong>! Code listings in this part come from the <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/blob\/main\/2021-07\/idw-3d\/0103_spatial_analysis\/idw-3d-and-mercury-concentrations-data-prep.ipynb\">idw-3d-and-mercury-concentrations-data-prep.ipynb<\/a> notebook<\/strong>.<\/p>\n\n\n\n<p>We use a similar data preparation procedure as in the <a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/05\/20\/spatial-interpolation-101-interpolation-of-mercury-concentrations-in-the-mediterranean-sea-with-idw-and-python\/\" data-type=\"post\" data-id=\"203\">previous tutorial<\/a>. The significant difference is that we are going to use <strong>all dates of measurements<\/strong> and we <strong>limit the extent to the coastline of Italy<\/strong>. Download the data from <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-07\/idw-3d\/0103_spatial_analysis\/input\">here<\/a> and open a new Jupyter Notebook. Import <code>pandas<\/code> and read the <code>csv<\/code> file with observations using the <code>pandas.read_csv()<\/code> method. 
Create a copy of the DataFrame with the columns <strong><code>[\u2018lon\u2019, \u2018lat\u2019, \u2018depth_m\u2019, \u2018mea_ug_kg_orig\u2019]<\/code><\/strong>, which are: longitude, latitude, depth, and the mean concentration of Hg in the sampled organisms (Listing 10).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 10: Read and prepare DataFrame with observations<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\n\n# Read dataframe\n\ndf = pd.read_csv('input\/190516_m2b_marine_biota.csv', sep=';', encoding=\"ISO-8859-1\", index_col='id')\n\n# Get only columns: ['lon', 'lat', 'depth_m', 'mea_ug_kg_orig']\n\ncols = ['lon', 'lat', 'depth_m', 'mea_ug_kg_orig']\n\nndf = df[cols]<\/pre>\n\n\n\n<p>As decided earlier, we limit the study extent to the Italian coast and to a maximum sampling depth of 20 meters. The study extent is defined by the extremities: 35.3N \u2013 47.05N; 6.37E \u2013 18.31E. 
With <code>pandas<\/code> boolean indexing we are able to limit the study to the area within the given coordinates (Listing 11).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 11: Defining study extent with pandas DataFrame indexing<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Limit data by latitude, longitude and depth\n# Latitude limits: 35.3N - 47.05N\n# Longitude limits: 6.37E - 18.31E\n# Depth limit: 0m - 20m\n\nlat_limit_bottom = ndf['lat'] > 35.3\nlat_limit_top = ndf['lat'] &lt; 47.05\n\nlon_limit_bottom = ndf['lon'] > 6.37\nlon_limit_top = ndf['lon'] &lt; 18.31\n\ndepth_limit_min = ndf['depth_m'] >= 0\ndepth_limit_max = ndf['depth_m'] &lt; 20\n\nndf = ndf[lat_limit_bottom &amp; lat_limit_top &amp; lon_limit_bottom &amp; lon_limit_top\n          &amp; depth_limit_min &amp; depth_limit_max]<\/pre>\n\n\n\n<p>The created <code>DataFrame<\/code> has 2387 records and we assume that this is enough for the analysis. We should store the transformed data, which can be done with the <code>DataFrame.to_csv()<\/code> method. In this tutorial we&#8217;ll name the output file&nbsp;<strong><code>prepared_data_mercury_concentrations_volumetric.csv<\/code><\/strong>. You can download this file from the <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-07\/idw-3d\/0103_spatial_analysis\">tutorial reposito<\/a><a href=\"https:\/\/github.com\/szymon-datalions\/articles\/tree\/main\/2021-07\/idw-3d\/0103_spatial_analysis\/input\">ry<\/a> if you\u2019ve skipped the data preparation step.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Analysis<\/h2>\n\n\n\n<p><strong>! 
Code listings in this part come from the <a href=\"https:\/\/github.com\/szymon-datalions\/articles\/blob\/main\/2021-07\/idw-3d\/0103_spatial_analysis\/idw-3D-analysis.ipynb\">idw-3d-analysis.ipynb<\/a> notebook.<\/strong><\/p>\n\n\n\n<p>At the beginning of this lesson we developed tools for data analysis. Then we prepared the dataset. Now it\u2019s time for something far more interesting: we are finally going to join our materials (data) and tools (algorithms) to perform the spatial analysis. Our work here is straightforward, with one exception at the end of the project. This part of the project requires importing the standard <em>Spatial Data Science<\/em> libraries (<code>pandas<\/code>, <code>geopandas<\/code>, <code>numpy<\/code>) and <code>matplotlib.pyplot<\/code> for the visualization. We\u2019ll import the <code>ScalarMappable<\/code> class from <code>matplotlib.cm<\/code> and use it in the last part of the exercise for the <em>colorbar plotting<\/em>.<\/p>\n\n\n\n<p>Start by loading:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>the <strong><code>prepared_data_mercury_concentrations_volumetric.csv<\/code><\/strong> file with <code>pandas<\/code>,<\/li><li><strong><code>italy_coastline.shp<\/code><\/strong> with <code>geopandas<\/code> (Listing 12).<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 12: Import packages and load data for analysis<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\nimport geopandas as gpd\nimport numpy as np\n\nimport matplotlib.pyplot as plt\nfrom matplotlib.cm import ScalarMappable\n\n# 1. 
Read data.\n\ndf = pd.read_csv('prepared_data_mercury_concentrations_volumetric.csv', sep=',', encoding='utf8', index_col='id')\ngdf = gpd.read_file('input\/italy_coastline.shp')\ngdf.set_index('id', inplace=True)<\/pre>\n\n\n\n<p>It is a good idea to check our <code>DataFrame<\/code> before we start the analysis. The mercury concentrations table should look like this<strong>:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>id<\/strong><\/td><td><strong>lon<\/strong><\/td><td><strong>lat<\/strong><\/td><td><strong>depth_m<\/strong><\/td><td><strong>mea_ug_kg_orig<\/strong><\/td><\/tr><tr><td>1169<\/td><td>12.5<\/td><td>44.9<\/td><td>5.0<\/td><td>114.0<\/td><\/tr><tr><td>9577<\/td><td>14.0<\/td><td>43.8<\/td><td>10.0<\/td><td>22.0<\/td><\/tr><\/tbody><\/table><figcaption>Table 1: A sample from the <code>DataFrame<\/code> with Mercury concentrations in Mediterranean Sea organisms.<\/figcaption><\/figure>\n\n\n\n<p>With <code>geopandas<\/code> we are able to plot our <code>GeoDataFrame<\/code> and check visually that everything is ok with the geometry; use the <code>gdf.plot()<\/code> method for it. We may inspect the geometry type too: we just need to print the geometry attribute of the <code>GeoDataFrame<\/code> with the <code>gdf.geometry<\/code> command. In our case it is the <code>MultiPoint<\/code> type.<\/p>\n\n\n\n<p><strong>The hardest part<\/strong>: build the volumetric profile of the coastline and perform inverse distance weighting. For the sake of simplicity, we skip the real depths and build five depth levels for each point (0, 5, 10, 15, 20 meters). We could create a new <code>DataFrame<\/code> with columns for each level, but there is an option to create those profiles while the interpolation algorithm runs, so we will use this opportunity. It may be hard to imagine how we will do it, but Figure 6 shows the algorithm to make it clearer. We have to write four functions to achieve the final result. 
Three of them already exist: <code>calculate_distances()<\/code> from Listing 2, <code>inverse_distance_weighting()<\/code> from Listing 5, and <code>interpolate_values()<\/code> from Listing 9, which will be slightly changed for the needs of our process.<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"560\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig6-spatial_int_101_0103.png\" alt=\"\" class=\"wp-image-346\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig6-spatial_int_101_0103.png 800w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig6-spatial_int_101_0103-300x210.png 300w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig6-spatial_int_101_0103-768x538.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><figcaption>Figure 6: The algorithm for spatial interpolation with IDW. This algorithm expands the given canvas by depth profiles.<\/figcaption><\/figure><\/div>\n\n\n\n<p>The algorithm for 3D IDW looks complicated, but the complexity comes from the depth profile building. Without this step, for example if we provide a pre-defined canvas, the process is straightforward. 
Look into the list below: <em>the italic items are the algorithm without the depth profile building<\/em> and <strong>the bold item describes the additional step of the depth profile creation<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>iterate through each point where you want to interpolate values (here it is the <code>Geometry<\/code> object from Figure 6),<\/em><\/li><li><strong>build the depth profile for the specific unknown point and return an array of points at the specific depths, for example <code>[ [x, y, 0], [x, y, 5], [x, y, 10] ]<\/code>, where we have three depth levels,<\/strong><\/li><li><em>perform inverse distance weighting on those points against the array containing the points with known values (here it is the <code>Points<\/code> object from Figure 6),<\/em><\/li><li><em>assign the interpolated value to the point and return it.<\/em><\/li><\/ul>\n\n\n\n<p>All values are passed into the <code>interpolate_values()<\/code> function (Listing 13 shows the changed function). Then, in the loop, each unknown point from the canvas is passed to the <code>build_volumetric_profile_array()<\/code> function (Listing 14). From there the array with depth profiles is returned to <code>interpolate_values()<\/code>, and for each depth in this array inverse distance weighting against the known points is performed. Then the interpolated value is stored in the output array. 
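<\/p>\n\n\n\n<p>To see what a single weighting step computes, here is a toy numeric check of the IDW formula with three made-up known values at distances 1, 2 and 4 from the unknown point (the numbers are illustrative, not survey data):<\/p>\n\n\n\n

```python
import numpy as np

# Three hypothetical known values and their distances to the unknown point
values = np.array([100.0, 200.0, 400.0])
dists = np.array([1.0, 2.0, 4.0])

power = 3  # the same hyperparameter as in inverse_distance_weighting()
weights = 1 / dists**power  # [1.0, 0.125, 0.015625]
estimate = np.sum(weights * values) / np.sum(weights)
# estimate = 131.25 / 1.140625, about 115.07 - the closest point dominates
```

\n\n\n\n<p>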
When all points are interpolated, the output array is returned and we can visualize it.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 13: Interpolate values, updated function<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def interpolate_values(array_of_coordinates, array_of_known_points):\n    output_arr = []\n    \n    for row in array_of_coordinates:\n        pts = build_volumetric_profile_array(row[0])  # Updated: build the depth profile for this point\n        for pt in pts:\n            interpolated = inverse_distance_weighting(pt, array_of_known_points, power=3, ndist=5)\n            interpol_arr = np.zeros(shape=(1, len(pt) + 1))\n            interpol_arr[:, :-1] = pt\n            interpol_arr[:, -1] = interpolated\n            output_arr.append(interpol_arr[0])\n        \n    return np.array(output_arr)<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 14: Build depth profile<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def build_volumetric_profile_array(points):\n    \"\"\"Function adds depth levels (0-20 m, every 5 m) to the coastline geometry\"\"\"\n    ranges = np.arange(0, 21, 5)\n    mpts = []\n    for r in ranges:\n        vals = [points[0].x, points[0].y, r]\n        mpts.append(vals)\n    return np.array(mpts)<\/pre>\n\n\n\n<p>Now we move to the very interesting part: data visualization in multiple dimensions. We have done it before (Listing 9) and we will do it again. Here\u2019s a bit of advice: our depth values are positive, so our visualization will be inverted (the deepest part will be at the top). 
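<\/p>\n\n\n\n<p>The article has no listing for the 3D view in Figure 7, so here is a sketch along those lines; the <code>interpolated<\/code> array is simulated with random values inside the study extent, since the real one comes from Listing 13:<\/p>\n\n\n\n

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for the `interpolated` array from interpolate_values();
# columns: [lon, lat, depth_m, estimated concentration]
rng = np.random.default_rng(0)
n = 200
interpolated = np.c_[
    rng.uniform(6.37, 18.31, n),        # longitude within the study extent
    rng.uniform(35.3, 47.05, n),        # latitude within the study extent
    rng.choice([0, 5, 10, 15, 20], n),  # the five depth levels
    rng.uniform(0, 30000, n),           # made-up MeHg concentrations
]

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection='3d')
sc = ax.scatter(interpolated[:, 0], interpolated[:, 1], interpolated[:, 2],
                c=interpolated[:, 3], cmap='viridis')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_zlabel('Depth [m]')
ax.invert_zaxis()  # depth 0 at the top, 20 m at the bottom
fig.colorbar(sc, ax=ax, shrink=0.6)
```

\n\n\n\n<p>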
To change this behavior you can modify the function from Listing 14 to produce negative depths, or you can invert the axis in <code>matplotlib<\/code>. For the latter we use the <code>ax.invert_zaxis()<\/code> method (Figure 7).<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"478\" height=\"487\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig7-spatial_int_101_0103.jpg\" alt=\"\" class=\"wp-image-347\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig7-spatial_int_101_0103.jpg 478w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig7-spatial_int_101_0103-294x300.jpg 294w\" sizes=\"auto, (max-width: 478px) 100vw, 478px\" \/><figcaption>Figure 7: The interpolated depth profiles of Italy\u2019s coastline.<\/figcaption><\/figure><\/div>\n\n\n\n<p>Ok&#8230; there are some patterns, but unfortunately the plot is not readable! It is hard to distinguish specific points and values! <\/p>\n\n\n\n<p>Visualizing three or more dimensions is tricky. A good idea is to change the plot into a series of 2-dimensional scatter plots that show how the values change with depth. This is not an easy task, and I was stuck on it when I wanted to plot the <code>colorbar<\/code>. Fortunately, there is a method to do it. We\u2019ve mentioned earlier that we must import the <code>ScalarMappable<\/code> class from <code>matplotlib<\/code>. Here&#8217;s the use of it. We are going to build a <code>colorbar<\/code> based on the assumption that the minimum concentration of <em>MeHg<\/em> is 0 and the maximum is 30&nbsp;000. 
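<\/p>\n\n\n\n<p>In isolation, the trick is to build a <code>ScalarMappable<\/code> with a fixed <code>Normalize<\/code> range and pass it to <code>Figure.colorbar()<\/code>, so every subplot shares one color scale; a minimal sketch with made-up values:<\/p>\n\n\n\n

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable

# Fixed color scale shared by all subplots: 0-30000 ug/kg (assumed MeHg range)
norm = plt.Normalize(vmin=0, vmax=30000)
sm = ScalarMappable(norm=norm, cmap='viridis')
sm.set_array([])  # the mappable carries no data; only the scale matters

fig, ax = plt.subplots()
vals = np.array([10.0, 250.0, 29999.0])  # hypothetical concentrations
ax.scatter([0, 1, 2], [0, 0, 0], c=vals, vmin=0, vmax=30000, cmap='viridis')
cbar = fig.colorbar(sm, ax=ax)  # colorbar driven by the shared scale, not the scatter
```

\n\n\n\n<p>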
The whole process is presented in Listing 15.<\/p>\n\n\n\n<div class=\"wp-block-image is-style-default\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"296\" height=\"1024\" src=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig8fig8-spatial_int_101_0103-correct-296x1024.jpg\" alt=\"\" class=\"wp-image-348\" srcset=\"https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig8fig8-spatial_int_101_0103-correct-296x1024.jpg 296w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig8fig8-spatial_int_101_0103-correct-87x300.jpg 87w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig8fig8-spatial_int_101_0103-correct-444x1536.jpg 444w, https:\/\/ml-gis-service.com\/wp-content\/uploads\/2021\/07\/fig8fig8-spatial_int_101_0103-correct.jpg 578w\" sizes=\"auto, (max-width: 296px) 100vw, 296px\" \/><figcaption>Figure 8: Multiple depth profiles as a series of scatter plots.<\/figcaption><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 15: Plot multiple depth profiles as a series of plots<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Get unique depths\n\nunique_depths = np.unique(interpolated[:, 2])\n\n# Prepare figures\n\nscale = np.linspace(0, 30000, 300)\ncmap = plt.get_cmap('viridis')\nnorm = plt.Normalize(scale.min(), scale.max())\nsm = ScalarMappable(norm=norm, cmap=cmap)\nsm.set_array([])\n\nfig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 35))\n\nfor idx, depth_level in enumerate(unique_depths):\n    \n    x = interpolated[:, 0][interpolated[:, 2] == depth_level]\n    y = interpolated[:, 1][interpolated[:, 2] == depth_level]\n    col = interpolated[:, -1][interpolated[:, 2] == depth_level]\n    \n    axes[idx].scatter(x, y, c=col, vmin=0, 
vmax=30000, cmap='viridis')\n    axes[idx].set_title('Mercury concentrations at depth: {} meters'.format(depth_level))\n    \n    fig.colorbar(sm, ax=axes[idx])<\/pre>\n\n\n\n<p>Figure 8 is better for comparison between the depths. This teaches us that it is better to distribute data among multiple plots if it will be analyzed by a human expert. The same approach may be used for time-series presentation or other 3-dimensional datasets.<\/p>\n\n\n\n<p>Ok, that\u2019s all. We have created the full algorithm for 3D IDW, and now we are able to perform IDW in an n-dimensional space. Additionally, we know how to plot 3D data in a way that preserves meaningful information and reduces the noise related to overlapping points. Listing 16 is the full algorithm to perform this kind of analysis. Feel free to use it and share it!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Listing 16: Full code for 3D IDW interpolation<\/h4>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\nimport geopandas as gpd\nimport numpy as np\n\nimport matplotlib.pyplot as plt\nfrom matplotlib.cm import ScalarMappable\n\n# 1. Read data.\n# 2. Check data.\n# 3. Prepare model (geometry).\n# 4. Interpolate missing values.\n# 5. Visualize output.\n\n# 1. Read data.\n\ndf = pd.read_csv('prepared_data_mercury_concentrations_volumetric.csv', sep=',', encoding='utf8', index_col='id')\ngdf = gpd.read_file('input\/italy_coastline.shp')\ngdf.set_index('id', inplace=True)\n\n# 3. 
Prepare model (geometry).\n\n# For each point add a depth profile - create a new array with those points\n\ndef build_volumetric_profile_array(points):\n    \"\"\"Function adds depth levels to the coastline geometry\"\"\"\n    ranges = np.arange(0, 21, 5)\n    mpts = []\n    for r in ranges:\n        vals = [points[0].x, points[0].y, r]\n        mpts.append(vals)\n    return np.array(mpts)\n\n# 4. Interpolate missing values.\n# 4a. Calculate distances between the 3D coordinates list and the unknown point. Use numpy array operations.\n\ndef calculate_distances(coordinates, unknown_point):\n    # Get coordinates dimension\n    coordinates_dim = coordinates.shape[1]  # Number of columns in numpy array \/ Number of dimensions\n    \n    distances_between_dims = []\n    \n    for i in range(coordinates_dim):\n        distance = (coordinates[:, i] - unknown_point[i])**2\n        distances_between_dims.append(distance)\n    \n    dists_array = np.sqrt(np.sum(distances_between_dims, axis=0))\n    \n    return dists_array\n\ndef inverse_distance_weighting(unknown_point, points, power, ndist=10):\n    \"\"\"\n    Function estimates values in unknown points with the inverse distance weighting technique.\n    \n    INPUT:\n    :param unknown_point: coordinates of the unknown point,\n    :param points: (array) list of points and their values [dim 1, ..., dim n, val],\n    :param power: (float) constant used to calculate the IDW weight -> weight = 1\/(distance**power),\n    :param ndist: (int) how many closest distances are included in the weighting,\n    \n    OUTPUT:\n    :return interpolated_value: (float) value interpolated by the IDW method.\n    \n    Inverse distance weighted interpolation is:\n    \n    est = SUM(WEIGHTS * KNOWN VALS) \/ SUM(WEIGHTS)\n    and\n    WEIGHTS = 1 \/ (DISTANCE TO UNKNOWN**power)\n    \n    where:\n    power is a constant hyperparameter which tells how much a point is influenced by other points. 
\n    \"\"\"\n\n    distances = calculate_distances(points[:, :-1], unknown_point)\n    vals_and_dists = np.c_[points[:, -1], distances]\n    \n    # Sort by distance and keep only the ndist closest values\n    vals_and_dists = vals_and_dists[vals_and_dists[:, 1].argsort()]\n    vals_for_idw = vals_and_dists[:ndist, :]\n    \n    # Check if the smallest distance is 0\n    if vals_for_idw[0, 1] == 0:\n        # The unknown point is a known point - return its value directly\n        return vals_for_idw[0, 0]\n    else:\n        # Otherwise perform the IDW calculation\n        weights = 1 \/ (vals_for_idw[:, 1]**power)\n        numerator = weights * vals_for_idw[:, 0]\n        interpolated_value = np.sum(numerator) \/ np.sum(weights)\n        return interpolated_value\n\ndef interpolate_values(array_of_coordinates, array_of_known_points):\n    output_arr = []\n    \n    for row in array_of_coordinates:\n        pts = build_volumetric_profile_array(row[0])\n        for pt in pts:\n            interpolated = inverse_distance_weighting(pt, array_of_known_points, power=3, ndist=5)\n            interpol_arr = np.zeros(shape=(1, len(pt) + 1))\n            interpol_arr[:, :-1] = pt\n            interpol_arr[:, -1] = interpolated\n            output_arr.append(interpol_arr[0])\n        \n    return np.array(output_arr)\n\ngeometry_array = gdf.to_numpy()\ninterpolated = interpolate_values(geometry_array, df.to_numpy())\n\n# 5. 
Visualize output.\n\n# Get unique depths\n\nunique_depths = np.unique(interpolated[:, 2])\n\n# Prepare figures\n\nscale = np.linspace(0, 30000, 300)\ncmap = plt.get_cmap('viridis')\nnorm = plt.Normalize(scale.min(), scale.max())\nsm = ScalarMappable(norm=norm, cmap=cmap)\nsm.set_array([])\n\nfig, axes = plt.subplots(nrows=5, ncols=1, sharex=True, figsize=(10, 35))\n\nfor idx, depth_level in enumerate(unique_depths):\n    \n    x = interpolated[:, 0][interpolated[:, 2] == depth_level]\n    y = interpolated[:, 1][interpolated[:, 2] == depth_level]\n    col = interpolated[:, -1][interpolated[:, 2] == depth_level]\n    \n    axes[idx].scatter(x, y, c=col, vmin=0, vmax=30000, cmap='viridis')\n    axes[idx].set_title('Mercury concentrations at depth: {} meters'.format(depth_level))\n    \n    fig.colorbar(sm, ax=axes[idx])<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Exercises:<\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>Refactor the code and create a class for the IDW calculation. This class should have all functions (methods) from the lesson. Think about the class arguments (parameters) which should be included during the object initialization.<\/li><li>Think about how to visualize a 4-dimensional dataset, for example <code>[lon, lat, depth, time]<\/code>. Is there a good way to do it? What are the trade-offs of each approach?<\/li><li>There is one particular problem related to our analysis: we assume that depth has the same spatial scale as the lon\/lat coordinates, and this is not true. How can the algorithm be enhanced to support different scales? (We will answer this question in the future, but feel free to experiment with this issue.)<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Next Lesson<\/h2>\n\n\n\n<p><em><a href=\"https:\/\/ml-gis-service.com\/index.php\/2021\/10\/03\/spatial-interpolation-101-statistical-introduction-to-the-semivariance-concept\/\">Sample mean, standard deviation and confidence intervals. 
The foundation for the semivariance concept.<\/a><\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bibliography<\/h2>\n\n\n\n<p>[1] Health Effects of Exposures to Mercury. United States Environmental Protection Agency. Link:&nbsp;<a href=\"https:\/\/www.epa.gov\/mercury\/health-effects-exposures-mercury\">https:\/\/www.epa.gov\/mercury\/health-effects-exposures-mercury<\/a><\/p>\n\n\n\n<p>[2] EPA-FDA Fish Advice: Technical Information. United States Environmental Protection Agency. Link:&nbsp;<a href=\"https:\/\/www.epa.gov\/fish-tech\/epa-fda-fish-advice-technical-information\">https:\/\/www.epa.gov\/fish-tech\/epa-fda-fish-advice-technical-information<\/a><\/p>\n\n\n\n<p>[3] <a href=\"https:\/\/github.com\/szymon-datalions\/geoprocessing\/tree\/master\/data%20analysis\/notebooks\/spatial_101\/0102_spatial_analysis\">Github with article\u2019s notebooks and shapefiles<\/a><\/p>\n\n\n\n<p>[4]&nbsp;Cinnirella, Sergio; Bruno, Delia Evelina; Pirrone, Nicola; Horvat, Milena; \u017divkovi\u0107, Igor; Evers, David; Johnson, Sarah; Sunderland, Elsie (2019): Mercury concentrations in biota in the Mediterranean Sea, a compilation of 40 years of surveys. 
PANGAEA,&nbsp;<a href=\"https:\/\/doi.org\/10.1594\/PANGAEA.899723\">https:\/\/doi.org\/10.1594\/PANGAEA.899723<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Changelog<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>2021-07-18: The first release.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Move from the 2D interpolation into the 3D interpolation with the Inverse Distance Weighting algorithm.<\/p>\n","protected":false},"author":1,"featured_media":354,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,19,3,17,30,31],"tags":[57,35,32,78,10,7,62,33],"class_list":["post-329","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-machine-learning","category-python","category-scripts","category-spatial-statistics","category-tutorials","tag-geopandas","tag-how-to-calculate-inverse-distance-weighting","tag-inverse-distance-weighting","tag-missing-values","tag-numpy","tag-python","tag-spatial","tag-spatial-interpolation"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=329"}],"version-history":[{"count":16,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/329\/revisions"}],"predecessor-version":[{"id":517,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/329\/revisions\/517"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/354"}],"wp:at
tachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=329"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}