{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00000-5835b262-3de7-443c-a97e-730574b6ce42",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/restrepo/ComputationalMethods/blob/master/material/Pandas.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00001-0bfa3d46-76d2-4b2c-ac43-a47b10e8dd42",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "d8BGX58hxE06",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00002-a39b4545-ea8c-4c33-9821-2e035ea68068",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "zm9tRsn7xE0_"
   },
   "source": [
    "From http://pandas.pydata.org/pandas-docs/stable/\n",
    "\n",
    "pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.\n",
    "\n",
    "See also:\n",
    "\n",
    "* https://github.com/restrepo/data-analysis\n",
    "  * https://classroom.github.com/g/sSMBdBqN\n",
    "  * https://classroom.github.com/a/PcbQBE7F\n",
    "* https://github.com/restrepo/PythonTipsAndTricks\n",
    "* https://pbpython.com/excel-pandas-comp.html [[archive.org](https://web.archive.org/web/20201126143453/https://pbpython.com/excel-pandas-comp.html)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00003-eedb4370-f372-408b-9c5b-493892545a83",
    "deepnote_cell_type": "markdown",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Good and practice books about `Pandas` possibilities are:\n",
    "\n",
    "\n",
    "[__Python for Data Analysis__](https://drive.google.com/open?id=0BxoOXsn2EUNIWExXbVc4SDN0YTQ)<br/>\n",
    "Data Wrangling with Pandas, NumPy, and IPython<br/>\n",
    "_By William McKinney_\n",
    "\n",
    "\n",
    "This other is about aplications based on `Pandas`:\n",
    "![image.png](https://covers.oreillystatic.com/images/0636920030515/cat.gif) [Introduction to Machine Learning with Python](https://drive.google.com/open?id=0BxoOXsn2EUNISGhrdEZ3S29fS3M)<br/>\n",
    "A Guide for Data Scientists\n",
    "By Sarah Guido, Andreas Müller\n",
    "\n",
    "[Python Books](https://drive.google.com/drive/u/0/folders/0BxoOXsn2EUNIUFFUWUkxd0JSVFk?resourcekey=0-Eo6AxHgut_mZ1-M3vD10oQ)\n",
    "\n",
    "`Pandas` can be used in a similar way to `R`, which is based on similar data structures. `Pandas` also can replace the use of graphical interfaces to access spreadsheets like Excel. In particular, can be used in joint with the module [`xlsxwriter`](https://xlsxwriter.readthedocs.io/tutorial02.html) to produce professional Excel spreadsheets at the programatical level.\n",
    "\n",
    "The similar data structure in Wolfram Alpha and Mathematica is  the `Dataset`, in C++ is the ROOT framework, and in the old Fortran 77 was `paw`. In this way, a dictionary of equivalent commands can be stablished between the several frameworks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{contents}\n",
    ":depth: 2\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00004-3b86c23e-e896-4ea9-a25d-b66422b19630",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "zd4jo7KLxE1C",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Standard way to load the module"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "cell_id": "00005-8f9079dd-73f3-4214-9b12-a007b2b221a9",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "deepnote_to_be_reexecuted": false,
    "execution_start": 1627405767098,
    "id": "K3BSlyLBxE1E",
    "source_hash": "9b82ee11"
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We already saw how NumPy arrays can improve the analysis of numerical data. For heterogeneous data the recommended tool are Pandas dataframes.\n",
    "\n",
    "Heterogeneous and nested data can be stored as list of dictionaries. For example, for people with names, birth date, sex, and a job list with start and end date, we can have"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a dictionary for each person's data\n",
    "person1 = {\"Name\": \"John Doe\", \"Birth Date\": \"01-01-1980\", \"Sex\": \"Male\", \n",
    "           \"Job\": [{\"Job Title\": \"Software Engineer\", \"Start Date\": \"01-01-2000\", \"End Date\": \"01-01-2005\"}, \n",
    "                   {\"Job Title\": \"Data Scientist\", \"Start Date\": \"01-01-2005\", \"End Date\": None}]}\n",
    "person2 = {\"Name\": \"Jane Doe\", \"Birth Date\": \"01-01-1985\", \"Sex\": \"Female\", \n",
    "           \"Job\": [{\"Job Title\": \"Product Manager\", \"Start Date\": \"01-01-2010\", \"End Date\": \"01-01-2015\"}, \n",
    "                   {\"Job Title\": \"Project Manager\", \"Start Date\": \"01-01-2015\", \"End Date\": \"01-01-2020\"}]}\n",
    "person3 = {\"Name\": \"Jim Smith\", \"Birth Date\": \"01-01-1990\", \"Sex\": \"Male\", \n",
    "           \"Job\": [{\"Job Title\": \"Data Analyst\", \"Start Date\": \"01-01-2010\", \"End Date\": \"01-01-2015\"}, \n",
    "                   {\"Job Title\": \"Business Analyst\", \"Start Date\": \"01-01-2015\", \"End Date\": \"01-01-2020\"}]}\n",
    "person4 = {\"Name\": \"Sara Johnson\", \"Birth Date\": \"01-01-1995\", \"Sex\": \"Female\", \n",
    "           \"Job\": [{\"Job Title\": \"Product Designer\", \"Start Date\": \"01-01-2015\", \"End Date\": \"01-01-2020\"}, \n",
    "                   {\"Job Title\": \"UX Designer\", \"Start Date\": \"01-01-2020\", \"End Date\": None}]}\n",
    "\n",
    "# Create a list of dictionaries\n",
    "people = [person1, person2, person3, person4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can create a DataFrame from the list of dictionaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Birth Date</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Job</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>John Doe</td>\n",
       "      <td>01-01-1980</td>\n",
       "      <td>Male</td>\n",
       "      <td>[{'Job Title': 'Software Engineer', 'Start Dat...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jane Doe</td>\n",
       "      <td>01-01-1985</td>\n",
       "      <td>Female</td>\n",
       "      <td>[{'Job Title': 'Product Manager', 'Start Date'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jim Smith</td>\n",
       "      <td>01-01-1990</td>\n",
       "      <td>Male</td>\n",
       "      <td>[{'Job Title': 'Data Analyst', 'Start Date': '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sara Johnson</td>\n",
       "      <td>01-01-1995</td>\n",
       "      <td>Female</td>\n",
       "      <td>[{'Job Title': 'Product Designer', 'Start Date...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Birth Date     Sex  \\\n",
       "0      John Doe  01-01-1980    Male   \n",
       "1      Jane Doe  01-01-1985  Female   \n",
       "2     Jim Smith  01-01-1990    Male   \n",
       "3  Sara Johnson  01-01-1995  Female   \n",
       "\n",
       "                                                 Job  \n",
       "0  [{'Job Title': 'Software Engineer', 'Start Dat...  \n",
       "1  [{'Job Title': 'Product Manager', 'Start Date'...  \n",
       "2  [{'Job Title': 'Data Analyst', 'Start Date': '...  \n",
       "3  [{'Job Title': 'Product Designer', 'Start Date...  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(people)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with NumPy, we can create masks in order to filter out specific rows of the dataframe. For example, to filter out the female persons by using the syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Birth Date</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Job</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jane Doe</td>\n",
       "      <td>01-01-1985</td>\n",
       "      <td>Female</td>\n",
       "      <td>[{'Job Title': 'Product Manager', 'Start Date'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sara Johnson</td>\n",
       "      <td>01-01-1995</td>\n",
       "      <td>Female</td>\n",
       "      <td>[{'Job Title': 'Product Designer', 'Start Date...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Birth Date     Sex  \\\n",
       "1      Jane Doe  01-01-1985  Female   \n",
       "3  Sara Johnson  01-01-1995  Female   \n",
       "\n",
       "                                                 Job  \n",
       "1  [{'Job Title': 'Product Manager', 'Start Date'...  \n",
       "3  [{'Job Title': 'Product Designer', 'Start Date...  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"Sex\"] == \"Female\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To filter out the last job of each person by using the following code (`.get` is a safer way to obtain the value of the key of a dictionary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Birth Date</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Last job</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>John Doe</td>\n",
       "      <td>01-01-1980</td>\n",
       "      <td>Male</td>\n",
       "      <td>Data Scientist</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jane Doe</td>\n",
       "      <td>01-01-1985</td>\n",
       "      <td>Female</td>\n",
       "      <td>Project Manager</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Jim Smith</td>\n",
       "      <td>01-01-1990</td>\n",
       "      <td>Male</td>\n",
       "      <td>Business Analyst</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sara Johnson</td>\n",
       "      <td>01-01-1995</td>\n",
       "      <td>Female</td>\n",
       "      <td>UX Designer</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Birth Date     Sex          Last job\n",
       "0      John Doe  01-01-1980    Male    Data Scientist\n",
       "1      Jane Doe  01-01-1985  Female   Project Manager\n",
       "2     Jim Smith  01-01-1990    Male  Business Analyst\n",
       "3  Sara Johnson  01-01-1995  Female       UX Designer"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Last job']=df[\"Job\"].apply(lambda L: L[-1].get('Job Title'))\n",
    "df[['Name','Birth Date','Sex','Last job']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00006-934fccd7-fef5-46fb-aa5b-3763a9df25a2",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "I7Hbe9JEadOC",
    "tags": []
   },
   "source": [
    "## Basic structure: DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00007-db7b469d-ba71-4e06-8bbd-d90e212a02d5",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "IE_Vzycqajls"
   },
   "source": [
    "An flat _spreadsheet_ can be seen in terms of the types of variables of `Python` just as dictionary of lists, where each column of the spreadsheet is a pair key-list of the dictionary \n",
    "\n",
    "|   |  A   |  B   |\n",
    "|---|:----:|:----:|\n",
    "| 1 | even | odd  |\n",
    "| 2 |   0  | 1    |\n",
    "| 3 |   2  | 3    |\n",
    "| 4 |   4  | 5    |\n",
    "| 5 |   6  | 7    |\n",
    "| 6 |   8  | 9    |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "cell_id": "00008-28de7f9e-e38f-489f-a753-c739eba5c428",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "lxEc0jlHauyc"
   },
   "outputs": [],
   "source": [
    "numbers={\"even\": [0,2,4,6,8],   #  First  key-list\n",
    "         \"odd\" : [1,3,5,7,9] }  #  Second key-list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00009-3837521f-1c26-475f-91f4-43151fde0224",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "HeN8rw1JxE1N",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Data structures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00010-1260af90-44ba-48f6-b514-529cb21a5dd8",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "UiL0O2SNcCfG"
   },
   "source": [
    "`Pandas` has two new data structures:\n",
    "1. `DataFrame` which are similar to numpy arrays but with some assigned key. For example, for the previous case\n",
    "```python\n",
    "import numpy as np\n",
    "np.array([[0,1],\n",
    "          [2,3],\n",
    "          [4,5],\n",
    "          [6,7],\n",
    "          [8,9] \n",
    "         ])\n",
    "```\n",
    "1. `Series` which are enriched  to dictionaries, as the ones defined for the rows of the previous example: `{'even':0,'odd':1}`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00011-361a470d-c350-4bba-820a-01338fffcc89",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "jaWfZyimxE1O"
   },
   "source": [
    "\n",
    "The rows in a two-dimensional `DataFrame` corresponds to `Series` with similar keys, while the columns are also Series with the indices as keys. \n",
    "\n",
    "An example of a  `DataFrame` is a spreadsheet, as the one before."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00012-fbda39d3-0cd0-48b0-ba40-4cfc129f8320",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "uRdBR4BXc8WY",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### `DataFrame`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Pandas` can convert a dictionary of lists, like the `numbers` dictionary before, into a `DataFrame`, which is just an spreadsheet but interpreted at the programming level:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "cell_id": "00013-8aff6f6f-3fd7-4e99-89eb-ca977338a551",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'even': [0, 2, 4, 6, 8], 'odd': [1, 3, 5, 7, 9]}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "cell_id": "00014-3e464053-625b-4ee3-b72c-578522966832",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 198
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "dC00VECAd_Dr",
    "outputId": "d17fed39-9a51-4e4d-8636-f4e920563fb5"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "2     4    5\n",
       "3     6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df=pd.DataFrame(numbers)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7f7d7a779880>]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAhRElEQVR4nO3deXxV9Z3G8c8PEpaEHcJOSDDsBAQjCLjiioiC1HFrtVoH27GtTlsFcUPBBeu0th2rgvuodSwJoiiK1A2rooCYDcK+hC1hS0IWstzv/EF0GAfkBu7NObn3eb9evExyDzePIXn48bvnfI8zM0RExL8aeR1ARER+mIpaRMTnVNQiIj6nohYR8TkVtYiIz8WE40k7dOhgSUlJ4XhqEZGItHz58t1mlnCkx8JS1ElJSSxbtiwcTy0iEpGcc5uP9pi2PkREfE5FLSLicypqERGfU1GLiPicilpExOeCKmrn3K3OuWznXI5z7rYwZxIRkcMcs6idc4OAfwWGA0OAS5xzKeEOJiIihwSzou4PLDWzMjOrBj4GLg9vLBGRhuWrTXt56uP1YXnuYIo6GzjDOdfeORcHXAz0+P5BzrnJzrllzrllhYWFoc4pIuJLBw5Wc+/8bK546nNeXbqFssrqkH+OY16ZaGarnHOzgEVAKbASqDnCcbOB2QBpaWm6G4GIRLyP8gq4a14224vKuWF0Er+7oC9xTUJ/wXdQz2hmzwLPAjjnHgLyQ55ERKSB2FdayYy3c8lYsY2Uji2Y+/NRnNKzbdg+X1BF7ZzraGYFzrlEDu1Pnxa2RCIiPmVmLMzeyb3zs9lfVsWvxqTwyzEpNI1pHNbPG+waPd051x6oAm4xs/3hiyQi4j8FxRXcMz+b93J2kdqtNS/dOIIBXVvVy+cOduvjjHAHERHxIzPj78vzmbkgl4PVAaaO7cdNpycT07j+rhcMy5hTEZFIsHVvGXdmZPHput0MT27HI5en0iuhRb3nUFGLiHxPTcB48bNN/P69PBo3csycMIhrhifSqJHzJI+KWkTkMGt3lTAlPZMVW/Zzdt8EHpqYStc2zT3NpKIWEQGqagI89dF6/vLBOuKbNubxK0/mspO74pw3q+jDqahFJOpl5Rdx+9xvWL2zhPFDunLf+AF0aNHU61jfUVGLSNSqqKrhj4vXMOeTDSS0bMqc69I4f0Anr2P9PypqEYlKX2zYw9T0TDbtKePq4T2YOrY/rZvHeh3riFTUIhJVSiqqeGThal5ZuoXEdnG8etMIRqV08DrWD1JRi0jU+HB1AdPmZbGruIKbTk/mNxf0CcsQpVDzf0IRkRO0t7SSB97K4Y2V2+nTqQV/vXYUQxPDN0Qp1FTUIhKxzIwFmTuY/mYOxRVV3Hpub245J4UmMQ3rdrEqahGJSDuLKrj7jWwWr9rFkO6tmfWjEfTrXD9DlEJNRS0iEcXMeO2rrTz09iqqAgHuurg/N56eTGOPLv8OBRW1iESMzXtKmZqexecb9nBar3Y8cvlgkjrEex3rhKmoRaTBqwkYz/9zI48tyiO2USMevjyVK9N6eDZEKdRU1CLSoOXtLOGO9Ey+2bqf8/p3ZOaEVDq3buZ1rJAK9lZc/w7cBBiQBdxgZhXhDCYi8kMqqwP89aN1PPHhOlo2i+XPVw9l/OAuvhiiFGrHLGrnXDfg18AAMyt3zr0OXAW8EOZsIiJHtHLrfqbMzSRvVwmXndyV+8YPpF18E69jhU2wWx8xQHPnXBUQB2wPXyQRkSMrr6zhD+/n8eynG+nYshnPXp/Guf39N0Qp1I5Z1Ga2zTn3GLAFKAcWmdmi7x/nnJsMTAZITEwMdU4RiXKfrd/N1PQstuwt49oRiUwZ249Wzfw5RCnUjnl5jnOuLXAZkAx0BeKdcz/+/nFmNtvM0swsLSEhIfRJRSQqFVdUcWdGFtfMWUojB69NPo0HJ6ZGTUlDcFsf5wEbzawQwDmXAYwCXg5nMBGRxbm7uOuNLApLDnLzmb247bw+NG/S2OtY9S6Yot4CnOaci+PQ1se5wLKwphKRqLbnwEGmv5XLW99sp1/nlsy5Lo3B3dt4HcszwexRL3XOzQVWANXA18DscAcTkehjZrz5zXamv5nDgYPV/Ob8Pvz8rJMa3BClUAvqrA8zuw+4L8xZRCSKbd9fzt1vZPPB6gKGJrZh1qTB9OnU0utYvqArE0XEU4GA8eqXW3hk4WpqAsa9lwzg+lFJDXqIUqipqEXEMxt3lzI1PZOlG/cyOqU9D08cTGL7OK9j+Y6KWkTqXXVNgGc/3cgf3l9Dk5hGPDppMFekdY/Iy79DQUUtIvVq1Y5ipqRnkplfxPkDOjFzwiA6tYqsIUqhpqIWkXpxsLqGJz5Yx18/Wk+buFieuGYYF6d21io6CCpqEQm75Zv3MSU9k3UFB7h8WDfuGTeAthE8RCnUVNQiEjZlldX8/r08XvhsE11aNeP5G07lnL4dvY7V4KioRSQsPl27m6kZmeTvK+e6kT2546J+tGiqyjke+qqJSEgVlVfx4Nu5vL4sn+QO8bx+80iGJ7fzOlaDpqIWkZB5L2cn97yRzZ7SSn5x9kncem5vmsVG3xClUFNRi8gJKyw5yPQ3c3g7awcDurTiuZ+eyqBurb2OFTFU1CJy3MyMjBXbeGBBLuWVNdx+YV8mn9mL2MbRPUQp1FTUInJctu0vZ1pGFh+vKeSUnm2ZNWkwKR1beB0rIqmoRaROAgHj5aWbmbVwNQbcf+lAfnJaTxppiFLYqKhFJGjrCw8wNT2Trzbt44zeHXhoYio92mmIUripqEXkmKpqAsxZsoHHF6+leWxjHrtiCJOGddPl3/XkmEXtnOsL/PdhH+oF3Gtmj4crlIj4R/a2IqakZ5KzvZixgzpz/2UD6dhSQ5TqUzC34soDTgZwzjUGtgHzwhtLRLxWUVXDXz5Yy1Mfb6BtXBOevHYYY1O7eB0rKtV16+NcYL2ZbQ5HGBHxh2Wb9jIlPZP1haVccUp37hrXnzZxGqLklboW9VXA3470gHNuMjAZIDEx8QRjiYgXSg8eGqL04ueb6Nq6OS/dOJwz+yR4HSvqOTML7kDnmgDbgYFmtuuHjk1LS7Nly5aFIJ6I1JeP1xQyLSOL7UXlXD8yidsv7Eu8hijVG+fccjNLO9JjdflTGAusOFZJi0jDsr+skhkLVpG+Ip+TEuL5+80jSUvSECU/qUtRX81Rtj1EpGFamLWDe+bnsK+skl+ek8Ivx6RoiJIPBVXUzrl44Hzg5vDGEZH6UFBcwb3zc3g3ZycDu7bixRtPZWBXDVHyq6CK2sxKgfZhziIiYWZmzF2ez4wFuVRUB5hyUT/+9YxkYjREydf0SoFIlNi6t4xp87JYsnY3w5Pa8fCkVE5K0BClhkBFLRLhagLGS59v4vfv5eGAGZcN5NoRGqLUkKioRSLYuoISpqRnsXzzPs7qk8BDl6fSrU1zr2NJHamoRSJQVU2Apz9ez5//sY64po35w78MYeJQDVFqqFTUIhEme1sRt8/NZNWOYsYN7sL08QNJaNnU61hyAlTUIhGioqqGxxevZc6SDbSLb8LTPzmFCwd29jqWhICKWiQCfLlxL1PTM9mwu5Qr03ow7eL+tI6L9TqWhIiKWqQBK6mo4tF38/ivLzbTvW1zXv7ZCE7v3cHrWBJiKmqRBurDvALuyshiR3EFN45O5ncX9iGuiX6kI5H+VEUamH2llcxYkEvG19vo3bEF6b8YxbDEtl7HkjBSUYs0EGbG21k7uG9+DkXlVfx6TAq3jEmhaYyGKEU6FbVIA7CruIJ73shmUe4uUru15uWbRtC/SyuvY0k9UVGL+JiZ8fqyrcx8exWV1QHuHNuPn52uIUrRRkUt4lNb9pQxNSOTz9bvYXhyO2ZNGkxyh3ivY4kHVNQiPlMTMF74bBOPvZdH40aOmRMGcc3wRA1RimLB3jigDfAMMAgw4EYz+zyMuUSi0ppdJdwxN5OVW/czpl9HZk4YRFcNUYp6wa6o/wS8a2Y/qr3JbVwYM4lEncrqAE99vJ6/fLCWFk1j+NNVJ3PpkK4aoiRAEEXtnGsNnAn8FMDMKoHK8MYSiR7fbN3PlPRMVu8sYfyQrkwfP4D2LTRESf5XMCvqZKAQeN45NwRYDtxae3uu7zjnJgOTARITE0OdUyTilFfW8MfFa3hmyQYSWjZlznVpnD+gk9exxIeCOccnBhgGPGlmQ4FSYOr3DzKz2WaWZmZpCQkJIY4pElk+X7+HsX/6hNmfbODKU3vw/m/OUknLUQWzos4H8s1sae37czlCUYvIsRVXVPHIwtW8unQLie3iePWmEYxK0RAl+WHHLGoz2+mc2+qc62tmecC5QG74o4lElg9W72JaRjYFJRXcdHoyv72gL82b6PJvObZgz/r4FfBK7RkfG4AbwhdJJLLsOXCQBxbkMn/ldvp0asGTPx7FUA1RkjoIqqjNbCWQFt4oIpHFzHgrcwfT38yhpKKK287rzb+dnUKTGF3+LXWjKxNFwmBHUTn3vJHN4lUFDOnRhkcnDaZv55Zex5IGSkUtEkKBgPHaV1t5+J1VVAUC3D2uPzeMTqaxLv+WE6CiFgmRTbtLmZqRyRcb9jKyV3semZRKz/YaoiQnTkUtcoJqAsZzn27kP97PI7ZRIx6+PJWrTu2hy78lZFTUIicgb2cJd8z9hm/yizivf0dmTkilc+tmXseSCKOiFjkOldUBnvhwHX/9aB0tm8Xy56uHMn5wF62iJSxU1CJ19PWWfUxJz2TNrgNMOLkr944fSLv4Jl7HkgimohYJUlllNf+xaA3P/XMjnVs147mfpjGmn+ZzSPipqEWC8Nm63UzNyGLL3jKuHZHI1LH9aNks1utYEiVU1CI/oKi8ioffWcVrX20lqX0cr00+jdN6tfc6lkQZFbXIUbyfu4u738iisOQgN5/Zi9vO66MhSuIJFbXI9+w+cJDpb+awIHMH/Tq3ZM51aQzu3sbrWBLFVNQitcyM+Su3c/9bORw4WM1vzu/Dz886SUOUxHMqahFg+/5y7pqXxYd5hQxNbMOsSYPp00lDlMQfVNQS1QIB45UvtzBr4WpqAsa9lwzg+lFJGqIkvqKilqi1cXcpU9Iz+XLjXkantOfhiYNJbB/ndSyR/yeoonbObQJKgBqg2sx0EwFpsKprAjzz6Ub++P4amsQ04tFJg7kirbsu/xbfqsuK+hwz2x22JCL1IHd7MVPSM8naVsT5Azoxc8IgOrXSECXxN219SFQ4WF3Df36wjic/Wk+buFieuGYYF6d21ipaGoRgi9qARc45A542s9nfP8A5NxmYDJCYmBi6hCInaPnmQ0OU1hUcYOLQbtx7yQDaaoiSNCDBFvXpZrbNOdcReN85t9rMPjn8gNryng2QlpZmIc4pUmelB6t5bFEeL3y2iS6tmvH8DadyTt+OXscSqbNg70K+rfa/Bc65ecBw4JMf/l0i3lmytpA7M7LI31fOdSN7csdF/WjRVDt90jAd8zvXORcPNDKzktq3LwAeCHsykeNQVFbFg+/k8vqyfJI7xPP6zSMZntzO61giJySYJUYnYF7tiy4xwKtm9m5YU4kch3ezd3LP/Gz2llbyi7NP4tZze9MsVkOUpOE7ZlGb2QZgSD1kETkuhSWHhii9nbWD/l1a8dz1p5LavbXXsURCRpt20mCZGRkrtvHAglzKK2u4/cK+TD6zF7GNNURJIouKWhqk/H1lTJuXzSdrCjmlZ1tmTUolpaOGKElkUlFLgxIIGC8v3cyshasxYPr4AVw3MolGGqIkEUxFLQ3G+sIDTE3P5KtN+zijdwcemphKj3YaoiSRT0UtvldVE2DOkg08vngtzWIa8fsfDeZHp2iIkkQPFbX4Wva2IqakZ5KzvZiLBnbmgQkD6dhSQ5QkuqioxZcqqmr4ywdreerjDbSNa8KT1w5jbGoXr2OJeEJFLb6zbNNe7kjPZENhKT86pTt3j+tPmzgNUZLopaIW3zhwsJrfv7ual77YTNfWzXnpxuGc2SfB61ginlNRiy98vKaQaRlZbC8q5/qRSdx+YV/iNURJBFBRi8f2l1UyY8Eq0lfk0yshnr/fPJK0JA1REjmcilo8szBrB/fMz2FfWSW3nHMSvxqjIUoiR6KilnpXUFzBvfNzeDdnJwO7tuLFG09lYFcNURI5GhW11BszY+7yfGYsyKWiOsCUi/px0xnJGqIkcgwqaqkXW/eWMW1eFkvW7ubUpLY8MmkwJyW08DqWSIOgopawqgkY//X5Jh59Lw8HzLhsINeO6KkhSiJ1EHRRO+caA8uAbWZ2SfgiSaRYV1DClPQslm/ex1l9Enhw4iC6t9UQJZG6qsuK+lZgFdAqTFkkQlTVBHj64/X8+R/riGvamD/8yxAmDu2mIUoixymoonbOdQfGAQ8CvwlrImnQsrcVcfvcTFbtKGZcahemXzqQhJZNvY4l0qAFu6J+HLgDOOotNJxzk4HJAImJiSccTBqWiqoaHl+8ljlLNtAuvglP/fgULhrU2etYIhHhmEXtnLsEKDCz5c65s492nJnNBmYDpKWlWagCiv99uXEvU9Mz2bC7lCvTejDt4v60jov1OpZIxAhmRT0auNQ5dzHQDGjlnHvZzH4c3mjidwcOVjNr4Wr+64vNdG/bnJd/NoLTe3fwOpZIxDlmUZvZncCdALUr6t+ppOXDvALuyshiR3EFN45O5ncX9iGuic72FAkH/WRJnewrrWTGglwyvt5GSscWzP35KE7p2dbrWCIRrU5FbWYfAR+FJYn4mpnxdtYO7pufQ1F5Fb8ek8ItY1JoGqMhSiLhphW1HNOu4grueSObRbm7SO3WmpdvGkH/LjqdXqS+qKjlqMyM15dtZebbq6isDnDn2H787PRkYjRESaReqajliLbsKePOeZn8c90ehie3Y9akwSR3iPc6lkhUUlHL/1ETMF74bBOPvZdH40aOmRMGcc3wRA1REvGQilq+s3ZXCXekZ/L1lv2c0zeBByem0rVNc69jiUQ9FbVQWR3gqY/X85cP1tKiaQx/uupkLh3SVUOURHxCRR3lvtm6nynpmazeWcL4IV2ZPn4A7VtoiJKIn6ioo1R5ZQ2PL17DnCUbSGjZlDnXpXH+gE5exxKRI1BRR6EvNuxhanomm/aUcfXwHtx5cX9aNdMQJRG/UlFHkZKKKh5ZuJpXlm4hsV0cr940glEpGqIk4ncq6ijxwepd3DUvm13FFdx0ejK/vaAvzZvo8m+RhkBFHeH2llbywFs5vLFyO306teCv145iaKKGKIk0JCrqCGVmvJW5g+lv5lBSUcVt5/Xm385OoUmMLv8WaWhU1BFoZ1EFd7+RxeJVBQzp0YZHJw2mb+ej3kVNRHxORR1BzIzXvtrKQ2+voioQ4O5x/blhdDKNdfm3SIMWzD0TmwGfAE1rj59rZveFO5jUzeY9pUxNz+LzDXsY2as9j0xKpWd7DVESiQTBrKgPAmPM7IBzLhb41Dm30My+CHM2CUJNwHj+nxt5bFEesY0a8fDlqVx1ag9d/i0SQYK5Z6IBB2rfja39pbuM+0DezkNDlL7Zup/z+ndk5oRUOrdu5nUsEQmxoPaonXONgeVACvCEmS09wjGTgckAiYmJocwo31NZHeCJD9fx14/W0apZLH+5eiiXDO6iVbRIhAqqqM2sBjjZOdcGmOecG2Rm2d87ZjYwGyAtLU0r7jBZuXU/d8z9hjW7DjDh5K7cO34g7eKbeB1LRMKorje33e+c+xC4CMg+1vESOuWVNfzHojye++dGOrVqxnM/TWNMPw1REokGwZz1kQBU1ZZ0c+B8YFbYk8l3Plu/m6npWWzZW8a1IxKZOrYfLTVESSRqBLOi7gK8WLtP3Qh43cwWhDeWABRXVPHwO6v425dbSWofx2uTT+O0Xu29jiUi9SyYsz4ygaH1kEUO837uLu5+I4vCkoPcfGYvbjuvj4YoiUQpXZnoM7sPHGT6mzksyNxBv84tmXNdGoO7t/E6loh4SEXtE2bG/JXbuf+tHEoP1vDb8/tw81knaYiSiKio/WD7/nLufiObD1YXMDTx0BCl3p00RElEDlFReygQMF79cguPLFxNTcC495IBXD8qSUOUROT/UFF7ZOPuUqakZ/Llxr2cntKBhy9PpUe7OK9jiYgPqajrWXVNgGc+3cgf319Dk5hGPDppMFekddfl3yJyVCrqepS7vZgp6ZlkbSviggGdmDFhEJ1aaYiSiPwwFXU9OFhdw39+sI4nP1pPm7hYnrhmGBendtYqWkSCoqIOs+Wb9zElPZN1BQe4fFg37hk3gLYaoiQidaCiDpPSg9U8tiiPFz7bRJdWzXj+hlM5p29Hr2OJSAOkog6DJWsLuTMji/x95Vw3sid3XNSPFk31pRaR46P2CKGisioefCeX15fl06tDPK/fPJLhye28jiUiDZyKOkTezd7JPfOz2VtayS/OPolbz+1Ns1gNURKRE6eiPkGFJYeGKL2dtYMBXVrx/E9PZVC31l7HEpEIoqI+TmZGxoptPLAgl/LKGm6/sC+Tz+xFbGMNURKR0FJRH4f8fWVMm5fNJ2sKOaVnW2ZNGkxKxxZexxKRCBXMrbh6AC8BnQADZpvZn8IdzI8CAePlpZuZtXA1Btx/6UB+clpPGmmIkoiEUTAr6mrgt2a2wjnXEljunHvfzHLDnM1X1hceYGp6Jl9t2scZvTvw0EQNURKR+hHMrbh2ADtq3y5xzq0CugFRUdRVNQHmLNnA44vX0jy2MY9dMYRJw7rp8m8RqTd12qN2ziVx6P6JS4/w2GRgMkBiYmIosnkue1sRU9IzydlezNhBnbn/soF0bKkhSiJSv4IuaudcCyAduM3Mir//uJnNBmYDpKWlWcgSeqCiqoY//2MtT3+ygbZxTXjy2mGMTe3idSwRiVJBFbVzLpZDJf2KmWWEN5K3lm3ayx3pmWwoLOWKU7pz97gBtI6L9TqWiESxYM76cMCzwCoz+0P4I3njwMFqfv/ual76YjNdWzfnpRuHc2afBK9jiYgEtaIeDfwEyHLOraz92DQzeydsqerZx2sKmZaRxfaicq4fmcTtF/YlXkOURMQngjnr41MgIk9x2F9WyYwFq0hfkc9JCfH8/eaRpCVpiJKI+EvULhvfydrBvfOz2VdWxS/PSeGXY1I0RElEfCnqirqguIJ75+fwbs5OBnVrxYs3DmdgVw1REhH/ipqiNjP+vjyfmQtyqagOMOWifvzrGcnEaIiSiPhcVBT11r1lTJuXxZK1uxme1I5HJqXSK0FDlESkYYjooq4JGC99volH382jkYMZlw3k2hEaoiQiDUvEFvW6ghLumJvJii37OatPAg9dnkq3Ns29jiUiUmcRV9RVNQGe/ng9f/7HOuKaNuaPVw5hwskaoiQiDVdEFXVWfhG3z/2G1TtLGDe4C/dfOpAOLZp6HUtE5IRERFFXVNXwx8VreGbJRtrHN+Hpn5zChQM7ex1LRCQkGnxRL92wh6kZWWzcXcqVaT2YNq4/rZtriJKIRI4GW9QlFVXMenc1L3+xhR7tmvPKTSMYndLB61giIiHXIIv6w9UF3DUvix3FFdw4OpnfXdiHuCYN8n9FROSYGlS77S2tZMaCXOZ9vY3eHVuQ/otRDEts63UsEZGwahBFbWYsyNzB9DdzKCqv4tfn9uaWc06iaYyGKIlI5PN9Ue8qruCuedksXrWLwd1b8/JNI+jfpZXXsURE6o1vi9rM+O+vtvLgO6uorA4w7eJ+3DhaQ5REJPoEcyuu54BLgAIzGxT+SLBlTxlTMzL5bP0eRiS3Y9akwSR1iK+PTy0i4jvBrKhfAP4TeCm8UQ4NUXr+nxt5bFEeMY0a8eDEQVx9aqKGKIlIVAvmVlyfOOeSwh2kqKyK65//kpVb9zOmX0cenDiILq01RElEJGR71M65ycBkgMTExDr//lbNY+jZPo4bRidx6ZCuGqIkIlLLmdmxDzq0ol4Q7B51WlqaLVu27ASjiYhED+fccjNLO9JjOoVCRMTnVNQiIj53zKJ2zv0N+Bzo65zLd879LPyxRETkW8Gc9XF1fQQREZEj09aHiIjPqahFRHxORS0i4nMqahERnwvqgpc6P6lzhcDm4/ztHYDdIYwTKspVN8pVN8pVN5GYq6eZJRzpgbAU9Ylwzi072tU5XlKuulGuulGuuom2XNr6EBHxORW1iIjP+bGoZ3sd4CiUq26Uq26Uq26iKpfv9qhFROT/8uOKWkREDqOiFhHxOd8UtXPuIudcnnNunXNuqtd5vuWce845V+Ccy/Y6y7eccz2ccx8653KdcznOuVu9zvQt51wz59yXzrlvarPd73WmbznnGjvnvnbOLfA6y+Gcc5ucc1nOuZXOOd/cccM518Y5N9c5t9o5t8o5N9IHmfrWfp2+/VXsnLvN61wAzrl/r/2ez3bO/c051yxkz+2HPWrnXGNgDXA+kA98BVxtZrmeBgOcc2cCB4CX6usu7MfinOsCdDGzFc65lsByYIJPvl4OiDezA865WOBT4FYz+8LjaDjnfgOkAa3M7BKv83zLObcJSDMzX13A4Zx7EVhiZs8455oAcWa23+NY36ntjW3ACDM73gvsQpWlG4e+1weYWblz7nXgHTN7IRTP75cV9XBgnZltMLNK4DXgMo8zAYdu7gvs9TrH4cxsh5mtqH27BFgFdPM21SF2yIHad2Nrf3m+GnDOdQfGAc94naUhcM61Bs4EngUws0o/lXStc4H1Xpf0YWKA5s65GCAO2B6qJ/ZLUXcDth72fj4+KR6/q72f5VBgqcdRvlO7xbASKADeNzM/ZHscuAMIeJzjSAxY5JxbXnuTaD9IBgqB52u3i55xzsV7Hep7rgL+5nUIADPbBjwGbAF2AEVmtihUz++Xopbj4JxrAaQDt5lZsdd5vmVmNWZ2MtAdGO6c83TLyDl3CVBgZsu9zPEDTjezYcBY4Jba7TavxQDDgCfNbChQCvjptaMmwKXA373OAuCca8uhXYBkoCsQ75z7caie3y9FvQ3ocdj73Ws/JkdRu/+bDrxiZhle5zmS2n8qfwhc5HGU0cCltXvBrwFjnHMvexvpf9WuxjCzAmAeh7YCvZYP5B/2r6G5HCpuvxgLrDCzXV4HqXUesNHMCs2sCsgARoXqyf1S1F8BvZ1zybV/U14FvOlxJt+qfcHuWWCVmf3B6zyHc84lOOfa1L7dnEMvEK/2MpOZ3Wlm3c0siUPfWx+YWchWOyfCORdf+4IwtVsLFwCen2FkZjuBrc65vrUfOhfw/MXqw1yNT7Y9am0BTnPOxdX+fJ7LodeOQuKY90ysD2ZW7Zz7JfAe0Bh4zsxyPI4FfHdz37OBDs65fOA+M3vW21SMBn4CZNXuBQNMM7N3vIv0nS7Ai7WvyDcCXjczX50O5zOdgHmHfraJAV41s3e9jfSdXwGv1C6eNgA3eJwH+O4vtPOBm73O8i0zW+qcmwusAKqBrwnh5eS+OD1PRESOzi9bHyIichQqahERn1NRi4j4nIpaRMTnVNQiIj6nohYR8TkVtYiIz/0PIJPTWbol3scAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(df['even'],df['odd'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00019-e82851df-dc6e-44e5-baf1-f53490dc4a4f",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "1ftuPJd5eiu6"
   },
   "source": [
    "See below for other possibilities of [creating Pandas DataFrames from lists and dictionaries](https://fisica.udea.edu.co:4443/user/restrepo/notebooks/prog/cursos/data-analysis/Pandas.ipynb#Intialization-from-lists-and-dictionaries)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00020-7464d85a-cbe8-4ec3-939b-846f627fc238",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "myuwII1Ed9PE",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The main advantage of the `DataFrame`,`df`, upon a spreadsheet, is that it can be managed just at the programming level without any graphical interface."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00021-8888b732-743b-4a62-9560-33fac5c21a4b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "5A0YScKRehta"
   },
   "source": [
    "We can check the shape of the `DataFrame`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "cell_id": "00022-25b27bf8-fc9d-4749-a5b1-c6cb7c67fef0",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "RV-ivs5Weyzq",
    "outputId": "865e12f5-aa0c-4d40-e02d-504431226ccf"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5, 2)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00023-90fd3ae8-8621-4469-8f9c-f89eb0c2bf00",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "ZMc0puXbe3nD",
    "jp-MarkdownHeadingCollapsed": true,
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "####  Export DataFrame to other formats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* To export to excel:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "cell_id": "00024-4a17d6f4-9124-4228-ae9e-983b67419459",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "ioWKY1FAe-rK"
   },
   "outputs": [],
   "source": [
    "df.to_excel('example.xlsx',index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "cell_id": "00025-0d0d4e5f-471c-4e95-8cc6-cd5b604408c2",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "2     4    5\n",
       "3     6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf=pd.read_excel('example.xlsx')\n",
    "newdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "      <th>fractions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>2.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>6.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>8.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd  fractions\n",
       "0     0    1        0.5\n",
       "1     2    3        2.5\n",
       "2     4    5        4.5\n",
       "3     6    7        6.5\n",
       "4     8    9        8.5"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf['fractions']=[0.5,2.5,4.5,6.5,8.5]\n",
    "newdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "      <th>fractions</th>\n",
       "      <th>next fractions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.5</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>4.5</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>6.5</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>8.5</td>\n",
       "      <td>1.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd  fractions  next fractions\n",
       "0     0    1        0.5             1.5\n",
       "1     2    3        2.5             1.5\n",
       "2     4    5        4.5             1.5\n",
       "3     6    7        6.5             1.5\n",
       "4     8    9        8.5             1.5"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf['next fractions']=1.5\n",
    "newdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "      <th>fractions</th>\n",
       "      <th>next fractions</th>\n",
       "      <th>(3, next to next fractions)</th>\n",
       "      <th>next to next fractions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>1.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>2.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>1.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>4.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>1.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>6.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>1.7</td>\n",
       "      <td>1.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>8.5</td>\n",
       "      <td>1.5</td>\n",
       "      <td>1.7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd  fractions  next fractions  (3, next to next fractions)  \\\n",
       "0     0    1        0.5             1.5                          1.7   \n",
       "1     2    3        2.5             1.5                          1.7   \n",
       "2     4    5        4.5             1.5                          1.7   \n",
       "3     6    7        6.5             1.5                          1.7   \n",
       "4     8    9        8.5             1.5                          1.7   \n",
       "\n",
       "   next to next fractions  \n",
       "0                     NaN  \n",
       "1                     NaN  \n",
       "2                     NaN  \n",
       "3                     1.7  \n",
       "4                     NaN  "
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newdf.loc[3,'next to next fractions']=1.7\n",
    "newdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00026-53b72290-ece9-47a0-bf74-45d0b5599a48",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "yIdeiNi_fBPJ"
   },
   "source": [
    "__Activity__: Open the resulting spreadsheet in Google Drive, publish it and open from the resulting link with Pandas in the next cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00027-afa925cb-92c5-4323-b67d-3fc9b7b06f3c",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "ZqIbSbLvf0wm"
   },
   "outputs": [],
   "source": [
    "df=pd.read_excel('PASTE THE PUBLISHED LINK HERE')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6.0</td>\n",
       "      <td>7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0   0.0  1.0\n",
       "1   2.0  3.0\n",
       "2   4.0  5.0\n",
       "3   6.0  7.0\n",
       "4   8.0  9.0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.read_excel('https://docs.google.com/spreadsheets/d/e/2PACX-1vQ1HFwErJcHkkOCT4Je-yuLSRe2L_GKWcCGVooc6rbOvTLxJhqglTZh31I_eB_dcw/pub?output=xlsx')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00028-ba1fbbbd-e5e4-45b9-bc04-2f23383ef2c5",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "c_bj3IfLxE1Q",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### `Series`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00029-9fbc4d9c-20bd-486c-9b5b-352ee4e526ac",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "H2LIeZbCgoL5"
   },
   "source": [
    "Each column of the DataFrame is now an augmented dictionary called `Series`, with the indices as the keys of the `Series`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00030-6e557bb0-a47f-41c8-8790-5ba070bdfb86",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "klrbIsoGxE1R"
   },
   "source": [
    "A `Pandas` `Series` object can be just initialized from a `Python` dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    2\n",
       "2    4\n",
       "3    6\n",
       "4    8\n",
       "Name: even, dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['even']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "cell_id": "00031-15c221e5-71b6-497b-b552-d78f9fa5c5b4",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "FoTT2jkqgrpR",
    "outputId": "51a00561-282e-4b0a-dffe-f3bc63a5ab9a"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type( df['even'] )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    2\n",
       "2    4\n",
       "3    6\n",
       "4    8\n",
       "Name: even, dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00032-5bcb5be5-166a-4dcc-a31e-92875d2f81a0",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "Iy8h3XICgz9w",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The keys are the index of the `DataFrame`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "cell_id": "00033-000b6c1f-036e-43af-84c0-6dd4b76e7b58",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "BTIxbKvhg6m2",
    "outputId": "c0fb58ab-2969-4e91-c053-a1215f933dcc"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#df['even']\n",
    "df.even[4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00034-86d1564c-0818-420b-b920-2f004cfd5251",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "7MIKjfslg7Xx"
   },
   "source": [
    "Each row is also a series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "cell_id": "00035-efbe4343-8a4f-4efa-ba2d-7f88b7c8e82d",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 72
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "rhiEFGhOg-kx",
    "outputId": "5392aae6-cac1-4225-9326-db0b22359fd4"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "even    0\n",
       "odd     1\n",
       "Name: 0, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00036-c55a98ad-4590-4466-ab61-9f1a6ea49b83",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "ki2GW4yohKl5",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "with keys: `'even'` and `'odd'`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or as a filter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "4     8    9"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[[4]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "cell_id": "00037-34730af6-4952-4ee0-8375-410cb4392c20",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "JPLVdBrAhFCJ",
    "outputId": "0445bd37-44f5-45ba-8618-fd3d75033e6c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0]['even']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00038-3f514cb9-b08b-416d-9711-6654cb2e6caa",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "8NuiZ2NIhGjx"
   },
   "source": [
    "or attributes `even` and `odd`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "cell_id": "00039-cb23d640-f06f-446e-b942-8be58193f511",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "KmY9Jrk_hZyx",
    "outputId": "bb8814ee-39e5-4d5b-8877-0466d66f7a93"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0].odd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00040-ac07d690-d4ce-4d3d-af39-c2bb5353c03d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "vOxxMKJshhKJ"
   },
   "source": [
    "One specific cell value can be reached with the index and the key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.iloc[2,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "cell_id": "00041-5ca70a4d-1891-4d24-9d37-f7533cc8e50b",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "MexLUmZLhph5",
    "outputId": "6d01e4eb-9451-4e43-81e6-fa46147fb67f"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[2,'odd']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.at[2,'even']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00052-58cd1675-c71a-4c71-b957-e25c8a6ab447",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "pg20NksOjdQW"
   },
   "source": [
    "A `Pandas` `Series` object can be just initialized from a `Python` dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "cell_id": "00053-5881d053-799d-4191-9112-4d49bf041fa0",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 90
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "qjuk_nn0jgFH",
    "outputId": "ed0eade9-31dc-4cab-bfb4-79cf10c7f5b6"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name           Juan Valdez\n",
       "Nacionality       Colombia\n",
       "Age                     23\n",
       "dtype: object"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s=pd.Series({'Name':'Juan Valdez','Nacionality':'Colombia','Age':23})\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "cell_id": "00054-997dbe29-5415-4b33-a2c2-320979fb1dcb",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "pAtixllExE1Z",
    "outputId": "df062321-1e5a-4c58-cc01-b80503680a3c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Juan Valdez'"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s['Name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00055-c9c53a1e-7e5c-4360-9714-10f042c7ab91",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "pYb112eqxE1e"
   },
   "source": [
    "but also as containers of name spaces!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "cell_id": "00056-c35a9e02-65ee-4089-8af1-da308c7c33c7",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "6_GfsFVwxE1f",
    "outputId": "a2cfc2c7-bc11-4939-8c0a-536aeb453f39"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Juan Valdez'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.Name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00045-4cee4939-0326-4d1a-b481-409d3a73e6e8",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "z5JBXWlkh8Z3",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "> The __power__ of Pandas rely in that their main data structures: `DataFrames` and `Series`, are enriched with many useful methods and attributes.\n",
    "\n",
    "__[Official definition of Pandas](http://pandas.pydata.org/pandas-docs/stable/)__\n",
    "\n",
    "> Pandas is a Python package providing __fast__, __flexible__, and __expressive__ _data structures_ designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it _has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool_ available in any language. It is already well on its way toward this goal.\n",
    "\n",
    "* \"relational\": the list of data is identified with some unique index (like a `SQL` table)\n",
    "* \"labeled\": the list is identified with a key, like the previous `odd` or `even` keys.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00046-3daea1f4-914e-406e-a0e6-a12582c78615",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "sDC5HnPOicTg",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "For example. A double bracket `[[...]]`, can be used to filter data.\n",
    "\n",
    "A row in a two-dimensional `DataFrame` corresponds to `Series` with the same keys of the `DataFrame`, but with single values instead of a list "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "cell_id": "00047-924b43dc-7ce3-4ff3-8be4-a21ebbbed12e",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 78
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "UfyqbhjXigZc",
    "outputId": "6c7f9bf1-a3ee-483c-e7cf-94b9a7d42af1"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[[0]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00049-04942039-b889-4baa-bb10-d3ac7651fbf7",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "p7MgQWMijI0W",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "To filter a column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00050-98830692-b564-4fec-b41d-c800e7ebed14",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 198
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "nT_SkKMIjMWY",
    "outputId": "f9f45db6-cf96-4ba2-e454-904d6385bd99"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   odd\n",
       "0    1\n",
       "1    3\n",
       "2    5\n",
       "3    7\n",
       "4    9"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['odd']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00057-25c85b9e-a98a-446e-a2ad-d21be1a344e4",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "Kab6pfyMxE1k",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## `DataFrame` initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00058-c269e732-fb8b-4ffb-8ae1-3df99a8c229d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "4zhJVNq1pTTe",
    "tags": []
   },
   "source": [
    "### Initialization from an existing spreadsheet. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be locally in your computer o from some downloadable  link"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "cell_id": "00059-a5e5f2ee-7ef0-4a15-8cf0-1bd63e03a10b",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "yM4oF9r-pWsw",
    "outputId": "fb448709-fb2a-4ce1-f154-57e96439f98a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>65.0</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez  23.0        Café de Colombia\n",
       "1  Álvaro Uribe Vélez  65.0  Senado de la República"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.read_excel('http://bit.ly/spreadsheet_xlsx')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00060-a55db9c7-27b6-41fe-ba0e-e41a1e68892e",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "QM4z4MoXpjLj",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To make a downloadable link for any spread sheet in Google Drive, follow the sequence:\n",
    "```\n",
    "File → Share → Publish to the web...→ Entire Document → Web page → Microsoft excel (xlsx)\n",
    "```\n",
    "<!-- as illustrated in the figure:\n",
    "![GS](https://github.com/restrepo/data-analysis/blob/master/img/img1.png?raw=1)-->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "cell_id": "00061-11ece2bc-94b9-4f70-b2cf-b55cd69a4f87",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "29uXQfWjpx66",
    "outputId": "7c58a4ca-69b9-4ee2-9187-96aee19ee940",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>65.0</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez  32.0        Café de Colombia\n",
       "1  Álvaro Uribe Vélez  65.0  Senado de la República"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0,'Edad']=32\n",
    "#df.at[0,'Edad']=32\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00062-5be08fd6-94b8-4c55-b9f1-03c2ea1cef7b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "eeZzLpDUpqbS"
   },
   "source": [
    "*After* some modification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00063-80bd419a-5bb8-42f6-9cf7-7a9508e7ef58",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "TRjW0Y4Rpv-i"
   },
   "source": [
    "it can be saved again as an `excel file` with the option to not create a column of indices: `index=False`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00064-5998e0ac-609c-4fba-b96e-53d70aef8ec5",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "QZUHeScjp6cC",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Initialization from lists and dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00065-e142101b-9b27-4ff5-a1c9-c0cdeab5cdb5",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "5ulKgKBRxE1m"
   },
   "source": [
    "#### Inizialization from Series\n",
    "We start with an empty `DataFrame`:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00066-77afb247-50a8-4b43-aabb-eb64da7d31e8",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "J-F92xHmp8v7"
   },
   "source": [
    "Creating Pandas DataFrame from list and dictionaries [offers many alternatives](http://pbpython.com/pandas-list-dict.html)\n",
    "\n",
    "![creating dataframes](http://pbpython.com/images/pandas-dataframe-shadow.png)\n",
    "\n",
    "#### Column oriented way\n",
    "* In addition to the dictionary of lists [already illustrated at the beginning]() that in this case corresponds to:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "cell_id": "00067-617526f9-d3a8-459c-8227-c831c7d13ef6",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "7IOuJCeMqBRj",
    "outputId": "70888102-facb-48a2-8c6a-44a1e799ddc9",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>32</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>69</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez    32        Café de Colombia\n",
       "1  Álvaro Uribe Vélez    69  Senado de la República"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({'Nombre'   : ['Juan Valdez','Álvaro Uribe Vélez'],\n",
    "              'Edad'     : [32,            69                 ],\n",
    "              'Compañia' : ['Café de Colombia','Senado de la República']})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00068-f8c10978-b48d-44bd-8207-4330ae7f2280",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "nJLhj4G4qJZb",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* We can obtain the DataFrame from list of items"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00069-48e62ebe-c8f3-48e5-9487-20975fa1cff7",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "p755U93VqKLC"
   },
   "outputs": [],
   "source": [
    "pd.DataFrame.from_items([ [ 'Nombre'  , ['Juan Valdez','Álvaro Uribe Vélez']],\n",
    "                          [ 'Edad'    , [  32,            65               ]],\n",
    "                          [ 'Compañia', ['Café de Colombia','Senado de la República']] ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00070-3da42af6-028d-4bd9-a93b-3b8266c62501",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "J6tp94TlqO_S",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* We can obtain the `DataFrame` from dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "cell_id": "00071-6f90ed3f-241c-497b-ac6d-da067e409361",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 138
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "wx-JJgqIqQAD",
    "outputId": "66e0e42b-1261-43dd-ad1e-1eddb7eb3bfb"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>32</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>65</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez    32        Café de Colombia\n",
       "1  Álvaro Uribe Vélez    65  Senado de la República"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame( [{'Nombre':'Juan Valdez',        'Edad': 32   ,'Compañia':'Café de Colombia'      },\n",
    "              {'Nombre':'Álvaro Uribe Vélez', 'Edad': 65   ,'Compañia':'Senado de la República'}]\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Special DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Empty DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "cell_id": "00072-5dca2e57-28d1-4dee-8a8a-d451188f3084",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "jPsRspsXxE1n",
    "outputId": "cb0aeb13-fa1a-480d-a214-fdb27b8f0228"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.DataFrame()\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.empty"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Single row DataFrame from dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first key</th>\n",
       "      <th>second key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>first value</td>\n",
       "      <td>second value</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     first key    second key\n",
       "0  first value  second value"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d={\n",
    "    'first key' :'first value',\n",
    "    'second key':'second value'\n",
    "\n",
    "  }\n",
    "pd.DataFrame([d])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00073-0702057c-6782-455f-8481-441f2132c1b1",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "_ylhD2zjqdby",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Initialization from sequential rows as  Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We start with an empty `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "cell_id": "00074-7ee7b66f-0813-4412-bf3d-2a26f5804cf6",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "r6V0qilXqxBB",
    "outputId": "e1f60191-9d17-44e0-ce10-4619c767ee9d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df=pd.DataFrame()\n",
    "df.empty"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00075-51105264-c39a-4fbe-8543-f3b15fca495d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "f0OJapjvxE1q"
   },
   "source": [
    "We can append a dictionary (or Series) as a row of the `DataFrame`, provided that we always use the option: `ignore_index=True`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "cell_id": "00076-422876c7-62e9-4f28-8ee0-a0b182cd2b02",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 78
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "AgR5uPbxrA6q",
    "outputId": "daa37a89-0b8d-40ce-c292-f67268cb5eaa"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Nacionality</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Name Nacionality  Age\n",
       "0  Juan Valdez    Colombia   23"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d={'Name':'Juan Valdez','Nacionality':'Colombia','Age':23}\n",
    "df=pd.concat([df,pd.DataFrame([d])])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00080-d1b64085-e8ea-45d1-ac27-443875bd044b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "npH4wLrNxE17"
   },
   "source": [
    "To add a second file we build another `dict`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "cell_id": "00081-3c0893fe-aed8-4d42-8f05-5ce9bee9418f",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 163
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "nTwRmub3xE18",
    "outputId": "7b4bcbbc-d5d6-400b-d650-256791060360"
   },
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Name:\n",
      " Diego Restrepo\n",
      "Nacionality:\n",
      " Colombia\n",
      "Age:\n",
      " 51\n",
      "Company:\n",
      " UdeA\n"
     ]
    }
   ],
   "source": [
    "d={}\n",
    "for k in ['Name','Nacionality','Age','Company']:\n",
    "    var=input('{}:\\n'.format(k))\n",
    "    d[k]=var"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Name': 'Diego Restrepo',\n",
       " 'Nacionality': 'Colombia',\n",
       " 'Age': '51',\n",
       " 'Company': 'UdeA'}"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.concat([df,pd.DataFrame([d])])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To concatenate a list of dataframes side by side use the option `axis='columns'`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00082-21c2691e-ed50-43c3-a0a4-3cf199aa0775",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "t8Xzsb0qxE1-",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Exercises\n",
    "* Display the resulting `Series` in the screen:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "cell_id": "00083-c166b04e-79e9-40ea-ae48-49965f6a1ca1",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       Juan Valdez\n",
       "1    Diego Restrepo\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Name']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00084-b681d4fc-73d5-46ee-bd43-bc82de198660",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "sSPUdkvtxE2C"
   },
   "source": [
    "__Activity__: Append a new row to the previous `DataFrame` and visualize it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "cell_id": "00085-2a4e0b9c-c719-41d5-95bc-f277953e179c",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "vRRypFa_xE2D"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Nacionality</th>\n",
       "      <th>Name</th>\n",
       "      <th>Company</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>23</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>33</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Diego Restrepo</td>\n",
       "      <td>Google</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Age Nacionality            Name Company\n",
       "0  23    Colombia     Juan Valdez     NaN\n",
       "1  33    Portugal  Diego Restrepo  Google"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00086-e7522978-7107-452d-b697-fcdf7b8f948b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "wLUED3NWxE2G"
   },
   "source": [
    "* Fill NaN with empty strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "cell_id": "00087-97603f43-2420-46bf-8d24-182840f9b478",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "28rp4v3BxE2I"
   },
   "outputs": [],
   "source": [
    "df=df.fillna('')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Nacionality</th>\n",
       "      <th>Age</th>\n",
       "      <th>Company</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>23</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Diego Restrepo</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>51</td>\n",
       "      <td>UdeA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name Nacionality Age Company\n",
       "0     Juan Valdez    Colombia  23        \n",
       "1  Diego Restrepo    Colombia  51    UdeA"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00088-e2c6349b-dfe0-4da6-b6fd-37d2abcfd9a8",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "K3feFAfYxE2K"
   },
   "source": [
    "* Save `Pandas` `DataFrame` as an Excel file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "cell_id": "00089-b9551acb-cdbd-4f31-a95b-24a0070287ea",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "fdxuA_FAxE2L"
   },
   "outputs": [],
   "source": [
    "df.to_excel('prof.xlsx',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00090-b34d21ca-a22d-42e0-9f5f-0a770830fedd",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "evWZKwEPxE2O"
   },
   "source": [
    "* Load pandas DataFrame from the saved file in Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "cell_id": "00091-23d2a6aa-c3bf-4b2c-9b78-7ebed3ae8ec6",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "xl7KUNVCxE2P"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Nacionality</th>\n",
       "      <th>Age</th>\n",
       "      <th>Company</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>23</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Diego Restrepo</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>51</td>\n",
       "      <td>UdeA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name Nacionality  Age Company\n",
       "0     Juan Valdez    Colombia   23     NaN\n",
       "1  Diego Restrepo    Colombia   51    UdeA"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_excel('prof.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00096-34dbb8c6-f80d-4fdd-8b56-bd0fd1fdcd9d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "rzAKwxQhxE2a",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Other formats to saving and read files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are interested in format which keeps the tags of the columns, like `'Nombre', 'Edad', 'Compañia'`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "cell_id": "00097-01b06192-00aa-4602-97bb-b72642398c6b",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "CbgkbfTKvIkO",
    "outputId": "431510dc-5cde-4f89-841e-d8f607d6f167"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>65.0</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez  23.0        Café de Colombia\n",
       "1  Álvaro Uribe Vélez  65.0  Senado de la República"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.read_excel('http://bit.ly/spreadsheet_xlsx')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "cell_id": "00098-ac5ccf23-1ed0-4827-8156-2f01ab1e3810",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "numpy.float64"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(df.loc[0,'Edad'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00099-9ad55ba4-a969-4bc8-b406-22cd968a0805",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "w8hdIGOvu3Ee",
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "### CSV"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comma separated values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "cell_id": "00100-fe200388-2002-436a-b819-745bad6fcfe0",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "90PSoWzUxE2b"
   },
   "outputs": [],
   "source": [
    "df.to_csv('hoja.csv',index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nombre,Edad,Compañia\n",
      "Juan Valdez,23.0,Café de Colombia\n",
      "Álvaro Uribe Vélez,65.0,Senado de la República\n"
     ]
    }
   ],
   "source": [
    "cat hoja.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00101-cc4612dc-38b1-4330-b264-1d48c027fbb0",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "ViJ62k6BxfOm"
   },
   "source": [
    "We can check the explicit file format with"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "cell_id": "00102-5e45759b-2a49-45f9-bf6f-8989d7b9f3a9",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nombre,Edad,Compañia\n",
      "Juan Valdez,23.0,Café de Colombia\n",
      "Álvaro Uribe Vélez,65.0,Senado de la República\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(df.to_csv(None,index=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "cell_id": "00104-fbced017-69ab-46a2-9804-6fff4a406112",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Compañia</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>65.0</td>\n",
       "      <td>Senado de la República</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                Compañia\n",
       "0         Juan Valdez  23.0        Café de Colombia\n",
       "1  Álvaro Uribe Vélez  65.0  Senado de la República"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv('hoja.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00105-5cd6423e-2654-439c-a4e9-943248c7b894",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "s1rbHbLmuv2g",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### JSON"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00106-ee6abf2e-ef42-41b7-af5a-c98849c33e78",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "JwWj-SSVv6Aj"
   },
   "source": [
    "This format keeps the Python lists and dictionaries at the storage level"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "cell_id": "00107-d18f5f63-2e23-4154-aaeb-d832bceaa1b4",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "OaMV4GdOwmBo",
    "outputId": "51294873-b242-4574-c6e5-63ab39699e74"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Age\n",
       "0  Donald Trump   74\n",
       "1   Barak Obama   59"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.DataFrame([{\"Name\":\"Donald Trump\",\"Age\":74},\n",
    "                 {\"Name\":\"Barak Obama\", \"Age\":59}])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00108-51aade1c-59de-4ab8-ab1b-cae65529ef6b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "Ajc79wC9z2b0"
   },
   "source": [
    "This format allow us to keep exactly the very same list of dictionaries structure!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "cell_id": "00109-a06a153d-a096-4f43-b4fb-e493ef63cdc6",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "00a9jBWAzg29",
    "outputId": "9e6c226d-6528-442a-99a5-45d1363a0ffe"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{\"Name\":\"Donald Trump\",\"Age\":74},{\"Name\":\"Barak Obama\",\"Age\":59}]\n"
     ]
    }
   ],
   "source": [
    "print(df.to_json(None,orient='records'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00110-40dfa301-7ac0-4c9a-b94e-3a67bb4a5abb",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "xwPu1tn30hQ8"
   },
   "source": [
    "__Activity__: \n",
    "* Save to a file instead of `None` and open the file with some editor. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "cell_id": "00111-4cb060b6-5095-4bdc-9e15-2025f5da8f4d",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "KtKmXfQg1Z4z"
   },
   "outputs": [],
   "source": [
    "df.to_json('presidents.json',orient='records')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00112-156a13f9-d5a6-46ce-b2a5-49b78dc5ae44",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "5S7OR7X11acD"
   },
   "source": [
    "* Add a break-line at the end of the first dictionary and try to\n",
    "load the resulting file with `pd.read_json`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "cell_id": "00113-925db647-ab9d-46c1-ad47-35d873eeff39",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "xOaIuF_y1b5C"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Age\n",
       "0  Donald Trump   74\n",
       "1   Barak Obama   59"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_json('presidents.json')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'Name': 'Donald Trump', 'Age': 74}, {'Name': 'Barak Obama', 'Age': 59}]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# %load presidents.json\n",
    "[{\"Name\":\"Donald Trump\",\"Age\":74},{\"Name\":\"Barak Obama\",\"Age\":59}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00114-cabdfd59-d1aa-44c7-9859-f7590ff0bb9e",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "5vSq9daI0F7R"
   },
   "source": [
    "JSON allows for some flexibility in the break-lines structure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "cell_id": "00115-935cc968-6583-4fad-9839-eabdae7a7016",
    "deepnote_cell_type": "code"
   },
   "outputs": [],
   "source": [
    "hm='''\n",
    "hola\n",
    "mundo\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "cell_id": "00116-f120fb15-040a-44e3-b1b1-0919e3c1d7d2",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nhola\\nmundo\\n'"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'Name': 'Donald Trump', 'Age': 74}, {'Name': 'Barak Obama', 'Age': 59}]"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[\n",
    "    {\"Name\":\n",
    "     \"Donald Trump\",\"Age\":74},{\"Name\": #en cualquier parte\n",
    "                               \"Barak Obama\",\"Age\":59}\n",
    "\n",
    "#Un comentario\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "cell_id": "00117-b3d11b0b-ffb3-4fa2-95f1-f1f3a009d9fc",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "qT8Lnlrgy0M9",
    "outputId": "fc8c4970-d164-4e81-b705-67a51f29fe5d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Name  Age\n",
       "0  Donald Trump   73\n",
       "1   Barak Obama   58"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_json('''\n",
    "             [{\"Name\":\"Donald Trump\",\"Age\":73},\n",
    "              {\"Name\":\"Barak Obama\", \"Age\":58}]\n",
    "            ''')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00118-33431814-7e28-4599-988c-c692699cd502",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "w6PyR85f1mfD"
   },
   "source": [
    "For large databases it is convinient just to accumulate dictionaries in a sequential form:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "cell_id": "00119-84d25081-6867-41dd-8ba8-55ed6f8fc690",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 54
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "y2f8y66fyC5S",
    "outputId": "7eea8846-375a-4622-82a1-423f781a2ec9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"Name\":\"Donald Trump\",\"Age\":74}\n",
      "{\"Name\":\"Barak Obama\",\"Age\":59}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(df.to_json(None,orient='records',lines=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "cell_id": "00120-3df09295-8c42-4b2f-8190-bebc99a29fcd",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "Bvhzs_GRzNWM",
    "outputId": "9b33e582-9589-4005-8d24-6a20bc463c31"
   },
   "outputs": [],
   "source": [
    "df=pd.read_json('''\n",
    "             {\"Name\":\"Donald Trump, Junior\",\"Age\":73}\n",
    "             {\"Name\":\"Barak Obama, Senior\", \"Age\":58}\n",
    "            ''',orient='records',lines=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump, Junior</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama, Senior</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Name  Age\n",
       "0  Donald Trump, Junior   73\n",
       "1   Barak Obama, Senior   58"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"name\"]=df['Name'].str.split(', ').str[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump, Junior</td>\n",
       "      <td>73</td>\n",
       "      <td>Donald Trump</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama, Senior</td>\n",
       "      <td>58</td>\n",
       "      <td>Barak Obama</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Name  Age          name\n",
       "0  Donald Trump, Junior   73  Donald Trump\n",
       "1   Barak Obama, Senior   58   Barak Obama"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.read_json('''\n",
    "             {\"Name\":\"Donald Trump Junior\",\"Age\":73}\n",
    "             {\"Name\":\"Barak Obama Senior\", \"Age\":58}\n",
    "            ''',orient='records',lines=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump Junior</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama Senior</td>\n",
       "      <td>58</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Name  Age\n",
       "0  Donald Trump Junior   73\n",
       "1   Barak Obama Senior   58"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [],
   "source": [
    "l=['A','B']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'A B'"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "' '.join(l)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Age</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Donald Trump Junior</td>\n",
       "      <td>73</td>\n",
       "      <td>Donald Trump</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Barak Obama Senior</td>\n",
       "      <td>58</td>\n",
       "      <td>Barak Obama</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  Name  Age          name\n",
       "0  Donald Trump Junior   73  Donald Trump\n",
       "1   Barak Obama Senior   58   Barak Obama"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['name']=df.Name.str.split().str[:-1].apply(lambda s: ' '.join(s))\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00121-1e35d05f-c794-4f1a-8008-f939e469ba6e",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "sYfWnw1e2wfi"
   },
   "source": [
    "__Activity__: \n",
    "* Save to a file instead of `None`, with options: `orient='records',lines=True`, and open the file with some editor. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "cell_id": "00122-4d55c964-e9b3-49e0-8f04-d008caafc148",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "S9wv93dM3KER"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00123-5793b240-c88d-4ed8-8a7a-7039b03bb45f",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "bGYW8Xl-3Kdi"
   },
   "source": [
    "* Add a similar dictionary in the next new line, and try to\n",
    "load the resulting file with `pd.read_json` with options: `orient='records',lines=True`. \n",
    "   * WARNING: Use doble-quotes `\"` to write the keys od the new\n",
    "dictionary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00124-8fdef546-5acb-4662-a477-79e7e56b5ab0",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "icGCveT34HDA"
   },
   "source": [
    "Any Python string need to be converted first to double-quotes before to be used as JSON string.\n",
    "\n",
    "__Example__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "cell_id": "00125-8eb6882d-f964-487a-854a-5f26c07b1035",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "mPRCSHbG4dRx"
   },
   "outputs": [],
   "source": [
    "numbers={\"even\": [0,2,4,-6,8],   #  First  key-list\n",
    "         \"odd\" : [1,3,-5,7,9] }  #  Second key-list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "cell_id": "00126-8b43f478-80c5-4dde-90c1-56c760b3bbbf",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'even': [0, 2, 4, -6, 8], 'odd': [1, 3, -5, 7, 9]}"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "cell_id": "00127-6eed42d7-e148-4fe2-a336-1020b086422d",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "An-yzTut4gtw",
    "outputId": "a5ab32df-2d30-4194-a3a2-91bbed216c45"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"{'even': [0, 2, 4, -6, 8], 'odd': [1, 3, -5, 7, 9]}\""
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "str(numbers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00128-88abc63b-351f-4fd8-98c0-10cb52230870",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "BFCE3Hq34ktw"
   },
   "source": [
    "This string can be writing in the `JSON` format by replacing the single quotes, ' , by  duoble quotes, \":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'hola mundo radiante'"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"hola mundo cruel\".replace('cruel','radiante')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "cell_id": "00129-077cac5a-d328-436a-b590-181f5bc5a8d8",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "PJVD2Btc4nJx",
    "outputId": "4a2768cf-c531-4ce3-afaf-1772a426cbd3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'{\"even\": [0, 2, 4, -6, 8], \"odd\": [1, 3, -5, 7, 9]}'"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "str(numbers).replace(\"'\",'\"')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00130-a2a20551-1d74-470a-9923-974394855938",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "zDssJstU45Qi"
   },
   "source": [
    "and now can be used as an JSON input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "cell_id": "00131-90d378ec-4eb8-4654-a447-0af782b2d942",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "HxbaP5T542_h"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "2     4   -5\n",
       "3    -6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.read_json(  str(numbers).replace(\"'\",'\"') )\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00132-b0810565-a22d-4f66-b020-f06993aaa79c",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "EtqUn4t85Hlp"
   },
   "source": [
    "__Activity__: Try to read the string as JSON without make the double-quote replacement"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00092-bbceab54-8e2f-492b-aee9-374786bc9b58",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "pAZISlqKxE2T",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Common operations upon `DataFrames`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See [https://github.com/restrepo/PythonTipsAndTricks](https://github.com/restrepo/PythonTipsAndTricks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00093-9da49b6e-ecfd-44e1-a27d-e131f56af8b1",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "IBgQ8cojxE2U",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "* __To fill a specific cell__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "cell_id": "00094-77f3f284-c87f-4cb4-b305-bfa683fe9c52",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "a00HZIQhxE2V"
   },
   "outputs": [],
   "source": [
    "df.at[0,'Company']='Federación de Caferos'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "cell_id": "00095-b9c2aeff-851c-4719-bb97-908e96da3f07",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "g6f2psm3xE2X",
    "outputId": "888e8cee-6bfd-4313-86ed-2d7eeead7579"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Nacionality</th>\n",
       "      <th>Age</th>\n",
       "      <th>Company</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>23</td>\n",
       "      <td>Federación de Caferos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Diego Restrepo</td>\n",
       "      <td>Colombia</td>\n",
       "      <td>51</td>\n",
       "      <td>UdeA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Name Nacionality Age                Company\n",
       "0     Juan Valdez    Colombia  23  Federación de Caferos\n",
       "1  Diego Restrepo    Colombia  51                   UdeA"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00133-38528865-2883-419d-a318-6739da40857a",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "s2uApSIi8PjI",
    "tags": []
   },
   "source": [
    "## Filters (masking)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main application of labeled data for data analysis is the possibility to make filers, or cuts, to obtain specific reduced datasets to further analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers={\"even\": [0,2,4,-6,8],   #  First  key-list\n",
    "         \"odd\" : [1,3,-5,7,9] }  #  Second key-list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {},
   "outputs": [],
   "source": [
    "df=pd.DataFrame(numbers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "cell_id": "00134-ef5b47b0-51cd-47bf-9b8a-19ab931a22e5",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "2     4   -5\n",
       "3    -6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A _mask_ is a list of `True/False` values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3     True\n",
       "4     True\n",
       "Name: even, dtype: bool"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even.abs()>4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "cell_id": "00135-26b7e2ad-8ed0-4c5c-a903-7dcc2a3d065d",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "zt2S0aln8_4s",
    "outputId": "5472b7b3-deb6-453c-eed7-ea7518d0c0d8"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "3    -6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df.even.abs()>4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and → `&`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "metadata": {
    "cell_id": "00136-02a253ad-bdf0-4501-8cdd-552e09512321",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 78
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "eVSePCQB9GFe",
    "outputId": "596533f7-6b9e-4b47-b110-5e0aff429561"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "2     4   -5"
      ]
     },
     "execution_count": 115,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df.even>0) & (df.odd<0)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "negation → `~`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "cell_id": "00137-922f9b25-8146-4da0-899c-3d49f450c068",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "3    -6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[~((df.even>0) & (df.odd<0)) ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or → `|`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "cell_id": "00138-ca349084-fa0c-4be5-839c-1eb00d5120e8",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 108
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "dVktA6ze9wI2",
    "outputId": "62806bab-da66-4945-a717-bc6078ca9731"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "2     4   -5\n",
       "3    -6    7"
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df.even<0) | (df.odd<0)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00139-ca62a80b-b815-4fd5-bbe8-2fccf72f6a3a",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "xHsCPml65i54",
    "tags": []
   },
   "source": [
    "## The `apply` method"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The advantage of the spreadsheet paradigm is that the columns can be transformed with functions. All the typical functions avalaible for a spreadsheet are already implemented like the method `.abs()` used before, or the method: `.sum()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {
    "cell_id": "00140-5bd68707-f972-4a21-ad89-ffc17371d504",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 35
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "pPLWFFLA6NX4",
    "outputId": "17d948bf-c63f-4a71-baac-09df5c54af00"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00141-57b10a10-4de8-42f2-a94a-3013e9ef5af6",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "XCUwgF0f6WFP"
   },
   "source": [
    "__Activity__: Explore the avalaible methods by using the completion system of the notebook after the last semicolon of `df.even.`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "cell_id": "00142-50a685d2-4b25-40b1-9c45-7c9fbcc3875e",
    "deepnote_cell_type": "code"
   },
   "outputs": [],
   "source": [
    "kk=df['even']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "kk."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00143-62cf460c-f390-49a9-afb3-79a89a598209",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "JDqDXVoF6mOv"
   },
   "source": [
    "df['even'].ipynb_checkpoints/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00144-aced47b4-60eb-4b3c-931e-d008d4e89f6d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "378RNNb97OQs"
   },
   "source": [
    "### Column-level `apply`\n",
    "We just select the column and apply the direct or implicit function:\n",
    "* Pre-defined function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {
    "cell_id": "00145-a74342d5-abd2-4958-8864-fc0db16f5c37",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 126
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "L210XcRK7qYn",
    "outputId": "db5a2c2b-94b2-4fe4-e022-7fa9f169b7a7"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    2\n",
       "2    4\n",
       "3    6\n",
       "4    8\n",
       "Name: even, dtype: int64"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even.apply(abs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00146-27fa426d-132e-4721-a8bd-ce3d2cd52bcb",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "JX0nA6swDphj"
   },
   "source": [
    "* Implicit function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {
    "cell_id": "00147-b381a3f1-17b5-4e0c-960b-4514c07a753d",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 126
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "oGEcVFrA72LX",
    "outputId": "245ccc64-a7b0-4601-ff9a-a87c0beea422"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    True\n",
       "1    True\n",
       "2    True\n",
       "3    True\n",
       "4    True\n",
       "Name: even, dtype: bool"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even.apply(lambda n:isinstance(n,int))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {
    "cell_id": "00148-2e4c0fd3-48e3-4c86-bc26-b51aab4a4efe",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     0\n",
       "1     4\n",
       "2    16\n",
       "3    36\n",
       "4    64\n",
       "Name: even, dtype: int64"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.even.apply(lambda n: n**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00149-9f1c7206-da55-4ad4-b589-dc7022a7daba",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "8wFVd7Sk7irH"
   },
   "source": [
    "### Row-level apply\n",
    "The foll row is passed as dictionary to the explicit or implicit function when `apply` is used for the full `DataFrame` and the option `axis=1` is used at the end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>even</th>\n",
       "      <th>odd</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4</td>\n",
       "      <td>-5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   even  odd\n",
       "0     0    1\n",
       "1     2    3\n",
       "2     4   -5\n",
       "3    -6    7\n",
       "4     8    9"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {
    "cell_id": "00150-082c7a02-aa1e-49cb-a4f3-1349b1550535",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     1\n",
       "1    11\n",
       "2    29\n",
       "3    43\n",
       "4    89\n",
       "dtype: int64"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['even']+df['odd']**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {
    "cell_id": "00151-abb73dae-6242-4c9c-84e4-303bdf29589d",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 126
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "z6GLMMuEEUwB",
    "outputId": "47763d9f-1fbf-414c-e55d-74fc7fe0bf5d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     1\n",
       "1    11\n",
       "2    29\n",
       "3    43\n",
       "4    89\n",
       "dtype: int64"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(lambda row: row['even']+row['odd']**2,axis='columns')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     1\n",
       "1    11\n",
       "2    29\n",
       "3    43\n",
       "4    89\n",
       "dtype: int64"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(lambda row: row.get('even')+row.get('odd')**2,axis='columns')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Chain tools for data analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00152-09addc01-50ec-4b33-bb8b-0b7e3a0b8af5",
    "deepnote_cell_type": "markdown",
    "jp-MarkdownHeadingCollapsed": true,
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "There are several chain tools for data analyis like the\n",
    "* Spreadsheet based one, like Excel \n",
    "* Relational databases with the use of more advanced SQL tabular data with some data base software like MySQL\n",
    "* Non-relational databases (RAM) with Pandas, R,  Paw,... ( max ~ RAM/8) \n",
    "* Non-relational databases (Disk): Dask, ROOT, MongoDB,...\n",
    "\n",
    "Here we illustrate an example of use fo a non-relational database with Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00153-31b6cb4c-2040-41f4-9b7d-e226df82dc37",
    "deepnote_cell_type": "markdown",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Relational databases"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "metadata": {
    "cell_id": "00154-21e06109-13d0-40f9-a9a6-1dec1358ab30",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Fecha de Nacimiento</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>1966-07-04</td>\n",
       "      <td>888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>1952-07-04</td>\n",
       "      <td>666</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre Fecha de Nacimiento   id\n",
       "0         Juan Valdez          1966-07-04  888\n",
       "1  Álvaro Uribe Vélez          1952-07-04  666"
      ]
     },
     "execution_count": 130,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personas=pd.read_csv('https://raw.githubusercontent.com/restrepo/ComputationalMethods/master/data/personas.csv')\n",
    "personas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "cell_id": "00156-95466953-49eb-4a2b-bcce-483b72149659",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>Inicio</th>\n",
       "      <th>Fin</th>\n",
       "      <th>Cargo</th>\n",
       "      <th>Compañía</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>888</td>\n",
       "      <td>2010</td>\n",
       "      <td></td>\n",
       "      <td>Arriero</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>666</td>\n",
       "      <td>2013</td>\n",
       "      <td>2020</td>\n",
       "      <td>Senador</td>\n",
       "      <td>Senado de la República de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>666</td>\n",
       "      <td>2020</td>\n",
       "      <td></td>\n",
       "      <td>Influencer</td>\n",
       "      <td>Twitter</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    id  Inicio   Fin       Cargo                            Compañía\n",
       "0  888    2010           Arriero                    Café de Colombia\n",
       "1  666    2013  2020     Senador  Senado de la República de Colombia\n",
       "2  666    2020        Influencer                             Twitter"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trabajos=pd.read_csv('https://raw.githubusercontent.com/restrepo/ComputationalMethods/master/data/trabajos.csv',\n",
    "                     na_filter=False)\n",
    "trabajos"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00157-6c56e4ac-c3b6-4cf0-9bd6-9812ee89832e",
    "deepnote_cell_type": "markdown",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![img](https://raw.githubusercontent.com/restrepo/ComputationalMethods/master/material/figures/relation.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00158-f3332d6e-f5e9-4383-9054-5cc67a0d42bc",
    "deepnote_cell_type": "markdown",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Example\n",
    "Obtain the current work of Álvaro Uribe Vélez "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>Inicio</th>\n",
       "      <th>Fin</th>\n",
       "      <th>Cargo</th>\n",
       "      <th>Compañía</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>888</td>\n",
       "      <td>2010</td>\n",
       "      <td></td>\n",
       "      <td>Arriero</td>\n",
       "      <td>Café de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>666</td>\n",
       "      <td>2013</td>\n",
       "      <td>2020</td>\n",
       "      <td>Senador</td>\n",
       "      <td>Senado de la República de Colombia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>666</td>\n",
       "      <td>2020</td>\n",
       "      <td></td>\n",
       "      <td>Influencer</td>\n",
       "      <td>Twitter</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    id  Inicio   Fin       Cargo                            Compañía\n",
       "0  888    2010           Arriero                    Café de Colombia\n",
       "1  666    2013  2020     Senador  Senado de la República de Colombia\n",
       "2  666    2020        Influencer                             Twitter"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trabajos"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is convenient to normalize the columns with strings before to tray to search inside them with a DataFrame method like\n",
    "`.`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [],
   "source": [
    "import unidecode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'alvaro de uribe'"
      ]
     },
     "execution_count": 153,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unidecode.unidecode('Álvaro de Uribe').lower()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [],
   "source": [
    "cc=personas[personas['Nombre'].str.lower().apply(\n",
    "    unidecode.unidecode).str.contains('alvaro uribe velez')].iloc[0].get('id')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Senador', 'Influencer']"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trabajos[trabajos.get('id')==cc]['Cargo'].to_list()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00160-db50187a-85fc-45f5-9b78-221f038e0794",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "z2NsLagWs-Jg",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Non-relational databases"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nested lists of dictionaries with a defined data scheme\n",
    "\n",
    "![img](https://raw.githubusercontent.com/restrepo/ComputationalMethods/master/material/figures/personajes.svg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [],
   "source": [
    "personas['Fecha de Nacimiento']=pd.to_datetime( personas['Fecha de Nacimiento'] )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Fecha de Nacimiento</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>1966-07-04</td>\n",
       "      <td>888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>1952-07-04</td>\n",
       "      <td>666</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre Fecha de Nacimiento   id\n",
       "0         Juan Valdez          1966-07-04  888\n",
       "1  Álvaro Uribe Vélez          1952-07-04  666"
      ]
     },
     "execution_count": 136,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extract-Transform-Load: ETL"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dateutil.relativedelta import relativedelta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/usuario/anaconda3/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:2199: FutureWarning: The parsing of 'now' in pd.to_datetime without `utc=True` is deprecated. In a future version, this will match Timestamp('now') and Timestamp.now()\n",
      "  result, tz_parsed = tslib.array_to_datetime(\n"
     ]
    }
   ],
   "source": [
    "personas['Edad']=personas['Fecha de Nacimiento'].apply(lambda t: \n",
    "                        relativedelta( pd.to_datetime('now'), t).years )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'id': 666,\n",
       "  'Inicio': 2013,\n",
       "  'Fin': '2020',\n",
       "  'Cargo': 'Senador',\n",
       "  'Compañía': 'Senado de la República de Colombia'},\n",
       " {'id': 666,\n",
       "  'Inicio': 2020,\n",
       "  'Fin': '',\n",
       "  'Cargo': 'Influencer',\n",
       "  'Compañía': 'Twitter'}]"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trabajos[trabajos['id']==666].to_dict(orient='records')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Fecha de Nacimiento</th>\n",
       "      <th>id</th>\n",
       "      <th>Edad</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>1966-07-04</td>\n",
       "      <td>888</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>1952-07-04</td>\n",
       "      <td>666</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre Fecha de Nacimiento   id  Edad\n",
       "0         Juan Valdez          1966-07-04  888    55\n",
       "1  Álvaro Uribe Vélez          1952-07-04  666    69"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    888\n",
       "1    666\n",
       "Name: id, dtype: int64"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personas['id']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [],
   "source": [
    "personas['Trabajos']=personas['id'].apply(lambda i:  trabajos[trabajos['id']==i\n",
    "                                                             ][['Inicio','Fin','Cargo','Compañía']\n",
    "                                                              ].to_dict(orient='records') )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Fecha de Nacimiento</th>\n",
       "      <th>id</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Trabajos</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>1966-07-04</td>\n",
       "      <td>888</td>\n",
       "      <td>55</td>\n",
       "      <td>[{'Inicio': 2010, 'Fin': '', 'Cargo': 'Arriero...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>1952-07-04</td>\n",
       "      <td>666</td>\n",
       "      <td>69</td>\n",
       "      <td>[{'Inicio': 2013, 'Fin': '2020', 'Cargo': 'Sen...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre Fecha de Nacimiento   id  Edad  \\\n",
       "0         Juan Valdez          1966-07-04  888    55   \n",
       "1  Álvaro Uribe Vélez          1952-07-04  666    69   \n",
       "\n",
       "                                            Trabajos  \n",
       "0  [{'Inicio': 2010, 'Fin': '', 'Cargo': 'Arriero...  \n",
       "1  [{'Inicio': 2013, 'Fin': '2020', 'Cargo': 'Sen...  "
      ]
     },
     "execution_count": 143,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "metadata": {},
   "outputs": [],
   "source": [
    "personajes=personas[['Nombre','Edad','Trabajos']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Nombre</th>\n",
       "      <th>Edad</th>\n",
       "      <th>Trabajos</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Juan Valdez</td>\n",
       "      <td>55</td>\n",
       "      <td>[{'Inicio': 2010, 'Fin': '', 'Cargo': 'Arriero...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Álvaro Uribe Vélez</td>\n",
       "      <td>69</td>\n",
       "      <td>[{'Inicio': 2013, 'Fin': '2020', 'Cargo': 'Sen...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Nombre  Edad                                           Trabajos\n",
       "0         Juan Valdez    55  [{'Inicio': 2010, 'Fin': '', 'Cargo': 'Arriero...\n",
       "1  Álvaro Uribe Vélez    69  [{'Inicio': 2013, 'Fin': '2020', 'Cargo': 'Sen..."
      ]
     },
     "execution_count": 147,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personajes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'Nombre': 'Juan Valdez',\n",
       "  'Edad': 55,\n",
       "  'Trabajos': [{'Inicio': 2010,\n",
       "    'Fin': '',\n",
       "    'Cargo': 'Arriero',\n",
       "    'Compañía': 'Café de Colombia'}]},\n",
       " {'Nombre': 'Álvaro Uribe Vélez',\n",
       "  'Edad': 69,\n",
       "  'Trabajos': [{'Inicio': 2013,\n",
       "    'Fin': '2020',\n",
       "    'Cargo': 'Senador',\n",
       "    'Compañía': 'Senado de la República de Colombia'},\n",
       "   {'Inicio': 2020, 'Fin': '', 'Cargo': 'Influencer', 'Compañía': 'Twitter'}]}]"
      ]
     },
     "execution_count": 148,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personajes.to_dict(orient='records')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import JSON"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "metadata": {
    "cell_id": "00161-c9ac407c-7e37-4992-88ef-1de719aef0e3",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "application/json": [
       {
        "Edad": 55,
        "Nombre": "Juan Valdez",
        "Trabajos": [
         {
          "Cargo": "Arriero",
          "Compañía": "Café de Colombia",
          "Fin": "",
          "Inicio": 2010
         }
        ]
       },
       {
        "Edad": 69,
        "Nombre": "Álvaro Uribe Vélez",
        "Trabajos": [
         {
          "Cargo": "Senador",
          "Compañía": "Senado de la República de Colombia",
          "Fin": "2020",
          "Inicio": 2013
         },
         {
          "Cargo": "Influencer",
          "Compañía": "Twitter",
          "Fin": "",
          "Inicio": 2020
         }
        ]
       }
      ],
      "text/plain": [
       "<IPython.core.display.JSON object>"
      ]
     },
     "execution_count": 150,
     "metadata": {
      "application/json": {
       "expanded": false,
       "root": "root"
      }
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "JSON( personajes.to_dict(orient='records') )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00165-b49cdbca-5db4-4a8c-9619-c6728a61eda4",
    "deepnote_cell_type": "markdown",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Actividad\n",
    "Obtenga el último trabajo de Álvaro Uribe Vélez"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Influencer'"
      ]
     },
     "execution_count": 151,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "personajes[personajes['Nombre']=='Álvaro Uribe Vélez'\n",
    "          ].get('Trabajos'\n",
    "          ).apply(lambda l: [d.get('Cargo') for d in l if not d.get('Fin')]\n",
    "          ).str[0].to_list()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00167-5961be8e-ed89-40b6-b41d-51c96c61be97",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "2NdcwWa5tuzv",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We have shown that the simple two dimensional spreadsheets where each cell values is a simple type like string, integer, or float, can be represented as a dictionary of lists values or a list of dictionary column-value assignment. \n",
    "\n",
    "We can go further and allow to store in the value itself a more general data structure, like nested lists and dictionaries. This allows advanced data-analysis when the `apply` methos is used to operate inside the nested lists or dictionaries.\n",
    "\n",
    "See for example:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00168-eb357193-9719-4e73-b2f7-4f39b6bb9f06",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "wB4hBVkhoPwR",
    "tags": []
   },
   "source": [
    "##  World wide web\n",
    "There are really three kinds of web\n",
    "* The normal web, \n",
    "* The deep web,\n",
    "* _The machine web_. The web for machine readable responses. It is served in `JSON` or `XML`  formats, which preserve programming objects."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00169-83452998-c3f5-4934-af52-088d397ec467",
    "deepnote_cell_type": "markdown",
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### Normal web"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00170-e0b4c06a-4c1b-4744-95ad-6d44760e3be3",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>COVID-19 pandemic</th>\n",
       "      <th>COVID-19 pandemic.1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Disease</td>\n",
       "      <td>COVID-19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Virus strain</td>\n",
       "      <td>SARS-CoV-2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Source</td>\n",
       "      <td>Probably bats, possibly via pangolins[1][2]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Location</td>\n",
       "      <td>Worldwide</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>First outbreak</td>\n",
       "      <td>Mainland China[3]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Index case</td>\n",
       "      <td>Wuhan, Hubei, China.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Date</td>\n",
       "      <td>1 December 2019[3] – present(1 year, 3 months, 3 weeks and 3 days)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Confirmed cases</td>\n",
       "      <td>124,971,776[4]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Active cases</td>\n",
       "      <td>51,331,455[4]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Recovered</td>\n",
       "      <td>70,893,740[4]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Deaths</td>\n",
       "      <td>2,746,581[4]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Territories</td>\n",
       "      <td>192[4]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   COVID-19 pandemic  \\\n",
       "1            Disease   \n",
       "2       Virus strain   \n",
       "3             Source   \n",
       "4           Location   \n",
       "5     First outbreak   \n",
       "6         Index case   \n",
       "7               Date   \n",
       "8    Confirmed cases   \n",
       "9       Active cases   \n",
       "10         Recovered   \n",
       "11            Deaths   \n",
       "12       Territories   \n",
       "\n",
       "                                                                                                                                                                                        COVID-19 pandemic.1  \n",
       "1                                                                                                                                                                                                  COVID-19  \n",
       "2                                                                                                                                                                                                SARS-CoV-2  \n",
       "3                                                                                                                                                               Probably bats, possibly via pangolins[1][2]  \n",
       "4                                                                                                                                                                                                 Worldwide  \n",
       "5                                                                                                                                                                                         Mainland China[3]  \n",
       "6   Wuhan, Hubei, China.mw-parser-output .geo-default,.mw-parser-output .geo-dms,.mw-parser-output .geo-dec{display:inline}.mw-parser-output .geo-nondefault,.mw-parser-output .geo-multi-punct{display:...  \n",
       "7                                                                                                                                        1 December 2019[3] – present(1 year, 3 months, 3 weeks and 3 days)  \n",
       "8                                                                                                                                                                                            124,971,776[4]  \n",
       "9                                                                                                                                                                                             51,331,455[4]  \n",
       "10                                                                                                                                                                                            70,893,740[4]  \n",
       "11                                                                                                                                                                                             2,746,581[4]  \n",
       "12                                                                                                                                                                                                   192[4]  "
      ]
     },
     "execution_count": 201,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_html('https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory')[0][1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00171-4930bd6f-f022-4c0d-825a-c978307d6a96",
    "deepnote_cell_type": "markdown"
   },
   "source": [
    "Real world example: microsoft academics\n",
    "![img](https://docs.microsoft.com/en-us/academic-services/graph/media/erd/entity-relationship-diagram.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00172-d77428b0-8158-40df-a696-28018c73f4b3",
    "deepnote_cell_type": "markdown",
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### Machine web"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, consider the following normal web page:\n",
    "\n",
    "https://inspirehep.net/literature?q=doi:10.1103/PhysRevLett.122.132001\n",
    "\n",
    "about a Scientific paper with people from the University of Antioquia. A _machine web_ version can be easily obtained in JSON just by attaching the extra parameter `&of=recjson`, and direcly loaded from Pandas, which works like a _browser for the third web_:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00173-73ec9ef2-d196-4a39-bd0d-64e4de85db70",
    "deepnote_cell_type": "code"
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "cell_id": "00174-c9b1fdd4-0927-465a-9e06-a54712ff98b5",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "-cg2w7wsoo7H"
   },
   "outputs": [],
   "source": [
    "df=pd.read_json('https://inspirehep.net/api/literature?q=doi:10.1103/PhysRevLett.122.132001')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>hits</th>\n",
       "      <th>links</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>hits</th>\n",
       "      <td>[{'metadata': {'report_numbers': [{'value': 'C...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total</th>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>self</th>\n",
       "      <td>NaN</td>\n",
       "      <td>https://inspirehep.net/api/literature/?q=doi%3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bibtex</th>\n",
       "      <td>NaN</td>\n",
       "      <td>https://inspirehep.net/api/literature/?q=doi%3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>latex-eu</th>\n",
       "      <td>NaN</td>\n",
       "      <td>https://inspirehep.net/api/literature/?q=doi%3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>latex-us</th>\n",
       "      <td>NaN</td>\n",
       "      <td>https://inspirehep.net/api/literature/?q=doi%3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>json</th>\n",
       "      <td>NaN</td>\n",
       "      <td>https://inspirehep.net/api/literature/?q=doi%3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                       hits  \\\n",
       "hits      [{'metadata': {'report_numbers': [{'value': 'C...   \n",
       "total                                                     1   \n",
       "self                                                    NaN   \n",
       "bibtex                                                  NaN   \n",
       "latex-eu                                                NaN   \n",
       "latex-us                                                NaN   \n",
       "json                                                    NaN   \n",
       "\n",
       "                                                      links  \n",
       "hits                                                    NaN  \n",
       "total                                                   NaN  \n",
       "self      https://inspirehep.net/api/literature/?q=doi%3...  \n",
       "bibtex    https://inspirehep.net/api/literature/?q=doi%3...  \n",
       "latex-eu  https://inspirehep.net/api/literature/?q=doi%3...  \n",
       "latex-us  https://inspirehep.net/api/literature/?q=doi%3...  \n",
       "json      https://inspirehep.net/api/literature/?q=doi%3...  "
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00183-b08b0eea-107b-49a8-8340-c12fcbc66e1d",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "i22ywhohsAVo"
   },
   "source": [
    " We can use all the previous methods to extract the authors from `'Antioquia U.'`:\n",
    " \n",
    " Note: For a dictionary, `d` is safer to use `d.get('key')` instead of just `d['key']` to obtain some `key`, because not error is generated if the requested `key` does not exists at all "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "metadata": {
    "cell_id": "00175-483f18ee-8181-46cf-995b-4d1b5abd53b7",
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 323
    },
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "MPiuuciYo90S",
    "outputId": "5c049b40-16ec-45f4-cce3-158546834ea3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "hits    [Jhovanny Mejia Guisao, José David Ruiz Alvarez]\n",
       "Name: hits, dtype: object"
      ]
     },
     "execution_count": 134,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['hits'].apply(lambda l: isinstance(l,list))]['hits' # extract cell with list\n",
    "            ].apply(lambda l: [d.get('metadata') for d in l] # metadata of article\n",
    "            ).str[0 #get the matched article dictionary\n",
    "            ].str['authors' # get list of authors → l\n",
    "            ].apply(lambda l: [ f'{d.get(\"first_name\")} {d.get(\"last_name\")}' for d in l  #author is a dictionary → d\n",
    "                               #d.get('affiliations') is a list  of dictionaries → dd                               \n",
    "                               if 'Antioquia U.' in [dd.get('value') for dd in d.get('affiliations')] \n",
    "                              ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [],
   "source": [
    "Authors=df[df['hits'].apply(lambda l: isinstance(l,list))]['hits' # extract cell with list\n",
    "            ].apply(lambda articles: [article.get('metadata') for article in articles] # metadata of article\n",
    "            ).str[0 #get the matched article dictionary\n",
    "            ].str['authors' # get list of authors → l\n",
    "            ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Mejia Guisao, Jhovanny', 'Ruiz Alvarez, José David']"
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names=Authors.apply(lambda authors: [ author.get('full_name') for author in authors  #author is a dictionary\n",
    "                               #author.get('affiliations') is a list  of dictionaries → affiliation                              \n",
    "                               if 'Antioquia U.' in [affiliation.get('value') for affiliation in author.get('affiliations')] \n",
    "                              ])\n",
    "names[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00176-771f6078-2bd8-4b8c-bd1e-52d95359760b",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "SyVgDj1ApPtP"
   },
   "source": [
    "We can see that the column `authors` is quite nested: Is a list of dictionaries with the full information for each one of the authors of the article."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00180-df4514dc-1020-47a8-ae5b-9fa3fe83a913",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "8zr5Nsxwsikt"
   },
   "source": [
    "__Activity__: Check that the lenght of the auhors list coincides with the `number_of_authors` \n",
    "<!-- df.authors.apply(len),df.number_of_authors.values -->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_id": "00181-e6df8982-fca7-4ff2-a8e6-e57c2d3471ee",
    "deepnote_cell_type": "code"
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00192-fa68bd80-a348-41cb-ad3c-fb0a4cfdb0ae",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "1Z96Ju2gwQO0"
   },
   "source": [
    "For further details see: https://github.com/restrepo/inspire/blob/master/gfif.ipynb\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00193-6e138922-01e4-4ecc-934a-06eee6670c8b",
    "deepnote_cell_type": "markdown"
   },
   "source": [
    "__Activity__: Repeat the same activity but using directly the JSON file, obtained with `requests`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {
    "cell_id": "00195-da97f864-7b1d-4a98-a6d1-885056868278",
    "deepnote_cell_type": "code"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Mejia Guisao, Jhovanny', 'Ruiz Alvarez, José David']"
      ]
     },
     "execution_count": 139,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#See: https://github.com/inspirehep/rest-api-doc/issues/4#issuecomment-645218074\n",
    "import requests                                                                                                                                                      \n",
    "response = requests.get('https://inspirehep.net/api/doi/10.1103/PhysRevLett.122.132001')                                                                              \n",
    "authors = response.json()['metadata']['authors']                                                                                                                     \n",
    "names = [author.get('full_name')\n",
    "              for author in authors \n",
    "               if any(aff.get('value') == 'Antioquia U.' for aff in author.get('affiliations'))]\n",
    "names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## Summary\n",
    "[Pandas_Cheat_Sheet PDF](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00196-e12c3e66-259a-44d2-988d-55ad5e2c68de",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "D7SNm2ABxE22",
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "##  ACTIVITIES\n",
    "See:\n",
    "* https://github.com/ajcr/100-pandas-puzzles\n",
    "* https://github.com/guipsamora/pandas_exercises\n",
    "* https://rramosp.github.io/ai4eng.v1/content/NOTES%2002.04%20-%20PANDAS.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00197-d4ddb4b1-754e-48ea-8b71-75d4272f816c",
    "colab_type": "text",
    "deepnote_cell_type": "markdown",
    "id": "fWjYdB7LxE23",
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "## Final remarks\n",
    "With basic scripting and Pandas we already have a solid environment to analyse data. We introduce the other libraries motivated with the extending the capabilities of Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_id": "00198-34e40bd9-4143-4ae7-aac7-7ff4a1d98cdf",
    "colab": {},
    "colab_type": "code",
    "deepnote_cell_type": "code",
    "id": "erRi7J01xE24"
   },
   "source": [
    "## Appendix\n",
    "\n",
    "[Summary with ChatGPT](https://htmlpreview.github.io/?https://raw.githubusercontent.com/restrepo/ComputationalMethods/master/material/ChatGPT/Pandas.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "created_in_deepnote_cell": true,
    "deepnote_cell_type": "markdown",
    "tags": []
   },
   "source": [
    "<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=f91b818d-f536-4b8f-81de-61752e0979b7' target=\"_blank\">\n",
    "<img alt='Created in deepnote.com' style='display:inline;max-height:16px;margin:0px;margin-right:7.5px;' src='data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPHN2ZyB3aWR0aD0iODBweCIgaGVpZ2h0PSI4MHB4IiB2aWV3Qm94PSIwIDAgODAgODAiIHZlcnNpb249IjEuMSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayI+CiAgICA8IS0tIEdlbmVyYXRvcjogU2tldGNoIDU0LjEgKDc2NDkwKSAtIGh0dHBzOi8vc2tldGNoYXBwLmNvbSAtLT4KICAgIDx0aXRsZT5Hcm91cCAzPC90aXRsZT4KICAgIDxkZXNjPkNyZWF0ZWQgd2l0aCBTa2V0Y2guPC9kZXNjPgogICAgPGcgaWQ9IkxhbmRpbmciIHN0cm9rZT0ibm9uZSIgc3Ryb2tlLXdpZHRoPSIxIiBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPgogICAgICAgIDxnIGlkPSJBcnRib2FyZCIgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoLTEyMzUuMDAwMDAwLCAtNzkuMDAwMDAwKSI+CiAgICAgICAgICAgIDxnIGlkPSJHcm91cC0zIiB0cmFuc2Zvcm09InRyYW5zbGF0ZSgxMjM1LjAwMDAwMCwgNzkuMDAwMDAwKSI+CiAgICAgICAgICAgICAgICA8cG9seWdvbiBpZD0iUGF0aC0yMCIgZmlsbD0iIzAyNjVCNCIgcG9pbnRzPSIyLjM3NjIzNzYyIDgwIDM4LjA0NzY2NjcgODAgNTcuODIxNzgyMiA3My44MDU3NTkyIDU3LjgyMTc4MjIgMzIuNzU5MjczOSAzOS4xNDAyMjc4IDMxLjY4MzE2ODMiPjwvcG9seWdvbj4KICAgICAgICAgICAgICAgIDxwYXRoIGQ9Ik0zNS4wMDc3MTgsODAgQzQyLjkwNjIwMDcsNzYuNDU0OTM1OCA0Ny41NjQ5MTY3LDcxLjU0MjI2NzEgNDguOTgzODY2LDY1LjI2MTk5MzkgQzUxLjExMjI4OTksNTUuODQxNTg0MiA0MS42NzcxNzk1LDQ5LjIxMjIyODQgMjUuNjIzOTg0Niw0OS4yMTIyMjg0IEMyNS40ODQ5Mjg5LDQ5LjEyNjg0NDggMjkuODI2MTI5Niw0My4yODM4MjQ4IDM4LjY0NzU4NjksMzEuNjgzMTY4MyBMNzIuODcxMjg3MSwzMi41NTQ0MjUgTDY1LjI4MDk3Myw2Ny42NzYzNDIxIEw1MS4xMTIyODk5LDc3LjM3NjE0NCBMMzUuMDA3NzE4LDgwIFoiIGlkPSJQYXRoLTIyIiBmaWxsPSIjMDAyODY4Ij48L3BhdGg+CiAgICAgICAgICAgICAgICA8cGF0aCBkPSJNMCwzNy43MzA0NDA1IEwyNy4xMTQ1MzcsMC4yNTcxMTE0MzYgQzYyLjM3MTUxMjMsLTEuOTkwNzE3MDEgODAsMTAuNTAwMzkyNyA4MCwzNy43MzA0NDA1IEM4MCw2NC45NjA0ODgyIDY0Ljc3NjUwMzgsNzkuMDUwMzQxNCAzNC4zMjk1MTEzLDgwIEM0Ny4wNTUzNDg5LDc3LjU2NzA4MDggNTMuNDE4MjY3Nyw3MC4zMTM2MTAzIDUzLjQxODI2NzcsNTguMjM5NTg4NSBDNTMuNDE4MjY3Nyw0MC4xMjg1NTU3IDM2LjMwMzk1NDQsMzcuNzMwNDQwNSAyNS4yMjc0MTcsMzcuNzMwNDQwNSBDMTcuODQzMDU4NiwzNy43MzA0NDA1IDkuNDMzOTE5NjYsMzcuNzMwNDQwNSAwLDM3LjczMDQ0MDUgWiIgaWQ9IlBhdGgtMTkiIGZpbGw9IiMzNzkzRUYiPjwvcGF0aD4KICAgICAgICAgICAgPC9nPgogICAgICAgIDwvZz4KICAgIDwvZz4KPC9zdmc+' > </img>\n",
    "Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "name": "Pandas.ipynb",
   "provenance": []
  },
  "deepnote": {},
  "deepnote_execution_queue": [],
  "deepnote_notebook_id": "0a9be7a0-3db5-477d-bb7c-1755e22290d0",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {},
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}