{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0259174d-803e-4a49-b1c3-3afb66b8342b",
   "metadata": {},
   "source": [
    "# Creating a DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef08f74f-d6f0-4359-817c-ba1f032c3d5c",
   "metadata": {},
   "source": [
    "## The library\n",
    "\n",
    "Similar to how arrays are not built into Python and are provided through the `numpy` library, `DataFrame`s are offered by the NumPy-based `pandas` library.\n",
    "\n",
    "And so, first, we need to make sure that the <a href=\"https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html\" target=\"_blank\" rel=\"noopener\">pandas library is installed</a>.\n",
    "\n",
    "Only then can we tell Python to make the `pandas` library available to our code, using the `import` statement. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5ad0c30b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dda67f7d",
   "metadata": {},
   "source": [
    "Once this is done, the `DataFrame` type becomes available as `pandas.DataFrame`. In other words, we can access it \"under\" the name `pandas`, with a dot between the two names. \n",
    "By convention, `pandas` is imported under the shorter alias `pd` to save keystrokes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5efcf03f-2bf4-461f-9d8e-01b553fabdc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7afefdeb",
   "metadata": {},
   "source": [
    "This way, we will be able to refer to the elements of the pandas library, such as `DataFrame`, as `pd.DataFrame` (just like `np.array`). "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f72388a",
   "metadata": {},
   "source": [
    "<a href=\"https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html\" target=\"_blank\" rel=\"noopener\">There are _many_ ways to construct a DataFrame</a>. In the following, we discuss a few of them. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea2a7b62",
   "metadata": {},
   "source": [
    "## 1. Creating a `DataFrame` with Manual Input"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22aaef0e",
   "metadata": {},
   "source": [
    "### 1.1 From `list` of `list`s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a80ce1f8",
   "metadata": {},
   "source": [
    "One of the simplest ways to create a `DataFrame` is by using Python lists. Consider the following data: a `list` in which each element is itself a `list` containing the name of a planet and its distance from the sun in millions of km."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0377b938",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Mercury', 57.9],\n",
       " ['Venus', 108.2],\n",
       " ['Earth', 149.6],\n",
       " ['Mars', 227.9],\n",
       " ['Jupiter', 778.6],\n",
       " ['Saturn', 1433.5],\n",
       " ['Uranus', 2872.5],\n",
       " ['Neptune', 4495.1]]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_data = [\n",
    "    ['Mercury', 57.9],\n",
    "    ['Venus', 108.2],\n",
    "    ['Earth', 149.6],\n",
    "    ['Mars', 227.9],\n",
    "    ['Jupiter', 778.6],\n",
    "    ['Saturn', 1433.5],\n",
    "    ['Uranus', 2872.5],\n",
    "    ['Neptune', 4495.1]\n",
    "]\n",
    "\n",
    "planets_data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cd6042d",
   "metadata": {},
   "source": [
    "We can construct a `DataFrame` from the above data and name each column as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a44a2466",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Planet</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    Planet  solar_distance_km_6\n",
       "0  Mercury                 57.9\n",
       "1    Venus                108.2\n",
       "2    Earth                149.6\n",
       "3     Mars                227.9\n",
       "4  Jupiter                778.6\n",
       "5   Saturn               1433.5\n",
       "6   Uranus               2872.5\n",
       "7  Neptune               4495.1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_df = pd.DataFrame(planets_data, columns=['Planet', 'solar_distance_km_6'])\n",
    "\n",
    "planets_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c02fb20",
   "metadata": {},
   "source": [
    "This presentation of our data already looks more like a spreadsheet!\n",
    "\n",
    "Now suppose we have more information about each planet: its absolute mass (in $10^{24}$ kg), density (in kg/$\\text{m}^3$), and gravity (in m/$\\text{s}^2$):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8fa668b7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[['Mercury', 57.9, 0.33, 5427.0, 3.7],\n",
       " ['Venus', 108.2, 4.87, 5243.0, 8.9],\n",
       " ['Earth', 149.6, 5.97, 5514.0, 9.8],\n",
       " ['Mars', 227.9, 0.642, 3933.0, 3.7],\n",
       " ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],\n",
       " ['Saturn', 1433.5, 568.0, 687.0, 9.0],\n",
       " ['Uranus', 2872.5, 86.8, 1271.0, 8.7],\n",
       " ['Neptune', 4495.1, 102.0, 1638.0, 11.0]]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_features = [\n",
    "    'name',                # familiar name\n",
    "    'solar_distance_km_6', # distance from sun: 10**6 km\n",
    "    'mass_kg_24',          # absolute mass: 10**24 kg\n",
    "    'density_kg_m3',       # density: kg/m**3\n",
    "    'gravity_m_s2',        # gravity: m/s**2\n",
    "]\n",
    "\n",
    "planets_data = [\n",
    "    ['Mercury', 57.9, 0.33, 5427.0, 3.7],\n",
    "    ['Venus', 108.2, 4.87, 5243.0, 8.9],\n",
    "    ['Earth', 149.6, 5.97, 5514.0, 9.8],\n",
    "    ['Mars', 227.9, 0.642, 3933.0, 3.7],\n",
    "    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],\n",
    "    ['Saturn', 1433.5, 568.0, 687.0, 9.0],\n",
    "    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],\n",
    "    ['Neptune', 4495.1, 102.0, 1638.0, 11.0]\n",
    "]\n",
    "\n",
    "planets_data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "344d92d4",
   "metadata": {},
   "source": [
    "We can similarly convert it into a `pandas` `DataFrame`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2fbd8858",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "0  Mercury                 57.9       0.330         5427.0           3.7\n",
       "1    Venus                108.2       4.870         5243.0           8.9\n",
       "2    Earth                149.6       5.970         5514.0           9.8\n",
       "3     Mars                227.9       0.642         3933.0           3.7\n",
       "4  Jupiter                778.6    1898.000         1326.0          23.1\n",
       "5   Saturn               1433.5     568.000          687.0           9.0\n",
       "6   Uranus               2872.5      86.800         1271.0           8.7\n",
       "7  Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_df = pd.DataFrame(planets_data, columns = planets_features)\n",
    "\n",
    "planets_df"
   ]
  },
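  {
   "cell_type": "markdown",
   "id": "sketch-default-columns",
   "metadata": {},
   "source": [
    "If the `columns` argument is left out, pandas labels the columns with integers, just as it does for the rows. A minimal sketch of the difference, assuming a shortened two-planet list (the names `rows`, `unnamed_df`, and `named_df` are only for illustration):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical, shortened version of the planets data\n",
    "rows = [\n",
    "    ['Mercury', 57.9],\n",
    "    ['Venus', 108.2],\n",
    "]\n",
    "\n",
    "# Without `columns`, the column labels default to 0, 1, ...\n",
    "unnamed_df = pd.DataFrame(rows)\n",
    "print(list(unnamed_df.columns))\n",
    "\n",
    "# Passing `columns` names each column explicitly\n",
    "named_df = pd.DataFrame(rows, columns=['name', 'solar_distance_km_6'])\n",
    "print(list(named_df.columns))\n",
    "```\n",
    "\n",
    "Named columns are usually worth the extra argument, since later code can then refer to `named_df['name']` rather than `unnamed_df[0]`."
   ]
  },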
  {
   "cell_type": "markdown",
   "id": "a2475a24",
   "metadata": {},
   "source": [
    "### 1.2 From Dictionary of `list`s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf5c3fc7",
   "metadata": {},
   "source": [
    "Consider specifying our data as a dictionary, where each `key` corresponds to a column name and its associated value is a list containing the data for that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "b7144e00",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "0  Mercury                 57.9       0.330         5427.0           3.7\n",
       "1    Venus                108.2       4.870         5243.0           8.9\n",
       "2    Earth                149.6       5.970         5514.0           9.8\n",
       "3     Mars                227.9       0.642         3933.0           3.7\n",
       "4  Jupiter                778.6    1898.000         1326.0          23.1\n",
       "5   Saturn               1433.5     568.000          687.0           9.0\n",
       "6   Uranus               2872.5      86.800         1271.0           8.7\n",
       "7  Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_dict = {\n",
    "    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],\n",
    "    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],\n",
    "    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],\n",
    "    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],\n",
    "    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],\n",
    "}\n",
    "\n",
    "planets_df = pd.DataFrame(planets_dict)\n",
    "\n",
    "planets_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce300a42",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "Choosing between a list of lists and a dictionary of lists depends on your data and workflow. \n",
    "\n",
    "A list of lists is useful when your data is naturally row-oriented, with each inner list representing a complete row. It is a quick way to create a DataFrame from raw data. However, adding a new column using this approach can be cumbersome: you need to enter each value for the new column into every row list individually, which is tedious and prone to errors. \n",
    "\n",
    "On the other hand, a dictionary of lists is more convenient when you want labeled columns from the start. Adding a new column is straightforward, as you can simply assign a new key-value pair in the dictionary. \n",
    "\n",
    "In practice, if you anticipate modifying your DataFrame or adding new columns, using a dictionary of lists is generally easier and safer.\n",
    "\n",
    ":::"
   ]
  },
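  {
   "cell_type": "markdown",
   "id": "sketch-rows-vs-columns",
   "metadata": {},
   "source": [
    "The trade-off described in the note can be sketched as follows. This is only an illustration: the extra column `has_rings`, the shortened two-planet data, and all variable names are assumptions, not part of the planets dataset above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "rings = [False, True]  # hypothetical new column\n",
    "\n",
    "# Row-oriented: every inner list has to be extended\n",
    "rows = [['Earth', 149.6], ['Saturn', 1433.5]]\n",
    "rows_extended = [row + [flag] for row, flag in zip(rows, rings)]\n",
    "df_from_rows = pd.DataFrame(\n",
    "    rows_extended,\n",
    "    columns=['name', 'solar_distance_km_6', 'has_rings'],\n",
    ")\n",
    "\n",
    "# Column-oriented: a new column is one more key in the dictionary\n",
    "columns_dict = {\n",
    "    'name': ['Earth', 'Saturn'],\n",
    "    'solar_distance_km_6': [149.6, 1433.5],\n",
    "}\n",
    "columns_dict['has_rings'] = rings\n",
    "df_from_dict = pd.DataFrame(columns_dict)\n",
    "\n",
    "print(df_from_rows)\n",
    "print(df_from_dict)\n",
    "```\n",
    "\n",
    "Both routes produce the same table; the dictionary version simply avoids touching every row."
   ]
  },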
  {
   "cell_type": "markdown",
   "id": "35f809bf",
   "metadata": {},
   "source": [
    "## 2. Creating a `DataFrame` with `Series`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a7b4cbf",
   "metadata": {},
   "source": [
    "In `pandas`, a **Series** is a one-dimensional labeled array that can hold any data type. Think of it as a single column of a DataFrame with an index. Since a DataFrame is essentially a collection of Series that share the same index, you can construct a DataFrame by combining multiple Series.  \n",
    "\n",
    "This approach is especially useful when you already have individual Series objects representing different columns, or when you want to preserve meaningful row labels (indices) for your data instead of using the default integer indices.\n",
    "\n",
    "In the following examples, we will see how to create a DataFrame by combining multiple Series into a single structured table.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "87d414e8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    Mercury\n",
       "1      Venus\n",
       "2      Earth\n",
       "3       Mars\n",
       "4    Jupiter\n",
       "5     Saturn\n",
       "6     Uranus\n",
       "7    Neptune\n",
       "dtype: object"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Creating a `pandas` Series object\n",
    "\n",
    "planet_names = pd.Series(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'])\n",
    "\n",
    "planet_names"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "82f457c0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(planet_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1fe8372",
   "metadata": {},
   "source": [
    "Let us say we have the following data as `pandas` Series objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "47867663",
   "metadata": {},
   "outputs": [],
   "source": [
    "planets_solar_distance_km_6 = pd.Series([57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1])\n",
    "planets_mass_kg_24 = pd.Series([0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0])\n",
    "planets_density_kg_m3 = pd.Series([5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0])\n",
    "planets_gravity_m_s2 = pd.Series([3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "813b115d",
   "metadata": {},
   "source": [
    "We can combine these `Series` objects into a `DataFrame` in much the same way as we created one from a dictionary of lists:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "12a88762",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "0  Mercury                 57.9       0.330         5427.0           3.7\n",
       "1    Venus                108.2       4.870         5243.0           8.9\n",
       "2    Earth                149.6       5.970         5514.0           9.8\n",
       "3     Mars                227.9       0.642         3933.0           3.7\n",
       "4  Jupiter                778.6    1898.000         1326.0          23.1\n",
       "5   Saturn               1433.5     568.000          687.0           9.0\n",
       "6   Uranus               2872.5      86.800         1271.0           8.7\n",
       "7  Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets_df = pd.DataFrame({\n",
    "    'name': planet_names,\n",
    "    'solar_distance_km_6': planets_solar_distance_km_6,\n",
    "    'mass_kg_24': planets_mass_kg_24,\n",
    "    'density_kg_m3': planets_density_kg_m3,\n",
    "    'gravity_m_s2': planets_gravity_m_s2\n",
    "    })\n",
    "\n",
    "planets_df"
   ]
  },
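  {
   "cell_type": "markdown",
   "id": "sketch-series-labels",
   "metadata": {},
   "source": [
    "The `Series` above all use the default integer index. To illustrate the point about meaningful row labels, here is a minimal sketch in which each `Series` is indexed by planet name. The two-planet subset and the variable names are chosen only for illustration. When the `Series` are combined, pandas lines the values up by label, not by position:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Two Series indexed by planet name, deliberately in different orders\n",
    "distance = pd.Series([149.6, 227.9], index=['Earth', 'Mars'])\n",
    "gravity = pd.Series([3.7, 9.8], index=['Mars', 'Earth'])\n",
    "\n",
    "# pandas matches rows by index label when building the DataFrame\n",
    "labelled_df = pd.DataFrame({\n",
    "    'solar_distance_km_6': distance,\n",
    "    'gravity_m_s2': gravity,\n",
    "})\n",
    "\n",
    "# Look up a value by row label and column name\n",
    "print(labelled_df.loc['Earth', 'gravity_m_s2'])\n",
    "```\n",
    "\n",
    "Here the lookup prints `9.8`, Earth's value, even though it was listed second in `gravity`."
   ]
  },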
  {
   "cell_type": "markdown",
   "id": "0bbeac91",
   "metadata": {},
   "source": [
    "## 3. Creating a `DataFrame` from External Files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "647b40c1",
   "metadata": {},
   "source": [
    "So far, we have manually entered small amounts of data to construct `DataFrame`s. While this is useful for learning, real-world datasets are typically much larger and stored in external files. A common scenario in data science is reading data that has already been collected and saved in formats such as `.csv`, `.xlsx`, or `.txt`. Instead of typing the data by hand, we can load it directly into a `DataFrame`, making it easy to explore, analyze, and manipulate large datasets efficiently. <a href=\"https://pandas.pydata.org/docs/user_guide/io.html#io\" target=\"_blank\" rel=\"noopener\">`pandas` supports many common data encoding formats</a>, and makes it easy to construct `DataFrame`s from them.\n",
    "\n",
    "Let us understand how to construct a DataFrame using a CSV (comma-separated values) file. Other file formats can be worked with in a similar way.\n",
    "\n",
    "We use the function `pd.read_csv(file_path)` to read data from a CSV file into a `pandas` DataFrame. Here, `file_path` is a string that specifies the location of the file. It can be an absolute path (the full location on your computer), a relative path (relative to your current working directory), or even a URL pointing to an online dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7733d5d4",
   "metadata": {},
   "source": [
    "a) **Absolute file path**: Starts from the root of the file system and gives the full location of the file.  \n",
    "  For example, if there is a `CSV` file called `data.csv` in the Documents folder of a computer, its absolute path would be:  \n",
    "  - Windows: `C:\\\\Users\\\\<username>\\\\Documents\\\\data.csv`  \n",
    "  - macOS: `/Users/<username>/Documents/data.csv`  \n",
    "  - Linux: `/home/<username>/Documents/data.csv`  \n",
    "Here, `<username>` is the placeholder for the actual account name on your computer. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d394b2b2",
   "metadata": {},
   "source": [
    ":::{note} Windows traditionally uses backslashes `\\` in file paths, while Mac/Linux use forward slashes `/`. A slash (`/` or `\\`) after a folder name means “go into this folder”. \n",
    "\n",
    "In Python strings, a backslash `\\` starts an escape sequence (e.g., `\\n` for newline), so a Windows path must be written with double backslashes `\\\\` or with forward slashes `/`, which Python also handles correctly on Windows.\n",
    ":::\n"
   ]
  },
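  {
   "cell_type": "markdown",
   "id": "sketch-backslash-escapes",
   "metadata": {},
   "source": [
    "The escape-character pitfall from the note can be seen directly. A minimal sketch, assuming a hypothetical path `C:\\temp\\new.csv`; the raw-string form `r'...'` is an extra option not covered in the note, and `PureWindowsPath` is used only to compare the spellings without touching any real file:\n",
    "\n",
    "```python\n",
    "from pathlib import PureWindowsPath\n",
    "\n",
    "# Hypothetical path; '\\t' and '\\n' are read as a tab and a newline here\n",
    "broken = 'C:\\temp\\new.csv'\n",
    "\n",
    "# Three ways to write the intended path safely\n",
    "escaped = 'C:\\\\temp\\\\new.csv'  # doubled backslashes\n",
    "raw = r'C:\\temp\\new.csv'       # raw string: backslashes kept as-is\n",
    "forward = 'C:/temp/new.csv'    # forward slashes\n",
    "\n",
    "print('\\n' in broken)  # the newline character really is in the string\n",
    "print(escaped == raw)\n",
    "print(PureWindowsPath(forward) == PureWindowsPath(raw))\n",
    "```\n",
    "\n",
    "All three checks print `True`: the single-backslash string is silently corrupted, while the three safe spellings describe the same Windows path."
   ]
  },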
  {
   "cell_type": "markdown",
   "id": "c98fad5b",
   "metadata": {},
   "source": [
    "b) **Relative file path**: Starts from the current working directory, which for a notebook is normally the folder where the notebook is located.  \n",
    "\n",
    "For example:\n",
    "\n",
    "1. Suppose you are working in a notebook called `hw.ipynb` which is in a folder called `HW`. You also have a file called `data.csv` located in the same `HW` folder. Its relative path would be:  \n",
    "   - Windows: `data.csv`  \n",
    "   - Mac/Linux: `data.csv`  \n",
    "   Notice that we did not need to mention the common folder `HW` that contains both the notebook and the data file, since Python automatically starts from the current working directory (i.e. the folder in which your notebook is located).  \n",
    "\n",
    "2. Now suppose you are working in `hw.ipynb` inside the `HW` folder, but your file `data.csv` is in a subfolder called `Data` inside `HW`. In this case, you need to tell Python to look in the `Data` folder within your current working directory. The relative path would be:  \n",
    "   - Windows: `Data\\\\data.csv`  \n",
    "   - Mac/Linux: `Data/data.csv`  \n",
    "   Again, notice that we did not need to mention the `HW` folder, as Python starts from the current folder where the notebook lives. \n",
    "\n",
    "3. Finally, suppose your notebook `hw.ipynb` is inside the `HW` folder, but your file `data.csv` is located one level up, in a parent folder called `Data118` that contains the `HW` folder. In this case, you use `..` to go up one level into `Data118` and then access the file. The relative path would be:  \n",
    "   - Windows: `..\\\\data.csv`  \n",
    "   - Mac/Linux: `../data.csv`  \n",
    "   Here, `..` means “go up one directory from the current working directory.” So starting in `HW`, Python moves up into `Data118` and finds `data.csv` there. You can also use `../` consecutively (e.g., `../../`) to go up multiple folder levels if your file is located further away in the directory hierarchy.\n",
    "\n",
    "4. Suppose your notebook `hw.ipynb` is inside the `HW` folder, which is in a parent folder called `Data118`. The file `data.csv` is in another subfolder called `Data` inside `Data118`. In this case, starting from the notebook in `HW`, you need to go up one level to `Data118` and then into the `Data` folder to access `data.csv`. The relative path would be:  \n",
    "   - Windows: `..\\\\Data\\\\data.csv`  \n",
    "   - Mac/Linux: `../Data/data.csv`  \n",
    "   Python will first move from `HW` up to `Data118`, then into `Data` to find `data.csv`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6949d09c",
   "metadata": {},
   "source": [
    "c) **URL**: You can also read directly from an online dataset by using `pd.read_csv(\"url\")`."
   ]
  },
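  {
   "cell_type": "markdown",
   "id": "sketch-relative-paths",
   "metadata": {},
   "source": [
    "The relative-path cases above can be checked without any real files. A minimal sketch, assuming the folder layout from the examples (`Data118` containing `HW` and `Data`) placed under a made-up root `/home/user`; `PurePosixPath` and `posixpath.normpath` only manipulate the path text and never look at the disk:\n",
    "\n",
    "```python\n",
    "import posixpath\n",
    "from pathlib import PurePosixPath\n",
    "\n",
    "# Hypothetical working directory: the folder holding hw.ipynb\n",
    "working_dir = PurePosixPath('/home/user/Data118/HW')\n",
    "\n",
    "# Case 2: a subfolder of the working directory\n",
    "in_subfolder = working_dir / 'Data/data.csv'\n",
    "print(in_subfolder)\n",
    "\n",
    "# Case 4: up one level with '..', then into Data\n",
    "combined = working_dir / '../Data/data.csv'\n",
    "resolved = posixpath.normpath(str(combined))\n",
    "print(resolved)\n",
    "```\n",
    "\n",
    "The second path resolves to `/home/user/Data118/Data/data.csv`, matching the description of case 4. In a notebook, `os.getcwd()` shows the actual working directory that relative paths start from."
   ]
  },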
  {
   "cell_type": "markdown",
   "id": "26c736b2-debb-430c-8269-c96b70d2dc34",
   "metadata": {},
   "source": [
    "### Data formats\n",
    "\n",
    "Of course, it is _very_ common to store data in a file format, such as CSV. <a href=\"https://pandas.pydata.org/docs/user_guide/io.html#io\" target=\"_blank\" rel=\"noopener\">pandas supports a great many common data encoding formats</a>, and makes it easy to construct `DataFrame`s from them.\n",
    "\n",
    "For example, if we had a CSV file in our \"Documents\" folder, we might construct a `DataFrame` from it using the pandas `read_csv` function, like so:\n",
    "\n",
    "```py\n",
    "data = pd.read_csv('/Users/MySelf/Documents/my-data.csv')\n",
    "```\n",
    "\n",
    "Above, we simply gave pandas the file system path to our CSV data. The `read_csv` function also supports <a href=\"https://docs.python.org/3/glossary.html#term-file-object\" target=\"_blank\" rel=\"noopener\">file objects</a>, such as those returned by Python's `open` function.\n",
    "\n",
    "Our planetary data, encoded as CSV, takes the following form:\n",
    "\n",
    "```\n",
    "name,solar_distance_km_6,mass_kg_24,density_kg_m3,gravity_m_s2\n",
    "Mercury,57.9,0.33,5427.0,3.7\n",
    "Venus,108.2,4.87,5243.0,8.9\n",
    "Earth,149.6,5.97,5514.0,9.8\n",
    "Mars,227.9,0.642,3933.0,3.7\n",
    "Jupiter,778.6,1898.0,1326.0,23.1\n",
    "Saturn,1433.5,568.0,687.0,9.0\n",
    "Uranus,2872.5,86.8,1271.0,8.7\n",
    "Neptune,4495.1,102.0,1638.0,11.0\n",
    "```\n",
    "\n",
    "Note that we've included our feature names as the first row of our data. (This is optional – but useful!)\n",
    "\n",
    "Below, we reload our planets `DataFrame` in the same way, but from a file buffer of that data, `planets_csv`, the details of which are hidden in the collapsed cell that follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ca877460-cec6-42bb-b9c4-7410902ee76d",
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [],
   "source": [
    "#\n",
    "# Hello!\n",
    "#\n",
    "# This code allows you to download and execute this notebook as-is –\n",
    "# without a separate CSV file.\n",
    "#\n",
    "# We can just *pretend* that `planets_csv` is a path to a file –\n",
    "# or, more apt, a file object opened with the Python `open` function.\n",
    "#\n",
    "# (Really it's an in-memory file object … but that's not important here!)\n",
    "#\n",
    "import io\n",
    "\n",
    "\n",
    "planets_encoded = '''\\\n",
    "name,solar_distance_km_6,mass_kg_24,density_kg_m3,gravity_m_s2\n",
    "Mercury,57.9,0.33,5427.0,3.7\n",
    "Venus,108.2,4.87,5243.0,8.9\n",
    "Earth,149.6,5.97,5514.0,9.8\n",
    "Mars,227.9,0.642,3933.0,3.7\n",
    "Jupiter,778.6,1898.0,1326.0,23.1\n",
    "Saturn,1433.5,568.0,687.0,9.0\n",
    "Uranus,2872.5,86.8,1271.0,8.7\n",
    "Neptune,4495.1,102.0,1638.0,11.0\n",
    "'''\n",
    "\n",
    "planets_csv = io.StringIO(planets_encoded)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "be1c8683-1dcc-4e38-b957-3290ab10e8b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "0  Mercury                 57.9       0.330         5427.0           3.7\n",
       "1    Venus                108.2       4.870         5243.0           8.9\n",
       "2    Earth                149.6       5.970         5514.0           9.8\n",
       "3     Mars                227.9       0.642         3933.0           3.7\n",
       "4  Jupiter                778.6    1898.000         1326.0          23.1\n",
       "5   Saturn               1433.5     568.000          687.0           9.0\n",
       "6   Uranus               2872.5      86.800         1271.0           8.7\n",
       "7  Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets = pd.read_csv(planets_csv)\n",
    "\n",
    "planets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "056e60fe-ca84-446b-a50d-67fff7427b72",
   "metadata": {},
   "source": [
    "Note that pandas automatically inferred that the first row of our CSV data specified the feature names."
   ]
  },
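  {
   "cell_type": "markdown",
   "id": "d2b8f6a3-headerless-csv-sketch",
   "metadata": {},
   "source": [
    "If a CSV file has no header row, there is nothing for pandas to infer the names from. A minimal sketch of that case, assuming a made-up, two-row excerpt of the planetary data: `header=None` tells `read_csv` that the first row is data, and `names` supplies the feature names.\n",
    "\n",
    "```py\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Two rows of the planetary data, this time without a header row.\n",
    "headerless = io.StringIO('''Mercury,57.9\n",
    "Venus,108.2\n",
    "''')\n",
    "\n",
    "# header=None: treat the first row as data; names=: the feature names to use.\n",
    "frame = pd.read_csv(headerless,\n",
    "                    header=None,\n",
    "                    names=['name', 'solar_distance_km_6'])\n",
    "\n",
    "print(frame)\n",
    "```\n",
    "\n",
    "Without `header=None`, pandas would have treated the Mercury row as the feature names."
   ]
  },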
  {
   "cell_type": "markdown",
   "id": "bbdc7693-b407-423d-924a-290efd8abee2",
   "metadata": {},
   "source": [
    "## The index\n",
    "\n",
    "pandas's default index – the familiar range of integers starting with `0` – is most often sensible for computational data.\n",
    "\n",
    "This is represented by the `RangeIndex` type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "abe10a4b-a1a9-43ca-a073-319170c26020",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=8, step=1)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88a6131a-b083-4e9f-b20f-e0ceaee80c41",
   "metadata": {},
   "source": [
    "Of course, that's _not_ how we think about the planets!\n",
    "\n",
    "We can tell pandas to use a more familiar index instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a502cfa7-4b43-47eb-bcdf-bce3e4f7bf67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=1, stop=9, step=1, name='number')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.RangeIndex(1, 9, name='number')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b3de389f-da2e-481c-b2a0-1db73b95aab2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>number</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "number                                                                       \n",
       "1       Mercury                 57.9       0.330         5427.0           3.7\n",
       "2         Venus                108.2       4.870         5243.0           8.9\n",
       "3         Earth                149.6       5.970         5514.0           9.8\n",
       "4          Mars                227.9       0.642         3933.0           3.7\n",
       "5       Jupiter                778.6    1898.000         1326.0          23.1\n",
       "6        Saturn               1433.5     568.000          687.0           9.0\n",
       "7        Uranus               2872.5      86.800         1271.0           8.7\n",
       "8       Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(planets_data,\n",
    "             columns=planets_features,\n",
    "             index=pd.RangeIndex(1, 9, name='number'))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dfa5acf-660c-4bcc-ba63-2746eaa38193",
   "metadata": {},
   "source": [
    "We don't even have to use ranges … or numbers!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "6ada95ab-61fe-4344-9a45-f5373a203e47",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ordinal</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>first</th>\n",
       "      <td>Mercury</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>second</th>\n",
       "      <td>Venus</td>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>third</th>\n",
       "      <td>Earth</td>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fourth</th>\n",
       "      <td>Mars</td>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fifth</th>\n",
       "      <td>Jupiter</td>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sixth</th>\n",
       "      <td>Saturn</td>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>seventh</th>\n",
       "      <td>Uranus</td>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>eighth</th>\n",
       "      <td>Neptune</td>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            name  solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "ordinal                                                                       \n",
       "first    Mercury                 57.9       0.330         5427.0           3.7\n",
       "second     Venus                108.2       4.870         5243.0           8.9\n",
       "third      Earth                149.6       5.970         5514.0           9.8\n",
       "fourth      Mars                227.9       0.642         3933.0           3.7\n",
       "fifth    Jupiter                778.6    1898.000         1326.0          23.1\n",
       "sixth     Saturn               1433.5     568.000          687.0           9.0\n",
       "seventh   Uranus               2872.5      86.800         1271.0           8.7\n",
       "eighth   Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ordinals = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth']\n",
    "\n",
    "planet_ordinals = pd.DataFrame(planets_data,\n",
    "                               columns=planets_features,\n",
    "                               index=pd.Index(ordinals, name='ordinal'))\n",
    "\n",
    "planet_ordinals"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9576ebce-205b-45f3-9888-743cceb202fc",
   "metadata": {},
   "source": [
    "But, in the end, perhaps we'd prefer not to count the planets at all.\n",
    "\n",
    "Whenever a data feature makes sense to use as the data index – that is, it's sufficient to _always_ **uniquely identify** individuals, we can just tell pandas to use that column as the index, instead."
   ]
  },
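  {
   "cell_type": "markdown",
   "id": "f5a9c3e7-unique-index-check-sketch",
   "metadata": {},
   "source": [
    "One way to check whether a feature uniquely identifies individuals is the `is_unique` property. The sketch below uses two small, hand-typed `Series` (the names and surface gravities of the first four planets) rather than the notebook's own variables:\n",
    "\n",
    "```py\n",
    "import pandas as pd\n",
    "\n",
    "names = pd.Series(['Mercury', 'Venus', 'Earth', 'Mars'])\n",
    "gravity = pd.Series([3.7, 8.9, 9.8, 3.7])\n",
    "\n",
    "# Every name appears once, so names could serve as an index.\n",
    "print(names.is_unique)\n",
    "\n",
    "# Mercury and Mars share a value, so gravity could not.\n",
    "print(gravity.is_unique)\n",
    "```"
   ]
  },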
  {
   "cell_type": "markdown",
   "id": "3e5b25fc-6435-4cf3-b0a1-dc5fbb928d04",
   "metadata": {},
   "source": [
    "We'll learn more about manipulating `DataFrames` in subsequent sections. But, for now, here's how we would set the `name` feature as our index, (at least when constructing a `DataFrame` from `lists` or `dicts`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d9dff830-cbc3-4b40-b04d-9fe4e4b42dc9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mercury</th>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Venus</th>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Earth</th>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mars</th>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jupiter</th>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saturn</th>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Uranus</th>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Neptune</th>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "name                                                                 \n",
       "Mercury                 57.9       0.330         5427.0           3.7\n",
       "Venus                  108.2       4.870         5243.0           8.9\n",
       "Earth                  149.6       5.970         5514.0           9.8\n",
       "Mars                   227.9       0.642         3933.0           3.7\n",
       "Jupiter                778.6    1898.000         1326.0          23.1\n",
       "Saturn                1433.5     568.000          687.0           9.0\n",
       "Uranus                2872.5      86.800         1271.0           8.7\n",
       "Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets.set_index('name')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "33709da9-56b8-4163-8666-68b03614908b",
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#\n",
    "# Hello!\n",
    "#\n",
    "# This cell has been hidden – it's just an implementation concern.\n",
    "#\n",
    "# Generally, when working with files, you won't need to worry about this.\n",
    "#\n",
    "planets_csv.seek(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddcfe281-6fbd-4d7c-a71b-79520be3d967",
   "metadata": {},
   "source": [
    "The `read_csv` function, on the other hand, supports this case specifically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "073652e2-bb44-4121-a03a-758d7c9aaf0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mercury</th>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Venus</th>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Earth</th>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mars</th>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jupiter</th>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saturn</th>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Uranus</th>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Neptune</th>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "name                                                                 \n",
       "Mercury                 57.9       0.330         5427.0           3.7\n",
       "Venus                  108.2       4.870         5243.0           8.9\n",
       "Earth                  149.6       5.970         5514.0           9.8\n",
       "Mars                   227.9       0.642         3933.0           3.7\n",
       "Jupiter                778.6    1898.000         1326.0          23.1\n",
       "Saturn                1433.5     568.000          687.0           9.0\n",
       "Uranus                2872.5      86.800         1271.0           8.7\n",
       "Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv(planets_csv, index_col='name')"
   ]
  },
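  {
   "cell_type": "markdown",
   "id": "b7d3e1f4-index-col-position-sketch",
   "metadata": {},
   "source": [
    "The `index_col` parameter also accepts a column's position in place of its name. A minimal sketch, assuming a made-up, two-row excerpt of the data (the variable names are illustrative only):\n",
    "\n",
    "```py\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "excerpt = '''name,gravity_m_s2\n",
    "Earth,9.8\n",
    "Mars,3.7\n",
    "'''\n",
    "\n",
    "# Select the index column by its label ...\n",
    "by_label = pd.read_csv(io.StringIO(excerpt), index_col='name')\n",
    "\n",
    "# ... or by its position, counting from zero.\n",
    "by_position = pd.read_csv(io.StringIO(excerpt), index_col=0)\n",
    "\n",
    "print(by_label.equals(by_position))\n",
    "```\n",
    "\n",
    "Each call gets a fresh `io.StringIO` here, so there is no need to rewind a shared buffer between reads."
   ]
  },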
  {
   "cell_type": "markdown",
   "id": "cc759e02-cf38-4a97-8eab-f71103146ae8",
   "metadata": {},
   "source": [
    "Now these `DataFrames` are looking great! Let's see what we can do with them."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b68a1d00-2b47-4c95-839d-b63acc602fc4",
   "metadata": {},
   "source": [
    "## Operations\n",
    "\n",
    "As we've seen with the `list`, (and the string), the `DataFrame` can be manipulated by functions and built-in operators. Moreover, these offer special-purpose functions which have been *bound* to their types – that is, *methods* – which are invoked with expressions of the form below:\n",
    "\n",
    "    name_of_dataframe.name_of_method(argument0, argument1, ..., keyword0=value0, ...)\n",
    "    \n",
    "For example, above we used the `set_index` method to construct a new `DataFrame` with the `name` column set as the data index. Here it is again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "0a0f096a-7d48-4a52-9895-0dbfc89c2342",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>solar_distance_km_6</th>\n",
       "      <th>mass_kg_24</th>\n",
       "      <th>density_kg_m3</th>\n",
       "      <th>gravity_m_s2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Mercury</th>\n",
       "      <td>57.9</td>\n",
       "      <td>0.330</td>\n",
       "      <td>5427.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Venus</th>\n",
       "      <td>108.2</td>\n",
       "      <td>4.870</td>\n",
       "      <td>5243.0</td>\n",
       "      <td>8.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Earth</th>\n",
       "      <td>149.6</td>\n",
       "      <td>5.970</td>\n",
       "      <td>5514.0</td>\n",
       "      <td>9.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mars</th>\n",
       "      <td>227.9</td>\n",
       "      <td>0.642</td>\n",
       "      <td>3933.0</td>\n",
       "      <td>3.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Jupiter</th>\n",
       "      <td>778.6</td>\n",
       "      <td>1898.000</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>23.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Saturn</th>\n",
       "      <td>1433.5</td>\n",
       "      <td>568.000</td>\n",
       "      <td>687.0</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Uranus</th>\n",
       "      <td>2872.5</td>\n",
       "      <td>86.800</td>\n",
       "      <td>1271.0</td>\n",
       "      <td>8.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Neptune</th>\n",
       "      <td>4495.1</td>\n",
       "      <td>102.000</td>\n",
       "      <td>1638.0</td>\n",
       "      <td>11.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         solar_distance_km_6  mass_kg_24  density_kg_m3  gravity_m_s2\n",
       "name                                                                 \n",
       "Mercury                 57.9       0.330         5427.0           3.7\n",
       "Venus                  108.2       4.870         5243.0           8.9\n",
       "Earth                  149.6       5.970         5514.0           9.8\n",
       "Mars                   227.9       0.642         3933.0           3.7\n",
       "Jupiter                778.6    1898.000         1326.0          23.1\n",
       "Saturn                1433.5     568.000          687.0           9.0\n",
       "Uranus                2872.5      86.800         1271.0           8.7\n",
       "Neptune               4495.1     102.000         1638.0          11.0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets.set_index('name')"
   ]
  },
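  {
   "cell_type": "markdown",
   "id": "e7a4c2d1-set-index-returns-new-sketch",
   "metadata": {},
   "source": [
    "One detail worth making explicit: by default, `set_index` returns a _new_ `DataFrame` and leaves the one it was called on untouched. A minimal sketch, using a made-up, two-row stand-in named `small` rather than `planets` itself:\n",
    "\n",
    "```py\n",
    "import pandas as pd\n",
    "\n",
    "small = pd.DataFrame({'name': ['Earth', 'Mars'],\n",
    "                      'gravity_m_s2': [9.8, 3.7]})\n",
    "\n",
    "# set_index hands back a new DataFrame, which we must keep if we want it.\n",
    "by_name = small.set_index('name')\n",
    "\n",
    "print(list(small.columns))   # the original keeps its 'name' column\n",
    "print(by_name.index.name)    # the new DataFrame is indexed by name\n",
    "```\n",
    "\n",
    "To keep the re-indexed data, assign the result to a name, as done with `by_name` here."
   ]
  },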
  {
   "cell_type": "markdown",
   "id": "276c89e8-7dd4-4838-8f50-b62237ab1794",
   "metadata": {},
   "source": [
    "And, similar to methods, there are *attributes* and *properties*. These are values which are similarly bound to the `DataFrame`, but which need not be called:\n",
    "\n",
    "    name_of_dataframe.name_of_property\n",
    "    \n",
    "We made use of the `index` property above as well, to inspect our `DataFrame`'s currently-assigned index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "abb6b2d0-48eb-4dac-aead-5d4761a2b2e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=8, step=1)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "planets.index"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0db98b6-ca01-4073-8614-debc59f46d1d",
   "metadata": {},
   "source": [
    "pandas offers us many functions, methods and properties to explore!\n",
    "\n",
    "And now we're ready to explore the dimensions our data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}