pride-data-analysis/analysis/analysis1.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nat Scott"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Research question/interests\n",
"\n",
"**Is there a correlation between political alignment & living in neighbourhoods with large quantities of LGBT people?** The obvious answer to this question is \"yes, they are going to mostly be democrats\" but anyone who's ever been around queer people will know that this question is quite a bit more nuanced than that, and this nuance is what we hope to capture in investigating this question.\n",
"\n",
"- The gaybourhoods data set does not include data on residents political alignments, however, there is a wealth of electoral data available freely online that we intend on incorporating into this project. The primary difficulty then will be developing a geographic \"compatibility layer\" between the data sets so that the data can be understood in the same context. To build this, we intend on working with the OpenStreetMap API to create an additional column representing observations position space in a more neutral way, such as their coordinates.\n",
"- Alternatively, we've also considered working with an additional data set that links US zip codes to their longitude and lattitude positions. As such, incorporating this data would be as easy as merging the two tables.\n",
"\n",
"\n",
"**Is there a correlation between geographical stratums & being LGBT?** This question is more abstract, and will serve as a preliminary exploration of the data in hopes of establishing two key details along the way that will shape the rest of the project: how do we quantify queerness, and how do we best represent it visually?\n",
"\n",
"- Once again, representing this data visually will require determining the coordinates associated with each observation.\n",
"- The gaybourhoods data set defines a \"gaybourhood index\" which effectively measures how friendly a given neighbourhood is to queer people. Since this index is entirely subjective, we will need to closely evaluate it's usefulness for our project and investigate different ways to quantify \"queer-friendliness\"\n",
"- In addition to the last point, since, of course, no matter what choice of observations we make, the measurement will still be subjective, answering this research question will come more so in the form of comparing and contrasting different measurements to see what they tell us.\n",
"- Obviously, visualizing this among many aspects of the other research questions would involve projecting the data onto a map of the United States, so visualizing this research question would motivate many of the visualizations for other components of this project"
]
},
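{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch of the OpenStreetMap geocoding idea mentioned above, assuming the `geopy` package is available. It illustrates the approach only; the wrangling below relies on pre-compiled coordinate tables rather than live API calls."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: geocode a county name with OpenStreetMap's Nominatim service via geopy.\n",
"# The wrangling below uses pre-compiled coordinate tables instead of live API calls.\n",
"from geopy.geocoders import Nominatim\n",
"\n",
"geolocator = Nominatim(user_agent=\"pride-data-analysis\")  # user-agent string is a placeholder\n",
"\n",
"def geocode_place(name):\n",
"    \"\"\"Return (lat, long) for a place name such as 'Hancock County, OH', or None if not found.\"\"\"\n",
"    location = geolocator.geocode(f\"{name}, USA\")\n",
"    if location is None:\n",
"        return None\n",
"    return location.latitude, location.longitude\n",
"\n",
"# Example (requires network access):\n",
"# geocode_place(\"Hancock County, OH\")"
]
},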
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Wrangling"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>lat</th>\n",
" <th>long</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Hancock OH</td>\n",
" <td>41.000471</td>\n",
" <td>-83.666033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Stafford VA</td>\n",
" <td>38.413261</td>\n",
" <td>-77.451334</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Webster NE</td>\n",
" <td>40.180646</td>\n",
" <td>-98.498590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Dimmit TX</td>\n",
" <td>28.423587</td>\n",
" <td>-99.765871</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Cedar IA</td>\n",
" <td>41.772360</td>\n",
" <td>-91.132610</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name lat long\n",
"0 Hancock OH 41.000471 -83.666033\n",
"1 Stafford VA 38.413261 -77.451334\n",
"2 Webster NE 40.180646 -98.498590\n",
"3 Dimmit TX 28.423587 -99.765871\n",
"4 Cedar IA 41.772360 -91.132610"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## counties - Relating US counties to their long/lat position on the Earth\n",
"counties = pd.read_csv(\"../data/raw/us-county-boundaries.csv\", sep=\";\")\n",
"\n",
"counties = counties.rename({\n",
" \"NAME\": \"name\",\n",
" \"INTPTLAT\": \"lat\",\n",
" \"INTPTLON\": \"long\",\n",
"}, axis=\"columns\")\n",
"\n",
"# Combine the county name with the state code\n",
"def combine_name_state(row):\n",
" row[\"name\"] = f\"{row['name']} {row['STUSAB']}\"\n",
" return row\n",
"\n",
"counties = counties.apply(combine_name_state, axis=\"columns\")\n",
"\n",
"# We don't need this column anymore\n",
"counties = counties.drop([\"STUSAB\"], axis=\"columns\")\n",
"\n",
"counties.to_csv(\"../data/processed/us-county-boundaries.csv\")\n",
"counties.head()"
]
},
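{
"cell_type": "markdown",
"metadata": {},
"source": [
"Quick sanity check (a sketch, not part of the original wrangling): `name` is used as a merge key in the next cell, so it is worth confirming that it identifies counties uniquely and that no coordinates are missing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: sanity-check the merge key and coordinates before joining on `name` below\n",
"print(\"duplicate names:\", counties[\"name\"].duplicated().sum())\n",
"print(\"missing coordinates:\", counties[[\"lat\", \"long\"]].isna().sum().sum())"
]
},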
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>county</th>\n",
" <th>party</th>\n",
" <th>votes</th>\n",
" <th>total</th>\n",
" <th>percent</th>\n",
" <th>lat</th>\n",
" <th>long</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Autauga AL</td>\n",
" <td>Democrat</td>\n",
" <td>6363</td>\n",
" <td>23932</td>\n",
" <td>0.265878</td>\n",
" <td>32.532237</td>\n",
" <td>-86.646439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Autauga AL</td>\n",
" <td>Republican</td>\n",
" <td>17379</td>\n",
" <td>23932</td>\n",
" <td>0.726183</td>\n",
" <td>32.532237</td>\n",
" <td>-86.646439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Autauga AL</td>\n",
" <td>Other</td>\n",
" <td>190</td>\n",
" <td>23932</td>\n",
" <td>0.007939</td>\n",
" <td>32.532237</td>\n",
" <td>-86.646439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Baldwin AL</td>\n",
" <td>Democrat</td>\n",
" <td>18424</td>\n",
" <td>85338</td>\n",
" <td>0.215894</td>\n",
" <td>30.659218</td>\n",
" <td>-87.746067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Baldwin AL</td>\n",
" <td>Republican</td>\n",
" <td>66016</td>\n",
" <td>85338</td>\n",
" <td>0.773583</td>\n",
" <td>30.659218</td>\n",
" <td>-87.746067</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" county party votes total percent lat long\n",
"0 Autauga AL Democrat 6363 23932 0.265878 32.532237 -86.646439\n",
"1 Autauga AL Republican 17379 23932 0.726183 32.532237 -86.646439\n",
"2 Autauga AL Other 190 23932 0.007939 32.532237 -86.646439\n",
"3 Baldwin AL Democrat 18424 85338 0.215894 30.659218 -87.746067\n",
"4 Baldwin AL Republican 66016 85338 0.773583 30.659218 -87.746067"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## pol - Election results from the 2012 American presidential election\n",
"pol = pd.read_csv(\"../data/raw/countypres_2000-2020.csv\")\n",
"\n",
"# We only want 2012--the latest election before the gb data was collected\n",
"\n",
"pol = pol[pol[\"year\"] == 2012].reset_index()\n",
"\n",
"# Get rid of undesireable columns\n",
"pol = pol.drop([\n",
" \"year\", \"state\", \"county_fips\", \"office\",\n",
" \"candidate\", \"version\", \"mode\", \"index\",\n",
"], axis=\"columns\")\n",
"\n",
"# Change the column names to make them a little more friendly\n",
"pol.rename({\n",
" \"county_name\": \"county\",\n",
" \"state_po\": \"state\",\n",
" \"candidatevotes\": \"votes\",\n",
" \"totalvotes\": \"total\"\n",
"}, axis=\"columns\", inplace=True)\n",
"\n",
"# Make cells lowercase\n",
"pol[\"county\"] = pol[\"county\"].apply(lambda x: x.capitalize())\n",
"pol[\"party\"] = pol[\"party\"].apply(lambda x: x.capitalize())\n",
"\n",
"# Combine the county name with the state code\n",
"def combine_name_state(row):\n",
" row[\"county\"] = f\"{row['county']} {row['state']}\"\n",
" return row\n",
"\n",
"pol = pol.apply(combine_name_state, axis=\"columns\")\n",
"\n",
"# Add a percent column which will be useful when graphing\n",
"pol[\"percent\"] = pol[\"votes\"] / pol[\"total\"]\n",
"\n",
"# Attach long/lat data to each row\n",
"pol = pol.merge(counties, left_on=\"county\", right_on=\"name\")\n",
"\n",
"# Now we can get rid of the state columns\n",
"pol = pol.drop([\"state\", \"name\"], axis=\"columns\")\n",
"\n",
"pol.to_csv(\"../data/processed/election-2012.csv\", index=False)\n",
"pol.head()"
]
},
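{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of how the `percent` column might be used for graphing (a sketch, not one of the original cells), the distribution of county-level Democratic vote share can be plotted with seaborn:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: distribution of county-level Democratic vote share in 2012,\n",
"# using the `percent` column computed above\n",
"dem = pol[pol[\"party\"] == \"Democrat\"]\n",
"sns.histplot(data=dem, x=\"percent\", bins=30)"
]
},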
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Tax_Mjoint</th>\n",
" <th>TaxRate_SS</th>\n",
" <th>TaxRate_FF</th>\n",
" <th>TaxRate_MM</th>\n",
" <th>Cns_RateSS</th>\n",
" <th>Cns_RateFF</th>\n",
" <th>Cns_RateMM</th>\n",
" <th>CountBars</th>\n",
" <th>FF_Index</th>\n",
" <th>MM_Index</th>\n",
" <th>SS_Index</th>\n",
" <th>TOTINDEX</th>\n",
" <th>lat</th>\n",
" <th>long</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2120</td>\n",
" <td>203.301887</td>\n",
" <td>28.773585</td>\n",
" <td>174.528302</td>\n",
" <td>77.125329</td>\n",
" <td>6.931719</td>\n",
" <td>70.193610</td>\n",
" <td>15</td>\n",
" <td>6.724415</td>\n",
" <td>48.288254</td>\n",
" <td>55.012669</td>\n",
" <td>67.077054</td>\n",
" <td>34.093828</td>\n",
" <td>-118.381697</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5080</td>\n",
" <td>205.511811</td>\n",
" <td>33.464567</td>\n",
" <td>172.047244</td>\n",
" <td>88.478367</td>\n",
" <td>15.617404</td>\n",
" <td>72.860963</td>\n",
" <td>17</td>\n",
" <td>9.834048</td>\n",
" <td>48.578469</td>\n",
" <td>58.412517</td>\n",
" <td>61.866815</td>\n",
" <td>37.758057</td>\n",
" <td>-122.435410</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5790</td>\n",
" <td>107.772021</td>\n",
" <td>16.753022</td>\n",
" <td>91.018998</td>\n",
" <td>46.771050</td>\n",
" <td>5.745582</td>\n",
" <td>41.025469</td>\n",
" <td>5</td>\n",
" <td>4.370779</td>\n",
" <td>26.360413</td>\n",
" <td>30.731192</td>\n",
" <td>37.908747</td>\n",
" <td>40.742039</td>\n",
" <td>-74.000620</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3510</td>\n",
" <td>80.056980</td>\n",
" <td>21.082621</td>\n",
" <td>58.974359</td>\n",
" <td>31.619291</td>\n",
" <td>9.315448</td>\n",
" <td>22.303843</td>\n",
" <td>10</td>\n",
" <td>6.055939</td>\n",
" <td>15.939869</td>\n",
" <td>21.995808</td>\n",
" <td>37.530067</td>\n",
" <td>40.734012</td>\n",
" <td>-74.006746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2660</td>\n",
" <td>91.353383</td>\n",
" <td>12.781955</td>\n",
" <td>78.571429</td>\n",
" <td>21.763042</td>\n",
" <td>3.142678</td>\n",
" <td>18.620365</td>\n",
" <td>9</td>\n",
" <td>3.004058</td>\n",
" <td>18.280165</td>\n",
" <td>21.284224</td>\n",
" <td>35.843573</td>\n",
" <td>37.773134</td>\n",
" <td>-122.411167</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Tax_Mjoint TaxRate_SS TaxRate_FF TaxRate_MM Cns_RateSS Cns_RateFF \\\n",
"0 2120 203.301887 28.773585 174.528302 77.125329 6.931719 \n",
"1 5080 205.511811 33.464567 172.047244 88.478367 15.617404 \n",
"2 5790 107.772021 16.753022 91.018998 46.771050 5.745582 \n",
"3 3510 80.056980 21.082621 58.974359 31.619291 9.315448 \n",
"4 2660 91.353383 12.781955 78.571429 21.763042 3.142678 \n",
"\n",
" Cns_RateMM CountBars FF_Index MM_Index SS_Index TOTINDEX \\\n",
"0 70.193610 15 6.724415 48.288254 55.012669 67.077054 \n",
"1 72.860963 17 9.834048 48.578469 58.412517 61.866815 \n",
"2 41.025469 5 4.370779 26.360413 30.731192 37.908747 \n",
"3 22.303843 10 6.055939 15.939869 21.995808 37.530067 \n",
"4 18.620365 9 3.004058 18.280165 21.284224 35.843573 \n",
"\n",
" lat long \n",
"0 34.093828 -118.381697 \n",
"1 37.758057 -122.435410 \n",
"2 40.742039 -74.000620 \n",
"3 40.734012 -74.006746 \n",
"4 37.773134 -122.411167 "
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## gb - the gaybourhoods dataset\n",
"gb = pd.read_csv(\"../data/raw/gaybourhoods.csv\")\n",
"cords = pd.read_csv(\"../data/raw/zip_lat_long.csv\")\n",
"\n",
"# Let's add long/lat columns to gb\n",
"gb = gb.merge(cords, left_on=\"GEOID10\", right_on=\"ZIP\")\n",
"\n",
"# Get rid of unneeded columns\n",
"gb = gb.drop([\n",
" \"Mjoint_MF\", \"Mjoint_SS\", \"Mjoint_FF\", \"Mjoint_MM\",\n",
" \"Cns_TotHH\", \"Cns_UPSS\", \"Cns_UPFF\", \"Cns_UPMM\",\n",
" \"ParadeFlag\", \"FF_Tax\", \"FF_Cns\", \"MM_Tax\", \"MM_Cns\",\n",
" \"SS_Index_Weight\", \"Parade_Weight\", \"Bars_Weight\",\n",
" \"GEOID10\", \"ZIP\",\n",
"], axis=\"columns\")\n",
"\n",
"# There's a lot of info baked into some of these columns. Especially the composite indexes.\n",
"# We'll leave their names as is for easy reference even if they're a little ugly.\n",
"gb = gb.rename({\n",
" \"LAT\": \"lat\",\n",
" \"LNG\": \"long\",\n",
"}, axis=\"columns\")\n",
"\n",
"gb.to_csv(\"../data/processed/gaybourhoods-nat.csv\")\n",
"gb.head()"
]
}
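,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The research questions above call for a geographic \"compatibility layer\" between the election data (county centroids) and the gaybourhoods data (ZIP centroids). The cell below is a minimal sketch of one way to build it, assuming `scipy` is available: match each ZIP-level row in `gb` to its nearest county centroid in `pol` with a k-d tree. It illustrates the planned approach rather than a finished join."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of the planned \"compatibility layer\": attach each ZIP-level row in gb\n",
"# to the nearest county centroid from the election data using a k-d tree.\n",
"# Assumes scipy is installed; treats lat/long as planar coordinates, which is a\n",
"# rough approximation but adequate for nearest-centroid matching.\n",
"from scipy.spatial import cKDTree\n",
"\n",
"county_points = pol[[\"county\", \"lat\", \"long\"]].drop_duplicates(\"county\").reset_index(drop=True)\n",
"tree = cKDTree(county_points[[\"lat\", \"long\"]].astype(float).to_numpy())\n",
"\n",
"_, nearest = tree.query(gb[[\"lat\", \"long\"]].astype(float).to_numpy())\n",
"gb_with_county = gb.assign(county=county_points.loc[nearest, \"county\"].to_numpy())\n",
"\n",
"gb_with_county.head()"
]
}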
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}