{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Sami Almuallim"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Research question/interests\n",
"\n",
"**How are the different metrics of pride represented in this data set correlated?** Answering this question will provide a foundation for answering the more complicated questions that follow.\n",
"\n",
"- This will probably be the simplest research question, since it requires only the data contained in our original data set. To explore it, we will use the visualization methods discussed in class to develop a better understanding of the data.\n",
"\n",
"**Is there a positive or a negative correlation between taxes paid and the pride of a given queer neighbourhood?** Taxes are influenced by a variety of socio-economic factors, and we hope that by analyzing both tax data and our quantification of queerness on a geographic level, we'll be able to glean insight into how queerness and class are interrelated.\n",
"\n",
"- Again, as with the first research question posed, we'll need to find another data set containing geographically located tax data, which should be easy to obtain from the US government (for example, our cursory research turned up [this data set from the IRS](https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-2018-zip-code-data-soi)).\n",
"- This would bring the number of data sets used in this project up to three, which may make the data wrangling needed to bring them all together challenging.\n",
"- To measure this, we would rank the neighbourhoods in the gaybourhoods data set by pride (how to quantify pride is an open question, which we will explore in a separate research question)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>GEOID10</th>\n",
|
||
" <th>Tax_Mjoint</th>\n",
|
||
" <th>Mjoint_MF</th>\n",
|
||
" <th>Mjoint_SS</th>\n",
|
||
" <th>Mjoint_FF</th>\n",
|
||
" <th>Mjoint_MM</th>\n",
|
||
" <th>TaxRate_SS</th>\n",
|
||
" <th>TaxRate_FF</th>\n",
|
||
" <th>TaxRate_MM</th>\n",
|
||
" <th>Cns_TotHH</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>FF_Cns</th>\n",
|
||
" <th>FF_Index</th>\n",
|
||
" <th>MM_Tax</th>\n",
|
||
" <th>MM_Cns</th>\n",
|
||
" <th>MM_Index</th>\n",
|
||
" <th>SS_Index</th>\n",
|
||
" <th>SS_Index_Weight</th>\n",
|
||
" <th>Parade_Weight</th>\n",
|
||
" <th>Bars_Weight</th>\n",
|
||
" <th>TOTINDEX</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>90069</td>\n",
|
||
" <td>2120</td>\n",
|
||
" <td>1689</td>\n",
|
||
" <td>431</td>\n",
|
||
" <td>61</td>\n",
|
||
" <td>370</td>\n",
|
||
" <td>203.301887</td>\n",
|
||
" <td>28.773585</td>\n",
|
||
" <td>174.528302</td>\n",
|
||
" <td>12551</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>1.847099</td>\n",
|
||
" <td>6.724415</td>\n",
|
||
" <td>29.583721</td>\n",
|
||
" <td>18.704533</td>\n",
|
||
" <td>48.288254</td>\n",
|
||
" <td>55.012669</td>\n",
|
||
" <td>39.429995</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>17.647059</td>\n",
|
||
" <td>67.077054</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>94114</td>\n",
|
||
" <td>5080</td>\n",
|
||
" <td>4036</td>\n",
|
||
" <td>1044</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>874</td>\n",
|
||
" <td>205.511811</td>\n",
|
||
" <td>33.464567</td>\n",
|
||
" <td>172.047244</td>\n",
|
||
" <td>16456</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>4.161579</td>\n",
|
||
" <td>9.834048</td>\n",
|
||
" <td>29.163165</td>\n",
|
||
" <td>19.415304</td>\n",
|
||
" <td>48.578469</td>\n",
|
||
" <td>58.412517</td>\n",
|
||
" <td>41.866815</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>20.000000</td>\n",
|
||
" <td>61.866815</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>10011</td>\n",
|
||
" <td>5790</td>\n",
|
||
" <td>5166</td>\n",
|
||
" <td>624</td>\n",
|
||
" <td>97</td>\n",
|
||
" <td>527</td>\n",
|
||
" <td>107.772021</td>\n",
|
||
" <td>16.753022</td>\n",
|
||
" <td>91.018998</td>\n",
|
||
" <td>29762</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>1.531029</td>\n",
|
||
" <td>4.370779</td>\n",
|
||
" <td>15.428332</td>\n",
|
||
" <td>10.932081</td>\n",
|
||
" <td>26.360413</td>\n",
|
||
" <td>30.731192</td>\n",
|
||
" <td>22.026394</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>5.882353</td>\n",
|
||
" <td>37.908747</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>10014</td>\n",
|
||
" <td>3510</td>\n",
|
||
" <td>3229</td>\n",
|
||
" <td>281</td>\n",
|
||
" <td>74</td>\n",
|
||
" <td>207</td>\n",
|
||
" <td>80.056980</td>\n",
|
||
" <td>21.082621</td>\n",
|
||
" <td>58.974359</td>\n",
|
||
" <td>18786</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>2.482293</td>\n",
|
||
" <td>6.055939</td>\n",
|
||
" <td>9.996551</td>\n",
|
||
" <td>5.943318</td>\n",
|
||
" <td>15.939869</td>\n",
|
||
" <td>21.995808</td>\n",
|
||
" <td>15.765361</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>11.764706</td>\n",
|
||
" <td>37.530067</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>94103</td>\n",
|
||
" <td>2660</td>\n",
|
||
" <td>2417</td>\n",
|
||
" <td>243</td>\n",
|
||
" <td>34</td>\n",
|
||
" <td>209</td>\n",
|
||
" <td>91.353383</td>\n",
|
||
" <td>12.781955</td>\n",
|
||
" <td>78.571429</td>\n",
|
||
" <td>12728</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0.837431</td>\n",
|
||
" <td>3.004058</td>\n",
|
||
" <td>13.318386</td>\n",
|
||
" <td>4.961779</td>\n",
|
||
" <td>18.280165</td>\n",
|
||
" <td>21.284224</td>\n",
|
||
" <td>15.255337</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>10.588235</td>\n",
|
||
" <td>35.843573</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>5 rows × 29 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" GEOID10 Tax_Mjoint Mjoint_MF Mjoint_SS Mjoint_FF Mjoint_MM \\\n",
|
||
"0 90069 2120 1689 431 61 370 \n",
|
||
"1 94114 5080 4036 1044 170 874 \n",
|
||
"2 10011 5790 5166 624 97 527 \n",
|
||
"3 10014 3510 3229 281 74 207 \n",
|
||
"4 94103 2660 2417 243 34 209 \n",
|
||
"\n",
|
||
" TaxRate_SS TaxRate_FF TaxRate_MM Cns_TotHH ... FF_Cns FF_Index \\\n",
|
||
"0 203.301887 28.773585 174.528302 12551 ... 1.847099 6.724415 \n",
|
||
"1 205.511811 33.464567 172.047244 16456 ... 4.161579 9.834048 \n",
|
||
"2 107.772021 16.753022 91.018998 29762 ... 1.531029 4.370779 \n",
|
||
"3 80.056980 21.082621 58.974359 18786 ... 2.482293 6.055939 \n",
|
||
"4 91.353383 12.781955 78.571429 12728 ... 0.837431 3.004058 \n",
|
||
"\n",
|
||
" MM_Tax MM_Cns MM_Index SS_Index SS_Index_Weight Parade_Weight \\\n",
|
||
"0 29.583721 18.704533 48.288254 55.012669 39.429995 10 \n",
|
||
"1 29.163165 19.415304 48.578469 58.412517 41.866815 0 \n",
|
||
"2 15.428332 10.932081 26.360413 30.731192 22.026394 10 \n",
|
||
"3 9.996551 5.943318 15.939869 21.995808 15.765361 10 \n",
|
||
"4 13.318386 4.961779 18.280165 21.284224 15.255337 10 \n",
|
||
"\n",
|
||
" Bars_Weight TOTINDEX \n",
|
||
"0 17.647059 67.077054 \n",
|
||
"1 20.000000 61.866815 \n",
|
||
"2 5.882353 37.908747 \n",
|
||
"3 11.764706 37.530067 \n",
|
||
"4 10.588235 35.843573 \n",
|
||
"\n",
|
||
"[5 rows x 29 columns]"
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"gaybourhoods = pd.read_csv(\"../data/raw/gaybourhoods.csv\")\n",
|
||
"gaybourhoods.head(5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Data wrangling"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# NOTE: This cell only works if irs_2015.csv is present in ../data/raw/. The\n",
"# file is too big for GitHub to handle, so it is not in the repository; its\n",
"# source is linked in the references section of the readme.\n",
"irs = pd.read_csv(\"../data/raw/irs_2015.csv\")\n",
"\n",
"# Naively split the IRS data set in two. More formal data wrangling will\n",
"# come later.\n",
"half = irs.shape[0] // 2\n",
"irs1 = irs.iloc[:half]\n",
"# Slice from the midpoint rather than using tail(half), which would drop the\n",
"# middle row whenever the row count is odd.\n",
"irs2 = irs.iloc[half:]\n",
"\n",
"irs1.to_csv(\"../data/processed/irs_2015_1\", index=False)\n",
"irs2.to_csv(\"../data/processed/irs_2015_2\", index=False)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>STATEFIPS</th>\n",
|
||
" <th>STATE</th>\n",
|
||
" <th>zipcode</th>\n",
|
||
" <th>agi_stub</th>\n",
|
||
" <th>N1</th>\n",
|
||
" <th>mars1</th>\n",
|
||
" <th>MARS2</th>\n",
|
||
" <th>MARS4</th>\n",
|
||
" <th>PREP</th>\n",
|
||
" <th>N2</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>N10300</th>\n",
|
||
" <th>A10300</th>\n",
|
||
" <th>N85530</th>\n",
|
||
" <th>A85530</th>\n",
|
||
" <th>N85300</th>\n",
|
||
" <th>A85300</th>\n",
|
||
" <th>N11901</th>\n",
|
||
" <th>A11901</th>\n",
|
||
" <th>N11902</th>\n",
|
||
" <th>A11902</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>AL</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>836320.0</td>\n",
|
||
" <td>481570.0</td>\n",
|
||
" <td>109790.0</td>\n",
|
||
" <td>233260.0</td>\n",
|
||
" <td>455560.0</td>\n",
|
||
" <td>1356760.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>373410.0</td>\n",
|
||
" <td>328469.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>61920.0</td>\n",
|
||
" <td>48150.0</td>\n",
|
||
" <td>732670.0</td>\n",
|
||
" <td>1933120.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>AL</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>494830.0</td>\n",
|
||
" <td>206630.0</td>\n",
|
||
" <td>146250.0</td>\n",
|
||
" <td>129390.0</td>\n",
|
||
" <td>275920.0</td>\n",
|
||
" <td>1010990.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>395880.0</td>\n",
|
||
" <td>965011.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>73720.0</td>\n",
|
||
" <td>107304.0</td>\n",
|
||
" <td>415410.0</td>\n",
|
||
" <td>1187403.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>AL</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>261250.0</td>\n",
|
||
" <td>80720.0</td>\n",
|
||
" <td>139280.0</td>\n",
|
||
" <td>36130.0</td>\n",
|
||
" <td>155100.0</td>\n",
|
||
" <td>583910.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>251490.0</td>\n",
|
||
" <td>1333418.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>64200.0</td>\n",
|
||
" <td>139598.0</td>\n",
|
||
" <td>193030.0</td>\n",
|
||
" <td>536699.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>AL</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>166690.0</td>\n",
|
||
" <td>28510.0</td>\n",
|
||
" <td>124650.0</td>\n",
|
||
" <td>10630.0</td>\n",
|
||
" <td>99950.0</td>\n",
|
||
" <td>423990.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>165320.0</td>\n",
|
||
" <td>1414283.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>45460.0</td>\n",
|
||
" <td>128823.0</td>\n",
|
||
" <td>116440.0</td>\n",
|
||
" <td>377177.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>AL</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>212660.0</td>\n",
|
||
" <td>19520.0</td>\n",
|
||
" <td>184320.0</td>\n",
|
||
" <td>4830.0</td>\n",
|
||
" <td>126860.0</td>\n",
|
||
" <td>589490.0</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>212000.0</td>\n",
|
||
" <td>3820152.0</td>\n",
|
||
" <td>420.0</td>\n",
|
||
" <td>168.0</td>\n",
|
||
" <td>60.0</td>\n",
|
||
" <td>31.0</td>\n",
|
||
" <td>83330.0</td>\n",
|
||
" <td>421004.0</td>\n",
|
||
" <td>121570.0</td>\n",
|
||
" <td>483682.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>5 rows × 131 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" STATEFIPS STATE zipcode agi_stub N1 mars1 MARS2 MARS4 \\\n",
|
||
"0 1 AL 0 1 836320.0 481570.0 109790.0 233260.0 \n",
|
||
"1 1 AL 0 2 494830.0 206630.0 146250.0 129390.0 \n",
|
||
"2 1 AL 0 3 261250.0 80720.0 139280.0 36130.0 \n",
|
||
"3 1 AL 0 4 166690.0 28510.0 124650.0 10630.0 \n",
|
||
"4 1 AL 0 5 212660.0 19520.0 184320.0 4830.0 \n",
|
||
"\n",
|
||
" PREP N2 ... N10300 A10300 N85530 A85530 N85300 \\\n",
|
||
"0 455560.0 1356760.0 ... 373410.0 328469.0 0.0 0.0 0.0 \n",
|
||
"1 275920.0 1010990.0 ... 395880.0 965011.0 0.0 0.0 0.0 \n",
|
||
"2 155100.0 583910.0 ... 251490.0 1333418.0 0.0 0.0 0.0 \n",
|
||
"3 99950.0 423990.0 ... 165320.0 1414283.0 0.0 0.0 0.0 \n",
|
||
"4 126860.0 589490.0 ... 212000.0 3820152.0 420.0 168.0 60.0 \n",
|
||
"\n",
|
||
" A85300 N11901 A11901 N11902 A11902 \n",
|
||
"0 0.0 61920.0 48150.0 732670.0 1933120.0 \n",
|
||
"1 0.0 73720.0 107304.0 415410.0 1187403.0 \n",
|
||
"2 0.0 64200.0 139598.0 193030.0 536699.0 \n",
|
||
"3 0.0 45460.0 128823.0 116440.0 377177.0 \n",
|
||
"4 31.0 83330.0 421004.0 121570.0 483682.0 \n",
|
||
"\n",
|
||
"[5 rows x 131 columns]"
|
||
]
|
||
},
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Now these two data sets can be joined and worked with.\n",
"irs = pd.concat([\n",
"    pd.read_csv(\"../data/processed/irs_2015_1\"),\n",
"    pd.read_csv(\"../data/processed/irs_2015_2\")\n",
"], ignore_index=True)  # renumber rows so index labels are not duplicated\n",
"irs.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.9"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|