pride-data-analysis/analysis/analysis1.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Nat Scott"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Research question/interests\n",
"\n",
"**Is there a correlation between political alignment and living in neighbourhoods with large LGBT populations?** The obvious answer is \"yes, the residents are mostly going to be Democrats\", but anyone who has spent time around queer people will know that the question is considerably more nuanced than that, and this nuance is what we hope to capture in investigating it.\n",
"\n",
"- The gaybourhoods data set does not include data on residents' political alignments; however, there is a wealth of electoral data freely available online that we intend to incorporate into this project. The primary difficulty will then be developing a geographic \"compatibility layer\" between the data sets so that the data can be understood in the same context. To build this, we intend to work with the OpenStreetMap API to create an additional column representing each observation's position in space in a more neutral way, such as its coordinates.\n",
"- Alternatively, we've also considered working with an additional data set that links US zip codes to their longitude and latitude. In that case, incorporating the coordinates would be as easy as merging the two tables.\n",
"\n",
"\n",
"**Is there a correlation between geographical strata and being LGBT?** This question is more abstract, and will serve as a preliminary exploration of the data in the hope of establishing two key details that will shape the rest of the project: how do we quantify queerness, and how do we best represent it visually?\n",
"\n",
"- Once again, representing this data visually will require determining the coordinates associated with each observation.\n",
"- The gaybourhoods data set defines a \"gaybourhood index\" which effectively measures how friendly a given neighbourhood is to queer people. Since this index is entirely subjective, we will need to closely evaluate its usefulness for our project and investigate different ways to quantify \"queer-friendliness\".\n",
"- Building on the last point, any choice of measurement will still be subjective, so answering this research question will largely take the form of comparing and contrasting different measurements to see what they tell us.\n",
"- Visualizing this question, like many aspects of the other research questions, would involve projecting the data onto a map of the United States, so the visualizations developed here would motivate many of those for other components of this project."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>GEOID10</th>\n",
" <th>Tax_Mjoint</th>\n",
" <th>Mjoint_MF</th>\n",
" <th>Mjoint_SS</th>\n",
" <th>Mjoint_FF</th>\n",
" <th>Mjoint_MM</th>\n",
" <th>TaxRate_SS</th>\n",
" <th>TaxRate_FF</th>\n",
" <th>TaxRate_MM</th>\n",
" <th>Cns_TotHH</th>\n",
" <th>...</th>\n",
" <th>FF_Cns</th>\n",
" <th>FF_Index</th>\n",
" <th>MM_Tax</th>\n",
" <th>MM_Cns</th>\n",
" <th>MM_Index</th>\n",
" <th>SS_Index</th>\n",
" <th>SS_Index_Weight</th>\n",
" <th>Parade_Weight</th>\n",
" <th>Bars_Weight</th>\n",
" <th>TOTINDEX</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>90069</td>\n",
" <td>2120</td>\n",
" <td>1689</td>\n",
" <td>431</td>\n",
" <td>61</td>\n",
" <td>370</td>\n",
" <td>203.301887</td>\n",
" <td>28.773585</td>\n",
" <td>174.528302</td>\n",
" <td>12551</td>\n",
" <td>...</td>\n",
" <td>1.847099</td>\n",
" <td>6.724415</td>\n",
" <td>29.583721</td>\n",
" <td>18.704533</td>\n",
" <td>48.288254</td>\n",
" <td>55.012669</td>\n",
" <td>39.429995</td>\n",
" <td>10</td>\n",
" <td>17.647059</td>\n",
" <td>67.077054</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>94114</td>\n",
" <td>5080</td>\n",
" <td>4036</td>\n",
" <td>1044</td>\n",
" <td>170</td>\n",
" <td>874</td>\n",
" <td>205.511811</td>\n",
" <td>33.464567</td>\n",
" <td>172.047244</td>\n",
" <td>16456</td>\n",
" <td>...</td>\n",
" <td>4.161579</td>\n",
" <td>9.834048</td>\n",
" <td>29.163165</td>\n",
" <td>19.415304</td>\n",
" <td>48.578469</td>\n",
" <td>58.412517</td>\n",
" <td>41.866815</td>\n",
" <td>0</td>\n",
" <td>20.000000</td>\n",
" <td>61.866815</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>10011</td>\n",
" <td>5790</td>\n",
" <td>5166</td>\n",
" <td>624</td>\n",
" <td>97</td>\n",
" <td>527</td>\n",
" <td>107.772021</td>\n",
" <td>16.753022</td>\n",
" <td>91.018998</td>\n",
" <td>29762</td>\n",
" <td>...</td>\n",
" <td>1.531029</td>\n",
" <td>4.370779</td>\n",
" <td>15.428332</td>\n",
" <td>10.932081</td>\n",
" <td>26.360413</td>\n",
" <td>30.731192</td>\n",
" <td>22.026394</td>\n",
" <td>10</td>\n",
" <td>5.882353</td>\n",
" <td>37.908747</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>10014</td>\n",
" <td>3510</td>\n",
" <td>3229</td>\n",
" <td>281</td>\n",
" <td>74</td>\n",
" <td>207</td>\n",
" <td>80.056980</td>\n",
" <td>21.082621</td>\n",
" <td>58.974359</td>\n",
" <td>18786</td>\n",
" <td>...</td>\n",
" <td>2.482293</td>\n",
" <td>6.055939</td>\n",
" <td>9.996551</td>\n",
" <td>5.943318</td>\n",
" <td>15.939869</td>\n",
" <td>21.995808</td>\n",
" <td>15.765361</td>\n",
" <td>10</td>\n",
" <td>11.764706</td>\n",
" <td>37.530067</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>94103</td>\n",
" <td>2660</td>\n",
" <td>2417</td>\n",
" <td>243</td>\n",
" <td>34</td>\n",
" <td>209</td>\n",
" <td>91.353383</td>\n",
" <td>12.781955</td>\n",
" <td>78.571429</td>\n",
" <td>12728</td>\n",
" <td>...</td>\n",
" <td>0.837431</td>\n",
" <td>3.004058</td>\n",
" <td>13.318386</td>\n",
" <td>4.961779</td>\n",
" <td>18.280165</td>\n",
" <td>21.284224</td>\n",
" <td>15.255337</td>\n",
" <td>10</td>\n",
" <td>10.588235</td>\n",
" <td>35.843573</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 29 columns</p>\n",
"</div>"
],
"text/plain": [
" GEOID10 Tax_Mjoint Mjoint_MF Mjoint_SS Mjoint_FF Mjoint_MM \\\n",
"0 90069 2120 1689 431 61 370 \n",
"1 94114 5080 4036 1044 170 874 \n",
"2 10011 5790 5166 624 97 527 \n",
"3 10014 3510 3229 281 74 207 \n",
"4 94103 2660 2417 243 34 209 \n",
"\n",
" TaxRate_SS TaxRate_FF TaxRate_MM Cns_TotHH ... FF_Cns FF_Index \\\n",
"0 203.301887 28.773585 174.528302 12551 ... 1.847099 6.724415 \n",
"1 205.511811 33.464567 172.047244 16456 ... 4.161579 9.834048 \n",
"2 107.772021 16.753022 91.018998 29762 ... 1.531029 4.370779 \n",
"3 80.056980 21.082621 58.974359 18786 ... 2.482293 6.055939 \n",
"4 91.353383 12.781955 78.571429 12728 ... 0.837431 3.004058 \n",
"\n",
" MM_Tax MM_Cns MM_Index SS_Index SS_Index_Weight Parade_Weight \\\n",
"0 29.583721 18.704533 48.288254 55.012669 39.429995 10 \n",
"1 29.163165 19.415304 48.578469 58.412517 41.866815 0 \n",
"2 15.428332 10.932081 26.360413 30.731192 22.026394 10 \n",
"3 9.996551 5.943318 15.939869 21.995808 15.765361 10 \n",
"4 13.318386 4.961779 18.280165 21.284224 15.255337 10 \n",
"\n",
" Bars_Weight TOTINDEX \n",
"0 17.647059 67.077054 \n",
"1 20.000000 61.866815 \n",
"2 5.882353 37.908747 \n",
"3 11.764706 37.530067 \n",
"4 10.588235 35.843573 \n",
"\n",
"[5 rows x 29 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# Load the raw gaybourhoods data set and preview the first five rows.\n",
"gaybourhoods = pd.read_csv(\"../data/raw/gaybourhoods.csv\")\n",
"gaybourhoods.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}