diff --git a/README.md b/README.md
index 1a0613b..d37d15c 100644
--- a/README.md
+++ b/README.md
@@ -18,18 +18,19 @@ To quantify gaybourhoods, writers and data analysts from The Pudding collected i
 
 ## Team Members
 
-- Nat Scott
-- Sami Almuallim
+- Nat Scott: Student of computer and environmental science
+- Sami Almuallim: Student of computer and data science
 
 ## Images
 
-{You should use this area to add a screenshot of an interesting plot, or of your dashboard}
-
-
+Images coming soon.
 
 ## References
 
-{Add your stuff here}
-
-
+- [Men are from Chelsea, Women are from Park Slope](https://pudding.cool/2018/06/gayborhoods/)
+    - The article for which the data was originally collected.
+- [The Gaybourhoods data set on Github](https://github.com/the-pudding/data/blob/master/gayborhoods/README.md)
+Sources of (potential) secondary data sets:
+- [Data set relating US ZIP codes to their coordinates](https://www.kaggle.com/datasets/joeleichter/us-zip-codes-with-lat-and-long)
+- [Geographic situation of taxes paid in the US](https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-statistics-2018-zip-code-data-soi)
 
diff --git a/analysis/analysis1.ipynb b/analysis/analysis1.ipynb
index f2397fc..3291d65 100644
--- a/analysis/analysis1.ipynb
+++ b/analysis/analysis1.ipynb
@@ -13,13 +13,13 @@
    "source": [
     "## Research question/interests\n",
     "\n",
-    "Is there a correlation between political alignment & living in neighbourhoods with large quantities of LGBT people?\n",
+    "**Is there a correlation between political alignment & living in neighbourhoods with large quantities of LGBT people?** The obvious answer to this question is \"yes, they are going to mostly be Democrats\" but anyone who's ever been around queer people will know that this question is quite a bit more nuanced than that, and this nuance is what we hope to capture in investigating this question.\n",
     "\n",
     "- The gaybourhoods data set does not include data on residents political alignments, however, there is a wealth of electoral data available freely online that we intend on incorporating into this project. The primary difficulty then will be developing a geographic \"compatibility layer\" between the data sets so that the data can be understood in the same context. To build this, we intend on working with the OpenStreetMap API to create an additional column representing observations position space in a more neutral way, such as their coordinates.\n",
     "- Alternatively, we've also considered working with an additional data set that links US zip codes to their longitude and lattitude positions. As such, incorporating this data would be as easy as merging the two tables.\n",
     "\n",
     "\n",
-    "Is there a correlation between geographical stratums & being LGBT?\n",
+    "**Is there a correlation between geographical stratums & being LGBT?** This question is more abstract, and will serve as a preliminary exploration of the data in hopes of establishing two key details along the way that will shape the rest of the project: how do we quantify queerness, and how do we best represent it visually?\n",
     "\n",
     "- Once again, representing this data visually will require determining the coordinates associated with each observation.\n",
     "- The gaybourhoods data set defines a \"gaybourhood index\" which effectively measures how friendly a given neighbourhood is to queer people. Since this index is entirely subjective, we will need to closely evaluate it's usefulness for our project and investigate different ways to quantify \"queer-friendliness\"\n",
@@ -27,6 +27,225 @@
     "- Obviously, visualizing this among many aspects of the other research questions would involve projecting the data onto a map of the United States, so visualizing this research question would motivate many of the visualizations for other components of this project"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "
\n",

| | GEOID10 | Tax_Mjoint | Mjoint_MF | Mjoint_SS | Mjoint_FF | Mjoint_MM | TaxRate_SS | TaxRate_FF | TaxRate_MM | Cns_TotHH | ... | FF_Cns | FF_Index | MM_Tax | MM_Cns | MM_Index | SS_Index | SS_Index_Weight | Parade_Weight | Bars_Weight | TOTINDEX |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90069 | 2120 | 1689 | 431 | 61 | 370 | 203.301887 | 28.773585 | 174.528302 | 12551 | ... | 1.847099 | 6.724415 | 29.583721 | 18.704533 | 48.288254 | 55.012669 | 39.429995 | 10 | 17.647059 | 67.077054 |
| 1 | 94114 | 5080 | 4036 | 1044 | 170 | 874 | 205.511811 | 33.464567 | 172.047244 | 16456 | ... | 4.161579 | 9.834048 | 29.163165 | 19.415304 | 48.578469 | 58.412517 | 41.866815 | 0 | 20.000000 | 61.866815 |
| 2 | 10011 | 5790 | 5166 | 624 | 97 | 527 | 107.772021 | 16.753022 | 91.018998 | 29762 | ... | 1.531029 | 4.370779 | 15.428332 | 10.932081 | 26.360413 | 30.731192 | 22.026394 | 10 | 5.882353 | 37.908747 |
| 3 | 10014 | 3510 | 3229 | 281 | 74 | 207 | 80.056980 | 21.082621 | 58.974359 | 18786 | ... | 2.482293 | 6.055939 | 9.996551 | 5.943318 | 15.939869 | 21.995808 | 15.765361 | 10 | 11.764706 | 37.530067 |
| 4 | 94103 | 2660 | 2417 | 243 | 34 | 209 | 91.353383 | 12.781955 | 78.571429 | 12728 | ... | 0.837431 | 3.004058 | 13.318386 | 4.961779 | 18.280165 | 21.284224 | 15.255337 | 10 | 10.588235 | 35.843573 |

5 rows × 29 columns
\n",
+       "