As discussed in #6, the IRS data set exceeds GitHub's file size limit. To work around this, a processing step has been added to analysis/analisys2.ipynb that splits the data set into two parts and stores them in the processed folder.
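For reviewers, here is a minimal sketch of what such a split step could look like. It assumes the data set is a single CSV file. The helper names `split_csv` and `join_csv` and the `_partN.csv` naming scheme are hypothetical. The notebook itself may use pandas or a different layout. The sketch streams the file, so it never holds the full data set in memory, and it repeats the header row in every part so each part can be loaded on its own.

```python
import csv
import itertools
from pathlib import Path


def split_csv(src: Path, out_dir: Path, n_parts: int = 2) -> list[Path]:
    """Split a CSV into n_parts files with roughly equal row counts.

    Hypothetical helper: every part repeats the header row.
    """
    # First pass: count data rows (csv.reader handles quoted multi-line fields).
    with src.open(newline="") as f:
        n_rows = sum(1 for _ in csv.reader(f)) - 1  # exclude the header
    per_part = -(-n_rows // n_parts)  # ceiling division

    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{src.stem}_part{i + 1}.csv" for i in range(n_parts)]

    # Second pass: stream consecutive slices of rows into each part.
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for path in paths:
            with path.open("w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(itertools.islice(reader, per_part))
    return paths


def join_csv(parts: list[Path]) -> list[list[str]]:
    """Read the parts back in order and return the header plus all rows."""
    combined: list[list[str]] = []
    for i, part in enumerate(parts):
        with part.open(newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            if i == 0:
                combined.append(header)  # keep the header only once
            combined.extend(reader)
    return combined
```

Downstream code can call `join_csv` on the parts in the processed folder to rebuild the original rows. Each part stays small enough to commit.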