# -*- coding: utf-8 -*-
from setuptools import setup

packages = \
['divintseg']

package_data = \
{'': ['*']}

install_requires = \
['genbadge[all]>=1.1.0,<2.0.0', 'numpy>=1.23.2,<2.0.0', 'pandas>=1.5.3,<2.0.0']

extras_require = \
{'docs': ['sphinx-rtd-theme==1.2.0',
          'sphinx-copybutton>=0.5.1,<0.6.0',
          'sphinxcontrib-napoleon==0.7',
          'toml>=0.10.0,<0.11.0']}

setup_kwargs = {
    'name': 'divintseg',
    'version': '0.4.0',
    'description': 'Tools for computing diversity, integration and segregation metrics',
    'long_description': '# divintseg\n\n`divintseg` is a simple package for computing diversity,\nintegration, and segregation statistics on data sets.\nTypically, it is used with demographic data such as \ncensus data.\n\n[![Hippocratic License HL3-CL-ECO-EXTR-FFD-LAW-MIL-SV](https://img.shields.io/static/v1?label=Hippocratic%20License&message=HL3-CL-ECO-EXTR-FFD-LAW-MIL-SV&labelColor=5e2751&color=bc8c3d)](https://firstdonoharm.dev/version/3/0/cl-eco-extr-ffd-law-mil-sv.html)\n\n[![Documentation Status](https://readthedocs.org/projects/divintseg/badge/?version=latest)](https://divintseg.readthedocs.io/en/latest/?badge=latest)\n\n![Tests Badge](reports/junit/tests-badge.svg)\n![Coverage Badge](reports/coverage/coverage-badge.svg)\n\n![PyPI - Downloads](https://img.shields.io/pypi/dm/divintseg)\n\n## Methodology\n\n`divintseg` uses a straightforward methodology to\ncompute its metrics. It is designed to make\nmathematical sense and have nice mathematical \nproperties, while at the same time remaining \nsimple enough that it makes sense to non-technical\npeople as well.\n\n### Visualizing Diversity, Integration, and Segregation\n\nIn order to build up some intuition on what our\nmetrics are trying to model, it\'s useful to start\nwith some visual illustrations of the concepts\nthe metrics try to capture.\n\nThe most basic notion in our methodology is that of a\ncommunity that consists of members of different\nnon-overlapping groups. In order to build a basic\nintuition for communities, groups, and the metrics\nwe will compute on them, we will begin with some visual\nrepresentations.\n\nWe\'ll start with a community that, intuitively, looks\nboth diverse and integrated.\n\n![a community](docs/_static/d-and-i.png?raw=true)\n\nEach small circle represents an individual. The color\nof the circle represents which one of three groups \nthey belong to. There are equal numbers of blue,\ngreen, and orange circles, so we would tend to\nconsider this group to be diverse. 
Furthermore, the\nmembers of the different groups are spread out\nthroughout the community so that every individual\nhas nearby neighbors that are in different groups\nthan they are. This community looks integrated.\n\nIn contrast, here is a community that looks diverse\nbut not integrated.\n\n![a community](docs/_static/d-and-s.png?raw=true)\n\nJust like the previous community, this community\nis diverse. It has an equal number of members of\neach group. But it is not integrated. Instead, it\nis segregated. Each of the three groups is \nconcentrated and most individuals do not have\nnearby neighbors of a different group.\n\nNow let\'s look at some communities that are less diverse.\nHere is a non-diverse community. Almost all the \nindividuals are in the blue group. \n\n![a community](docs/_static/nd-and-ni.png?raw=true)\n\nThis is also a segregated community. The few members of\nthe orange and green groups are all together in one\ncorner of the community.\n\nLet\'s look at another community that is also not diverse,\nbut looks like it might be at least a little more integrated \nthan the last one.\n\n![a community](docs/_static/nd-and-mi.png?raw=true)\n\nHow integrated really is this community? The few individuals\nin the orange and green groups are scattered around,\nbut there aren\'t really enough of them to say that the\ncommunity is integrated. As we will see when we develop\nthe math behind our methodology, a community that is\nnot that diverse cannot really be that integrated either,\nno matter how the individuals are distributed.\n\n### From Visuals to Mathematics\n\nWe will introduce our metrics one\nby one, starting with diversity, then integration,\nand finally segregation. 
Informed by the visuals\nabove, we\'ll try to come up with definitions that\nmake sense and can be translated into mathematical\nequations and then into code.\n\n\n#### Diversity\n\nLet\'s begin with a working definition of diversity.\nWe say a community is diverse if an average \nmember of the community is likely to encounter \npeople who are not members of their group as they \nnavigate the community. This idea has been \nproposed multiple times across different fields.\nIt is known variously as the \n[Gini-Simpson index](https://en.wikipedia.org/wiki/Diversity_index#Gini%E2%80%93Simpson_index),\nthe Gibbs-Martin index,\nthe Blau index, and\nexpected heterozygosity.\n\nNow let\'s turn that into math. In order to\ncompute the average chance a member of the \ncommunity encounters someone of a different\ngroup, we will first compute, for each group,\nthe chance that a random person from\nthe entire population comes from a different\ngroup. We will then compute the average\nacross all groups, weighting each group by its\nshare of the population.\n\nLet\'s start with the population shown here:\n\n![a community](docs/_static/d-and-s2.png?raw=true)\n\nAll three groups are the same size. Let\'s start\nwith the blue group. The chance that a randomly\nchosen member of the population is a member of\nthe blue group is thus $1/3$, or \napproximately $0.333$. We\'ll call this number \n$p$. \n\nThe probability that a member of\nthe blue group encounters someone of one of the\nother two groups when they encounter a random\nperson from the entire population is \n$1 - p = 2/3$, or approximately $0.667$.\n\nSince all three groups are of the same size,\nthey all have the same value of $p$. 
We can\nsummarize this in a table as follows:\n\n| Group                                     | Representation $p$ | Chance of Encountering a Member of Another Group $= 1 - p$ |\n|-------------------------------------------|:--------------------:|:------------------------------------------------------------:|\n| <span style="color:blue;">Blue</span>     |       $0.333$        |                           $0.667$                            |\n| <span style="color:orange;">Orange</span> |       $0.333$        |                           $0.667$                            |\n| <span style="color:green;">Green</span>   |       $0.333$        |                           $0.667$                            |\n\nIf we define the diversity of the population $D$ to be the average chance\nof any member of the population encountering a member of another group, then \nin this example it is\n\n$$D = 0.333(0.667) + 0.333(0.667) + 0.333(0.667) = 0.667.$$\n\nEach of the three terms is for one of the three groups, and for \neach of them the fraction of the population in the group is\n$0.333$ and the chance of encountering a member of another group\nis $0.667$.\n\n> ### Some additional strictly optional mathematical details\n>\n> (Feel free to skip this section if you like.)\n>  \n> More formally, what we computed is \n>\n> $$D = \\sum p(1 - p).$$\n>\n> The [Gini-Simpson index](https://en.wikipedia.org/wiki/Diversity_index#Gini%E2%80%93Simpson_index)\n> formulation of diversity is normally written as the equivalent expression\n>\n> $$D = 1 - \\sum p^2.$$\n>\n> The two are equivalent because \n> \n> $$D = \\sum p(1 - p) = \\sum p - \\sum p^2 = 1 - \\sum p^2.$$\n> \n> The last step works because the $p$ values are probabilities for each group, so they add up to $1$, i.e. 
$\\sum p = 1$.\n>\n> But in our discussion we stick to the earlier formulation because we think\n> it more clearly expresses what we are computing and why, especially for\n> small examples like the ones we are considering here.\n\nNow let\'s look at another example. It is one of the communities we\nlooked at above.\n\n![a community](docs/_static/d-and-i.png?raw=true)\n\nIn this example, each of the groups is also exactly one third of\nthe population of the community, so we have exactly the same numbers as before:\n\n| Group                                     | Representation $p$ | Chance of Encountering a Member of Another Group $= 1 - p$ |\n|-------------------------------------------|:--------------------:|:------------------------------------------------------------:|\n| <span style="color:blue;">Blue</span>     |       $0.333$        |                           $0.667$                            |\n| <span style="color:orange;">Orange</span> |       $0.333$        |                           $0.667$                            |\n| <span style="color:green;">Green</span>   |       $0.333$        |                           $0.667$                            |\n\nAnd again, \n\n$$D = 0.333(0.667) + 0.333(0.667) + 0.333(0.667) = 0.667.$$\n\nSo this community has exactly the same diversity as the last one,\nthough it is clearly more integrated. We\'ll return to that later.\n\nFinally, let\'s look at a less diverse community. Again, this is\none we looked at before.\n\n![a community](docs/_static/nd-and-ni.png?raw=true)\n\nLet\'s compute the $p$ for each of the three groups, and then\ncompute $D$. 
We\'ll add a column to our table\nwhere we will compute $p(1-p)$ for each group, and then we \nwill sum these up at the bottom of the table to get $D$.\n\n| Group                                     |   $p$   | $1 - p$ | Weighted Representation $p(1-p)$ |\n|-------------------------------------------|:---------:|:---------:|:----------------------------------:|\n| <span style="color:blue;">Blue</span>     | $0.963$ | $0.037$ |             $0.036$              |\n| <span style="color:orange;">Orange</span> | $0.022$ | $0.978$ |             $0.022$              |\n| <span style="color:green;">Green</span>   | $0.015$ | $0.985$ |             $0.015$              |\n| Weighted Sum                              ||| $0.073$ |\n\nSo this community\'s diversity is $0.073$. As we would expect\nfrom visual inspection, it is much lower than the diversity of the\nprevious two communities ($0.667$).\n\n#### Integration\n\nAs we saw above, two communities can have exactly the same diversity,\nbut to the eye appear to be very different when it comes to integration.\nIntegration is all about whether the members of a diverse community\nactually do interact, as the definition of diversity assumes they\ndo, by randomly encountering one another, or if, on the contrary,\nthey live in segregated neighborhoods within the community in which\nthey rarely encounter members of other groups.\n\nIn order to make this notion of integration a little more formal,\nin a way that we can then write math and code to compute it, we will\nsay that a community is integrated if the average member of the community \nis likely to encounter people who are not members of their \ngroup as they navigate their local neighborhood within the community.\nAnother way of putting this is that if the neighborhoods within a \ncommunity are diverse, then the community is integrated. 
If the community\nas a whole is diverse, but none of the neighborhoods in the community\nare themselves diverse, then the community is not integrated.\nMathematically speaking, integration is the population-weighted average of \nneighborhood diversity in the community.\n\nLet\'s look at an example of a community consisting of three neighborhoods\nof equal population.\n\n![a community of neighborhoods](docs/_static/n-d-and-i.png?raw=true)\n\nWe can compute the diversity within each of the three neighborhoods.\nSince each neighborhood has exactly equal numbers of members of each\ngroup, each neighborhood has diversity $D = 0.667$. We get this \nnumber by doing the exact same kind of calculation we did for the \ndiversity of the community with equal numbers of members of each group above. \nWe just repeat it three times, once for each neighborhood.\n\nNow, let\'s define $r$ for each neighborhood to be the fraction of\nthe total population of the community that lives in the neighborhood.\nFor our current example, $r = 1/3$ for each of the three \nneighborhoods since they are of equal size. \n\nKnowing $r$ and $D$ for each neighborhood, we can compute the\nintegration of the community by multiplying the $r$ and $D$ values\ntogether for each neighborhood and summing them up, so that\n$I = \\sum rD$. We do this in the\nfollowing table.\n\n| Neighborhood |   $r$   |   $D$   | Weighted Diversity $rD$ |\n|--------------|:---------:|:---------:|:-------------------------:|\n| A            | $0.333$ | $0.667$ |         $0.222$         |\n| B            | $0.333$ | $0.667$ |         $0.222$         |\n| C            | $0.333$ | $0.667$ |         $0.222$         |\n| Weighted Sum ||| $0.667$ |\n\nSo the integration of our community is $I = 0.667$. 
This is \nexactly the same as the overall diversity of the community.\n\nWe won\'t go into the details here, but one of the consequences\nof the way we set up our mathematical definitions of diversity\nand integration is that the value of $I$ for a community can\nnever be more than the value of $D$. That is, $I \\le D$ in\nall cases. More generally, $I$ and $D$ are also both between\n$0$ and $1$, so $0 \\le I \\le D \\le 1$ no matter how our\ncommunity and the neighborhoods within it are constructed. No \nmatter how big the community is, how big the neighborhoods are,\nwhether the neighborhoods are all the same size or not, or how\nmany groups there are, the fundamental relationship \n\n$$0 \\le I \\le D \\le 1$$\n\nwill always hold true.\n\nNow let\'s look at a community where $I < D$, meaning that \nintegration is less than diversity in the community.\n\n![a community of neighborhoods](docs/_static/n-d-and-i2.png?raw=true)\n\nIf we repeat our calculation of $D$ for each neighborhood and\nuse that to calculate $I$ again, we get\n\n| Neighborhood |   $r$   |   $D$   | Weighted Diversity $rD$ |\n|--------------|:---------:|:---------:|:-------------------------:|\n| A            | $0.333$ | $0.667$ |         $0.222$         |\n| B            | $0.333$ | $0.444$ |         $0.148$         |\n| C            | $0.333$ | $0.444$ |         $0.148$         |\n| Weighted Sum ||| $0.519$ |\n\nSo $I = 0.519$ for this community. Looking at this community vs. the\nprevious one, it does appear to be less integrated. Neighborhood A\nis diverse, with equal numbers of each of the three groups, but the other\ntwo neighborhoods are less diverse. Each of them is completely lacking\none of the three groups and has unequal numbers of the other two.\n\nSo the math, which produces $I = 0.519 < D = 0.667$\nfor this community, makes sense. 
The way the people in the \ncommunity are divided up into neighborhoods results in integration\nbeing less than the diversity of the community as a whole. This is in\ncontrast to the previous example where each neighborhood was as diverse\nas the whole community, and as a result, $I$ was equal to $D$.\n\nNow let\'s look at a third example, one in which the diversity of the \ncommunity as a whole was already low, and even the limited diversity\nthat exists is not shared among the neighborhoods. This should result\nin a value of $I$ even lower than the already low value of $D$.\n\n![a community of neighborhoods](docs/_static/n-nd-and-s.png?raw=true)\n\nIf we do our calculation of $D$ as we did above when we looked at \nthis community without the neighborhood boundaries, $D = 0.073$.\nNow let\'s calculate $I$.\n\n| Neighborhood |   $r$   |   $D$   | Weighted Diversity $rD$ |\n|--------------|:---------:|:---------:|:-------------------------:|\n| A            | $0.333$ | $0.000$ |         $0.000$         |\n| B            | $0.333$ | $0.000$ |         $0.000$         |\n| C            | $0.333$ | $0.204$ |         $0.068$         |\n| Weighted Sum ||| $0.068$ |\n\nTwo of the neighborhoods (A and B) have no diversity at all. Neighborhood\nC has a little bit. 
The overall integration of the community is $I = 0.068$,\nwhich is less than the diversity of $D = 0.073$ as we expected.\n\nFinally, just to drive home the point that diversity and integration are\ndifferent concepts, let\'s look at a community with high diversity but\nno integration at all.\n\n![a community of neighborhoods](docs/_static/n-d-and-s.png?raw=true)\n\nOverall diversity of the community is $D = 0.667$, but if we calculate\n$I$ we get\n\n| Neighborhood |   $r$   |   $D$   | Weighted Diversity $rD$ |\n|--------------|:---------:|:---------:|:-------------------------:|\n| A            | $0.333$ | $0.000$ |         $0.000$         |\n| B            | $0.333$ | $0.000$ |         $0.000$         |\n| C            | $0.333$ | $0.000$ |         $0.000$         |\n| Weighted Sum ||| $0.000$ |\n\n$I = 0$. Despite the community being diverse, it is not integrated at all.\n\n#### Segregation\n\nSegregation is the opposite of integration. Since we know that for\nall communities, $0 \\le I \\le 1$ we simply define segregation as $S = 1 - I$.\nWe don\'t generally use $S$ as often as we use $D$ and $I$, since it is\nso related to $I$, but for completeness, the `divintseg` library can \ncompute it.\n\n## Code Examples\n\nNow that we have gone through the methodology behind `divintseg` at \nlength, let\'s look at some examples of how to use the code itself.\n\nIn most cases, data we will want to analyze with `divintseg` will\nexist in Pandas DataFrames, or in some other format that is easy\nto convert to a DataFrame. We\'ll use them in our examples.\n\n### Diversity\n\nWe begin with some diversity computations.\n\nFirst, let\'s start with a very simple example consisting of a single-row\nDataFrame with a column for each group. The numbers in the columns represent\nthe number of people in the community that belong to each group.\nThe first community we looked at had 108 members of each group. 
So we could\nconstruct it in code and compute its diversity as follows:\n\n```python\nimport pandas as pd\n\nfrom divintseg import diversity\n\ndf = pd.DataFrame(\n    [[108, 108, 108]],\n    columns=[\'blue\', \'green\', \'orange\']\n)\n\nprint(diversity(df))\n```\n\nThis will print \n\n```text\n0    0.666667\nName: diversity, dtype: float64\n```\n\nThe return value of the call to `diversity(df)` is a pandas Series with a \nsingle element, the diversity of the single row of `df`. And as we \nwould expect, it got the same number we calculated manually above.\n\nNow let\'s try something a little more advanced, with three neighborhoods\nin a community like in our examples above.\n\n```python\nimport pandas as pd\n\nfrom divintseg import diversity\n\ndf = pd.DataFrame(\n    [\n        [\'A\', 36, 36, 36],\n        [\'B\', 36, 36, 36],\n        [\'C\', 36, 36, 36],\n    ],\n    columns=[\'neighborhood\', \'blue\', \'green\', \'orange\']\n)\n\ndf.set_index(\'neighborhood\', inplace=True)\n\nprint(diversity(df))\n```\n\nThis time the output is \n\n```text\nneighborhood\nA    0.666667\nB    0.666667\nC    0.666667\nName: diversity, dtype: float64\n```\n\n`diversity(df)` calculated the diversity of each row independently.\nAgain we reproduced some of the same results as we got manually \nabove.\n\nNow let\'s try another example with some different diversity in\ndifferent neighborhoods.\n\n```python\nimport pandas as pd\n\nfrom divintseg import diversity\n\ndf = pd.DataFrame(\n    [\n        [\'A\', 36, 36, 36],\n        [\'B\', 72, 0, 36],\n        [\'C\', 0, 72, 36],\n    ],\n    columns=[\'neighborhood\', \'blue\', \'green\', \'orange\'],\n)\n\ndf.set_index(\'neighborhood\', inplace=True)\n\nprint(diversity(df))\n```\n\nNow the output is \n\n```text\nneighborhood\nA    0.666667\nB    0.444444\nC    0.444444\nName: diversity, dtype: float64\n```\n\njust as we would expect.\n\n### Integration\n\nNow let\'s move on to integration. 
The API is almost as simple\nas for diversity, but we have to specify what column or index\nrepresents the neighborhood. \n\nMore generally, we might\nnot actually be working with neighborhoods, but with various\nother kinds of nested geographic areas. For example, if we \nare working with US Census data, we might be interested in\nintegration at the block group level computed over the diversity\nof the different blocks in the block group. But we might\nalso want to skip a level in the census hierarchy and compute\nthe integration of census tracts (groups of multiple block groups)\nover diversity down at the block level. The `integration` API\ngives us the flexibility to choose how we do this.\n\nHere is an example where we put two communities in the same\nDataFrame. The first, community `"X"`, has equally diverse\nneighborhoods. The second, community `"Y"`, has unequally\ndiverse neighborhoods.\n\n```python\nimport pandas as pd\n\nfrom divintseg import integration\n\ndf = pd.DataFrame(\n    [\n        [\'X\', \'A\', 36, 36, 36],\n        [\'X\', \'B\', 36, 36, 36],\n        [\'X\', \'C\', 36, 36, 36],\n        \n        [\'Y\', \'A\', 36, 36, 36],\n        [\'Y\', \'B\', 72, 0, 36],\n        [\'Y\', \'C\', 0, 72, 36],\n    ],\n    columns=[\'community\', \'neighborhood\', \'blue\', \'green\', \'orange\'],\n)\n\nprint(integration(df, by=\'community\', over=\'neighborhood\'))\n```\n\nThe two keyword arguments are important. The first, `by=\'community\'`, tells\nthe API that we want our results by community. There are two unique communities\nin the data, `"X"` and `"Y"`, so we should get two results. 
The second keyword,\n`over=\'neighborhood\'`, tells the API which column represents the inner level\nof geography at which to compute the diversity numbers that we then aggregate\nup to the level specified by the `by=` argument.\n\nThe result is \n\n```text\n           integration\ncommunity             \nX             0.666667\nY             0.518519\n```\n\nThis again matches the results we computed manually for these example communities\nand neighborhoods.\n\n### Diversity, Integration (and Segregation) All at Once\n\nMore often than not, we want to compute diversity and integration\nfor the same communities at the same time. We can do that with a single\nAPI, `divintseg.di`. It can optionally report segregation as well.\nHere is how to use it.\n\n```python\nimport pandas as pd\n\nfrom divintseg import di\n\ndf = pd.DataFrame(\n    [\n        [\'W\', \'A\', 108, 0, 0],\n        [\'W\', \'B\', 0, 108, 0],\n        [\'W\', \'C\', 0, 0, 108],\n        \n        [\'X\', \'A\', 36, 36, 36],\n        [\'X\', \'B\', 36, 36, 36],\n        [\'X\', \'C\', 36, 36, 36],\n        \n        [\'Y\', \'A\', 36, 36, 36],\n        [\'Y\', \'B\', 72, 0, 36],\n        [\'Y\', \'C\', 0, 72, 36],\n        \n        [\'Z\', \'A\', 108, 0, 0],\n        [\'Z\', \'B\', 108, 0, 0],\n        [\'Z\', \'C\', 96, 5, 7],\n    ],\n    columns=[\'community\', \'neighborhood\', \'blue\', \'green\', \'orange\'],\n)\n\nprint(di(df, by=\'community\', over=\'neighborhood\', add_segregation=True))\n```\n\nThis gives us everything we would want to know about diversity, \nintegration, and segregation in these communities in one output\nDataFrame.\n\n```text\n           diversity  integration  segregation\ncommunity      \nW           0.666667     0.000000     1.000000\nX           0.666667     0.666667     0.333333\nY           0.666667     0.518519     0.481481\nZ           0.071997     0.067844     0.932156\n```\n',
    'author': 'Darren Vengroff',
    'author_email': 'None',
    'maintainer': 'None',
    'maintainer_email': 'None',
    'url': 'https://github.com/vengroff/divintseg',
    'packages': packages,
    'package_data': package_data,
    'install_requires': install_requires,
    'extras_require': extras_require,
    'python_requires': '>=3.9,<4.0',
}


setup(**setup_kwargs)
