github-org-scripts/notebooks/UserSearchPy3.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0c9feb99",
   "metadata": {
    "incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true",
    "lines_to_next_cell": 0,
    "tags": []
   },
   "source": [
    "# User Search\n",
    "Use this notebook to:\n",
    "1. Try to find a GitHub account from whatever partial information you have (name, email address, etc.)\n",
    "2. List all orgs they belong to (from a subset)\n",
    "  - You need org owner permissions to perform these searches."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "631c763f",
   "metadata": {
    "heading_collapsed": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "# Boiler plate\n",
    "Skip/hide this. Common usage is below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a7d2a42",
   "metadata": {
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "If you can see this text, consider enabling the \"Collapsible Headings\" nbextension so that you can hide this section in normal use."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f517002c",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "## Tune as needed\n",
    "\n",
    "Several functions are memoized with `functools.lru_cache`, and many of them are called `len(orgs_to_check)` times per search. If a cache's `maxsize` is too small, entries are evicted and re-fetched from GitHub, and run times get quite long. (Only the first query should be slow; after that, all data should be in the cache.)\n",
    "\n",
    "See the \"cache reporting\" cell below."
   ]
  },
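  {
   "cell_type": "markdown",
   "id": "sketch-lru-md",
   "metadata": {
    "hidden": true
   },
   "source": [
    "A minimal, self-contained sketch of how an undersized `lru_cache` shows up in `cache_info()`. The function and org names are hypothetical stand-ins, not this notebook's code: each org is looked up twice, and when `maxsize` is smaller than the number of orgs, least-recently-used eviction turns every repeat lookup into a miss (the slow case described above). With a large enough `maxsize`, the second pass is served entirely from the cache."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-lru-code",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "from functools import lru_cache\n",
    "\n",
    "\n",
    "def run_lookups(maxsize):\n",
    "    \"\"\"Look up each (hypothetical) org twice and return the cache statistics.\"\"\"\n",
    "\n",
    "    @lru_cache(maxsize=maxsize)\n",
    "    def lookup(org_name):\n",
    "        return org_name.lower()  # stand-in for an expensive GitHub API call\n",
    "\n",
    "    orgs = [\"mozilla\", \"mozilla-services\", \"Pocket\"]  # hypothetical orgs_to_check\n",
    "    for _ in range(2):\n",
    "        for org in orgs:\n",
    "            lookup(org)\n",
    "    return lookup.cache_info()\n",
    "\n",
    "\n",
    "print(run_lookups(2))  # CacheInfo(hits=0, misses=6, maxsize=2, currsize=2)\n",
    "print(run_lookups(4))  # CacheInfo(hits=3, misses=3, maxsize=4, currsize=3)"
   ]
  },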
  {
   "cell_type": "markdown",
   "id": "9033e078",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "#### Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f73b7313",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# Set some initial values\n",
    "\n",
    "# for now, ACL lookup is quite time-consuming, so assume it will be done in a separate run\n",
    "defer_acl_lookup = True\n",
    "\n",
    "# for now, assume the person doing the GitHub offboard is not doing the Heroku offboard\n",
    "defer_heroku_lookup = True\n",
    "\n",
    "# There are 3 ways to set the orgs to examine:\n",
    "#   1. use a canned list\n",
    "#   2. use a short list of 3 orgs for testing\n",
    "#   3. get a list of accessible orgs from GitHub\n",
    "#\n",
    "# #3 is preferred - the others are fallbacks\n",
    "\n",
    "use_github_org_list = True\n",
    "use_test_org_list = False\n",
    "use_canned_org_list = False\n",
    "\n",
    "# hacks for typing\n",
    "from typing import Any, List\n",
    "\n",
    "gh: Any = None\n",
    "\n",
    "# bogus line to catch bad type hint\n",
    "fred = 3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5a4c1c6",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "## Code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd0faa9e",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "### main code (CIS/IAM)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65702c18",
   "metadata": {
    "hidden": true
   },
   "source": [
    "Not every operator has valid credentials for the CIS system, so fail gracefully if they are missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91493eae",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def check_CIS(email):\n",
    "    login = first_name = None\n",
    "    if _has_cis_access():\n",
    "        login, first_name = _get_cis_info(email)\n",
    "        display(f\"CIS info for {email} reports '{login}', first name of '{first_name}'\")\n",
    "    else:\n",
    "        display(\"Skipping CIS check, no token available.\")\n",
    "    return login, first_name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75161391",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "print(sys.executable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c96c7434",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def _has_cis_access():\n",
    "    import os\n",
    "\n",
    "    return os.environ.get(\"CIS_CLIENT_ID\", \"\") and os.environ.get(\n",
    "        \"CIS_CLIENT_SECRET\", \"\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4716f716",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import requests\n",
    "\n",
    "_cis_bearer_token = None\n",
    "\n",
    "\n",
    "def _get_cis_bearer_token():\n",
    "    \"\"\"Fetch an OAuth bearer token for the CIS person API, caching it for later calls.\"\"\"\n",
    "    global _cis_bearer_token\n",
    "    if not _cis_bearer_token:\n",
    "        url = \"https://auth.mozilla.auth0.com/oauth/token\"\n",
    "        headers = {\"Content-Type\": \"application/json\"}\n",
    "        payload = {\n",
    "            \"client_id\": os.environ[\"CIS_CLIENT_ID\"],\n",
    "            \"client_secret\": os.environ[\"CIS_CLIENT_SECRET\"],\n",
    "            \"audience\": \"api.sso.mozilla.com\",\n",
    "            \"grant_type\": \"client_credentials\",\n",
    "        }\n",
    "        resp = requests.post(url, json=payload, headers=headers)\n",
    "        data = resp.json()\n",
    "        _cis_bearer_token = data[\"access_token\"]\n",
    "    return _cis_bearer_token\n",
    "\n",
    "\n",
    "def _get_cis_info(email):\n",
    "    \"\"\"Return (GitHub login, first name) for email via CIS; either may be None.\"\"\"\n",
    "    import urllib.parse\n",
    "\n",
    "    bearer_token = _get_cis_bearer_token()\n",
    "    # first get the v4 id\n",
    "    url = \"https://person.api.sso.mozilla.com/v2/user/primary_email/{}?active=any\".format(\n",
    "        urllib.parse.quote(email)\n",
    "    )\n",
    "    headers = {\"Authorization\": f\"Bearer {bearer_token}\"}\n",
    "    resp = requests.get(url, headers=headers)\n",
    "    data = resp.json()\n",
    "    login = v4id = None\n",
    "    try:\n",
    "        first_name = data[\"first_name\"][\"value\"].lower()\n",
    "    except KeyError:\n",
    "        try:\n",
    "            print(f\"DEBUG: {data=}\")\n",
    "            print(f\"DEBUG: {data['first_name']=}\")\n",
    "            print(f\"DEBUG: {data['first_name']['value']=}\")\n",
    "        except KeyError:\n",
    "            pass\n",
    "        first_name = None\n",
    "    try:\n",
    "        v4id = data[\"identities\"][\"github_id_v4\"][\"value\"]\n",
    "    except KeyError:\n",
    "        pass\n",
    "    if v4id:\n",
    "        # if there was a v4 id, map it to a login, via graphQL\n",
    "        query = \"\"\"\n",
    "            query id_lookup($id_to_check: ID!) {\n",
    "              node(id: $id_to_check) {\n",
    "                ... on User {\n",
    "                  login\n",
    "                  id\n",
    "                  databaseId\n",
    "                }\n",
    "              }\n",
    "            }\n",
    "            \"\"\"\n",
    "        # GraphQL variables are sent as a JSON object, not a hand-built string\n",
    "        variables = {\"id_to_check\": str(v4id)}\n",
    "        url = \"https://api.github.com/graphql\"\n",
    "        headers = {\"Authorization\": f\"Token {api_key}\"}\n",
    "        payload = {\n",
    "            \"query\": query,\n",
    "            \"variables\": variables,\n",
    "        }\n",
    "        resp = requests.post(url, headers=headers, json=payload)\n",
    "        try:\n",
    "            data = resp.json()\n",
    "            login = data[\"data\"][\"node\"][\"login\"].lower()\n",
    "        except (TypeError, KeyError):\n",
    "            login = None\n",
    "    return login, first_name"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49c6946a",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "### Debug CIS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d823ee7",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# paste an address between the triple quotes; surrounding whitespace is stripped\n",
    "email = \"\"\"\n",
    "hwine@mozilla.com\n",
    "\"\"\".strip()\n",
    "print(_get_cis_info(email))\n",
    "print(\"done\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ca2baca",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "### main code (GitHub)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e5e6e14",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### helpers (GitHub3.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50939c45",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "# print some debug information\n",
    "import github3\n",
    "\n",
    "print(github3.__version__)\n",
    "print(github3.__file__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3902e792",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.executable"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d84078ac",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "# set values here - you can also override below\n",
    "import os\n",
    "\n",
    "api_key = os.environ.get(\"GITHUB_PAT\", \"\")\n",
    "if not api_key:\n",
    "    raise OSError(\"no GitHub PAT found; set the GITHUB_PAT environment variable\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a97b9cb6",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "import time\n",
    "import ipywidgets, IPython\n",
    "print(ipywidgets.__file__)\n",
    "print(IPython.__file__)\n",
    "print(IPython.display.__file__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e26441e6",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
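    "def display_javascript_alert(text):\n",
    "    import json\n",
    "\n",
    "    from IPython.display import Javascript\n",
    "\n",
    "    # json.dumps() yields a valid JavaScript string literal, so quotes and\n",
    "    # newlines in text (e.g. from exception messages) can't break the alert\n",
    "    simpjs = Javascript(f\"alert({json.dumps(text)})\")\n",
    "    display(simpjs)\n",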
    "\n",
    "\n",
    "def failed_initialization(msg):\n",
    "    display_javascript_alert(f\"Initialization failure: '{msg}'\")\n",
    "    raise SystemExit(3, msg)\n",
    "\n",
    "# failed_initialization(\"does it work?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f640ee45",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def try_login():\n",
    "    global gh\n",
    "    gh = None\n",
    "    try:\n",
    "        gh = github3.login(token=api_key)\n",
    "        print(f\"You are authenticated as {gh.me().login}\")\n",
    "    except (github3.exceptions.AuthenticationFailed, github3.exceptions.ForbiddenError):\n",
    "        failed_initialization(\"Invalid credentials\")\n",
    "        raise\n",
    "    except (github3.exceptions.ConnectionError) as e:\n",
    "        failed_initialization(f\"Exception while opening connection (type {type(e)}):\\n{str(e)}\")\n",
    "        raise\n",
    "    except Exception as e:\n",
    "        failed_initialization(f\"Unexpected exception while opening connection (type {type(e)}):\\n{str(e)}\")\n",
    "        raise\n",
    "\n",
    "try_login()\n",
    "from pprint import pprint\n",
    "pprint(gh)\n",
    "if gh is None:\n",
    "    # give up if we can't login at script start\n",
    "    raise github3.exceptions.ConnectionError(\"Couldn't log in to GitHub\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0bcb67b",
   "metadata": {
    "hidden": true
   },
   "source": [
    "From here on, use ``gh`` to access all GitHub data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b01c7d37",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "# set the local timezone from $TZ (the container itself runs in UTC)\n",
    "print(f'local timezone is {os.environ.get(\"TZ\", \"unset\")}.')\n",
    "time.tzset()\n",
    "\n",
    "\n",
    "def print_limits(e=None, verbose=False):\n",
    "    if e:\n",
    "        #         display(\"API limit reached, try again in 5 minutes.\\n\")\n",
    "        display(str(e))\n",
    "\n",
    "    reset_max = reset_min = 0\n",
    "    limits = gh.rate_limit()\n",
    "    resources = limits[\"resources\"]\n",
    "    #     print(\"{:3d} keys: \".format(len(resources.keys())), resources.keys())\n",
    "    #     print(resources)\n",
    "    from pprint import pprint as pp\n",
    "    pp(f\"{limits=}\")\n",
    "    for reset in list(resources.keys()):\n",
    "        reset_at = resources[reset][\"reset\"]\n",
    "        reset_max = max(reset_at, reset_max)\n",
    "        if not resources[reset][\"remaining\"]:\n",
    "            reset_min = min(reset_at, reset_min if reset_min else reset_at)\n",
    "            if verbose:\n",
    "                print(\"EXPIRED for {} {}\".format(reset, resources[reset][\"remaining\"]))\n",
    "        else:\n",
    "            if verbose or reset == \"search\":\n",
    "                print(\n",
    "                    \"remaining for {} {}\".format(reset, resources[reset][\"remaining\"])\n",
    "                )\n",
    "\n",
    "    if not reset_min:\n",
    "        print(\"No limits reached currently.\")\n",
    "    else:\n",
    "        print(\n",
    "            \"Minimum reset at {} UTC ({})\".format(\n",
    "                time.asctime(time.gmtime(reset_min)),\n",
    "                time.asctime(time.localtime(reset_min)),\n",
    "            )\n",
    "        )\n",
    "    print(\n",
    "        \"All reset at {} UTC ({})\".format(\n",
    "            time.asctime(time.gmtime(reset_max)),\n",
    "            time.asctime(time.localtime(reset_max)),\n",
    "        )\n",
    "    )\n",
    "\n",
    "def wait_for_quota(which: str) -> None:\n",
    "    \"\"\"Wait for the named rate-limit resource (e.g. 'search') to reset if it is nearly exhausted.\"\"\"\n",
    "    limits = gh.rate_limit()\n",
    "    the_limit = limits[\"resources\"][which]\n",
    "    if the_limit[\"remaining\"] < 2:\n",
    "        # clamp at 0: the reset time may already be past, and time.sleep() rejects negatives\n",
    "        wait_seconds = max(0, the_limit[\"reset\"] - int(time.time()) + 1)\n",
    "        print(f\"waiting for {which} limit to reset at {time.asctime(time.localtime(the_limit['reset']))}\"\n",
    "             f\", {wait_seconds} seconds from now\")\n",
    "        time.sleep(wait_seconds)\n",
    "\n"
   ]
  },
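  {
   "cell_type": "markdown",
   "id": "sketch-limits-md",
   "metadata": {
    "hidden": true
   },
   "source": [
    "`print_limits()` and `wait_for_quota()` both read the `resources` mapping from `gh.rate_limit()`, in which each entry has a `remaining` count and a `reset` time in epoch seconds. The sketch below repeats that arithmetic on a hand-written sample so it runs without a GitHub token; `summarize_limits` is a hypothetical helper (not part of this notebook) and the numbers are made up. It returns the earliest reset among exhausted resources (what `print_limits()` reports as the minimum reset), the latest reset overall, and how long a caller would sleep, clamped at zero."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-limits-code",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def summarize_limits(resources, now=None):\n",
    "    \"\"\"Return (earliest reset of an exhausted resource or None, latest reset, seconds to wait).\"\"\"\n",
    "    now = int(time.time()) if now is None else now\n",
    "    exhausted = [r[\"reset\"] for r in resources.values() if not r[\"remaining\"]]\n",
    "    reset_min = min(exhausted) if exhausted else None\n",
    "    reset_max = max(r[\"reset\"] for r in resources.values())\n",
    "    # never negative: a reset time in the past means there is nothing to wait for\n",
    "    wait_seconds = max(0, reset_min - now + 1) if reset_min is not None else 0\n",
    "    return reset_min, reset_max, wait_seconds\n",
    "\n",
    "\n",
    "sample = {  # made-up values, shaped like gh.rate_limit()[\"resources\"]\n",
    "    \"core\": {\"limit\": 5000, \"remaining\": 4990, \"reset\": 1_700_003_600},\n",
    "    \"search\": {\"limit\": 30, \"remaining\": 0, \"reset\": 1_700_000_060},\n",
    "}\n",
    "print(summarize_limits(sample, now=1_700_000_000))  # (1700000060, 1700003600, 61)"
   ]
  },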
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d228384d",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "from functools import lru_cache\n",
    "\n",
    "print_limits()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "117face6",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    failure_reason = None\n",
    "    orgs_to_check = set()\n",
    "    if use_github_org_list:\n",
    "        my_login = gh.me().login\n",
    "        # while a comprehension looks nice here, it doesn't let us recover from permission errors in the loops,\n",
    "        # which can happen when some orgs change parameters from \"expected\"\n",
    "        # orgs_to_check = set([o.login for o in gh.organizations() if my_login in [x.login for x in o.members(role=\"admin\")]])\n",
    "        for o in gh.organizations():\n",
    "            try:\n",
    "                if my_login in [x.login for x in o.members(role=\"admin\")]:\n",
    "                    orgs_to_check.add(o.login)\n",
    "            except github3.exceptions.ForbiddenError as e:\n",
    "                display_javascript_alert(f\"Skipping org {o.login}, as your token doesn't work with it. ({str(e)})\")\n",
    "        if not orgs_to_check:\n",
    "            failure_reason = \"User isn't an admin for any orgs\"\n",
    "\n",
    "    elif use_test_org_list:\n",
    "        orgs_to_check = set(\n",
    "            \"\"\"\n",
    "            mozilla-services\n",
    "            mozilla\n",
    "            pocket\n",
    "        \"\"\".split()\n",
    "        )\n",
    "    elif use_canned_org_list:  # old school\n",
    "        orgs_to_check = set(\n",
    "            \"\"\"\n",
    "        Mozilla-Commons\n",
    "        Mozilla-Games\n",
    "        Mozilla-TWQA\n",
    "        MozillaDPX\n",
    "        MozillaDataScience\n",
    "        MozillaFoundation\n",
    "        MozillaReality\n",
    "        MozillaSecurity\n",
    "        MozillaWiki\n",
    "        Pocket\n",
    "        Thunderbird-client\n",
    "        devtools-html\n",
    "        firefox-devtools\n",
    "        fxos\n",
    "        fxos-eng\n",
    "        iodide-project\n",
    "        mdn\n",
    "        moz-pkg-testing\n",
    "        mozilla\n",
    "        mozilla-applied-ml\n",
    "        mozilla-archive\n",
    "        mozilla-b2g\n",
    "        mozilla-bteam\n",
    "        mozilla-conduit\n",
    "        mozilla-extensions\n",
    "        mozilla-frontend-infra\n",
    "        mozilla-iam\n",
    "        mozilla-it\n",
    "        mozilla-jetpack\n",
    "        mozilla-l10n\n",
    "        mozilla-lockbox\n",
    "        mozilla-lockwise\n",
    "        mozilla-metrics\n",
    "        mozilla-mobile\n",
    "        mozilla-partners\n",
    "        mozilla-platform-ops\n",
    "        mozilla-private\n",
    "        mozilla-rally\n",
    "        mozilla-releng\n",
    "        mozilla-services\n",
    "        mozilla-spidermonkey\n",
    "        mozilla-standards\n",
    "        mozilla-svcops\n",
    "        mozilla-tw\n",
    "        mozmeao\n",
    "        nss-dev\n",
    "        nubisproject\n",
    "        projectfluent\n",
    "        taskcluster\n",
    "        \"\"\".split()\n",
    "        )\n",
    "    else:\n",
    "        failure_reason = \"No org list method enabled!\"\n",
    "\n",
    "except Exception as e:\n",
    "    failure_reason = f\"exception: {str(e)}\"\n",
    "\n",
    "if failure_reason:\n",
    "    failed_initialization(failure_reason)\n",
    "\n",
    "print(f\"{len(orgs_to_check):3d} orgs to check.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fd03249",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "from IPython import get_ipython\n",
    "get_ipython()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72aeed1f",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### helpers (GitHub searches)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6e6e75d",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# rate limiting hack\n",
    "# The search API allows 30 requests per minute when authenticated, i.e. one every\n",
    "# 60/30 = 2 seconds; _add_seconds_heuristic adds a safety margin on top of that.\n",
    "_last_query_interval: float = 0.0  # time.monotonic() when the last search finished\n",
    "_add_seconds_heuristic: float = 1\n",
    "_min_seconds_between_calls: float = (60.0 / 30) + _add_seconds_heuristic\n",
    "\n",
    "_max_usable_users: int = 10\n",
    "\n",
    "# ToDo if rate limiting still an issue, could terminate after returning\n",
    "# _max_usable_users\n",
    "\n",
    "def _search_for_user(user: str):\n",
    "    # we can only rate limit between calls\n",
    "    global _last_query_interval\n",
    "    seconds_to_wait = (_last_query_interval + _min_seconds_between_calls) - time.monotonic()\n",
    "    if seconds_to_wait > 0:\n",
    "        # print(f\"_search_for_user: waiting {seconds_to_wait} seconds.\")\n",
    "        time.sleep(seconds_to_wait)\n",
    "    # also honor the server-side quota (querying it does not count against the limit)\n",
    "    wait_for_quota(\"search\")\n",
    "    try:\n",
    "        # stop after _max_usable_users + 2 hits: enough for show_users() to tell\n",
    "        # that a term matches too many accounts to be useful\n",
    "        for i, hit in enumerate(gh.search_users(query=\"type:user \" + user)):\n",
    "            yield hit\n",
    "            if i > _max_usable_users:\n",
    "                break\n",
    "        _last_query_interval = time.monotonic()\n",
    "    except Exception as e:\n",
    "        print_limits(e, verbose=True)\n",
    "        raise\n",
    "\n",
    "\n",
    "@lru_cache(maxsize=512)\n",
    "def _search_for_org(user):\n",
    "    l = list(gh.search_users(query=\"type:org \" + user))\n",
    "    display(f\"found {len(l)} potentials for {user}\")\n",
    "    return l\n",
    "\n",
    "@lru_cache(maxsize=512)\n",
    "def get_users(user):\n",
    "    # display(u\"SEARCH '{}'\".format(user))\n",
    "    # Return a tuple, not a generator: lru_cache would cache the generator object\n",
    "    # itself, so every repeat call would get back an already-exhausted generator.\n",
    "    hits = tuple(_search_for_user(user))\n",
    "    display(f\"found {len(hits)} potentials for {user}\")\n",
    "    return hits"
   ]
  },
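  {
   "cell_type": "markdown",
   "id": "sketch-throttle-md",
   "metadata": {
    "hidden": true
   },
   "source": [
    "`_search_for_user()` spaces out searches by remembering when the previous one finished (`_last_query_interval`) and sleeping for whatever remains of `_min_seconds_between_calls`. The cell below is a hedged, standalone sketch of that minimum-interval idea as a small class. `MinIntervalThrottle` is hypothetical and not used elsewhere in this notebook; its injectable fake clock lets the example run instantly while recording the sleep that would have happened. Keeping `wait()` and `done()` separate mirrors the original, which stamps the time only after a search completes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-throttle-code",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "class MinIntervalThrottle:\n",
    "    \"\"\"Keep at least `interval` seconds between the end of one call and the start of the next.\"\"\"\n",
    "\n",
    "    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):\n",
    "        self.interval = interval\n",
    "        self.clock = clock\n",
    "        self.sleep = sleep\n",
    "        self.last = None  # clock() value recorded by the most recent done()\n",
    "\n",
    "    def wait(self):\n",
    "        \"\"\"Sleep for whatever is left of the interval since the last done().\"\"\"\n",
    "        if self.last is not None:\n",
    "            remaining = self.last + self.interval - self.clock()\n",
    "            if remaining > 0:\n",
    "                self.sleep(remaining)\n",
    "\n",
    "    def done(self):\n",
    "        \"\"\"Record that a call has just finished.\"\"\"\n",
    "        self.last = self.clock()\n",
    "\n",
    "\n",
    "# fake clock: \"sleeping\" just advances the time, so the demo runs instantly\n",
    "fake_now = [100.0]\n",
    "slept = []\n",
    "\n",
    "\n",
    "def fake_sleep(seconds):\n",
    "    slept.append(seconds)\n",
    "    fake_now[0] += seconds\n",
    "\n",
    "\n",
    "throttle = MinIntervalThrottle(3.0, clock=lambda: fake_now[0], sleep=fake_sleep)\n",
    "throttle.wait()      # first call: nothing to wait for\n",
    "fake_now[0] += 1.0   # pretend the search took 1 second\n",
    "throttle.done()      # last = 101.0\n",
    "fake_now[0] += 0.5   # other work between searches\n",
    "throttle.wait()      # sleeps 101.0 + 3.0 - 101.5 = 2.5 seconds\n",
    "print(slept)  # [2.5]"
   ]
  },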
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76f2f280",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "displayed_users = set()  # cache to avoid duplicate output\n",
    "\n",
    "\n",
    "def show_users(user_list, search_term):\n",
    "    global displayed_users, _max_usable_users\n",
    "    unique_users = set(user_list)\n",
    "    count = len(unique_users)\n",
    "    \n",
    "    if count > _max_usable_users:\n",
    "        # Even if there are too many, we still want to check the 'root' term, if it matched\n",
    "        try:\n",
    "            seed_user = gh.user(search_term)\n",
    "            if ' ' not in seed_user.login:\n",
    "                displayed_users.add(seed_user)\n",
    "                display(\n",
    "                    \"... too many to be useful, still trying '{}' ...\".format(\n",
    "                        seed_user.login\n",
    "                    )\n",
    "                )\n",
    "            else:\n",
    "                display(\n",
    "                    \"... too many to be useful, skipping phrase '{}' ...\".format(\n",
    "                        seed_user.login\n",
    "                    )\n",
    "                )\n",
    "        except github3.exceptions.NotFoundError as e:\n",
    "            display(f\"... too many to be useful, '{search_term}' is not a user\")\n",
    "    else:\n",
    "        for u in [x for x in unique_users if x not in displayed_users]:\n",
    "            displayed_users.add(u)\n",
    "            user = u.user.refresh()\n",
    "    if 0 < count <= _max_usable_users:\n",
    "        return [u.login for u in unique_users]\n",
    "    else:\n",
    "        return []\n",
    "\n",
    "\n",
    "from itertools import permutations\n",
    "\n",
    "\n",
    "def _permute_seeds(seeds):\n",
    "    if len(seeds) == 1:\n",
    "        yield seeds[0]\n",
    "    else:\n",
    "        for x, y in permutations(seeds, 2):\n",
    "            permutation = \" \".join([x, y])\n",
    "            display(f\"   trying phrase permutation {permutation}\")\n",
    "            yield permutation\n",
    "            permutation = \"\".join([x, y])\n",
    "            display(f\"   trying permutation {permutation}\")\n",
    "            yield permutation\n",
    "\n",
    "\n",
    "def gather_possibles(seeds):\n",
    "    found = set()\n",
    "    # sometimes get a phrase coming in - e.g. \"First Last\"\n",
    "    for seed in _permute_seeds(seeds.split()):\n",
    "        maybes = show_users(get_users(seed), seed)\n",
    "        found.update(maybes)\n",
    "        # if it was an email addr, try again with the mailbox name\n",
    "        if \"@\" in seed:\n",
    "            seed2 = seed.split(\"@\")[0]\n",
    "            # fix https://github.com/mozilla/github-org-scripts/issues/84\n",
    "            if len(seed2) >= 4:\n",
    "                display(f\"Searching for mailbox name '{seed2}' (gather_possibles)\")\n",
    "                maybes = show_users(get_users(seed2), seed2)\n",
    "                found.update(maybes)\n",
    "            else:\n",
    "                display(f\"Skipping search for '{seed2}' -- too short\")\n",
    "    return found"
   ]
  },
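  {
   "cell_type": "markdown",
   "id": "sketch-terms-md",
   "metadata": {
    "hidden": true
   },
   "source": [
    "To preview which search terms `gather_possibles()` will try for an input, the sketch below repeats the same expansion without the `display()` calls or GitHub searches. `candidate_terms` is a hypothetical helper that mirrors `_permute_seeds()` plus the mailbox-name fallback (minimum length 4); the name and addresses are made up. A two-word input produces four terms, and in general n words give 2 * n * (n - 1) terms before any mailbox-name retries, so every extra word multiplies the number of rate-limited searches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-terms-code",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "from itertools import permutations\n",
    "\n",
    "\n",
    "def candidate_terms(seeds):\n",
    "    \"\"\"Return, in order, the search terms gather_possibles() would try (sketch).\"\"\"\n",
    "    words = seeds.split()\n",
    "    if len(words) == 1:\n",
    "        expanded = [words[0]]\n",
    "    else:\n",
    "        expanded = []\n",
    "        for x, y in permutations(words, 2):\n",
    "            expanded.append(\" \".join([x, y]))  # phrase form, e.g. \"Jane Doe\"\n",
    "            expanded.append(\"\".join([x, y]))  # run-together form, e.g. \"JaneDoe\"\n",
    "    terms = []\n",
    "    for seed in expanded:\n",
    "        terms.append(seed)\n",
    "        # email addresses are retried with just the mailbox name, if it is long enough\n",
    "        if \"@\" in seed:\n",
    "            mailbox = seed.split(\"@\")[0]\n",
    "            if len(mailbox) >= 4:\n",
    "                terms.append(mailbox)\n",
    "    return terms\n",
    "\n",
    "\n",
    "print(candidate_terms(\"Jane Doe\"))  # ['Jane Doe', 'JaneDoe', 'Doe Jane', 'DoeJane']\n",
    "print(candidate_terms(\"jdoe@example.com\"))  # ['jdoe@example.com', 'jdoe']\n",
    "print(candidate_terms(\"jd@example.com\"))  # ['jd@example.com']"
   ]
  },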
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e76c1648",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "class OutsideCollaboratorIterator(github3.structs.GitHubIterator):\n",
    "    def __init__(self, org):\n",
    "        super().__init__(\n",
    "            count=-1,  # get all\n",
    "            url=org.url + \"/outside_collaborators\",\n",
    "            cls=github3.users.ShortUser,\n",
    "            session=org.session,\n",
    "        )\n",
    "\n",
    "\n",
    "@lru_cache(maxsize=512)\n",
    "def get_collaborators(org):\n",
    "    collabs = [x.login.lower() for x in OutsideCollaboratorIterator(org)]\n",
    "    return collabs\n",
    "\n",
    "\n",
    "def is_collaborator(org, login):\n",
    "    return bool(login.lower() in get_collaborators(org))\n",
    "\n",
    "\n",
    "# same interface for members -- but github3.py already provides the iterator\n",
    "@lru_cache(maxsize=512)\n",
    "def get_members(org):\n",
    "    members = [x.login.lower() for x in org.members()]\n",
    "    return members\n",
    "\n",
    "\n",
    "def is_member(org, login):\n",
    "    return bool(login.lower() in get_members(org))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7d190161",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "@lru_cache(maxsize=64)\n",
    "def get_org_owners(org):\n",
    "    owners = org.members(role=\"admin\")\n",
    "    logins = [x.login for x in owners]\n",
    "    return logins\n",
    "\n",
    "\n",
    "@lru_cache(maxsize=128)\n",
    "def get_inspectable_org_object(org_name):\n",
    "    try:\n",
    "        o = gh.organization(org_name)\n",
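    "        # make sure we have enough chops to inspect it: these calls are expected to\n",
    "        # raise ForbiddenError if the token can't read owners, members, or outside\n",
    "        # collaborators (\"qzu\" * 3 is a dummy login; only the API calls matter)\n",
    "        get_org_owners(o)\n",
    "        is_member(o, \"qzu\" * 3)\n",
    "        is_collaborator(o, \"qzu\" * 3)\n",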
    "    except github3.exceptions.NotFoundError:\n",
    "        o = None\n",
    "        display(f\"No such organization: '{org_name}'\")\n",
    "    except github3.exceptions.ForbiddenError as e:\n",
    "        o = None\n",
    "        display(f\"\\n\\nWARNING: Not enough permissions for org '{org_name}'\\n\\n\")\n",
    "    except Exception as e:\n",
    "        o = None\n",
    "        display(f\"didn't expect to get here: get_inspectable_org_object({org_name})\")\n",
    "        display(f\"  exception: {type(e)=}\")\n",
    "        from pprint import pformat\n",
    "        display(f\"{pformat(e)}\")\n",
    "    return o\n",
    "\n",
    "\n",
    "def check_login_perms(logins, headers=None, ldap=None):\n",
    "    any_perms = []\n",
    "    logins_with_hits = set()\n",
    "    any_perms.append(\"=\" * 30)\n",
    "    if headers:\n",
    "        any_perms.extend(headers)\n",
    "    if not len(logins):\n",
    "        any_perms.append(\"\\nFound no valid usernames\")\n",
    "    else:\n",
    "        any_perms.append(\n",
    "            \"\\nChecking {} usernames for membership in {} orgs\".format(\n",
    "                len(logins), len(orgs_to_check)\n",
    "            )\n",
    "        )\n",
    "        for login in logins:\n",
    "            start_msg_count = len(any_perms)\n",
    "            for org in orgs_to_check:\n",
    "                o = get_inspectable_org_object(org)\n",
    "                if o is None:\n",
    "                    continue\n",
    "                if is_member(o, login):\n",
    "                    url = \"https://github.com/orgs/{}/people?utf8=%E2%9C%93&query={}\".format(\n",
    "                        o.login, login\n",
    "                    )\n",
    "                    phonebook_url = f\"https://people.mozilla.org/a/ghe_{o.login}_users/\"\n",
    "                    msg = f\"FOUND! {o.login} has {login} as a member: {url}\"\n",
    "                    msg += f\"\\n\\tRemove from phonebook group if needed: {phonebook_url}\"\n",
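    "                    # compare case-insensitively: is_member() lowercases, owner logins may not be\n",
    "                    owner_logins = [x.lower() for x in get_org_owners(o)]\n",
    "                    is_owner = login.lower() in owner_logins\n",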
    "                    if is_owner:\n",
    "                        msg += f\"\\n  NOTE: {login} is an OWNER of {org}\"\n",
    "                    any_perms.append(msg)\n",
    "                if is_collaborator(o, login):\n",
    "                    url = \"https://github.com/orgs/{}/outside-collaborators?utf8=%E2%9C%93&query={}\".format(\n",
    "                        o.login, login\n",
    "                    )\n",
    "                    any_perms.append(\n",
    "                        \"FOUND! {} has {} as a collaborator: {}\".format(\n",
    "                            o.login, login, url\n",
    "                        )\n",
    "                    )\n",
    "            else:\n",
    "                end_msg_count = len(any_perms)\n",
    "                if end_msg_count > start_msg_count:\n",
    "                    # some found, put a header on it, then add a blank line\n",
    "                    any_perms.insert(\n",
    "                        start_msg_count,\n",
    "                        \"\\nFound {:d} orgs for {}:\".format(\n",
    "                            end_msg_count - start_msg_count, login\n",
    "                        ),\n",
    "                    )\n",
    "                    any_perms.append(\"\")\n",
    "                    logins_with_hits.add(login)\n",
    "                else:\n",
    "                    any_perms.append(f\"No permissions found for {login}\")\n",
    "    return any_perms, logins_with_hits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6d0edbd",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def extract_addresses(text):\n",
    "    \"\"\"Get email addresses from text.\"\"\"\n",
    "    # ASSUME that text is a list of email addresses (possibly empty)\n",
    "    if not text:\n",
    "        return []\n",
    "    #     print(\"before: %s\" % text)\n",
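    "    # strip list/bytes-repr punctuation; commas (if present) also separate addresses\n",
    "    text = text.replace(\"[\", \"\").replace(\"]\", \"\").replace(\"b'\", \"\").replace(\"'\", \"\").replace(\",\", \" \")\n",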
    "    #     print(\"after: %s\" % text)\n",
    "    #     print(\" split: %s\" % text.split())\n",
    "    return text.split()\n",
    "    # raise ValueError(\"couldn't parse '{}'\".format(text))"
   ]
  },
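  {
   "cell_type": "markdown",
   "id": "sketch-email-md",
   "metadata": {
    "hidden": true
   },
   "source": [
    "`extract_addresses()` works by stripping list and bytes-repr punctuation and then splitting on whitespace. If the input format varies, a regular expression that pulls out address-shaped tokens is one alternative. This is only a sketch: `EMAIL_RE` and `extract_addresses_re` are hypothetical, the pattern is deliberately simple (an assumption, not a full RFC 5322 parser), and the sample input is made up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-email-code",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import re\n",
    "\n",
    "# deliberately simple: local part, \"@\", then a dotted domain (not a full RFC 5322 parser)\n",
    "EMAIL_RE = re.compile(r\"[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+\")\n",
    "\n",
    "\n",
    "def extract_addresses_re(text):\n",
    "    \"\"\"Return every address-shaped token in text ([] for None or an empty string).\"\"\"\n",
    "    return EMAIL_RE.findall(text or \"\")\n",
    "\n",
    "\n",
    "print(extract_addresses_re(\"[b'jdoe@example.com', b'jane.doe+gh@mozilla.org']\"))\n",
    "# ['jdoe@example.com', 'jane.doe+gh@mozilla.org']"
   ]
  },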
  {
   "cell_type": "markdown",
   "id": "c55db615",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### ACL Routines"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62f0d438",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "##### Filtering Routines"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0adf40d",
   "metadata": {
    "hidden": true
   },
   "source": [
    "Filtering is split into 4 cells:\n",
    "1. a test to ensure it's not broken when changes are made\n",
    "2. the exception lists, which are the most common update\n",
    "3. the filter code\n",
    "4. the code to invoke the tests\n",
    "\n",
    "Any change to #2 or #3 will re-run the tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a353aff4",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# tests for every load of the filtering code\n",
    "# NB these urls are NOT the search URL, they are the URL from the hit\n",
    "test_url = [\n",
    "    \"\"\"https://github.com/mozilla-services/foxsec-results/blob/47f31f014cf21dc6e7e774ddc28e51a6f9eeba54/bucketlister/README.md\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-services/product-delivery-tools/blob/47f31f014cf21dc6e7e774ddc28e51a6f9eeba54/bucketlister/README.markdown\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-services/cloudops-docs/blob/0ff6ea92e394784aef55abd4b9f8b5d26306fe4b/TeamDiagrams/service_registry.csv\"\"\",\n",
    "    \"\"\"https://github.com/mozilla/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
    "    # should skip - org\n",
    "    \"\"\"https://github.com/fxos/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
    "    # should skip - regexp repo\n",
    "    \"\"\"https://github.com/mozilla-releng/take-home-assignment-no-such-repo/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
    "    # should skip - below 'tests' or 'test' directory\n",
    "    \"\"\"https://github.com/mozilla/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/tests/should-be-skipped.csv\"\"\",\n",
    "    \"\"\"https://github.com/Pocket/Android/blob/54abeefaa252a4b1c279f7c5bf571082f2283dc1/sync-pocket-android/src/test/resources/mock/getNotifications.json\"\"\", \n",
    "    \"\"\"https://github.com/mozilla-services/addons-code-corpus/blob/5b5646ad2390ba1b22f64ca62953ec3b260c0da7/train/bad/872325/chat/index.html\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-services/addons-code-corpus/blob/5b5646ad2390ba1b22f64ca62953ec3b260c0da7/training/bad/872325/chat/index.html\"\"\",\n",
    "    \"\"\"https://github.com/Pocket/iOS/blob/d69247be8b8987038b45406e168853d59e72e1ef/Listen/Listen/Sample/Definitions/PKTListenAppTheme.m\"\"\",\n",
    "    \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/1685443536.html\"\"\",\n",
    "    # should skip based on filename\n",
    "    \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/Makefile\"\"\",\n",
    "    \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/pyproject.toml\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-it/cloudalerts/blob/c3721d1d17f5e987cdc60f3d3d0c161a0b04b5ac/Dockerfile\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-it/it-sre-bot/blob/27c7a5e11ed48aa38baf676c6406fb3dcb116eaf/package-lock.json\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-services/splunk-ops/blob/ece2052d252d763fa14382f7d114f7c38e5cb31b/splunk_apps/corelight-app-for-splunk/CorelightForSplunk/bin/corelightforsplunk/aob_py3/httplib2/cacerts.txt\"\"\",\n",
    "    \"\"\"https://github.com/mozilla-mobile/firefox-android/blob/f3b4a9e8c806fcb51906a0db15d67c95a9a383ba/android-components/components/browser/errorpages/src/main/res/values-is/strings.xml\"\"\",\n",
    "    # should pass (makefile != Makefile)\n",
    "    \"\"\"https://github.com/mozilla-services/cloudops-docs/blob/0ff6ea92e394784aef55abd4b9f8b5d26306fe4b/TeamDiagrams/makefile\"\"\",\n",
    "]\n",
    "test_good = [\n",
    "    None,  # should be skipped file extension\n",
    "    None,  # should be skipped file extension\n",
    "    (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla-services/cloudops-docs+path%3A\"TeamDiagrams/service_registry.csv\"+oremj\"\"\",\n",
    "    \"mozilla-services/cloudops-docs\", \"TeamDiagrams\", \"service_registry.csv\"),\n",
    "    (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla/participation-metrics-identities+path%3A\"Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"+oremj\"\"\",\n",
    "    \"mozilla/participation-metrics-identities\", \"\", \"Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"),\n",
    "    None,  # should be skipped org\n",
    "    None,  # should be skipped matching repo regex\n",
    "    None,  # skipped 'tests' in path\n",
    "    None,  # skipped 'test' in path\n",
    "    None,  # skipped 'train' in path\n",
    "    None,  # skipped 'training' in path (matches substring)\n",
    "    None,  # skipped 'sample' in path\n",
    "    None,  # skipped 'samples' in path (matches substring)\n",
    "    None,  # skipped: Makefile\n",
    "    None,  # skipped: pyproject.toml\n",
    "    None,  # skipped: Dockerfile\n",
    "    None,  # skipped: package-lock.json\n",
    "    None,  # skipped: cacerts.txt\n",
    "    None,  # skipped: strings.xml\n",
    "    (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla-services/cloudops-docs+path%3A\"TeamDiagrams/makefile\"+oremj\"\"\",\n",
    "    \"mozilla-services/cloudops-docs\", \"TeamDiagrams\", \"makefile\"),\n",
    "]\n",
    "test_login = \"oremj\"\n",
    "_test_ignore_filters_invocation_count = 0\n",
    "\n",
    "_debug_test_case_num = 0  # leave at 0 for no debug, -1 to find which test case is failing\n",
    "\n",
    "def _test_ignore_filters():\n",
    "    global _test_ignore_filters_invocation_count\n",
    "    _test_ignore_filters_invocation_count += 1\n",
    "    if _test_ignore_filters_invocation_count == 1:\n",
    "        # first invocation means both cells needed for test haven't been loaded yet\n",
    "        return \"skipped awaiting code load\"\n",
    "    test_case_num = 0\n",
    "    for test, success in zip(test_url, test_good):\n",
    "        test_case_num += 1\n",
    "        do_debug = (_debug_test_case_num == test_case_num)\n",
    "        if _debug_test_case_num:\n",
    "            print(f\"case {test_case_num}; debug {do_debug}\")\n",
    "        actual = search_hit_to_url(test, test_login, debug=do_debug)\n",
    "        if actual != success:\n",
    "            print(f\"test case {test_case_num}: {test}\")\n",
    "            print(f\" received: {actual}\")\n",
    "            print(f\" expected: {success}\")\n",
    "            for a, e in zip(actual, success):\n",
    "                print(f\"{a==e  !r:5}: '{a}'\\n       '{e}'\")\n",
    "            raise SystemExit(\"unit test failed\")\n",
    "    return \"ignore filter tests pass\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97dd614b",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# items to ignore -- all heuristically derived\n",
    "\n",
    "# only add extensions or repos that could NEVER contain an ACL definition\n",
    "filenames_to_skip = {\"setup.py\", \"pyproject.toml\", \"requirements.txt\", \"Makefile\", \n",
    "                     \"Dockerfile\", \"package-lock.json\", \"cacerts.txt\", \"strings.xml\"}\n",
    "extensions_to_skip = ( \".ics\", \".md\", \".markdown\", \".rst\", \".der\", \".pem\", \".crt\", \n",
    "                      \".html\", \".htm\", \".svg\", \".bib\", \".po\", )\n",
    "\n",
    "# some orgs have patterns for repo names, take advantage of that\n",
    "repos_to_skip_regexp = set((\n",
    "    \"mozilla-services/foxsec-results\",\n",
    "    \"mozilla-services/cloudops-jenkins\",\n",
    "    \"mozilla-services/cloudqa-jenkins\",\n",
    "    \"mozilla/gecko-dev\",\n",
    "    \"mozilla/eu2019-ad-transparency-report\",\n",
    "    \"mdn/archived-content\",\n",
    "    \"mozilla-releng/take-home-assignment.*\",  # interview tests\n",
    "    \"mozilla-it/www-archive.mozilla.org\",\n",
    "    \"mdn/retired.*content\",  # history\n",
    "    \"mozilla-it/sumo-l10n.*\",\n",
    "    \"mozmeao/sumo-l10n.*\",\n",
    "    \"mozmeao/www-l10n\",\n",
    "    \"mozilla-services/ms-language-packs\",\n",
    "    \"mozilladatascience/search-terms-sanitization\",\n",
    "    \"Pocket/AndroidHiring\",\n",
    "    \"Pocket/Localization\",\n",
    "    \"Pocket/data-explorations\",\n",
    "    \"mdn/translated-content\",\n",
    "    \"Pocket/parser-benchmark\",\n",
    "    \"mozilla/releases_insights\",\n",
    "))\n",
    "# convert to single regexp & compile\n",
    "# based on https://stackoverflow.com/a/3040797/5128493\n",
    "import re\n",
    "RE_REPO_TO_SKIP = re.compile(\"(?:\" + \")|(?:\".join(repos_to_skip_regexp) + \")\", re.IGNORECASE)\n",
    "\n",
    "# These orgs are guaranteed not to have any current ACLs in them\n",
    "# - could be expanded for any parked or archived org\n",
    "orgs_to_skip = (\n",
    "    \"fxos\",\n",
    "    \"fxos-eng\",\n",
    "    \"mozilla-b2g\",\n",
    "    \"moco-ghe-admin\",\n",
    "    \"mozilla-l10n\",  # only translations, no apps or services\n",
    "    \"common-voice\",  # not supported by IT\n",
    ")\n",
    "# skip anything in a `tests` directory\n",
    "# ToDo: consider making regexp, would allow path /data/ & /changelog.*/ to be added\n",
    "paths_to_skip = (\n",
    "    \"test\",\n",
    "    \"train\",  # all the AI these days\n",
    "    \"sample\",\n",
    "    \"locales\",  # l10n stuff\n",
    "    \"translations\",\n",
    "    \"template\",\n",
    "    \"resources\",\n",
    ")\n",
    "\n",
    "# test on any change\n",
    "print(_test_ignore_filters())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb5e4999",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# we get some insane counts sometimes, along with 404s as all of these\n",
    "# results are based on an index GitHub created some time ago, and include sha1\n",
    "# references. E.g. 404 link:\n",
    "#    https://github.com/mozilla-services/foxsec-results/blob/70a5b7841edcdb967beddbce75309efa0bc2b687/aws-pytest/cloudservices-aws-stage/one-offs/cloudservices-aws-stage-2018-01-31-secgroup-service-report.md/search?q=oremj&type=code\n",
    "# valid version\n",
    "#    https://github.com/mozilla-services/foxsec-results/blob/master/aws-pytest/cloudservices-aws-stage/one-offs/cloudservices-aws-stage-2018-01-31-secgroup-service-report.md\n",
    "# However, that URL won't support a search endpoint, so what we _really_ want is\n",
    "#    https://github.com/search?q=oremj+repo%3Amozilla-services%2Ffoxsec-results++path%3Aaws-pytest%2Fcloudservices-aws-stage%2Fone-offs%2F+filename%3Acloudservices-aws-stage-2018-01-31-secgroup-service-report.md&type=Code&ref=advsearch\n",
    "#    https://github.com/search?q=repo%3Amozilla-services/foxsec-results%20path%3Aaws-pytest/cloudservices-aws-stage/one-offs%20filename%3Acloudservices-aws-stage-2018-01-31-secgroup-service-report.md%20fred&type=code&ref=advsearch\n",
    "#\n",
    "# rebuild the file hit url into what we want\n",
    "\n",
    "from urllib.parse import urlparse, urlunparse, quote_plus\n",
    "#        from IPython.core.debugger import set_trace; set_trace()\n",
    "\n",
    "\n",
    "def ignore_path(url_list, debug=False):\n",
    "    for ignorable in paths_to_skip:\n",
    "        # we want to do partial matching, so can't use list comprehension\n",
    "        for element in [x.lower() for x in url_list[1:-1]]:\n",
    "            if ignorable in element:\n",
    "                if debug:\n",
    "                    print(f\"Ignoring {'/'.join(url_list)}\")\n",
    "                return True\n",
    "    return False\n",
    "\n",
    "def search_hit_to_url(url, login=None, debug=False):\n",
    "    # split into components\n",
    "    parts = urlparse(url)\n",
    "    # break down the path\n",
    "    path_parts = parts.path.split('/')\n",
    "    if path_parts[1] in orgs_to_skip:\n",
    "        # ignore orgs with no relevant ACL\n",
    "        if debug:\n",
    "            print(f\"ignoring based on org '{path_parts[1]}'\")\n",
    "        return\n",
    "    repo = '/'.join(path_parts[1:3])\n",
    "#     if repo in repos_to_skip:\n",
    "    if RE_REPO_TO_SKIP.match(repo):\n",
    "        if debug:\n",
    "            print(f\"ignoring based on repo '{repo}'\")\n",
    "            print(f\"pattern: r'{RE_REPO_TO_SKIP.pattern}'\")\n",
    "        return\n",
    "    if ignore_path(path_parts, debug=debug):\n",
    "        if debug:\n",
    "            print(f\"ignoring based on path: {'/'.join(path_parts)}\")\n",
    "        return\n",
    "    filename = path_parts[-1]\n",
    "#     from pprint import pprint\n",
    "#     pprint(path_parts)\n",
    "#     print(f\"{repo}; {filename}\")\n",
    "    try:\n",
    "#         from IPython.core.debugger import Pdb; Pdb().set_trace()\n",
    "        if filename[filename.rindex('.'):] in extensions_to_skip:\n",
    "            if debug:\n",
    "                print(f\"ignoring due to extension '{filename[filename.rindex('.'):]}'\")\n",
    "            return\n",
    "    except ValueError:\n",
    "        # file didn't have extension, so process it\n",
    "        pass\n",
    "    if filename in filenames_to_skip:\n",
    "        if debug:\n",
    "            print(f\"ignoring due to filename '{filename}'\")\n",
    "        return\n",
    "    basepath = path_parts[3:-1]\n",
    "    if basepath[0] == \"blob\":\n",
    "        # get rid of 'blob' and sha1\n",
    "        basepath = basepath[2:]\n",
    "        \n",
    "    # build the new query string\n",
    "    # With the newer syntax, everything can be in one \"path\" filter\n",
    "    basepath = '/'.join(basepath)\n",
    "    filename_filter = 'path:\"'\n",
    "    if basepath:\n",
    "        filename_filter += f\"{basepath}/\"\n",
    "    filename_filter += f'{filename}\"'\n",
    "    # the actual query needs to be form encoded, so use quote_plus. \n",
    "    # We make unusual characters safe to get the correct result when working with file with embedded spaces.\n",
    "    # Note - won't currently work with paths with spaces\n",
    "    query = quote_plus(f\"\"\"repo:{repo} {filename_filter} {login}\"\"\", safe='/%\"')\n",
    "    query_string = f\"type=Code&ref=advsearch&q={query}\"\n",
    "        \n",
    "    # now rebuild the url\n",
    "    new_url = urlunparse((\n",
    "        parts.scheme,\n",
    "        parts.netloc,\n",
    "        \"search\",\n",
    "        None, # params\n",
    "        query_string,\n",
    "        None, # fragment\n",
    "    ))\n",
    "    return new_url, repo, basepath, filename"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa87e0b3",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# test on any change - put in own cell, so failure obvious\n",
    "print(_test_ignore_filters())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25536c3e",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "##### ACL reporting code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ccc2eb4",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def prune_hits_to_ignore(full_list, id_to_find):\n",
    "    # remove vulnerability repos (*-ghsa-*) and archived repos (archive status \n",
    "    # requires refresh of repository object\n",
    "    hit_list_1 = [r for r in full_list if (not \"-ghsa-\" in r.repository.name)\n",
    "                                     and (not r.repository.refresh().archived)\n",
    "                 ]\n",
    "    # now eliminate any hits where the search term was not found \"as a word\"\n",
    "    id_re = re.compile(fr\"\\b{id_to_find}\\b\", re.IGNORECASE)\n",
    "    hit_list_2 = []\n",
    "#     print(f\"Checking {len(hit_list_1)} hits\")\n",
    "    for index, hit in enumerate(hit_list_1):\n",
    "#         print(f\"  Hit {index} has {len(hit.text_matches)} contexts\")\n",
    "        for ctxt, context in enumerate(hit.text_matches):\n",
    "            if id_re.search(context[\"fragment\"]):\n",
    "                hit_list_2.append(hit)\n",
    "#                 print(f\"Adding hit {index}; context {ctxt} ({len(hit_list_2)=}): {context['fragment']}\")\n",
    "                break\n",
    "            else:\n",
    "#                 print(f\"ignoring context {context['fragment']}\")\n",
    "                ...\n",
    "#     print(f\"returning {len(hit_list_2)} hits\")\n",
    "    return hit_list_2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74a23434",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "import csv, io\n",
    "def check_for_acls(logins):\n",
    "\n",
    "    \"\"\" Check for these items in code, could be an acl to be removed\n",
    "    \n",
    "    Note that we haven't pruned logins to just the orgs we found hits on -- we're using all GitHub logins. May want to modify in the future.\n",
    "    \"\"\"\n",
    "    possibles = set(logins)\n",
    "\n",
    "    # we're now outputing in CSV format, so put in a header line\n",
    "    csvfile = io.StringIO()\n",
    "    writer = csv.writer(csvfile)\n",
    "    writer.writerow([\"Action Taken\", \"Comment\", \"\", \"Context\", \"File\", \"Search URL\", \"Raw Context\"])\n",
    "    # add formula to use for copy down in R2C3 - still requires manual intervention\n",
    "    #  1. in cell C3 select, edit, and enter to make real formula\n",
    "    #  2. fill down for all rows in sheet\n",
    "    writer.writerow([\"\", \"\", '=if(ISBLANK(F2),\"\", HYPERLINK(F2,\"?\"))', '=if(isblank(G2),,SUBSTITUTE(G2,\"\\\\n\",char(10)))', \"\", \"\"])\n",
    "    writer.writerow([\"\"] * 4)\n",
    "    writer.writerow([f\"Checking for possible ACLs for: {', '.join(possibles)}\", \"\", \"\",])\n",
    "    writer.writerow([\"\"] * 4)\n",
    "#     import pdb ; pdb.set_trace()\n",
    "#     from IPython.core.debugger import set_trace; set_trace()\n",
    "\n",
    "\n",
    "    # needed for exception handling below\n",
    "    import http\n",
    "\n",
    "    for org in orgs_to_check:\n",
    "#         print(f\" {org}..\", end='')\n",
    "        for l in possibles:\n",
    "            full_list = []\n",
    "            hit_iter = gh.search_code(query=f\"org:{org} {l}\", text_match=True)\n",
    "            assume_time_out = True\n",
    "            while assume_time_out:\n",
    "                try:\n",
    "                    # 2023-05-25 can't use regex in code search, so return context for further processing\n",
    "                    # 2023-06-16 can't get as list in one shot, as it could consume search query limit every restart\n",
    "                    #            but, even this isn't \"right\" -- some state in\n",
    "                    #            the iterator appears to be changed prior to\n",
    "                    #            the network call. I.e. even if the network\n",
    "                    #            call fails, the iterator has \"advanced\", and\n",
    "                    #            we're missing some elements.\n",
    "                    #\n",
    "                    #            I think the only way around that is to do our\n",
    "                    #            own iterator, so we can specifically identify\n",
    "                    #            _when_ we're making network calls, vs\n",
    "                    #            unpacking another element from the last call.\n",
    "                    #            Or a hack to reach that deep inside the\n",
    "                    #            iterator. :/\n",
    "                    hit = hit_iter.next()\n",
    "                    full_list.append(hit)\n",
    "                    print(f\"so far {len(full_list)}\")\n",
    "                except StopIteration:\n",
    "                    # we have to handle explicily - just set flag\n",
    "                    assume_time_out = False\n",
    "                except Exception as e:\n",
    "                    if isinstance(e, http.client.RemoteDisconnected):\n",
    "                        # This is \"fun\" to run into - doesn't happen very often\n",
    "                        # so this recovery is an educated guess (the time I\n",
    "                        # did see it, it was after a 'resumed' message from\n",
    "                        # the clause below)\n",
    "                        for i in range(3):\n",
    "                            try_login()\n",
    "                            if gh:\n",
    "                                # re-established connection\n",
    "                                print(f\"re-established connection on try {i+1}\")\n",
    "                                break\n",
    "                            else:\n",
    "                                time.sleep(60)\n",
    "                        else:\n",
    "                            print(f\"failed to re-establish connection after {i+1} tries\")\n",
    "                            raise SystemExit\n",
    "                    elif not hasattr(e, 'code'):\n",
    "                        print(f\"org={org} l={l} exception={str(e)} (exception type {type(e)})\")\n",
    "                    elif e.code not in [403, 422]:\n",
    "                        print(f\"org={org} l={l} exception={str(e)}\")\n",
    "                    elif e.code in [403]:\n",
    "                        seconds_to_wait = 7\n",
    "                        print(f\"Out of Code Search API calls, waiting {seconds_to_wait} seconds ({org=}, {l=}) ..\", end='')\n",
    "                        # we can hit this a lot, so just wait a minute - only 10 req/min\n",
    "                        #  per https://docs.github.com/en/enterprise-cloud@latest/rest/search?apiVersion=2022-11-28#rate-limit\n",
    "                        time.sleep(seconds_to_wait)\n",
    "                        print(\"... resumed.\")\n",
    "                    # we've reported on everything of interest, no need for else clause\n",
    "    #                 else:\n",
    "    #                     print(f\"Got code {e.code} for org {org}, search {l}\")\n",
    "\n",
    "            hit_list = prune_hits_to_ignore(full_list, l)\n",
    "\n",
    "            search_urls = []\n",
    "            for search_hit in hit_list:\n",
    "                new_url = search_hit_to_url(search_hit.html_url, l, debug=False)\n",
    "                if new_url:\n",
    "                    # add the matching fragments as the 2nd item of a tupple\n",
    "                    context = \"\\n----\\n\".join([m['fragment'] for m in search_hit.text_matches])\n",
    "                    search_urls.append((*new_url, context.replace(\"\\n\", \"\\\\n\")))\n",
    "            num_raw_search_urls = len(search_urls)\n",
    "            search_urls = set(search_urls)\n",
    "            num_search_urls = len(search_urls)\n",
    "#             print(f\"search results: {len(hit_list)}; after translation: {num_raw_search_urls}; after dedupe: {num_search_urls}\")\n",
    "            if num_search_urls > 0:\n",
    "                writer.writerow(['', f\"{num_search_urls} files with possible ACLs in {org} for {l}:\", \"\", \"\"])\n",
    "                for url, repo, path, filename, context in sorted(search_urls):\n",
    "                    # output in csv format\n",
    "                    writer.writerow([\"\", \"\", \"\", \"\", f\"{repo}/{path}/{filename}\", f\"{url}\", context])\n",
    "            # import pdb ; pdb.set_trace()\n",
    "    csvfile.seek(0)\n",
    "    hits = [l.strip() for l in csvfile.readlines()]\n",
    "    return hits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "369f2337",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def check_github_acls(logins):\n",
    "    logins_to_check = set(logins.split())\n",
    "    # import pdb; pdb.set_trace()\n",
    "    msgs = check_for_acls(logins_to_check)\n",
    "    print(\"=\" * 35)\n",
    "    display(*msgs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb5a02ad",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### Heroku support"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b59d62ad",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def check_heroku_logins(logins):\n",
    "    logins_to_check = set(logins.split())\n",
    "    # import pdb; pdb.set_trace()\n",
    "    for login in logins_to_check:\n",
    "        print(\"\\nworking on %s:\" % login)\n",
    "        show_heroku_data(login)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fb098a1",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# let user start manual work before we do all the GitHub calls\n",
    "def show_heroku_data(primary_email:str) -> None:\n",
    "    display(\"Check these URLs for Heroku activity:\")\n",
    "    display(\n",
    "        \"  Heroku Access: https://people.mozilla.org/a/heroku-members/edit?section=members\"\n",
    "    )\n",
    "    display(f\"     copy/paste for ^^ query:  :{primary_email}:  \")\n",
    "    display(\n",
    "        \"  People: https://people.mozilla.org/s?who=all&query={}\".format(\n",
    "            primary_email.replace(\"@\", \"%40\")\n",
    "        )\n",
    "    )\n",
    "    display(\n",
    "        \"  Heroku: https://dashboard.heroku.com/teams/mozillacorporation/access?filter={}\".format(\n",
    "            primary_email.replace(\"@\", \"%40\")\n",
    "        )\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7c90ee1",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### main driver"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db4a9236",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "import re\n",
    "import os\n",
    "\n",
    "re_flags = re.MULTILINE | re.IGNORECASE\n",
    "\n",
    "\n",
    "def process_from_email(email_body):\n",
    "    # get rid of white space\n",
    "    email_body = os.linesep.join(\n",
    "        [s.strip() for s in email_body.splitlines() if s.strip()]\n",
    "    )\n",
    "    if not email_body:\n",
    "        return\n",
    "\n",
    "    user = set()\n",
    "\n",
    "    # Extract data from internal email format\n",
    "    match = re.search(r\"^Full Name: (?P<full_name>\\S.*)$\", email_body, re_flags)\n",
    "    if match:\n",
    "        # add base and some variations\n",
    "        full_name = match.group(\"full_name\")\n",
    "        user.add(full_name)\n",
    "        # remove spaces, forward & reversed\n",
    "        user.add(full_name.replace(\" \", \"\"))\n",
    "        user.add(\"\".join(full_name.split()[::-1]))\n",
    "        # use hypens, forward & reversed\n",
    "        user.add(full_name.replace(\" \", \"-\"))\n",
    "        user.add(\"-\".join(full_name.split()[::-1]))\n",
    "\n",
    "    match = re.search(r\"^Email: (?P<primary_email>.*)$\", email_body, re_flags)\n",
    "    primary_email = match.group(\"primary_email\") if match else None\n",
    "    user.add(primary_email)\n",
    "    default_login = primary_email.split(\"@\")[0].lower() if primary_email else None\n",
    "    if default_login:\n",
    "        # add some common variations that may get discarded for \"too many\" matches\n",
    "        user.update(\n",
    "            [\n",
    "                f\"moz{default_login}\",\n",
    "                f\"moz-{default_login}\",\n",
    "                f\"mozilla{default_login}\",\n",
    "                f\"mozilla-{default_login}\",\n",
    "                f\"{default_login}moz\",\n",
    "                f\"{default_login}-moz\",\n",
    "            ]\n",
    "        )\n",
    "\n",
    "    if not defer_heroku_lookup:\n",
    "        show_heroku_data(str(primary_email))\n",
    "    display(email_body)\n",
    "\n",
    "    match = re.search(r\"^Github Profile: (?P<github_profile>.*)$\", email_body, re_flags)\n",
    "    declared_github = match.group(\"github_profile\") if match else None\n",
    "    user.add(declared_github)\n",
    "    display(f\"Declared GitHub {declared_github}\")\n",
    "\n",
    "    # check CIS for verified login (not all users will have creds)\n",
    "    verified_github_login, first_name = check_CIS(primary_email)\n",
    "    if verified_github_login:\n",
    "        user.add(verified_github_login)\n",
    "        display(f\"Verified GitHub {verified_github_login}\")\n",
    "\n",
    "    match = re.search(r\"^Zimbra Alias: (?P<other_email>.*)$\", email_body, re_flags)\n",
    "    possible_aliases = extract_addresses(match.group(\"other_email\") if match else None)\n",
    "    user.update(possible_aliases)\n",
    "\n",
    "    # new field: Email Alias -- list syntax (brackets)\n",
    "    match = re.search(r\"^Email Alias: \\s*\\[(?P<alias_email>.*)\\]\", email_body, re_flags)\n",
    "    user.add(match.group(\"alias_email\") if match else None)\n",
    "\n",
    "    # we consider each token in the IM line as a possible GitHub login\n",
    "    match = re.search(r\"^IM:\\s*(.*)$\", email_body, re_flags)\n",
    "    if match:\n",
    "        im_line = match.groups()[0]\n",
    "        matches = re.finditer(r\"\\W*((\\w+)(?:\\s+\\w+)*)\", im_line)\n",
    "        user.update([x.group(1) for x in matches] if matches else None)\n",
    "\n",
    "    match = re.search(r\"^Bugzilla Email: (?P<bz_email>.*)$\", email_body, re_flags)\n",
    "    user.add(match.group(\"bz_email\") if match else None)\n",
    "\n",
    "    # grab the department name, for a heuristic on whether we expect to find perms\n",
    "    expect_github_login = False\n",
    "    match = re.search(r\"^\\s*Dept Name: (?P<dept_name>\\S.*)$\", email_body, re_flags)\n",
    "    if match and not verified_github_login:\n",
    "        department_name = match.groups()[0].lower()\n",
    "        dept_keys_infering_github = [\"firefox\", \"engineering\", \"qa\", \"operations\"]\n",
    "        for key in dept_keys_infering_github:\n",
    "            if key in department_name:\n",
    "                expect_github_login = True\n",
    "                break\n",
    "\n",
    "    # clean up some noise, case insensitively, \"binary\" markers\n",
    "    user = {x.lower() for x in user if x and (len(x) > 2)}\n",
    "    to_update = [x[2:-1] for x in user if (x.startswith(\"b'\") and x.endswith(\"'\"))]\n",
    "    user.update(to_update)\n",
    "    user = {x for x in user if not (x.startswith(\"b'\") and x.endswith(\"'\"))}\n",
    "\n",
    "    # the tokens to ignore are added based on discovery,\n",
    "    # they tend to cause the searches to get rate limited.\n",
    "    user = user - {\n",
    "        None,\n",
    "        \"irc\",\n",
    "        \"slack\",\n",
    "        \"skype\",\n",
    "        \"b\",\n",
    "        \"hotmail\",\n",
    "        \"mozilla\",\n",
    "        \"ro\",\n",
    "        \"com\",\n",
    "        \"softvision\",\n",
    "        \"mail\",\n",
    "        \"twitter\",\n",
    "        \"blog\",\n",
    "        \"https\",\n",
    "        \"jabber\",\n",
    "        \"net\",\n",
    "        \"github\",\n",
    "        \"gmail\",\n",
    "        \"facebook\",\n",
    "        \"guy\",\n",
    "        \"pdx\",\n",
    "        \"yahoo\",\n",
    "        \"aim\",\n",
    "        \"whatsapp\",\n",
    "        \"gtalk\",\n",
    "        \"google\",\n",
    "        \"gpg\",\n",
    "        \"telegram\",\n",
    "        \"keybase\",\n",
    "        \"zoom\",\n",
    "        \"name\",\n",
    "    }\n",
    "    global displayed_users\n",
    "    displayed_users = set()\n",
    "    try:\n",
    "        headers = [\n",
    "            \"Search seeds: '{}'\".format(\"', '\".join(user)),\n",
    "        ]\n",
    "        display(*headers)\n",
    "        guesses = set()\n",
    "        for term in user:\n",
    "            possibles = gather_possibles(term)\n",
    "            guesses.update({x.lower() for x in possibles})\n",
    "        # include declared_github if it exists\n",
    "        if declared_github:\n",
    "            guesses.add(declared_github.lower())\n",
    "        guesses.update({x.login.lower() for x in displayed_users})\n",
    "        display(f\"Checking logins {guesses}\")\n",
    "        msgs, logins_with_hits = check_login_perms(guesses, headers)\n",
    "        display(f\"resetting guesses to found hits: {logins_with_hits}\")\n",
    "        guesses = logins_with_hits\n",
    "        found_perms = \"FOUND!\" in \"\".join(msgs)\n",
    "        display(f\"msgs {len(msgs)}; headers {len(headers)}\")\n",
    "        display(\n",
    "            \"found_perms {}; declared_github {} {}\".format(\n",
    "                found_perms, declared_github, bool(declared_github)\n",
    "            )\n",
    "        )\n",
    "\n",
    "        if declared_github and not found_perms:\n",
    "            msgs.append(f\"Even for declared login '{declared_github}'.\")\n",
    "        if expect_github_login and not found_perms:\n",
    "            msgs.append(\n",
    "                \"WARNING: expected GitHub permissions for dept '{}'\".format(\n",
    "                    department_name\n",
    "                )\n",
    "            )\n",
    "\n",
    "        # check whether any GitHub login or LDAP name appears in an ACL file (which may grant permissions)\n",
    "        guesses.add(default_login)\n",
    "        print(f\"before: guesses {guesses}; default {default_login}\")\n",
    "        if first_name and not primary_email.endswith(\"mozilla.com\"):\n",
    "            # for non-MoCo emails, the primary email stem might be the user's first name,\n",
    "            # which leads to an enormous number of false positives, so remove it;\n",
    "            # the assumption is that such users would not be in any ACL files anyway\n",
    "            guesses -= {first_name.lower()}\n",
    "        print(f\" after: guesses {guesses}; default {default_login}\")\n",
    "        if defer_acl_lookup:\n",
    "            display(\"Deferring search for possible ACLs\")\n",
    "            msgs.append(f\"Check for ACLs using the following values: {' '.join([str(x) for x in guesses])}\")\n",
    "        else:\n",
    "            display(\"Looking for possible ACLs\")\n",
    "            new_msgs = check_for_acls(guesses)\n",
    "            msgs.extend(new_msgs)\n",
    "        msgs.append(\"Finished all reporting.\")\n",
    "        display(*msgs)\n",
    "    except github3.exceptions.ForbiddenError as e:\n",
    "        print_limits(e)\n",
    "        raise e"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a0c94cb",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "from ipywidgets import interact_manual, Layout, widgets\n",
    "from IPython.display import display\n",
    "\n",
    "text = widgets.Textarea(\n",
    "    value=\"email: \\nim: \",\n",
    "    placeholder=\"Paste ticket description here!\",\n",
    "    description=\"Email body:\",\n",
    "    layout=Layout(width=\"95%\"),\n",
    "    disabled=False,\n",
    ")\n",
    "\n",
    "run_process = interact_manual.options(manual_name=\"Process\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "309442de",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def display(*args):\n",
    "    # join all arguments, one per line; str() also handles None values\n",
    "    text = \"\\n\".join(str(x) for x in args)\n",
    "    # Python 3 no longer requires the convert-to-ascii step that\n",
    "    # ipywidgets once needed (text.encode(\"ascii\", \"replace\"))\n",
    "    if text.strip():\n",
    "        print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70531388",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def check_github_logins(logins):\n",
    "    logins_to_check = set(logins.split())\n",
    "    for login in logins_to_check:\n",
    "        print(\"\\nworking on %s:\" % login)\n",
    "        # check_login_perms returns (messages, logins_with_hits); only the messages are shown\n",
    "        msgs, _ = check_login_perms([login])\n",
    "        display(*msgs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c18e3641",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### Cache Tuning & Clearing\n",
    "\n",
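    "Various functions use `lru_cache`. The first cell below prints each cache's statistics, so you can see whether the cache sizes are tuned appropriately; the second cell clears all caches (e.g. to purge stale data after a long idle period).\n",
    "\n",
    "Note that the statistics are meaningless until at least one query has been run."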
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efe34b13",
   "metadata": {
    "hidden": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "print(\"get_users\")\n",
    "print(get_users.cache_info())\n",
    "print(\"_search_for_org\")\n",
    "print(_search_for_org.cache_info())\n",
    "\n",
    "print(\"get_collaborators\")\n",
    "print(get_collaborators.cache_info())\n",
    "print(\"get_members\")\n",
    "print(get_members.cache_info())\n",
    "\n",
    "print(\"get_org_owners\")\n",
    "print(get_org_owners.cache_info())\n",
    "print(\"get_inspectable_org_object\")\n",
    "print(get_inspectable_org_object.cache_info())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b28cf0e8",
   "metadata": {
    "hidden": true,
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "print(\"clearing caches...\")\n",
    "get_users.cache_clear()\n",
    "_search_for_org.cache_clear()\n",
    "get_collaborators.cache_clear()\n",
    "get_members.cache_clear()\n",
    "get_org_owners.cache_clear()\n",
    "get_inspectable_org_object.cache_clear()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d11ee975",
   "metadata": {
    "heading_collapsed": true,
    "hidden": true
   },
   "source": [
    "#### EML file support"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c734fd08",
   "metadata": {
    "hidden": true,
    "init_cell": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "# read EML file support\n",
    "import email\n",
    "from ipywidgets import FileUpload\n",
    "from pprint import pprint as pp\n",
    "from IPython.display import display as display_widget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "465a9648",
   "metadata": {
    "hidden": true,
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "\n",
    "\n",
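    "def extract_reply(body):\n",
    "    # collect the quoted (\"> \") lines of a reply, stopping at a quoted \"--\" separator\n",
    "    extracted = []\n",
    "    # splitlines() handles both \\r\\n and \\n line endings\n",
    "    for line in body.splitlines():\n",
    "        if line.startswith(\"> --\"):\n",
    "            break\n",
    "        elif line.startswith(\"> \"):\n",
    "            extracted.append(line[2:])\n",
    "    return extracted\n",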
    "\n",
    "\n",
    "def process_from_file(uploader):\n",
    "    for file in list(uploader.value.keys()):\n",
    "        print(\"checking %s\" % file)\n",
    "        pp(list(uploader.value[file].keys()))\n",
    "        content = uploader.value[file][\"content\"]\n",
    "        pp(type(content))\n",
    "        pp(type(uploader.value[file]))\n",
    "        # the uploaded content is bytes, so parse it with message_from_bytes\n",
    "        message = email.message_from_bytes(content)\n",
    "        for part in message.walk():\n",
    "            if part.get_content_maintype() == \"multipart\":\n",
    "                continue\n",
    "            else:\n",
    "                mime = part.get_content_type()\n",
    "                if \"plain\" in mime:\n",
    "                    body = part.get_payload()\n",
    "                    # this could be the original, or a reply\n",
    "                    if re.search(r\"\"\"^Full Name:\"\"\", body, re_flags):\n",
    "                        print(\"original email:\")\n",
    "                        process_from_email(body)\n",
    "                    elif re.search(r\"\"\"^> Full Name:\"\"\", body, re_flags):\n",
    "                        print(\"reply:\")\n",
    "                        process_from_email(\"\\n\".join(extract_reply(body)))\n",
    "                    else:\n",
    "                        print(\"no match!\\n%s\" % body)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1867aff0",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "# Start of common usage (How To)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7d23e4d",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Currently, there are four common use cases:\n",
    "- processing an offboarding email (via downloaded EML file),\n",
    "- processing an offboarding email (via message copy/paste),\n",
    "- adhoc lookup of GitHub or Heroku logins, and\n",
    "- searching ACL files for logins.\n",
    "\n",
    "For anything else, you're on your own!\n",
    "\n",
    "All usage requires the following setup:\n",
    "1. Supply your GitHub personal access token (PAT) via the environment variable `GITHUB_PAT` when starting the notebook server. (If you can't do that, read the code for another way.)\n",
    "2. Supply your CIS credentials via the environment variables `CIS_CLIENT_ID` and `CIS_CLIENT_SECRET`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74367943",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "## EML File parsing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac01d450",
   "metadata": {},
   "source": [
    "Upload the file using the button below, then process it by running the cell below the button. Only one file can be processed at a time, but the \"file uploaded\" count keeps increasing (a UI glitch)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98322893",
   "metadata": {
    "init_cell": true,
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "_uploader = FileUpload(accept=\".eml\", multiple=False)\n",
    "display_widget(_uploader)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b6207ae",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "def check_file(f):\n",
    "    try:\n",
    "        # display_widget(_uploader)\n",
    "        process_from_file(f)\n",
    "        print(\"completed\")\n",
    "    except Exception as e:\n",
    "        print(repr(e))\n",
    "        raise\n",
    "\n",
    "\n",
    "check_file(_uploader)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dbe286a",
   "metadata": {
    "heading_collapsed": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "## Process offboarding email body text (copy/paste)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "145c0eef",
   "metadata": {
    "hidden": true
   },
   "source": [
    "Usage steps - for each user:\n",
    "\n",
    "1. Run the cell below; it should display a text entry area and a \"Process\" button.\n",
    "2. Copy the entire text of the email.\n",
    "3. Paste it into the text area below.\n",
    "4. Click the \"Process\" button.\n",
    "5. Use the generated links to check for Heroku authorization.\n",
    "6. After \"Finished all reporting.\" is printed, copy/paste the final output into the email."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9f75814c",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "@run_process(t=text)\n",
    "def show_matches(t):\n",
    "    try:\n",
    "        process_from_email(t)\n",
    "    except Exception as e:\n",
    "        # report the error without stopping the interactive widget\n",
    "        print(repr(e))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62ad9a7c",
   "metadata": {
    "heading_collapsed": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "## Adhoc Lookups (GitHub & Heroku)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6209c976",
   "metadata": {
    "hidden": true
   },
   "source": [
    "Fill in the desired logins in the GitHub and/or Heroku cell below. For GitHub, separate multiple logins with whitespace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "650f2db4",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "check_github_logins(\n",
    "    \"\"\" \n",
    " \"\"\"\n",
    ")\n",
    "print(\"done\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f28ed4dc",
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "check_heroku_logins(\n",
    "    \"\"\" \n",
    " \"\"\"\n",
    ")\n",
    "print(\"done\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de2fe9a2",
   "metadata": {
    "heading_collapsed": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "## ACL search"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a5cd9b9",
   "metadata": {
    "hidden": true
   },
   "source": [
    "Fill in the desired logins in the cell below. Appropriate values may be found in the GitHub report (e.g. the values listed after \"Check for ACLs using the following values:\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9112d45e",
   "metadata": {
    "hidden": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print()\n",
    "start = time.time()\n",
    "check_github_acls(\n",
    "    \"\"\" \n",
    "\n",
    "\"\"\"\n",
    ")\n",
    "duration = time.time() - start\n",
    "print(f\"done in {int(duration)} seconds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a60e70e9",
   "metadata": {
    "heading_collapsed": true,
    "lines_to_next_cell": 0
   },
   "source": [
    "# To Do"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4d3b5a3",
   "metadata": {
    "hidden": true
   },
   "source": [
    "- check invites as well, using manage_invitations.py\n",
    "- code doesn't handle hyphenated github logins, e.g. 'marco-c' (gets split)\n",
    "- github lookup should strip https... so can use link from people.m.o\n",
    "- dpreston, aka fzzy, doesn't have any GitHub perms\n",
    "- fix permutations of names\n",
    "- preprocess to remove all (colon separated) :b':':[:]: (maybe not the :b: & :':)\n",
    "- add link to Heroku service accounts to check\n",
    "- hide connection failure tracebacks during check_github_tracebacks\n",
    "\n",
    "<details><summary>Completed <small><em>(click to toggle)</em></small></summary>\n",
    "\n",
    "- ~~GitHub login no longer part of email, but user id is available via CIS~~\n",
    "- ~~add \"clear cache\" button to purge after long idle~~ _(in tuning section)_\n",
    "- ~~add common login with 'moz{,illa}' tacked on, sometimes with a dash~~\n",
    "- ~~update link to view access group on people.m.o~~\n",
    "- ~~add \"trying\" info to copy/paste output~~\n",
    "- ~~double check that \"even for declared login\" code still active~~\n",
    "- ~~add formatted output summary for copy/paste~~\n",
    "- ~~when a guess is multiple words, each word should be tried separately as well~~\n",
    "- ~~code should always search for stated github, even if search is \"too many\" (e.g. \"past\")~~\n",
    "- ~~does not call out owner status (reports as member)~~\n",
    "- ~~add short ldap name as an \"always check\"~~\n",
    "- ~~always check stem when search gives too many (i.e. go for the exact match)~~\n",
    "- ~~treat Zimbra Aliases as a potential multi valued list (or empty)~~\n",
    "- ~~\"-\" is a valid character in GitHub logins. Try as separator first-last and last-first~~\n",
    "</details>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Initialization Cell",
  "interpreter": {
   "hash": "6c306200e52f4c21cd0564fd208112100f014f9753eb71cb08d42d2021bd714c"
  },
  "jupytext": {
   "formats": "ipynb,py:percent"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}