{
"cells": [
{
"cell_type": "markdown",
"id": "0c9feb99",
"metadata": {
"incorrectly_encoded_metadata": "jp-MarkdownHeadingCollapsed=true",
"lines_to_next_cell": 0,
"tags": []
},
"source": [
"# User Search\n",
"Use this notebook to:\n",
"1. Try to find a GitHub account from whatever partial information you have (name, email address, etc.)\n",
"2. List all orgs that account belongs to (from a subset of orgs)\n",
"    - You will need org owner permissions to perform these searches"
]
},
{
"cell_type": "markdown",
"id": "631c763f",
"metadata": {
"heading_collapsed": true,
"lines_to_next_cell": 0
},
"source": [
"# Boilerplate\n",
"Skip/hide this. Common usage is below."
]
},
{
"cell_type": "markdown",
"id": "4a7d2a42",
"metadata": {
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"If you see this text, you may want to enable the nbextension \"Collapsible Headings\" so you can hide this section during normal use."
]
},
{
"cell_type": "markdown",
"id": "f517002c",
"metadata": {
"heading_collapsed": true,
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"## Tune as needed\n",
"\n",
"Several functions use `lru_cache`, and many of them are called `len(orgs_to_check)` times. If those caches are undersized, run times get quite long. (Only the first query should be slow; after that, all data should be in the cache.)\n",
"\n",
"See the \"cache reporting\" cell below."
]
},
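{
"cell_type": "markdown",
"id": "sketch-lru-cache-info-md",
"metadata": {
"hidden": true
},
"source": [
"A minimal, self-contained sketch (not part of the original workflow) of how `functools.lru_cache` reports its effectiveness through `cache_info()`. `fake_search` is a hypothetical stand-in for a slow GitHub query. `maxsize=2` is deliberately tiny so that evictions show up: if `misses` keeps growing across repeated runs, `maxsize` is probably too small. The cached function returns a tuple rather than a generator, because a cached generator would come back already exhausted on the second call."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-lru-cache-info",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from functools import lru_cache\n",
"\n",
"calls = []  # records real (uncached) invocations\n",
"\n",
"\n",
"@lru_cache(maxsize=2)  # deliberately tiny, to show evictions\n",
"def fake_search(term):\n",
"    # hypothetical stand-in for a slow GitHub API call\n",
"    calls.append(term)\n",
"    # return an immutable tuple: caching a generator would hand back an exhausted iterator\n",
"    return tuple(term.split())\n",
"\n",
"\n",
"for term in ['a b', 'a b', 'c d', 'e f', 'a b']:\n",
"    fake_search(term)\n",
"\n",
"info = fake_search.cache_info()\n",
"print(info)\n",
"print(f'hit rate: {info.hits / (info.hits + info.misses):.0%}')"
]
},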
{
"cell_type": "markdown",
"id": "9033e078",
"metadata": {
"heading_collapsed": true,
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"#### Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f73b7313",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"# Set some initial values\n",
"\n",
"# for now, ACL lookup is quite time-consuming, so assume it will be done in a different run\n",
"defer_acl_lookup = True\n",
"\n",
"# for now, assume the person doing the GitHub offboard is not doing the Heroku offboard\n",
"defer_heroku_lookup = True\n",
"\n",
"# There are 3 ways to set the orgs to examine:\n",
"# 1. use a canned list\n",
"# 2. use a short list for testing\n",
"# 3. get a list of accessible orgs from GitHub\n",
"#\n",
"# #3 is preferred - the others are fallbacks\n",
"\n",
"use_github_org_list = True\n",
"use_test_org_list = False\n",
"use_canned_org_list = False\n",
"\n",
"# hacks for typing\n",
"from typing import Any, List\n",
"\n",
"gh: Any = None\n",
"\n",
"# bogus line to catch bad type hint\n",
"fred = 3"
]
},
{
"cell_type": "markdown",
"id": "b5a4c1c6",
"metadata": {
"heading_collapsed": true,
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"## Code"
]
},
{
"cell_type": "markdown",
"id": "dd0faa9e",
"metadata": {
"heading_collapsed": true,
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"### main code (CIS/IAM)"
]
},
{
"cell_type": "markdown",
"id": "65702c18",
"metadata": {
"hidden": true
},
"source": [
"Not every operator will have valid credentials for the CIS system, so fail gracefully if they are missing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91493eae",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"def check_CIS(email):\n",
"    login = first_name = None\n",
"    if _has_cis_access():\n",
"        login, first_name = _get_cis_info(email)\n",
"        display(f\"CIS info for {email} reports '{login}', first name of '{first_name}'\")\n",
"    else:\n",
"        display(\"Skipping CIS check, no token available.\")\n",
"    return login, first_name"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75161391",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"import sys\n",
"print(sys.executable)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c96c7434",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"def _has_cis_access():\n",
"    import os\n",
"\n",
"    # both the client ID and the client secret must be set (and non-empty)\n",
"    return bool(\n",
"        os.environ.get(\"CIS_CLIENT_ID\", \"\") and os.environ.get(\"CIS_CLIENT_SECRET\", \"\")\n",
"    )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4716f716",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"import os\n",
"import urllib.parse\n",
"\n",
"import requests\n",
"\n",
"_cis_bearer_token = None\n",
"\n",
"\n",
"def _get_cis_bearer_token():\n",
"    \"\"\"Fetch (once) and cache an OAuth bearer token for the CIS person API.\"\"\"\n",
"    global _cis_bearer_token\n",
"    if _cis_bearer_token:\n",
"        return _cis_bearer_token\n",
"    url = \"https://auth.mozilla.auth0.com/oauth/token\"\n",
"    headers = {\"Content-Type\": \"application/json\"}\n",
"    payload = {\n",
"        \"client_id\": os.environ[\"CIS_CLIENT_ID\"],\n",
"        \"client_secret\": os.environ[\"CIS_CLIENT_SECRET\"],\n",
"        \"audience\": \"api.sso.mozilla.com\",\n",
"        \"grant_type\": \"client_credentials\",\n",
"    }\n",
"    resp = requests.post(url, json=payload, headers=headers)\n",
"    data = resp.json()\n",
"    _cis_bearer_token = data[\"access_token\"]\n",
"    return _cis_bearer_token\n",
"\n",
"\n",
"def _get_cis_info(email):\n",
"    bearer_token = _get_cis_bearer_token()\n",
"    # first get the v4 id\n",
"    url = \"https://person.api.sso.mozilla.com/v2/user/primary_email/{}?active=any\".format(\n",
"        urllib.parse.quote(email.strip())\n",
"    )\n",
"    headers = {\"Authorization\": f\"Bearer {bearer_token}\"}\n",
"    resp = requests.get(url, headers=headers)\n",
"    data = resp.json()\n",
"    login = v4id = None\n",
"    try:\n",
"        first_name = data[\"first_name\"][\"value\"].lower()\n",
"    except (KeyError, AttributeError):\n",
"        try:\n",
"            print(f\"DEBUG: {data=}\")\n",
"            print(f\"DEBUG: {data['first_name']=}\")\n",
"            print(f\"DEBUG: {data['first_name']['value']=}\")\n",
"        except KeyError:\n",
"            pass\n",
"        first_name = None\n",
"    try:\n",
"        v4id = data[\"identities\"][\"github_id_v4\"][\"value\"]\n",
"    except KeyError:\n",
"        pass\n",
"    if v4id:\n",
"        # if there was a v4 id, map it to a login, via GraphQL\n",
"        query = \"\"\"\n",
"        query id_lookup($id_to_check: ID!) {\n",
"            node(id: $id_to_check) {\n",
"                ... on User {\n",
"                    login\n",
"                    id\n",
"                    databaseId\n",
"                }\n",
"            }\n",
"        }\n",
"        \"\"\"\n",
"        # pass variables as a dict; requests serializes the whole payload to JSON\n",
"        variables = {\"id_to_check\": str(v4id)}\n",
"        url = \"https://api.github.com/graphql\"\n",
"        headers = {\"Authorization\": f\"Token {api_key}\"}  # api_key is set in a later cell\n",
"        payload = {\n",
"            \"query\": query,\n",
"            \"variables\": variables,\n",
"        }\n",
"        resp = requests.post(url, headers=headers, json=payload)\n",
"        try:\n",
"            data = resp.json()\n",
"            login = data[\"data\"][\"node\"][\"login\"].lower()\n",
"        except (TypeError, KeyError):\n",
"            login = None\n",
"    return login, first_name"
]
},
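{
"cell_type": "markdown",
"id": "sketch-graphql-payload-md",
"metadata": {
"hidden": true
},
"source": [
"A small sketch (not from the original notebook) of the request body that `_get_cis_info` sends to the GitHub GraphQL API. `variables` is given as a dict, so the whole body is serialized to JSON in one step. `build_node_lookup_payload` is a hypothetical helper and `EXAMPLE_NODE_ID` is a placeholder; no network request is made."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-graphql-payload",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import json\n",
"\n",
"# same shape of query as in _get_cis_info(), trimmed to the field we need\n",
"NODE_LOOKUP_QUERY = '''\n",
"query id_lookup($id_to_check: ID!) {\n",
"    node(id: $id_to_check) {\n",
"        ... on User {\n",
"            login\n",
"        }\n",
"    }\n",
"}\n",
"'''\n",
"\n",
"\n",
"def build_node_lookup_payload(node_id):\n",
"    # hypothetical helper: the JSON body to POST to https://api.github.com/graphql\n",
"    return {'query': NODE_LOOKUP_QUERY, 'variables': {'id_to_check': str(node_id)}}\n",
"\n",
"\n",
"payload = build_node_lookup_payload('EXAMPLE_NODE_ID')  # placeholder, not a real node ID\n",
"body = json.dumps(payload)  # roughly what requests.post(..., json=payload) sends\n",
"print(body)"
]
},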
{
"cell_type": "markdown",
"id": "49c6946a",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"### Debug CIS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d823ee7",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"print()\n",
"# paste an address between the triple quotes; .strip() removes the surrounding newlines\n",
"print(_get_cis_info(\n",
"    \"\"\"\n",
"hwine@mozilla.com\n",
"\"\"\".strip()\n",
"))\n",
"print(\"done\")"
]
},
{
"cell_type": "markdown",
"id": "0ca2baca",
"metadata": {
"heading_collapsed": true,
"hidden": true,
"lines_to_next_cell": 0
},
"source": [
"### main code (GitHub)"
]
},
{
"cell_type": "markdown",
"id": "8e5e6e14",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"#### helpers (github3.py)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50939c45",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"# print some debug information\n",
"import github3\n",
"\n",
"print(github3.__version__)\n",
"print(github3.__file__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3902e792",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import sys\n",
"sys.executable"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89c6dad8",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import sys\n",
"print(sys.executable)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d84078ac",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"# set values here - you can also override below\n",
"import os\n",
"\n",
"api_key = os.environ.get(\"GITHUB_PAT\", \"\")\n",
"if not api_key:\n",
"    raise OSError(\"no GitHub PAT found\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a97b9cb6",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"import time\n",
"import ipywidgets, IPython\n",
"print(ipywidgets.__file__)\n",
"print(IPython.__file__)\n",
"print(IPython.display.__file__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e26441e6",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"import json\n",
"\n",
"\n",
"def display_javascript_alert(text):\n",
"    from IPython.display import Javascript\n",
"\n",
"    # json.dumps() produces a valid JavaScript string literal, so quotes or newlines in text can't break the alert\n",
"    simpjs = Javascript(f\"alert({json.dumps(text)})\")\n",
"    display(simpjs)\n",
"\n",
"\n",
"def failed_initialization(msg):\n",
"    display_javascript_alert(f\"Initialization failure: '{msg}'\")\n",
"    raise SystemExit(3, msg)\n",
"\n",
"# failed_initialization(\"does it work?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f640ee45",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def try_login():\n",
"    global gh\n",
"    gh = None\n",
"    try:\n",
"        gh = github3.login(token=api_key)\n",
"        print(f\"You are authenticated as {gh.me().login}\")\n",
"    except (github3.exceptions.AuthenticationFailed, github3.exceptions.ForbiddenError):\n",
"        failed_initialization(\"Invalid credentials\")\n",
"        raise\n",
"    except github3.exceptions.ConnectionError as e:\n",
"        failed_initialization(f\"Exception while opening connection (type {type(e)}):\\n{str(e)}\")\n",
"        raise\n",
"    except Exception as e:\n",
"        failed_initialization(f\"Unexpected exception while opening connection (type {type(e)}):\\n{str(e)}\")\n",
"        raise\n",
"\n",
"try_login()\n",
"from pprint import pprint\n",
"pprint(gh)\n",
"if gh is None:\n",
"    # give up if we can't log in at script start\n",
"    raise github3.exceptions.ConnectionError(\"Couldn't log in to GitHub\")"
]
},
{
"cell_type": "markdown",
"id": "a0bcb67b",
"metadata": {
"hidden": true
},
"source": [
"From here on, use ``gh`` to access all GitHub data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b01c7d37",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"# set the local timezone (the container is UTC)\n",
"print(f'local timezone is {os.environ.get(\"TZ\", \"<unset>\")}.')\n",
"time.tzset()\n",
"\n",
"\n",
"def print_limits(e=None, verbose=False):\n",
"    if e:\n",
"        # display(\"API limit reached, try again in 5 minutes.\\n\")\n",
"        display(str(e))\n",
"\n",
"    reset_max = reset_min = 0\n",
"    limits = gh.rate_limit()\n",
"    resources = limits[\"resources\"]\n",
"    # print(\"{:3d} keys: \".format(len(resources.keys())), resources.keys())\n",
"    # print(resources)\n",
"    from pprint import pprint as pp\n",
"\n",
"    pp(f\"{limits=}\")\n",
"    for reset in list(resources.keys()):\n",
"        reset_at = resources[reset][\"reset\"]\n",
"        reset_max = max(reset_at, reset_max)\n",
"        if not resources[reset][\"remaining\"]:\n",
"            reset_min = min(reset_at, reset_min if reset_min else reset_at)\n",
"            if verbose:\n",
"                print(\"EXPIRED for {} {}\".format(reset, resources[reset][\"remaining\"]))\n",
"        else:\n",
"            if verbose or reset == \"search\":\n",
"                print(\n",
"                    \"remaining for {} {}\".format(reset, resources[reset][\"remaining\"])\n",
"                )\n",
"\n",
"    if not reset_min:\n",
"        print(\"No limits reached currently.\")\n",
"    else:\n",
"        print(\n",
"            \"Minimum reset at {} UTC ({})\".format(\n",
"                time.asctime(time.gmtime(reset_min)),\n",
"                time.asctime(time.localtime(reset_min)),\n",
"            )\n",
"        )\n",
"    print(\n",
"        \"All reset at {} UTC ({})\".format(\n",
"            time.asctime(time.gmtime(reset_max)),\n",
"            time.asctime(time.localtime(reset_max)),\n",
"        )\n",
"    )\n",
"\n",
"def wait_for_quota(which: str) -> None:\n",
"    \"\"\"Wait for the given rate limit (e.g. 'search') to reset, if it is nearly used up.\"\"\"\n",
"    limits = gh.rate_limit()\n",
"    the_limit = limits[\"resources\"][which]\n",
"    if the_limit[\"remaining\"] < 2:\n",
"        # never pass a negative value to time.sleep(), even if the reset time has already passed\n",
"        wait_seconds = max(the_limit[\"reset\"] - int(time.time()) + 1, 0)\n",
"        print(f\"waiting for {which} limit to reset at {time.asctime(time.localtime(the_limit['reset']))}\"\n",
"              f\", {wait_seconds} seconds from now\")\n",
"        time.sleep(wait_seconds)\n",
"\n"
]
},
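{
"cell_type": "markdown",
"id": "sketch-seconds-until-reset-md",
"metadata": {
"hidden": true
},
"source": [
"A pure-function sketch (illustrative only; the notebook does not call it) of the arithmetic in `wait_for_quota`. It assumes rate-limit data shaped like `{'resources': {name: {'remaining': int, 'reset': epoch_seconds}}}`, which is how the code above reads `gh.rate_limit()`. `seconds_until_reset` is a hypothetical name. Passing `now` in explicitly makes the behavior testable without actually sleeping. The sample values are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-seconds-until-reset",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"def seconds_until_reset(limits, which, now, threshold=2):\n",
"    # 0 if at least `threshold` calls remain, else seconds to sleep until reset (never negative)\n",
"    resource = limits['resources'][which]\n",
"    if resource['remaining'] >= threshold:\n",
"        return 0\n",
"    return max(resource['reset'] - int(now) + 1, 0)\n",
"\n",
"\n",
"# made-up sample data in the shape read from gh.rate_limit()\n",
"limits = {\n",
"    'resources': {\n",
"        'core': {'remaining': 4999, 'reset': 1_700_003_600},\n",
"        'search': {'remaining': 1, 'reset': 1_700_000_060},\n",
"    }\n",
"}\n",
"print(seconds_until_reset(limits, 'core', now=1_700_000_000))\n",
"print(seconds_until_reset(limits, 'search', now=1_700_000_000))"
]
},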
{
"cell_type": "code",
"execution_count": null,
"id": "d228384d",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"from functools import lru_cache\n",
"\n",
"print_limits()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "117face6",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"try:\n",
"    failure_reason = None\n",
"    orgs_to_check = set()\n",
"    if use_github_org_list:\n",
"        my_login = gh.me().login\n",
"        # while a comprehension looks nice here, it doesn't let us recover from permission errors in the loops,\n",
"        # which can happen when some orgs change parameters from \"expected\"\n",
"        # orgs_to_check = set([o.login for o in gh.organizations() if my_login in [x.login for x in o.members(role=\"admin\")]])\n",
"        for o in gh.organizations():\n",
"            try:\n",
"                if my_login in [x.login for x in o.members(role=\"admin\")]:\n",
"                    orgs_to_check.add(o.login)\n",
"            except github3.exceptions.ForbiddenError as e:\n",
"                display_javascript_alert(f\"Skipping org {o.login}, as your token doesn't work with it. ({str(e)})\")\n",
"        if not len(orgs_to_check):\n",
"            failure_reason = \"User isn't an admin for any orgs\"\n",
"\n",
"    elif use_test_org_list:\n",
"        orgs_to_check = set(\n",
"            \"\"\"\n",
"            mozilla-services\n",
"            mozilla\n",
"            pocket\n",
"            \"\"\".split()\n",
"        )\n",
"    elif use_canned_org_list:  # old school\n",
"        orgs_to_check = set(\n",
"            \"\"\"\n",
"            Mozilla-Commons\n",
"            Mozilla-Games\n",
"            Mozilla-TWQA\n",
"            MozillaDPX\n",
"            MozillaDataScience\n",
"            MozillaFoundation\n",
"            MozillaReality\n",
"            MozillaSecurity\n",
"            MozillaWiki\n",
"            Pocket\n",
"            Thunderbird-client\n",
"            devtools-html\n",
"            firefox-devtools\n",
"            fxos\n",
"            fxos-eng\n",
"            iodide-project\n",
"            mdn\n",
"            moz-pkg-testing\n",
"            mozilla\n",
"            mozilla-applied-ml\n",
"            mozilla-archive\n",
"            mozilla-b2g\n",
"            mozilla-bteam\n",
"            mozilla-conduit\n",
"            mozilla-extensions\n",
"            mozilla-frontend-infra\n",
"            mozilla-iam\n",
"            mozilla-it\n",
"            mozilla-jetpack\n",
"            mozilla-l10n\n",
"            mozilla-lockbox\n",
"            mozilla-lockwise\n",
"            mozilla-metrics\n",
"            mozilla-mobile\n",
"            mozilla-partners\n",
"            mozilla-platform-ops\n",
"            mozilla-private\n",
"            mozilla-rally\n",
"            mozilla-releng\n",
"            mozilla-services\n",
"            mozilla-spidermonkey\n",
"            mozilla-standards\n",
"            mozilla-svcops\n",
"            mozilla-tw\n",
"            mozmeao\n",
"            nss-dev\n",
"            nubisproject\n",
"            projectfluent\n",
"            taskcluster\n",
"            \"\"\".split()\n",
"        )\n",
"    else:\n",
"        failure_reason = \"No org list method enabled!\"\n",
"\n",
"except Exception as e:\n",
"    failure_reason = f\"exception: {str(e)}\"\n",
"\n",
"if failure_reason:\n",
"    failed_initialization(failure_reason)\n",
"\n",
"print(f\"{len(orgs_to_check):3d} orgs to check.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fd03249",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from IPython import get_ipython\n",
"get_ipython()"
]
},
{
"cell_type": "markdown",
"id": "72aeed1f",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"#### helpers (GitHub searches)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6e6e75d",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"# rate limiting hack\n",
"_last_query_interval: float = 0.0\n",
"_add_seconds_heuristic: float = 1\n",
"# GitHub allows 30 search requests per minute, i.e. one every 60 / 30 = 2 seconds\n",
"_min_seconds_between_calls: float = (60.0 / 30) + _add_seconds_heuristic\n",
"\n",
"_max_usable_users: int = 10\n",
"\n",
"# ToDo if rate limiting is still an issue, could terminate after returning\n",
"# _max_usable_users\n",
"\n",
"def _search_for_user(user: str):\n",
"    # we can only rate limit between calls\n",
"    global _last_query_interval, _min_seconds_between_calls\n",
"    seconds_to_wait = (_last_query_interval + _min_seconds_between_calls) - time.monotonic()\n",
"    if seconds_to_wait > 0:\n",
"        # print(f\"_search_for_user: waiting {seconds_to_wait} seconds.\")\n",
"        time.sleep(seconds_to_wait)\n",
"    wait_for_quota(\"search\")\n",
"    try:\n",
"        # yields up to _max_usable_users + 2 results, enough for callers to detect \"too many\"\n",
"        for i, result in enumerate(gh.search_users(query=\"type:user \" + user)):\n",
"            yield result\n",
"            if i > _max_usable_users:\n",
"                break\n",
"        _last_query_interval = time.monotonic()\n",
"    except Exception as e:\n",
"        print_limits(e, verbose=True)\n",
"        raise\n",
"\n",
"\n",
"@lru_cache(maxsize=512)\n",
"def _search_for_org(user):\n",
"    l = list(gh.search_users(query=\"type:org \" + user))\n",
"    display(f\"found {len(l)} potentials for {user}\")\n",
"    return l\n",
"\n",
"@lru_cache(maxsize=512)\n",
"def get_users(user):\n",
"    # display(u\"SEARCH '{}'\".format(user))\n",
"    l = list(_search_for_user(user))\n",
"    display(f\"found {len(l)} potentials for {user}\")\n",
"    # return a tuple, not a generator: lru_cache would cache the generator object itself,\n",
"    # and a repeat call would get back an already-exhausted (empty) generator\n",
"    return tuple(l)"
]
},
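{
"cell_type": "markdown",
"id": "sketch-min-interval-throttle-md",
"metadata": {
"hidden": true
},
"source": [
"A sketch of the same \"minimum interval between calls\" idea used by `_search_for_user`, packaged as a small class. `MinIntervalThrottle` is a hypothetical name, not part of this notebook or of github3.py. The clock and sleep functions are injectable, so the timing logic can be checked with a fake clock instead of real delays. One design difference: this records the time when `wait()` returns, while `_search_for_user` records it after the search results have been consumed."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-min-interval-throttle",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import time\n",
"\n",
"\n",
"class MinIntervalThrottle:\n",
"    # hypothetical helper: enforces at least min_interval seconds between successive wait() calls\n",
"    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):\n",
"        self.min_interval = min_interval\n",
"        self.clock = clock\n",
"        self.sleep = sleep\n",
"        self.last_call = None\n",
"\n",
"    def wait(self):\n",
"        if self.last_call is not None:\n",
"            remaining = self.last_call + self.min_interval - self.clock()\n",
"            if remaining > 0:\n",
"                self.sleep(remaining)\n",
"        self.last_call = self.clock()\n",
"\n",
"\n",
"class FakeClock:\n",
"    # test double: time only moves when sleep() is called or `now` is changed directly\n",
"    def __init__(self):\n",
"        self.now = 0.0\n",
"        self.slept = []\n",
"\n",
"    def __call__(self):\n",
"        return self.now\n",
"\n",
"    def sleep(self, seconds):\n",
"        self.slept.append(seconds)\n",
"        self.now += seconds\n",
"\n",
"\n",
"fake = FakeClock()\n",
"throttle = MinIntervalThrottle(3.0, clock=fake, sleep=fake.sleep)\n",
"throttle.wait()  # first call: no wait\n",
"fake.now += 1.0  # pretend one second of work happened\n",
"throttle.wait()  # sleeps for the remaining 2 seconds\n",
"print(fake.slept)"
]
},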
{
"cell_type": "code",
"execution_count": null,
"id": "76f2f280",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"displayed_users = set()  # cache to avoid duplicate output\n",
"\n",
"\n",
"def show_users(user_list, search_term):\n",
"    global displayed_users, _max_usable_users\n",
"    unique_users = set(user_list)\n",
"    count = len(unique_users)\n",
"\n",
"    if count > _max_usable_users:\n",
"        # Even if there are too many, we still want to check the 'root' term, if it matched.\n",
"        # A phrase (e.g. \"First Last\") can never be a login, so don't look it up.\n",
"        if \" \" in search_term:\n",
"            display(f\"... too many to be useful, skipping phrase '{search_term}' ...\")\n",
"            return []\n",
"        try:\n",
"            seed_user = gh.user(search_term)\n",
"        except github3.exceptions.NotFoundError:\n",
"            display(f\"... too many to be useful, '{search_term}' is not a user\")\n",
"            return []\n",
"        displayed_users.add(seed_user)\n",
"        display(f\"... too many to be useful, still trying '{seed_user.login}' ...\")\n",
"        return [seed_user.login]\n",
"    for u in [x for x in unique_users if x not in displayed_users]:\n",
"        displayed_users.add(u)\n",
"        u.user.refresh()  # fetch the full user record\n",
"    return [u.login for u in unique_users]\n",
"\n",
"\n",
"from itertools import permutations\n",
"\n",
"\n",
"def _permute_seeds(seeds):\n",
"    if len(seeds) == 1:\n",
"        yield seeds[0]\n",
"    else:\n",
"        for x, y in permutations(seeds, 2):\n",
"            permutation = \" \".join([x, y])\n",
"            display(f\" trying phrase permutation {permutation}\")\n",
"            yield permutation\n",
"            permutation = \"\".join([x, y])\n",
"            display(f\" trying permutation {permutation}\")\n",
"            yield permutation\n",
"\n",
"\n",
"def gather_possibles(seeds):\n",
"    found = set()\n",
"    # sometimes get a phrase coming in - e.g. \"First Last\"\n",
"    for seed in _permute_seeds(seeds.split()):\n",
"        maybes = show_users(get_users(seed), seed)\n",
"        found.update(maybes)\n",
"        # if it was an email addr, try again with the mailbox name\n",
"        if \"@\" in seed:\n",
"            seed2 = seed.split(\"@\")[0]\n",
"            # fix https://github.com/mozilla/github-org-scripts/issues/84\n",
"            if len(seed2) >= 4:\n",
"                display(f\"Searching for mailbox name '{seed2}' (gather_possibles)\")\n",
"                maybes = show_users(get_users(seed2), seed2)\n",
"                found.update(maybes)\n",
"            else:\n",
"                display(f\"Skipping search for '{seed2}' -- too short\")\n",
"    return found"
]
},
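{
"cell_type": "markdown",
"id": "sketch-permute-seeds-md",
"metadata": {
"hidden": true
},
"source": [
"To see which search terms `gather_possibles` will send to GitHub, here is a standalone copy of the `_permute_seeds` logic. The `display()` calls are removed so it runs outside Jupyter, and `permute_seeds` is a name used only for this illustration. A single word is searched as-is. For a phrase, every ordered pair of words is tried both with and without a space. Note that with three or more words only pairs are generated, never the full phrase."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-permute-seeds",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from itertools import permutations\n",
"\n",
"\n",
"def permute_seeds(seeds):\n",
"    # same logic and ordering as _permute_seeds(), minus the display() calls\n",
"    if len(seeds) == 1:\n",
"        yield seeds[0]\n",
"    else:\n",
"        for x, y in permutations(seeds, 2):\n",
"            yield ' '.join([x, y])  # phrase search\n",
"            yield ''.join([x, y])  # run-together login guess\n",
"\n",
"\n",
"print(list(permute_seeds('Jane Doe'.split())))\n",
"print(list(permute_seeds(['jdoe@example.com'])))"
]
},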
{
"cell_type": "code",
"execution_count": null,
"id": "e76c1648",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"class OutsideCollaboratorIterator(github3.structs.GitHubIterator):\n",
"    def __init__(self, org):\n",
"        super().__init__(\n",
"            count=-1,  # get all\n",
"            url=org.url + \"/outside_collaborators\",\n",
"            cls=github3.users.ShortUser,\n",
"            session=org.session,\n",
"        )\n",
"\n",
"\n",
"@lru_cache(maxsize=512)\n",
"def get_collaborators(org):\n",
"    collabs = [x.login.lower() for x in OutsideCollaboratorIterator(org)]\n",
"    return collabs\n",
"\n",
"\n",
"def is_collaborator(org, login):\n",
"    return bool(login.lower() in get_collaborators(org))\n",
"\n",
"\n",
"# provide same interface for members -- but the iterator is free :D\n",
"@lru_cache(maxsize=512)\n",
"def get_members(org):\n",
"    members = [x.login.lower() for x in org.members()]\n",
"    return members\n",
"\n",
"\n",
"def is_member(org, login):\n",
"    return bool(login.lower() in get_members(org))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d190161",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"@lru_cache(maxsize=64)\n",
"def get_org_owners(org):\n",
"    owners = org.members(role=\"admin\")\n",
"    logins = [x.login for x in owners]\n",
"    return logins\n",
"\n",
"\n",
"@lru_cache(maxsize=128)\n",
"def get_inspectable_org_object(org_name):\n",
"    try:\n",
"        o = gh.organization(org_name)\n",
"        # make sure we have enough chops to inspect it (the probe login should never exist)\n",
"        get_org_owners(o)\n",
"        is_member(o, \"qzu\" * 3)\n",
"        is_collaborator(o, \"qzu\" * 3)\n",
"    except github3.exceptions.NotFoundError:\n",
"        o = None\n",
"        display(f\"No such organization: '{org_name}'\")\n",
"    except github3.exceptions.ForbiddenError:\n",
"        o = None\n",
"        display(f\"\\n\\nWARNING: Not enough permissions for org '{org_name}'\\n\\n\")\n",
"    except Exception as e:\n",
"        o = None\n",
"        display(f\"didn't expect to get here: get_inspectable_org_object({org_name})\")\n",
"        display(f\" exception: {type(e)=}\")\n",
"        from pprint import pformat\n",
"\n",
"        display(f\"{pformat(e)}\")\n",
"    return o\n",
"\n",
"\n",
"def check_login_perms(logins, headers=None, ldap=None):\n",
"    any_perms = []\n",
"    logins_with_hits = set()\n",
"    any_perms.append(\"=\" * 30)\n",
"    if headers:\n",
"        any_perms.extend(headers)\n",
"    if not len(logins):\n",
"        any_perms.append(\"\\nFound no valid usernames\")\n",
"    else:\n",
"        any_perms.append(\n",
"            \"\\nChecking {} usernames for membership in {} orgs\".format(\n",
"                len(logins), len(orgs_to_check)\n",
"            )\n",
"        )\n",
"    for login in logins:\n",
"        start_msg_count = len(any_perms)\n",
"        for org in orgs_to_check:\n",
"            o = get_inspectable_org_object(org)\n",
"            if o is None:\n",
"                continue\n",
"            if is_member(o, login):\n",
"                url = \"https://github.com/orgs/{}/people?utf8=%E2%9C%93&query={}\".format(\n",
"                    o.login, login\n",
"                )\n",
"                phonebook_url = f\"https://people.mozilla.org/a/ghe_{o.login}_users/\"\n",
"                msg = f\"FOUND! {o.login} has {login} as a member: {url}\"\n",
"                msg += f\"\\n\\tRemove from phonebook group if needed: {phonebook_url}\"\n",
"                owner_logins = get_org_owners(o)\n",
"                # compare case-insensitively, as is_member() does\n",
"                is_owner = login.lower() in [x.lower() for x in owner_logins]\n",
"                if is_owner:\n",
"                    msg += f\"\\n NOTE: {login} is an OWNER of {org}\"\n",
"                any_perms.append(msg)\n",
"            if is_collaborator(o, login):\n",
"                url = \"https://github.com/orgs/{}/outside-collaborators?utf8=%E2%9C%93&query={}\".format(\n",
"                    o.login, login\n",
"                )\n",
"                any_perms.append(\n",
"                    \"FOUND! {} has {} as a collaborator: {}\".format(\n",
"                        o.login, login, url\n",
"                    )\n",
"                )\n",
"        end_msg_count = len(any_perms)\n",
"        if end_msg_count > start_msg_count:\n",
"            # some found, put a header on it, then add a blank line\n",
"            any_perms.insert(\n",
"                start_msg_count,\n",
"                \"\\nFound {:d} orgs for {}:\".format(\n",
"                    end_msg_count - start_msg_count, login\n",
"                ),\n",
"            )\n",
"            any_perms.append(\"\")\n",
"            logins_with_hits.add(login)\n",
"        else:\n",
"            any_perms.append(f\"No permissions found for {login}\")\n",
"    return any_perms, logins_with_hits"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6d0edbd",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"def extract_addresses(text):\n",
"    \"\"\"Get email addresses from text.\"\"\"\n",
"    # ASSUME that text is a list of email addresses (possibly empty)\n",
"    if not text:\n",
"        return []\n",
"    # strip list and bytes-repr punctuation, and treat commas as separators\n",
"    text = text.replace(\"[\", \"\").replace(\"]\", \"\").replace(\"b'\", \"\").replace(\"'\", \"\").replace(\",\", \" \")\n",
"    return text.split()"
]
},
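{
"cell_type": "markdown",
"id": "sketch-extract-addresses-re-md",
"metadata": {
"hidden": true
},
"source": [
"One possible alternative to `extract_addresses`, sketched here and not used elsewhere in the notebook. Rather than stripping list and bytes punctuation, it picks out anything shaped like an email address with a regular expression. `extract_addresses_re` is a hypothetical name. The pattern is an assumption and a deliberate simplification, but it does not depend on the exact punctuation of the input, so separators such as commas or quotes are simply ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-extract-addresses-re",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"# deliberately simple approximation of an email address; not a full RFC 5322 parser\n",
"EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:[.][A-Za-z0-9-]+)+')\n",
"\n",
"\n",
"def extract_addresses_re(text):\n",
"    # hypothetical alternative to extract_addresses(): returns the addresses in input order\n",
"    if not text:\n",
"        return []\n",
"    return EMAIL_RE.findall(text)\n",
"\n",
"\n",
"print(extract_addresses_re(\"[b'jdoe@example.com', b'jane.doe@mail.example.org']\"))"
]
},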
{
"cell_type": "markdown",
"id": "c55db615",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"#### ACL Routines"
]
},
{
"cell_type": "markdown",
"id": "62f0d438",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"##### Filtering Routines"
]
},
{
"cell_type": "markdown",
"id": "b0adf40d",
"metadata": {
"hidden": true
},
"source": [
"Filtering is split into 4 cells:\n",
"1. a test to ensure it's not broken when changes are made\n",
"2. the exception lists, which are the most common update\n",
"3. the filter code\n",
"4. the code to invoke the tests\n",
"\n",
"Any change to #2 or #3 will re-run the tests."
]
},
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "a353aff4",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# tests for every load of the filtering code\n",
|
|
"# NB these urls are NOT the search URL, they are the URL from the hit\n",
|
|
"test_url = [\n",
|
|
" \"\"\"https://github.com/mozilla-services/foxsec-results/blob/47f31f014cf21dc6e7e774ddc28e51a6f9eeba54/bucketlister/README.md\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-services/product-delivery-tools/blob/47f31f014cf21dc6e7e774ddc28e51a6f9eeba54/bucketlister/README.markdown\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-services/cloudops-docs/blob/0ff6ea92e394784aef55abd4b9f8b5d26306fe4b/TeamDiagrams/service_registry.csv\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
|
|
" # should skip - org\n",
|
|
" \"\"\"https://github.com/fxos/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
|
|
" # should skip - regexp repo\n",
|
|
" \"\"\"https://github.com/mozilla-releng/take-home-assignment-no-such-repo/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"\"\",\n",
|
|
" # should skip - below 'tests' or 'test' directory\n",
|
|
" \"\"\"https://github.com/mozilla/participation-metrics-identities/blob/12b64498e12fe035cd5f7e081fa0e38888e0be8d/tests/should-be-skipped.csv\"\"\",\n",
|
|
" \"\"\"https://github.com/Pocket/Android/blob/54abeefaa252a4b1c279f7c5bf571082f2283dc1/sync-pocket-android/src/test/resources/mock/getNotifications.json\"\"\", \n",
|
|
" \"\"\"https://github.com/mozilla-services/addons-code-corpus/blob/5b5646ad2390ba1b22f64ca62953ec3b260c0da7/train/bad/872325/chat/index.html\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-services/addons-code-corpus/blob/5b5646ad2390ba1b22f64ca62953ec3b260c0da7/training/bad/872325/chat/index.html\"\"\",\n",
|
|
" \"\"\"https://github.com/Pocket/iOS/blob/d69247be8b8987038b45406e168853d59e72e1ef/Listen/Listen/Sample/Definitions/PKTListenAppTheme.m\"\"\",\n",
|
|
" \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/1685443536.html\"\"\",\n",
|
|
" # should skip based on filename\n",
|
|
" \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/Makefile\"\"\",\n",
|
|
" \"\"\"https://github.com/Pocket/particle/blob/8e5e3b45766fb3f93fbf28d8170f522aa8ba0e4f/convert/samples/input/html/audiemega2/pyproject.toml\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-it/cloudalerts/blob/c3721d1d17f5e987cdc60f3d3d0c161a0b04b5ac/Dockerfile\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-it/it-sre-bot/blob/27c7a5e11ed48aa38baf676c6406fb3dcb116eaf/package-lock.json\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-services/splunk-ops/blob/ece2052d252d763fa14382f7d114f7c38e5cb31b/splunk_apps/corelight-app-for-splunk/CorelightForSplunk/bin/corelightforsplunk/aob_py3/httplib2/cacerts.txt\"\"\",\n",
|
|
" \"\"\"https://github.com/mozilla-mobile/firefox-android/blob/f3b4a9e8c806fcb51906a0db15d67c95a9a383ba/android-components/components/browser/errorpages/src/main/res/values-is/strings.xml\"\"\",\n",
|
|
" # should pass (makefile != Makefile)\n",
|
|
" \"\"\"https://github.com/mozilla-services/cloudops-docs/blob/0ff6ea92e394784aef55abd4b9f8b5d26306fe4b/TeamDiagrams/makefile\"\"\",\n",
|
|
"]\n",
|
|
"test_good = [\n",
|
|
" None, # should be skipped: file extension\n",
|
|
" None, # should be skipped: file extension\n",
|
|
" (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla-services/cloudops-docs+path%3A\"TeamDiagrams/service_registry.csv\"+oremj\"\"\",\n",
|
|
" \"mozilla-services/cloudops-docs\", \"TeamDiagrams\", \"service_registry.csv\"),\n",
|
|
" (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla/participation-metrics-identities+path%3A\"Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"+oremj\"\"\",\n",
|
|
" \"mozilla/participation-metrics-identities\", \"\", \"Community%20Analytics%20-%20Reps%20Organization%20-%20Test%201.csv\"),\n",
|
|
" None, # should be skipped org\n",
|
|
" None, # should be skipped matching repo regex\n",
|
|
" None, # skipped 'tests' in path\n",
|
|
" None, # skipped 'test' in path\n",
|
|
" None, # skipped 'train' in path\n",
|
|
" None, # skipped 'training' in path (matches substring)\n",
|
|
" None, # skipped 'sample' in path\n",
|
|
" None, # skipped 'samples' in path (matches substring)\n",
|
|
" None, # skipped: Makefile\n",
|
|
" None, # skipped: pyproject.toml\n",
|
|
" None, # skipped: Dockerfile\n",
|
|
" None, # skipped: package-lock.json\n",
|
|
" None, # skipped: cacerts.txt\n",
|
|
" None, # skipped: strings.xml\n",
|
|
" (\"\"\"https://github.com/search?type=Code&ref=advsearch&q=repo%3Amozilla-services/cloudops-docs+path%3A\"TeamDiagrams/makefile\"+oremj\"\"\",\n",
|
|
" \"mozilla-services/cloudops-docs\", \"TeamDiagrams\", \"makefile\"),\n",
|
|
"]\n",
|
|
"test_login = \"oremj\"\n",
|
|
"_test_ignore_filters_invocation_count = 0\n",
|
|
"\n",
|
|
"_debug_test_case_num = 0 # leave at 0 for no debug, -1 to find which test case is failing\n",
|
|
"\n",
|
|
"def _test_ignore_filters():\n",
|
|
" global _test_ignore_filters_invocation_count\n",
|
|
" _test_ignore_filters_invocation_count += 1\n",
|
|
" if _test_ignore_filters_invocation_count == 1:\n",
|
|
" # first invocation means both cells needed for test haven't been loaded yet\n",
|
|
" return \"skipped awaiting code load\"\n",
|
|
" test_case_num = 0\n",
|
|
" for test, success in zip(test_url, test_good):\n",
|
|
" test_case_num += 1\n",
|
|
" do_debug = (_debug_test_case_num == test_case_num)\n",
|
|
" if _debug_test_case_num:\n",
|
|
" print(f\"case {test_case_num}; debug {do_debug}\")\n",
|
|
" actual = search_hit_to_url(test, test_login, debug=do_debug)\n",
|
|
" if actual != success:\n",
|
|
" print(f\"test case {test_case_num}: {test}\")\n",
|
|
" print(f\" received: {actual}\")\n",
|
|
" print(f\" expected: {success}\")\n",
|
|
" for a, e in zip(actual or (), success or ()): # either side may be None\n",
|
|
" print(f\"{a==e !r:5}: '{a}'\\n '{e}'\")\n",
|
|
" raise SystemExit(\"unit test failed\")\n",
|
|
" return \"ignore filter tests pass\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "97dd614b",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# items to ignore -- all heuristically derived\n",
|
|
"\n",
|
|
"# only add filenames, extensions, or repos that could NEVER contain an ACL definition\n",
|
|
"filenames_to_skip = {\"setup.py\", \"pyproject.toml\", \"requirements.txt\", \"Makefile\", \n",
|
|
" \"Dockerfile\", \"package-lock.json\", \"cacerts.txt\", \"strings.xml\"}\n",
|
|
"extensions_to_skip = ( \".ics\", \".md\", \".markdown\", \".rst\", \".der\", \".pem\", \".crt\", \n",
|
|
" \".html\", \".htm\", \".svg\", \".bib\", \".po\", )\n",
|
|
"\n",
|
|
"# some orgs have patterns for repo names, take advantage of that\n",
|
|
"repos_to_skip_regexp = set((\n",
|
|
" \"mozilla-services/foxsec-results\",\n",
|
|
" \"mozilla-services/cloudops-jenkins\",\n",
|
|
" \"mozilla-services/cloudqa-jenkins\",\n",
|
|
" \"mozilla/gecko-dev\",\n",
|
|
" \"mozilla/eu2019-ad-transparency-report\",\n",
|
|
" \"mdn/archived-content\",\n",
|
|
" \"mozilla-releng/take-home-assignment.*\", # interview tests\n",
|
|
" \"mozilla-it/www-archive.mozilla.org\",\n",
|
|
" \"mdn/retired.*content\", # history\n",
|
|
" \"mozilla-it/sumo-l10n.*\",\n",
|
|
" \"mozmeao/sumo-l10n.*\",\n",
|
|
" \"mozmeao/www-l10n\",\n",
|
|
" \"mozilla-services/ms-language-packs\",\n",
|
|
" \"mozilladatascience/search-terms-sanitization\",\n",
|
|
" \"Pocket/AndroidHiring\",\n",
|
|
" \"Pocket/Localization\",\n",
|
|
" \"Pocket/data-explorations\",\n",
|
|
" \"mdn/translated-content\",\n",
|
|
" \"Pocket/parser-benchmark\",\n",
|
|
" \"mozilla/releases_insights\",\n",
|
|
"))\n",
|
|
"# convert to single regexp & compile\n",
|
|
"# based on https://stackoverflow.com/a/3040797/5128493\n",
|
|
"import re\n",
|
|
"RE_REPO_TO_SKIP = re.compile(\"(?:\" + \")|(?:\".join(repos_to_skip_regexp) + \")\", re.IGNORECASE)\n",
|
|
"\n",
|
|
"# These orgs are guaranteed not to have any current ACLs in them\n",
|
|
"# - could be expanded for any parked or archived org\n",
|
|
"orgs_to_skip = (\n",
|
|
" \"fxos\",\n",
|
|
" \"fxos-eng\",\n",
|
|
" \"mozilla-b2g\",\n",
|
|
" \"moco-ghe-admin\",\n",
|
|
" \"mozilla-l10n\", # only translations, no apps or services\n",
|
|
" \"common-voice\", # not supported by IT\n",
|
|
")\n",
|
|
"# skip any path with a component containing one of these substrings (e.g. `tests`)\n",
|
|
"# ToDo: consider making regexp, would allow path /data/ & /changelog.*/ to be added\n",
|
|
"paths_to_skip = (\n",
|
|
" \"test\",\n",
|
|
" \"train\", # all the AI these days\n",
|
|
" \"sample\",\n",
|
|
" \"locales\", # l10n stuff\n",
|
|
" \"translations\",\n",
|
|
" \"template\",\n",
|
|
" \"resources\",\n",
|
|
")\n",
|
|
"\n",
|
|
"# test on any change\n",
|
|
"print(_test_ignore_filters())"
|
|
]
|
|
},
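{
"cell_type": "markdown",
"id": "5e1c7a01",
"metadata": {
"hidden": true
},
"source": [
"A minimal sketch of how the repo skip patterns combine into one regular expression, using a shortened, illustrative copy of `repos_to_skip_regexp`. `search_hit_to_url` calls `RE_REPO_TO_SKIP.match()`, which is anchored only at the start of the string, so a literal entry such as `mozilla/gecko-dev` also skips any repo whose name merely begins with it. `fullmatch()` would require the whole `org/repo` string to match. The helper names `skipped_prefix` and `skipped_exact` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a02",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Illustrative subset of repos_to_skip_regexp (the full set is defined above)\n",
"patterns = [\n",
"    'mozilla/gecko-dev',\n",
"    'mozilla-releng/take-home-assignment.*',\n",
"    'mdn/retired.*content',\n",
"]\n",
"\n",
"# Same construction as RE_REPO_TO_SKIP: non-capturing groups joined with '|'\n",
"combined = re.compile('(?:' + ')|(?:'.join(patterns) + ')', re.IGNORECASE)\n",
"\n",
"def skipped_prefix(repo):\n",
"    # match(): anchored at the start only (what search_hit_to_url uses)\n",
"    return combined.match(repo) is not None\n",
"\n",
"def skipped_exact(repo):\n",
"    # fullmatch(): the entire org/repo string must match one of the patterns\n",
"    return combined.fullmatch(repo) is not None\n",
"\n",
"for repo in ['mozilla/gecko-dev', 'mozilla/gecko-dev-tools', 'MDN/retired-content', 'mozilla/fenix']:\n",
"    print(f'{repo:26} match={skipped_prefix(repo)!s:5} fullmatch={skipped_exact(repo)}')"
]
},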
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "bb5e4999",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# we get some insane counts sometimes, along with 404s as all of these\n",
|
|
"# results are based on an index GitHub created some time ago, and include sha1\n",
|
|
"# references. E.g. 404 link:\n",
|
|
"# https://github.com/mozilla-services/foxsec-results/blob/70a5b7841edcdb967beddbce75309efa0bc2b687/aws-pytest/cloudservices-aws-stage/one-offs/cloudservices-aws-stage-2018-01-31-secgroup-service-report.md/search?q=oremj&type=code\n",
|
|
"# valid version\n",
|
|
"# https://github.com/mozilla-services/foxsec-results/blob/master/aws-pytest/cloudservices-aws-stage/one-offs/cloudservices-aws-stage-2018-01-31-secgroup-service-report.md\n",
|
|
"# However, that URL won't support a search endpoint, so what we _really_ want is\n",
|
|
"# https://github.com/search?q=oremj+repo%3Amozilla-services%2Ffoxsec-results++path%3Aaws-pytest%2Fcloudservices-aws-stage%2Fone-offs%2F+filename%3Acloudservices-aws-stage-2018-01-31-secgroup-service-report.md&type=Code&ref=advsearch\n",
|
|
"# https://github.com/search?q=repo%3Amozilla-services/foxsec-results%20path%3Aaws-pytest/cloudservices-aws-stage/one-offs%20filename%3Acloudservices-aws-stage-2018-01-31-secgroup-service-report.md%20fred&type=code&ref=advsearch\n",
|
|
"#\n",
|
|
"# rebuild the file hit url into what we want\n",
|
|
"\n",
|
|
"from urllib.parse import urlparse, urlunparse, quote_plus\n",
|
|
"# from IPython.core.debugger import set_trace; set_trace()\n",
|
|
"\n",
|
|
"\n",
|
|
"def ignore_path(url_list, debug=False):\n",
|
|
" for ignorable in paths_to_skip:\n",
|
|
" # partial (substring) matching, so a set-membership test won't do; checks org, repo, and directory components (not the filename)\n",
|
|
" for element in [x.lower() for x in url_list[1:-1]]:\n",
|
|
" if ignorable in element:\n",
|
|
" if debug:\n",
|
|
" print(f\"Ignoring {'/'.join(url_list)}\")\n",
|
|
" return True\n",
|
|
" return False\n",
|
|
"\n",
|
|
"def search_hit_to_url(url, login=None, debug=False):\n",
|
|
" # split into components\n",
|
|
" parts = urlparse(url)\n",
|
|
" # break down the path\n",
|
|
" path_parts = parts.path.split('/')\n",
|
|
" if path_parts[1] in orgs_to_skip:\n",
|
|
" # ignore orgs with no relevant ACL\n",
|
|
" if debug:\n",
|
|
" print(f\"ignoring based on org '{path_parts[1]}'\")\n",
|
|
" return\n",
|
|
" repo = '/'.join(path_parts[1:3])\n",
|
|
"# if repo in repos_to_skip:\n",
|
|
" if RE_REPO_TO_SKIP.match(repo):\n",
|
|
" if debug:\n",
|
|
" print(f\"ignoring based on repo '{repo}'\")\n",
|
|
" print(f\"pattern: r'{RE_REPO_TO_SKIP.pattern}'\")\n",
|
|
" return\n",
|
|
" if ignore_path(path_parts, debug=debug):\n",
|
|
" if debug:\n",
|
|
" print(f\"ignoring based on path: {'/'.join(path_parts)}\")\n",
|
|
" return\n",
|
|
" filename = path_parts[-1]\n",
|
|
"# from pprint import pprint\n",
|
|
"# pprint(path_parts)\n",
|
|
"# print(f\"{repo}; {filename}\")\n",
|
|
" try:\n",
|
|
"# from IPython.core.debugger import Pdb; Pdb().set_trace()\n",
|
|
" if filename[filename.rindex('.'):] in extensions_to_skip:\n",
|
|
" if debug:\n",
|
|
" print(f\"ignoring due to extension '{filename[filename.rindex('.'):]}'\")\n",
|
|
" return\n",
|
|
" except ValueError:\n",
|
|
" # file didn't have extension, so process it\n",
|
|
" pass\n",
|
|
" if filename in filenames_to_skip:\n",
|
|
" if debug:\n",
|
|
" print(f\"ignoring due to filename '{filename}'\")\n",
|
|
" return\n",
|
|
" basepath = path_parts[3:-1]\n",
|
|
" if basepath and basepath[0] == \"blob\":\n",
|
|
" # get rid of 'blob' and sha1\n",
|
|
" basepath = basepath[2:]\n",
|
|
" \n",
|
|
" # build the new query string\n",
|
|
" # With the newer syntax, everything can be in one \"path\" filter\n",
|
|
" basepath = '/'.join(basepath)\n",
|
|
" filename_filter = 'path:\"'\n",
|
|
" if basepath:\n",
|
|
" filename_filter += f\"{basepath}/\"\n",
|
|
" filename_filter += f'{filename}\"'\n",
|
|
" # the actual query needs to be form encoded, so use quote_plus. \n",
|
|
" # '/', '%' and '\"' stay literal; keeping '%' lets already-encoded names (e.g. '%20' for a space) pass through unchanged.\n",
|
|
" # Note - won't currently work with paths with spaces\n",
|
|
" query = quote_plus(f\"\"\"repo:{repo} {filename_filter} {login}\"\"\", safe='/%\"')\n",
|
|
" query_string = f\"type=Code&ref=advsearch&q={query}\"\n",
|
|
" \n",
|
|
" # now rebuild the url\n",
|
|
" new_url = urlunparse((\n",
|
|
" parts.scheme,\n",
|
|
" parts.netloc,\n",
|
|
" \"search\",\n",
|
|
" None, # params\n",
|
|
" query_string,\n",
|
|
" None, # fragment\n",
|
|
" ))\n",
|
|
" return new_url, repo, basepath, filename"
|
|
]
|
|
},
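{
"cell_type": "markdown",
"id": "5e1c7a03",
"metadata": {
"hidden": true
},
"source": [
"A minimal sketch of only the URL-rewriting step of `search_hit_to_url`, with the skip filters left out, to show what `quote_plus(..., safe='/%\"')` produces: `:` becomes `%3A` and spaces become `+`, while `/`, `%` and `\"` are kept as-is. `blob_to_search_url` is a hypothetical name. For the `TeamDiagrams/makefile` URL used in the unit tests above, it should produce the same search URL that `test_good` expects."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a04",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from urllib.parse import urlparse, urlunparse, quote_plus\n",
"\n",
"def blob_to_search_url(url, login):\n",
"    # Hypothetical helper: the rewrite step of search_hit_to_url, without filtering\n",
"    parts = urlparse(url)\n",
"    # path looks like /<org>/<repo>/blob/<sha>/<dirs...>/<filename>\n",
"    path_parts = parts.path.split('/')\n",
"    repo = '/'.join(path_parts[1:3])\n",
"    filename = path_parts[-1]\n",
"    basepath = path_parts[3:-1]\n",
"    if basepath and basepath[0] == 'blob':\n",
"        basepath = basepath[2:]  # drop 'blob' and the commit sha\n",
"    dirs = '/'.join(basepath)\n",
"    path_filter = 'path:\"' + (dirs + '/' if dirs else '') + filename + '\"'\n",
"    # ':' -> '%3A' and ' ' -> '+'; '/', '%' and '\"' are left as-is\n",
"    query = quote_plus(f'repo:{repo} {path_filter} {login}', safe='/%\"')\n",
"    return urlunparse((parts.scheme, parts.netloc, 'search', '',\n",
"                       f'type=Code&ref=advsearch&q={query}', ''))\n",
"\n",
"print(blob_to_search_url(\n",
"    'https://github.com/mozilla-services/cloudops-docs/blob/'\n",
"    '0ff6ea92e394784aef55abd4b9f8b5d26306fe4b/TeamDiagrams/makefile', 'oremj'))"
]
},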
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "fa87e0b3",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# test on any change - put in own cell, so failure obvious\n",
|
|
"print(_test_ignore_filters())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "25536c3e",
|
|
"metadata": {
|
|
"heading_collapsed": true,
|
|
"hidden": true
|
|
},
|
|
"source": [
|
|
"##### ACL reporting code"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "2ccc2eb4",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true,
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def prune_hits_to_ignore(full_list, id_to_find):\n",
|
|
" # remove vulnerability repos (*-ghsa-*) and archived repos (archive status \n",
|
|
" # requires refresh of repository object)\n",
|
|
" hit_list_1 = [r for r in full_list if (not \"-ghsa-\" in r.repository.name)\n",
|
|
" and (not r.repository.refresh().archived)\n",
|
|
" ]\n",
|
|
" # now eliminate any hits where the search term was not found \"as a word\"\n",
|
|
" id_re = re.compile(fr\"\\b{re.escape(id_to_find)}\\b\", re.IGNORECASE)\n",
|
|
" hit_list_2 = []\n",
|
|
"# print(f\"Checking {len(hit_list_1)} hits\")\n",
|
|
" for index, hit in enumerate(hit_list_1):\n",
|
|
"# print(f\" Hit {index} has {len(hit.text_matches)} contexts\")\n",
|
|
" for ctxt, context in enumerate(hit.text_matches):\n",
|
|
" if id_re.search(context[\"fragment\"]):\n",
|
|
" hit_list_2.append(hit)\n",
|
|
"# print(f\"Adding hit {index}; context {ctxt} ({len(hit_list_2)=}): {context['fragment']}\")\n",
|
|
" break\n",
|
|
" else:\n",
|
|
"# print(f\"ignoring context {context['fragment']}\")\n",
|
|
" ...\n",
|
|
"# print(f\"returning {len(hit_list_2)} hits\")\n",
|
|
" return hit_list_2"
|
|
]
|
|
},
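{
"cell_type": "markdown",
"id": "5e1c7a05",
"metadata": {
"hidden": true
},
"source": [
"Code search matches tokens loosely, so `prune_hits_to_ignore` re-checks each returned fragment for the login as a whole word. A minimal sketch of that check on plain strings rather than GitHub hit objects; the fragments and the `j.doe` login are made up, and `mentions_as_word` is a hypothetical name. Escaping the login with `re.escape` keeps characters such as `.` from acting as regex wildcards."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a06",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"def mentions_as_word(fragment, login):\n",
"    # \\b on both sides: the login must not be part of a longer word\n",
"    pattern = re.compile(rf'\\b{re.escape(login)}\\b', re.IGNORECASE)\n",
"    return pattern.search(fragment) is not None\n",
"\n",
"fragments = [\n",
"    'admins = [oremj, someone]',  # whole word -> keep\n",
"    'oremjr was here',            # longer word -> drop\n",
"    'owner: OREMJ',               # different case -> keep\n",
"]\n",
"print([mentions_as_word(f, 'oremj') for f in fragments])  # [True, False, True]\n",
"\n",
"# Without re.escape, the '.' in 'j.doe' would also match 'jxdoe'\n",
"print(mentions_as_word('jxdoe', 'j.doe'))  # False"
]
},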
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "74a23434",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true,
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import csv, io\n",
|
|
"def check_for_acls(logins):\n",
|
|
"\n",
|
|
" \"\"\"Search each org in orgs_to_check for code mentioning these logins; hits may be ACL entries to remove.\n",
|
|
" \n",
|
|
" Note that we haven't pruned logins to just the orgs we found hits on -- we're using all GitHub logins. May want to modify in the future.\n",
|
|
" \"\"\"\n",
|
|
" possibles = set(logins)\n",
|
|
"\n",
|
|
" # we're now outputting in CSV format, so put in a header line\n",
|
|
" csvfile = io.StringIO()\n",
|
|
" writer = csv.writer(csvfile)\n",
|
|
" writer.writerow([\"Action Taken\", \"Comment\", \"\", \"Context\", \"File\", \"Search URL\", \"Raw Context\"])\n",
|
|
" # add formula to use for copy down in R2C3 - still requires manual intervention\n",
|
|
" # 1. in cells C2 and D2: select, edit, and press Enter to make them real formulas\n",
|
|
" # 2. fill down for all rows in sheet\n",
|
|
" writer.writerow([\"\", \"\", '=if(ISBLANK(F2),\"\", HYPERLINK(F2,\"?\"))', '=if(isblank(G2),,SUBSTITUTE(G2,\"\\\\n\",char(10)))', \"\", \"\"])\n",
|
|
" writer.writerow([\"\"] * 4)\n",
|
|
" writer.writerow([f\"Checking for possible ACLs for: {', '.join(possibles)}\", \"\", \"\",])\n",
|
|
" writer.writerow([\"\"] * 4)\n",
|
|
"# import pdb ; pdb.set_trace()\n",
|
|
"# from IPython.core.debugger import set_trace; set_trace()\n",
|
|
"\n",
|
|
"\n",
|
|
" # needed for exception handling below\n",
|
|
" import http.client\n",
|
|
"\n",
|
|
" for org in orgs_to_check:\n",
|
|
"# print(f\" {org}..\", end='')\n",
|
|
" for l in possibles:\n",
|
|
" full_list = []\n",
|
|
" hit_iter = gh.search_code(query=f\"org:{org} {l}\", text_match=True)\n",
|
|
" assume_time_out = True\n",
|
|
" while assume_time_out:\n",
|
|
" try:\n",
|
|
" # 2023-05-25 can't use regex in code search, so return context for further processing\n",
|
|
" # 2023-06-16 can't get as list in one shot, as it could consume search query limit every restart\n",
|
|
" # but, even this isn't \"right\" -- some state in\n",
|
|
" # the iterator appears to be changed prior to\n",
|
|
" # the network call. I.e. even if the network\n",
|
|
" # call fails, the iterator has \"advanced\", and\n",
|
|
" # we're missing some elements.\n",
|
|
" #\n",
|
|
" # I think the only way around that is to do our\n",
|
|
" # own iterator, so we can specifically identify\n",
|
|
" # _when_ we're making network calls, vs\n",
|
|
" # unpacking another element from the last call.\n",
|
|
" # Or a hack to reach that deep inside the\n",
|
|
" # iterator. :/\n",
|
|
" hit = hit_iter.next()\n",
|
|
" full_list.append(hit)\n",
|
|
" print(f\"so far {len(full_list)}\")\n",
|
|
" except StopIteration:\n",
|
|
" # we have to handle this explicitly - just set flag\n",
|
|
" assume_time_out = False\n",
|
|
" except Exception as e:\n",
|
|
" if isinstance(e, http.client.RemoteDisconnected):\n",
|
|
" # This is \"fun\" to run into - doesn't happen very often\n",
|
|
" # so this recovery is an educated guess (the time I\n",
|
|
" # did see it, it was after a 'resumed' message from\n",
|
|
" # the clause below)\n",
|
|
" for i in range(3):\n",
|
|
" try_login()\n",
|
|
" if gh:\n",
|
|
" # re-established connection\n",
|
|
" print(f\"re-established connection on try {i+1}\")\n",
|
|
" break\n",
|
|
" else:\n",
|
|
" time.sleep(60)\n",
|
|
" else:\n",
|
|
" print(f\"failed to re-establish connection after {i+1} tries\")\n",
|
|
" raise SystemExit\n",
|
|
" elif not hasattr(e, 'code'):\n",
|
|
" print(f\"org={org} l={l} exception={str(e)} (exception type {type(e)})\")\n",
|
|
" elif e.code not in [403, 422]:\n",
|
|
" print(f\"org={org} l={l} exception={str(e)}\")\n",
|
|
" elif e.code in [403]:\n",
|
|
" seconds_to_wait = 7\n",
|
|
" print(f\"Out of Code Search API calls, waiting {seconds_to_wait} seconds ({org=}, {l=}) ..\", end='')\n",
|
|
" # we can hit this a lot (code search allows only 10 req/min), so wait briefly and retry\n",
|
|
" # per https://docs.github.com/en/enterprise-cloud@latest/rest/search?apiVersion=2022-11-28#rate-limit\n",
|
|
" time.sleep(seconds_to_wait)\n",
|
|
" print(\"... resumed.\")\n",
|
|
" # we've reported on everything of interest, no need for else clause\n",
|
|
" # else:\n",
|
|
" # print(f\"Got code {e.code} for org {org}, search {l}\")\n",
|
|
"\n",
|
|
" hit_list = prune_hits_to_ignore(full_list, l)\n",
|
|
"\n",
|
|
" search_urls = []\n",
|
|
" for search_hit in hit_list:\n",
|
|
" new_url = search_hit_to_url(search_hit.html_url, l, debug=False)\n",
|
|
" if new_url:\n",
|
|
" # append the matching fragments as the last item of the tuple\n",
|
|
" context = \"\\n----\\n\".join([m['fragment'] for m in search_hit.text_matches])\n",
|
|
" search_urls.append((*new_url, context.replace(\"\\n\", \"\\\\n\")))\n",
|
|
" num_raw_search_urls = len(search_urls)\n",
|
|
" search_urls = set(search_urls)\n",
|
|
" num_search_urls = len(search_urls)\n",
|
|
"# print(f\"search results: {len(hit_list)}; after translation: {num_raw_search_urls}; after dedupe: {num_search_urls}\")\n",
|
|
" if num_search_urls > 0:\n",
|
|
" writer.writerow(['', f\"{num_search_urls} files with possible ACLs in {org} for {l}:\", \"\", \"\"])\n",
|
|
" for url, repo, path, filename, context in sorted(search_urls):\n",
|
|
" # output in csv format\n",
|
|
" writer.writerow([\"\", \"\", \"\", \"\", \"/\".join(p for p in (repo, path, filename) if p), url, context])\n",
|
|
" # import pdb ; pdb.set_trace()\n",
|
|
" csvfile.seek(0)\n",
|
|
" hits = [l.strip() for l in csvfile.readlines()]\n",
|
|
" return hits"
|
|
]
|
|
},
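{
"cell_type": "markdown",
"id": "5e1c7a07",
"metadata": {
"hidden": true
},
"source": [
"The hand-rolled loop in `check_for_acls` pulls one hit at a time, so a rate-limit error does not discard the hits already collected. A generic sketch of that retry pattern, assuming a hypothetical `RateLimited` exception and a fake iterator standing in for `gh.search_code(...)`. Unlike the real GitHub iterator (see the comment in `check_for_acls`), this fake one does not advance when it raises, so no results are lost. A retry cap keeps a persistent error from looping forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a08",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import time\n",
"\n",
"class RateLimited(Exception):\n",
"    # Hypothetical stand-in for a 403 'rate limit exceeded' response\n",
"    pass\n",
"\n",
"class FlakyIterator:\n",
"    # Fake search iterator: raises RateLimited once, just before the 3rd item\n",
"    def __init__(self, items):\n",
"        self.items = list(items)\n",
"        self.pos = 0\n",
"        self.failed = False\n",
"\n",
"    def __iter__(self):\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.pos >= len(self.items):\n",
"            raise StopIteration\n",
"        if self.pos == 2 and not self.failed:\n",
"            self.failed = True\n",
"            raise RateLimited()\n",
"        item = self.items[self.pos]\n",
"        self.pos += 1\n",
"        return item\n",
"\n",
"def collect_all(hit_iter, wait_seconds=0.01, max_waits=5):\n",
"    # Keep what we already have; on a rate limit, sleep and ask again\n",
"    collected, waits = [], 0\n",
"    while True:\n",
"        try:\n",
"            collected.append(next(hit_iter))\n",
"        except StopIteration:\n",
"            return collected\n",
"        except RateLimited:\n",
"            waits += 1\n",
"            if waits > max_waits:\n",
"                raise  # give up rather than loop forever\n",
"            time.sleep(wait_seconds)\n",
"\n",
"print(collect_all(FlakyIterator(['a', 'b', 'c', 'd'])))  # ['a', 'b', 'c', 'd']"
]
},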
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "369f2337",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def check_github_acls(logins):\n",
|
|
" logins_to_check = set(logins.split())\n",
|
|
" # import pdb; pdb.set_trace()\n",
|
|
" msgs = check_for_acls(logins_to_check)\n",
|
|
" print(\"=\" * 35)\n",
|
|
" display(*msgs)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "eb5a02ad",
|
|
"metadata": {
|
|
"heading_collapsed": true,
|
|
"hidden": true
|
|
},
|
|
"source": [
|
|
"#### Heroku support"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b59d62ad",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def check_heroku_logins(logins):\n",
|
|
" logins_to_check = set(logins.split())\n",
|
|
" # import pdb; pdb.set_trace()\n",
|
|
" for login in logins_to_check:\n",
|
|
" print(\"\\nworking on %s:\" % login)\n",
|
|
" show_heroku_data(login)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "9fb098a1",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# let user start manual work before we do all the GitHub calls\n",
|
|
"def show_heroku_data(primary_email:str) -> None:\n",
|
|
" display(\"Check these URLs for Heroku activity:\")\n",
|
|
" display(\n",
|
|
" \" Heroku Access: https://people.mozilla.org/a/heroku-members/edit?section=members\"\n",
|
|
" )\n",
|
|
" display(f\" copy/paste for ^^ query: :{primary_email}: \")\n",
|
|
" display(\n",
|
|
" \" People: https://people.mozilla.org/s?who=all&query={}\".format(\n",
|
|
" primary_email.replace(\"@\", \"%40\")\n",
|
|
" )\n",
|
|
" )\n",
|
|
" display(\n",
|
|
" \" Heroku: https://dashboard.heroku.com/teams/mozillacorporation/access?filter={}\".format(\n",
|
|
" primary_email.replace(\"@\", \"%40\")\n",
|
|
" )\n",
|
|
" )"
|
|
]
|
|
},
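{
"cell_type": "markdown",
"id": "5e1c7a09",
"metadata": {
"hidden": true
},
"source": [
"`show_heroku_data` builds its query URLs by replacing only `@` with `%40`. A minimal sketch of the same encoding with `urllib.parse.quote(..., safe='')`, which also percent-encodes other reserved characters. For example, a `+` left unencoded in a query string is usually decoded as a space. The address below is made up and `encode_email` is a hypothetical helper."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a10",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from urllib.parse import quote\n",
"\n",
"def encode_email(address):\n",
"    # safe='' so '@', '+', '/' and other reserved characters are all percent-encoded\n",
"    return quote(address, safe='')\n",
"\n",
"address = 'first.last+heroku@example.com'\n",
"print(address.replace('@', '%40'))  # first.last+heroku%40example.com\n",
"print(encode_email(address))        # first.last%2Bheroku%40example.com\n",
"print('https://people.mozilla.org/s?who=all&query=' + encode_email(address))"
]
},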
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d7c90ee1",
|
|
"metadata": {
|
|
"heading_collapsed": true,
|
|
"hidden": true
|
|
},
|
|
"source": [
|
|
"#### main driver"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "db4a9236",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import re\n",
|
|
"import os\n",
|
|
"\n",
|
|
"re_flags = re.MULTILINE | re.IGNORECASE\n",
|
|
"\n",
|
|
"\n",
|
|
"def process_from_email(email_body):\n",
|
|
" # strip each line and drop blank lines\n",
|
|
" email_body = os.linesep.join(\n",
|
|
" [s.strip() for s in email_body.splitlines() if s.strip()]\n",
|
|
" )\n",
|
|
" if not email_body:\n",
|
|
" return\n",
|
|
"\n",
|
|
" user = set()\n",
|
|
"\n",
|
|
" # Extract data from internal email format\n",
|
|
" match = re.search(r\"^Full Name: (?P<full_name>\\S.*)$\", email_body, re_flags)\n",
|
|
" if match:\n",
|
|
" # add base and some variations\n",
|
|
" full_name = match.group(\"full_name\")\n",
|
|
" user.add(full_name)\n",
|
|
" # remove spaces, forward & reversed\n",
|
|
" user.add(full_name.replace(\" \", \"\"))\n",
|
|
" user.add(\"\".join(full_name.split()[::-1]))\n",
|
|
" # use hyphens, forward & reversed\n",
|
|
" user.add(full_name.replace(\" \", \"-\"))\n",
|
|
" user.add(\"-\".join(full_name.split()[::-1]))\n",
|
|
"\n",
|
|
" match = re.search(r\"^Email: (?P<primary_email>.*)$\", email_body, re_flags)\n",
|
|
" primary_email = match.group(\"primary_email\") if match else None\n",
|
|
" user.add(primary_email)\n",
|
|
" default_login = primary_email.split(\"@\")[0].lower() if primary_email else None\n",
|
|
" if default_login:\n",
|
|
" # add some common variations that may get discarded for \"too many\" matches\n",
|
|
" user.update(\n",
|
|
" [\n",
|
|
" f\"moz{default_login}\",\n",
|
|
" f\"moz-{default_login}\",\n",
|
|
" f\"mozilla{default_login}\",\n",
|
|
" f\"mozilla-{default_login}\",\n",
|
|
" f\"{default_login}moz\",\n",
|
|
" f\"{default_login}-moz\",\n",
|
|
" ]\n",
|
|
" )\n",
|
|
"\n",
|
|
" if not defer_heroku_lookup:\n",
|
|
" show_heroku_data(str(primary_email))\n",
|
|
" display(email_body)\n",
|
|
"\n",
|
|
" match = re.search(r\"^Github Profile: (?P<github_profile>.*)$\", email_body, re_flags)\n",
|
|
" declared_github = match.group(\"github_profile\") if match else None\n",
|
|
" user.add(declared_github)\n",
|
|
" display(f\"Declared GitHub {declared_github}\")\n",
|
|
"\n",
|
|
" # check CIS for verified login (not all users will have creds)\n",
|
|
" verified_github_login, first_name = check_CIS(primary_email)\n",
|
|
" if verified_github_login:\n",
|
|
" user.add(verified_github_login)\n",
|
|
" display(f\"Verified GitHub {verified_github_login}\")\n",
|
|
"\n",
|
|
" match = re.search(r\"^Zimbra Alias: (?P<other_email>.*)$\", email_body, re_flags)\n",
|
|
" possible_aliases = extract_addresses(match.group(\"other_email\") if match else None)\n",
|
|
" user.update(possible_aliases)\n",
|
|
"\n",
|
|
" # new field: Email Alias -- list syntax (brackets)\n",
|
|
" match = re.search(r\"^Email Alias: \\s*\\[(?P<alias_email>.*)\\]\", email_body, re_flags)\n",
|
|
" user.add(match.group(\"alias_email\") if match else None)\n",
|
|
"\n",
|
|
" # we consider each token in the IM line as a possible GitHub login\n",
|
|
" match = re.search(r\"^IM:\\s*(.*)$\", email_body, re_flags)\n",
|
|
" if match:\n",
|
|
" im_line = match.groups()[0]\n",
|
|
" matches = re.finditer(r\"\\W*((\\w+)(?:\\s+\\w+)*)\", im_line)\n",
|
|
" user.update(x.group(1) for x in matches)\n",
|
|
"\n",
|
|
" match = re.search(r\"^Bugzilla Email: (?P<bz_email>.*)$\", email_body, re_flags)\n",
|
|
" user.add(match.group(\"bz_email\") if match else None)\n",
|
|
"\n",
|
|
" # grab the department name, for a heuristic on whether we expect to find perms\n",
|
|
" expect_github_login = False\n",
|
|
" match = re.search(r\"^\\s*Dept Name: (?P<dept_name>\\S.*)$\", email_body, re_flags)\n",
|
|
" if match and not verified_github_login:\n",
|
|
" department_name = match.groups()[0].lower()\n",
|
|
" dept_keys_infering_github = [\"firefox\", \"engineering\", \"qa\", \"operations\"]\n",
|
|
" for key in dept_keys_infering_github:\n",
|
|
" if key in department_name:\n",
|
|
" expect_github_login = True\n",
|
|
" break\n",
|
|
"\n",
|
|
" # clean up noise: lowercase everything, drop tokens of 2 chars or fewer, unwrap b'...' \"binary\" markers\n",
|
|
" user = {x.lower() for x in user if x and (len(x) > 2)}\n",
|
|
" to_update = [x[2:-1] for x in user if (x.startswith(\"b'\") and x.endswith(\"'\"))]\n",
|
|
" user.update(to_update)\n",
|
|
" user = {x for x in user if not (x.startswith(\"b'\") and x.endswith(\"'\"))}\n",
|
|
"\n",
|
|
" # the tokens to ignore are added based on discovery,\n",
|
|
" # they tend to cause the searches to get rate limited.\n",
|
|
" user = user - {\n",
|
|
" None,\n",
|
|
" \"irc\",\n",
|
|
" \"slack\",\n",
|
|
" \"skype\",\n",
|
|
" \"b\",\n",
|
|
" \"hotmail\",\n",
|
|
" \"mozilla\",\n",
|
|
" \"ro\",\n",
|
|
" \"com\",\n",
|
|
" \"softvision\",\n",
|
|
" \"mail\",\n",
|
|
" \"twitter\",\n",
|
|
" \"blog\",\n",
|
|
" \"https\",\n",
|
|
" \"jabber\",\n",
|
|
" \"net\",\n",
|
|
" \"github\",\n",
|
|
" \"gmail\",\n",
|
|
" \"facebook\",\n",
|
|
" \"guy\",\n",
|
|
" \"pdx\",\n",
|
|
" \"yahoo\",\n",
|
|
" \"aim\",\n",
|
|
" \"whatsapp\",\n",
|
|
" \"gtalk\",\n",
|
|
" \"google\",\n",
|
|
" \"gpg\",\n",
|
|
" \"telegram\",\n",
|
|
" \"keybase\",\n",
|
|
" \"zoom\",\n",
|
|
" \"name\",\n",
|
|
" }\n",
|
|
" global displayed_users\n",
|
|
" displayed_users = set()\n",
|
|
" try:\n",
|
|
" headers = [\n",
|
|
" \"Search seeds: '{}'\".format(\"', '\".join(user)),\n",
|
|
" ]\n",
|
|
" display(*headers)\n",
|
|
" guesses = set()\n",
|
|
" for term in user:\n",
|
|
" possibles = gather_possibles(term)\n",
|
|
" guesses.update({x.lower() for x in possibles})\n",
|
|
" # include declared_github if it exists\n",
|
|
" if declared_github:\n",
|
|
" guesses.add(declared_github.lower())\n",
|
|
" guesses.update({x.login.lower() for x in displayed_users})\n",
|
|
" display(f\"Checking logins {guesses}\")\n",
|
|
" msgs = []\n",
|
|
" msgs, logins_with_hits = check_login_perms(guesses, headers)\n",
|
|
" display(f\"resetting guesses to found hits: {logins_with_hits}\")\n",
|
|
" guesses = logins_with_hits\n",
|
|
" found_perms = \"FOUND!\" in \"\".join(msgs)\n",
|
|
" display(f\"msgs {len(msgs)}; headers {len(headers)}\")\n",
|
|
" display(\n",
|
|
" \"found_perms {}; declared_github {} {}\".format(\n",
|
|
" found_perms, declared_github, bool(declared_github)\n",
|
|
" )\n",
|
|
" )\n",
|
|
"\n",
|
|
" if declared_github and not found_perms:\n",
|
|
" msgs.append(f\"Even for declared login '{declared_github}'.\")\n",
|
|
" if expect_github_login and not found_perms:\n",
|
|
" msgs.append(\n",
|
|
" \"WARNING: expected GitHub permissions for dept '{}'\".format(\n",
|
|
" department_name\n",
|
|
" )\n",
|
|
" )\n",
|
|
" \n",
|
|
" # check for GitHub login or ldap in a file (might be permissions)\n",
|
|
" guesses.update([default_login] if default_login else [])\n",
|
|
" print(f\"before: guesses {guesses}; default {default_login}\")\n",
|
|
" if first_name and not primary_email.endswith(\"mozilla.com\"):\n",
|
|
" # for non-MoCo emails, the primary email stem might be their first name\n",
|
|
" # which leads to an insane amount of false positives, so remove it\n",
|
|
" # the assumption is that they would not be in any ACL files, anyway\n",
|
|
" guesses -= {first_name.lower()}\n",
|
|
" print(f\" after: guesses {guesses}; default {default_login}\")\n",
|
|
" if defer_acl_lookup:\n",
|
|
" display(\"Deferring search for possible ACLs\")\n",
|
|
" msgs.append(f\"Check for ACLs using the following values: {' '.join([str(x) for x in guesses])}\")\n",
|
|
" else:\n",
|
|
" display(\"Looking for possible ACLs\")\n",
|
|
" new_msgs = check_for_acls(guesses)\n",
|
|
" msgs.extend(new_msgs)\n",
|
|
" msgs.append(\"Finished all reporting.\")\n",
|
|
" display(*msgs)\n",
|
|
" except github3.exceptions.ForbiddenError as e:\n",
|
|
" print_limits(e)\n",
|
|
" raise e"
|
|
]
|
|
},
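{
"cell_type": "markdown",
"id": "5e1c7a11",
"metadata": {
"hidden": true
},
"source": [
"A minimal sketch of the field extraction and name variations at the start of `process_from_email`, run on a made-up ticket body. It reuses the same `^Field: value$` patterns with `re.MULTILINE | re.IGNORECASE`. `seed_terms` is a hypothetical helper, and the later CIS and GitHub lookups are left out."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a12",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"import re\n",
"\n",
"FLAGS = re.MULTILINE | re.IGNORECASE\n",
"\n",
"def seed_terms(body):\n",
"    # Hypothetical helper mirroring the first steps of process_from_email\n",
"    terms = set()\n",
"    m = re.search(r'^Full Name: (?P<full_name>\\S.*)$', body, FLAGS)\n",
"    if m:\n",
"        name = m.group('full_name')\n",
"        words = name.split()\n",
"        terms.update({\n",
"            name,\n",
"            ''.join(words), ''.join(words[::-1]),    # no spaces, forward & reversed\n",
"            '-'.join(words), '-'.join(words[::-1]),  # hyphens, forward & reversed\n",
"        })\n",
"    m = re.search(r'^Email: (?P<email>.*)$', body, FLAGS)\n",
"    if m:\n",
"        address = m.group('email').strip()\n",
"        terms.add(address)\n",
"        stem = address.split('@')[0].lower()\n",
"        terms.update({f'moz{stem}', f'moz-{stem}', f'mozilla{stem}',\n",
"                      f'mozilla-{stem}', f'{stem}moz', f'{stem}-moz'})\n",
"    # same normalization as the notebook: lowercase, drop tokens of 2 chars or fewer\n",
"    return {t.lower() for t in terms if t and len(t) > 2}\n",
"\n",
"ticket = '''Full Name: Jane Doe\n",
"Email: jdoe@example.com'''\n",
"print(sorted(seed_terms(ticket)))"
]
},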
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "5a0c94cb",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from ipywidgets import interact_manual, Layout, widgets\n",
|
|
"from IPython.display import display\n",
|
|
"\n",
|
|
"text = widgets.Textarea(\n",
|
|
" value=\"email: \\nim: \",\n",
|
|
" placeholder=\"Paste ticket description here!\",\n",
|
|
" description=\"Email body:\",\n",
|
|
" layout=Layout(width=\"95%\"),\n",
|
|
" disabled=False,\n",
|
|
")\n",
|
|
"\n",
|
|
"run_process = interact_manual.options(manual_name=\"Process\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "309442de",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def display(*args):\n",
|
|
" # join all args into one string and print it (the old ascii-only workaround for ipyWidgets is no longer needed)\n",
|
|
" text = \"\\n\".join(\n",
|
|
" [str(x) for x in args]\n",
|
|
" ) # str() handles None and other non-string values\n",
|
|
" # python 3 no longer requires us to play the convert-to-ascii game\n",
|
|
" cleaned = text #.encode(\"ascii\", \"replace\")\n",
|
|
" if cleaned.strip():\n",
|
|
" print(str(cleaned))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "70531388",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"init_cell": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"def check_github_logins(logins):\n",
|
|
" logins_to_check = set(logins.split())\n",
|
|
" # import pdb; pdb.set_trace()\n",
|
|
" for login in logins_to_check:\n",
|
|
" print(\"\\nworking on %s:\" % login)\n",
|
|
" msgs = check_login_perms([login])\n",
|
|
" display(*msgs)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c18e3641",
|
|
"metadata": {
|
|
"heading_collapsed": true,
|
|
"hidden": true
|
|
},
|
|
"source": [
|
|
"#### Cache Tuning & Clearing\n",
|
|
"\n",
|
|
"Several functions use `lru_cache`. The first cell below prints each cache's statistics, so you can see whether the cache sizes are tuned appropriately; the second clears all the caches.\n",
|
|
"\n",
|
|
"The statistics are only meaningful after one or more queries have been run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "efe34b13",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"lines_to_next_cell": 2
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(\"get_users\")\n",
|
|
"print(get_users.cache_info())\n",
|
|
"print(\"_search_for_org\")\n",
|
|
"print(_search_for_org.cache_info())\n",
|
|
"\n",
|
|
"print(\"get_collaborators\")\n",
|
|
"print(get_collaborators.cache_info())\n",
|
|
"print(\"get_members\")\n",
|
|
"print(get_members.cache_info())\n",
|
|
"\n",
|
|
"print(\"get_org_owners\")\n",
|
|
"print(get_org_owners.cache_info())\n",
|
|
"print(\"get_inspectable_org_object\")\n",
|
|
"print(get_inspectable_org_object.cache_info())"
|
|
]
|
|
},
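{
"cell_type": "markdown",
"id": "5e1c7a13",
"metadata": {
"hidden": true
},
"source": [
"For reading the statistics printed above: `cache_info()` reports `hits`, `misses`, `maxsize` and `currsize`. If `currsize` has reached `maxsize` and `misses` keep growing when the same query is repeated, entries are being evicted and `maxsize` is probably too small. A self-contained illustration with a toy function (not one of the notebook's cached functions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e1c7a14",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"from functools import lru_cache\n",
"\n",
"def make_lookup(maxsize):\n",
"    # toy function standing in for an expensive, cached GitHub lookup\n",
"    @lru_cache(maxsize=maxsize)\n",
"    def lookup(org):\n",
"        return org.upper()\n",
"    return lookup\n",
"\n",
"orgs = ['mozilla', 'mdn', 'Pocket'] * 2  # two passes over the same 3 orgs\n",
"\n",
"for size in (2, 3):\n",
"    lookup = make_lookup(size)\n",
"    for org in orgs:\n",
"        lookup(org)\n",
"    print(lookup.cache_info())\n",
"# maxsize=2 thrashes: CacheInfo(hits=0, misses=6, maxsize=2, currsize=2)\n",
"# maxsize=3 fits:     CacheInfo(hits=3, misses=3, maxsize=3, currsize=3)"
]
},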
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b28cf0e8",
|
|
"metadata": {
|
|
"hidden": true,
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(\"clearing caches...\")\n",
|
|
"get_users.cache_clear()\n",
|
|
"_search_for_org.cache_clear()\n",
|
|
"get_collaborators.cache_clear()\n",
|
|
"get_members.cache_clear()\n",
|
|
"get_org_owners.cache_clear()\n",
|
|
"get_inspectable_org_object.cache_clear()"
|
|
]
|
|
},
|
|
{
"cell_type": "markdown",
"id": "d11ee975",
"metadata": {
"heading_collapsed": true,
"hidden": true
},
"source": [
"#### EML file support"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c734fd08",
"metadata": {
"hidden": true,
"init_cell": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"# read EML file support\n",
"import email\n",
"from ipywidgets import FileUpload\n",
"from pprint import pprint as pp\n",
"from IPython.display import display as display_widget"
]
},
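{
"cell_type": "markdown",
"id": "e3b7a91c",
"metadata": {
"hidden": true
},
"source": [
"`process_from_file` below reads `uploader.value` as a dict keyed by filename, which matches the ipywidgets 7 `FileUpload` API. In ipywidgets 8, `value` is instead a tuple of dicts with `name` and `content` keys, where `content` is a `memoryview`. If the notebook is run under ipywidgets 8, a helper along the following lines could normalize both shapes. This is a sketch under those assumptions; `iter_uploaded_files` is a hypothetical name and is not used elsewhere in this notebook.\n",
"\n",
"```python\n",
"def iter_uploaded_files(value):\n",
"    \"\"\"Yield (filename, content_bytes) pairs from a FileUpload value.\"\"\"\n",
"    if isinstance(value, dict):\n",
"        # ipywidgets 7.x: {filename: {\"metadata\": {...}, \"content\": bytes}}\n",
"        for name, entry in value.items():\n",
"            yield name, bytes(entry[\"content\"])\n",
"    else:\n",
"        # ipywidgets 8.x: tuple of {\"name\": ..., \"content\": memoryview, ...}\n",
"        for entry in value:\n",
"            yield entry[\"name\"], bytes(entry[\"content\"])\n",
"\n",
"\n",
"# Plain-data stand-ins for the two shapes; no widget is needed to try it.\n",
"raw = b\"Subject: test\\r\\n\\r\\nbody\\r\\n\"\n",
"v7_value = {\"offboard.eml\": {\"metadata\": {\"name\": \"offboard.eml\"}, \"content\": raw}}\n",
"v8_value = ({\"name\": \"offboard.eml\", \"type\": \"\", \"size\": len(raw), \"content\": memoryview(raw)},)\n",
"print(list(iter_uploaded_files(v7_value)) == list(iter_uploaded_files(v8_value)))  # True\n",
"```\n",
"\n",
"Each yielded `content_bytes` value can be passed directly to `email.message_from_bytes`."
]
},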
{
"cell_type": "code",
"execution_count": null,
"id": "465a9648",
"metadata": {
"hidden": true,
"init_cell": true
},
"outputs": [],
"source": [
"\n",
"\n",
"def extract_reply(body):\n",
"    extracted = []\n",
"    # splitlines() handles both CRLF and LF line endings\n",
"    for l in body.splitlines():\n",
"        if l.startswith(\"> --\"):\n",
"            break\n",
"        elif l.startswith(\"> \"):\n",
"            extracted.append(l[2:])\n",
"    return extracted\n",
"\n",
"\n",
"def process_from_file(uploader):\n",
"    # message = email.message_from_string()\n",
"    for file in list(uploader.value.keys()):\n",
"        print(\"checking %s\" % file)\n",
"        pp(list(uploader.value[file].keys()))\n",
"        content = uploader.value[file][\"content\"]\n",
"        pp(type(content))\n",
"        pp(type(uploader.value[file]))\n",
"        #pp(uploader.value[file])\n",
"        message = email.message_from_bytes(content)\n",
"        #message = email.message_from_string(uploader.value[file][\"content\"])\n",
"        for part in message.walk():\n",
"            if part.get_content_maintype() == \"multipart\":\n",
"                continue\n",
"            else:\n",
"                mime = part.get_content_type()\n",
"                if \"plain\" in mime:\n",
"                    body = part.get_payload()\n",
"                    # this could be the original, or a reply\n",
"                    if re.search(r\"\"\"^Full Name:\"\"\", body, re_flags):\n",
"                        print(\"original email:\")\n",
"                        process_from_email(body)\n",
"                    elif re.search(r\"\"\"^> Full Name:\"\"\", body, re_flags):\n",
"                        print(\"reply:\")\n",
"                        process_from_email(\"\\n\".join(extract_reply(body)))\n",
"                    else:\n",
"                        print(\"no match!\\n%s\" % body)"
]
},
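{
"cell_type": "markdown",
"id": "a94e1b07",
"metadata": {
"hidden": true
},
"source": [
"`process_from_file` calls `part.get_payload()` without decoding, so a `text/plain` part sent as base64 or quoted-printable would be searched in its still-encoded form. The following is a minimal, self-contained sketch of an alternative that uses only the standard library. Parsing with `policy.default` lets `get_content()` undo the transfer encoding and decode the charset, and the quoted-line extraction mirrors `extract_reply` while using `str.splitlines()` so that both CRLF and LF line endings work. The function names and the sample message are illustrative assumptions, not part of this notebook.\n",
"\n",
"```python\n",
"import email\n",
"from email import policy\n",
"\n",
"\n",
"def plain_text_parts(raw_bytes):\n",
"    \"\"\"Yield the decoded text of every text/plain part of a message.\"\"\"\n",
"    message = email.message_from_bytes(raw_bytes, policy=policy.default)\n",
"    for part in message.walk():  # walk() also yields a non-multipart message itself\n",
"        if part.get_content_type() == \"text/plain\":\n",
"            # get_content() removes base64/quoted-printable and decodes the charset\n",
"            yield part.get_content()\n",
"\n",
"\n",
"def quoted_lines(body, stop_marker=\"> --\"):\n",
"    \"\"\"Collect '> '-quoted lines, stopping at a quoted signature separator.\"\"\"\n",
"    extracted = []\n",
"    for line in body.splitlines():\n",
"        if line.startswith(stop_marker):\n",
"            break\n",
"        if line.startswith(\"> \"):\n",
"            extracted.append(line[2:])\n",
"    return extracted\n",
"\n",
"\n",
"raw = (\n",
"    b\"Subject: Re: offboarding\\r\\n\"\n",
"    b\"Content-Type: text/plain; charset=utf-8\\r\\n\"\n",
"    b\"Content-Transfer-Encoding: quoted-printable\\r\\n\"\n",
"    b\"\\r\\n\"\n",
"    b\"Done.\\r\\n\"\n",
"    b\"> Full Name: Jane Doe\\r\\n\"\n",
"    b\"> --\\r\\n\"\n",
"    b\"> signature\\r\\n\"\n",
")\n",
"for text in plain_text_parts(raw):\n",
"    print(quoted_lines(text))  # ['Full Name: Jane Doe']\n",
"```"
]
},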
{
"cell_type": "markdown",
"id": "1867aff0",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"# Start of common usage (How To)"
]
},
{
"cell_type": "markdown",
"id": "f7d23e4d",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"Currently, there are four common use cases:\n",
"- processing an offboarding email (via a downloaded EML file),\n",
"- processing an offboarding email (via copy/paste of the message text),\n",
"- ad hoc lookups of GitHub and Heroku logins, and\n",
"- searching GitHub ACLs for specific logins.\n",
"\n",
"For anything else, you're on your own!\n",
"\n",
"All usage requires the following setup:\n",
"1. Supply your GitHub personal access token (PAT) via the environment variable `GITHUB_PAT` when starting the notebook server. (If you can't do that, read the code for another way.)\n",
"2. Supply your CIS credentials via the environment variables `CIS_CLIENT_ID` and `CIS_CLIENT_SECRET`.\n"
]
},
{
"cell_type": "markdown",
"id": "74367943",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"## EML File parsing"
]
},
{
"cell_type": "markdown",
"id": "ac01d450",
"metadata": {},
"source": [
"Upload the EML file using the button below, then process it by running the cell below the button. Only one file can be processed at a time, but the uploaded-file count shown on the button keeps increasing (a UI glitch)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98322893",
"metadata": {
"init_cell": true,
"lines_to_next_cell": 2
},
"outputs": [],
"source": [
"# the HTML accept attribute expects an extension such as \".eml\", not a glob\n",
"_uploader = FileUpload(accept=\".eml\", multiple=False)\n",
"display_widget(_uploader)\n",
"# check_file(_uploader)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b6207ae",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"def check_file(f):\n",
"    try:\n",
"        # display_widget(_uploader)\n",
"        process_from_file(f)\n",
"        print(\"completed\")\n",
"    except Exception as e:\n",
"        print(repr(e))\n",
"        raise\n",
"\n",
"\n",
"check_file(_uploader)"
]
},
{
"cell_type": "markdown",
"id": "7dbe286a",
"metadata": {
"heading_collapsed": true,
"lines_to_next_cell": 0
},
"source": [
"## Process offboarding email body text (copy/paste)"
]
},
{
"cell_type": "markdown",
"id": "145c0eef",
"metadata": {
"hidden": true
},
"source": [
"Usage steps (for each user):\n",
"1. Run the cell below; it should display a text entry area and a \"Process\" button.\n",
"2. Copy the entire text of the email.\n",
"3. Paste it into the text area.\n",
"4. Click the \"Process\" button.\n",
"5. Use the generated links to check for Heroku authorization.\n",
"6. After \"process finished\" is printed, copy/paste the final output into the email."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f75814c",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"@run_process(t=text)\n",
"def show_matches(t):\n",
"    try:\n",
"        process_from_email(t)\n",
"    except Exception as e:\n",
"        print(repr(e))"
]
},
{
"cell_type": "markdown",
"id": "62ad9a7c",
"metadata": {
"heading_collapsed": true,
"lines_to_next_cell": 0
},
"source": [
"## Adhoc Lookups (GitHub & Heroku)"
]
},
{
"cell_type": "markdown",
"id": "6209c976",
"metadata": {
"hidden": true
},
"source": [
"Fill in the list of desired logins in the cells below (the first cell checks GitHub, the second checks Heroku)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "650f2db4",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"check_github_logins(\n",
"    \"\"\"\n",
"    \"\"\"\n",
")\n",
"print(\"done\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f28ed4dc",
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"check_heroku_logins(\n",
"    \"\"\"\n",
"    \"\"\"\n",
")\n",
"print(\"done\")"
]
},
{
"cell_type": "markdown",
"id": "de2fe9a2",
"metadata": {
"heading_collapsed": true,
"lines_to_next_cell": 0
},
"source": [
"## ACL search"
]
},
{
"cell_type": "markdown",
"id": "8a5cd9b9",
"metadata": {
"hidden": true
},
"source": [
"Fill in the list of desired logins in the cell below. Appropriate values may be found in the GitHub report."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9112d45e",
"metadata": {
"hidden": true,
"scrolled": true
},
"outputs": [],
"source": [
"print()\n",
"start = time.time()\n",
"check_github_acls(\n",
"    \"\"\"\n",
"\n",
"\"\"\"\n",
")\n",
"duration = time.time() - start\n",
"print(f\"done in {int(duration)} seconds\")"
]
},
{
"cell_type": "markdown",
"id": "a60e70e9",
"metadata": {
"heading_collapsed": true,
"lines_to_next_cell": 0
},
"source": [
"# To Do"
]
},
{
"cell_type": "markdown",
"id": "f4d3b5a3",
"metadata": {
"hidden": true
},
"source": [
"- check invites as well, using manage_invitations.py\n",
"- code doesn't handle hyphenated GitHub logins, e.g. 'marco-c' (they get split)\n",
"- GitHub lookup should strip the https... prefix, so a link from people.m.o can be used\n",
"- dpreston, aka fzzy, doesn't have any GitHub perms\n",
"- fix permutations of names\n",
"- preprocess to remove all (colon separated) :b':':[:]: (maybe not the :b: & :':)\n",
"- add link to Heroku service accounts to check\n",
"- hide connection failure tracebacks during check_github_tracebacks\n",
"\n",
"<details><summary>Completed <small><em>(click to toggle)</em></small></summary>\n",
"\n",
"- ~~GitHub login no longer part of email, but user id is available via CIS~~\n",
"- ~~add \"clear cache\" button to purge after long idle~~ _(in tuning section)_\n",
"- ~~add common login with 'moz{,illa}' tacked on, sometimes with a dash~~\n",
"- ~~update link to view access group on people.m.o~~\n",
"- ~~add \"trying\" info to copy/paste output~~\n",
"- ~~double check that \"even for declared login\" code still active~~\n",
"- ~~add formatted output summary for copy/paste~~\n",
"- ~~when a guess is multiple words, each word should be tried separately as well~~\n",
"- ~~code should always search for stated github, even if search is \"too many\" (e.g. \"past\")~~\n",
"- ~~does not call out owner status (reports as member)~~\n",
"- ~~add short ldap name as an \"always check\"~~\n",
"- ~~always check stem when search gives too many (i.e. go for the exact match)~~\n",
"- ~~treat Zimbra Aliases as a potential multi valued list (or empty)~~\n",
"- ~~\"-\" is a valid character in GitHub logins. Try as separator first-last and last-first~~\n",
"</details>"
]
}
],
"metadata": {
"celltoolbar": "Initialization Cell",
"interpreter": {
"hash": "6c306200e52f4c21cd0564fd208112100f014f9753eb71cb08d42d2021bd714c"
},
"jupytext": {
"formats": "ipynb,py:percent"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}