# Open Source Management Portal

This application represents the home for open source engineering experiences at Microsoft. As a backend application, it manages the source of truth for many types of corporate open source metadata and the historical intent of repos and projects, hosts a rich front-end, and provides a set of APIs used by partner teams.

While we prefer native GitHub experiences, when it comes to displaying certain info and being more transparent about permissions and metadata, especially on GitHub, which has no extensible user interface, we end up using and driving people to this Open Source Management Portal to get the information they need.

At Microsoft, 50,000 engineers are using a version of this portal as part of their open source engineering experience.

However, Microsoft does have a set of "company-specific" extensions, including a separate React frontend client, that are not currently part of this repository. And... yup, if we were to start over today, we'd probably make this a Next.js-or-similar project.

Core capabilities and features of this application:

- **Linking GitHub accounts ⛓️** for enterprise use
- **Self-service GitHub organization joining 🙋** for engineers
- **Creating and managing GitHub open source repositories 👩‍💻**
- **Displaying transparent information, metrics, and company-specific data** about our GitHub open source presence around permissions, access, metadata, intent, and especially cross-organization views and search indexes
- **People inventory 👨‍🦳🧑‍🚀🧒🏽** to help people connect GitHub public logins with corporate identities
- **Intercepting forks and new repositories 🔐** to inject compliance and approval processes
- **Disable and enable 🔑** experiences for GitHub repositories
- **Just-in-time (JIT) access 🚪** for open source repositories, teams, and organizations, aligning with the principle of least privilege access
- **Sudo ⚡️** capabilities for repos, teams, organizations to remove persistent broad ownership and admin permissions
- **Hosting APIs 🍽️** to create repos, large-scale orgs to access link data, and reports
- **Background jobs 👷‍♂️** to maintain eventual consistency, run tasks, gather metrics, and prepare OKRs
- **Team join requests/approvals with context 🚪** building beyond the GitHub experience
- **Automated offboarding 🛶** when people take on new opportunities

At Microsoft, additional capabilities include:

- **Pre-release business and legal approvals to release projects 🧑‍⚖️**
- **Requesting contribution reviews ✍🏾** within policy
- **Service Tree and Direct Owners inventory 🌳** for showing accountable ownership information for repos when available
- **Hosting internal docs 📚** at aka.ms/opensource
- **Hosting a subset of opensource.microsoft.com's APIs 🌍** to bring to life the Microsoft open source presence

The management portal is designed to be fast, efficient, and get out of the way of engineers to get their important work done, with an emphasis on _relentless automation_ and _delegation_.
Most of the experience is eventually consistent; however, operational actions such as joining teams, orgs, sudo operations, etc., are fully consistent at the time they are requested.

## Implementation Details and More Docs

Please see the `docs/` sub-folder, including [docs/index.md](docs/index.md).

## API

Please see the [docs/api.md](docs/api.md) file for information about the current API.

## Application stack for learning

As a TypeScript/Node.js backend application, with a React frontend, the management portal also serves as a learning opportunity for Microsoft's engineering systems teams to understand the experience that non-.NET stack applications may have.

The 1ES+OSPO teams partner to ship the application based on essentially a fork of this open source repo.

As of 2022, the backend site is hosted by Azure App Service with Linux containers, while the background cronjobs and daemons run in Azure Kubernetes Service (AKS) clusters. All containers are built on top of the CBL Mariner distro.

The app started as a hackathon project in an ancient JavaScript era full of "callback hell", and has evolved through third-party promise libraries to native ECMAScript promises and on to TypeScript. So it both shows its age and is interesting.

### Web app authentication

The **primary** authentication for the site is **Azure Active Directory** for corporate users.

The **secondary** authentication is **GitHub**. This allows users not using GitHub to fully explore the site, link, and otherwise be productive.

_In theory, open source friends, this project could be made a bit more extensible. In the past, we prototyped Google authentication, as an example, for the primary aspect. Contributions welcome!_

APIs can use either JWTs or, in some cases, an active web app session; the session is what the React frontend uses.

### Configuration ⛳️

Many feature flags exist.
Please see [docs/configuration.md](docs/configuration.md).

### Jobs 💼

Please see [docs/jobs.md](docs/jobs.md).

## Service dependencies

- GitHub organization(s)
- Hosting environment
- Background job environment for eventual consistency work and maintenance cronjobs
- Daemon hosting for near-real-time processing
- Queue system
- A cache system or multi-tiered cache implementation
- Azure Active Directory and the Microsoft Graph
- An email service to send mail
- Optional insights or telemetry system

### Source of truth store 🧑‍⚖️

The backend maintains key metadata for repositories, links, and general compliance info in a data store of your choice. The backend natively supports Azure Storage, Azure Table, Azure CosmosDB, and Postgres.

At Microsoft we currently use **Postgres** for source of truth including:

- GitHub organization configuration
- corporate GitHub repository metadata
- corporate identity-to-GitHub login links
- compliance metadata (enabled/disabled repos)

### Respecting the GitHub API

To be friendly to GitHub, we strive to be very efficient and fair in our use of the GitHub API. We cache as much as we can, and have a native concept of building on top of GitHub's **Conditional Request** best practice for GitHub Apps: whenever possible, we send the stored `ETag` with a request (in the `If-None-Match` header), and we will use our cache for many types of operations.

For long multi-page GitHub REST API v3 responses, we will maintain a cache of those responses and rebuild them slowly in the background, as the site is eventually consistent for most views.

For operational work, a real-time API call is used to continue to be accurate and secure when working around granting access or managing access to superuser features.

### Cache

The primary cache layer is backed by **CosmosDB** documents, in a hybrid approach where larger documents fall back to **Azure Storage** (blob).

Redis is also supported for open source users of the site.
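
As a rough illustration of the conditional-request idea described under "Respecting the GitHub API", here is a minimal TypeScript sketch. It is not the portal's actual implementation: `ConditionalRequestCache`, `Transport`, and `createFakeGitHub` are hypothetical names invented for this example, the cache is a plain in-memory `Map` rather than CosmosDB or Redis, and the transport is injected so the sketch runs without network access. The only protocol details assumed are the standard `If-None-Match` request header and the `304 Not Modified` status.

```typescript
// Hypothetical shapes for illustration only; none of these names come from the portal.
interface HttpResponse {
  status: number;
  etag?: string;
  body?: unknown;
}

// Injected transport so the sketch can run without calling the real GitHub API.
type Transport = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

interface CacheEntry {
  etag: string;
  body: unknown;
}

class ConditionalRequestCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  async get(url: string): Promise<{ body: unknown; fromCache: boolean }> {
    const cached = this.entries.get(url);
    const headers: Record<string, string> = {};
    if (cached) {
      // Ask the server to reply 304 if the resource still matches our stored ETag.
      headers['If-None-Match'] = cached.etag;
    }
    const response = await this.transport(url, headers);
    if (response.status === 304 && cached) {
      // Not modified: serve the stored body instead of re-downloading it.
      return { body: cached.body, fromCache: true };
    }
    if (response.status === 200 && response.etag) {
      // Fresh response: remember the body and its ETag for the next request.
      this.entries.set(url, { etag: response.etag, body: response.body });
    }
    return { body: response.body, fromCache: false };
  }
}

// A fake transport standing in for the GitHub REST API (assumption for this sketch).
function createFakeGitHub(): { transport: Transport; fullResponses: () => number } {
  let fullResponses = 0;
  const currentEtag = '"v1"';
  const transport: Transport = async (_url, headers) => {
    if (headers['If-None-Match'] === currentEtag) {
      return { status: 304 };
    }
    fullResponses += 1;
    return { status: 200, etag: currentEtag, body: { name: 'example-repo' } };
  };
  return { transport, fullResponses: () => fullResponses };
}
```

With the fake transport, calling `get` twice for the same URL produces one full response; the second call is answered from the stored entry after a `304`.
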
### Background event processing firehose and cronjobs

There are at least 2 ongoing single-instance daemonsets and many cronjobs that also keep the site efficient, up-to-date, and gather important info.

The daemons:

- **Firehose**: webhook event processing from a queue for eventual consistency and reacting to GitHub events around compliance/audit/scale/management
- **Just-in-time**: JIT revocations, audit log event gathering, and analysis

Example cronjobs:

- Make sure caches are primed occasionally
- Remind people to set up or delete repos
- Automatically delete repos that are not set up within a time window
- Disabling repos out of compliance
- Collecting data and metrics for reports and user interface experiences
- Backing up link data
- Prepare stats for an OKR

#### About the firehose in detail

While the original portal works fine for very small GitHub presences, it was designed around the idea that the cache would fill, while respecting the GitHub REST API by using [Conditional Requests](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests), and being very eventually consistent.

However, the REST API v3 (non-GraphQL client) maximum size for a page of results is 100 entries, which ... is very painful if you have tens of thousands of anything.

The "firehose" is designed to be run either within the app itself, or as a secondary app processing results. At Microsoft, we use a service bus to process webhook events from GitHub, since we have a robust webhook ingestion mechanism elsewhere.

The firehose runs as a daemon that pulls off the queue and works to keep the "query cache" primed with newer information than the REST API may have in some cases.

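
To make the queue-to-cache flow above a little more concrete, here is a small TypeScript sketch. Everything in it is hypothetical and chosen for illustration: the `AccessEvent` shape, the `QueryCache` class, and `drainQueue` are not the portal's real types, the queue is an in-memory array standing in for a service bus, and real GitHub webhook payloads look different.

```typescript
// Hypothetical, simplified event; real GitHub webhook payloads are richer than this.
interface AccessEvent {
  kind: 'access_granted' | 'access_revoked';
  organization: string;
  repository: string;
  login: string;
}

// Hypothetical query cache: GitHub login -> set of "org/repo" names for display views.
class QueryCache {
  private readonly reposByLogin = new Map<string, Set<string>>();

  apply(event: AccessEvent): void {
    const key = `${event.organization}/${event.repository}`;
    const repos = this.reposByLogin.get(event.login) ?? new Set<string>();
    if (event.kind === 'access_granted') {
      repos.add(key);
    } else {
      repos.delete(key);
    }
    this.reposByLogin.set(event.login, repos);
  }

  // Sorted list of repositories known for a login; empty if nothing is cached.
  reposFor(login: string): string[] {
    const repos = this.reposByLogin.get(login);
    return repos ? Array.from(repos).sort() : [];
  }
}

// Drains an in-memory queue (a stand-in for a service bus) into the query cache.
// Returns the number of events processed.
function drainQueue(queue: AccessEvent[], cache: QueryCache): number {
  let processed = 0;
  let event = queue.shift();
  while (event !== undefined) {
    cache.apply(event);
    processed += 1;
    event = queue.shift();
  }
  return processed;
}
```

A real daemon would loop on the queue continuously and handle failures and retries; the sketch only shows how events arriving in order can keep a read-only view fresher than paged REST results.
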
What the firehose improves:

- The user views of the orgs, repos, teams they are added to and have access to
- Cross-organization views and querying

The firehose and query cache are _not_ used for important or auth-style scenarios:

- Query cache is not used to make permission decisions
- Query cache is not used to authorize access to administrative functions

We did at one point design the idea of having a `/webhook` endpoint and validating the webhook signatures before processing hooks for simple app hosting, but it's slightly broken right now and disabled at Microsoft.

## Dev prep, build, deploy

### Install Node packages

Make sure to include dev dependencies. The default assets package is a _super ancient_ Bootstrap and jQuery app that in theory provides basic skin for the site, favicons, graphics, etc. However, it's ... really, really, really old. Microsoft discards the default-assets-package, using a different set of assets, so you've been mildly warned.

The `main` module of the defined default-assets-package should resolve to the path to serve content from. Since the default version uses \[ancient\] Grunt to build the assets, it returns the `__dirname` + `/public`, which is the output/built location for Grunt.

```bash
npm install
cd default-assets-package
npm install
```

### Build

```bash
npm run build
```

You need to rebuild the default-assets-package if you change something.
See [Static Site Assets](docs/staticSiteAssets.md).

### Codespaces instructions

You will likely want to use a defined environment to save time setting up many variables. Follow one of the paths below:

- use GitHub Codespaces account-level secrets for your environment variables
- use a `.env` file up a folder from the cloned repository in your Codespace environment
- configure environment variables once the devcontainer boots
- use GitHub Codespaces repo-specific secrets

Whether as a secret or in the `../.env` file relative to the repository root, set:

- `CONFIGURATION_ENVIRONMENT`: `development` (or similar)

Then, you'll also need to make sure authentication will work when redirecting to the running Codespaces environment.

### GitHub authentication

You'll want to bring your own GitHub App and use its client ID and client secret for authentication.

[Configure your account-specific Codespace secrets](https://github.com/settings/codespaces).

- `CODESPACES_GITHUB_AUTHENTICATION_ENABLED`: set to `1` to enable
- `CODESPACES_GITHUB_CLIENT_ID`: the client ID
- `CODESPACES_GITHUB_CLIENT_SECRET`: the client secret

Configure the secrets for your fork and/or this repository as necessary.

The redirect URL will be dynamically generated and included in the startup debug output. Make sure that the hostname is an appropriate callback URL for the GitHub app.

#### Enterprise Managed Users impersonation/override

Since the underlying repository and the Codespace are likely hosted in GHEC EMU, you will also need to use the debug-time impersonation features to override the EMU user information after a GitHub callback with your GitHub.com account.

For ease of use, an initial impersonation override feature is available that will only override a GitHub EMU response:

- `CODESPACES_IMPERSONATE_OVERRIDE_EMU_ENABLED`: set to `1` to allow in your environment
- `CODESPACES_IMPERSONATE_OVERRIDE_EMU_LOGIN`: set to the login to use _only_ when an EMU user authenticates.
  _The primary impersonation feature will still be used after this._

### AAD authentication

Configure your AAD application in an appropriate tenant.

- `CODESPACES_AAD_AUTHENTICATION_ENABLED`: set to `1` to enable
- Set the other AAD variables for your environment as necessary:
  - `AAD_CLIENT_ID`
  - `AAD_CLIENT_SECRET`
  - ...

### Private artifacts

The Microsoft-internal fork of this project uses a private Azure Artifact feed to bring in additional components and libraries. These are not applicable to the open source upstream and should be excluded currently.

### Building the Docker image

```bash
docker build -t opensource-management-portal .
```

#### Run (OSS instructions)

> This section is from the open source community

The easiest way to run is by using the docker-compose setup. This will boot up the postgres and redis components as well. The docker-compose setup depends on 2 environment files and 1 json file:

- .docker.env
- .secrets.env
- env-orgs.json

Make sure to copy the .secrets.env.example and env-orgs.json.example files and provide the configuration values.

```bash
cp .secrets.env.example .secrets.env
cp env-orgs.json.example env-orgs.json
# provide configuration values for .secrets.env and env-orgs.json
docker-compose up
```

If you want to run everything (redis, postgres) on your local machine, you might want to use the following approach.

```bash
# ensure redis and postgres are running on localhost
source .secrets.env
source .local.env
npm run start
```

#### Troubleshooting

If the docker image doesn't start, you can debug the image using an interactive shell session. This allows you to browse the folders, update the files to test things, and run the portal.

```bash
$ docker run --rm -it --env-file .secrets.env --env-file .docker.env --entrypoint /bin/sh opensource-management-portal
/usr/src/repos $ ls
app.js                   data                     lib                      package.json             tsconfig.tsbuildinfo     webhooks
app.js.map               entities                 localEnvironment.js      routes                   user
bin                      features                 localEnvironment.js.map  test                     utils.js
business                 github                   middleware               transitional.js          utils.js.map
config                   jobs                     node_modules             transitional.js.map      views
/usr/src/repos $ npm run start-in-container
```

### Test

This project basically has _very few tests_, and aspirations to start using Jest better. Oops. Bad debt as multiple hackathons combine, along with production dependencies on GitHub...

### Bare minimum local development environment

If you place a `.env` file above the directory of your cloned repo (to prevent committing secrets to your repo by accident or in your editor), you can configure the following extreme minimum working set to use the app.

The central operations token is a personal access token belonging to an **org owner** of the GitHub org(s) being managed.

```env
DEBUG_ALLOW_HTTP=1
GITHUB_CENTRAL_OPERATIONS_TOKEN=a github token for the app
GITHUB_ORGANIZATIONS_FILE=../../env-orgs.json
GITHUB_CLIENT_ID=your client id
GITHUB_CLIENT_SECRET=your client secret
GITHUB_CALLBACK_URL=http://localhost:3000/auth/github/callback
AAD_CLIENT_ID=your corporate app id
AAD_REDIRECT_URL=http://localhost:3000/auth/azure/callback
AAD_CLIENT_SECRET=a secret for the corporate app
AAD_TENANT_ID=your tenant id
AAD_ISSUER=https://sts.windows.net/your tenant id/
```

In this mode, memory providers are used, including a mocked Redis client. Note that this does mean that a large GitHub organization configured with memory providers could become a token use nightmare, as each new execution of the app without a Redis Cache behind the scenes is going to have 100% cache misses for GitHub metadata. Consider configuring a development or local Redis server to keep cached data around.
> The built-in Redis mock will likely be removed when we move to the next
> major semver of the Node Redis library.

## Collaboration

This project began as a hackathon... so still has growing pains years later.

Since this is technically a _backend web application_ and includes some server-generated user interface, the project was not originally designed to be shared as something that runs out-of-the-box, but... it is possible.

To collaborate on extensibility and improvements, please sync in the issues first so we can come up with the best approach. Again, since Microsoft strips most of the `routes/` and uses a React frontend on this app, it's likely that `routes/` and the Pug rendering are... old. Very old.

Hopefully this **monolith** can at least be an interesting learning opportunity in crufty old ancient apps evolving on the JavaScript front!

### Work to be done (OSS project)

- Support more interesting cloud and data providers
- Support other authentication technologies
- Any tests
- More tests
- Ship the front-end UI to the world as open source
- Continuing to refactor out Microsoft-specific things when possible

## Project origin

An introduction to this project is available in a [2015 post by JWilcox](https://jeffwilcox.blog/2015/11/azure-on-github/) and a [2019 follow-up post, "Scaling from 2,000 to 25,000"](https://jeffwilcox.blog/2019/06/scaling-25k/).

An Open Source Hub concept was prototyped by a Microsoft subsidiary and the early Open Source Programs Office to make very clear the open source experiences, docs, and guides for Microsoft's culture change to working more in the open, releasing projects, and connecting everything together.

At the same time, GitHub was very basic, and it was necessary to automate and make self-service the GitHub engineering system to work at an enterprise scale. When Azure became the first approved organization to use GitHub at Microsoft, this portal scaled access and built guardrails around the GitHub environment.

## LICENSE

[MIT License](LICENSE)

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit .

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.