Open Source Management Portal
This application is the home for open source engineering experiences at Microsoft. As a backend application, it manages the source of truth for many types of corporate open source metadata and the historical intent of repos and projects, and it hosts a rich front-end as well as a set of APIs used by partner teams.
While we prefer native GitHub experiences, GitHub has no extensible user interface. To display certain info and be more transparent about permissions and metadata, we end up using this Open Source Management Portal and driving people to it to get the information they need.
At Microsoft, 50,000 engineers are using a version of this portal as part of their open source engineering experience. However, Microsoft does have a set of "company-specific" extensions, including a separate React frontend client, that are not currently part of this repository. And... yup, if we were to start over today, we'd probably make this a Next.js-or-similar project.
Core capabilities and features of this application:
- Linking GitHub accounts ⛓️ for enterprise use
- Self-service GitHub organization joining 🙋 for engineers
- Creating and managing GitHub open source repositories 👩‍💻
- Displaying transparent information, metrics, and company-specific data about our GitHub open source presence around permissions, access, metadata, intent, and especially cross-organization views and search indexes
- People inventory 👨‍🦳🧑‍🚀🧒🏽 to help people connect GitHub public logins with corporate identities
- Intercepting forks and new repositories 🔐 to inject compliance and approval processes
- Disable and enable 🔑 experiences for GitHub repositories
- Just-in-time (JIT) access 🚪 for open source repositories, teams, and organizations, aligning with the principle of least privilege access
- Sudo ⚡️ capabilities for repos, teams, organizations to remove persistent broad ownership and admin permissions
- Hosting APIs 🍽️ to create repos, large-scale orgs to access link data, and reports
- Background jobs 👷‍♂️ to maintain eventual consistency, run tasks, gather metrics, and prepare OKRs
- Team join requests/approvals with context 🚪 building beyond the GitHub experience
- Automated offboarding 🛶 when people take on new opportunities
At Microsoft, additional capabilities include:
- Pre-release business and legal approvals to release projects 🧑‍⚖️
- Requesting contribution reviews ✍🏾 within policy
- Service Tree and Direct Owners inventory 🌳 for showing accountable ownership information for repos when available
- Hosting internal docs 📚 at aka.ms/opensource
- Hosting a subset of opensource.microsoft.com's APIs 🌍 to bring to life the Microsoft open source presence
The management portal is designed to be fast, efficient, and get out of the way of engineers to get their important work done, with an emphasis on relentless automation and delegation.
Most of the experience is eventually consistent; however, operational actions such as joining teams, orgs, sudo operations, etc., are fully consistent at the time they are requested.
Implementation Details and More Docs
Please see the docs/ sub-folder, including docs/index.md.
API
Please see the docs/api.md file for information about the current API.
Application stack for learning
As a TypeScript/Node.js backend application, with a React frontend, the management portal also serves as a learning opportunity for Microsoft's engineering systems teams to understand the experience that non-.NET stack applications may have. The 1ES+OSPO teams partner to ship the application based on essentially a fork of this open source repo.
As of 2022, the backend site is hosted by Azure App Service with Linux containers, while the background cronjobs and daemons run in Azure Kubernetes Service (AKS) clusters. All containers are built on top of the CBL Mariner distro.
The app started as a hackathon project in an ancient JavaScript era full of "callback hell", and has evolved through third-party promise libraries to native ECMAScript promises and on to TypeScript. So it both shows its age and is, well, interesting.
Web app authentication
The primary authentication for the site is Azure Active Directory for corporate users.
The secondary authentication is GitHub. This allows users not using GitHub to fully explore the site, link, and otherwise be productive.
In theory, open source friends, this project could be made a bit more extensible. In the past, we prototyped Google authentication, as an example, for the primary aspect. Contributions welcome!
APIs can use either JWTs or, in some cases, an active web app session, as the React frontend does.
Configuration ⛳️
Many feature flags exist.
Please see docs/configuration.md
Jobs 💼
Please see docs/jobs.md
Service dependencies
- GitHub organization(s)
- Hosting environment
- Background job environment for eventual consistency work and maintenance cronjobs
- Daemon hosting for near-real-time process
- Queue system
- A cache system or multi-tiered cache implementation
- Azure Active Directory and the Microsoft Graph
- An email service to send mail
- Optional insights or telemetry system
Source of truth store 🧑‍⚖️
The backend keeps key metadata for repositories, links, and general compliance info in a data store of your choice. It natively supports Azure Storage, Azure Table, Azure CosmosDB, and Postgres.
At Microsoft we currently use Postgres for source of truth including:
- GitHub organization configuration
- corporate GitHub repository metadata
- corporate identity-to-GitHub login links
- compliance metadata (enabled/disabled repos)
Respecting the GitHub API
To be friendly to GitHub, we strive to be very efficient and fair in our use of the GitHub API. We cache as much as we can, and have a native concept of building on top of GitHub's Conditional Request best practice for GitHub Apps: whenever possible, we send the e-tag for a request, and we will use our cache for many types of operations.
For long multi-page GitHub REST API v3 responses, we will maintain a cache of those responses and rebuild them slowly in the background, as the site is eventually consistent for most views.
For operational work, such as granting access or managing access to superuser features, a real-time API call is used so that the result stays accurate and secure.
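The conditional request approach described above can be sketched roughly as follows. This is a minimal illustration and not the portal's actual implementation: the names `CachedEntry`, `HttpGet`, and `getWithEtag` are hypothetical, and the HTTP transport is injected so the sketch runs without network access. It assumes the server answers `304 Not Modified` when the `If-None-Match` header matches the current ETag.

```typescript
// Hypothetical names throughout; none of these come from the portal's source.
interface CachedEntry {
  etag: string;
  body: unknown;
}

interface HttpResponse {
  status: number;
  etag?: string;
  body?: unknown;
}

// The transport is injected so the sketch can run without network access.
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResponse>;

const etagCache = new Map<string, CachedEntry>();

async function getWithEtag(url: string, httpGet: HttpGet): Promise<unknown> {
  const cached = etagCache.get(url);
  const headers: Record<string, string> = {};
  if (cached) {
    // Ask the server to send a body only if the resource changed.
    headers["If-None-Match"] = cached.etag;
  }
  const response = await httpGet(url, headers);
  if (response.status === 304 && cached) {
    // Not modified: serve the cached body.
    return cached.body;
  }
  if (response.status === 200 && response.etag) {
    // New or changed: remember the body together with its ETag.
    etagCache.set(url, { etag: response.etag, body: response.body });
  }
  return response.body;
}
```

Calling `getWithEtag` twice for the same URL sends the stored ETag on the second call, and a `304` response is answered from the cache.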
Cache
The primary cache layer is backed by CosmosDB documents, in a hybrid approach where larger documents fall back to Azure Storage (blob). Redis is also supported for open source users of the site.
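One way to picture the hybrid approach is a cache that picks a backing store by serialized size. The sketch below is only an illustration under stated assumptions: `KeyValueStore`, `MemoryStore`, `HybridCache`, and the size threshold are hypothetical, in-memory maps stand in for the document and blob stores, and size is measured in characters where a real store would use bytes.

```typescript
// Hypothetical sketch; these are not the portal's real cache providers.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): void;
}

class MemoryStore implements KeyValueStore {
  private readonly items = new Map<string, string>();
  get(key: string): string | undefined {
    return this.items.get(key);
  }
  set(key: string, value: string): void {
    this.items.set(key, value);
  }
  delete(key: string): void {
    this.items.delete(key);
  }
}

class HybridCache {
  private readonly documents: KeyValueStore; // stands in for the document store
  private readonly blobs: KeyValueStore; // stands in for blob storage
  private readonly maxDocumentChars: number;

  constructor(documents: KeyValueStore, blobs: KeyValueStore, maxDocumentChars: number) {
    this.documents = documents;
    this.blobs = blobs;
    this.maxDocumentChars = maxDocumentChars;
  }

  // Values are assumed to be JSON-serializable.
  set(key: string, value: unknown): void {
    const serialized = JSON.stringify(value);
    if (serialized.length > this.maxDocumentChars) {
      // Too large for a document: store the payload as a blob and leave a pointer.
      this.blobs.set(key, serialized);
      this.documents.set(key, JSON.stringify({ blobPointer: true }));
    } else {
      this.documents.set(key, JSON.stringify({ value }));
      this.blobs.delete(key); // drop any stale blob from an earlier, larger value
    }
  }

  get(key: string): unknown {
    const raw = this.documents.get(key);
    if (raw === undefined) {
      return undefined;
    }
    const doc = JSON.parse(raw) as { value?: unknown; blobPointer?: boolean };
    if (doc.blobPointer) {
      const blob = this.blobs.get(key);
      return blob === undefined ? undefined : JSON.parse(blob);
    }
    return doc.value;
  }
}
```

Callers only see `get` and `set`; which store holds a given entry is an internal detail decided per write.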
Background event processing firehose and cronjobs
There are at least 2 ongoing single-instance daemonsets and many cronjobs that also keep the site efficient, up-to-date, and gather important info.
The daemons:
- Firehose: webhook event processing from a queue for eventual consistency and reacting to GitHub events around compliance/audit/scale/management
- Just-in-time: JIT revocations, audit log event gathering, and analysis
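As a rough idea of what a JIT revocation pass could look like, the sketch below separates expired grants from active ones. The `JitGrant` shape and the `splitExpiredGrants` helper are hypothetical; the README does not describe how the portal actually stores or revokes grants.

```typescript
// Hypothetical grant record; the portal's real data model is not shown here.
interface JitGrant {
  login: string;
  repository: string;
  expiresAt: Date;
}

// Split grants into those that are due for revocation and those still valid.
function splitExpiredGrants(
  grants: JitGrant[],
  now: Date
): { expired: JitGrant[]; active: JitGrant[] } {
  const expired: JitGrant[] = [];
  const active: JitGrant[] = [];
  for (const grant of grants) {
    if (grant.expiresAt.getTime() <= now.getTime()) {
      expired.push(grant);
    } else {
      active.push(grant);
    }
  }
  return { expired, active };
}
```

A daemon would then call the GitHub API to remove access for each entry in `expired`; that step is left out because it depends on the real permission model.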
Example cronjobs:
- Make sure caches are primed occasionally
- Remind people to set up or delete repos
- Automatically delete repos that are not set up in a time window
- Disabling repos out of compliance
- Collecting data and metrics for reports and user interface experiences
- Backing up link data
- Prepare stats for an OKR
About the firehose in detail
While the original portal works fine for very small GitHub presences, it was designed around the idea that the cache would fill, while respecting the GitHub REST API by using Conditional Requests, and being very eventually consistent.
However, the REST API v3 (non-GraphQL client) maximum size for a page of results is 100 entries, which ... is very painful if you have tens of thousands of anything.
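To put numbers on that pain: with a page size capped at 100, listing 30,000 items takes 300 sequential requests. The sketch below shows the arithmetic and a simple page-walking loop. `fetchPage` is a hypothetical injected function, and stopping on a short page is a simplification, since real clients typically follow the `Link` response header instead.

```typescript
const MAX_PAGE_SIZE = 100; // the REST API v3 upper bound mentioned above

// Number of requests needed to list totalItems entries.
function requestsNeeded(totalItems: number, pageSize: number = MAX_PAGE_SIZE): number {
  const effective = Math.min(Math.max(pageSize, 1), MAX_PAGE_SIZE);
  return Math.ceil(totalItems / effective);
}

// fetchPage is a hypothetical stand-in for a paged REST call.
async function collectAllPages<T>(
  fetchPage: (page: number, perPage: number) => Promise<T[]>
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const items = await fetchPage(page, MAX_PAGE_SIZE);
    all.push(...items);
    // A short (or empty) page means there is nothing left to fetch.
    if (items.length < MAX_PAGE_SIZE) {
      break;
    }
  }
  return all;
}
```

For example, `requestsNeeded(30000)` returns `300`, which is why rebuilding long lists in the background and serving them from cache matters at this scale.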
The "firehose" is designed to be run either within the app itself, or as a secondary app processing results. At Microsoft, we use a service bus to process webhook events from GitHub, since we have a robust webhook ingestion mechanism elsewhere. The firehose runs as a daemon that pulls off the queue and works to keep the "query cache" primed with newer information than the REST API may have in some cases.
What this improves:
- The user views of the orgs, repos, teams they are added to and have access to
- Cross-organization views and querying
The firehose and query cache are not used for important or auth-style scenarios:
- Query cache is not used to make permission decisions
- Query cache is not used to authorize access to administrative functions
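The firehose idea described above can be sketched as a loop that drains queued webhook events into a query cache. Everything here is a simplified assumption: the `WebhookEvent` shape, the `QueryCache` class, and the in-memory array standing in for the service bus queue are hypothetical, and only repository created/deleted events are modeled.

```typescript
// Hypothetical, heavily reduced event shape.
interface WebhookEvent {
  action: "created" | "deleted";
  organization: string;
  repository: string;
}

// Display and search data only; never consulted for permission decisions.
class QueryCache {
  private readonly reposByOrg = new Map<string, Set<string>>();

  apply(event: WebhookEvent): void {
    const repos = this.reposByOrg.get(event.organization) ?? new Set<string>();
    if (event.action === "created") {
      repos.add(event.repository);
    } else {
      repos.delete(event.repository);
    }
    this.reposByOrg.set(event.organization, repos);
  }

  repositories(organization: string): string[] {
    const repos = this.reposByOrg.get(organization);
    return repos ? Array.from(repos).sort() : [];
  }
}

// Pull events off the queue in order until it is empty.
function drainQueue(queue: WebhookEvent[], cache: QueryCache): number {
  let processed = 0;
  let event = queue.shift();
  while (event !== undefined) {
    cache.apply(event);
    processed++;
    event = queue.shift();
  }
  return processed;
}
```

Because the cache is fed by events rather than by re-listing pages, views can reflect a new repository before the next full REST refresh would have picked it up.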
We did at one point design the idea of having a /webhook endpoint and validating the webhook signatures before processing hooks for simple app hosting, but it's slightly broken right now and disabled at Microsoft.
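For reference, validating a GitHub webhook signature generally means recomputing an HMAC-SHA256 of the raw request body with the shared webhook secret and comparing it to the `X-Hub-Signature-256` header. The sketch below uses Node's built-in `crypto` module. The function name `isValidSignature` is hypothetical, and this is not the portal's (currently disabled) implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// signatureHeader is the value of X-Hub-Signature-256, e.g. "sha256=<hex digest>".
function isValidSignature(
  secret: string,
  rawBody: string,
  signatureHeader: string | undefined
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) {
    return false;
  }
  const expected =
    "sha256=" + createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const expectedBytes = Buffer.from(expected, "utf8");
  const receivedBytes = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return (
    expectedBytes.length === receivedBytes.length &&
    timingSafeEqual(expectedBytes, receivedBytes)
  );
}
```

The comparison must run against the raw body exactly as received; re-serializing parsed JSON can change bytes and break the check.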
Dev prep, build, deploy
Install Node packages
Make sure to include dev dependencies.
The default assets package is a super ancient Bootstrap and jQuery app that in theory provides basic skin for the site, favicons, graphics, etc. However, it's ... really, really, really old. Microsoft discards the default-assets-package, using a different set of assets, so you've been mildly warned.
The main module of the defined default-assets-package should resolve to the path to serve content from. Since the default version uses [ancient] Grunt to build the assets, it returns the __dirname + /public, which is the output/built location for Grunt.
npm install
cd default-assets-package
npm install
Build
npm run build
You need to rebuild the default-assets-package if you change something. See Static Site Assets.
Codespaces instructions
You will likely want to use a defined environment to save time setting up many variables. Follow one of the paths below:
- GitHub Codespaces account-level secrets for your environment variables as well
- use a .env file up a folder from the cloned repository in your Codespace environment
- configure environment variables once the devcontainer boots
- GitHub Codespaces repo-specific secrets
Whether as a secret or in the ../.env file relative to the repository root, set CONFIGURATION_ENVIRONMENT to development (or similar).
Then, you'll also need to make sure authentication will work when redirecting to the running Codespaces environment.
GitHub authentication
You'll want to bring your own GitHub App and use its client ID and client secret for authentication. Configure your account-specific Codespace secrets.
- CODESPACES_GITHUB_AUTHENTICATION_ENABLED: set to 1 to enable
- CODESPACES_GITHUB_CLIENT_ID: the client ID
- CODESPACES_GITHUB_CLIENT_SECRET: the client secret
Configure the secrets for your fork and/or this repository as necessary. The redirect URL will be dynamically generated and included in the startup debug output. Make sure that the hostname is an appropriate callback URL for the GitHub app.
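To see what kind of callback URL to register on the GitHub App, the sketch below builds a forwarded-port URL from a codespace name, port, and forwarding domain. This is an assumption about how Codespaces hostnames are typically shaped (`<codespace name>-<port>.<forwarding domain>`), not the portal's own logic; `codespacesCallbackUrl` is a hypothetical helper, and the startup debug output remains the authoritative source for the real value. The callback path used in the example is the one from the local development settings later in this README.

```typescript
// Hypothetical helper; the portal generates its redirect URL on its own at startup.
function codespacesCallbackUrl(
  codespaceName: string,
  port: number,
  forwardingDomain: string,
  callbackPath: string
): string {
  const path = callbackPath.startsWith("/") ? callbackPath : `/${callbackPath}`;
  return `https://${codespaceName}-${port}.${forwardingDomain}${path}`;
}

// Example values are placeholders, not real hostnames.
const exampleUrl = codespacesCallbackUrl(
  "my-codespace-abc123",
  3000,
  "app.github.dev",
  "/auth/github/callback"
);
// exampleUrl === "https://my-codespace-abc123-3000.app.github.dev/auth/github/callback"
```

If the URL printed at startup differs from what this pattern produces, trust the printed one and register that with the GitHub App.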
Enterprise Managed Users impersonation/override
Since the underlying repository and the Codespace are likely hosted in GHEC EMU, you will also need to use the debug-time impersonation features to override the EMU user information after a GitHub callback with your GitHub.com account.
For ease of use, an initial impersonation override feature is available that only will override a GitHub EMU response:
- CODESPACES_IMPERSONATE_OVERRIDE_EMU_ENABLED: set to 1 to allow in your environment
- CODESPACES_IMPERSONATE_OVERRIDE_EMU_LOGIN: set to the login to use only when an EMU user authenticates. The primary impersonation feature will still be used after this.
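The override behavior described above might look something like the sketch below. The two environment variable names come from this README; everything else is an assumption. In particular, the helper names are hypothetical, and detecting an EMU account by the underscore in its login (EMU logins take the form `handle_shortcode`, while regular GitHub.com logins cannot contain underscores) is a guess at a reasonable heuristic, not the portal's actual check.

```typescript
interface OverrideSettings {
  enabled: boolean;
  login?: string;
}

// Read the two documented variables from an environment-like object.
function readOverrideSettings(env: Record<string, string | undefined>): OverrideSettings {
  return {
    enabled: env.CODESPACES_IMPERSONATE_OVERRIDE_EMU_ENABLED === "1",
    login: env.CODESPACES_IMPERSONATE_OVERRIDE_EMU_LOGIN,
  };
}

// Assumption: an underscore in the login marks an EMU account.
function resolveLogin(callbackLogin: string, settings: OverrideSettings): string {
  const looksLikeEmu = callbackLogin.includes("_");
  if (settings.enabled && settings.login && looksLikeEmu) {
    return settings.login;
  }
  // Non-EMU logins, or a disabled override, pass through unchanged.
  return callbackLogin;
}
```

With the override enabled and the login set to `octocat`, a callback for `jdoe_contoso` resolves to `octocat`, while a callback for a regular account such as `jdoe` is left alone.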
AAD authentication
Configure your AAD application in an appropriate tenant.
- CODESPACES_AAD_AUTHENTICATION_ENABLED: set to 1 to enable
- Set the other AAD variables for your environment as necessary:
  - AAD_CLIENT_ID
  - AAD_CLIENT_SECRET
  - ...
Private artifacts
The Microsoft-internal fork of this project uses a private Azure Artifact feed to bring in additional components and libraries. These are not applicable to the open source upstream and should be excluded currently.
Building the Docker image
docker build -t opensource-management-portal .
Run (OSS instructions)
This section is from the open source community
The easiest way to run is by using the docker-compose setup. This will boot up the Postgres and Redis components as well. The docker-compose setup depends on 2 environment files and 1 JSON file:
- .docker.env
- .secrets.env
- env-orgs.json
Make sure to copy the .secrets.env.example and env-orgs.json.example files and provide the configuration values.
cp .secrets.env.example .secrets.env
cp env-orgs.json.example env-orgs.json
# provide configuration values for .secrets.env and env-orgs.json
docker-compose up
If you want to run everything on your local machine (including Redis and Postgres), you might want to use the following approach.
# ensure redis and postgres are running on localhost
source .secrets.env
source .local.env
npm run start
Troubleshooting
If the docker image doesn't start you can debug the image using an interactive shell session. This allows you to browse the folders, update the files to test things and run the portal.
$ docker run --rm -it --env-file .secrets.env --env-file .docker.env --entrypoint /bin/sh opensource-management-portal
/usr/src/repos $ ls
app.js data lib package.json tsconfig.tsbuildinfo webhooks
app.js.map entities localEnvironment.js routes user
bin features localEnvironment.js.map test utils.js
business github middleware transitional.js utils.js.map
config jobs node_modules transitional.js.map views
/usr/src/repos $ npm run start-in-container
Test
This project basically has very few tests, and aspirations to start using Jest better. Oops. Bad debt as multiple hackathons combine, along with production dependencies on GitHub...
Bare minimum local development environment
If you place a .env file above the directory of your cloned repo (to prevent committing secrets to your repo by accident or in your editor), you can configure the following extreme minimum working set to use the app.
The central operations token is a personal access token for an account that is an org owner of the GitHub org(s) being managed.
DEBUG_ALLOW_HTTP=1
GITHUB_CENTRAL_OPERATIONS_TOKEN=a github token for the app
GITHUB_ORGANIZATIONS_FILE=../../env-orgs.json
GITHUB_CLIENT_ID=your client id
GITHUB_CLIENT_SECRET=your client secret
GITHUB_CALLBACK_URL=http://localhost:3000/auth/github/callback
AAD_CLIENT_ID=your corporate app id
AAD_REDIRECT_URL=http://localhost:3000/auth/azure/callback
AAD_CLIENT_SECRET=a secret for the corporate app
AAD_TENANT_ID=your tenant id
AAD_ISSUER=https://sts.windows.net/your tenant id/
In this mode memory providers are used, including a mocked Redis client. Note that this does mean that a large GitHub organization configured with memory providers could become a token use nightmare, as each new execution of the app without a Redis Cache behind the scenes is going to have 100% cache misses for GitHub metadata. Consider configuring a development or local Redis server to keep cached data around.
The built-in Redis mock will likely be removed when we move to the next major semver of the Node Redis library.
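Before starting the app against the minimum working set listed above, it can help to check that nothing is missing. The sketch below is a minimal, hypothetical checker, not part of the portal: `parseEnvText` handles only plain `KEY=value` lines (no quoting or escapes), and the key list simply mirrors the variables named in this section.

```typescript
// Keys copied from the minimum working set in this section.
const MINIMUM_KEYS: string[] = [
  "DEBUG_ALLOW_HTTP",
  "GITHUB_CENTRAL_OPERATIONS_TOKEN",
  "GITHUB_ORGANIZATIONS_FILE",
  "GITHUB_CLIENT_ID",
  "GITHUB_CLIENT_SECRET",
  "GITHUB_CALLBACK_URL",
  "AAD_CLIENT_ID",
  "AAD_REDIRECT_URL",
  "AAD_CLIENT_SECRET",
  "AAD_TENANT_ID",
  "AAD_ISSUER",
];

// Minimal parser: plain KEY=value lines, blank lines and # comments skipped.
function parseEnvText(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) {
      continue;
    }
    const separator = trimmed.indexOf("=");
    if (separator <= 0) {
      continue;
    }
    values[trimmed.slice(0, separator).trim()] = trimmed.slice(separator + 1).trim();
  }
  return values;
}

// Names from the minimum set that are absent or empty.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return MINIMUM_KEYS.filter((key) => !env[key]);
}

// AAD_ISSUER follows the pattern shown above for a given tenant ID.
function issuerForTenant(tenantId: string): string {
  return `https://sts.windows.net/${tenantId}/`;
}
```

Running `missingKeys(parseEnvText(fileContents))` on the contents of the .env file returns an empty array when every variable in the minimum set has a value.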
Collaboration
This project began as a hackathon... so still has growing pains years later. Since this is technically a backend web application and includes some server-generated user interface, the project was not originally designed to be shared as something that runs out-of-the-box, but... it is possible.
To collaborate on extensibility and improvements, please sync in the issues first so we can come up with the best approach.
Again, since Microsoft strips most of the routes/ and uses a React frontend on this app, it's likely that routes/ and the Pug rendering are... old. Very old.
Hopefully this monolith can at least be an interesting learning opportunity in crufty old ancient apps evolving on the JavaScript front!
Work to be done (OSS project)
- Support more interesting cloud and data providers
- Support other authentication technologies
- Any tests
- More tests
- Ship the front-end UI to the world as open source
- Continuing to refactor out Microsoft-specific things when possible
Project origin
An introduction to this project is available in a 2015 post by JWilcox and a 2019 follow-up post, "Scaling from 2,000 to 25,000".
An Open Source Hub concept was prototyped by a Microsoft subsidiary and the early Open Source Programs Office to make very clear the open source experiences, docs, and guides for Microsoft's culture change to working more in the open, releasing projects, and connecting everything together.
At the same time, GitHub was very basic, and it was necessary to automate and make self-service the GitHub engineering system to work at an enterprise scale. When Azure became the first approved organization to use GitHub at Microsoft, this portal scaled access and built guardrails around the GitHub environment.
LICENSE
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.