зеркало из
1
0
Форкнуть 0
opensource-management-portal/README.md

438 строки
19 KiB
Markdown

# Open Source Management Portal
This application represents the home for open source engineering experiences
at Microsoft. As a backend application it manages source of truth for many
types of corporate open source metadata, historical intent of repos
and projects, hosts a rich front-end, and also a set of APIs used by partner
teams.
While we prefer native GitHub experiences, when it comes to displaying certain info
and being more transparent about permissions and metadata, especially on
GitHub, which has no extensible user interface, we end up using and driving
people to this Open Source Management Portal to get the information they
need.
At Microsoft, 50,000 engineers are using a version of this portal as part of
their open source engineering experience. However, Microsoft does have a set
of "company-specific" extensions, including a separate React frontend client,
that are not currently part of this repository. And... yup, if we were to
start over today, we'd probably make this a Next.js-or-similar project.
Core capabilities and features of this application:
- **Linking GitHub accounts ⛓️** for enterprise use
- **Self-service GitHub organization joining 🙋** for engineers
- **Creating and managing GitHub open source repositories 👩‍💻**
- **Displaying transparent information, metrics, and company-specific data** about our GitHub open source presence around permissions, access, metadata, intent, and especially cross-organization views and search indexes
- **People inventory 👨‍🦳🧑‍🚀🧒🏽** to help people connect GitHub public logins with corporate identities
- **Intercepting forks and new repositories 🔐** to inject compliance and approval processes
- **Disable and enable 🔑** experiences for GitHub repositories
- **Just-in-time (JIT) access 🚪** for open source repositories, teams, and organizations, aligning with the principle of least privilege access
- **Sudo ⚡️** capabilities for repos, teams, organizations to remove persistent broad ownership and admin permissions
- **Hosting APIs 🍽️** to create repos, large-scale orgs to access link data, and reports
- **Background jobs 👷‍♂️** to maintain eventual consistency, run tasks, gather metrics, and prepare OKRs
- **Team join requests/approvals with context 🚪** building beyond the GitHub experience
- **Automated offboarding 🛶** when people take on new opportunities
At Microsoft, additional capabilities include:
- **Pre-release business and legal approvals to release projects 🧑‍⚖️**
- **Requesting contribution reviews ✍🏾** within policy
- **Service Tree and Direct Owners inventory 🌳** for showing accountable ownership information for repos when available
- **Hosting internal docs 📚** at aka.ms/opensource
- **Hosting a subset of opensource.microsoft.com's APIs 🌍** to bring to life the Microsoft open source presence
The management portal is designed to be fast, efficient, and get out of the way of engineers
to get their important work done, with an emphasis on _relentless automation_ and _delegation_.
Most of the experience is eventually consistent; however, operational actions
such as joining teams, orgs, sudo operations, etc., are fully consistent at the time
they are requested.
## Implementation Details and More Docs
Please see the `docs/` sub-folder, including [docs/index.md](docs/index.md).
## API
Please see the [docs/api.md](docs/api.md) file for information about the current API.
## Application stack for learning
As a TypeScript/Node.js backend application, with a React frontend, the
management portal also serves as a learning opportunity for Microsoft's
engineering systems teams to understand the experience that non-.NET stack
applications may have. The 1ES+OSPO teams partner to ship the application
based on essentially a fork of this open source repo.
As of 2022, the backend site is hosted by Azure App Service with
Linux containers, while the background cronjobs and daemons run in
Azure Kubernetes Service (AKS) clusters. All containers are built on top
of the CBL Mariner distro.
The app started as a hackathon project in an ancient JavaScript era full
of "callback hell", and has evolved through to third-party promise libraries
to native ECMAScript promises and to TypeScript. So it both shows its age,
and, is, interesting.
### Web app authentication
The **primary** authentication for the site is **Azure Active Directory** for
corporate users.
The **secondary** authentication is **GitHub**. This allows users not using
GitHub to fully explore the site, link, and otherwise be productive.
_In theory, open source friends, this project could be made a bit more
extensible. In the past, we prototyped Google authentication, as an example,
for the primary aspect. Contributions welcome!_
APIs can use either JWTs or an active web app session in some cases, used
by the React frontend.
### Configuration ⛳️
Many feature flags exist.
Please see [docs/configuration.md](docs/configuration.md)
### Jobs 💼
Please see [docs/jobs.md](docs/jobs.md)
## Service dependencies
- GitHub organization(s)
- Hosting environment
- Background job environment for eventual consistency work and maintenance cronjobs
- Daemon hosting for near-real-time process
- Queue system
- A cache system or multi-tiered cache implementation
- Azure Active Directory and the Microsoft Graph
- An email service to send mail
- Optional insights or telemetry system
### Source of truth store 🧑‍⚖️
The backend maintains in a data store of your choice key metadata for
repositories, links, and general compliance info. The backend supports
natively Azure Storage, Azure Table, Azure CosmosDB, and Postgres.
At Microsoft we currently use **Postgres** for source of truth including:
- GitHub organization configuration
- corporate GitHub repository metadata
- corporate identity-to-GitHub login links
- compliance metadata (enable/disabled repos)
### Respecting the GitHub API
To be friendly to GitHub, we strive to be very efficient and fair in
our use of the GitHub API. We cache as much as we can, and have a
native concept of building on top of GitHub's **Conditional Request**
best practice for GitHub Apps: whenever possible, we send the `e-tag` for
a request, and we will use our cache for many types of operations.
For long multi-page GitHub REST API v3 responses, we will maintain a
cache of those responses and rebuild them slowly in the background,
as the site is eventually consistent for most views.
For operational work, a real-time API call is used to continue to be
accurate and secure when working around granting access or managing
access to superuser features.
### Cache
The primary cache layer is backed by **CosmosDB** documents, in a hybrid
approach where larger documents fallback to **Azure Storage** (blob). Redis is
also supported for open source users of the site.
### Background event processing firehose and cronjobs
There are at least 2 ongoing single-instance daemonsets and many cronjobs
that also keep the site efficient, up-to-date, and gather important info.
The daemons:
- **Firehouse**: webhook event processing from a queue for eventual consistency and reacting to GitHub events around compliance/audit/scale/management
- **Just-in-time**: JIT revocations, audit log event gathering, and analysis
Example cronjobs:
- Make sure caches are primed occasionally
- Remind people to setup or delete repos
- Automatically delete repos that are not setup in a time window
- Disabling repos out of compliance
- Collecting data and metrics for reports and user interface experiences
- Backing up link data
- Prepare stats for an OKR
#### About the firehose in detail
While the original portal works fine for very small GitHub presences, it
was designed around the idea that the cache would fill, while respecting the
GitHub REST API by using [Conditional Requests](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#conditional-requests), and being very
eventually consistent.
However, the REST API v3 (non-GraphQL client) maximum size for a page of results
is 100 entries, which ... is very painful if you have tens of thousands of anything.
The "firehose" is designed to be run either within the app itself, or as a secondary
app processing results. At Microsoft, we use a service bus to process webhook events
from GitHub, since we have a robust webhook ingestion mechanism elsewhere. The
firehose runs as a daemon that pulls off the queue and works to keep the "query cache" primed with newer information than the REST API may have in some cases.
What this improves:
- The user views of the orgs, repos, teams they are added to and have access to
- Cross-organization views and querying
The firehose and query cache are _not_ used for important or auth-style scenarios:
- Query cache is not used to make permission decisions
- Query cache is not used to authorize access to administrative functions
We did at one point design the idea of having a `/webhook` endpoint and validating
the webhook signatures before processing hooks for simple app hosting, but it's
slightly broken right now and disabled at Microsoft.
## Dev prep, build, deploy
### Install Node packages
Make sure to include dev dependencies.
The default assets package is a _super ancient_ Bootstrap and jQuery app that
in theory provides basic skin for the site, favicons, graphics, etc. However,
it's ... really, really, really old. Microsoft discards the default-assets-package,
using a different set of assets, so you've been mildly warned.
The `main` module of the defined default-assets-package should resolve to the
path to serve content from. Since the default version uses \[ancient\] Grunt to build the
assets, it returns the `__dirname` + `/public`, which is the output/built location for Grunt.
```bash
npm install
cd default-assets-package
npm install
```
### Build
```bash
npm run build
```
You need to rebuild the default-assets-package if you change something. [see Static Site Assets](docs/staticSiteAssets.md)
### Codespaces instructions
You will likely want to use a defined environment to save time spinning up many variables, follow one of the below paths:
- GitHub Codespaces account-level secrets for your environment variables as well
- use a `.env` file up a folder from the cloned repository in your Codespace environment
- configure environment variables once the devcontainer boots
- GitHub Codespaces repo-specific secrets
Whether as a secret or in the `../env` from the root, set
- `CONFIGURATION_ENVIRONMENT`: `development` (or similar)
Then, you'll also need to make sure authentication will work when redirecting to the running
Codespaces environment.
### GitHub authentication
You'll want to bring your own GitHub App and use its client ID and client secret for
authentication. [Configure your account-specific Codespace secrets](https://github.com/settings/codespaces).
- `CODESPACES_GITHUB_AUTHENTICATION_ENABLED`: set to `1` to enable
- `CODESPACES_GITHUB_CLIENT_ID`: the client ID
- `CODESPACES_GITHUB_CLIENT_SECRET`: the client secret
Configure the secrets for your fork and/or this repository as necessary. The redirect URL will
be dynamically generated and included in the startup debug output. Make sure that the hostname
is an appropriate callback URL for the GitHub app.
#### Enterprise Managed Users impersonation/override
Since the underlying repository and the Codespace are likely hosted in GHEC EMU,
you will also need to use the debug-time impersonation features to override the EMU
user information after a GitHub callback with your GitHub.com account.
For ease of use, an initial impersonation override feature is available that
only will override a GitHub EMU response:
- `CODESPACES_IMPERSONATE_OVERRIDE_EMU_ENABLED`: set to `1` to allow in your environment
- `CODESPACES_IMPERSONATE_OVERRIDE_EMU_LOGIN`: set to the login to use _only_ when an EMU user authenticates. _The primary impersonation feature will still be used after this._
### AAD authentication
Configure your AAD application in an appropriate tenant.
- `CODESPACES_AAD_AUTHENTICATION_ENABLED`: set to `1` to enable
- Set the other AAD variables for your environment as necessary:
- `AAD_CLIENT_ID`
- `AAD_CLIENT_SECRET`
- ...
### Private artifacts
The Microsoft-internal fork of this project uses a private Azure Artifact feed
to bring in additional components and libraries. These are not applicable to
the open source upstream and should be excluded currently.
### Building the Docker image
```bash
docker build -t opensource-management-portal .
```
#### Run (OSS instructions)
> This section is from the open source community
The most easy way to run is by using the docker-compose setup. This will bootup the postgres and redis components as well. The docker-compose setup depends on 2 environment files and 1 json file:
- .docker.env
- .secrets.env
- env-orgs.json
Make sure to copy the .secrets.env.example and env-orgs.json.example files and provide the configuration values.
```bash
cp .secrets.env.example .secrets.env
cp env-orgs.json.example env-orgs.json
# provide configuration values for .secrets.env and env-orgs.json
docker-compose up
```
If you desire to run all on your local machine (redis, postgres) you might want to use following approach.
```bash
# ensure redis and postgres is running on localhost
source .secrets.env
source .local.env
npm run start
```
#### Troubleshooting
If the docker image doesn't start you can debug the image using an interactive shell session. This allows
you to browse the folders, update the files to test things and run the portal.
```bash
$ docker run --rm -it --env-file .secrets.env --env-file .docker.env --entrypoint /bin/sh opensource-management-portal
/usr/src/repos $ ls
app.js data lib package.json tsconfig.tsbuildinfo webhooks
app.js.map entities localEnvironment.js routes user
bin features localEnvironment.js.map test utils.js
business github middleware transitional.js utils.js.map
config jobs node_modules transitional.js.map views
/usr/src/repos $ npm run start-in-container
```
### Test
This project basically has _very few tests_, and aspirations to start using Jest better. Oops. Bad debt as multiple hackathons combine, along with
production dependencies on GitHub...
### Bare minimum local development environment
If you place a JSON file `.env` above the directory of your cloned repo
(to prevent committing secrets to your repo by accident or in your editor),
you can configure the following extreme minimum working set to use the app.
The central operations token is a personal access token that is a **org owner**
of the GitHub org(s) being managed.
```env
DEBUG_ALLOW_HTTP=1
GITHUB_CENTRAL_OPERATIONS_TOKEN=a github token for the app
GITHUB_ORGANIZATIONS_FILE=../../env-orgs.json
GITHUB_CLIENT_ID=your client id
GITHUB_CLIENT_SECRET=your client secret
GITHUB_CALLBACK_URL=http://localhost:3000/auth/github/callback
AAD_CLIENT_ID=your corporate app id
AAD_REDIRECT_URL=http://localhost:3000/auth/azure/callback
AAD_CLIENT_SECRET=a secret for the corporate app
AAD_TENANT_ID=your tenant id
AAD_ISSUER=https://sts.windows.net/your tenant id/
```
In this mode memory providers are used, including a mocked Redis client. Note
that this does mean that a large GitHub organization configured with memory
providers could become a token use nightmare, as each new execution of the app
without a Redis Cache behind the scenes is going to have 100% cache misses for
GitHub metadata. Consider configuring a development or local Redis server to
keep cached data around.
> The built-in Redis mock will likely be removed when we move to the next
> major semver of the Node Redis library.
## Collaboration
This project began as a hackathon... so still has growing pains years later.
Since this is technically a _backend web application_ and includes some
server-generated user interface, the project was not originally designed
to be shared as something that runs out-of-the-box, but... it is possible.
To collaborate on extensibility and improvements, please sync in the issues
first so we can come up with the best approach.
Again, since Microsoft strips most of the `routes/` and uses a React frontend
on this app, it's likely `routes/` and the Pug rendering is... old. Very old.
Hopefully this **monolith** can at least be an interesting learning
opportunity in crufty old ancient apps evolving on the JavaScript front!
### Work to be done (OSS project)
- Support more interesting cloud and data providers
- Support other authentication technologies
- Any tests
- More tests
- Ship the front-end UI to the world as open source
- Continuing to refactor out Microsoft-specific things when possible
## Project origin
An introduction to this project is available in a [2015 post by JWilcox](https://jeffwilcox.blog/2015/11/azure-on-github/) and a
[2019 follow-up post, "Scaling from 2,000 to 25,000"](https://jeffwilcox.blog/2019/06/scaling-25k/).
An Open Source Hub concept was prototyped by a Microsoft subsidiary and
the early Open Source Programs Office to make very clear the open source
experiences, docs, and guides for Microsoft's culture change to working
more in the open, releasing projects, and connecting everything together.
At the same time, GitHub was very basic, and it was necessary to automate and
make self-service the GitHub engineering system to work at an enterprise scale.
When Azure became the first approved organization to use GitHub at Microsoft,
this portal scaled access and built guardrails around the GitHub environment.
## LICENSE
[MIT License](LICENSE)
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit <https://cla.opensource.microsoft.com>.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.