Commit graph

2596 commits

Author SHA1 Message Date
Zach Newton 7abf5cfd3b
fix(gitrest): Handle FileSystem Errors in HTTP Responses (#22986)
## Description

Currently, there are some filesystem operations in Gitrest that result
in a generic 400 HTTP error code, rather than a helpful HTTP status and
message based on the error that occurred.

This PR adds some wrapper functions that help determine if an error is a
FileSystemError (or RedisFSError, which is similar) and bubble that up
as a NetworkError that can be parsed for the HTTP response.
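
A minimal sketch of the idea described above, assuming hypothetical names (`FsLikeError`, `toHttpError`) and an assumed code-to-status table; the actual Gitrest wrappers and the statuses they choose are not shown in this log.

```typescript
// Hypothetical shapes for illustration; not the actual Gitrest types.
interface FsLikeError {
	code: string; // Node-style code such as "ENOENT"
	message: string;
}

interface HttpErrorInfo {
	status: number;
	message: string;
}

// Assumed mapping from filesystem error codes to HTTP statuses.
const statusByFsCode = new Map<string, number>([
	["ENOENT", 404], // file or directory not found
	["EEXIST", 409], // target already exists
	["EACCES", 403], // permission denied
	["ENOSPC", 507], // no space left on device
]);

// Type guard: true for any object carrying a string `code` and `message`.
function isFsLikeError(error: unknown): error is FsLikeError {
	if (typeof error !== "object" || error === null) return false;
	const candidate = error as { code?: unknown; message?: unknown };
	return typeof candidate.code === "string" && typeof candidate.message === "string";
}

// Convert an arbitrary thrown value into status + message for the response.
function toHttpError(error: unknown): HttpErrorInfo {
	if (isFsLikeError(error)) {
		// Unknown filesystem codes fall back to 500 in this sketch.
		return { status: statusByFsCode.get(error.code) ?? 500, message: error.message };
	}
	// Anything else keeps the generic 400 the description mentions.
	return { status: 400, message: "Bad request" };
}
```

Keeping the table in a `Map` avoids accidental hits on inherited object properties when the code string comes from an untrusted error.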
2024-11-05 13:04:13 -08:00
Alex Villarreal c56a5218b8
refactor: Update dependencies so path-to-regexp gets to a version without CVE (#22928)
## Description

Updates our transitive dependencies on `path-to-regexp` to versions that
fixed https://nvd.nist.gov/vuln/detail/CVE-2024-45296 . Accomplished by
updating our direct dependencies on `sinon` to a mix of version 18 and
19, since that's the main way in which we get transitive dependencies on
`path-to-regexp`.

`@types/sinon` was also opportunistically updated to the latest version
where it wasn't already up to date.
2024-10-30 11:48:31 -05:00
Alex Villarreal 1edf5091de
refactor: Update transitive dependency on `tar` to address CVE (#22932)
## Description

Updates `tar` to version `6.2.1` to address
https://nvd.nist.gov/vuln/detail/CVE-2024-28863 . Done by adding a
`pnpm.overrides` entry `"tar": "^6.2.1"` to the package.json of each of
the affected packages, running `pnpm i --no-frozen-lockfile`, then
removing the override from package.json and running the same command
again.
2024-10-30 11:46:01 -05:00
Alex Villarreal c8ea391f49
refactor: Update dependency on cookie to address CVE (#22847)
## Description

Updates the `cookie` dependency to address [a CVE in the `cookie`
package](https://nvd.nist.gov/vuln/detail/CVE-2024-47764). This required
updating `express` since it declares a hardcoded (no range) dependency
on `cookie`.
2024-10-25 10:40:56 -05:00
kekachmar ff6a3bf1b8
add isEphemeralContainer to relevant metrics (#22890)
Adds `isEphemeralContainer` as a property to relevant metrics
2024-10-24 20:04:24 +00:00
dhr-verma b621e4a25d
Enabled support for a token issuance endpoint in Alfred (#22884)
## Description

This PR adds support for a fluid token issuance endpoint. This endpoint
can be used to issue fluid access tokens based on a custom
implementation of the `IFluidAccessTokenGenerator` interface. One use of
this endpoint is to enable access using cloud identity providers such as
Entra-ID. Alfred has a new endpoint -
`api/v1/tenants/:tenantid/accesstoken`. This endpoint expects a `Bearer`
token and creates an access token for a given fluid tenant.

This PR also adds support to inject a custom implementation of the
`IFluidAccessTokenGenerator`. This can be used to implement any custom
logic that is needed for a business use case.

It also has unit tests for the following:

1) Throttling of the endpoint
2) Validation of cases where a `Bearer` token is not provided or an
invalid authorization method is used
3) Validation of a valid token creation path
4) Validation of token creation failure due to invalid token signature
and due to unauthorized access
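
The `Bearer` validation cases listed above could look roughly like the following sketch. The function name and the result shape are assumptions for illustration; the real Alfred handler and its response codes are not shown in this log.

```typescript
// Result of inspecting an Authorization header (hypothetical shape).
type BearerResult = { ok: true; token: string } | { ok: false; reason: string };

// Accepts only "Bearer <token>"; any other scheme or a missing value is rejected.
function parseBearerHeader(authorization: string | undefined): BearerResult {
	if (authorization === undefined || authorization.trim() === "") {
		return { ok: false, reason: "Authorization header is missing" };
	}
	const parts = authorization.trim().split(/\s+/);
	const scheme = parts[0] ?? "";
	const token = parts[1] ?? "";
	if (scheme.toLowerCase() !== "bearer") {
		return { ok: false, reason: "Unsupported authorization method" };
	}
	if (token === "" || parts.length !== 2) {
		return { ok: false, reason: "Malformed Bearer token" };
	}
	return { ok: true, token };
}
```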

## Breaking Changes

This PR adds a new resourceFactory arg of type
`IFluidAccessTokenGenerator`. This arg is customizable as well.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-24 17:48:47 +00:00
Alex Villarreal 9901775f65
refactor: Update build-tools dev deps in common-utils, protocol-definitions, server/historian, and server/gitrest (addresses CVE-2024-43788) (#22885)
## Description

Updates the build-tools and build-cli dev dependencies in
protocol-definitions, common-utils, server/historian, and
server/gitrest. This gets webpack updated to the latest 5.x version,
which addresses https://nvd.nist.gov/vuln/detail/CVE-2024-43788 .

Also updates eslint-config-fluid in protocol-definitions,
server/historian, and server/gitrest, just to keep with the latest
version.
2024-10-24 10:46:27 -05:00
Alex Villarreal 1b5ced0d94
refactor(server): Update ws to address CVE (#22845)
## Description

Updates dependencies to get to ws@8.17.1 (or ws@7.5.10) to address
https://nvd.nist.gov/vuln/detail/CVE-2024-37890. Updating socket.io to
4.8.0 was necessary in some cases to get the necessary dependency ranges.

socket.io 4.7.5-4.8.0 is a minor semver update but contains [a breaking
change in the type of the `close()`
function](https://github.com/socketio/socket.io/pull/4971/files), so two
places had to be updated to account for that.
2024-10-23 14:01:25 -05:00
dhr-verma b6aa07a614
Replaced StartupCheck with IReadinessCheck in Gitrest and Historian (#22868)
## Description

The `StartupCheck` implementation was consumed by Gitrest and Historian.
However, since the r11 packages consumed by Gitrest and Historian are
not updated as frequently as r11s consumed by repos such as FRS, this
dependency caused a breakage in Gitrest in FRS.

As a result, the solution is to make the startupCheck parameter in
resourceFactories more generic: `IReadinessCheck`. This will
prevent any future class dependencies from breaking.

## Breaking Changes

There should be none as this change makes the parameter more 'generic'.
That is, old implementations should still work fine.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-21 23:01:11 -07:00
dhr-verma 71ad22bea0
Replaced StartupCheck with IReadinessCheck (#22867)
## Description

The `StartupCheck` implementation was consumed by Gitrest and Historian.
However, since the r11 packages consumed by Gitrest and Historian are
not updated as frequently as r11s consumed by repos such as FRS, this
dependency caused a breakage in Gitrest in FRS.

As a result, the solution is to make the startupCheck parameter in
resourceFactories more generic: `IReadinessCheck`. This will
prevent any future class dependencies from breaking.

## Breaking Changes

There should be none as this change makes the parameter more 'generic'.
That is, old implementations should still work fine.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-21 19:02:01 -07:00
Alex Villarreal 0ef059c270
refactor(server): Update dependencies to remove `ip` (CVE) (#22860)
## Description

Updates dependencies so we get rid of the transitive dependency on `ip`
which is flagged for a CVE.
2024-10-21 17:18:51 -05:00
Tyler Butler 6095d7f4f2
build: Remove references to deleted readme-command package (#22831)
The readme-command package was deleted some time ago but there were
still some references to it in configs and comments.
2024-10-17 12:36:37 -07:00
yunho-microsoft 4df3b36633
Fix token cache error: invalid expire time (#22761)
Skip token cache if the token is about to expire in 5 minutes.
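
A rough sketch of that rule, assuming a hypothetical helper that returns the cache lifetime in milliseconds, or `undefined` when the token should not be cached. Treating "exactly 5 minutes left" as too close is an assumption.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Returns the remaining lifetime to use as cache TTL, or undefined to skip caching.
function tokenCacheTtlMs(
	expiresAtMs: number,
	nowMs: number,
	safetyWindowMs: number = FIVE_MINUTES_MS,
): number | undefined {
	const remainingMs = expiresAtMs - nowMs;
	// A token this close to expiry would yield a tiny or negative TTL, so skip it.
	if (remainingMs <= safetyWindowMs) return undefined;
	return remainingMs;
}
```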

---------

Co-authored-by: Yunho <yunho-macbookpro2024@DESKTOP-M86HBMH.redmond.corp.microsoft.com>
Co-authored-by: Yunho <yunho-macbookpro2024@Yunhos-MacBook-Pro.local>
Co-authored-by: Yunho <yunho-macbookpro2024@Yunhos-MBP.guest.corp.microsoft.com>
2024-10-17 18:45:06 +00:00
dhr-verma 878582128c
Removed singleton Startup check usage by Historian and Gitrest (#22826)
## Description

This PR follows the r11s PR -
https://github.com/microsoft/FluidFramework/pull/22819 - to remove the
usage of the `StartupCheck` singleton by Historian and Gitrest. This
singleton along with r11 package mismatch caused bugs. Hence, now I pass
the implementation of the startup probe as a resource to the server.

## Breaking Changes

Changes the resourceFactory args of both Gitrest and Historian.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-16 20:34:41 +00:00
dhr-verma c5d7bde895
Removed the singleton implementation of the startup probe (#22819)
## Description

The singleton implementation of `StartupCheck` causes bugs when the r11
packages do not match the historian packages consumed. Hence, I decided
to switch to a non-singleton implementation. This introduces the
`StartupCheck` as an implementation of `IReadinessCheck`. This probe is
a resource provided to all HTTP services in r11.

## Breaking Changes

Changes Resource and Runner objects for Alfred, Riddler and Nexus to
include the `StartupCheck` object.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-15 17:16:03 -07:00
Shubhangi 256bf0899c
Circuit breaker implementation in scriptorium (#22730)
## Description

This PR is to add circuit breaker functionality for scriptorium lambda.
It is to handle the exceptions where service restart is not helpful and
instead, we want to wait and retry again. For example, when mongo db is
unavailable/down, and scriptorium is not able to write ops to the db,
restarting the service doesn't help; instead we would wait and retry
after some time. Circuit Breaker pattern helps in such cases by
maintaining open/closed/halfOpen state.

So in scriptorium, all the calls to db are wrapped by the circuit
breaker, and in case of such errors, the circuit will open and pause the
lambda (i.e. pause the incoming messages). After some time, the circuit
will go to halfOpen state and call a healthCheck function - if it
succeeds, the circuit will close and resume the incoming messages, else
it will stay open and paused.

We can configure various options, like error threshold, reset timeout,
the errors for which we want to engage the circuit breaker, etc. Also if
the circuit is not able to close or resume for some time (configurable),
we will fall back to restarting the service to avoid being in an endless
state of waiting.
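
The open/closed/halfOpen flow described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the scriptorium implementation: the class name, option names, and the synchronous `healthCheck` callback are hypothetical (a real health check against the database would be asynchronous), and time is passed in explicitly to keep the sketch deterministic.

```typescript
type CircuitState = "closed" | "open" | "halfOpen";

interface BreakerOptions {
	errorThreshold: number; // consecutive failures before the circuit opens
	resetTimeoutMs: number; // how long to stay open before probing health
}

class CircuitBreakerSketch {
	private state: CircuitState = "closed";
	private failures = 0;
	private openedAtMs = 0;
	private readonly options: BreakerOptions;

	constructor(options: BreakerOptions) {
		this.options = options;
	}

	getState(): CircuitState {
		return this.state;
	}

	// Record the outcome of one guarded call (for example a DB write).
	recordResult(succeeded: boolean, nowMs: number): void {
		if (succeeded) {
			this.failures = 0;
			return;
		}
		this.failures += 1;
		if (this.state === "closed" && this.failures >= this.options.errorThreshold) {
			this.state = "open"; // the caller would pause incoming messages here
			this.openedAtMs = nowMs;
		}
	}

	// While open, probe health once the reset timeout has elapsed.
	tryReset(healthCheck: () => boolean, nowMs: number): void {
		if (this.state !== "open") return;
		if (nowMs - this.openedAtMs < this.options.resetTimeoutMs) return;
		this.state = "halfOpen";
		if (healthCheck()) {
			this.state = "closed"; // the caller would resume incoming messages here
			this.failures = 0;
		} else {
			this.state = "open"; // stay paused and restart the wait
			this.openedAtMs = nowMs;
		}
	}
}
```

The fallback to a service restart after a configurable period is left out; it would sit on top of this as a separate timer.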

This PR is for scriptorium, and once we validate and roll this out in
production, we will add the same pattern for document lambdas too.

Summary of changes made in this PR: 
- Circuit Breaker Implementation: Adds a circuit breaker pattern to
scriptorium->db calls, with various configuration options for error
thresholds, reset timeouts, and error filters.
- Pause and Resume Methods: Adds pause and resume methods for lambdas,
context, documentContext, partition, partitionManager, kafkaRunner,
rdKafkaConsumer, and lambda to manage message flow during circuit
breaker states.
- Health Check for MongoDB: Adds a health check method to the MongoDB
class and exposes a healthCheck property from the MongoManager class.

## Testing

- [X] Added unit tests for circuit breaker.
- [X] Tested the scriptorium end to end functionality locally by forcing
the db to be unavailable in the local setup.
- [x] Tested in dev cluster by changing mongo db settings to replicate a
networking error.

We will roll this out slowly by testing in each ring.

---------

Co-authored-by: Shubhangi Agarwal <shuagarwal@microsoft.com>
2024-10-15 09:35:40 -07:00
Mark Fields de6928b528
Stop parsing op contents in DeltaManager - runtime will do it (#22750)
A long time ago (5acfef448f) we added
support in ContainerRuntime to parse op contents if it's a string. The
intention was to stop parsing in DeltaManager once that saturated. This
is that long overdue follow-up.
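
For illustration only, the "parse if it's a string" behavior amounts to something like the helper below. The function name is hypothetical and this is not the ContainerRuntime code.

```typescript
// If `contents` arrives as a JSON string, parse it; otherwise pass it through.
function ensureParsedContents(contents: unknown): unknown {
	if (typeof contents === "string" && contents !== "") {
		return JSON.parse(contents);
	}
	return contents;
}
```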

Taking this opportunity to make a few things hopefully clearer in
ContainerRuntime too:

* Highlighting where/how the serialization/deserialization of `contents`
happens
* Highlighting the different treatment/expectations for runtime v.
non-runtime messages during `process` flow

## Deprecations:

Deprecating use of `contents` on the event arg `op` for
`batchBegin`/`batchEnd` events; anyone still reading it is in for a surprise. I added a
changeset for this case.
2024-10-11 23:01:59 +00:00
yunho-microsoft 03d6823692
Improve socket errors for AFR (#22745)
This PR includes:

1. Send back retryAfterMs for draining errors
2. Use Network for token revocation errors; deprecate
TokenRevocationErrors

---------

Co-authored-by: Yunho <yunho-macbookpro2024@Yunhos-MacBook-Pro.local>
Co-authored-by: Yunho <yunho-macbookpro2024@Yunhos-MBP.guest.corp.microsoft.com>
2024-10-10 11:06:02 -07:00
Brandon fffe980734
Session discovery metrics/monitoring (#22681)
## Description

- Add metrics to know where time is being spent during session discovery
- Broken down into two primary pieces: verifyStorageToken and getSession
- GetSession is further broken down into three parts:
checkDocumentExistence, updateExistingSession, and createNewSession
- checkDocumentExistence is the DB call that is made to retrieve the doc
and see if it exists
- updateExistingSession will only happen if the session is not yet
alive/discovered
- createNewSession will only happen if the session is undefined (docs
created before the concept of service sessions)

---------

Co-authored-by: Brandon Diaz <“BrandonLouisDiaz@gmail.com”>
2024-10-10 14:01:31 -04:00
Alex Villarreal 4228a21d96
fix: Update transitive dependencies on `braces` to address CVE (#22768)
## Description

Updates transitive dependencies on `braces` from 3.0.2 to 3.0.3 to
address [CVE-2024-4068](https://nvd.nist.gov/vuln/detail/CVE-2024-4068).

A couple of applications of `flub modify lockfile --dependency braces
--version 3.0.3 --releaseGroup <release group>`, and some manual updates
in packages/release groups that we can't target with `flub`, basically
doing the same thing but manually (add an override in package.json,
install dependencies, remove override, install dependencies again to
clean up override from the lockfile).

In a few cases I got unrelated updates, mostly about node types, which I
reverted manually.

Server packages also got semver update from 7.6.0 to 7.6.3 which seems
fine.
2024-10-09 14:16:38 -05:00
yunho-microsoft 9e1f6bf859
add new error code: TokenRevoked (#22723)
Add a new error code: TokenRevoked to InternalErrorCode enum for driver
to handle token revocation scenario: should refresh token and reconnect.

Co-authored-by: Yunho <yunho-macbookpro2024@Yunhos-MBP.guest.corp.microsoft.com>
2024-10-03 16:19:58 -07:00
dhr-verma ebed7e613c
Added support for health probes to Gitrest and Historian (#22710)
## Description

This PR takes in the r11 changes -
https://github.com/microsoft/FluidFramework/pull/22635 - and adds
support for the `/healthz` endpoints for `Historian` and `Gitrest`.

1. `/healthz/startup`: Startup readiness check endpoint
2. `/healthz/ready`: Service lifecycle readiness check endpoint
3. `/healthz/ping`: Liveness endpoint. This endpoint was not added for
`Historian` as it already has an existing ping endpoint `/repos/ping`

These are needed to support Kubernetes Health Checks.

The readiness endpoint would need a custom implementation of
IReadinessCheck. If this is not provided, the endpoint will not be
created.

## Breaking Changes
Adds customizations to the ResourceFactory and Runners of each of the
services mentioned above. These are used to inject an implementation of
IReadinessCheck.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-02 17:09:23 +00:00
dhr-verma 9d41303ccf
Added support for health probes for all HTTP services in Routerlicious (#22635)
## Description

This PR adds support for the following endpoints for `Riddler, Nexus,
and Alfred`:

1) `/healthz/startup`: Startup readiness check endpoint
2) `/healthz/ready`: Service lifecycle readiness check endpoint
3) `/healthz/ping`: Liveness endpoint. This endpoint was not added for
`Alfred` as it already has an existing ping endpoint `/api/v1/ping`

These are needed to support Kubernetes Health Checks.

The startup endpoint relies on a new singleton class introduced in this
PR - `StartupChecker`. This class returns the `startup` status as
`isReady: true` after the service runner is created.

The readiness endpoint would need a custom implementation of
`IReadinessCheck`. If this is not provided, the endpoint will not be
created.
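
A small sketch of the conditional endpoint creation described above. `ReadinessCheckSketch` is a simplified stand-in (the real `IReadinessCheck` signature is not shown in this log), and the function and option names are hypothetical.

```typescript
// Simplified stand-in for the readiness interface named in the PR.
interface ReadinessCheckSketch {
	isReady(): boolean;
}

interface HealthRouteOptions {
	readinessCheck?: ReadinessCheckSketch; // optional custom implementation
	hasExistingPing: boolean; // true when the service already exposes a ping route
}

// Returns the health endpoints a service would register.
function healthRoutePaths(options: HealthRouteOptions): string[] {
	const paths = ["/healthz/startup"];
	// The readiness endpoint is only created when an implementation is injected.
	if (options.readinessCheck !== undefined) {
		paths.push("/healthz/ready");
	}
	// Services that already have a ping endpoint skip the new liveness route.
	if (!options.hasExistingPing) {
		paths.push("/healthz/ping");
	}
	return paths;
}
```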

To support HTTP endpoints in Nexus, it also adds a request listener to
the HTTP server setup in Nexus.

## Breaking Changes

Adds customizations to the ResourceFactory and Runners of each of the
services mentioned above. These are used to inject an implementation of
`IReadinessCheck`.

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-10-01 15:56:44 -07:00
Alex Villarreal a127c7cebe
refactor(server): Remove deprecated version property from docker-compose files (#22546)
## Description

The version property in docker-compose files is deprecated and only used
for backwards compatibility. When using the docker-compose files in ADO
we get warnings like these:

![image](https://github.com/user-attachments/assets/d9ec81d5-7fc9-4ef7-80a4-8d20502f1e93)

I don't think we support using older versions of docker compose so
removing the optional/deprecated property seems fine.

See
https://docs.docker.com/reference/compose-file/version-and-name/#version-top-level-element-optional
2024-10-01 15:22:25 -05:00
Pradeep Vairamani fbda4c0ad2
Upgrade express and body-parser (#22600)
Upgrades the express and body-parser packages in historian and gitrest
to address
[CVE-2024-45590](https://nvd.nist.gov/vuln/detail/CVE-2024-45590).

Release notes for express
[4.20.0](https://github.com/expressjs/express/releases/tag/4.20.0) and
[4.21.0](https://github.com/expressjs/express/releases/tag/4.21.0)
Release notes for body-parser
[1.20.3](https://github.com/expressjs/body-parser/releases/tag/1.20.3)

Follow up to #22480

Co-authored-by: Pradeep Vairamani <pradeep@Pradeeps-MacBook-Pro-2.local>
2024-09-24 10:24:10 -07:00
Alex Villarreal 0b17d50af8
fix: Have alfred redirect requests to nexus when appropriate in local docker environment (#22535)
## Description

This PR makes it so alfred redirects requests whose path starts with
`/socket.io` to nexus for handling instead of trying to handle them
itself, specifically in the case of a local routerlicious environment
running in docker.

### Context

While trying to run our e2e tests against a local routerlicious
environment running in docker I noticed that some compat tests with
older versions (1.x) were failing consistently, and looking at the
server logs I realized that requests for the delta stream were being
received by alfred, who doesn't handle them anymore since
https://github.com/microsoft/FluidFramework/pull/19227. That PR updated
the kubernetes manifests so requests to alfred's URL where the path
starts with `/socket.io` are actually routed to nexus now. I believe
that was necessary because older versions of the driver would not
understand new settings for the deltaStreamUrl. That makes things work
for an AKS deployment, but we missed doing the same thing for the local
docker environment, which this PR fixes.
2024-09-19 10:53:48 -05:00
Alex Villarreal 8987bc9e76
fix(server): Remove bad comma in server configmap (#22560)
## Description

Removes a trailing comma that results in an invalid JSON config and
kubernetes pods crashing trying to load it. Introduced recently in
https://github.com/microsoft/FluidFramework/pull/22442.
2024-09-18 12:43:35 -05:00
WillieHabi f8d3fed16c
Client support for targeted signals (#22321)
## Description
Client side changes needed to support targeting signals to a specific
client id.

Signals are now sent with v2 signals protocol (`ISentSignalMessage`)

Unnecessary override of `submitSignal` function is removed from
localDocumentDeltaConnection. This is handled in documentDeltaConnection
of base driver

These changes follow the server changes to support targeted signals
#19519

[ADO Task
7026](https://dev.azure.com/fluidframework/internal/_workitems/edit/7026)
2024-09-14 05:00:38 +02:00
Mark Fields 0795a20d22
Some overdue cleanup from prior message layer/refactoring work (#22404)
2024-09-13 17:57:14 +00:00
Tyler Butler 697bb0cc7d
build(server): Upgrade express to 4.21.0 and body-parser to 1.20.3 (#22480)
Upgrades the express and body-parser packages to address
[CVE-2024-45590](https://nvd.nist.gov/vuln/detail/CVE-2024-45590). The
package.json range for body-parser was `"^1.17.1"`, but we were already
resolved to 1.20.2 in our lockfile anyway, so this is really just a
patch bump. The express upgrade is the bigger change.

- Release notes for express
[4.20.0](https://github.com/expressjs/express/releases/tag/4.20.0) and
[4.21.0](https://github.com/expressjs/express/releases/tag/4.21.0)
- Release notes for body-parser
[1.20.3](https://github.com/expressjs/body-parser/releases/tag/1.20.3)
2024-09-12 18:00:56 -07:00
Matt Rakow d9f0c37395
Update webpack-related dependencies (#22447)
2024-09-10 23:11:36 +00:00
Zach Newton fff9bab5a5
server: Use HSCAN for Nexus getAllSessions (#22442)
## Description

During peak traffic hours, the RedisCollaborationSessionManager
introduced in #22381 could potentially return thousands of sessions.
After 1,600 sessions, this exceeds the recommended maximum Redis
response size of 200kb (each session+key is about 172 bytes) for optimal
efficiency.

To improve efficiency, we can use [Redis
HSCAN](https://redis.io/docs/latest/commands/hscan/) to fetch sessions
from Redis in batches. Here, the default number of sessions per batch is
800 (half the maximum) to allow wiggle room for future session
information.
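
The batched fetch can be sketched as a cursor loop. The `ScanFn` type below is a hypothetical synchronous stand-in that mirrors the reply shape of Redis HSCAN (next cursor plus a flat field/value list); a real client call would be asynchronous, and Redis treats the count as a hint rather than an exact page size.

```typescript
// One HSCAN-style call: takes a cursor and a count hint, returns
// [nextCursor, flat list of field, value, field, value, ...].
type ScanFn = (cursor: string, count: number) => [string, string[]];

// Walk the hash in batches until the server returns cursor "0" again.
function scanAllFields(scan: ScanFn, batchSize: number = 800): Map<string, string> {
	const result = new Map<string, string>();
	let cursor = "0";
	do {
		const [nextCursor, flat] = scan(cursor, batchSize);
		for (let i = 0; i + 1 < flat.length; i += 2) {
			const field = flat[i];
			const value = flat[i + 1];
			if (field !== undefined && value !== undefined) {
				result.set(field, value);
			}
		}
		cursor = nextCursor;
	} while (cursor !== "0");
	return result;
}
```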

### Tests

Added some unit tests for the RedisCollaborationSessionManager, and
bumped the `ioredis-mock` version to include stipsan/ioredis-mock#1300.
2024-09-10 19:43:52 +00:00
Zach Newton e892b97f87
server: Use key for docId tenantId instead of nexus session redis fields (#22439)
## Description

It is redundant and a waste of space to store the documentId and
tenantId in redis fields when they are already present in the key.

Improves #22381
2024-09-09 10:45:28 -07:00
Matt Rakow 63aeb13082
Update/remove some deps using old semver (#22420)
Updates:
* `pm2`
* `@changesets/cli`
* `@changesets/types`
* `sass`
* `sass-loader`

Removes
* `typescript-formatter`
2024-09-06 16:21:16 -07:00
Joel Zhu eaad5963c3
Use the net Library for IP Type (#22405)
Use the net library for IP type detection instead of a custom method.
Some IP addresses may not be recognized or printed correctly if you use
your own regular expression method.
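
As a sketch of what that looks like, Node's built-in `net.isIP` returns `4`, `6`, or `0` for a given string. The wrapper name and return labels below are hypothetical.

```typescript
import { isIP } from "node:net";

type IpKind = "IPv4" | "IPv6" | "invalid";

// Classify an address using Node's parser instead of a hand-written regex.
function ipKind(address: string): IpKind {
	switch (isIP(address)) {
		case 4:
			return "IPv4";
		case 6:
			return "IPv6";
		default:
			return "invalid";
	}
}
```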
2024-09-06 16:12:42 -07:00
Zach Newton 9a932a638b
server: add collab session tracking to Nexus lambda (#22381)
## Description

Currently, the only reliable way to track a session in R11s is via
Deli's `SessionResult` metric, which depends on Join/Leave Ops and
Deli's "close" handler. This session tracking does not account for
sessions that only have Reader clients with no Ops.

This PR introduces an optional, alternative method for tracking
collaboration sessions within the Nexus lambda itself, which is able to
account for Read-only sessions.

> **Note:** This is an alternative to #9191 which requires creating
Orderer connections to manage read clients using Deli, as well as
keep-alive pings from the frontend (Nexus in our case). We do not want
to spin up Deli and create Orderer connections for read sessions.

### Solution Design Details

> **Context** The original design attempted to only use information
already available from `IClientManager` to understand active session
information and act accordingly. However, the "currently connected
client list" available via `IClientManager` was insufficient for
handling various multi-instance scenarios such as clients leaving from
separate Nexus instances causing the session to "terminate" too
quickly/twice or a Nexus instance shutting down causing a session end
timer to be lost.

1. **Session Creation (First Client Join):** When a client for a given
document connects to the socket server while no other clients are
connected/active for that document, and the previous session either
never existed or was inactive for more than 10 minutes, the session is
"created/started."
2. **Session Expansion/Continuation (Client Join):** When a client for
a given document connects to the socket server while other clients are
connected/active for that document, or the previous session has been
inactive for less than 10 minutes, the session is updated with
information about that new client, and any existing timers are reset.
3. **Session End (Last Client Leave):** When the only remaining
connected client for a given document disconnects from the socket
server, the session is updated with "last client leave time" and a 10
minute timeout is started.
4. **Session Timeout (Inactive for 10 minutes):** When a session's
inactivity timer expires and there are still no clients in the session
according to the ClientManager, the session is logged as "ended" and
cleaned up.
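
The join and leave rules above can be sketched as pure functions over a session record. The record fields and function names are assumptions for illustration; the actual tracker stores its state in Redis and uses timers, which this sketch replaces with explicit timestamps.

```typescript
const SESSION_IDLE_TIMEOUT_MS = 10 * 60 * 1000;

// Hypothetical session record.
interface SessionSketch {
	firstClientJoinMs: number;
	connectedClients: number;
	lastClientLeaveMs?: number;
}

type JoinOutcome = "created" | "continued";

// Rules 1 and 2: create a session, or continue one that is active or recently idle.
function onClientJoin(
	existing: SessionSketch | undefined,
	nowMs: number,
): { session: SessionSketch; outcome: JoinOutcome } {
	const idleTooLong =
		existing !== undefined &&
		existing.connectedClients === 0 &&
		existing.lastClientLeaveMs !== undefined &&
		nowMs - existing.lastClientLeaveMs > SESSION_IDLE_TIMEOUT_MS;
	if (existing === undefined || idleTooLong) {
		return { session: { firstClientJoinMs: nowMs, connectedClients: 1 }, outcome: "created" };
	}
	// Continuing clears the pending "last client left" marker.
	return {
		session: {
			firstClientJoinMs: existing.firstClientJoinMs,
			connectedClients: existing.connectedClients + 1,
		},
		outcome: "continued",
	};
}

// Rule 3: when the last client leaves, remember when so the timeout can be applied.
function onClientLeave(session: SessionSketch, nowMs: number): SessionSketch {
	const remaining = Math.max(0, session.connectedClients - 1);
	return remaining === 0
		? { ...session, connectedClients: 0, lastClientLeaveMs: nowMs }
		: { ...session, connectedClients: remaining };
}
```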

All of the above "session" information is stored within a Redis HashMap
that allows the list of current sessions to be retrieved and iterated
over, or a single session to be retrieved and updated.

## Breaking Changes

### Firm Input Validation

When the client sends a malformed connect message (i.e. the message does
not contain all expected properties with expected types), Nexus will
emit a `connect_document_error` message with a 400 error code,
indicating malformed user input to the client.

#### Context

Nexus currently makes a lot of type assumptions about the client's
`IConnect` message in the `connect_document` event handler. This can
cause the service to crash due to unhandled TypeErrors at runtime. This
PR introduces strong type checks for the incoming `IConnect` message and
its internal `IClient` details so that Nexus can safely access the
expected properties in that message.
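
A sketch of this kind of runtime check, written as a TypeScript type guard. The message shape below is a hypothetical subset chosen for illustration and is not the real `IConnect` definition.

```typescript
// Hypothetical subset of a connect message.
interface ConnectMessageSketch {
	tenantId: string;
	id: string;
	client: { mode: "read" | "write" };
}

function isRecord(value: unknown): value is Record<string, unknown> {
	return typeof value === "object" && value !== null;
}

// Type guard: verifies every property before the handler touches it.
function isConnectMessageSketch(value: unknown): value is ConnectMessageSketch {
	if (!isRecord(value)) return false;
	if (typeof value["tenantId"] !== "string" || typeof value["id"] !== "string") return false;
	const client = value["client"];
	return isRecord(client) && (client["mode"] === "read" || client["mode"] === "write");
}

// Malformed input yields a 400-style result instead of a runtime TypeError.
function validateConnect(
	value: unknown,
): { ok: true; message: ConnectMessageSketch } | { ok: false; code: 400 } {
	return isConnectMessageSketch(value) ? { ok: true, message: value } : { ok: false, code: 400 };
}
```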

## Reviewer Guidance

- **Main Session Tracking Logic**: server/r11s/packages/services/src
`redisSessionManager.ts` and `sessionTracker.ts`
- **Main Nexus Session Tracking**:
server/r11s/packages/lambdas/src/nexus `connect.ts` and `disconnect.ts`
- There is also a small refactor in `disconnect.ts` to make the
Disconnect handler structure more similar to the Connect handler by
moving the internal loops into their own named functions.
- **Type Validation**: server/r11s/packages/lambdas/src/nexus `index.ts`
and `protocol.ts`

---------

Co-authored-by: Tyler Butler <tyler@tylerbutler.com>
2024-09-06 19:32:08 +00:00
Matt Rakow afe20defdf
Update build-tools versions to latest (0.44.0) (#22407)
This also requires updates to some typetests since the format has
changed.
2024-09-06 11:47:26 -07:00
Alex Villarreal f8505c1a8c
Update axios dependencies (#22388)
## Description

Updates axios dependencies to the latest version (in package.json direct
dependencies and in transitive dependencies in lockfiles) throughout the
repo to address a few CVEs.
2024-09-05 13:23:42 -05:00
zhangxin511 c4870068b5
Add isEphemeralContainer information to session logs (#22284)
## Description

We don't have a good way of associating connect-document metrics with the
`isEphemeralContainer` flag. Get session is the entry point for connecting
to a document, so logging the flag there gives us more accurate information.

## Breaking Changes

N/A

---------

Co-authored-by: Xin Zhang <zhangxin@microsoft.com>
2024-09-04 11:33:27 -04:00
dhr-verma c8e16500ff
Vermadhr/correlation id source tracking (#22292)
## Description

Refactors and changes the prop `correlationIdSource` to `requestSource`
to avoid ambiguity in understanding whether we are tracking request
origin or correlationId origin.
2024-08-22 18:00:29 +00:00
dhr-verma 7fd8c786b3
Added correlationId source tracking (#22280)
## Description

This PR adds telemetry to track the origin of the correlation associated
with an API call by adding a new telemetry prop - `correlationIdSource`.
If the client sends a correlationId in the `x-correlation-id` header or
in the `x-telemetry-header`, then the source is set as
`"correlationIdSource": "client"`. Else the correlationId is generated
by the server and the prop is set as `"correlationIdSource": "server"`.
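
A minimal sketch of that decision, assuming a plain header map and an injected ID generator (both hypothetical). For brevity it only looks at `x-correlation-id`; the telemetry header case mentioned above is omitted.

```typescript
type CorrelationIdSource = "client" | "server";

interface CorrelationInfo {
	correlationId: string;
	correlationIdSource: CorrelationIdSource;
}

// Use the client's ID when present; otherwise generate one and mark it as server-made.
function resolveCorrelationId(
	headers: Record<string, string | undefined>,
	generateId: () => string,
): CorrelationInfo {
	const fromClient = headers["x-correlation-id"];
	if (fromClient !== undefined && fromClient !== "") {
		return { correlationId: fromClient, correlationIdSource: "client" };
	}
	return { correlationId: generateId(), correlationIdSource: "server" };
}
```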

## Breaking Changes

Updates `ITelemetryContextProperties` to include the
`correlationIdSource` property.
2024-08-21 19:16:39 +00:00
kekachmar afe31cfe89
make consumeLoopTimeoutDelay configurable (#22253)
Change to make consumeLoopTimeoutDelay configurable
2024-08-19 13:53:50 -04:00
Zach Newton f461368cf6
server: upgrade server packages in Historian and Gitrest (#22220)
## Description

Upgrading Routerlicious server packages in Gitrest and Historian to pull
in changes from #22109.

Adds `getTelemetryContextProperties` param to each BasicRestWrapper
instantiation
2024-08-16 23:37:11 +00:00
Zach Newton 24ad74864f
server: Switch Ephemeral Container expired to document deleted message (#22217)
## Description

Customers depend on the "Document is deleted..." message, not the error
code. Some of our E2E tests do too. When an EC is considered expired,
just say it's "deleted" to match existing client logic.

Follow-up to move to a better message: [ADO
#12867](https://dev.azure.com/fluidframework/internal/_workitems/edit/12867)
2024-08-15 20:50:28 +00:00
Zach Newton 38194426d6
server: add a couple doc comments (#22213)
## Description

Missed adding some doc comments in #22109
2024-08-14 22:58:28 +00:00
Zach Newton 18b76b29ff
server: telemetry context header (#22109)
## Description

Global TelemetryContext was implemented several major server versions
ago. At the same time, the old `getCorrelationId` and
`bindCorrelationId` method of tracking correlationId was deprecated.
This PR removes usage of those methods, and also adds a new Telemetry
Context header that can be extended to track other information for the
lifetime of an API request.

For the new `x-telemetry-context` header, the old `x-correlation-id`
header will still be respected (for now) if `x-telemetry-context` header
does not contain a `correlationId` property. BasicRestWrapper now takes
in an optional `getTelemetryContextProperties` method, similar to how it
takes a `getCorrelationId` method. This is used to generate
telemetryContext header on outgoing requests from within R11s.
`x-correlation-id` is still generated.
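
The header precedence described above might be sketched as follows. That `x-telemetry-context` carries a JSON object is an assumption made for this example, and the function name is hypothetical.

```typescript
// Prefer correlationId from x-telemetry-context; fall back to x-correlation-id.
function correlationIdFromHeaders(
	headers: Record<string, string | undefined>,
): string | undefined {
	const raw = headers["x-telemetry-context"];
	if (raw !== undefined) {
		try {
			// Assumed encoding: a JSON object such as {"correlationId":"..."}.
			const parsed: unknown = JSON.parse(raw);
			if (typeof parsed === "object" && parsed !== null) {
				const id = (parsed as { correlationId?: unknown }).correlationId;
				if (typeof id === "string" && id !== "") return id;
			}
		} catch {
			// Malformed header: fall through to the legacy header.
		}
	}
	return headers["x-correlation-id"];
}
```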

## Breaking Changes

- `enableGlobalTelemetryContext` config switched to `true` in code. Was
already true in configs.
- `bindCorrelationId` usage was removed from Gitrest, Historian, and
Routerlicious Rest APIs, meaning `getCorrelationId` without
`enableGlobalTelemetryContext: true` will not work anymore.

I'm leaving the old `getCorrelationId` and `bindCorrelationId` methods
in for 1 more release cycle out of an abundance of caution, even though they
have been deprecated for almost a year.
2024-08-14 14:59:13 -07:00
Alex Villarreal e9d1a83787
refactor: Address CredScan warning in server pipelines (#22179)
## Description

This PR fixes the CredScan warnings we were getting in the server
pipelines, before they become a blocker that makes the pipeline runs
fail.

The auto-injected CredScan task in server pipelines was complaining
about things that we had already indicated should be skipped (through
the CredScanSuppressions.json file). Turns out that for docker builds,
the file is expected in the "root context" for the docker build, not at
the root of the repo like it is for some other auto-injected tasks. This
PR makes it so we copy the file to the necessary new location in the
server pipelines.

It also replaces a bunch of fake usernames/passwords in a file's
comments with "PLACEHOLDER" which the CredScan task automatically skips
(pro-tip: don't use "PLACEHOLDER" as your actual password 😄).

Finally, it adds more suppressions for files that are part of test code
in some server dependencies.
2024-08-14 17:35:30 +00:00
Zach Newton c1e343e4c6
server: explicitly reject requests for expired Ephemeral Containers (#22174)
## Description

Currently, we rely on an Ephemeral Container to either 1) be cleaned up
by the Deli lambda on session end, or 2) expire due to DB and Redis TTL
values.

There are inconsistencies in configurations and TTL behaviors regardless
of configs, so we want to explicitly reject access to Ephemeral
containers that are older than a certain time.

This PR causes all Historian requests and Alfred getSession requests to
fail with an explicit `404 - Ephemeral Container Expired: ...` error
when the container was created longer ago than the EphemeralDocumentTTL
config value. It also changes Gitrest's Ephemeral TTL configuration to
use an explicit EphemeralDocumentTTL value for consistency, rather than
an implicit general Redis TTL value.

The defaults for these values are remaining as 24 hours.
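
The age check can be sketched like this. The document fields and function names are hypothetical; only the 24 hour default and the 404 status come from the description above.

```typescript
const DEFAULT_EPHEMERAL_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical document metadata.
interface DocumentSketch {
	isEphemeralContainer: boolean;
	createTimeMs: number;
}

// True when an ephemeral container was created longer ago than the TTL.
function isExpiredEphemeral(
	doc: DocumentSketch,
	nowMs: number,
	ttlMs: number = DEFAULT_EPHEMERAL_TTL_MS,
): boolean {
	return doc.isEphemeralContainer && nowMs - doc.createTimeMs > ttlMs;
}

// Status a request for this document would receive in this sketch.
function statusForDocument(doc: DocumentSketch, nowMs: number): number {
	return isExpiredEphemeral(doc, nowMs) ? 404 : 200;
}
```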
2024-08-12 21:57:20 +00:00
Zach Newton e9614754e9
Make ephemeral Document TTL configurable (#22164)
## Description

Makes the ephemeral container DB TTL added in #19981 configurable.
2024-08-08 23:51:04 +00:00
Tyler Butler 8a0e4190f2
ci: Fix docker pipelines to correctly pack packages (#22072)
This change reverts part of the changes made in #21018. The past changes
inadvertently caused the packages to be published without any built
content. I have verified from test builds that the published packages do
have built content with this change.

In this change, the pack process for docker pipelines is once again run
with a unique shell command that is run in the docker container, and the
package lists are created directly in the pipeline instead of by a
script.

This is unfortunate from a maintenance perspective because it means
there are two slightly different pack paths depending on the pipeline.
That said, this is by far the most straightforward fix.
2024-07-31 19:55:41 -05:00