replace ILogger with TextWriter and move the implementation to ActorRuntimeLogTextFormatter to unify all logging done inside LogWriter and prepare the way for xml and json versions of this thing.
Related work items: #2125
This PR simplifies* the logic behind the `ControlledTask` implementation to make it easier to add upcoming features and improve the performance.
The PR removes some unnecessary intermediate classes as well as logic to streamline the design (e.g. removed the `RuntimeProvider`, and also various classes related to the interception of controlled tasks during testing -- this is now done much simpler via the `ITaskController`). This simplification also results in some performance improvements as well as much cleaner code inside the `_Runtime` classes. Global accesses to the runtime object from the task classes are also now reduced, which is nicer.
The PR also fixes a bug where the returned controlled task ID was not the expected in some scenarios (now it matches the `Task.Run` behavior), and adds some tests to avoid regressions.
The PR also removes `ControlledLock` which was experimental and not a real .NET API. We can bring back some more robust APIs for exposing the capability to declare controlled locks in a future PR.
*There are more that can be done, but I leave that for after v1 for the benefit of time.
Related work items: #1943
Graph is consolidating events from one state to another so it shows up as one link, but if those are actually different event types it was forgetting that. This fix changes the Graph data structure so it can remember multiple event id's on a single link. The coverage report then uses that to produce the right result. The handling of Monitors was also slightly wrong where the [hot], [cold] state labels were not matching with the coverage report MachineStates info, so this has been changed to a graph label only, rather than being part of the node id.
Also changed things so that you can now do this on the coyote test command line:
```
--explore --coverage activity
```
and you get the combined coverage of all explorations, regardless of whether any bugs were found.
Related work items: #2037
Fix some documentation bugs, typos, and move "state machine" docs into their own top level page so they are more discoverable, also complete the state machine docs by also talking about push and pop.
Add Transition.None and hide the Transition.Type since there is no need to make that stuff public, in fact it just leads to confusion because one starts to hunt for ways to change that field but there is no way. Also added more documentation to Transition class and how to use it properly.
Related work items: #1946
This PR is removes the `pstrace` test output (which is not supported in P# for a long time now, and was dropped from Coyote) and the `BugTrace` data structure that was gathering data for `pstrace`. @<Chris Lovett>, to give you some context that was an early attempt for some trace visualization but we dropped it. Removing these (besides cleaning up unnecessary code) should reduce runtime/memory overhead during testing.
If we bring trace visualization back, we can use Chris' new logging infrastructure for this purpose (e.g. a logger that gathers these data and spits out some file used for visualization).
Related work items: #1149
This PR adds some missing APIs from `ControlledTask`, namely the `Wait` methods.
The PR also addresses a super sneaky bug in the controlled task implementation (and also adds a few new tests to guard against regressions).
A simple repro:
```c#
private class SharedEntry
{
public int Value = 0;
}
private static async ControlledTask WriteAsync(SharedEntry entry, int value)
{
await ControlledTask.Delay(10);
entry.Value = value;
}
private static async ControlledTask InvokeWriteAsync(SharedEntry entry, int value)
{
await WriteAsync(entry, value);
}
[Test]
public static async ControlledTask RunExample()
{
SharedEntry entry = new SharedEntry();
ControlledTask task1 = InvokeWriteAsync(entry, 3);
ControlledTask task2 = InvokeWriteAsync(entry, 5);
await ControlledTask.WhenAll(task1, task2);
Specification.Assert(entry.Value == 5, "Value is {0} instead of 5.", entry.Value);
}
```
The bug causes the assertion to never fail. Same behavior happens with some other combinations e.g. `Yield` instead of `Delay`.
The issue is that in the intermediate (`InvokeWriteAsync`) method (could be more than 1 of them) the "rewriting" that coyote does during systematic testing does not "package" the “after-await-logic” into a continuation and then return to the caller (`RunExample` here). Instead it "blocks" the current task executing until the await completes, which is wrong semantics, and causes the missing interleaving. Although here the assertion is missed, it could potentially lead to a worse scenario, e.g. a deadlock not getting resolved (imagine if `WriteAsync(3)` was waiting on a lock that only `WriteAsync(5)` releases 😊).
The pre-fix logic only did the packaging (which generates a special task under the hood, which is controlled and explored systematically) when a `ControlledTask` was created explicitly (`Run`, `Yield`, `Delay`) but not implicitly as in the above intermediate example, thus causing the issue! The actual logic (and production .NET semantics) is a bit more complicated than my description.
The fix is quite simple conceptually, imagine an async callstack like this:
```
WriteAsync
InvokeWriteAsync
RunExample
```
Coyote should do the packaging when an await happens in any method (which can be anonymous lambda), but not on the “root” method (`RunExample`, in this case). If it happens in the root, then it can hang the scheduler (as there is no caller method to return too). My current fix is to do a limited traversal of the `StackTrace` and differentiate between the cases.
The PR also does some minor related cleanup.
This PR fixes some of the APIs of the `TestingEngineFactory`. Basically after supporting `ControlledTask`, we switched the expected test function pointers to `ControlledTask` instead of `Task` and forgot to update it here. This PR solves this issue.
This PR fixes a reported bug where the seed was not reported when the testing engine run programmatically. This was fixed by simplifying setting the seed (there is no need for it to be nullable, it always has a value now, which is set upon construction of the configuration object).
Fix bug in handling of raised events so the links show up correctly. The problem was OnHandleRaisedEvent was trying to create a link immediately, but we don't yet know the impact of this raised event. For example, it might result in a OnEventGotoState transition, in which case we want the link on the diagram to reflect that. Fortunately the fix was very simple since I had a concept of dequeued events already.
Also added animating raft demo under http://127.0.0.1:4000/coyote/learn/core/demo.
Also added the beginnings of some tutorial content - more of a template really...
This PR changes the `StateMachine` API to make all transitions (`RaiseEvent`, `GotoState`, `PushState`, `PopState` and `Halt`) typed (via the new `StateMachine.Transition` type) towards improving the programmability of this programming model. The same applies to the `Monitor` API. Quite a bit of runtime complexity was also reduced based on the fact that the C# type checker does some of the work for us now.
This basically allows a `StateMachine` to be declared as follows:
```c#
private class M : StateMachine
{
[Start]
[OnEntry(nameof(InitOnEntry))]
private class Init : State
{
}
private Transition InitOnEntry() => this.GotoState<Done>();
private class Done : State
{
}
}
```
Related work items: #1439
This PR changes the API of `Actor` and `Monitor` to allow an event to be passed as in-parameter in an action. The previous `ReceivedEvent` property has been removed.
Related work items: #871
This PR removes the `RaiseEvent` API from `Actor` as it is specific to state-machine logic. It introduces a `Halt` API that is basically syntactic sugar for `RaiseEvent(HaltEvent)` (and is required by `Actor` to halt without having to go through the queue). The PR also optimizes `HaltEvent` by making the constructor private, now can use it via a singleton instance `HaltEvent.Instance` which minimizes allocations.
This PR adds lots of systematic testing tests for the actor type. Also minor update in the `Coyote.nuspec` to point to the website and repository properly.
This PR introduces the `Actor` type, which is the base class for all actor types. `StateMachine` has now become a subclass of `Actor`, and most of the actor-related functionality has moved inside `Actor`, leaving only state machine related functionality in the subclass.
The three `SharedObjects` testing implementations (`SharedCounter`, `SharedDictionary` and `SharedRegister`) converted from `StateMachine` to `Actor`.
There is only a limited amount of new `Actor` tests added for two reasons:
- `StateMachine` is built on top of `Actor`, and most of the non-state-transition related functionality is inside `Actor`, which means that all `StateMachine` tests also test common `Actor` functionality.
- For the interest of time to get this PR merged, more tests will be added in upcoming PRs.
I have not updated the docs, we can do that in another PR.
Also I plan to do some more refactoring and polishing in both `Actor` and `StateMachine` (e.g. `Event` passed as in-param, and `OnInitializeAsync` meaningfully used for `StateMachine` custom initialization), but I leave that for subsequent PRs.
Also did some refactoring in tests, to increase code reuse (`E` and `Unit`, which were the most commonly used events, and do not take a payload, are now moved to a common reusable across tests event `UnitEvent`). More maximizing code between tests in future PRs.
Related work items: #1169, #1174
This fixes a hang in parallel coyote testing on Linux. Seems a socket Listener on Linux doesn't terminate when you call Close, it just hangs. Switching to async Accept solved the problem.
Related work items: #1296
The Graph writer was getting confused about "sender state" and monitor state so the monitor states were showing up in the sender. There was also missing information during halt on what the current state was before the machine was halted, this missing information is gathered when we pop unhandled events, since Halt is always an unhandled event.
This PR renames the following APIs:
- `IActorRuntime.CreateMachine*` becomes `IActorRuntime.CreateStateMachine*`. When we introduce actors we can think if we want to create a common creation for both state machines and actors (as Akash suggested in his email). But for now I want to have it explicit that its creating a state machine.
- `StateMachine.CreateMachine` becomes `StateMachine.CreateStateMachine`. Similar to above.
- `StateMachine.Raise` becomes `StateMachine.RaiseEvent`. Becoming more descriptive.
- `StateMachine.Receive` becomes `StateMachine.ReceiveEventAsync`. Becoming more descriptive and also adds the `Async` suffix to follow best practises for C# `async`/`await` programming..
- `StateMachine.Send` becomes `StateMachine.SendEvent`. Becoming more descriptive, and also matches the corresponding runtime API.
- `Monitor.Raise` becomes `Monitor.RaiseEvent`. Becoming more descriptive.
Related work items: #1168
Cleanup time! I checked all known customer repositories, can't find anyone who is using this. Will eventually be replaced by a much better `Actor` type.
Related work items: #1163
This PR renames `IMachineRuntime` to `IActorRuntime`. It also changes comments to mention actors or state machines (depending on the context). Also renamed `IMachineRuntimeLog` to `IActorRuntimeLog` and `RuntimeLogWriter` to `ActorRuntimeLogWriter`.
Related work items: #1156
This PR renames MachineId (in the code, as well as some comments and in the docs) to ActorId **only**. It should not touch any of the stuff in Chris Machine -> StateMachine PR, so the two should be able to be merged in parallel.
More refactoring in separate PRs.
Related work items: #1157
This is phase 1 of the rename, it is the just the very mechanical change of Machine to StateMachine ONLY. We can do the continuation of this direction in subsequent PR's.
Related work items: #1151, #1155
This PR adds the remaining aggregate wait APIs in controlled tasks, that were missing, and more specifically the `WaitAll` APIs, as well as fixes exception propagation.
Towards simplifying the design, the PR also removes some duplicate APIs and code targeting `Task` (besides `ControlledTask`).
Also adding (and fixed) unit tests for these APIs.
Related work items: #697
This PR renamed the command line tool from `Coyote` to `coyote` to make it much easier to write, and also follow the standard practice for the rest of the .NET ecosystem (e.g. similar to the `dotnet`, `nuget`, `msbuild` tools). Also renamed the appropriate bits in the documentation (hopefully did not miss any).
Related work items: #1107
Remove "colons" from command line options. Merge Tester and replay tool into new "coyote" tool. Also added Positional argument support to CommandLineParser.
```
d:\git\foundry99\Coyote>coyote test ..\CoyoteSamples\MachineExamples\bin\net46\FailureDetector.exe -i 100 -ms 10
. Testing ..\CoyoteSamples\MachineExamples\bin\net46\FailureDetector.exe
Starting TestingProcessScheduler in process 11348
... Created '1' testing task.
... Task 0 is using 'Random' strategy (seed:766).
..... Iteration #1
... Task 0 found a bug.
... Emitting task 0 traces:
..... Writing ..\CoyoteSamples\MachineExamples\bin\net46\Output\FailureDetector.exe\CoyoteTesterOutput\FailureDetector_0_10.txt
..... Writing ..\CoyoteSamples\MachineExamples\bin\net46\Output\FailureDetector.exe\CoyoteTesterOutput\FailureDetector_0_10.pstrace
..... Writing ..\CoyoteSamples\MachineExamples\bin\net46\Output\FailureDetector.exe\CoyoteTesterOutput\FailureDetector_0_10.schedule
... Elapsed 0.1360933 sec.
... Testing statistics:
..... Found 1 bug.
... Scheduling statistics:
..... Explored 1 schedule: 1 fair and 0 unfair.
..... Found 100.00% buggy schedules.
..... Number of scheduling points in fair terminating schedules: 54 (min), 54 (avg), 54 (max).
..... Exceeded the max-steps bound of '10' in 100.00% of the fair schedules.
... Elapsed 0.2428933 sec.
. Done
d:\git\foundry99\Coyote>coyote replay ..\CoyoteSamples\MachineExamples\bin\net46\FailureDetector.exe \git\foundry99\CoyoteSamples\MachineExamples\bin\net46\Output\FailureDetector.exe\CoyoteTesterOutput\FailureDetector_0_10.schedule
. Reproducing trace in ..\CoyoteSamples\MachineExamples\bin\net46\FailureDetector.exe
... Reproduced 1 bug.
... Elapsed 0.1342201 sec.
```
Related work items: #649, #713
This hopefully fixes the nondet issue (causing flaky tests) with halting. There is a very subtle "race condition" (during the serialized testing execution) that caused a "window" where a machine has halted but not yet removed from the set of machines, and an enqueue happening, still enqueuing the event, but not triggering the "halted machine dropped event" check.
Immad gave feedback earlier that the random generator (boolean and integer) APIs should not be part of the `Specification` class. This feedback makes sense as these APIs are indeed orthogonal to the notion of a "specification", and they can be used both in production and mock/test code. For this reason, this PR refactors this bit of logic, and puts them in their own `RandomValueGenerator` class, which exposes these API as static methods.
This also addresses a previous work item, by removing the need to recreate a new `Random` instance each time the corresponding production runtime API is called (this API simply uses the random instance of the generator, created only once or whenever the seed is updated -- which is not thread safe as noted in the comment, if the user chooses to do that, its currently expected to only be done in the very beginning of the application).
This also moves towards simplifying the API surface of state machines, by deprecating the previous Random and RandomInteger APIs of machine, and now asking the user to use this new static methods.
Also changes all tests to this API (we need to change samples too after we finalize and merge).
@<Chris Lovett> @<Akash Lal> please have a look and let me know if you have any suggestions.
Related work items: #758
Improve unhandled exception handling. Print inner exceptions in the unhandled exception handler. SO you now get this nice error message when you use "async void":
```
. Testing D:\git\foundry99\CoyoteSamples\AsyncExamples\bin\net46\FailureDetector.exe
Starting TestingProcessScheduler in process 11736
... Created '1' testing task.
... Task 0 is using 'PCT' strategy (seed:97).
..... Iteration #1
..... Iteration #2
..... Iteration #3
Error: [CoyoteTester] unhandled exception: Microsoft.Coyote.Runtime.ExecutionCanceledException: This can mean you are using ControlledTasks but you have a code path that is not controlled by ControlledTasks (in other somewhere you are still using Task) and it threw an unhandled exception. One known issue that causes this is using 'async void' methods, which is not supported.
..... Iteration #4
at Microsoft.Coyote.TestingServices.Scheduling.OperationScheduler.CheckNoExternalConcurrencyUsed() in D:\git\foundry99\Coyote\Source\TestingServices\Exploration\OperationScheduler.cs:line 431
at Microsoft.Coyote.TestingServices.Runtime.SystematicTestingRuntime.CreateControlledTaskCompletionSource(Task task) in D:\git\foundry99\Coyote\Source\TestingServices\SystematicTestingRuntime.cs:line 679
at Microsoft.Coyote.Threading.Tasks.AsyncControlledTaskMethodBuilder.get_Task() in D:\git\foundry99\Coyote\Source\Core\Threading\Tasks\AsyncControlledTaskMethodBuilder.cs:line 58
at Coyote.Examples.FailureDetector.Timer.StartTimer(Int32 timeout)
at Coyote.Examples.FailureDetector.FailureDetector.<SendPingOnEntry>d__20.MoveNext() in D:\git\foundry99\CoyoteSamples\AsyncExamples\FailureDetector\FailureDetector.cs:line 186
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
```
Related work items: #904
Cleanup command line parsing by adding a CommandLineArgumentParser, similar to what Python provides. This centralizes "datatype" parsing for options with values, and PrintHelp. So each tool is easier to write the command line options. This also removes support for "/" as a switch delimiter because this is problematic character to use on Linux where "/" is also a valid file name. I can easily add that back though if you feel strongly about it. Linux tools also have a standardized methodology for long argument names and short argument names, which I've also implemented here. I also exposed a few new option on COyoteTester, let me know if you want those moved to the hiddenGroup.
Related work items: #713