We must build each appex bundle before the container bundle, so that we can
compute the frameworks each appex the needs before bundling the container app.
Also there's no need to store the list of frameworks appex's need in a file,
since everything is now done in the same mtouch process.
Implement support for sharing both code and resources between app extensions
and their container app:
* AOT-compiled code. Each shared assembly is only AOT-compiled once, and if
the assembly is built to a framework or dynamic library, it will also only
be included once in the final app (as a framework or dynamic library in the
container app, referenced directly by the app extension). If the assemblies
are built to static objects there won't be any size improvements in the app,
but the build will be much faster, because the assemblies will only be AOT-
compiled once.
* Any resources related to managed assemblies (debug files, config files,
satellite assemblies) will be put in the container app only.
Since these improvements are significant, code sharing will be enabled by
default.
Test results
============
For an extreme test project with 7 extensions (embedded-frameworks)[1]:
with code sharing cycle 9 difference
build time 1m 47s 3m 33s -1m 46s = ~50% faster
app size 26 MB 131 MB -105 MB = ~80% smaller
For a more normal test project (MyTabbedApplication)[2] - this is a simple application with 1 extension:
with code sharing cycle 9 difference
build time 0m 44s 0m 48s -4s = ~ 8% faster
app size 23 MB 37 MB -15 MB = ~40% smaller
Another tvOS app with one extension also show similar gains (MyTVApp)[3]:
with code sharing cycle 9 difference
build time 0m 22s 0m 48s -26s = ~54% faster
app size 22 MB 62 MB -40 MB = ~65% smaller
[1]: https://github.com/rolfbjarne/embedded-frameworks
[2]: https://github.com/xamarin/xamarin-macios/tree/cycle9/msbuild/tests/MyTabbedApplication
[3]: https://github.com/xamarin/xamarin-macios/tree/cycle9/msbuild/tests/MyTVApp
Warn if mtouch loads an assembly from a different location than requested
(which might be because there are multiple assemblies with the same name).
Also rework the MT0023 check a bit by explicitly loading the root assembly
first, and then detecting if any loaded assemblies matches the root assembly.
This results in code that's a bit more obvious, and it also works correctly
with extensions (previously the entire MT0023 check was skipped for
extensions).
Allow the assembly build target name for frameworks to end with '.framework',
so that the following:
--assembly-build-target=@sdk=framework=Xamarin.Sdk.framework
doesn't end up creating Xamarin.Sdk.framework.framework.
The previous build system kept a forward-pointing single linked list of tasks
to execute: task X had a list of subsequent tasks to execute. If task X was
up-to-date, it was not created (and the next tasks were directly added to the
list of tasks to execute).
In this world it became complicated to merge output from tasks (for instance
if the output of task X and task Y should be a consumed by a single task
producing a single output, since the corresponding task would end up in both
X's and Y's list of subsequent tasks).
Example: creating a single framework from the aot-compiled output of multiple
assemblies.
So I've reversed the logic: now we keep track of the final output, and then
each task has a list of dependencies that must be built.
This makes it trivial to create merging tasks (for the previous example, there
could for instance be a CreateFrameworkTask, where its dependencies would be
all the corresponding AotTasks).
We also always create every task, and then each task decides when its executed
whether it should do anything or not. This makes it unnecessary to 'forward-
delete' files when creating tasks (say you have three tasks, A, B, C; B
depends on A, and C depends on B; if A's output isn't up-to-date, it has to
delete its own output if it exists, otherwise B would not detect that it would
have to re-execute, because at task *creation* time, B's input hadn't
changed).
Additionally make it based on async/await, since much of the work happens in
externel processes (and we don't need to spin up additional threads just to
run external processes). This makes us have less code run on background
threads, which makes any issues with thread-safety less likely.
Rework the code that copies assemblies and their related files to the app
bundle to take into account that we might be building to frameworks now.
Also strip the assemblies when they're copied (if they must be stripped),
which removes the need for custom logic to copy files related to stripped
assemblies.
Additionally change how we handle duplicated assemblies by checking for
duplication before copying them to the app bundle. This allows us to copy
assemblies to the root directory (not the .monotouch-[32|64] subdirectory) if
the 32-bit and 64-bit versions are identical, which also means we won't need
symlinks anymore.
Use @rpath instead of @executable_path in dylibs, since it allows us to be
more flexible when placing dylibs in the app.
In particular with this change it's trivial to put libmonosgen-2.0.dylib in
the container app, and reference it from extensions.
Performance tests
-----------------
This is for a new watchOS extension project, built for release.
* The default (currently -O2) optimizations: 41s ( baseline ) 30.027.060 bytes ( baseline )
* All optimizations disabled (`--llvm-opt=all=`): 17s (-24s = -59%) 32.978.312 bytes (+2.951.252 = +10%)
* Optimized for size (`--llvm-opt=all=-Os`): 36s ( -5s = -12%) 28.617.408 bytes (-1.409.652 = -5%)
* Optimized for more size (`--llvm-opt=all=-Oz`): 35s ( -6s = -15%) 28.601.016 bytes (-1.426.044 = -5%)
* Optimized slightly (`--llvm-opt=all=-O1`): 35s ( -6s = -15%) 28.666.556 bytes (-1.360.504 = -5%)
* Optimized a lot (`--llvm-opt=all=-O3`): 41s ( 0s = 0%) 30.403.996 bytes (+ 376.936 = +1%)
Conclusions
-----------
* The fastest build by far (less than twice as fast) is if optimizations are
disabled, but this adds a 10% size penalty (~3 MB in this test case),
compared to the baseline, and 15% size penalty (4.3 MB) compared to -Oz.
* -Oz seems to have the best overall results: at least as fast as any other
optimized build, and the smallest app as well.
Caveats
-------
Some optimizations might not work the AOT compiled code. The resulting
binaries have not been tested.
Build extensions and the container app in the same mtouch process, by storing
all the mtouch arguments when called to build extensions in a text file, and
then reloading those arguments when called to build the main app.
This is required if we want to share code between extensions and the
container.
This is a series of fixes to the dynamic libraries we build / create to remove
any unnecessary bloat (unused architectures, bitcode).
A brand new watchOS app with no changes goes from 35MB to 11MB with these
fixes (with incremental builds disabled, the app size is 10MB).
--------------------------------------------------------------------------------
* [runtime] Split list of architectures into simulator and device-specific lists.
* [runtime] Build separate dylibs for device and simulator.
Build separate dylibs for device and simulator, since we already install these
into different locations.
This makes both the simulator and device builds slightly faster (since the
respective dylibs are smaller, and less data to copy around).
For watchOS apps, this saves ~430kb.
* [runtime] Strip bitcode from dylibs. Fixes#51352.
We know that dylibs will never be shipped to the App Store, so we'll never
need them to have bitcode.
So just strip the bitcode from all our dylibs, since this makes apps with
fastdev significantly smaller (and thus much faster to upload to watch
devices).
For watchOS apps this is a very significant improvement: a branch new watchOS
app without any changes goes from 35MB to 17MB.
https://bugzilla.xamarin.com/show_bug.cgi?id=51352
* [mtouch] Fix dylib compilation to not embed full bitcode.
Facts
=====
a. The output from the AOT compiler is an assembly (.s) file.
b. Clang's assembler does not support -fembed-bitcode-marker (only -fembed-
bitcode), so when we ask clang to -fembed-bitcode-marker, the assembler
receives a -fembed-bitcode argument.
c. This means that the assembled object file does not contain the
__LLVM/__bitcode and __LLVM/__cmdline sections (it does however contain an
__LLVM/__asm section).
d. The native linker will create a bitcode assembly section if none of the
object files passed to the linker contain a __LLVM/__bitcode section and
there's an __LLVM/__asm section present.
e. The end result is that when we build to a dylib, we end up (unexpectedly,
because we ask Clang to -fembed-bitcode-marker) including both armv7k and
bitcode in the dylib, thus bloating the dylib size significantly.
Solution
========
We manually add the __LLVM/__bitcode and __LLVM/__cmdline sections to the .s
file Mono's AOT compiler generated. This way the .o file will have the magic
sections, and the linker will not include bitcode (only the bitcode marker) in
the final library.
An empty watchOS extension with incremental builds is now 6MB smaller (down
to 11MB from 17MB).
So that there can be multiple caches in the same process (which we'll have
once mtouch can compile extensions and the container app in the same process).
* [mtouch] Remove most static state and put it on an instance.
Remove most static state, and put it on an instance (of Application) instead.
Soon we'll build multiple apps (app extensions) in the same process, and we
need to make sure we're not inadvertedly (and incorrectly) sharing state
between apps, so remove as much static state as possible to avoid any
problems.
* [mtouch] Rename GetIsUsingClang -> IsUsingClang.
* [mtouch] Always require a SDK version when building.
Technically it was required before too, but the error messages were non-optimal:
it could for instance complain that the user is using an iOS framework that
was introduced in iOS 2.0.
* [mtouch tests] Rewrite MT0060 and MT0061 tests to use MTouchTool.
This makes sure we pass --sdk to mtouch (which MTouchTool does by default), so
that we don't run into MT0025 before the errors we're testing for.
The BundleId property is used by the code that generates the mSYM directory,
but its value was always the default value 'com.yourcompany.sample' instead of
looked up in the app's Info.plist.
So fix the BundleId property to do the expected.
Also fix the mSYM test (SymbolicationData) to actually test mSYM stuff (it was
partially disabled when we disabled automatic mSYM generation for C8, and
never re-enabled), and port it to the new and better test syntax, and add a
few more asserts to check the manifest.xml generation.
Run the partial static registrar separately for 32-bit and 64-bit, since this
is required with the upcoming changes to embed metadata tokens inside the
generated output, because the metadata tokens are different between the 32-bit
and 64-bit versions of Xamarin.iOS.dll.
Also make sure to properly resolve the 32-bit and 64-bit assemblies correctly
(by setting the ArchDirectory on the assembly resolver), so that we don't pick
up the reference assembly (which does not have the right metadata tokens).
Additionally stop running the partial static registrar for MonoTouch.Dialog-1,
since with the upcoming changes to use metadata tokens in the generated
output, we won't support registering anything more than once. This shouldn't
make much of an impact, because MonoTouch.Dialog-1 is fairly small, and
doesn't take long to register in the dynamic registrar. For device builds (or
when the static registrar is selected) this has no effect, since in that case
we're registering everythinga anyway.
If the cache is invalid we print a warning:
> A full rebuild will be performed because the cache is either incomplete or entirely missing.
and set `Driver.Force = true;` (in Application.cs).
This later means that extracting the native code is done on each target:
```
public static bool IsUptodate (string source, string target)
{
if (Driver.Force)
return false;
```
even if this is identical between 32 and 64 bits (targets). That's
inefficient and, for large binding libraries (e.g. > 1GB), has a
noticable impact on build time (see timestamps).
Considering that the cache is cleaned (when detected as invalid) then
this Force condition is not really needed.
E.g. in `IsUptodate`
* the first time (arch) it's called will have to extract the native library;
* if a 2nd arch is built (fat) then it will be found as present and will
not be extracted again
Removing the `Driver.Force = true;` in this condition let the `-f` option
continue to extract it twice, which can be useful in debugging and testing.
As such the check is not removed from `IsUptodate`
Timestamps (before)
Setup: 25 ms
Resolve References: 1605 ms
Extracted native link info: 10465 ms
...
Timestamps (after)
Setup: 24 ms
Resolve References: 1560 ms
Extracted native link info: 5473 ms
...
Total build times (from XS) was around 90-100 seconds so 5 seconds is
about 10%. The actual savings will depend on how much native code needs
to be extracted, but it should help most release builds (almost always
fat builds).
The container app may not reference the same third-party frameworks as
extensions, which means that we must make sure the extension's frameworks are
also included in the app bundle.
So when building extensions save a list of all third-party frameworks, and
then read that list and include those frameworks when building the main app.
https://bugzilla.xamarin.com/show_bug.cgi?id=45800
This makes it possible to set linker flags per assembly:
[assembly: LinkWith (LinkerFlags = "-lsqlite3")]
Which is required when incremental builds is enabled and a particular assembly
needs special linker flags (because we don't propagate the global -gcc_flags
to each dylib we build when doing incremental builds).
Also add an option to set the dlsym mode for an assembly (using the LinkWith
attribute).