#!/usr/bin/env python3
"""
This script enables processing of symcryptasm files so that they can be assembled in a variety of
environments without requiring forking or duplication of source files - symcryptasm files phrase
assembly in an assembler and environment agnostic way.
The current target assemblers are:
MASM, GAS, and armasm64 (Arm64 assembler which ships with MSVC)
The current target environments are:
amd64 Windows (using the Microsoft x64 calling convention),
amd64 Linux (using the SystemV amd64 calling convention),
arm64 Windows (using the aapcs64 calling convention),
arm64 Windows (using the arm64ec calling convention), and
arm64 Linux (using the aapcs64 calling convention)
The plan is to rephrase all remaining .asm in SymCrypt as symcryptasm, extending support as
appropriate to enable this effort.
Normally the processing of symcryptasm files takes place in 2 passes. The first pass is performed by
this symcryptasm_processor.py script, which does the more stateful processing, outputting a .cppasm
file. If the processed symcryptasm file includes other files via the INCLUDE directive, the contents
of the included files are merged at their point of inclusion to generate a single expanded symcryptasm
file which is saved with a .symcryptasmexp extension to the output folder. For symcryptasm files which
do not include other files, there's no corresponding .symcryptasmexp file as it would be identical to
the source file.
The .cppasm files are further processed by the C preprocessor to do more simple stateless text
substitutions, outputting a .asm file which can be assembled by the target assembler for the target
environment.
The exception is when using the armasm64 assembler, which uses the C preprocessor before assembling
its inputs already; so the output of this script is directly assembled by armasm64.
We have set up the intermediate generated files to be created in the output directories in both
razzle and CMake builds.

### symcryptasm syntax ###

Different calling conventions pass arguments to functions in different registers, have differing
numbers of volatile and non-volatile registers, and use the stack in different ways.

We define our own register naming scheme which abstracts away the differences between calling
conventions. The generalities of the naming scheme are similar across target architectures, but
refer to the architecture specifics below for details. For the following general information we use
the notation R<n> to denote registers in the symcryptasm register naming scheme.

A leaf function (a function which does not call another function) begins with an invocation of the
FUNCTION_START macro which currently takes 3 mandatory and 2 optional arguments:
    1) The function name
        This must be the name that matches the corresponding declaration of the function
    2) The number of arguments (arg_count) that the function takes
        These arguments will be accessible in some contiguous region of the symcrypt registers at the
        start of the function
        On amd64 this contiguous region is R1..R<arg_count>
        On arm64 this contiguous region is R0..R<arg_count-1>
        Note: arg_count need not correspond to the exact number of arguments in the function declaration
        if the assembly does not use some tail of the arguments
    3) The number of registers (reg_count) that the function uses
        These registers will be accessible as R0..R<reg_count-1>
    4) Stack allocation size (stack_alloc_size) for local variables (amd64 only)
        This parameter is optional; default = 0 if not provided.
        It allows reserving space for local variables on the stack. The value is rounded up to the
        nearest multiple of 8, and the stack pointer is aligned on a 16B boundary. The allocated space
        does not include the space needed to save non-volatile registers or shadow space; it is for
        the function's use only.
        If the function is not nested, rsp will point to the beginning of the buffer after the prologue.
        Otherwise, rsp will point to the shadow space, which is right below the allocated buffer on the
        stack. In this case, the function has to access the local buffer using an offset equal to the
        size of the shadow space (32B).
    5) The number of Xmm registers (xmm_reg_count) that the function uses (amd64 only)
        This parameter is optional and can take values in [0,16]; default = 0 if not provided.
        If a non-zero value is specified, the used registers are assumed to be Xmm0..Xmm<xmm_reg_count-1>
        When xmm_reg_count > 6, the used Xmm registers starting from Xmm6 will be saved on the
        stack in the amd64 Windows environment.

A leaf function ends with the FUNCTION_END macro, which also takes the function name
(a FUNCTION_END macro's function name must match the preceding FUNCTION_START's name).

At the function start a prologue is generated which arranges the arguments appropriately in
registers, and saves non-volatile registers that have been requested to be used.
At the function end an epilogue is generated which restores the non-volatile registers and returns.

A nested function (a function which does call another function) is specified similarly, only using
the NESTED_FUNCTION_START and NESTED_FUNCTION_END macros. A nested function currently updates and
aligns the stack pointer in the function prologue, and avoids use of the redzone in the SystemV ABI.
Nested functions are not currently supported for Arm64.

A macro begins with an invocation of the MACRO_START macro, which takes the macro name and a variable
number of macro argument names. It ends with MACRO_END.
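
As a minimal illustration (a hypothetical function, not one from the library), an amd64 leaf
function taking 2 arguments and using 3 registers could be written as:

    FUNCTION_START(SymCryptExampleAdd, 2, 3)
        mov Q0, Q1
        add Q0, Q2
    FUNCTION_END(SymCryptExampleAdd)

The prologue and epilogue (including the final ret) are generated by the macros.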

### Architecture specifics ###

### amd64 ###
We allow up to 15 registers to be addressed, with the names:
Q0-Q14 (64-bit registers), D0-D14 (32-bit registers), W0-W14 (16-bit registers), and B0-B14 (8-bit
registers)

Xmm0-Xmm5 registers may be used directly in assembly too; in both amd64 calling conventions we
currently support these registers are volatile, so they do not need any special handling. If the
number of used Xmm registers specified in the FUNCTION_START macro is greater than 6, the used
registers among Xmm6..Xmm15 are saved/restored on amd64 Windows. There is no save/restore for
Xmm16-Xmm31 as of now.

On function entry we insert a prologue which ensures:
    Q0 is the result register (the return value of the function, and the low half of a multiplication)
    Q1-Q6 are the first 6 arguments passed to the function

Additionally, there is a special case for functions using mul or mulx instructions, as these
instructions make rdx a special register. Functions using these instructions may address Q0-Q13,
and QH. As rdx is used to pass arguments, its value is moved to another register in the function
prologue. The MUL_FUNCTION_START and MUL_FUNCTION_END macros are used in this case.
We currently do not support nested mul functions, as we have none.

Stack layout for amd64 is as follows. Xmm registers are volatile and not saved on Linux.

          Memory                 Exists if
    |-------------------|
    |                   |
    |   Shadow space    |
    |                   |
    |-------------------|
    |  Return address   |
    |-------------------|
    |   Non-volatile    |
    |  general purpose  |    reg_count > volatile_registers
    |     registers     |
    |-------------------|
    |   Non-volatile    |
    |   Xmm registers   |    xmm_reg_count > 6 and Windows
    |  (Windows only)   |
    |-------------------|---> 16B aligned
    |                   |
    |  Local variables  |    stack_alloc_size > 0
    |                   |
    |-------------------|---> 16B aligned
    |                   |
    |   Shadow space    |    nested
    | (nested function) |
    |-------------------|---> 16B aligned

### arm64 ###
We allow up to 23 registers to be addressed, with the names:
X_0-X_22 (64-bit registers) and W_0-W_22 (32-bit registers)

The v0-v7 ASIMD registers may be used directly in assembly too; in both arm64 calling conventions we
currently support these registers are volatile, so they do not need any special handling.

X_0 is always the result register and the first argument passed to the function.
X_1-X_7 are arguments 2-8 passed to the function.

### arm (32) ###
We allow the registers r0-r12 to be addressed, as r13-r15 are special registers that we cannot use
as general purpose registers.
As r12 is volatile in a leaf function, it should be used in preference to r4 to avoid
spilling/restoring a register.
"""

import re
import types
import logging
import os

class Register:
    """A class to represent registers"""

    def __init__(self, name64, name32, name16=None, name8=None):
        self.name64 = name64
        self.name32 = name32
        self.name16 = name16
        self.name8 = name8

# amd64 registers
AMD64_RAX = Register("rax", "eax", "ax", "al")
AMD64_RBX = Register("rbx", "ebx", "bx", "bl")
AMD64_RCX = Register("rcx", "ecx", "cx", "cl")
AMD64_RDX = Register("rdx", "edx", "dx", "dl")
AMD64_RSI = Register("rsi", "esi", "si", "sil")
AMD64_RDI = Register("rdi", "edi", "di", "dil")
AMD64_RSP = Register("rsp", "esp", "sp", "spl")
AMD64_RBP = Register("rbp", "ebp", "bp", "bpl")
AMD64_R8 = Register( "r8", "r8d", "r8w", "r8b")
AMD64_R9 = Register( "r9", "r9d", "r9w", "r9b")
AMD64_R10 = Register("r10", "r10d", "r10w", "r10b")
AMD64_R11 = Register("r11", "r11d", "r11w", "r11b")
AMD64_R12 = Register("r12", "r12d", "r12w", "r12b")
AMD64_R13 = Register("r13", "r13d", "r13w", "r13b")
AMD64_R14 = Register("r14", "r14d", "r14w", "r14b")
AMD64_R15 = Register("r15", "r15d", "r15w", "r15b")
# arm64 registers
ARM64_R0 = Register( "x0", "w0")
ARM64_R1 = Register( "x1", "w1")
ARM64_R2 = Register( "x2", "w2")
ARM64_R3 = Register( "x3", "w3")
ARM64_R4 = Register( "x4", "w4")
ARM64_R5 = Register( "x5", "w5")
ARM64_R6 = Register( "x6", "w6")
ARM64_R7 = Register( "x7", "w7")
ARM64_R8 = Register( "x8", "w8")
ARM64_R9 = Register( "x9", "w9")
ARM64_R10 = Register("x10", "w10")
ARM64_R11 = Register("x11", "w11")
ARM64_R12 = Register("x12", "w12")
ARM64_R13 = Register("x13", "w13")
ARM64_R14 = Register("x14", "w14")
ARM64_R15 = Register("x15", "w15")
ARM64_R16 = Register("x16", "w16")
ARM64_R17 = Register("x17", "w17")
ARM64_R18 = Register("x18", "w18")
ARM64_R19 = Register("x19", "w19")
ARM64_R20 = Register("x20", "w20")
ARM64_R21 = Register("x21", "w21")
ARM64_R22 = Register("x22", "w22")
ARM64_R23 = Register("x23", "w23")
ARM64_R24 = Register("x24", "w24")
ARM64_R25 = Register("x25", "w25")
ARM64_R26 = Register("x26", "w26")
ARM64_R27 = Register("x27", "w27")
ARM64_R28 = Register("x28", "w28")
ARM64_R29 = Register("x29", "w29") # Frame Pointer
ARM64_R30 = Register("x30", "w30") # Link Register
# arm32 registers
ARM32_R0 = Register(None, "r0")
ARM32_R1 = Register(None, "r1")
ARM32_R2 = Register(None, "r2")
ARM32_R3 = Register(None, "r3")
ARM32_R4 = Register(None, "r4")
ARM32_R5 = Register(None, "r5")
ARM32_R6 = Register(None, "r6")
ARM32_R7 = Register(None, "r7")
ARM32_R8 = Register(None, "r8")
ARM32_R9 = Register(None, "r9")
ARM32_R10 = Register(None, "r10")
ARM32_R11 = Register(None, "r11")
ARM32_R12 = Register(None, "r12")
ARM32_R13 = Register(None, "r13")
ARM32_R14 = Register(None, "r14")
ARM32_R15 = Register(None, "r15")

class CallingConvention:
    """A class to represent calling conventions"""

    def __init__(self, name, architecture, mapping, max_arguments, argument_registers, volatile_registers, gen_prologue_fn, gen_epilogue_fn, gen_get_memslot_offset_fn):
        self.name = name
        self.architecture = architecture
        self.mapping = mapping
        self.max_arguments = max_arguments
        self.argument_registers = argument_registers
        self.volatile_registers = volatile_registers
        self.gen_prologue_fn = types.MethodType(gen_prologue_fn, self)
        self.gen_epilogue_fn = types.MethodType(gen_epilogue_fn, self)
        self.gen_get_memslot_offset_fn = types.MethodType(gen_get_memslot_offset_fn, self)

def get_mul_mapping_from_normal_mapping(mapping, argument_registers):
    """Gets the register mapping used in functions requiring special rdx handling.

    In amd64, when using mul and mulx, rdx is a special register.
    rdx is also used for passing arguments in both Msft and System V calling conventions.
    In asm functions that use mul or mulx, we will explicitly move the argument passed in
    rdx to a different volatile register in the function prologue, and in the function body
    we refer to rdx using (Q|D|W|B)H.
    """
    rdx_index = None
    return_mapping = { 'H': AMD64_RDX }
    for (index, register) in mapping.items():
        if register == AMD64_RDX:
            rdx_index = index
            break
    for (index, register) in mapping.items():
        # preserve argument registers
        if (index <= argument_registers) and (index != rdx_index):
            return_mapping[index] = register
        # replace rdx with the first non-argument register
        if index == argument_registers+1:
            return_mapping[rdx_index] = register
        # shuffle all later registers down to fill the gap
        if index > argument_registers+1:
            return_mapping[index-1] = register
    return return_mapping
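
# A worked example (illustrative, not executed): applying this to MAPPING_AMD64_MSFT with
# argument_registers=4 keeps rax/rcx/r8/r9 at indices 0, 1, 3 and 4, maps 'H' to rdx,
# moves r10 (the first non-argument register) into rdx's old index 2, and shifts r11..r15
# down by one index - so mul functions can address Q0-Q13 plus QH.
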
# Microsoft x64 calling convention
MAPPING_AMD64_MSFT = {
    0: AMD64_RAX,   # Result register / volatile
    1: AMD64_RCX,   # Argument 1 / volatile
    2: AMD64_RDX,   # Argument 2 / volatile
    3: AMD64_R8,    # Argument 3 / volatile
    4: AMD64_R9,    # Argument 4 / volatile
    5: AMD64_R10,   # volatile
    6: AMD64_R11,   # volatile
    7: AMD64_RSI,   # All registers from rsi are non-volatile and need to be saved/restored in epi/prologue
    8: AMD64_RDI,
    9: AMD64_RBP,
    10: AMD64_RBX,
    11: AMD64_R12,
    12: AMD64_R13,
    13: AMD64_R14,
    14: AMD64_R15,
    # currently not mapping rsp
}

def calc_amd64_stack_allocation_sizes(self, reg_count, stack_alloc_size, xmm_reg_count, nested):
    """Calculate the sizes of the different regions on the stack for adjusting the stack pointer
    in the function prologue/epilogue accordingly.

    Given the number of general purpose and Xmm registers used by a function and the amount of
    local buffer space it requires, this function calculates and returns as a tuple:
        1 - reg_save_size : space required to save general purpose registers
        2 - xmm_save_size : space required to save Xmm registers if any, including 16B alignment
            padding if necessary
        3 - stack_alloc_aligned_size : space required to store local variables based on the requested
            buffer size in stack_alloc_size, rounded up to a multiple of 8 and 16B aligned
        4 - shadow_space_allocation_size : space required for the shadow store/home location if the
            function is nested, including 16B alignment padding if necessary
    """
    # Keep track of stack alignment during each step
    # We assume rsp % 16 is either 0 or 8 all the time
    # Initially we have rsp % 16 = 8 because the return address was pushed on a 16B aligned stack
    aligned_on_16B = False

    # Calculate the space needed to save general purpose registers on the stack
    saved_reg_gp = 0 if reg_count <= self.volatile_registers else (reg_count - self.volatile_registers)
    reg_save_size = 8 * saved_reg_gp
    if saved_reg_gp % 2:
        aligned_on_16B = True

    # Calculate the space needed to save Xmm registers on the stack
    saved_reg_xmm = 0 if xmm_reg_count <= 6 else (xmm_reg_count - 6)
    xmm_save_size = 16 * saved_reg_xmm
    if xmm_save_size > 0 and not aligned_on_16B:
        xmm_save_size += 8
        aligned_on_16B = True

    # Calculate the space needed for the local buffer
    # Round the requested buffer size up to a multiple of 8 and align it on a 16B boundary
    stack_alloc_aligned_size = 0
    if stack_alloc_size > 0:
        stack_alloc_qwords = (stack_alloc_size + 7) // 8
        stack_alloc_adjusted_size = 8 * stack_alloc_qwords
        stack_alloc_aligned_size = stack_alloc_adjusted_size
        if aligned_on_16B ^ ((stack_alloc_aligned_size % 16) == 0):
            stack_alloc_aligned_size += 8
        aligned_on_16B = True

    # If we are a nested function, we need to align the stack to 16B, and allocate space for up to 4
    # memory slots not in the redzone. We can use the same logic as on the MSFT x64 side to allocate
    # our own space for 32B of local variables (whereas on the MSFT side, we use this for allocating
    # space for a function we are about to call)
    shadow_space_allocation_size = 32 + 8 * (not aligned_on_16B) if nested else 0

    return reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size
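
# A worked example (illustrative only): with the MSFT convention (volatile_registers=7),
# reg_count=10, stack_alloc_size=40, xmm_reg_count=8 and nested=False this returns:
#   reg_save_size = 24 (3 pushes; the odd count leaves rsp 16B aligned)
#   xmm_save_size = 32 (xmm6/xmm7; no extra padding needed)
#   stack_alloc_aligned_size = 48 (40 rounded up, plus 8 to preserve 16B alignment)
#   shadow_space_allocation_size = 0 (leaf function)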

def gen_prologue_amd64_msft(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup="", nested=False):
    prologue = "\n"

    # Calculate the sizes of the buffers needed for saving registers, the local variable buffer and shadow space.
    # Each of the sections other than the general purpose registers is aligned on a 16B boundary, and some of
    # them may include an 8B padding in their size.
    reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size = calc_amd64_stack_allocation_sizes(
        self, reg_count, stack_alloc_size, xmm_reg_count, nested)

    # Save general purpose registers
    if reg_count > self.volatile_registers:
        prologue += "rex_push_reg Q%s\n" % self.volatile_registers
        for i in range(self.volatile_registers+1, reg_count):
            prologue += "push_reg Q%s\n" % i

    # Allocate space on the stack
    stack_total_size = xmm_save_size + stack_alloc_aligned_size + shadow_space_allocation_size
    if stack_total_size > 0:
        prologue += "alloc_stack %d\n" % stack_total_size

    # Save Xmm registers
    if xmm_save_size > 0:
        for i in range(6, xmm_reg_count):
            prologue += "save_xmm128 xmm%d, %d\n" % (i, shadow_space_allocation_size + stack_alloc_aligned_size + 16 * (i - 6))

    if prologue != "\n":
        prologue += "\nEND_PROLOGUE\n\n"

    prologue += mul_fixup

    # put additional arguments into Q5-Q6 (we do not support more than 6 arguments for now)
    # stack_offset to get the 5th argument is:
    # 32B of shadow space + 8B return address + (8*#pushed registers in prologue) + (16*#saved xmm registers in prologue) + local buffer size + shadow_space_allocation_size
    stack_offset = 32 + 8 + reg_save_size + xmm_save_size + stack_alloc_aligned_size + shadow_space_allocation_size
    for i in range(self.argument_registers+1, arg_count+1):
        prologue += "mov Q%s, [rsp + %d]\n" % (i, stack_offset)
        stack_offset += 8

    return prologue
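
# For example (hypothetical invocation): with the MSFT convention, arg_count=6, reg_count=8,
# no local buffer and no saved Xmm registers, the generated prologue is:
#   rex_push_reg Q7
#   END_PROLOGUE
#   mov Q5, [rsp + 48]
#   mov Q6, [rsp + 56]
# where 48 = 32B shadow space + 8B return address + 8B for the pushed register.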

def gen_prologue_amd64_msft_mul(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_prologue_amd64_msft(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup = "mov Q2, QH\n", nested = False)

def gen_prologue_amd64_msft_nested(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_prologue_amd64_msft(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup = "", nested = True)

def gen_epilogue_amd64_msft(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = False):
    epilogue = "\n"

    reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size = calc_amd64_stack_allocation_sizes(
        self, reg_count, stack_alloc_size, xmm_reg_count, nested)

    # Restore non-volatile Xmm registers
    if xmm_save_size > 0:
        for i in range(6, xmm_reg_count):
            epilogue += "movdqa xmm%d, xmmword ptr [rsp + %d]\n" % (i, shadow_space_allocation_size + stack_alloc_aligned_size + 16 * (i - 6))

    # Restore stack pointer
    stack_total_size = xmm_save_size + stack_alloc_aligned_size + shadow_space_allocation_size
    if stack_total_size > 0:
        epilogue += "add rsp, %d\n" % stack_total_size

    epilogue += "BEGIN_EPILOGUE\n"
    if reg_count > self.volatile_registers:
        for i in reversed(range(self.volatile_registers, reg_count)):
            epilogue += "pop Q%s\n" % i
    epilogue += "ret\n"
    return epilogue

def gen_epilogue_amd64_msft_nested(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_epilogue_amd64_msft(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = True)

def gen_get_memslot_offset_amd64_msft(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = False):
    # only support 4 memory slots for now (in shadow space)
    if(slot >= 4):
        logging.error("symcryptasm currently only supports 4 memory slots! (requested slot%d)" % slot)
        exit(1)
    # 8B for return address + (8*#pushed registers in prologue) + (16*#saved XMM registers) + local buffer size + shadow space
    reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size = calc_amd64_stack_allocation_sizes(
        self, reg_count, stack_alloc_size, xmm_reg_count, nested)
    stack_offset = 8 + reg_save_size + xmm_save_size + stack_alloc_aligned_size + shadow_space_allocation_size
    return "%d /*MEMSLOT%d*/" % (stack_offset+(8*slot), slot)

def gen_get_memslot_offset_amd64_msft_nested(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_get_memslot_offset_amd64_msft(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = True)
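
# For example: in a leaf function with reg_count <= 7, no saved Xmm registers and no local
# buffer, slot0 resolves to "8 /*MEMSLOT0*/" - the first qword of the caller-allocated shadow
# space just above the return address.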

CALLING_CONVENTION_AMD64_MSFT = CallingConvention(
    "msft_x64", "amd64", MAPPING_AMD64_MSFT, 6, 4, 7,
    gen_prologue_amd64_msft, gen_epilogue_amd64_msft, gen_get_memslot_offset_amd64_msft)

CALLING_CONVENTION_AMD64_MSFT_MUL = CallingConvention(
    "msft_x64", "amd64", get_mul_mapping_from_normal_mapping(MAPPING_AMD64_MSFT, 4), 6, 4, 6,
    gen_prologue_amd64_msft_mul, gen_epilogue_amd64_msft, gen_get_memslot_offset_amd64_msft)

CALLING_CONVENTION_AMD64_MSFT_NESTED = CallingConvention(
    "msft_x64", "amd64", MAPPING_AMD64_MSFT, 6, 4, 7,
    gen_prologue_amd64_msft_nested, gen_epilogue_amd64_msft_nested, gen_get_memslot_offset_amd64_msft_nested)

# AMD64 System V calling convention
MAPPING_AMD64_SYSTEMV = {
    0: AMD64_RAX,   # Result register / volatile
    1: AMD64_RDI,   # Argument 1 / volatile
    2: AMD64_RSI,   # Argument 2 / volatile
    3: AMD64_RDX,   # Argument 3 / volatile
    4: AMD64_RCX,   # Argument 4 / volatile
    5: AMD64_R8,    # Argument 5 / volatile
    6: AMD64_R9,    # Argument 6 / volatile
    7: AMD64_R10,   # volatile
    8: AMD64_R11,   # volatile
    9: AMD64_RBX,   # All registers from rbx are non-volatile and need to be saved/restored in epi/prologue
    10: AMD64_RBP,
    11: AMD64_R12,
    12: AMD64_R13,
    13: AMD64_R14,
    14: AMD64_R15
    # currently not mapping rsp
}

def gen_prologue_amd64_systemv(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup = "", nested = False):
    # Calculate the sizes required for each section
    # We need to call with xmm_reg_count=0 to avoid allocation/alignment for saving Xmm registers since they're
    # volatile for this calling convention.
    reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size = calc_amd64_stack_allocation_sizes(
        self, reg_count, stack_alloc_size, 0, nested)

    # push non-volatile registers onto the stack
    prologue = "\n"
    if reg_count > self.volatile_registers:
        for i in range(self.volatile_registers, reg_count):
            prologue += "push Q%s\n" % i

    # update stack pointer if local buffer size is nonzero or shadow space exists
    if stack_alloc_aligned_size + shadow_space_allocation_size > 0:
        prologue += "sub rsp, %s // allocate local buffer, memslot space and align stack\n\n" % str(stack_alloc_aligned_size + shadow_space_allocation_size)

    prologue += mul_fixup

    # do not support more than 6 arguments for now
    # # put additional arguments into Q7-Qn
    # # stack_offset to get the 7th argument is:
    # # 8B for return address
    # stack_offset = 8
    # for i in range(self.argument_registers+1, arg_count+1):
    #     prologue += "mov Q%s, [rsp + %d]\n" % (i, stack_offset)
    #     stack_offset += 8

    return prologue

def gen_prologue_amd64_systemv_mul(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_prologue_amd64_systemv(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup = "mov Q3, QH\n", nested = False)

def gen_prologue_amd64_systemv_nested(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_prologue_amd64_systemv(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, mul_fixup = "", nested = True)

def gen_epilogue_amd64_systemv(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = False):
    epilogue = ""

    # Calculate the sizes required for each section
    # We need to call with xmm_reg_count=0 to avoid allocation/alignment for saving Xmm registers since they're
    # volatile for this calling convention.
    reg_save_size, xmm_save_size, stack_alloc_aligned_size, shadow_space_allocation_size = calc_amd64_stack_allocation_sizes(
        self, reg_count, stack_alloc_size, 0, nested)

    # update stack pointer if local buffer size is nonzero or shadow space exists
    if stack_alloc_aligned_size + shadow_space_allocation_size > 0:
        epilogue += "add rsp, %s // deallocate local buffer, memslot space and align stack\n\n" % str(stack_alloc_aligned_size + shadow_space_allocation_size)

    if reg_count > self.volatile_registers:
        for i in reversed(range(self.volatile_registers, reg_count)):
            epilogue += "pop Q%s\n" % i
    epilogue += "ret\n"
    return epilogue

def gen_epilogue_amd64_systemv_nested(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_epilogue_amd64_systemv(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested=True)

def gen_get_memslot_offset_amd64_systemv(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested = False):
    # only support 4 memory slots for now
    if(slot >= 4):
        logging.error("symcryptasm currently only supports 4 memory slots! (requested slot%d)" % slot)
        exit(1)
    # For leaf functions, use the top of the redzone below the stack pointer
    offset = -8 * (slot+1)
    if nested:
        # For nested functions, use the 32B of memslot space above the stack pointer created in the prologue
        offset = 8*slot
    return "%d /*MEMSLOT%d*/" % (offset, slot)

def gen_get_memslot_offset_amd64_systemv_nested(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    return gen_get_memslot_offset_amd64_systemv(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested=True)
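
# For example: in a leaf function slot0 and slot1 resolve to "-8 /*MEMSLOT0*/" and
# "-16 /*MEMSLOT1*/" (in the redzone below rsp), while in a nested function slot0 resolves
# to "0 /*MEMSLOT0*/" in the space allocated by the prologue.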

CALLING_CONVENTION_AMD64_SYSTEMV = CallingConvention(
    "amd64_systemv", "amd64", MAPPING_AMD64_SYSTEMV, 6, 6, 9,
    gen_prologue_amd64_systemv, gen_epilogue_amd64_systemv, gen_get_memslot_offset_amd64_systemv)

CALLING_CONVENTION_AMD64_SYSTEMV_MUL = CallingConvention(
    "amd64_systemv", "amd64", get_mul_mapping_from_normal_mapping(MAPPING_AMD64_SYSTEMV, 6), 6, 6, 8,
    gen_prologue_amd64_systemv_mul, gen_epilogue_amd64_systemv, gen_get_memslot_offset_amd64_systemv)

CALLING_CONVENTION_AMD64_SYSTEMV_NESTED = CallingConvention(
    "amd64_systemv", "amd64", MAPPING_AMD64_SYSTEMV, 6, 6, 9,
    gen_prologue_amd64_systemv_nested, gen_epilogue_amd64_systemv_nested, gen_get_memslot_offset_amd64_systemv_nested)

# ARM64 calling conventions
MAPPING_ARM64_AAPCS64 = {
    0: ARM64_R0,    # Argument 1 / Result register / volatile
    1: ARM64_R1,    # Argument 2 / volatile
    2: ARM64_R2,    # Argument 3 / volatile
    3: ARM64_R3,    # Argument 4 / volatile
    4: ARM64_R4,    # Argument 5 / volatile
    5: ARM64_R5,    # Argument 6 / volatile
    6: ARM64_R6,    # Argument 7 / volatile
    7: ARM64_R7,    # Argument 8 / volatile
    8: ARM64_R8,    # Indirect result location / volatile
    9: ARM64_R9,    # volatile
    10: ARM64_R10,  # volatile
    11: ARM64_R11,  # volatile
    12: ARM64_R12,  # volatile
    13: ARM64_R13,  # volatile
    14: ARM64_R14,  # volatile
    15: ARM64_R15,  # volatile
    # R16 and R17 are intra-procedure-call temporary registers which may be used by the linker
    # We cannot use these registers for local scratch if we call out to arbitrary procedures, but
    # currently we only have leaf functions in Arm64 symcryptasm.
    16: ARM64_R16,  # IP0 / volatile
    17: ARM64_R17,  # IP1 / volatile
    # R18 is a platform register which has a special meaning in kernel mode - we do not use it
    18: ARM64_R19,  # non-volatile
    19: ARM64_R20,  # non-volatile
    20: ARM64_R21,  # non-volatile
    21: ARM64_R22,  # non-volatile
    22: ARM64_R23,  # non-volatile
    # We could map more registers (R24-R28) but we can only support 23 registers for ARM64EC, and we
    # don't use this many registers in any symcryptasm yet
}

MAPPING_ARM64_ARM64ECMSFT = {
    0: ARM64_R0,    # Argument 1 / Result register / volatile
    1: ARM64_R1,    # Argument 2 / volatile
    2: ARM64_R2,    # Argument 3 / volatile
    3: ARM64_R3,    # Argument 4 / volatile
    4: ARM64_R4,    # Argument 5 / volatile
    5: ARM64_R5,    # Argument 6 / volatile
    6: ARM64_R6,    # Argument 7 / volatile
    7: ARM64_R7,    # Argument 8 / volatile
    8: ARM64_R8,    # Indirect result location / volatile
    9: ARM64_R9,    # volatile
    10: ARM64_R10,  # volatile
    11: ARM64_R11,  # volatile
    12: ARM64_R12,  # volatile
    # R13 and R14 are reserved in ARM64EC
    13: ARM64_R15,  # volatile
    14: ARM64_R16,  # volatile
    15: ARM64_R17,  # volatile
    16: ARM64_R19,  # non-volatile
    17: ARM64_R20,  # non-volatile
    18: ARM64_R21,  # non-volatile
    19: ARM64_R22,  # non-volatile
    # R23 and R24 are reserved in ARM64EC
    20: ARM64_R25,  # non-volatile
    21: ARM64_R26,  # non-volatile
    22: ARM64_R27,  # non-volatile
    # R28 is reserved in ARM64EC
}

# ARM32 calling convention
# A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6).
MAPPING_ARM32_AAPCS32 = {
    0: ARM32_R0,    # Argument 1 / Result register / volatile
    1: ARM32_R1,    # Argument 2 / Result register / volatile
    2: ARM32_R2,    # Argument 3 / volatile
    3: ARM32_R3,    # Argument 4 / volatile
    4: ARM32_R4,    # non-volatile
    5: ARM32_R5,    # non-volatile
    6: ARM32_R6,    # non-volatile
    7: ARM32_R7,    # non-volatile
    8: ARM32_R8,    # non-volatile
    9: ARM32_R9,    # platform register (v6/SB/TR in some PCS variants) - treated as reserved
    10: ARM32_R10,  # non-volatile
    11: ARM32_R11,  # FP non-volatile
    12: ARM32_R12,  # volatile for leaf functions
    13: ARM32_R13,  # SP
    14: ARM32_R14,  # LR
    15: ARM32_R15,  # PC
}

def gen_prologue_aapcs32(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    assert(not stack_alloc_size and not xmm_reg_count)
    prologue = ""

    # Always spill at least 1 register (LR).
    # LR needs to be saved for nested functions but for now we'll store it always
    # since we don't differentiate between nested and leaf functions for arm yet.
    registers_to_spill = []
    logging.info(f"prologue {reg_count} > {self.volatile_registers}")
    if reg_count > self.volatile_registers:
        for i in range(self.volatile_registers, reg_count):
            registers_to_spill.append('r%s' % i)
    # Stack pointer is word 4B aligned
    # required_stack_space = 4 * len(registers_to_spill)
    registers_to_spill.append('lr')
    prologue += "push {" + ",".join(registers_to_spill) + "}\n"
    return prologue

def gen_epilogue_aapcs32(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    assert(not stack_alloc_size and not xmm_reg_count)
    epilogue = ""

    registers_to_spill = []
    logging.info(f"epilogue {reg_count} > {self.volatile_registers}")
    if reg_count > self.volatile_registers:
        for i in range(self.volatile_registers, reg_count):
            registers_to_spill.append('r%s' % i)
    # Stack pointer is word 4B aligned
    # required_stack_space = 4 * len(registers_to_spill)
    registers_to_spill.append('pc')
    epilogue += "pop {" + ",".join(registers_to_spill) + "}\n"
    return epilogue

def gen_prologue_arm64_armasm64(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    prologue = ""
    if reg_count > self.volatile_registers:
        # Calculate required stack space
        # If we allocate stack space we must spill fp and lr, so we always spill at least 2 registers
        registers_to_spill = 2 + reg_count - self.volatile_registers
        # The stack pointer must remain 16B aligned, so round up to the nearest multiple of 16B
        required_stack_space = 16 * ((registers_to_spill + 1) // 2)
        prologue += " PROLOG_SAVE_REG_PAIR fp, lr, #-%d! // allocate %d bytes of stack; store FP/LR\n" % (required_stack_space, required_stack_space)

        stack_offset = 16
        for i in range(self.volatile_registers, reg_count-1, 2):
            prologue += " PROLOG_SAVE_REG_PAIR X_%d, X_%d, #%d\n" % (i, i+1, stack_offset)
            stack_offset += 16
        if registers_to_spill % 2 == 1:
            prologue += " PROLOG_SAVE_REG X_%d, #%d\n" % (reg_count-1, stack_offset)
    return prologue
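
# For example (hypothetical counts): with aapcs64 (volatile_registers=18) and reg_count=21,
# registers_to_spill = 5 and required_stack_space = 48: fp/lr are stored at [sp],
# X_18/X_19 at [sp, #16], and X_20 alone at [sp, #32].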

def gen_epilogue_arm64_armasm64(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    epilogue = ""
    if reg_count > self.volatile_registers:
        # Calculate required stack space
        # If we allocate stack space we must spill fp and lr, so we always spill at least 2 registers
        registers_to_spill = 2 + reg_count - self.volatile_registers
        # The stack pointer must remain 16B aligned, so round up to the nearest multiple of 16B
        required_stack_space = 16 * ((registers_to_spill + 1) // 2)

        stack_offset = required_stack_space-16
        if registers_to_spill % 2 == 1:
            epilogue += " EPILOG_RESTORE_REG X_%d, #%d\n" % (reg_count-1, stack_offset)
            stack_offset -= 16
        for i in reversed(range(self.volatile_registers, reg_count-1, 2)):
            epilogue += " EPILOG_RESTORE_REG_PAIR X_%d, X_%d, #%d\n" % (i, i+1, stack_offset)
            stack_offset -= 16
        epilogue += " EPILOG_RESTORE_REG_PAIR fp, lr, #%d! // deallocate %d bytes of stack; restore FP/LR\n" % (required_stack_space, required_stack_space)
        epilogue += " EPILOG_RETURN\n"
    else:
        epilogue += " ret\n"
    return epilogue

def gen_prologue_arm64_gas(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    prologue = ""
    if reg_count > self.volatile_registers:
        # Calculate required stack space
        # If we allocate stack space we must spill fp and lr, so we always spill at least 2 registers
        registers_to_spill = 2 + reg_count - self.volatile_registers
        # The stack pointer must remain 16B aligned, so round up to the nearest multiple of 16B
        required_stack_space = 16 * ((registers_to_spill + 1) // 2)
        prologue += " stp fp, lr, [sp, #-%d]! // allocate %d bytes of stack; store FP/LR\n" % (required_stack_space, required_stack_space)

        stack_offset = 16
        for i in range(self.volatile_registers, reg_count-1, 2):
            prologue += " stp X_%d, X_%d, [sp, #%d]\n" % (i, i+1, stack_offset)
            stack_offset += 16
        if registers_to_spill % 2 == 1:
            prologue += " str X_%d, [sp, #%d]\n" % (reg_count-1, stack_offset)
    return prologue

def gen_epilogue_arm64_gas(self, arg_count, reg_count, stack_alloc_size, xmm_reg_count):
    epilogue = ""
    if reg_count > self.volatile_registers:
        # Calculate required stack space
        # If we allocate stack space we must spill fp and lr, so we always spill at least 2 registers
        registers_to_spill = 2 + reg_count - self.volatile_registers
        # The stack pointer must remain 16B aligned, so round up to the nearest multiple of 16B
        required_stack_space = 16 * ((registers_to_spill + 1) // 2)

        stack_offset = required_stack_space-16
        if registers_to_spill % 2 == 1:
            epilogue += " ldr X_%d, [sp, #%d]\n" % (reg_count-1, stack_offset)
            stack_offset -= 16
        for i in reversed(range(self.volatile_registers, reg_count-1, 2)):
            epilogue += " ldp X_%d, X_%d, [sp, #%d]\n" % (i, i+1, stack_offset)
            stack_offset -= 16
        epilogue += " ldp fp, lr, [sp], #%d // deallocate %d bytes of stack; restore FP/LR\n" % (required_stack_space, required_stack_space)
    epilogue += " ret\n"
    return epilogue

def gen_get_memslot_offset_arm64(self, slot, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested=False):
    logging.error("symcryptasm currently does not support memory slots for arm64!")
    exit(1)

CALLING_CONVENTION_ARM64_AAPCS64_ARMASM64 = CallingConvention(
    "arm64_aapcs64", "arm64", MAPPING_ARM64_AAPCS64, 8, 8, 18,
    gen_prologue_arm64_armasm64, gen_epilogue_arm64_armasm64, gen_get_memslot_offset_arm64)

CALLING_CONVENTION_ARM64_AAPCS64_GAS = CallingConvention(
    "arm64_aapcs64", "arm64", MAPPING_ARM64_AAPCS64, 8, 8, 18,
    gen_prologue_arm64_gas, gen_epilogue_arm64_gas, gen_get_memslot_offset_arm64)

CALLING_CONVENTION_ARM64EC_MSFT = CallingConvention(
    "arm64ec_msft", "arm64", MAPPING_ARM64_ARM64ECMSFT, 8, 8, 16,
    gen_prologue_arm64_armasm64, gen_epilogue_arm64_armasm64, gen_get_memslot_offset_arm64)

CALLING_CONVENTION_ARM32_AAPCS32 = CallingConvention(
    "arm32_aapcs32", "arm32", MAPPING_ARM32_AAPCS32, 4, 4, 4,
    gen_prologue_aapcs32, gen_epilogue_aapcs32, gen_get_memslot_offset_arm64)

def gen_function_defines(architecture, mapping, arg_count, reg_count, start=True):
    defines = ""
    if architecture == "amd64":
        prefix64 = "Q"
        prefix32 = "D"
        prefix16 = "W"
        prefix8 = "B"
    elif architecture == "arm64":
        prefix64 = "X_"
        prefix32 = "W_"
    elif architecture == "arm32":
        return defines
    else:
        logging.error("Unhandled architecture (%s) in gen_function_defines" % architecture)
        exit(1)

    for (index, reg) in mapping.items():
        if (index != 'H') and (index >= max(arg_count+1, reg_count)):
            continue
        if start:
            if (reg.name64 is not None):
                defines += "#define %s%s %s\n" % (prefix64, index, reg.name64)
            if (reg.name32 is not None):
                defines += "#define %s%s %s\n" % (prefix32, index, reg.name32)
            if (reg.name16 is not None):
                defines += "#define %s%s %s\n" % (prefix16, index, reg.name16)
            if (reg.name8 is not None):
                defines += "#define %s%s %s\n" % (prefix8, index, reg.name8)
        else:
            if (reg.name64 is not None):
                defines += "#undef %s%s\n" % (prefix64, index)
            if (reg.name32 is not None):
                defines += "#undef %s%s\n" % (prefix32, index)
            if (reg.name16 is not None):
                defines += "#undef %s%s\n" % (prefix16, index)
            if (reg.name8 is not None):
                defines += "#undef %s%s\n" % (prefix8, index)
    return defines

def gen_function_start_defines(architecture, mapping, arg_count, reg_count):
    return gen_function_defines(architecture, mapping, arg_count, reg_count, start=True)

def gen_function_end_defines(architecture, mapping, arg_count, reg_count):
    return gen_function_defines(architecture, mapping, arg_count, reg_count, start=False)

def replace_identifier(arg, newarg, line):
    """Replaces all instances of identifier arg with newarg in string line."""
    argmatch = r"(^|[^a-zA-Z0-9_])" + arg + r"(?![a-zA-Z0-9_])"
    line = re.sub(argmatch, r"\1" + newarg, line)
    return line
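
# For example: replace_identifier("a", "$a", "add a, a2") returns "add $a, a2";
# the lookahead prevents the "a" inside "a2" from being rewritten.
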
MASM_FRAMELESS_FUNCTION_ENTRY = "LEAF_ENTRY %s"
MASM_FRAMELESS_FUNCTION_END = "LEAF_END %s"
MASM_FRAME_FUNCTION_ENTRY = "NESTED_ENTRY %s"
MASM_FRAME_FUNCTION_END = "NESTED_END %s"
# MASM function macros take the text area as an argument
MASM_FUNCTION_TEMPLATE = "%s, _TEXT\n"
# ARMASM64 function macros must be correctly indented
ARMASM64_FUNCTION_TEMPLATE = " %s\n"
GAS_FUNCTION_ENTRY = "%s: .global %s\n.type %s, %%function\n// .func %s\n"
GAS_FUNCTION_END = "// .endfunc // %s"

def generate_prologue(assembler, calling_convention, function_name, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested):
    function_entry = None
    if assembler in ["masm", "armasm64"]:
        # need to identify and mark up frame functions in masm and armasm64
        # for masm we also consider Xmm register and local buffer use (stack_alloc_size)
        if assembler == "masm" and (nested or (reg_count > calling_convention.volatile_registers) or (xmm_reg_count > 6) or (stack_alloc_size > 0)):
            function_entry = MASM_FRAME_FUNCTION_ENTRY % (function_name)
        elif nested or (reg_count > calling_convention.volatile_registers):
            function_entry = MASM_FRAME_FUNCTION_ENTRY % (function_name)
        else:
            function_entry = MASM_FRAMELESS_FUNCTION_ENTRY % (function_name)

        if assembler == "masm":
            function_entry = MASM_FUNCTION_TEMPLATE % function_entry
        elif assembler == "armasm64":
            function_entry = ARMASM64_FUNCTION_TEMPLATE % function_entry
    elif assembler == "gas":
        function_entry = GAS_FUNCTION_ENTRY % (function_name, function_name, function_name, function_name)
    else:
        logging.error("Unhandled assembler (%s) in generate_prologue" % assembler)
        exit(1)

    prologue = gen_function_start_defines(calling_convention.architecture, calling_convention.mapping, arg_count, reg_count)
    prologue += "%s" % (function_entry)
    prologue += calling_convention.gen_prologue_fn(arg_count, reg_count, stack_alloc_size, xmm_reg_count)
    return prologue

def generate_epilogue(assembler, calling_convention, function_name, arg_count, reg_count, stack_alloc_size, xmm_reg_count, nested):
    function_end = None
    if assembler in ["masm", "armasm64"]:
        # need to identify and mark up frame functions in masm
        # for masm we also consider Xmm register and local buffer use (stack_alloc_size)
        if assembler == "masm" and (nested or (reg_count > calling_convention.volatile_registers) or (xmm_reg_count > 6) or (stack_alloc_size > 0)):
            function_end = MASM_FRAME_FUNCTION_END % (function_name)
        elif nested or (reg_count > calling_convention.volatile_registers):
            function_end = MASM_FRAME_FUNCTION_END % (function_name)
        else:
            function_end = MASM_FRAMELESS_FUNCTION_END % (function_name)

        if assembler == "masm":
            function_end = MASM_FUNCTION_TEMPLATE % function_end
        elif assembler == "armasm64":
            function_end = ARMASM64_FUNCTION_TEMPLATE % function_end
    elif assembler == "gas":
        function_end = GAS_FUNCTION_END % function_name
    else:
        logging.error("Unhandled assembler (%s) in generate_epilogue" % assembler)
        exit(1)

    epilogue = calling_convention.gen_epilogue_fn(arg_count, reg_count, stack_alloc_size, xmm_reg_count)
    epilogue += "%s" % (function_end)
    epilogue += gen_function_end_defines(calling_convention.architecture, calling_convention.mapping, arg_count, reg_count)
    return epilogue

MASM_MACRO_START = "%s MACRO %s\n"
MASM_MACRO_END = "ENDM\n"
ARMASM64_MACRO_START= " MACRO\n %s %s"
ARMASM64_MACRO_END = " MEND\n"
GAS_MACRO_START = ".macro %s %s\n"
GAS_MACRO_END = ".endm\n"
MASM_ALTERNATE_ENTRY= "ALTERNATE_ENTRY %s\n"
GAS_ALTERNATE_ENTRY = "%s: .global %s\n"
ARMASM64_ALTERNATE_ENTRY= " ALTERNATE_ENTRY %s\n"
FUNCTION_START_PATTERN = re.compile(r"\s*(NESTED_)?(MUL_)?FUNCTION_START\s*\(\s*([a-zA-Z0-9_\(\)]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*(,\s*[0-9\*\+\-]+)?\s*(,\s*[0-9]+)?\s*\)")
FUNCTION_END_PATTERN = re.compile(r"\s*(NESTED_)?(MUL_)?FUNCTION_END\s*\(\s*([a-zA-Z0-9_\(\)]+)\s*\)")
GET_MEMSLOT_PATTERN = re.compile(r"GET_MEMSLOT_OFFSET\s*\(\s*slot([0-9]+)\s*\)")
ALTERNATE_ENTRY_PATTERN = re.compile(r"\s*ALTERNATE_ENTRY\s*\(\s*([a-zA-Z0-9_\(\)]+)\s*\)")
MACRO_START_PATTERN = re.compile(r"\s*MACRO_START\s*\(\s*([A-Z_0-9]+)\s*,([^\)]+)\)")
MACRO_END_PATTERN = re.compile(r"\s*MACRO_END\s*\(\s*\)")
INCLUDE_PATTERN = re.compile(r"\s*INCLUDE\s*\(\s*([^\s]+)\s*\)")

class ProcessingStateMachine:
    """A class to hold the state when processing a file and handle files line by line"""

    def __init__(self, assembler, normal_calling_convention, mul_calling_convention, nested_calling_convention):
        self.assembler = assembler
        self.normal_calling_convention = normal_calling_convention
        self.mul_calling_convention = mul_calling_convention
        self.nested_calling_convention = nested_calling_convention

        self.function_start_match = None
        self.function_start_line = 0
        self.is_nested_function = None
        self.is_mul_function = None
        self.calling_convention = None
        self.function_name = None
        self.arg_count = None
        self.reg_count = None
        self.stack_alloc_size = None
        self.xmm_reg_count = None

        self.macro_start_match = None
        self.macro_name = None
        self.macro_args = None

    def process_line(self, line, line_num):
        if self.function_start_match == None and self.macro_start_match == None:
            return self.process_normal_line(line, line_num)
        elif self.function_start_match != None:
            return self.process_function_line(line, line_num)
        elif self.macro_start_match != None:
            return self.process_macro_line(line, line_num)
        else:
            logging.error("Whoops, something is broken with the state machine (failed at line %d)" % line_num)
            exit(1)

    def process_normal_line(self, line, line_num):
        # Not currently in a function or macro
        match = FUNCTION_START_PATTERN.match(line)
        if (match):
            return self.process_start_function(match, line, line_num)

        match = MACRO_START_PATTERN.match(line)
        if (match):
            return self.process_start_macro(match, line, line_num)

        # Not starting a function or a macro
        return line

    def process_start_function(self, match, line, line_num):
        # Entering a new function
        self.function_start_match = match
        self.function_start_line = line_num
        self.is_nested_function = (match.group(1) == "NESTED_")
        self.is_mul_function = (match.group(2) == "MUL_")
        self.function_name = match.group(3)
        self.arg_count = int(match.group(4))
        self.reg_count = int(match.group(5))
        # last two parameters are optional and their corresponding capturing groups will be None if not supplied
        self.stack_alloc_size = 0 if not match.group(6) else eval(match.group(6).strip(", \'\""))
        self.xmm_reg_count = 0 if not match.group(7) else int(match.group(7).strip(", "))

        if self.is_nested_function and self.nested_calling_convention is None:
            logging.error(
                "symcryptasm nested functions are not currently supported with assembler (%s) and architecture (%s)!\n\t"
                "%s (line %d)"
                % (self.assembler, self.normal_calling_convention.architecture, line, line_num))
            exit(1)
        if self.is_mul_function and self.mul_calling_convention is None:
            logging.error(
                "symcryptasm mul functions are not supported with assembler (%s) and architecture (%s)!\n\t"
                "%s (line %d)"
                % (self.assembler, self.normal_calling_convention.architecture, line, line_num))
            exit(1)
        if self.is_nested_function and self.is_mul_function:
            logging.error(
                "Too many prefixes for symcryptasm function - currently at most one prefix (MUL_ or NESTED_) is supported!\n\t"
                "%s (line %d)"
                % (line, line_num))
            exit(1)
        if self.arg_count > self.normal_calling_convention.max_arguments:
            logging.error(
                "Too many (%d) arguments for symcryptasm function - only %d arguments are supported by calling convention (%s)\n\t"
                "%s (line %d)"
                % (self.arg_count, self.normal_calling_convention.max_arguments, self.normal_calling_convention.name, match.group(0), line_num))
            exit(1)
        if self.reg_count > len(self.normal_calling_convention.mapping):
            logging.error(
                "Too many (%d) registers required for symcryptasm function - only %d registers are mapped by calling convention (%s)\n\t"
                "%s (line %d)"
                % (self.reg_count, len(self.normal_calling_convention.mapping), self.normal_calling_convention.name, match.group(0), line_num))
            exit(1)
        if self.is_mul_function and self.reg_count > len(self.mul_calling_convention.mapping)-1:
            logging.error(
                "Too many (%d) registers required for symcryptasm mul function - only %d registers are mapped by calling convention (%s)\n\t"
                "%s (line %d)"
                % (self.reg_count, len(self.mul_calling_convention.mapping)-1, self.mul_calling_convention.name, match.group(0), line_num))
            exit(1)
        if not 0 <= self.xmm_reg_count <= 16:
            logging.error(
                "Invalid number of used XMM registers (%d) specified for calling convention (%s) - must be in range [0,16]\n\t"
                "%s (line %d)"
                % (self.xmm_reg_count, self.normal_calling_convention.name, match.group(0), line_num))
            exit(1)

        logging.info("%d: function start %s, %d, %d" % (line_num, self.function_name, self.arg_count, self.reg_count))

        if self.is_nested_function:
            self.calling_convention = self.nested_calling_convention
        elif self.is_mul_function:
            self.calling_convention = self.mul_calling_convention
        else:
            self.calling_convention = self.normal_calling_convention

        return generate_prologue(self.assembler,
            self.calling_convention,
            self.function_name,
            self.arg_count,
            self.reg_count,
            self.stack_alloc_size,
            self.xmm_reg_count,
            self.is_nested_function
        )

    def process_start_macro(self, match, line, line_num):
        self.macro_start_match = match
        self.macro_name = match.group(1)
        self.macro_args = [ x.strip() for x in match.group(2).split(",") ]
        logging.info("%d: macro start %s, %s" % (line_num, self.macro_name, self.macro_args))

        if self.assembler == "masm":
            return MASM_MACRO_START % (self.macro_name, match.group(2))
        elif self.assembler == "gas":
            return GAS_MACRO_START % (self.macro_name, match.group(2))
        elif self.assembler == "armasm64":
            # In armasm64 we need to escape all macro arguments with $
            prefixed_args = ", $".join(self.macro_args)
            if prefixed_args:
                prefixed_args = "$" + prefixed_args
            return ARMASM64_MACRO_START % (self.macro_name, prefixed_args)
        else:
            logging.error("Unhandled assembler (%s) in process_start_macro" % self.assembler)
            exit(1)
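
    # For example (hypothetical macro): MACRO_START(MY_MACRO, a, b) becomes roughly
    # "MY_MACRO MACRO a, b" for MASM, ".macro MY_MACRO a, b" for GAS, and
    # " MACRO\n MY_MACRO $a, $b" for armasm64, where the arguments are escaped with $.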

    def process_function_line(self, line, line_num):
        # Currently in a function
        match = ALTERNATE_ENTRY_PATTERN.match(line)
        if (match):
            if self.assembler == "masm":
                return MASM_ALTERNATE_ENTRY % match.group(1)
            elif self.assembler == "gas":
                return GAS_ALTERNATE_ENTRY % (match.group(1), match.group(1))
            elif self.assembler == "armasm64":
                return ARMASM64_ALTERNATE_ENTRY % match.group(1)
            else:
                logging.error("Unhandled assembler (%s) in process_function_line" % self.assembler)
                exit(1)

        match = FUNCTION_END_PATTERN.match(line)
        if (match):
            # Check the end function has the same prefix as the previous start function
            if (self.is_nested_function ^ (match.group(1) == "NESTED_")) or \
               (self.is_mul_function ^ (match.group(2) == "MUL_")):
                logging.error("Function start and end do not have the same MUL_ or NESTED_ prefix!\n\tStart: %s (line %d)\n\tEnd: %s (line %d)" \
                    % (self.function_start_match.group(0), self.function_start_line, match.group(0), line_num))
                exit(1)
            # Check the end function pattern has the same label as the previous start function pattern
            if self.function_name != match.groups()[-1]:
                logging.error("Function start label does not match function end label!\n\tStart: %s (line %d)\n\tEnd: %s (line %d)" \
                    % (self.function_name, self.function_start_line, match.groups()[-1], line_num))
                exit(1)

            epilogue = generate_epilogue(self.assembler, self.calling_convention, self.function_name, self.arg_count, self.reg_count, self.stack_alloc_size, self.xmm_reg_count, self.is_nested_function)
            logging.info("%d: function end %s" % (line_num, self.function_name))

            self.function_start_match = None
            self.function_start_line = 0
            self.is_nested_function = None
            self.is_mul_function = None
            self.calling_convention = None
            self.function_name = None
            self.arg_count = None
            self.reg_count = None
            self.stack_alloc_size = None
            self.xmm_reg_count = None
            return epilogue

        # replace any GET_MEMSLOT_OFFSET macros in line
        match = GET_MEMSLOT_PATTERN.search(line)
        while(match):
            slot = int(match.group(1))
            replacement = self.calling_convention.gen_get_memslot_offset_fn(slot, self.arg_count, self.reg_count, self.stack_alloc_size, self.xmm_reg_count)
            line = GET_MEMSLOT_PATTERN.sub(replacement, line)
            match = GET_MEMSLOT_PATTERN.search(line)
            logging.info("%d: memslot macro %d" % (line_num, slot))

        # Not modifying the line any further
        return line

    def process_macro_line(self, line, line_num):
        # Currently in a macro
        match = MACRO_END_PATTERN.match(line)
        if (match):
            logging.info("%d: macro end %s" % (line_num, self.macro_name))
            self.macro_start_match = None
            self.macro_name = None
            self.macro_args = None

            if self.assembler == "masm":
                return MASM_MACRO_END
            elif self.assembler == "gas":
                return GAS_MACRO_END
            elif self.assembler == "armasm64":
                return ARMASM64_MACRO_END
            else:
                logging.error("Unhandled assembler (%s) in process_macro_line" % self.assembler)
                exit(1)

        arg_prefix = ""
        if self.assembler == "armasm64":
            # In armasm64 macros we need to escape all of the macro arguments with a $ in the macro body
            arg_prefix = "$"
        elif self.assembler == "gas":
            # In GAS macros we need to escape all of the macro arguments with a backslash in the macro body
            arg_prefix = r"\\"
        if arg_prefix:
            for arg in self.macro_args:
                line = replace_identifier(arg, arg_prefix + arg, line)

        # Not modifying the line any further
        return line

def expand_files(filename, line_num_parent, line_parent):
    '''Read the contents of filename and insert include files where there's an INCLUDE directive'''
    if line_parent:
        logging.info(
            "expand file %s\n\t"
            "%s (line %d)"
            % (filename, line_parent, line_num_parent))

    base_dir = os.path.dirname(filename)
    expanded_lines = []
    has_includes = False
    try:
        infile = open(filename)
        for line_num, line in enumerate(infile.readlines()):
            match = INCLUDE_PATTERN.match(line)
            if (match):
                include_file = match.group(1)
                full_path = base_dir + "/" + include_file
                logging.info("%d: including file %s" % (line_num + 1, include_file))
                logging.info("%d: full path %s" % (line_num + 1, full_path))
                include_lines, _ = expand_files(full_path, line_num + 1, line)
                expanded_lines.extend(include_lines)
                has_includes = True
            else:
                expanded_lines.append(line)
        infile.close()
    except IOError:
        logging.error(
            "cannot open include file %s\n\t"
            "%s (line %d)"
            % (filename, line_parent, line_num_parent))
        exit(1)
    return expanded_lines, has_includes
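
# For example (hypothetical file name): a source line "INCLUDE(symcryptasm_shared.cppasm)" is
# replaced by the contents of that file, resolved relative to the directory of the including file.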

def gen_file_header(assembler, architecture, calling_convention):
    """Generate the header to be inserted at the beginning of each symcryptasm file"""
    header = ""
    if assembler == "masm":
        header += "// begin masm header\n"
        header += "option casemap:none\n"
        header += "// end masm header\n\n"
    return header

def process_file(assembler, architecture, calling_convention, infilename, outfilename):
    normal_calling_convention = None

    if assembler == "masm":
        if architecture == "amd64" and calling_convention == "msft":
            normal_calling_convention = CALLING_CONVENTION_AMD64_MSFT
            mul_calling_convention = CALLING_CONVENTION_AMD64_MSFT_MUL
            nested_calling_convention = CALLING_CONVENTION_AMD64_MSFT_NESTED
    elif assembler == "gas":
        if architecture == "amd64" and calling_convention == "systemv":
            normal_calling_convention = CALLING_CONVENTION_AMD64_SYSTEMV
            mul_calling_convention = CALLING_CONVENTION_AMD64_SYSTEMV_MUL
            nested_calling_convention = CALLING_CONVENTION_AMD64_SYSTEMV_NESTED
        elif architecture == "arm64" and calling_convention == "aapcs64":
            normal_calling_convention = CALLING_CONVENTION_ARM64_AAPCS64_GAS
            mul_calling_convention = None
            nested_calling_convention = None
        elif architecture == "arm" and calling_convention == "aapcs32":
            normal_calling_convention = CALLING_CONVENTION_ARM32_AAPCS32
            mul_calling_convention = None
            nested_calling_convention = None
    elif assembler == "armasm64":
        if architecture == "arm64" and calling_convention == "aapcs64":
            normal_calling_convention = CALLING_CONVENTION_ARM64_AAPCS64_ARMASM64
            mul_calling_convention = None
            nested_calling_convention = None
        elif architecture == "arm64" and calling_convention == "arm64ec":
            normal_calling_convention = CALLING_CONVENTION_ARM64EC_MSFT
            mul_calling_convention = None
            nested_calling_convention = None
    else:
        logging.error("Unhandled assembler (%s) in process_file" % assembler)
        exit(1)

    if normal_calling_convention is None:
        logging.error("Unhandled combination (%s + %s + %s) in process_file"
            % (assembler, architecture, calling_convention))
        exit(1)

    file_processing_state = ProcessingStateMachine(
        assembler, normal_calling_convention, mul_calling_convention, nested_calling_convention)

    #
    # expand included files
    #
    expanded_lines = []

    # suppress header insertion
    header = ""
    # insert assembler specific header per symcryptasm file
    #header = gen_file_header(assembler, architecture, calling_convention)
    #expanded_lines.extend(header)

    # expand_files() is called recursively when a .symcryptasm file contains an INCLUDE directive,
    # except for the first call here where we're starting to process the input source file
    # as if it was included by some other file.
    expanded_file, infile_has_includes = expand_files(infilename, 0, "")
    expanded_lines.extend(expanded_file)

    # if header was nonempty or there were any INCLUDE directives, output the expanded source file for debugging
    if header or infile_has_includes:
        expanded_filename = os.path.dirname(outfilename) + "/" + os.path.basename(infilename) + "exp"
        with open(expanded_filename, "w") as outfile:
            outfile.writelines(expanded_lines)

    # iterate through file line by line in one pass
    with open(outfilename, "w") as outfile:
        for line_num, line in enumerate(expanded_lines):
            processed_line = file_processing_state.process_line(line, line_num)
            # logging.info("processed line: %s" % processed_line)
            outfile.write(processed_line)

if __name__ == "__main__":
    import argparse

    # logging.basicConfig(level=logging.INFO)

    parser = argparse.ArgumentParser(description="Preprocess symcryptasm into files that will be further processed with C preprocessor to generate MASM or GAS")
    parser.add_argument('assembler', type=str, help='Assembler that we want to preprocess for', choices=['masm', 'gas', 'armasm64'])
    parser.add_argument('architecture', type=str, help='Architecture that we want to preprocess for', choices=['amd64', 'arm64', 'arm'])
    parser.add_argument('calling_convention', type=str, help='Calling convention that we want to preprocess for', choices=['msft', 'systemv', 'aapcs64', 'arm64ec', 'aapcs32'])
    parser.add_argument('inputfile', type=str, help='Path to input file')
    parser.add_argument('outputfile', type=str, help='Path to output file')

    args = parser.parse_args()
    process_file(args.assembler, args.architecture, args.calling_convention, args.inputfile, args.outputfile)

    logging.info("Done")