Allow writing a 0 at the upper bound of a null-terminated array. (#242)

This change addresses the suggestion from @mwhicks1 in issue #235 that we relax the bounds checking for null-terminated arrays to allow a write of 0 at the upper bound of a null-terminated array. There will be a matching pull request in the checked-clang repo for compiler change. - Update the description of bounds checking in the Checked C specification. - Add tests to nullterm_pointers.c that check this behavior. - Update the sample code in string-helpers.c to delete some special-case code in the "squeeze" function. This function removes all occurrences of a character from a string. If no characters are removed, it redundantly overwrites the existing null-terminator. Before, this had to be special-cased to avoid an out-of-bounds write. I also updated the sample code to avoid declaring default bounds of count(0) and clarified the comments. I ran into a subtlety when revising the specification. When we allow a 0 to be written at the upper bound, we have to expand the allowed range of memory to include the memory at the upper bound. It is possible for even this expanded range to be empty. The specification is careful to not allow a write of 0 in this situation., by including a comparison of the pointer address against the lower bound.
2017-11-29 16:22:34 -05:00 · 2017-11-29 16:22:34 -05:00 · f63a82dcfe
--- a/samples/string-helpers.c
+++ b/samples/string-helpers.c
@ -6,7 +6,7 @@
 // To compile the file using clang, on Unix/Mac use
 //  clang -o string-helpers -fcheckedc-extension string-helper.c
 // On Windows use:
-//  clang -o string-helpers.exe -fcheckedc-extension string-helper.c
+//  clang -o string-helpers.exe -fcheckedc-extension string-helpers.c
 // 
 // Then run the program with 6 string arguments, the 3rd of which should
 // be an integer  For example:
@ -18,41 +18,55 @@
 //   nt_array_ptr type.  Arrays of null-terminated data are represented
 //   using the nt_checked type.
 //
-//   These types follow slightly different rules than
-//   array_ptr and checked arrays do:
-//   - Given an nt_array_ptr, the element exactly at the
-//     upper bound of the nt_array_ptr can be read but not
-//     written.
-//   - Given an nt_checked array, it has count of
-//      of elements that excludes the null terminator (is one
-//      less than the declared size).
-//   - nt_array_ptr have a default bounds of count(0). This
-//     means that you can can dereference an nt_array_ptr without
-//     bounds (NOTE: the compiler is currently not yet inserting
-//     the default bounds.  You'll need to add them explicitly.
+//   An nt_array_ptr is like an array_ptr, except that the array that it
+//   points to is followed by a sequence of elements that is null-terminated.
+//   The bounds on the nt_array_ptr delimit the array.  The sequence begins
+//   at the upper bound.
+//
+//   Because of this,  nt_array_ptr and nt_checked arrays follow slightly
+//   different rules than array_ptr and checked arrays do:
+//   - Given an nt_array_ptr, the element exactly at the upper bound of the
+//     nt_array_ptr can be read (it is the first element of the sequence).
+//     A 0 can be written there using an assignment (there must always be
+//     enough space in the sequence for at least the null terminator).  No
+//     other writes to that element are allowed.
+//   - Given an nt_checked array, it converts to an nt_array_ptr whose count
+//     excludes the  null terminator (whose count is one less than the declared
+//     size).
+//   - nt_array_ptrs have a default bounds of count(0). This means that you
+//     can dereference an nt_array_ptr that does not have a bounds
+//     declaration.
 //
 //   Some simple examples:
 //
-//   The following is valid for an nt_array_ptr.  It
-//   would fail at run-time for an array_ptr.
+//   The following code is valid for an nt_array_ptr.  It would fail at runtime
+//   for an array_ptr.
 //
 //   char fetch(nt_array_ptr<char> p : count(0)) {
 //     return *p;
 //   }
 //
-//   The following code is valid for nt_array_ptr.
-//   It would fail at runtime for an array_ptr.
+//   The following code is valid for an nt_array_ptr.  It would fail at runtime
+//   for an array_ptr.
 //
 //   int f(nt_array_ptr<char> p : count(n), int n) {
-//     if (p[n]) {  // read eleement at upper bound.
+//     if (p[n]) {  // read element at upper bound.
 //       ...
 // 
-//   The following code fails at runtime for both
-//   nt_array_ptr and array_ptr:
+//    If v != 0, the following code fails at runtime for an nt_array_ptr.  It
+//    always fails for an array_ptr.
 //
-//   int set(nt_array_ptr<char> p : count(n), int n, int v) {
-//     p[n] = v;  // runtime error! attempt to write element at upper bound.
-//     ...
+//     int set(nt_array_ptr<char> p : count(n), int n, int v) {
+//       p[n] = v;  // fails unless v is zero.
+//       ...
+//     }
+//
+//    The following code is valid for an nt_array_ptr and would fail at runtime
+//    for an array_ptr:
+//
+//    int set_zero(nt_array_ptr<char> p : count(n), int n) {
+//      p[n] = 0;  // allowed for nt_array_ptr.  Runtime error for array_ptr!
+//    }
 // 
 //   The following example illustrates how bounds differ from
 //   the count:
@ -65,8 +79,8 @@
 //
 // About the example functions.
 //
-// There three important aspects to using
-// null-terminated character pointers. in Checked C:
+// There three important aspects to using null-terminated character pointers
+// in Checked C:
 // 1. Nt_array_ptrs with no bounds declared have a default bounds of count(0).
 // 2. If you are using array subscripting to access the nt_array_ptr,
 //    you'll need to widen the bounds as you determine the last character
@ -108,7 +122,8 @@
 #include <stdio_checked.h>

 // Return length of p (adapted from p. 39, K&R 2nd Edition).
-checked int my_strlen(nt_array_ptr<char> p : count(0)) {
+// p implicitly has count(0).
+checked int my_strlen(nt_array_ptr<char> p) {
  int i = 0;
  // Create a temporary whose count of elements
  // can change.
@ -121,7 +136,8 @@ checked int my_strlen(nt_array_ptr<char> p : count(0)) {
 }

 // Delete all c from p (adapted from p. 47, K&R 2nd Edition)
-checked void squeeze(nt_array_ptr<char> p : count(0), char c) {
+// p implicltly has count(0).
+checked void squeeze(nt_array_ptr<char> p, char c) {
  int i = 0, j = 0;
  // Create a temporary whose count of elements can
  // change.
@ -135,12 +151,10 @@ checked void squeeze(nt_array_ptr<char> p : count(0), char c) {
    if (tmp[i] != c)
      tmp[j++] = tmp[i];
  }
-  // Tricky case: if i == j, s[j] = `\0' would
-  // attempt to write to the character at the upper
-  // bound.  This would be a runtime error.  Only
-  // write a `\0` if we removed a character.
-  if (j < i)
-    s[j] = '\0';
+  // if i==j, this writes a 0 at the upper bound.  Writing a 0 at the upper bound
+  // is allowed for pointers to null-terminated arrays.  It is not allowed for
+  // regular arrays.
+  s[j] = 0;
 }

 // Convert p to integer (adapted from p. 61, K&R 2nd Edition).
@ -161,15 +175,15 @@ checked int my_atoi(char p nt_checked[] : count(0)) {
  return sign * n;
 }

-// Reverse a string in place (p. 62, K&R 2nd Edition)
-checked void reverse(nt_array_ptr<char> p : count(0)) {
+// Reverse a string in place (p. 62, K&R 2nd Edition).
+// p implicitly has count(0).
+checked void reverse(nt_array_ptr<char> p) {
  int len = 0;
  // Calculate the length of the string.
  nt_array_ptr<char> s : count(len) = p;
  for (; s[len]; len++);

-  // Now that we know the length, use pointer
-  // just like we would use an array_ptr
+  // Now that we know the length, use s just like we would use an array_ptr.
  for (int i = 0, j = len - 1; i < j; i++, j--) {
    int c = s[i];
    s[i] = s[j];
@ -179,8 +193,8 @@ checked void reverse(nt_array_ptr<char> p : count(0)) {

 // Return < 0 if s < t, 0 if s == t, > 0 if s > t.
 // Adapted from p.106, K&R 2nd Edition.
-checked int my_strcmp(nt_array_ptr<char> s : count(0),
-                      nt_array_ptr<char> t : count(0)) {
+// s and t implicitly have count(0).
+checked int my_strcmp(nt_array_ptr<char> s, nt_array_ptr<char> t) {
  // Reading *s and *t is allowed for count(0)
  for (; *s == *t; s++, t++) // Incrementing s, t allowed because *s, *t != `\0`
    if (*s == '\0')
@ -212,4 +226,3 @@ int main(int argc, nt_array_ptr<char> argv checked[] : count(argc)) {
  printf("strcmp(\"%s\", \"%s\") = %d\n", arg5, arg6, my_strcmp(arg5, arg6));
  return 0;
 }
-
--- a/spec/bounds_safety/core-extensions.tex
+++ b/spec/bounds_safety/core-extensions.tex
@ -367,7 +367,8 @@ An \keyword{nt\_checked} array with size \var{d} converts
 to an \ntarrayptr\ with a count of \var{d - 1} elements.
 This is the number of elements in the prefix array.   This means that
 programs can still read an array element containing a null terminator.
-However, attempting to overwrite a null terminator is a runtime error.
+However, attempting to overwrite the null terminator with a non-null
+value is a runtime error.

 \subsection{\ntarrayptr\ usually follows the rules for \arrayptr}
 Because \ntarrayptr\ extends \arrayptr, the discussion and rules for
--- a/spec/bounds_safety/variable-bounds.tex
+++ b/spec/bounds_safety/variable-bounds.tex
@ -662,20 +662,48 @@ Accessing outside of the bounds for an
 individual inner dimension is a violation of the C Standard and logically incorrect,
 but it does not compromise memory safety or type safety.

-Bounds checks for \ntarrayptr\ values allow read access to memory exactly
-at the upper bound.  This is the beginning of the null-terminated
-sequence (bounds checks for \arrayptr\ values only allow access to
-memory below the upper bounds).  For memory reads,
+Bounds checks for \ntarrayptr\ values allow read access to memory
+at the upper bound.  An \ntarrayptr\ points to an array with declared bounds
+that is followed by a sequence of elements that is null-terminated.
+The element at the upper bound is the beginning of the null-terminated
+sequence.  Allowing a read at the upper bound lets a program check the
+first element of the sequence to see if is non-null
+(bounds checks for \arrayptr\ values only allow access to
+memory below the upper bound).  For memory reads,
 given \texttt{*\var{e1}} where {\var{e1}} has {\bounds{\var{e2}}{\var{e3}}},
 the compiler computes \var{e1} to some temporary \var{t}.
 The compiler inserts a runtime check that  \texttt{\var{e2} <= \var{t} \&\&
-\var{t} <= \var{e3}}.   For memory writes, the compiler inserts the
-same check as for \arrayptr:
-\texttt{\var{e2} <= \var{t} \&\& \var{t} < \var{e3}}.
+\var{t} <= \var{e3}}.
+
+For memory writes, assignment of 0 (the null value)
+using the assignment operator is allowed at the upper bound.
+This is the first element of the null-terminated sequence
+and there must be enough space in the sequence for at least a null terminator.
+Otherwise, the check
+is the same as for \arrayptr.   Given an expression {\var{e1}} that
+has {\bounds{\var{e2}}{\var{e3}}}, here is a precise description:
+\begin{itemize}
+\item Given an assignment of the form \texttt{*\var{e1} = \var{e4}}, the
+value of \var{e4} is computed to some temporary \var{v}.  The
+check is \texttt{(\var{e2} <= \var{t} \&\& \var{t} < \var{e3}) ||
+(\var{e2} <= \var{t} \&\& \var{t} == \var{e3} \&\& \var{v} == 0)}.
+
+When checking for a write of 0 exactly at the upper bound, we need
+to include the first element of the sequence in the allowed memory range.
+The second lower bound comparison \texttt{\var{e2} <= \var{t}}
+prevents an assignment
+at the upper bound when this expanded range is empty.
+A compiler can avoid the duplicate comparison by using
+the check
+\texttt{\var{e2} <= \var{t} \&\& (\var{t} < \var{e3} ||
+(\var{t} == \var{e3} \&\& \var{v} == 0))}
+\item Given an assignment via a compound assignment operator, the check
+is \texttt{\var{e2} <= \var{t} \&\& \var{t} < \var{e3}}.
+\end{itemize}
+
 If the bounds for \var{e1} are inferred, the checks
 must be provably true at compile time.

-
 No bounds checks are inserted when the \texttt{\&} operator is applied to
 a dereference or subscript exression.
 The dereference or subscript expression does not produce a result that is used
--- a/tests/dynamic_checking/bounds/nullterm_pointers.c
+++ b/tests/dynamic_checking/bounds/nullterm_pointers.c
@ -6,11 +6,21 @@
 // RUN:  %t1 2 | FileCheck %s --check-prefixes=CHECK
 // RUN:  %t1 3 | FileCheck %s --check-prefixes=CHECK
 // RUN:  %t1 4 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 5 | FileCheck %s --check-prefixes=CHECK,NO-BOUNDS-FAILURES-2
+// RUN:  %t1 6 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 7 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 8 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 9 | FileCheck %s --check-prefixes=CHECK,NO-BOUNDS-FAILURES-3
+// RUN:  %t1 10 | FileCheck %s --check-prefixes=CHECK
 //
-// RUN:  %t1 21 | FileCheck %s --check-prefixes=CHECK,NO-BOUNDS-FAILURES-2
+// RUN:  %t1 21 | FileCheck %s --check-prefixes=CHECK,NO-BOUNDS-FAILURES-4
 // RUN:  %t1 22 | FileCheck %s --check-prefixes=CHECK
 // RUN:  %t1 23 | FileCheck %s --check-prefixes=CHECK
 // RUN:  %t1 24 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 25 | FileCheck %s --check-prefixes=CHECK,NO-BOUNDS-FAILURES-5
+// RUN:  %t1 26 | FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 27| FileCheck %s --check-prefixes=CHECK
+// RUN:  %t1 28 | FileCheck %s --check-prefixes=CHECK

 #include <assert.h>
 #include <signal.h>
@ -33,12 +43,21 @@ int test1(void);
 int test2(void);
 void test3(void);
 void test4(void);
+void test5(void);
+void test6(void);
+void test7(void);
+void test8(void);
+void test9(void);
+void test10(void);

 int test21(struct CountedNullTermString *p);
 int test22(struct CountedString *p);
 int test23(struct CountedNullTermString *p);
 int test24(struct CountedString *p);
-
+int test25(struct CountedNullTermString *p);
+int test26(struct CountedString *p);
+int test27(struct CountedNullTermString *p);
+int test28(struct CountedString *p);

 // Handle an out-of-bounds reference by immediately exiting. This causes
 // some output to be missing.
@ -73,8 +92,10 @@ int main(int argc, array_ptr<char*> argv : count(argc)) {
    return EXIT_FAILURE;
  }

-  struct CountedNullTermString nullterm = { "abcde", 5 };
-  struct CountedString plain = { "abcde", 5 };
+  char data1 nt_checked[6] = "abcde";
+  char data2 checked[6] = "abcde";
+  struct CountedNullTermString nullterm = { data1, 5 };
+  struct CountedString plain = { data2, 5 };

  // CHECK: Beginning test
  puts("Beginning test");
@ -92,6 +113,24 @@ int main(int argc, array_ptr<char*> argv : count(argc)) {
    case 4:
      test4();
      break;
+    case 5:
+      test5();
+      break;
+    case 6:
+      test6();
+      break;
+    case 7:
+      test7();
+      break;
+    case 8:
+      test8();
+      break;
+    case 9:
+      test9();
+      break;
+    case 10:
+      test10();
+      break;

    case 21:
      test21(&nullterm);
@ -105,6 +144,18 @@ int main(int argc, array_ptr<char*> argv : count(argc)) {
    case 24:
      test24(&plain);
      break;
+    case 25:
+      test25(&nullterm);
+      break;
+    case 26:
+      test26(&plain);
+      break;
+    case 27:
+      test27(&nullterm);
+      break;
+    case 28:
+      test28(&plain);
+      break;

    default:
      puts("Unexpected test case");
@ -141,21 +192,96 @@ int test2(void) {
  return i;
 }

+// Write a non-zero character at the upper bound of a string.  This
+// should cause a runtime fault.
 void test3(void) {
-  nt_array_ptr<char> s : count(0) = "hello";
+  char data nt_checked[6] = "hello";
+  nt_array_ptr<char> s : count(0) = data;
  while (*s) {
    *s = 'd';
+    s++;
  }
  // CHECK-NOT: expected bounds failure on write
  puts("expected bounds failure on write");
  return;
 }

+// Write a non-zero character exactly at the upper bound of an array_ptr.
 void test4(void) {
-  array_ptr<char> s : count(0) = "hello";
-  while (*s) {
-    *s = 'd';
-  }
+  char data checked[6] = "hello";
+  array_ptr<char> s : count(6) = data;
+  *(s + 6) = 'd';
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return;
+}
+
+// Write a zero character at the upper bound of a string.  This should
+// not cause a runtime fault.
+void test5(void) {
+  char data nt_checked[6] = "hello";
+  nt_array_ptr<char> s : count(5) = data;
+  s[5] = 0;
+  // NO-BOUNDS-FAILURES-2: wrote nul at the upper bound of a string
+  puts("wrote nul at the upper bound of a string");
+  return;
+}
+
+// Write 0 at the upper bound of an array_ptr<char>.  Should cause
+// a runtime fault.
+void test6(void) {
+  char data checked[6] = "hello";
+  array_ptr<char> s : count(6) = data;
+  char result = 0;
+  s[6] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return;
+}
+
+
+// Write 0 at memory location one past the upper bound of a string.
+// Expected to cause a runtime fault.
+void test7(void) {
+  char data nt_checked[6] = "hello";
+  array_ptr<char> s : count(5) = data;
+  char result = 0;
+  s[5 + 1] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return;
+}
+
+// Write 0 at memory location one past the upper bound of an array_ptr<char>.
+// Expected to a cause a runtime fault.
+void test8(void) {
+  char data checked[6] = "hello";
+  array_ptr<char> s : count(6) = data;
+  char result = 0;
+  s[6 + 1] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return;
+}
+
+// Write 0 exactly at upper bound of a range for a null-terminated pointer,
+// when the lower bound == the upper bound.  Expected to succeed.
+void test9(void) {
+  char data nt_checked[6] = "hello";
+  nt_array_ptr<char> s : bounds(data + 5, data + 5) = data;
+  s[5] = 0;
+  // NO-BOUNDS-FAILURES-3: expected write at range with lower == upper to succeed
+  puts("expected write at range with lower == upper to succeed");
+  return;
+}
+
+
+// Write 0 at the upper bound of a range for a null-terminated pointer,  where
+// the lower bound is above the upper bound.  Expected to cause a runtime fault.
+void test10(void) {
+  char data nt_checked[6] = "hello";
+  nt_array_ptr<char> s : bounds(data + 6, data + 5) = data;
+  s[6] = 0;
  // CHECK-NOT: expected bounds failure on write
  puts("expected bounds failure on write");
  return;
@ -168,12 +294,12 @@ int test21(struct CountedNullTermString *p) {
    // CHECK-NOT: expected null terminator
    puts("expected null terminator");
  else
-    // NO-BOUNDS-FAILURES-2: found null terminator at nt_array_ptr upper bound  
+    // NO-BOUNDS-FAILURES-4: found null terminator at nt_array_ptr upper bound
    puts("found null terminator at nt_array_ptr upper bound");
  return 0;
 }

-// Read exactly at the upper bound of a plain array_ptr.  Expected
+// Read exactly at the upper bound of an array_ptr.  Expected
 // to cause a runtime fault.
 int test22(struct CountedString *p) {
  char result = p->s[p->len];
@ -182,6 +308,8 @@ int test22(struct CountedString *p) {
  return result;
 }

+// Write a non-zero value at exactly the upper bound of a string.  Should not
+// cause a runtime fault.
 int test23(struct CountedNullTermString *p) {
  char result = 'a';
  p->s[p->len] = result;
@ -190,10 +318,52 @@ int test23(struct CountedNullTermString *p) {
  return result;
 }

+// Write a non-zero value at exactly the upper bound of an array_ptr<char>.
+// Expected to a cause a runtime fault.
 int test24(struct CountedString *p) {
  char result = 'a';
  p->s[p->len] = result;
  // CHECK-NOT: expected bounds failure on write
  puts("expected bounds failure on write");
  return result;
-}
+}
+
+// Write 0 at exactly the upper bound of a string.  Not expected to cause a runtime
+// fault.
+int test25(struct CountedNullTermString *p) {
+  char result = 0;
+  p->s[p->len] = result;
+  // NO-BOUNDS-FAILURES-5: wrote nul at the upper bound of a string
+  puts("wrote nul at the upper bound of a string");
+  return result;
+}
+
+// Write 0 at exactly the upper bound of an array_ptr<char>.
+int test26(struct CountedString *p) {
+  char result = 0;
+  p->s[p->len] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return result;
+}
+
+
+// Write 0 at memory location one past the upper bound of a string.  Expected
+// to a cause a runtime fault.
+int test27(struct CountedNullTermString *p) {
+  char result = 0;
+  p->s[p->len + 1] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return result;
+}
+
+// Write 0 at memory location one past the upper bound of a character array.
+// Expected to a cause a runtime fault.
+int test28(struct CountedString *p) {
+  char result = 0;
+  p->s[p->len + 1] = result;
+  // CHECK-NOT: expected bounds failure on write
+  puts("expected bounds failure on write");
+  return result;
+}