Rust Programming

1 Program Analysis

Types of program analysis

Dynamic analysis: run the program and watch what it does.
Static analysis: read the source code.

1.1 Dynamic Analysis

Run the program, watch what it does, and look for problematic behavior.
Can find problems, but only if the program exhibits problematic behavior on the inputs you use to test. (Separately, some tools only check for certain types of issues.)
Commonly combined with techniques to run the program with lots of different test inputs (e.g. fuzzing), yet this still can’t give us any assurances that code is bug-free
Dynamic analysis is great! Test your code and understand the limitations!

Valgrind

Takes in compiled binary (executable).
Disassembles binary into intermediate, assembly/assembly-like representation.
Turns this ^ back into machine code (re-compiles); one small block at a time during execution.
While "instrumenting", modifying instructions or inserting "analysis" code.

int main() { char *buf = (char *)malloc(8); buf[16] = 'a'; }

Compiler:

mov edi, 8 call malloc mov QWORD PTR [rbp-8], rax mov rax, QWORD PTR [rbp-8] add rax, 16 mov BYTE PTR [rax], 97

Valgrind

mov edi, 8 call valgrind_malloc mov QWORD PTR [rbp-8], rax # record memory write ^ mov rax, QWORD PTR [rbp-8] # record memory read add rax, 16 mov BYTE PTR [rax], 97 # record memory write ^ # Invalid write of size 4

Pros & Cons

Works with any binary compiled by any compiler (even if you don’t have source code available!)
But not a lot of information is available in binaries.
For example, cannot detect stack-based buffer overflows!

LLVM Sanitizers

Instrument source code
Implemented as part of the LLVM compiler suite (e.g. clang)
Because more information is available pre-compilation, there is a lot more analysis that sanitizers can do (and they’re also easier to implement)

Types of sanitizers

AddressSanitizer: Finds use of improper memory addresses: out of bounds memory accesses, double free, use after free
LeakSanitizer: Finds memory leaks
MemorySanitizer: Finds use of uninitialized memory
UndefinedBehaviorSanitizer: Finds usage of null pointers, integer/float overflow, etc
ThreadSanitizer: Finds improper usage of threads

Fundamental limitation of dynamic analysis

Dynamic analysis can only report bad behavior that actually happened
If your program worked fine with the input you provided, but it might do bad things in certain edge cases, dynamic analysis cannot tell you anything about

#include <stdio.h> #include <string.h> int main() { char s[100]; int i; printf("\nEnter a string : "); // The size of the input string may get larger than 100! gets(s); for (i = 0; s[i] != '\0'; i++) { if(s[i] >= 'a' && s[i] <= 'z') { s[i] = s[i] -32; } } printf("\nString in Upper Case = %s", s); return 0; }

Fuzzing

Very simple but extremely effective
Most common fuzzers: AFL (archived) and libfuzzer
Still, cannot provide any guarantees that a program is bug-free (if the fuzzer didn’t find anything in 24 hours, maybe we just didn’t run it long enough)
Google OSS-Fuzz is a large cluster that fuzzes open-source software 24/7

1.2 Static Analysis

linting

Linters employ very simple techniques (e.g. ctrl+f) to find obvious mistakes.
The person running the linter can configure a set of rules to enforce.
- Rules are intended to improve the style of the codebase.
- Just because there is a linter error doesn’t mean the code is broken (e.g. it's possible to call strcpy() without introducing bugs, but many linters will complain if you call it)
Common C/C++ linter: clang-tidy

Dataflow Analysis

We can trace through how the program might execute, keeping track of possible variable values

Example 1

void printToUpper(const char *str) { char *upper = strdup(str); for (int i = 0; str[i] != '\0'; i++) { if(str[i] >= 'a' && str[i] <= 'z') { upper[i] = str[i] - ('a' - 'A'); } } printf("%s\n", upper); free(upper); } int main(int argc, char *argv[]) { printf("Enter a string to make uppercase, or type \"quit\" to quit:\n"); char input[512]; // safely read input string fgets(input, sizeof(input), stdin); char *toMakeUppercase; if (strcmp(input, "quit") != 0) { toMakeUppercase = input; } // The pointer toMakeUppercase can be uninitialized here! printToUpper(toMakeUppercase); }

Example 2

int main(int argc, char *argv[]) { // Goal: parse out a string between brackets // (e.g. " [target string]" -> "target string") char *parsed = strdup(argv[1]); // Find open bracket char *open_bracket = strchr(parsed, '['); if (open_bracket == NULL) { printf("Malformed input!\n"); return 1; // Early return fails to clean up heap alllocation! } // Make the output string start after the open bracket parsed = open_bracket + 1; // Find the close bracket char *close_bracket = strchr(parsed, ']'); if (close_bracket == NULL) { printf("Malformed input!\n"); return 1; // Early return fails to clean up heap alllocation! } // Replace the close bracket with a null // terminator to end the parsed string there *close_bracket = '\0'; printf("Parsed string: %s\n", parsed); free(parsed); return 0; }

Example 3(Work across functions)

void freeSometimes(void *buf) { if (rand() == 1) { return; } free(buf); // buf can be a heap allocation or a freed allocation! } int main() { void *buf = malloc(8); freeSometimes(buf); return 0; }

Limitations

False positives
- Dataflow analysis will follow each branch, even if it’s impossible for some condition to be true in real life.
- False positives are the Achille’s heel of static analysis. Need a good signal/noise ratio or else no one will use your analyzer.
Need to limit scope to get reasonable performance
- Many static analyzers only analyze a single file at a time: they don't do dataflow analysis into/out of functions elsewhere in the codebase
- If you have a huge codebase, loops, tons of conditions, etc., dataflow analysis can get unwieldy.
Static analysis can only report bad behavior that might happen

Fundamental limitations of static analysis

If you can only look at a few lines of code, it's hard to tell (without broader context) whether that code is safe.
Getting broader context is impossible in the general case.
You can always add more specific things to check for, but there will always be other ways to mess up.

2 Memory Safety

Good Memory Management

Pre/postconditions are essential to breaking code into small pieces with welldefined interfaces in between.
It's the programmer's responsibility to make sure the pre/postconditions are upheld, the compiler does not know what your postconditions are.
If you free too early, other parts of your code might still be using pointers to that memory.
If you don't free anywhere (or you free in a function that only gets called sometimes), you'll have a memory leak.
Good C/C++ code will clearly define how memory is passed around and "who" is responsible for cleaning it up.
If you read C/C++ code, you'll see notions of "ownership" in the comments, where the "owner" is responsible for the memory.

Ownership in Rust: Use Teddy Bear Analogy

let julio = "I have the teddy bear!".to_string();
Julio is the owner of the teddy bear. He can do whatever he wants with it.
Julio is responsible for putting the gift back where they found it before leaving (free the memory!)
let ryan = julio;
Now, Ryan is the owner of the toy! Ryan can do anything he wants with the toy!
Ryan is now responsible for putting the toy back where Julio found it before leaving (free the memory ).
Julio has no ownership of the toy anymore, and can't do anything with this bear anymore.
Function Calls
- Ownership Transfered
  fn take_ownership(s: String) { // s takes ownership of the String println!("{}", s); } // s goes out of scope and the string is dropped (memory freed) fn main() { let julio = "I have the teddy bear!".to_string(); take_ownership(julio); // Ownership is transferred to the function // println!("{}", julio); // This will cause a compile-time error because julio no longer owns the string }
  This means that at the end of the function execution, they will be responsible for freeing the toy in memory!
- Borrowing
  fn borrow_teddy(bear: &String) { // The & signifies borrowing println!("Playing with the bear: {}", bear); } // The borrow ends here. The original owner (julio) still has the teddy bear. fn main() { let julio = "I have the teddy bear!".to_string(); borrow_teddy(&julio); // Borrow julio using a reference println!("{}", julio); // This is perfectly fine now! julio still owns the string. }

Ownership Rules

Each value in Rust has a variable that's called its owner.
There can only be one owner at a time.
When the owner goes out of scope, the values will be dropped.

Last modified: 25 November 2024