Rust Programming
1 Program Analysis
Types of program analysis
Dynamic analysis: run the program and watch what it does.
Static analysis: read the source code.
1.1 Dynamic Analysis
Run the program, watch what it does, and look for problematic behavior.
Can find problems, but only if the program exhibits problematic behavior on the inputs you use to test. (Separately, some tools only check for certain types of issues.)
Commonly combined with techniques to run the program with lots of different test inputs (e.g. fuzzing), yet this still can’t give us any assurances that code is bug-free
Dynamic analysis is great! Test your code and understand the limitations!
Valgrind
Valgrind
Takes in compiled binary (executable).
Disassembles binary into intermediate, assembly/assembly-like representation.
Turns this ^ back into machine code (re-compiles); one small block at a time during execution.
While "instrumenting", modifying instructions or inserting "analysis" code.
Compiler:
Valgrind
Pros & Cons
Works with any binary compiled by any compiler (even if you don’t have source code available!)
But not a lot of information is available in binaries.
For example, cannot detect stack-based buffer overflows!
LLVM Sanitizers
Instrument source code
Implemented as part of the LLVM compiler suite (e.g. clang)
Because more information is available pre-compilation, there is a lot more analysis that sanitizers can do (and they’re also easier to implement)
Types of sanitizers
AddressSanitizer: Finds use of improper memory addresses: out of bounds memory accesses, double free, use after free
LeakSanitizer: Finds memory leaks
MemorySanitizer: Finds use of uninitialized memory
UndefinedBehaviorSanitizer: Finds usage of null pointers, integer/float overflow, etc
ThreadSanitizer: Finds improper usage of threads
Fundamental limitation of dynamic analysis
Dynamic analysis can only report bad behavior that actually happened
If your program worked fine with the input you provided, but it might do bad things in certain edge cases, dynamic analysis cannot tell you anything about
Fuzzing
Very simple but extremely effective
Most common fuzzers: AFL (archived) and libfuzzer
Still, cannot provide any guarantees that a program is bug-free (if the fuzzer didn’t find anything in 24 hours, maybe we just didn’t run it long enough)
Google OSS-Fuzz is a large cluster that fuzzes open-source software 24/7
1.2 Static Analysis
linting
Linters employ very simple techniques (e.g. ctrl+f) to find obvious mistakes.
The person running the linter can configure a set of rules to enforce.
Rules are intended to improve the style of the codebase.
Just because there is a linter error doesn’t mean the code is broken (e.g. it's possible to call
strcpy()
without introducing bugs, but many linters will complain if you call it)
Common C/C++ linter: clang-tidy
Dataflow Analysis
We can trace through how the program might execute, keeping track of possible variable values
Example 1
Example 2
Example 3(Work across functions)
Limitations
False positives
Dataflow analysis will follow each branch, even if it’s impossible for some condition to be true in real life.
False positives are the Achille’s heel of static analysis. Need a good signal/noise ratio or else no one will use your analyzer.
Need to limit scope to get reasonable performance
Many static analyzers only analyze a single file at a time: they don't do dataflow analysis into/out of functions elsewhere in the codebase
If you have a huge codebase, loops, tons of conditions, etc., dataflow analysis can get unwieldy.
Static analysis can only report bad behavior that might happen
Fundamental limitations of static analysis
If you can only look at a few lines of code, it's hard to tell (without broader context) whether that code is safe.
Getting broader context is impossible in the general case.
You can always add more specific things to check for, but there will always be other ways to mess up.
2 Memory Safety
Good Memory Management
Pre/postconditions are essential to breaking code into small pieces with welldefined interfaces in between.
It's the programmer's responsibility to make sure the pre/postconditions are upheld, the compiler does not know what your postconditions are.
If you free too early, other parts of your code might still be using pointers to that memory.
If you don't free anywhere (or you free in a function that only gets called sometimes), you'll have a memory leak.
Good C/C++ code will clearly define how memory is passed around and "who" is responsible for cleaning it up.
If you read C/C++ code, you'll see notions of "ownership" in the comments, where the "owner" is responsible for the memory.
Ownership in Rust: Use Teddy Bear Analogy
- let julio = "I have the teddy bear!".to_string();
Julio is the owner of the teddy bear. He can do whatever he wants with it.
Julio is responsible for putting the gift back where they found it before leaving (free the memory!)
- let ryan = julio;
Now, Ryan is the owner of the toy! Ryan can do anything he wants with the toy!
Ryan is now responsible for putting the toy back where Julio found it before leaving (free the memory ).
Julio has no ownership of the toy anymore, and can't do anything with this bear anymore.
Function Calls
Ownership Transfered
fn take_ownership(s: String) { // s takes ownership of the String println!("{}", s); } // s goes out of scope and the string is dropped (memory freed) fn main() { let julio = "I have the teddy bear!".to_string(); take_ownership(julio); // Ownership is transferred to the function // println!("{}", julio); // This will cause a compile-time error because julio no longer owns the string }This means that at the end of the function execution, they will be responsible for freeing the toy in memory!
Borrowing
fn borrow_teddy(bear: &String) { // The & signifies borrowing println!("Playing with the bear: {}", bear); } // The borrow ends here. The original owner (julio) still has the teddy bear. fn main() { let julio = "I have the teddy bear!".to_string(); borrow_teddy(&julio); // Borrow julio using a reference println!("{}", julio); // This is perfectly fine now! julio still owns the string. }
Ownership Rules
Each value in Rust has a variable that's called its owner.
There can only be one owner at a time.
When the owner goes out of scope, the values will be dropped.