- Jan-15-2018 - Peeking inside Trait Objects
- Jan-16-2018 - What does Rust's "Unsafe" mean?
- Jan-17-2018 - What's Tokio and Async IO All About?
- Jan-18-2018 - A Journey into Iterators
- Jan-19-2018 - Error Handling in Rust
- Jan-20-2018 - Pretty State Machine Patterns in Rust
- Jan-21-2018 - Finding Closure in Rust
- Jan-22-2018 - Why is Rust difficult?
- Jan-22-2018 - Macros
- Jan-23-2018 - Macros in Rust pt1
- Jan-24-2018 - Macros in Rust pt2
- Jan-25-2018 - Macros in Rust pt3
- Jan-26-2018 - Macros in Rust pt4
- Jan-27-2018 - Virtual Structs Part 1 - Where Rust's enums shine
- Jan-28-2018 - Virtual Structs Part 2: Classes strike back
- Jan-29-2018 - Virtual Structs Part 3: Bringing Enums and Structs Together
- Jan-30-2018 - Taking Rust everywhere with rustup
- Jan-31-2018 - The Problem With Single-threaded Shared Mutability
- Feb-01-2018 - Rust Lifetimes for the Uninitialised
- Feb-02-2018 - What Are Sum, Product, and Pi Types?
- Feb-03-2018 - Mentally Modelling Modules
- Feb-04-2018 - Rust Tidbits: What Is a Lang Item?
- Feb-15-2018 - Some notes on Send and Sync
- Feb-16-2018 - The Option Type
- Feb-19-2018 - The Sized Trait
Jan-15-2018
Title: Peeking inside Trait Objects
Rust's trait system forms the basis of the generic system and polymorphic functions and types. Traits allow abstraction over behavior that types can have in common.
Example
```rust
// trait definition
trait Foo {
    fn method(&self) -> String;
}

// implementation of trait Foo for type u8
impl Foo for u8 {
    fn method(&self) -> String { format!("u8: {}", *self) }
}
```
Trait objects are normal values that can store a value of any type implementing a given trait, where the precise type is only known at runtime. Trait objects are obtained through casts and coercions.
Representation
```rust
pub struct TraitObject {
    pub data: *mut (),   // data pointer
    pub vtable: *mut (), // vtable pointer
}
```
The data pointer addresses the data that the trait object is storing, while the vtable pointer points to the virtual method table corresponding to the trait implementation.
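To make this concrete, here is a minimal sketch reusing the `Foo` trait above (the extra impl for `String` and the `describe` helper are assumptions for illustration), showing coercion to a trait object and dynamic dispatch through the vtable:

```rust
trait Foo {
    fn method(&self) -> String;
}

impl Foo for u8 {
    fn method(&self) -> String { format!("u8: {}", *self) }
}

impl Foo for String {
    fn method(&self) -> String { format!("string: {}", *self) }
}

// `&dyn Foo` is a trait object: a (data pointer, vtable pointer) pair.
// The call to method() is dispatched through the vtable at runtime.
fn describe(x: &dyn Foo) -> String {
    x.method()
}

fn main() {
    let n: u8 = 35;
    let s = String::from("hello");
    println!("{}", describe(&n)); // coercion from &u8 to &dyn Foo
    println!("{}", describe(&s)); // coercion from &String to &dyn Foo
}
```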
Jan-16-2018
Title: What does Rust's "Unsafe" mean?
Rust aims to be memory safe, so that code cannot crash due to dangling pointers or iterator invalidation. However, some things cannot fit the type system, e.g. interaction with the OS and system libraries. Rust fills these gaps using the `unsafe` keyword.
There are two ways of opting into unsafe behavior: with an `unsafe` block or with an `unsafe` function.
```rust
// calling some C functions imported via FFI
unsafe fn foo() {
    some_c_function();
}

fn bar() {
    unsafe {
        another_c_function();
    }
}
```
The `unsafe` context allows one to:
- call unsafe functions
- dereference raw pointers
- access a mutable global variable
- use inline assembly
An `unsafe` context is the programmer telling the compiler that the code is guaranteed to be safe due to invariants impossible to express in the type system, and that it satisfies the invariants that Rust itself imposes.
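A minimal runnable sketch of the raw-pointer case (the function name is illustrative): creating a raw pointer is safe, but dereferencing it requires opting in with `unsafe`.

```rust
fn read_through_raw_pointer(x: &u32) -> u32 {
    let p = x as *const u32; // creating a raw pointer is safe
    // Dereferencing it is not: the programmer asserts the pointer
    // is valid and properly aligned.
    unsafe { *p }
}

fn main() {
    let v = 42;
    println!("{}", read_through_raw_pointer(&v));
}
```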
Jan-17-2018
Title: What's Tokio and Async IO All About?
One of Rust's key features is "fearless concurrency". However, the kind of concurrency required for handling a large number of I/O-bound tasks is absent from Rust.
"Handling N things at once" is best done by using threads. However, OS threads do not scale when N is large: each thread needs to allocate a stack, setting up a thread involves a bunch of syscalls, and context switching is expensive. We need "lighter" threads.
The Go language has lightweight threads, called "goroutines". They are spawned using the `go` keyword.
```go
listener, err = net.Listen(...)
// handle err
for {
    conn, err := listener.Accept()
    // handle err

    // spawn goroutine:
    go handler(conn)
}
```
This is a loop that waits for new TCP connections, and spawns a new goroutine to handle each connection.
The spawned goroutine will shut down when `handler` finishes. The main loop keeps executing since it runs in a different goroutine.
Go implements an M:N threading model with a scheduler that swaps goroutines in and out, much like the OS scheduler. Rust supported lightweight threads pre-1.0, but the feature was removed since it wasn't zero cost.
Two building blocks of lightweight threading are Async I/O and futures.
In regular blocking I/O, when you request I/O your thread is not allowed to run until the operation is done. In async (non-blocking) I/O, a thread queues a request for I/O with the OS and continues execution; the I/O request is executed at some later point by the kernel. The OS provides system calls like `epoll` for async I/O tasks.
The Rust library mio is a platform-agnostic wrapper around non-blocking I/O and interfaces such as epoll, kqueue etc. It forms a building block for Async I/O.
Futures are another building block. A Future is the promise of eventually having a value.
Futures can be `wait`ed on (blocking), `poll`ed, and chained.
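The polling model can be illustrated with a toy, dependency-free sketch. The real `Future` trait is richer (wakeups, executors, error types); all names here are made up for illustration.

```rust
// A toy version of the future idea: a computation that can be
// polled repeatedly until it produces a value.
enum Poll<T> {
    Ready(T),
    NotReady,
}

struct Countdown {
    remaining: u32,
}

impl Countdown {
    // Each poll does a little work; the value is ready at zero.
    fn poll(&mut self) -> Poll<&'static str> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            Poll::NotReady
        }
    }
}

// A trivial "event loop": keep polling until the future is ready.
fn run_to_completion(mut c: Countdown) -> &'static str {
    loop {
        if let Poll::Ready(v) = c.poll() {
            return v;
        }
    }
}

fn main() {
    println!("{}", run_to_completion(Countdown { remaining: 3 }));
}
```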
Tokio is a wrapper around mio that uses futures. It has a core event loop, and you feed it closures that return futures. The event loop schedules the tasks (closures) passed to it.
Rust code that uses futures isn't elegant or pretty. Generators and the async/await syntax are experimental features aimed at reducing the boilerplate, essentially turning
```rust
fn foo(...) -> Future<ReturnType, ErrorType> {
    do_io().and_then(|data| do_more_io(compute(data)))
           .and_then(|more_data| do_even_more_io(more_compute(more_data)))
    // ......
}
```
to
```rust
#[async]
fn foo(...) -> Result<ReturnType, ErrorType> {
    let data = await!(do_io());
    let result = compute(data);
    let more_data = await!(do_more_io());
    // ....
}
```
Jan-18-2018
Title: A Journey into Iterators
Rust supports iterators. They are a fast, safe, lazy way of working with data structures, streams, and other more creative applications.
A Rust iterator implements the `Iterator` trait. There are also traits for conversion from and into iterators.
Example
```rust
fn main() {
    let input = [1, 2, 3];                       // define a set of values
    let iterator = input.iter();                 // create an iterator over them
    let mapped = iterator.map(|&x| x * 2);       // lazily apply a closure to each element
    let output = mapped.collect::<Vec<usize>>(); // convert the iterator into a collection
    println!("{:?}", output);
}
```
Iterators provide convenient methods such as `.next()` for single-element iteration, `.take(n)` for picking a batch, and `.skip(n)` for discarding the first `n` elements.
Many of Rust's collection data structures support iterators e.g. Vectors, VecDeques, HashMaps.
Writing an iterator simply means implementing the `Iterator` trait.
```rust
struct CountUp {
    current: usize,
}

impl Iterator for CountUp {
    type Item = usize;
    // The only fn we need to provide for a basic iterator.
    fn next(&mut self) -> Option<usize> {
        self.current += 1;
        Some(self.current)
    }
}
```
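Since `next` never returns `None`, `CountUp` is an infinite iterator; a short usage sketch bounds it with `take`:

```rust
struct CountUp {
    current: usize,
}

impl Iterator for CountUp {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        self.current += 1;
        Some(self.current)
    }
}

fn main() {
    // CountUp never returns None, so adapters like take()
    // are needed to bound it to a finite prefix.
    let first_five: Vec<usize> = CountUp { current: 0 }.take(5).collect();
    println!("{:?}", first_five);
}
```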
Rust's ranges also implement the `Iterator` trait.
Iterators can be merged, chained, even split into other iterators. They can also be `filter`ed and `fold`ed.
```rust
fn main() {
    let input = 1..10;
    let output = input
        .filter(|&item| item % 2 == 0) // keep evens
        .map(|item| item * 2)          // multiply by two
        .fold(0, |accumulator, item| accumulator + item);
    println!("{}", output);
}
```
Check out `std::collections` and `std::iter`.
Jan-19-2018
Title: Error Handling in Rust
Error handling is divided into two broad categories: exceptions and return values. Error handling in Rust is implemented via return values.
To `unwrap` something is to say, "Give me the result, else just panic and stop execution". The `Option` and `Result` types implement the `unwrap` method.
The `Option` type is a way to use Rust's type system to express the possibility of absence.
```rust
enum Option<T> {
    None,
    Some(T),
}
```
Case analysis (through pattern matching) is used to get the value stored inside an `Option<T>`. The `unwrap` method abstracts away the case analysis.
The `Option` type defines combinators that are useful for getting rid of case analysis. A great example is `map`, which applies a closure to the value inside an `Option<T>`:
```rust
let maybe_some_string = Some(String::from("Hello, World!"));
let maybe_some_len = maybe_some_string.map(|s| s.len());
assert_eq!(maybe_some_len, Some(13));
```
The `unwrap_or` combinator is useful for providing a default value when an `Option` value is `None`.
```rust
pub fn unwrap_or(self, def: T) -> T {
    match self {
        Some(x) => x,
        None => def,
    }
}
```
The `and_then` combinator makes it possible to compose distinct computations, essentially by chaining.
```rust
fn square(x: u32) -> Option<u32> { Some(x * x) }

assert_eq!(Some(2).and_then(square).and_then(square), Some(16));
```
The `Result` type is a richer version of `Option`. It expresses the possibility of error.
```rust
enum Result<T, E> {
    Ok(T),
    Err(E),
}
```
Just like `Option`, `Result` implements lots of combinators including `map`, `map_err`, `unwrap_or` and `and_then`. The `Result` type also supports aliases when dealing with many references to one `Result` type. A common alias is `io::Result`.

```rust
pub type Result<T> = result::Result<T, Error>;
```
Multiple errors can be handled by having a custom `enum` to represent one of many possibilities.
```rust
#[derive(Debug)]
enum CliError {
    Io(io::Error),
    Parse(num::ParseIntError),
}
```
The standard library defines two traits for error handling: `std::error::Error` and `std::convert::From`. The first is for describing errors, the latter for conversions.

```rust
impl From<io::Error> for CliError {
    fn from(err: io::Error) -> CliError {
        CliError::Io(err)
    }
}
```
The `try!` macro is useful for encapsulating case analysis, control flow and error type conversion.
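The `try!` macro has since been superseded by the `?` operator, which does the same job; a minimal sketch (the `double` function is illustrative):

```rust
use std::num::ParseIntError;

// Parse a string and double it. The `?` operator returns early with
// the error on failure, converting it via From where needed.
fn double(s: &str) -> Result<i32, ParseIntError> {
    let n: i32 = s.parse()?; // case analysis + early return in one token
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double("21"));
    println!("{:?}", double("oops"));
}
```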
Jan-20-2018
Title: Pretty State Machine Patterns in Rust
A State Machine is any machine which has a set of states and transitions defined between them. When we talk about a machine we’re referring to the abstract concept of something which does something.
States are a way to reason about where a machine is in its process. For example, we can think about a bottle filling machine. The machine is in a waiting state when it is waiting for a new bottle. Once it detects a bottle it moves to the filling state. Upon detecting the bottle is filled it enters the done state. After the bottle has left the machine we return to the waiting state.
```
+---------+     +---------+     +------+
|         |     |         |     |      |
| Waiting +---->+ Filling +---->+ Done |
|         |     |         |     |      |
+----+----+     +---------+     +--+---+
     ^                             |
     +-----------------------------+
```
When designing a state machine in Rust, we ideally want these characteristics:
- Can only be in one state at a time.
- Each state should be able to have its own associated values if required.
- Transitioning between states should have well defined semantics.
- It should be possible to have some level of shared state.
- Only explicitly defined transitions should be permitted.
- Changing from one state to another should consume the state so it can no longer be used.
- We shouldn't need to allocate memory for all states.
- Any error messages should be easy to understand.
- We shouldn't need to resort to heap allocations to do this.
- The type system should be harnessed to our greatest ability.
- As many errors as possible should be at compile-time.
One approach to achieving this is to use a combination of generics, `enum`s and shared values.
```rust
fn main() {
    let mut the_factory = Factory::new();
    the_factory.bottle_filling_machine = the_factory.bottle_filling_machine.step();
}

// This is our state machine for our Bottle Filling Machine
struct BFM<S> {
    shared_value: usize,
    state: S,
}

// The following states can be the 'S' in BFM<S>
struct Waiting {
    waiting_time: std::time::Duration,
}

struct Filling {
    rate: usize,
}

struct Done;

// Our machine starts in the 'Waiting' state.
impl BFM<Waiting> {
    fn new(shared_value: usize) -> Self {
        BFM {
            shared_value: shared_value,
            state: Waiting {
                waiting_time: std::time::Duration::new(0, 0),
            },
        }
    }
}

// The following are the defined transitions between states.
impl From<BFM<Waiting>> for BFM<Filling> {
    fn from(val: BFM<Waiting>) -> BFM<Filling> {
        BFM {
            shared_value: val.shared_value,
            state: Filling {
                rate: 1,
            },
        }
    }
}

impl From<BFM<Filling>> for BFM<Done> {
    fn from(val: BFM<Filling>) -> BFM<Done> {
        BFM {
            shared_value: val.shared_value,
            state: Done,
        }
    }
}

impl From<BFM<Done>> for BFM<Waiting> {
    fn from(val: BFM<Done>) -> BFM<Waiting> {
        BFM {
            shared_value: val.shared_value,
            state: Waiting {
                waiting_time: std::time::Duration::new(0, 0),
            },
        }
    }
}

// Here we build an enum so a parent structure can contain this
// state machine regardless of its current state.
enum MachineWrapper {
    Waiting(BFM<Waiting>),
    Filling(BFM<Filling>),
    Done(BFM<Done>),
}

// Defining a function which shifts the state along.
impl MachineWrapper {
    fn step(mut self) -> Self {
        self = match self {
            MachineWrapper::Waiting(val) => MachineWrapper::Filling(val.into()),
            MachineWrapper::Filling(val) => MachineWrapper::Done(val.into()),
            MachineWrapper::Done(val) => MachineWrapper::Waiting(val.into()),
        };
        self
    }
}

// The structure with a parent.
struct Factory {
    bottle_filling_machine: MachineWrapper,
}

impl Factory {
    fn new() -> Self {
        Factory {
            bottle_filling_machine: MachineWrapper::Waiting(BFM::new(0)),
        }
    }
}
```
Jan-21-2018
Title: Finding Closure in Rust
A closure is a function that can directly use variables from the scope in which it is defined. This is often described as the closure closing over or capturing variables.
Rust has C++11 inspired closures using the trait system, allowing for:
- allocation-less statically dispatched closures
- choice to opt-in to type-erasure and dynamic dispatch
Syntactically, a closure in Rust is an anonymous function value defined with a Ruby-like syntax, using pipes.
```rust
fn main() {
    let mut v = [5, 4, 1, 3, 2];
    v.sort_by(|a, b| a.cmp(b)); // closure: `|arguments...| body`
    assert!(v == [1, 2, 3, 4, 5]);
}
```
How do closures work? The definition of `Option::map` (an example of an API accepting closures) is:
```rust
impl<X> Option<X> {
    pub fn map<Y, F: FnOnce(X) -> Y>(self, f: F) -> Option<Y> {
        match self {
            Some(x) => Some(f(x)),
            None => None,
        }
    }
}
```
The `FnOnce` trait and its close cousins `Fn` and `FnMut` represent how variables are captured by the closure:
- `Fn`: by reference
- `FnMut`: by mutable reference
- `FnOnce`: by value
By default, the compiler looks at the closure body to see how captured variables are used, and uses that to infer how variables should be captured, that is, deciding between `Fn`, `FnMut` and `FnOnce`.
The `move` keyword is used to define an escaping closure, one that might leave the stack frame where it is created.
```rust
use std::thread;

thread::spawn(move || {
    // some work here
});
```
The use of traits for closures allows one to opt in to dynamic dispatch via trait objects:

```rust
let mut closures: Vec<Box<Fn()>> = vec![];
let text = "second";
closures.push(Box::new(|| println!("first")));
closures.push(Box::new(|| println!("{}", text)));
closures.push(Box::new(|| println!("third")));

for f in &closures {
    f(); // first / second / third
}
```
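A small sketch of how the three capture modes show up in practice (`call_twice` is a hypothetical helper; a bound of `FnMut` accepts both `Fn` and `FnMut` closures):

```rust
// Requires only FnMut, so both Fn and FnMut closures are accepted.
fn call_twice<F: FnMut() -> u32>(mut f: F) -> u32 {
    f();
    f()
}

fn main() {
    // An Fn closure: captures `base` by shared reference.
    let base = 10;
    assert_eq!(call_twice(|| base + 1), 11);

    // An FnMut closure: captures `count` by mutable reference.
    let mut count = 0;
    assert_eq!(call_twice(|| { count += 1; count }), 2);

    // A move closure takes ownership, useful when the closure
    // might outlive the current stack frame.
    let name = String::from("world");
    let greet = move || format!("hello, {}", name);
    println!("{}", greet());
}
```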
Jan-22-2018-1
Title: Why is Rust difficult?
Rust is considered difficult to learn by many people. However, it's not necessarily a bad thing if you get something in return for the investment.
Rust aims to solve hard problems; it's therefore harder than languages that solve simpler problems, such as Lua.
On the other hand, if you declare upfront that your language needs to be able to solve any hard problem anyone thinks of, run fast and be safe to use then you’ll probably get your wish, but it’ll hardly be simple.
The Rust language is honest. It does not abstract away problems from the programmer; a beginner therefore has to grasp the complete complexity of the problem.
Rust is different. Features such as ownership, traits and lifetimes makes Rust stand out from other languages.
The compiler is very strict. The Rust compiler requires your code to be well-typed, memory safe and free of data races. Even having dead code draws a warning!
But at the same time, this makes learning it a bit harder, because the compiler insists on you learning everything needed to write a good program. An average program is not acceptable.
There is also a general perception that Rust is hard.
Despite all this, it is easy to quickly come up to speed, find the language interesting, become a better programmer (the compiler is a strict teacher) and meet some very nice people in the community.
Jan-22-2018-2
Title: Macros
Macros are a syntactic language feature. A macro use is expanded according to a macro definition. They are like functions, but their expansion happens entirely at compile time; in Rust this is after the parsing step of compilation.
Rust has a hygienic macro system - it preserves the scoping of the macro definition preventing shadowing of variables during macro expansion.
Macros can be implemented as simple textual substitution, by manipulating tokens after lexing, or by manipulating the AST (Abstract Syntax Tree) after parsing; this depends on how the macros are defined. `macro_rules` in Rust allows for pattern matching of arguments in the macro definition. Rust macros are lexed into tokens before expansion and parsed afterwards.
In a procedural macro system, a macro is defined as a program. When its use is encountered, the macro is executed with the macro arguments as input. The macro use is then replaced by the result of the execution.
```rust
proc_macro! foo(x) {
    let mut result = String::new();
    for i in 0..10 {
        result.push_str(x)
    }
    result
}

fn main() {
    let a = foo!(bar); // Hand-waving about string literals.
}
```
will expand to
```rust
fn main() {
    let a = "barbarbarbarbarbarbarbarbarbar";
}
```
A procedural macro is a generalisation of the syntactic macros described so far. One could imagine implementing a syntactic macro as a procedural macro by returning the text of the syntactic macro after manually substituting the arguments.
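The same repetition can be approximated today with a `macro_rules!` macro; a sketch (the macro name is made up, and `repeat` builds the string at runtime rather than during expansion):

```rust
// Expands to an expression that repeats the given string ten times.
macro_rules! ten_times {
    ($x:expr) => {
        $x.repeat(10)
    };
}

fn main() {
    let a = ten_times!("bar");
    println!("{}", a);
}
```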
Jan-23-2018
Title: Macros in Rust pt1
Rust offers an array of macro features:
- `macro_rules!` macros
- procedural macros
- built-in macros such as `println!` and `#[derive()]`

`macro_rules!` lets you write syntactic macros based on pattern matching. They are used in a function-like style; the `!` distinguishes them from regular functions.
```rust
macro_rules! hello {
    () => { println!("hello world"); }
}

fn main() {
    hello!();
}
```
The above gets expanded into:
```rust
fn main() {
    println!("hello world");
}
```
To illustrate pattern matching in macros, this code snippet:
```rust
macro_rules! my_print {
    (foo <> $e: expr) => { println!("FOOOOOOOOOOOOoooooooooooooooooooo!! {}", $e); };
    ($e: expr) => { println!("{}", $e); };
    ($i: ident, $e: expr) => {
        let $i = {
            let a = $e;
            println!("{}", a);
            a
        };
    };
}

fn main() {
    my_print!(x, 30 + 12);
    my_print!("hello!");
    my_print!(foo <> "hello!");
}
```
outputs:

```
42
hello!
FOOOOOOOOOOOOoooooooooooooooooooo!! hello!
```
Jan-24-2018
Title: Macros in Rust pt2
Procedural macros are implemented using pure Rust code at the meta level. They are powerful since the output can be anything you can express as an Abstract Syntax Tree.
At the moment, procedural macros are used to write custom derives. They also need to be in their own crate, of the `proc-macro` crate type.
An example procedural macro implementation:
```rust
extern crate proc_macro;
extern crate syn;
#[macro_use]
extern crate quote;

use proc_macro::TokenStream;

#[proc_macro_derive(HelloWorld)]
pub fn hello_world(input: TokenStream) -> TokenStream {
    // Construct a string representation of the type definition
    let s = input.to_string();

    // Parse the string representation
    let ast = syn::parse_derive_input(&s).unwrap();

    // Build the impl
    let gen = impl_hello_world(&ast);

    // Return the generated impl
    gen.parse().unwrap()
}
```
The `syn` and `quote` crates make it easier to write procedural macros. Syn is a parsing library for parsing a stream of Rust tokens into a syntax tree of Rust source code. The `quote!` macro is used to turn Rust syntax tree data structures into tokens of source code.
Error handling inside procedural macros is done through the `panic!` macro.
```rust
fn impl_hello_world(ast: &syn::DeriveInput) -> quote::Tokens {
    let name = &ast.ident;
    // Check if derive(HelloWorld) was specified for a struct
    if let syn::Body::Struct(_) = ast.body {
        // Yes, this is a struct
        quote! {
            impl HelloWorld for #name {
                fn hello_world() {
                    println!("Hello, World! My name is {}", stringify!(#name));
                }
            }
        }
    } else {
        // Nope. This is an enum. We cannot handle these!
        panic!("#[derive(HelloWorld)] is only defined for structs, not for enums!");
    }
}
```
This post also borrows from the Rust book.
Jan-25-2018
Title: Macros in Rust pt3
This post is about macro hygiene in Rust.
A hygienic macro system preserves the scoping of a macro definition. The following examples illustrate that:
```rust
macro_rules! foo {
    () => {
        let x = 0;
    }
}

fn main() {
    let mut x = 42;
    foo!();
    println!("{}", x);
}
```
The above code prints `42`. Here the `x` defined in `main` and the one in `foo!` are disjoint.
```rust
macro_rules! foo {
    ($x: ident) => {
        $x = 0;
    }
}

fn main() {
    let mut x = 42;
    foo!(x);
    println!("{}", x);
}
```
This prints `0`: `x` is passed in to the macro and then modified.
Rust is hygienic with respect to variables scoping, labels, feature gates and stability checks.
There are some limitations:
- hygiene only works on expression variables and labels
- no hygiene is applied to lifetimes, type variables or types themselves
- no hygiene with respect to privacy and safety, i.e. in `unsafe` blocks
The implementation of hygienic macro system is a bit complex. Much of the work happens during the macro expansion and name resolution phases of Rust code compilation.
The macro expansion phase is hygienic. During name resolution, syntactic names are resolved into definitions. In an unhygienic system, this resolution is basically string equality. In Rust, however, we consider identifiers (names) together with a syntax context. To check whether two identifiers are equal, they are resolved in their respective syntax contexts and the results compared. The syntax context is attached to an identifier during the macro expansion phase.
The macro hygiene algorithm used in Rust comes from the Macros That Work Together paper. This mtwt algorithm is used during macro expansion. The whole syntax tree of a crate is walked, expanding macro uses and applying the algorithm to all identifiers.
Mtwt has two key concepts - marking and renaming. Marking is applied when we expand a macro and renaming when we enter a new scope. A syntax context under mtwt consists of a series of marks and renames.
Jan-26-2018
Title: Macros in Rust pt4
This post is about stuff around the import, export and modularisation of macros.
The order of items matters for macros. You can only refer to a macro after it is declared.
Macros defined inside a module cannot be used unless the module is annotated with `#[macro_use]`. Macros defined before and outside a module can be used without importation.
Macros are encapsulated by crates - they must be explicitly imported and exported. When importing macros from a crate, you annotate `extern crate` with `#[macro_use]`.
Program representations that matter in macros:
- source text
- token trees
- abstract syntax tree
Source text is the text passed to the compiler. It's stored in a codemap and is immutable. Once it's lexed, it is hardly used by the compiler; its main use is in error messages.
Lexing is the first stage of compilation where source text is transformed into tokens.
Parsing a token tree gives an abstract syntax tree (AST). An AST is a tree of nodes representing the source text in a concrete way.
The three main phases of compilation are:
- parsing and expansion
- analysis, and
- code generation
libsyntax implements lexing, parsing and macro expansion. This is purely syntactic. The result of this is an AST. During lexing and parsing, macro use is left as is in the AST. The macro expansion phase walks the AST and performs substitutions while maintaining hygiene.
Spans (a span identifies a section of code in the source text) are used to trace macros in order to highlight source code in error messages. Due to the possibility of nested macros, there is also an expansion trace table that holds a trace of each macro expansion.
Jan-27-2018
Title: Virtual Structs Part 1: Where Rust's enums shine
This post is about why Rust enums are powerful.
Rust enums are more powerful than their equivalents in C++. A key difference is the size of the enum types. In Rust, the size of an enum instance is the maximum of the sizes of its variants, which means that we can pass enums around by value and know that we have enough space to store any variant. In contrast, when modelling variants with classes in C++, the size of an instance varies depending on which specific variant it is; C++ therefore requires passing such values around by pointer, since we don't know how much space is needed upfront.
This uniform layout makes it easy to extend Rust's enums.
```rust
enum ErrorCode {
    FileNotFound,
    UnexpectedChar
}
```
can be extended to
```rust
enum ErrorCode {
    FileNotFound,
    UnexpectedChar { expected: Vec<String>, found: char }
}
```
In C++ (or Java, Scala, etc) one would require some sort of class hierarchy (more complex) to achieve this.
Rust really relies deeply on the flat, uniform layout for enums. For example, every time you make a nullable pointer like `Option<&T>`, you are taking advantage of the fact that options are laid out flat in memory, whether they are `None` or `Some`.
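Both claims can be checked with `std::mem::size_of`; a sketch using the `ErrorCode` enum from above:

```rust
use std::mem::size_of;

enum ErrorCode {
    FileNotFound,
    UnexpectedChar { expected: Vec<String>, found: char },
}

fn main() {
    // The enum is at least as large as its largest variant,
    // so any variant fits in the same slot and can be passed by value.
    assert!(size_of::<ErrorCode>() >= size_of::<Vec<String>>());

    // Option<&T> exploits the flat layout: None is encoded as the
    // null pointer, so the option costs no extra space at all.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());

    println!("{} {}", size_of::<ErrorCode>(), size_of::<Option<&u8>>());
}
```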
Jan-28-2018
Title: Virtual Structs Part 2: Classes strike back
This post is about the shortcomings of enums in Rust, vs a class based approach.
- Memory bloat
Enums have a size equal to their largest variant, which is sometimes a handicap. To avoid this memory bloat, enum variants have to be interned, especially those with a lot of associated data. A class-based approach, on the other hand, would give each variant its exact size.
- Common fields
Having common fields across enum variants requires a struct wrapper around the enum to provide the shared fields, which doesn't feel natural. A class-based approach would simply have a base class containing the common fields that the variants inherit.
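A sketch of that struct-wrapper workaround (all names are illustrative): the wrapper carries the fields shared by all variants, and the enum holds only variant-specific data.

```rust
struct ShapeData {
    id: u32,      // common fields live here once
    name: String,
}

enum ShapeKind {
    Circle { radius: f64 }, // variant-specific data
    Square { side: f64 },
}

struct Shape {
    common: ShapeData,
    kind: ShapeKind,
}

fn area(s: &Shape) -> f64 {
    match s.kind {
        ShapeKind::Circle { radius } => std::f64::consts::PI * radius * radius,
        ShapeKind::Square { side } => side * side,
    }
}

fn main() {
    let s = Shape {
        common: ShapeData { id: 1, name: String::from("unit square") },
        kind: ShapeKind::Square { side: 1.0 },
    };
    println!("{} ({}) has area {}", s.common.name, s.common.id, area(&s));
}
```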
- Initialization of common fields
If an enum has a lot of variants, being able to extract the common initialization code into a function is important. Since Rust lacks special constructors, there's no way to write a function that initializes only the common fields of the enum variants. C++ and Java use constructor-based initialization; that approach has its own problems, but it helps abstract the initialization of base-class fields from that of subclass fields.
- Refinement types
Rust currently lacks a way to "refine" the type of an enum to indicate the set of variants that it might be i.e. we can't make each variant its own type. Classes allow you to achieve this through inheritance.
Jan-29-2018
Title: Virtual Structs Part 3: Bringing Enums and Structs Together
This post is about a proposal of bridging enums and structs in Rust.
Enums and structs can be generalized into a single concept, i.e. an algebraic data type. An enum can be interpreted as a tree or hierarchy; this idea can be thought of as a type hierarchy.
The `Option` enum:

```rust
enum Option<T> {
    Some(T),
    None,
}
```
can be represented as:
```
enum Option<T>
 |
 +- struct None<T>
 +- struct Some<T>
```
Enum declarations can also be extended with the ability to have fields as well as variants:
```rust
enum TypeData<T> {
    // Common fields:
    id: u32,
    flags: u32,
    ...,

    // Variants:
    Int { },
    Uint { },
    ...
}
```
Enums can also be made unsized, allowing each value to be sized to a particular variant.
One interesting question is whether we can concisely state conditions in which one would prefer to have “precise variant sizes” (class-like) vs “largest variant” (enum). I think the “precise sizes” approach is better when the following apply:
- A recursive type (like a tree), which tends to force boxing anyhow. Examples: the AST or types in the compiler, DOM in servo, a GUI.
- Instances never change what variant they are.
- Potentially wide variance in the sizes of the variants.
It would also be great to use pattern matching as an elegant downcasting mechanism.
Enum types can also be used as bounds, making them more capable and convenient.
Enums could also have an associated struct type that can be used as a constructor allowing initialization of common fields.
Finally, enums could support subtyping and coercion - this is hard to get right.
Jan-30-2018
Title: Taking Rust everywhere with rustup
Cross-compilation allows you to develop on one "host" platform, but get a final binary that runs on a different "target" platform. Rust aims to make this a no-brainer using rustup.
At its heart, rustup is a toolchain manager for Rust. It can download and switch between copies of the Rust compiler and standard library for all supported platforms, and track Rust’s nightly, beta, and release channels, as well as specific versions.
rustup's handy commands:
- `rustup install <toolchain>`: installs toolchains, e.g. `beta`, `nightly` and `stable`; multiple toolchains can coexist
- `rustup show`: shows the installed toolchains
- `rustup update`: updates the toolchains to the latest versions
- `rustup default`: sets the default toolchain
- `rustup target`: adds/removes a target for a toolchain
Jan-31-2018
Title: The Problem With Single-threaded Shared Mutability
This post is about Rust's choice to disallow having multiple mutable aliases to the same data.
Rust uses a read-write-lock pattern to prevent having multiple mutable aliases; the reasons for this are:
- Prevent memory unsafety
Rust enums can cause segfaults with the RWLock pattern.
```rust
enum StringOrInt {
    Str(String),
    Int(i64)
}

let mut x = Str("Hi!".to_string());
let y = &mut x;
if let Str(ref insides) = x { // rejected by the borrow checker
    *y = Int(1);
    println!("x says: {}", insides);
}
```
Here, we invalidated the insides reference because setting x to Int(1) meant that there is no longer a string inside it. However, insides is still a reference to a String, and the generated assembly would try to dereference the memory location where the pointer to the allocated string was, and probably end up trying to dereference 1 or some nearby data instead, and cause a segfault.
- Prevent Iterator invalidation
```rust
let buf = vec![1, 2, 3, 4];

for i in &buf {
    buf.push(i);
}
```
Iterator invalidation involves consuming iterators whilst modifying their underlying dataset. The code snippet above would loop infinitely due to this. Rust prevents this from happening.
Apart from infinite looping, reference invalidation would occur once the vector overflows its capacity and is reallocated.
- Prevent data corruption / data races
Aliasing with mutability in a sufficiently complex, single-threaded program is effectively the same thing as accessing data shared across multiple threads without a lock
The above reasons make it possible to write safe abstractions, even for generic code.
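`RefCell` demonstrates the same read-write-lock rule enforced at runtime instead of compile time: many readers or one writer, checked dynamically (a small sketch):

```rust
use std::cell::RefCell;

fn main() {
    let data = RefCell::new(vec![1, 2, 3]);

    {
        let r1 = data.borrow();
        let r2 = data.borrow(); // two shared borrows at once: fine
        assert_eq!(r1.len() + r2.len(), 6);
    } // shared borrows end here

    data.borrow_mut().push(4); // exclusive borrow: fine now
    assert_eq!(data.borrow().len(), 4);

    // borrow_mut() while a borrow() is live would panic,
    // which is why the fallible try_borrow_mut() exists.
    let r = data.borrow();
    assert!(data.try_borrow_mut().is_err());
    drop(r);
}
```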
Feb-01-2018
Title: Rust Lifetimes for the Uninitialised
This post is about Lifetimes in Rust.
Ownership dictates that every piece of data is exclusively owned by one part of the program. Once ownership ends, the value is dropped. This gives us two points: introduction of the data into the program (initialisation) and removal (dropping). The interesting thing in Rust is that those two points are always clearly present. The range between them is the region in which the binding and its associated value are alive: its lifetime.
Rust has lifetime elision, which makes lifetime annotations unnecessary for common use cases.
A common problem seen in trainings or at Hack & Learns is that, presented with a lifetime problem, people start messing around with lifetime syntax. This is often the wrong approach, and it is always the wrong approach if you don't fully understand what the compiler is calling you out on. It has probably found an issue you haven't thought about that makes a borrow invalid.
Explicit lifetimes are needed wherever the situation is unclear, such as in `std::str::split`, which iteratively hands out references to subslices of a string. Since `split` does not copy the string, lifetime annotations are required to ensure that the string is not dropped while the iterator still exists.
Remember the two golden rules:
- Don't fiddle with lifetime syntax until you understand what the compiler is calling you out on.
- Taking ownership (e.g. through cloning or using Box) isn't cheating.
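A small sketch of how a returned reference ties to the input's lifetime (`first_word` is an illustrative function): elision writes the annotations for us here; spelled out, the signature would be `fn first_word<'a>(s: &'a str) -> &'a str`.

```rust
// The result borrows from `s`, so `s` must outlive it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    let w = first_word(&text);
    // Dropping `text` here would be rejected: `w` still borrows it.
    println!("{}", w);
}
```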
Feb-02-2018
Title: What Are Sum, Product, and Pi Types?
In its essence, a sum type is basically an "or" type. In Rust, they are enums.
```rust
enum Foo {
    Str(String),
    Bool(bool)
}

let foo = Foo::Bool(true);

// "pattern matching"
match foo {
    Foo::Str(s) => { /* do something with string `s` */ }
    Foo::Bool(b) => { /* do something with bool `b` */ }
}
```
They are called sum types because they are the sum of their constituent types. In the above enum, Foo = String + bool.
Product types contain every possible combination of elements of their constituent types.
struct Foo {
x: bool,
y: u8,
}
The set of possible values of Foo is {(x, y): x is a member of bool, y is a member of u8}, i.e. a cartesian product, often represented as Foo = bool * u8.
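The naming can be checked by counting values: a small arithmetic sketch, using the bool and u8 constituents from the struct above.

```rust
// Counting sketch: a product type has |bool| * |u8| possible values,
// while a sum of the same constituents has |bool| + |u8|.
fn main() {
    let bool_count: u32 = 2; // true, false
    let u8_count: u32 = 256; // 0..=255
    // product: struct Foo { x: bool, y: u8 }
    assert_eq!(bool_count * u8_count, 512);
    // sum: enum Foo { Bool(bool), Byte(u8) }
    assert_eq!(bool_count + u8_count, 258);
}
```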
What is a Pi type?
It’s essentially a form of dependent type. A dependent type is when your type can depend on a value.
// (the proposed rust syntax)
fn make_array<const x: u8>() -> Array<bool, x> {
// ...
}
make_array is a function which can accept any integer and return a different type in each case. You can view it as a set of functions, where each function corresponds to a different integer input.
struct make_array {
make_array_0: fn() -> Array<bool, 0>,
make_array_1: fn() -> Array<bool, 1>,
make_array_2: fn() -> Array<bool, 2>,
// ...
}
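Const generics later landed in stable Rust (1.51), so roughly this idea can now be written directly; a sketch, using the built-in array type [bool; N] in place of the post's hypothetical Array<bool, x>:

```rust
// Each choice of N gives `make_array` a different return type,
// mirroring the "set of functions" view above.
fn make_array<const N: usize>() -> [bool; N] {
    [false; N]
}

fn main() {
    let three = make_array::<3>();
    let five: [bool; 5] = make_array(); // N inferred from the annotation
    assert_eq!(three.len(), 3);
    assert_eq!(five.len(), 5);
}
```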
Feb-03-2018
Title: Mentally Modelling Modules
The module and import system in Rust is sometimes confusing to deal with; this post explains one right way of modelling it.
Two rules for resolving confusion on how imports work:
- use foo::bar::baz resolves foo relative to the root module (lib.rs or main.rs); you can resolve relative to the current module by explicitly using use self::foo::bar::baz
- foo::bar::baz within your code resolves foo relative to the current module; you can resolve relative to the root by explicitly using ::foo::bar::baz
How the import system works can be summed up as:
- extern crate and use act as if they were defining the imported item in the current module, like a symbolic link
- use foo::bar::baz resolves the path relative to the root module
- foo::bar::baz in an inline path (i.e. not in a use) resolves relative to the current module
- ::foo::bar::baz always resolves relative to the root module
- self::foo::bar::baz always resolves relative to the current module
- super::foo::bar::baz always resolves relative to the parent module
Basic rules on module privacy:
- If you can access a module, you can access all its public contents.
- A module can access its child modules, but not recursively.
- A child module can access its parent module (and its parents) and all their contents.
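The path rules can be sketched in a single file; the module and function names below are made up for illustration:

```rust
mod outer {
    pub fn greet() -> &'static str { "hi" }

    pub mod inner {
        pub fn hello() -> &'static str { "hello" }

        pub fn via_super() -> &'static str {
            super::greet() // resolves relative to the parent module (`outer`)
        }
    }

    pub fn via_self() -> &'static str {
        self::inner::hello() // resolves relative to the current module
    }
}

fn main() {
    // a plain path used here resolves relative to the root module:
    assert_eq!(outer::inner::hello(), "hello");
    assert_eq!(outer::inner::via_super(), "hi");
    assert_eq!(outer::via_self(), "hello");
}
```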
Feb-04-2018
Title: Rust Tidbits: What Is a Lang Item?
Lang items are a way for the stdlib (and libcore) to define types, traits, functions and other items which the compiler needs to know about.
The rustc compiler has certain pluggable operations, that is, functionality that isn't hard-coded into the language but is implemented in libraries, with a special marker to tell the compiler it exists. The marker is the attribute #[lang = "..."], and there are various different values of "...", i.e. various different lang items.
For example, the expression x + y is desugared into Add::add(x, y), where the Add trait is marked as a lang item using #[lang = "add"]. The compiler can then call Add::add whenever it encounters the addition operator.
#[lang = "add"]
#[stable(feature = "rust1", since = "1.0.0")]
pub trait Add<RHS=Self> {
...
}
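Since Add::add is an ordinary trait method, the desugaring can be observed directly:

```rust
use std::ops::Add;

fn main() {
    let (x, y) = (1u8, 3u8);
    // `x + y` is sugar for a call to the `Add` lang item's method:
    assert_eq!(x + y, Add::add(x, y));
}
```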
Lang items are usually only required when you perform an operation which needs them. Most lang items are used in the standard library and thus will be available in your program, such as the add item.
Skipping the standard library using #![no_std]
will usually trigger
compiler errors about required lang items.
#![no_std]
fn main() {
let x = 1 + 3;
}
errors out with:
error: language item required, but not found: `panic_fmt`
error: language item required, but not found: `eh_personality`
Basically, whenever the compiler needs to use special treatment with an item – whether it be dispatching calls to functions and trait methods in various situations, conferring special semantics to types/traits, or requiring traits to be implemented, the type will be defined in the standard library (libstd, libcore, or one of the crates behind the libstd façade), and marked as a lang item.
Feb-15-2018
Title: Some notes on Send and Sync
There are two parts to Rust's support for concurrency:
- ownership and lifetimes
- the Send and Sync traits

These traits capture and control the two most common ways a piece of data can be accessed and thrown around by threads, dictating whether it is safe to transfer ownership or pass a reference into another thread.
They are marker traits (they have no methods and inherently provide no functionality) that stand for certain invariants which implementing types are expected to fulfill.
These invariants are:
- if type T is Send, passing it by value to another thread should be safe
- if type T is Sync, passing its reference &T to another thread should be safe
These two traits enable a lot of useful concurrency and parallel patterns to be expressed while guaranteeing memory safety.
If a type T implements both Sync and Copy, then it can also implement Send.
Transferring a &mut T between threads is guaranteed to be safe if T implements Send.
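A sketch of these invariants in practice: Arc<Vec<i32>> is Send, so a clone of it can be moved into another thread, whereas the non-atomic Rc is neither Send nor Sync and would be rejected by the compiler in the same position.

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3]);
    let cloned = Arc::clone(&data);
    // `Arc<Vec<i32>>` is Send, so moving it into the thread is allowed;
    // replacing Arc with Rc here would fail to compile.
    let handle = thread::spawn(move || cloned.iter().sum::<i32>());
    assert_eq!(handle.join().unwrap(), 6);
    assert_eq!(data.len(), 3); // the original handle is still usable
}
```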
Feb-16-2018
Title: The Option Type
Optional types are basically compiler-enforced null-checks. If a method returns an Option, any caller of that method must deal with it in some way or the code won't compile. As a bonus, the Option type in the method's signature is a dead giveaway to developers that the method may return null.
fn get_shortest(names: Vec<&str>) -> Option<&str> {
if names.len() > 0 {
let mut shortest = names[0];
for name in names.iter() {
if name.len() < shortest.len() {
shortest = *name;
}
}
Some(shortest)
} else {
None
}
}
The function above returns an Option which encloses either a &str or None.
To get the enclosed value out of the Option, we can use its unwrap method; this will however panic if None is received.
A better way is to use a match
statement for unwrapping:
fn do_calculation(names: Vec<&str>) -> f64 {
match get_shortest(names) {
Some(x) => x.len() as f64 * std::f64::consts::PI,
None => 0.0, // handle the None case here, e.g. with a default
}
}
... optionals are a nice tool to have. They communicate that a null return is possible, and force developers to handle that possibility one way or another.
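Besides match, Option also offers combinator methods; a sketch of the same handling using map and unwrap_or, with get_shortest condensed into an iterator call:

```rust
fn get_shortest(names: Vec<&str>) -> Option<&str> {
    // equivalent to the loop in the post, condensed via min_by_key
    names.iter().copied().min_by_key(|name| name.len())
}

fn main() {
    let area = get_shortest(vec!["Alice", "Bo"])
        .map(|s| s.len() as f64 * std::f64::consts::PI)
        .unwrap_or(0.0); // default used when get_shortest returns None
    assert!((area - 2.0 * std::f64::consts::PI).abs() < 1e-9);
    assert_eq!(get_shortest(vec![]), None);
}
```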
Feb-19-2018
Title: The Sized Trait
Sized is a marker trait that is automatically implemented. A type is considered sized if the precise size of a value of that type is known and fixed at compile time once the real types of its type parameters are known, i.e. after monomorphisation.
For example, u8 is one byte and Vec<T> is 12 or 24 bytes (on 32- vs 64-bit platforms).
Unsized types are known as dynamically sized types (DSTs) and include traits (Trait) and slices ([T]).
A slice represents an unknown number of Ts contiguous in memory. A Trait represents a value of any type that implements Trait, and these values have varying sizes.
Unsized values must always be behind a pointer: &[T]
or Box<Trait>
. The pointer has the info
required to compute the size.
Sized types have more flexibility: the compiler can manipulate them directly. Having an unsized type behind a pointer effectively makes it sized - its size is that of the pointer.
Since Sized is a default bound on type parameters, the special syntax ?Sized is used to opt out.
This unusual default was chosen because of the increased flexibility of sized types, and some data (which I now can’t find in the issue tracker) indicating that most type parameters needed to be sized.
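A sketch of the points above: size_of on a sized type, a fat pointer to an unsized one, and the ?Sized opt-out on a generic function (byte_len is a made-up helper):

```rust
use std::mem;

// Without `?Sized`, T would get the implicit `Sized` bound and
// `T = str` would be rejected at the call site below.
fn byte_len<T: ?Sized + AsRef<[u8]>>(value: &T) -> usize {
    value.as_ref().len()
}

fn main() {
    // sized: size known and fixed at compile time
    assert_eq!(mem::size_of::<u8>(), 1);
    // a reference to a slice is a fat pointer: data pointer + length,
    // which is the info needed to compute the size
    assert_eq!(mem::size_of::<&[u8]>(), 2 * mem::size_of::<usize>());
    let s: &str = "hello";
    assert_eq!(byte_len(s), 5); // T = str, an unsized type
}
```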