Notes: TAO
Notes from Facebook's TAO white paper
Objects table

id: 1234
otype: PERSON
data: {
  name: John
}

id: 2314
otype: PERSON
data: {
  name: Jane
}

id: 6372
otype: PET
data: {
  kind: DOG,
  name: Wolfie
}

id: 7265
otype: HOME
data: {
  address: {},
  bedrooms: 3,
  bathrooms: 2
}
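Associations table
Associations have no id of their own. Each one is identified by the triple (from, atype, to), and at most one association of a given atype exists between two objects.

atype: FRIEND_FAMILY
from: 1234
to: 2314
time: 127457635365
data: {
  potential-extra-data: ...
}

atype: HOMES
from: 1234
to: 7265
time: 34346253234
data: {}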
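The two tables can be modeled in a few lines of Python. This is a hypothetical in-memory sketch: the class names `Obj` and `Assoc` are made up, and `from`/`to` are renamed `id1`/`id2` (the names used in the API signatures) because `from` is a Python keyword. Following the paper, an association is keyed by its (id1, atype, id2) triple rather than by an id, so writing the same triple twice overwrites instead of duplicating.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Obj:
    id: int
    otype: str
    data: dict[str, Any] = field(default_factory=dict)

@dataclass
class Assoc:
    id1: int    # "from" in the table notes
    atype: str
    id2: int    # "to" in the table notes
    time: int
    data: dict[str, Any] = field(default_factory=dict)

    @property
    def key(self) -> tuple[int, str, int]:
        # No id of its own: the triple is the association's identity.
        return (self.id1, self.atype, self.id2)

# Objects table: one map from id to object, regardless of otype.
objects = {
    o.id: o
    for o in [
        Obj(1234, "PERSON", {"name": "John"}),
        Obj(2314, "PERSON", {"name": "Jane"}),
        Obj(6372, "PET", {"kind": "DOG", "name": "Wolfie"}),
        Obj(7265, "HOME", {"address": {}, "bedrooms": 3, "bathrooms": 2}),
    ]
}

# Associations table: keyed by (id1, atype, id2).
associations: dict[tuple[int, str, int], Assoc] = {}
for a in [
    Assoc(1234, "FRIEND_FAMILY", 2314, 127457635365),
    Assoc(1234, "HOMES", 7265, 34346253234),
]:
    associations[a.key] = a
```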
SQL equivalent of “the objects John (1234) links to via FRIEND_FAMILY”:
SELECT Objects.data
FROM Objects INNER JOIN Associations ON Objects.id = Associations."to"
WHERE Associations."from" = 1234 AND Associations.atype = 'FRIEND_FAMILY'
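A runnable sketch of that join using Python's built-in `sqlite3` module. The example rows are loaded into tables whose columns follow the table notes (`from`, `to`, and the atype spelled `FRIEND_FAMILY`). SQLite stands in here for the MySQL backend TAO actually uses, and because `from` and `to` are SQL keywords they are double-quoted. Storing `data` as a JSON string is an assumption made for the sketch.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Objects (id INTEGER PRIMARY KEY, otype TEXT, data TEXT)")
conn.execute(
    'CREATE TABLE Associations ("from" INTEGER, atype TEXT, "to" INTEGER, '
    'time INTEGER, data TEXT, PRIMARY KEY ("from", atype, "to"))'
)
conn.executemany(
    "INSERT INTO Objects VALUES (?, ?, ?)",
    [
        (1234, "PERSON", json.dumps({"name": "John"})),
        (2314, "PERSON", json.dumps({"name": "Jane"})),
        (6372, "PET", json.dumps({"kind": "DOG", "name": "Wolfie"})),
        (7265, "HOME", json.dumps({"address": {}, "bedrooms": 3, "bathrooms": 2})),
    ],
)
conn.executemany(
    "INSERT INTO Associations VALUES (?, ?, ?, ?, ?)",
    [
        (1234, "FRIEND_FAMILY", 2314, 127457635365, "{}"),
        (1234, "HOMES", 7265, 34346253234, "{}"),
    ],
)

# Join each association's "to" end back onto the Objects table.
rows = conn.execute(
    "SELECT Objects.data FROM Objects "
    'INNER JOIN Associations ON Objects.id = Associations."to" '
    'WHERE Associations."from" = ? AND Associations.atype = ?',
    (1234, "FRIEND_FAMILY"),
).fetchall()
friends = [json.loads(data)["name"] for (data,) in rows]
print(friends)  # ['Jane']
```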
API Reads
  Point queries
    obj_get
    assoc_get
  Range queries
    assoc_range
    assoc_time_range
  Count queries
    assoc_count
API Writes
  Create, update, delete objects
    obj_add
    obj_update
    obj_del
  Add, delete, and change the type of associations
    assoc_add
    assoc_delete
    assoc_change_type
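The object operations listed above can be illustrated with a hypothetical in-memory store. Only the operation names come from these notes. The id allocator (a simple counter) and the update semantics (merge the new key/value pairs into the existing data) are assumptions for the sketch, not TAO's behavior.

```python
import itertools
from typing import Any, Optional

class ObjectStore:
    """Hypothetical in-memory stand-in for the object API."""

    def __init__(self) -> None:
        self._objs: dict[int, tuple[str, dict[str, Any]]] = {}
        self._ids = itertools.count(1)  # assumption: ids from a counter

    def obj_add(self, otype: str, data: dict[str, Any]) -> int:
        # Allocate a new id and store the object under it.
        oid = next(self._ids)
        self._objs[oid] = (otype, dict(data))
        return oid

    def obj_get(self, oid: int) -> Optional[tuple[str, dict[str, Any]]]:
        return self._objs.get(oid)

    def obj_update(self, oid: int, data: dict[str, Any]) -> bool:
        # Assumption: merge new key/value pairs into the existing data.
        if oid not in self._objs:
            return False
        self._objs[oid][1].update(data)
        return True

    def obj_del(self, oid: int) -> bool:
        return self._objs.pop(oid, None) is not None
```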
Function Signatures
assoc_add(id1, atype, id2, time, (k→v)*)
Adds or overwrites the association (id1, atype, id2), and its inverse (id2, inv(atype), id1) if defined.
assoc_delete(id1, atype, id2)
Deletes the association (id1, atype, id2) and the inverse if it exists.
assoc_change_type(id1, atype, id2, newtype)
Changes the association (id1, atype, id2) to (id1, newtype, id2), if (id1, atype, id2) exists.
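The three write operations can be sketched over a single dict keyed by (id1, atype, id2). The `INVERSE` table is hypothetical: inverse types are configured per atype in TAO, and `HOME_OF` is a made-up inverse for `HOMES`. The `(k→v)*` pairs become keyword arguments. How `assoc_change_type` treats an inverse is not described in these notes, so this sketch leaves the inverse alone.

```python
from typing import Any

# Hypothetical inverse-type configuration.
INVERSE = {"FRIEND_FAMILY": "FRIEND_FAMILY", "HOMES": "HOME_OF", "HOME_OF": "HOMES"}

Key = tuple[int, str, int]
store: dict[Key, tuple[int, dict[str, Any]]] = {}  # key -> (time, data)

def assoc_add(id1: int, atype: str, id2: int, time: int, **kv: Any) -> None:
    # Add or overwrite the forward edge, then its inverse if one is defined.
    store[(id1, atype, id2)] = (time, dict(kv))
    inv = INVERSE.get(atype)
    if inv is not None:
        store[(id2, inv, id1)] = (time, dict(kv))

def assoc_delete(id1: int, atype: str, id2: int) -> None:
    # Delete the forward edge and its inverse, ignoring ones that are absent.
    store.pop((id1, atype, id2), None)
    inv = INVERSE.get(atype)
    if inv is not None:
        store.pop((id2, inv, id1), None)

def assoc_change_type(id1: int, atype: str, id2: int, newtype: str) -> None:
    # Re-key the edge only if it exists; inverse handling omitted.
    value = store.pop((id1, atype, id2), None)
    if value is not None:
        store[(id1, newtype, id2)] = value
```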
assoc_get(id1, atype, id2set, high?, low?)
returns all of the associations (id1, atype, id2) and their time and data, where id2 ∈ id2set and high ≥ time ≥ low (if specified). The optional time bounds exist to improve cacheability for large association lists.
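A minimal sketch of the `assoc_get` filter. To keep it short, the (id1, atype) lookup is replaced by passing in that association list directly as hypothetical `(id2, time, data)` tuples. `None` for `high` or `low` means the bound was not specified.

```python
from typing import Any, Optional

Edge = tuple[int, int, dict[str, Any]]  # (id2, time, data)

def assoc_get(
    edges: list[Edge],
    id2set: set[int],
    high: Optional[int] = None,
    low: Optional[int] = None,
) -> list[Edge]:
    """Return edges with id2 in id2set and, if given, high >= time >= low."""
    return [
        (id2, t, data)
        for (id2, t, data) in edges
        if id2 in id2set
        and (high is None or t <= high)
        and (low is None or t >= low)
    ]
```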
assoc_count(id1, atype)
returns the size of the association list for (id1, atype), which is the number of edges of type atype that originate at id1.
assoc_range(id1, atype, pos, limit)
returns elements of the (id1, atype) association list, which is kept in descending order of time, with index i ∈ [pos, pos + limit).
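`assoc_count` and `assoc_range` both operate on one association list. The sketch below keeps a hypothetical in-memory list newest first, so `assoc_range` is just a slice over the half-open index range [pos, pos + limit). For brevity, the count is computed from the list and duplicate id2 values are not merged. Neither choice is meant to mirror TAO's storage.

```python
from typing import Any, Optional

Edge = tuple[int, int, dict[str, Any]]  # (id2, time, data)

class AssocList:
    """Hypothetical association list for a single (id1, atype) pair."""

    def __init__(self) -> None:
        self.edges: list[Edge] = []  # descending by time

    def add(self, id2: int, time: int, data: Optional[dict[str, Any]] = None) -> None:
        self.edges.append((id2, time, data or {}))
        # Re-sort newest first; a real store would maintain an index instead.
        self.edges.sort(key=lambda e: e[1], reverse=True)

    def assoc_count(self) -> int:
        # Number of edges of this atype that originate at id1.
        return len(self.edges)

    def assoc_range(self, pos: int, limit: int) -> list[Edge]:
        # Elements with index i in [pos, pos + limit) of the newest-first list.
        return self.edges[pos : pos + limit]
```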
assoc_time_range(id1, atype, high, low, limit)
returns at most limit elements from the (id1, atype) association list, starting with the first association where time ≤ high, returning only edges where time ≥ low.
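Because the list is sorted newest first, `assoc_time_range` can be one forward scan. It skips edges newer than `high`, collects edges until `limit` is reached, and stops at the first edge older than `low`. This is a sketch over a hypothetical in-memory list, not TAO's implementation.

```python
from typing import Any

Edge = tuple[int, int, dict[str, Any]]  # (id2, time, data), newest first

def assoc_time_range(edges: list[Edge], high: int, low: int, limit: int) -> list[Edge]:
    out: list[Edge] = []
    for edge in edges:
        t = edge[1]
        if t > high:
            continue  # not yet at the first edge with time <= high
        if t < low or len(out) >= limit:
            break     # sorted descending, so no later edge qualifies
        out.append(edge)
    return out
```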
More details
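TAO enforces a per-atype upper bound (typically 6,000) on the limit actually used for an association query. To read a longer association list, a client must issue multiple queries, using pos or high to specify where each one starts.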
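The cap and the client-side paging it forces can be illustrated with two hypothetical functions. `server_assoc_range` clamps `limit` to `MAX_LIMIT`, and `fetch_all` advances `pos` until a short page signals the end of the list. Plain ints stand in for edges, and 6,000 is only the typical value from the notes. The real bound is set per atype.

```python
MAX_LIMIT = 6000  # "typically 6,000"; the real bound is configured per atype

def server_assoc_range(edges: list[int], pos: int, limit: int) -> list[int]:
    # The server silently clamps the requested limit.
    return edges[pos : pos + min(limit, MAX_LIMIT)]

def fetch_all(edges: list[int]) -> list[int]:
    # Client pages through the list by advancing pos past each page.
    out: list[int] = []
    pos = 0
    while True:
        page = server_assoc_range(edges, pos, MAX_LIMIT)
        out.extend(page)
        if len(page) < MAX_LIMIT:
            return out
        pos += len(page)
```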
“50 most recent comments on Alice’s checkin” ⇒ assoc_range(632, COMMENT, 0, 50), where 632 is the checkin’s id
“How many checkins at the GG Bridge?” ⇒ assoc_count(534, CHECKIN), where 534 is the GG Bridge’s id
The TAO API is mapped to a small set of simple SQL queries.
It is also important to consider data accesses that don't go through the API.
These include backups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging.
By default all object types are stored in one table, and all association types in another.
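The notes don't give the actual SQL, so the following is only an illustrative guess at how one objects table and one associations table could serve the API. SQLite stands in for MySQL, and the table names, column names, and index are assumptions. Keying on (id1, atype, id2) and indexing (id1, atype, time) lets `assoc_range` become a single ORDER BY / LIMIT / OFFSET query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE objects (id INTEGER PRIMARY KEY, otype TEXT, data TEXT);
    CREATE TABLE assocs (
        id1 INTEGER, atype TEXT, id2 INTEGER, time INTEGER, data TEXT,
        PRIMARY KEY (id1, atype, id2)
    );
    -- Lets a range query over one (id1, atype) list walk the index by time.
    CREATE INDEX assocs_by_time ON assocs (id1, atype, time);
    """
)

def assoc_range(id1: int, atype: str, pos: int, limit: int) -> list[tuple[int, int]]:
    # Newest first; OFFSET pos LIMIT limit selects indices [pos, pos + limit).
    return conn.execute(
        "SELECT id2, time FROM assocs WHERE id1 = ? AND atype = ? "
        "ORDER BY time DESC LIMIT ? OFFSET ?",
        (id1, atype, limit, pos),
    ).fetchall()

# Example data: 100 comments on object 632, with times 0..99.
conn.executemany(
    "INSERT INTO assocs VALUES (?, ?, ?, ?, '{}')",
    [(632, "COMMENT", 1000 + i, i) for i in range(100)],
)
```

With this data, `assoc_range(632, "COMMENT", 0, 50)` returns the 50 comments with the highest times.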
The TAO in-memory cache contains objects, association lists, and association counts. TAO fills the cache on demand and evicts items using a least recently used (LRU) policy.
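Fill-on-demand with LRU eviction can be sketched with `collections.OrderedDict`. A miss calls a loader and inserts the result, a hit moves the key to the most-recent end, and overflow pops the least recently used entry. The `load` callback and the idea of using keys such as `("obj", 1234)` or `("count", 534, "CHECKIN")` to hold all three item kinds in one cache are assumptions for illustration.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any

class LRUCache:
    """Minimal read-through LRU cache; misses are filled from `load`."""

    def __init__(self, capacity: int, load: Callable[[Hashable], Any]) -> None:
        self.capacity = capacity
        self.load = load  # hypothetical backing-store lookup
        self.items: OrderedDict = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.load(key)  # fill on demand
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return value
```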
If an inverse association is not created (for example, because a failure interrupts the write), the resulting hanging association is scheduled for repair by an asynchronous job.
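A hypothetical repair pass might scan for forward edges whose inverse is missing and recreate that inverse. The notes don't say whether repair recreates the inverse or removes the forward edge, so recreating it is an assumption here, as is the `INVERSE` table (including the made-up `HOME_OF`).

```python
# Hypothetical inverse-type configuration.
INVERSE = {"HOMES": "HOME_OF", "HOME_OF": "HOMES", "FRIEND_FAMILY": "FRIEND_FAMILY"}

Key = tuple[int, str, int]

def repair_hanging(store: dict[Key, int]) -> list[Key]:
    """store maps (id1, atype, id2) -> time; returns the inverses it created."""
    created: list[Key] = []
    # Iterate over a snapshot because the loop inserts into store.
    for (id1, atype, id2), time in list(store.items()):
        inv = INVERSE.get(atype)
        if inv is not None and (id2, inv, id1) not in store:
            store[(id2, inv, id1)] = time
            created.append((id2, inv, id1))
    return created
```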