Notes: TAO

Notes from Facebook's TAO white paper

Objects table

id: 1234  
otype: PERSON  
data: {  
  name: John
}
id: 2314  
otype: PERSON  
data: {  
  name: Jane  
}   
id: 6372  
otype: PET  
data: {  
  kind: DOG,  
  name: wolfie  
}
id: 7265  
otype: HOME  
data: {  
  address: {}  
  bedrooms: 3  
  bathrooms: 2  
} 

Associations table

id: ??  
atype: FRIEND_FAMILY  
from: 1234  
to: 2314  
time: 127457635365  
data: {
  potential-extra-data: ...  
}
id: ??  
atype: HOMES  
from: 1234  
to: 7265  
time: 34346253234  
data: {
    
}

SELECT Objects.data
FROM Objects
INNER JOIN Associations ON Objects.id = Associations."to"
WHERE Associations."from" = 1234
  AND Associations.atype = 'FRIEND_FAMILY'
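The tables and the friend lookup can be tried out in an in-memory SQLite database. This is a minimal sketch under a few assumptions: associations are keyed by (from, atype, to) instead of by a separate id, because TAO allows at most one association of a given type between two objects; the data payloads are stored as JSON text; and "from"/"to" are double-quoted because they are SQL keywords.

```python
import json
import sqlite3

# In-memory database; table and column names mirror the notes above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Objects (
    id    INTEGER PRIMARY KEY,
    otype TEXT NOT NULL,
    data  TEXT              -- JSON-serialized key/value payload (assumption)
);
CREATE TABLE Associations (
    "from" INTEGER NOT NULL,
    atype  TEXT NOT NULL,
    "to"   INTEGER NOT NULL,
    time   INTEGER NOT NULL,
    data   TEXT,
    PRIMARY KEY ("from", atype, "to")  -- assumed key instead of an id column
);
""")

conn.executemany(
    "INSERT INTO Objects (id, otype, data) VALUES (?, ?, ?)",
    [
        (1234, "PERSON", json.dumps({"name": "John"})),
        (2314, "PERSON", json.dumps({"name": "Jane"})),
        (7265, "HOME", json.dumps({"bedrooms": 3, "bathrooms": 2})),
    ],
)
conn.executemany(
    'INSERT INTO Associations ("from", atype, "to", time, data) VALUES (?, ?, ?, ?, ?)',
    [
        (1234, "FRIEND_FAMILY", 2314, 127457635365, "{}"),
        (1234, "HOMES", 7265, 34346253234, "{}"),
    ],
)

# The join: payloads of every object John (1234) points to via FRIEND_FAMILY.
rows = conn.execute("""
SELECT Objects.data
FROM Objects
INNER JOIN Associations ON Objects.id = Associations."to"
WHERE Associations."from" = ? AND Associations.atype = ?
""", (1234, "FRIEND_FAMILY")).fetchall()

friends = [json.loads(data) for (data,) in rows]
print(friends)  # [{'name': 'Jane'}]
```

The HOMES edge is filtered out by the atype predicate, so only Jane's object comes back.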

API Reads

Point queries

obj_get

assoc_get

Range queries

assoc_range

assoc_time_range

Count queries

assoc_count

API Writes

Create, update, delete Objects

obj_add

obj_update

obj_del

Add, delete, and change type for Associations

assoc_add

assoc_delete

assoc_change_type

assoc_add(id1, atype, id2, time, (k→v)*)

Adds or overwrites the association (id1, atype, id2), and its inverse (id2, inv(atype), id1) if defined.

assoc_delete(id1, atype, id2)

Deletes the association (id1, atype, id2) and the inverse if it exists.
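A minimal in-memory sketch of how assoc_add and assoc_delete keep both directions of an edge in step. The INVERSE table and the HOME_OF type are hypothetical, and copying the forward edge's time and data onto the inverse is an assumption made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical inverse table: FRIEND_FAMILY is symmetric; HOME_OF is an
# invented inverse for HOMES. Types with no entry have no inverse.
INVERSE: Dict[str, str] = {
    "FRIEND_FAMILY": "FRIEND_FAMILY",
    "HOMES": "HOME_OF",
    "HOME_OF": "HOMES",
}

# In-memory edge store: (id1, atype, id2) -> (time, data).
edges: Dict[Tuple[int, str, int], Tuple[int, dict]] = {}


def assoc_add(id1: int, atype: str, id2: int, time: int,
              data: Optional[dict] = None) -> None:
    """Add or overwrite (id1, atype, id2) and, if defined, (id2, inv(atype), id1)."""
    edges[(id1, atype, id2)] = (time, dict(data or {}))
    inv = INVERSE.get(atype)
    if inv is not None:
        # Assumption: the inverse edge copies the forward edge's time and data.
        edges[(id2, inv, id1)] = (time, dict(data or {}))


def assoc_delete(id1: int, atype: str, id2: int) -> None:
    """Delete (id1, atype, id2) and its inverse if one exists."""
    edges.pop((id1, atype, id2), None)
    inv = INVERSE.get(atype)
    if inv is not None:
        edges.pop((id2, inv, id1), None)


assoc_add(1234, "FRIEND_FAMILY", 2314, 127457635365)
print(sorted(edges))  # both (1234, ..., 2314) and (2314, ..., 1234)
assoc_delete(1234, "FRIEND_FAMILY", 2314)
print(edges)  # {}
```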

assoc_change_type(id1, atype, id2, newtype)

Changes the association (id1, atype, id2) to (id1, newtype, id2), if (id1, atype, id2) exists.

assoc_get(id1, atype, id2set, high?, low?)

returns all of the associations (id1, atype, id2) and their time and data, where id2 ∈ id2set and high ≥ time ≥ low (if specified). The optional time bounds are there to improve cacheability for large association lists.
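The assoc_get semantics (membership in id2set plus optional inclusive time bounds) can be sketched against a plain dictionary. The store and the comment ids 771 to 773 are hypothetical; only the filtering logic follows the description.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical store keyed by (id1, atype, id2) -> (time, data).
edges: Dict[Tuple[int, str, int], Tuple[int, dict]] = {
    (632, "COMMENT", 771): (100, {"text": "nice"}),
    (632, "COMMENT", 772): (200, {"text": "wow"}),
    (632, "COMMENT", 773): (300, {"text": "cool"}),
}


def assoc_get(id1: int, atype: str, id2set: Iterable[int],
              high: Optional[int] = None, low: Optional[int] = None
              ) -> List[Tuple[int, int, dict]]:
    """Return (id2, time, data) for each id2 in id2set whose edge exists
    and satisfies high >= time >= low; a missing bound is not checked."""
    result = []
    for id2 in id2set:
        edge = edges.get((id1, atype, id2))
        if edge is None:
            continue  # no such association
        time, data = edge
        if high is not None and time > high:
            continue
        if low is not None and time < low:
            continue
        result.append((id2, time, data))
    return result


print([e[0] for e in assoc_get(632, "COMMENT", [771, 772, 999], high=250)])  # [771, 772]
```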

assoc_count(id1, atype)

returns the size of the association list for (id1, atype), which is the number of edges of type atype that originate at id1.

assoc_range(id1, atype, pos, limit)

returns elements of the (id1, atype) association list with index i ∈ [pos, pos + limit).
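A short sketch of assoc_count and assoc_range over one in-memory association list. It assumes the list is kept sorted by time, newest first, which is what lets assoc_range(..., 0, 50) return the 50 most recent edges; the list contents are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical association lists keyed by (id1, atype); each list holds
# (id2, time, data) tuples sorted by time, newest first.
assoc_lists: Dict[Tuple[int, str], List[Tuple[int, int, dict]]] = {
    (632, "COMMENT"): [(773, 300, {}), (772, 200, {}), (771, 100, {})],
}


def assoc_count(id1: int, atype: str) -> int:
    """Number of edges of type atype that originate at id1."""
    return len(assoc_lists.get((id1, atype), []))


def assoc_range(id1: int, atype: str, pos: int,
                limit: int) -> List[Tuple[int, int, dict]]:
    """Elements with index i in [pos, pos + limit); assumes pos, limit >= 0."""
    return assoc_lists.get((id1, atype), [])[pos:pos + limit]


print(assoc_count(632, "COMMENT"))        # 3
print(assoc_range(632, "COMMENT", 0, 2))  # [(773, 300, {}), (772, 200, {})]
```

The half-open interval maps directly onto Python slicing, so a range past the end of the list simply returns fewer elements.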

assoc_time_range(id1, atype, high, low, limit)

returns elements from the (id1, atype) association list, starting with the first association where time ≤ high, returning only edges where time ≥ low.

TAO enforces a per-atype upper bound (typically 6,000) on the limit used in an association query.
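assoc_time_range and the limit bound can be sketched together over a newest-first list: skip edges newer than high, stop at the first edge older than low, and clamp the caller's limit. The single MAX_LIMIT constant stands in for the per-atype bound and is an assumption, as are the list contents.

```python
from typing import Dict, List, Tuple

MAX_LIMIT = 6000  # stand-in for the "typically 6,000" per-atype bound (assumed fixed)

# Hypothetical association lists, each sorted by time, newest first.
assoc_lists: Dict[Tuple[int, str], List[Tuple[int, int, dict]]] = {
    (534, "CHECKIN"): [(5, 500, {}), (4, 400, {}), (3, 300, {}),
                       (2, 200, {}), (1, 100, {})],
}


def assoc_time_range(id1: int, atype: str, high: int, low: int,
                     limit: int) -> List[Tuple[int, int, dict]]:
    limit = min(limit, MAX_LIMIT)  # clamp the requested limit to the bound
    if limit <= 0:
        return []
    result = []
    for id2, time, data in assoc_lists.get((id1, atype), []):
        if time > high:
            continue  # skip edges newer than high
        if time < low:
            break     # newest-first order: every later edge is older still
        result.append((id2, time, data))
        if len(result) == limit:
            break
    return result


print([e[0] for e in assoc_time_range(534, "CHECKIN", 450, 150, 10)])  # [4, 3, 2]
```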

“50 most recent comments on Alice’s checkin”

assoc_range(632, COMMENT, 0, 50)

“How many checkins at the GG Bridge?”

assoc_count(534, CHECKIN)

The TAO API is mapped to a small set of simple SQL queries.
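As an illustration of that mapping, assoc_range and assoc_count can each be expressed as one parameterized query. These SQL statements are hypothetical and are not taken from TAO; the schema follows the Associations table in these notes, and the plain COUNT(*) is a simplification for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Associations (
    "from" INTEGER NOT NULL, atype TEXT NOT NULL, "to" INTEGER NOT NULL,
    time INTEGER NOT NULL, data TEXT,
    PRIMARY KEY ("from", atype, "to")
)""")
# Index serving "newest first within one (from, atype) list".
conn.execute('CREATE INDEX assoc_by_time ON Associations ("from", atype, time)')
conn.executemany(
    "INSERT INTO Associations VALUES (?, ?, ?, ?, ?)",
    [(632, "COMMENT", 771, 100, "{}"),
     (632, "COMMENT", 772, 200, "{}"),
     (632, "COMMENT", 773, 300, "{}")],
)


def assoc_range(id1, atype, pos, limit):
    """Hypothetical SQL for index range [pos, pos + limit), newest first."""
    return conn.execute(
        'SELECT "to", time, data FROM Associations '
        'WHERE "from" = ? AND atype = ? ORDER BY time DESC LIMIT ? OFFSET ?',
        (id1, atype, limit, pos),
    ).fetchall()


def assoc_count(id1, atype):
    """Hypothetical SQL for the size of the (id1, atype) list."""
    (n,) = conn.execute(
        'SELECT COUNT(*) FROM Associations WHERE "from" = ? AND atype = ?',
        (id1, atype),
    ).fetchone()
    return n


print(assoc_range(632, "COMMENT", 0, 2))  # [(773, 300, '{}'), (772, 200, '{}')]
print(assoc_count(632, "COMMENT"))        # 3
```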

It is important to consider the data accesses that don't use the API.

These include backups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging.

By default all object types are stored in one table, and all association types in another.

The TAO in-memory cache contains objects, association lists, and association counts. We fill the cache on demand and evict items using a least recently used (LRU) policy.
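A minimal sketch of a fill-on-demand cache with LRU eviction, built on collections.OrderedDict. The class name, the load callback, and the key shapes are hypothetical; the point is only the policy: a miss loads from the backing store, a hit refreshes recency, and overflow evicts the least recently used entry.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    def __init__(self, capacity: int, load: Callable[[Hashable], Any]):
        self.capacity = capacity
        self.load = load            # called on a miss to fetch from the database
        self.items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key in self.items:
            self.items.move_to_end(key)   # hit: mark as most recently used
            return self.items[key]
        value = self.load(key)            # miss: fill on demand
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
        return value


# Keys could distinguish objects, association lists, and counts, e.g.
# ("obj", 1234) or ("assoc_count", 534, "CHECKIN") (hypothetical shapes).
cache = LRUCache(2, load=lambda key: f"row for {key}")
cache.get(("obj", 1234))
cache.get(("obj", 2314))
cache.get(("obj", 1234))   # refreshes 1234
cache.get(("obj", 7265))   # evicts 2314, the least recently used
print(list(cache.items))   # [('obj', 1234), ('obj', 7265)]
```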

If, for any reason, an inverse association is not fully created, the resulting hanging association is scheduled for repair by an asynchronous job.
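The repair path can be sketched as a queue of forward edges whose inverse write failed, drained later by a separate job. Everything here is hypothetical: the simulated failure flag, the queue, and run_repair_job, which a real system would run in the background rather than call directly.

```python
from collections import deque
from typing import Deque, Dict, Tuple

INVERSE: Dict[str, str] = {"FRIEND_FAMILY": "FRIEND_FAMILY"}  # hypothetical
edges: Dict[Tuple[int, str, int], int] = {}                  # key -> time
repair_queue: Deque[Tuple[int, str, int]] = deque()


def write_edge(key: Tuple[int, str, int], time: int, fail: bool = False) -> None:
    if fail:
        raise OSError("simulated failure while writing the inverse")
    edges[key] = time


def assoc_add(id1: int, atype: str, id2: int, time: int,
              fail_inverse: bool = False) -> None:
    edges[(id1, atype, id2)] = time
    inv = INVERSE.get(atype)
    if inv is None:
        return
    try:
        write_edge((id2, inv, id1), time, fail=fail_inverse)
    except OSError:
        repair_queue.append((id1, atype, id2))  # hanging edge: schedule repair


def run_repair_job() -> None:
    """Recreate missing inverses for queued forward edges that still exist."""
    while repair_queue:
        id1, atype, id2 = repair_queue.popleft()
        time = edges.get((id1, atype, id2))
        if time is not None:  # the forward edge may have been deleted meanwhile
            edges[(id2, INVERSE[atype], id1)] = time


assoc_add(1234, "FRIEND_FAMILY", 2314, 10, fail_inverse=True)
print((2314, "FRIEND_FAMILY", 1234) in edges)  # False: inverse is missing
run_repair_job()
print((2314, "FRIEND_FAMILY", 1234) in edges)  # True: repaired
```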