Database Theory
- ACID
- Atomicity: Either all operations complete or nothing completes
- Consistency: Guarantees database invariants such as constraints, cascades and/or triggers
- Isolation: concurrent execution of multiple transactions produce same results irrespective of order in which they might get executed
- Durability: Once a transaction commits, it does not get lost due to a alter failure (written to disk)
- MVCC, alternative to locking, lets readers access previous, un-modified version of data that is being modified by writer
- Optimistic locking uses versioning to prevent Lost Update phenomenon
- Caches in disk controllers and disks can be write-through (writes are sent through) or write-back (writes are sent later on)
- Databases cannot guarantee integrity (such as that of commit or WAL) if there are write-back caches
- Every attribute depends on the key 1NF, the whole key 2NF and nothing but the key 3NF
- 1NF: Atomic Values in each cell (no repeating groups)
- 2NF: No partial key dependencies
- 3NF: No transitive dependencies
- BCNF (3.5NF): For every non-trivial FD \(X \to Y\), \(X\) is a superkey
- 3NF only mandates for non-key attributes that for every non-trivial FD \(X \to A\), \(X\) is superkey and \(A\) is a non-prime attribute
- For relations that don't have overlapping superkeys, it's same as 3NF.
- 3NF makes distinction between key and non-key attributes, whereas BCNF doesn't
- 3NF allows part of super-key to depend on some other part of super-key, whereas BCNF doesn't
- 4NF: Every multi-value dependency has a superkey
- 5NF: (Project-Join NF, PJNF) isolate semantically related multi-valued facts into separate tables
Relational Theory
- Superkey is a set of attribute that uniquely identifies a tuple
- Minimal superkey is minimum set of attributes that forms a superkey, defines a candidate key
- Trivial superkey is set all attributes in any relation
- Functional Dependency (): A set of attributes \(X\) determines the value of another set of attributes \(Y: X \to Y\)
- FD: \(X \to Y\) is a trivial FD, if \(Y\) is subset of \(X\)
- Prime attribute is an attribute found in a candidate key
Concurrency
- Read phenomena
- Dirty Reads: Read operation allowed on uncommitted data
- Non-repeated reads: Locks on Read (
SELECT) operations are released immediately allowing another transaction to modify the data
- Phantom reads: Only read rows are locked, allowing rows created later with the same criteria to be read back
- Isolation levels
| Isolation |
Dirty Read |
Non-repeatable Reads |
Phantom reads |
| Uncommitted |
Yes |
Yes |
Yes |
| Read Committed |
- |
Yes |
Yes |
| Repeatable Read |
- |
- |
Yes |
| Serializable |
- |
- |
- |
- Snapshot Isolation is used in MVCC systems, and prevents all anomalies except Phantom Reads
Distributed Databases
- CAP Theorem: Distributed data store can provide only two of the three:
- Consistency: Every read receives most recent write or an error
- Availability: Every read receives a response -- without the guarantee that it's the most recent
- Partition tolerance: system continues to operate even when network drops some of the messages between nodes
- RAFT: A protocol for distributed, replicated system with one leader and multiple follower nodes
- follower nodes replicate leader node's logs
- Only a majority (not all) of follower nodes are required for any operation to be successful
- nodes elect a leader when starting or when doesn't receive a heartbeat (append message)
- log replication
- client sends changes to leader
- leader updates log and sends log entry to followers on the next heartbeat
- if majority of followers acknowledge it, it's committed
- response is sent back to the client
- if network disruption causes network to segment into different partitions
- only on partition at most will have majority to commit the transaction
- when network is healed and communication is reestablished, all non-majority partitions will have
- its leader step down
- all nodes rollback their uncommitted logs and replicate log of the majority leader