Skip to content

Database Theory

  • ACID
    • Atomicity: Either all operations complete or nothing completes
    • Consistency: Guarantees database invariants such as constraints, cascades and/or triggers
    • Isolation: concurrent execution of multiple transactions produce same results irrespective of order in which they might get executed
    • Durability: Once a transaction commits, it does not get lost due to a alter failure (written to disk)
  • MVCC, alternative to locking, lets readers access previous, un-modified version of data that is being modified by writer
  • Optimistic locking uses versioning to prevent Lost Update phenomenon
  • Caches in disk controllers and disks can be write-through (writes are sent through) or write-back (writes are sent later on)
    • Databases cannot guarantee integrity (such as that of commit or WAL) if there are write-back caches

Normal Forms

  • Every attribute depends on the key 1NF, the whole key 2NF and nothing but the key 3NF
  • 1NF: Atomic Values in each cell (no repeating groups)
  • 2NF: No partial key dependencies
  • 3NF: No transitive dependencies
    • BCNF (3.5NF): For every non-trivial FD \(X \to Y\), \(X\) is a superkey
    • 3NF only mandates for non-key attributes that for every non-trivial FD \(X \to A\), \(X\) is superkey and \(A\) is a non-prime attribute
    • For relations that don't have overlapping superkeys, it's same as 3NF.
    • 3NF makes distinction between key and non-key attributes, whereas BCNF doesn't
    • 3NF allows part of super-key to depend on some other part of super-key, whereas BCNF doesn't
  • 4NF: Every multi-value dependency has a superkey
  • 5NF: (Project-Join NF, PJNF) isolate semantically related multi-valued facts into separate tables

Relational Theory

  • Superkey is a set of attribute that uniquely identifies a tuple
    • Minimal superkey is minimum set of attributes that forms a superkey, defines a candidate key
    • Trivial superkey is set all attributes in any relation
  • Functional Dependency (): A set of attributes \(X\) determines the value of another set of attributes \(Y: X \to Y\)
    • FD: \(X \to Y\) is a trivial FD, if \(Y\) is subset of \(X\)
  • Prime attribute is an attribute found in a candidate key

Concurrency

  • Read phenomena
    • Dirty Reads: Read operation allowed on uncommitted data
    • Non-repeated reads: Locks on Read (SELECT) operations are released immediately allowing another transaction to modify the data
    • Phantom reads: Only read rows are locked, allowing rows created later with the same criteria to be read back
  • Isolation levels
Isolation Dirty Read Non-repeatable Reads Phantom reads
Uncommitted Yes Yes Yes
Read Committed - Yes Yes
Repeatable Read - - Yes
Serializable - - -
  • Snapshot Isolation is used in MVCC systems, and prevents all anomalies except Phantom Reads

Distributed Databases

  • CAP Theorem: Distributed data store can provide only two of the three:
    • Consistency: Every read receives most recent write or an error
    • Availability: Every read receives a response -- without the guarantee that it's the most recent
    • Partition tolerance: system continues to operate even when network drops some of the messages between nodes
  • RAFT: A protocol for distributed, replicated system with one leader and multiple follower nodes
    • follower nodes replicate leader node's logs
    • Only a majority (not all) of follower nodes are required for any operation to be successful
    • nodes elect a leader when starting or when doesn't receive a heartbeat (append message)
    • log replication
      • client sends changes to leader
      • leader updates log and sends log entry to followers on the next heartbeat
      • if majority of followers acknowledge it, it's committed
      • response is sent back to the client
    • if network disruption causes network to segment into different partitions
      • only on partition at most will have majority to commit the transaction
      • when network is healed and communication is reestablished, all non-majority partitions will have
        • its leader step down
        • all nodes rollback their uncommitted logs and replicate log of the majority leader