DBMS

Relational databases, SQL, normalization

Practice →

Database Management Systems (DBMS) is software that enables users to create, manage, and manipulate databases. It provides an interface between users/applications and the database, ensuring data integrity, security, and efficient retrieval through structured query languages like SQL. Understanding DBMS fundamentals is essential for technical interviews and software development roles.

Complexity / Key Facts

Number of tuples in Cartesian Product: R x S = |R| x |S|\text{Number of tuples in Cartesian Product: R x S = |R| x |S|}
Maximum rows in Natural Join: min(|R|, |S|)\text{Maximum rows in Natural Join: min(|R|, |S|)}
Number of attributes in Union/Intersection: Same as R (if union-compatible)\text{Number of attributes in Union/Intersection: Same as R (if union-compatible)}
1NF: All attributes contain atomic (indivisible) values\text{1NF: All attributes contain atomic (indivisible) values}
2NF: 1NF + No partial dependency (non-prime attributes depend on full candidate key)\text{2NF: 1NF + No partial dependency (non-prime attributes depend on full candidate key)}
3NF: 2NF + No transitive dependency (non-prime attributes don’t depend on other non-prime attributes)\text{3NF: 2NF + No transitive dependency (non-prime attributes don't depend on other non-prime attributes)}
BCNF: For every FD X -> Y, X must be a superkey\text{BCNF: For every FD X -> Y, X must be a superkey}
ACID Properties: Atomicity, Consistency, Isolation, Durability\text{ACID Properties: Atomicity, Consistency, Isolation, Durability}

Key Concepts

Data Models and Database Architecture

The three-schema architecture consists of: (1) External Level - user views and how data appears to specific users; (2) Conceptual Level - logical structure of the entire database including entities, relationships, and constraints; (3) Internal Level - physical storage structure and access methods. Common data models include: Hierarchical (tree structure), Network (graph structure), Relational (tables with rows and columns), and Object-Oriented. The Relational Model uses tables (relations) where rows are tuples and columns are attributes, with keys (Primary, Foreign, Candidate, Super) defining relationships.

SQL Operations and Query Processing

SQL commands are categorized as: DDL (Data Definition Language) - CREATE, ALTER, DROP, TRUNCATE for schema modification; DML (Data Manipulation Language) - SELECT, INSERT, UPDATE, DELETE for data operations; DCL (Data Control Language) - GRANT, REVOKE for permissions; TCL (Transaction Control Language) - COMMIT, ROLLBACK, SAVEPOINT. Important SQL operations include: JOINs (INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF), subqueries (correlated and non-correlated), aggregate functions (COUNT, SUM, AVG, MAX, MIN), GROUP BY with HAVING clause, and set operations (UNION, INTERSECT, EXCEPT). Query optimization involves indexes, query rewriting, and understanding execution plans.

Normalization and Functional Dependencies

Normalization is the process of organizing data to reduce redundancy and improve integrity. Functional Dependency (FD): X -> Y means X uniquely determines Y. Closure of attributes (F ) finds all attributes functionally determined by a set. Normal Forms in order: 1NF eliminates repeating groups; 2NF eliminates partial dependencies (non-prime attributes must depend on entire candidate key); 3NF eliminates transitive dependencies; BCNF requires every determinant to be a superkey. Higher forms (4NF, 5NF) handle multi-valued and join dependencies. Decomposition must be lossless (natural join recovers original) and dependency-preserving (all FDs are preserved in decomposed relations).

Transaction Management and Concurrency Control

A transaction is a logical unit of work with ACID properties: Atomicity (all-or-nothing execution), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed changes persist). Concurrency issues include: Lost Update, Dirty Read, Non-Repeatable Read, Phantom Read. Locking mechanisms: Binary Locks (S/X), Shared/Exclusive Locks, Two-Phase Locking (2PL - growing then shrinking phase ensures serializability). Deadlock handling: Prevention (wait-die, wound-wait), Avoidance (Banker's algorithm), Detection (wait-for graph), Recovery. Timestamp ordering assigns unique timestamps to transactions. Recovery techniques: Deferred Update (redo only), Immediate Update (undo-redo), Shadow Paging, Checkpoints.

Indexing and Query Optimization

Indexes speed up data retrieval at the cost of storage and maintenance overhead. Types: Primary Index (on ordering key field), Secondary Index (on non-ordering fields), Clustered Index (determines physical order), Non-clustered Index (separate structure with pointers), Dense Index (entry for every record), Sparse Index (entry for each block). B-Trees and B+ Trees are balanced tree structures for efficient searching, insertion, and deletion. Hash indexes use hash functions for direct access. Query optimization involves: cost-based selection of access paths, join ordering, use of indexes, materialized views, and understanding query execution plans. The optimizer estimates costs based on statistics (cardinality, selectivity, distribution).

ER Model and Database Design

The Entity-Relationship (ER) Model is a high-level conceptual data model. Components: Entities (objects with independent existence - strong/weak), Attributes (simple, composite, multi-valued, derived, key), Relationships (associations between entities with cardinality ratios: 1:1, 1:N, M:N). ER Diagram notation: rectangles for entities, ellipses for attributes, diamonds for relationships, lines connecting them with cardinality markers. Extended ER features: Specialization (top-down, ISA hierarchy with disjoint/overlapping constraints), Generalization (bottom-up), Aggregation (treating relationship as entity). Converting ER to Relational: Entities become tables, M:N relationships become junction tables, weak entities include owner's key, ISA hierarchies use single table, multiple tables, or separate relations.

Tips

  • Always identify candidate keys first when analyzing normal forms - every normal form check depends on knowing what the keys are
  • For SQL queries involving multiple tables, draw the join diagram mentally: INNER JOIN returns matching rows only, LEFT JOIN returns all from left with NULLs for non-matches
  • In transaction questions, check for serializability using precedence graph (conflict serializability) or view equivalence (view serializability)
  • Remember: BCNF is stricter than 3NF. A relation in BCNF is automatically in 3NF, but not vice versa. Decomposition to BCNF may lose some functional dependencies
  • When calculating index selectivity: High cardinality columns (many unique values) benefit more from B+ tree indexes; low cardinality may favor bitmap indexes
  • For ER-to-Relational conversion: 1:N relationships place the '1' side key on the 'N' side; M:N relationships always become a separate table with both keys

Practice with questions

Real placement-style technical questions.

Start Exercise →