Short Questions Answers for Database System subject

Database Systems — Short Questions and Answers

1. What is data?

Raw facts, figures, or symbols that can represent anything like text, images, or numbers.

2. What are the three types of data?

Structured, unstructured, and semi-structured data.

3. What is structured data?

Data organized in tables with rows and columns, like spreadsheets or SQL databases.

4. Give an example of unstructured data.

Images, videos, social media posts, or emails.

5. What is semi-structured data?

Data with tags or metadata, such as XML or JSON files.

6. What are the five V’s of data?

Volume, Variety, Velocity, Veracity, and Value.

7. Why is data important?

It supports decision-making, innovation, personalization, and automation.

8. What is a database?

An organized collection of data stored and accessed electronically.

9. How is a database different from a file system?

Databases allow complex queries and relationships; file systems do not.

10. Name the components of a database system.

Data, DBMS, Database Application, and Database Administrator (DBA).

11. List types of databases.

Relational, NoSQL, Object-Oriented, Hierarchical, and Network databases.

12. Who introduced the Relational Model?

E. F. Codd in 1970.

13. What is a DBMS?

Software that manages databases and facilitates interaction with users and applications.

14. What is a schema?

The logical structure defining how data is organized in a database.

15. What is normalization?

Organizing data to reduce redundancy and improve integrity.

16. What are ACID properties?

Atomicity, Consistency, Isolation, and Durability — ensuring reliable transactions.

17. Give examples of real-world uses of databases.

Banking, healthcare, retail, and education systems.

18. What is database architecture?

The structural design of a DBMS showing how data is stored, organized, and accessed.

19. Name two types of database distribution.

Centralized and decentralized databases.

20. What is a centralized database?

All data stored in one location managed by a single entity.

21. What is a decentralized database?

Data distributed across multiple nodes or locations (e.g., blockchain).

22. What are the main functions of a DBMS?

Data storage, retrieval, updating, and security management.

23. What are the three levels of database architecture?

Data layer, application layer, and presentation layer.

24. What is one-tier architecture?

Database, application, and interface reside on a single machine.

25. What is two-tier architecture?

A client connects directly to a database server (client-server model).

26. What is three-tier architecture?

Consists of presentation, application, and database layers — used in modern web apps.

27. List one advantage of three-tier architecture.

Improves scalability, security, and deployment efficiency.

28. What is a client-server architecture?

A model where clients request services from a database server.

29. Who developed the Relational Data Model?

E. F. Codd in 1970.

30. What is a relation in a relational model?

A table that represents data.

31. What is a tuple?

A single row in a table.

32. What is an attribute?

A column representing a data field in a table.

33. What is a domain?

The set of valid values for an attribute.

34. What is a primary key?

A unique identifier for each record in a table.

35. What is a foreign key?

A field in one table that references the primary key in another table.

36. Define entity integrity.

Ensures no primary key field is NULL.

37. Define referential integrity.

Ensures foreign keys match existing primary keys.

38. Define domain integrity.

Ensures attribute values come from a valid domain.

39. List advantages of the relational model.

Simple, flexible, consistent, and supports relationships.

40. Give an example of a relational connection.

The “Dept” in STUDENT table is a foreign key referencing “DeptID” in DEPARTMENT.

40. Give an example of a relational connection.

The “Dept” in STUDENT table is a foreign key referencing “DeptID” in DEPARTMENT.
This creates a parent-child relationship enforcing referential integrity between students and departments.

41. What is a relation schema?

A relation schema defines a table's name and its attributes (columns).
It specifies attribute names, types, and constraints but not the actual data.

42. What is a relation instance?

A relation instance is the set of tuples (rows) in a table at a specific time.
It represents the current data snapshot for the relation schema.

43. Define a domain in the relational model.

A domain is the set of permissible values for an attribute.
It constrains type, range, or enumerated values to ensure data validity.

44. What is a primary key?

A primary key is a minimal attribute (or set) that uniquely identifies each row.
It cannot be NULL and is used as the main identifier for the table.

45. What is a candidate key?

A candidate key is a minimal superkey with no redundant attributes.
Any candidate key can serve as a primary key; non-chosen ones are alternate keys.

46. What is a super key?

A super key is any set of attributes that uniquely identifies rows in a relation.
It may contain extra attributes and is not necessarily minimal.

47. What is an alternate key?

An alternate key is a candidate key not chosen as the primary key.
It still guarantees uniqueness and can be indexed or referenced by foreign keys.

48. Define a composite key.

A composite key is a primary key made of two or more attributes combined.
It uniquely identifies rows when no single attribute is sufficient.

49. What is a foreign key?

A foreign key is an attribute in the child table that references a primary key in the parent table.
It enforces referential integrity, ensuring related records exist in the parent.

50. What is referential integrity?

Referential integrity ensures foreign key values match existing primary key values or are NULL.
It prevents orphaned child records and maintains consistent links across tables.

51. What is a NULL value?

NULL represents the absence of a known value for an attribute in a tuple.
It is different from zero or empty string and affects comparisons and aggregates.

52. What is a unique constraint?

A unique constraint enforces that all values in a column or column set are distinct.
It allows NULLs depending on DBMS rules but prevents duplicate non-NULL entries.

53. What is a check constraint?

A check constraint enforces a boolean condition on column values for each row.
It validates domain rules like ranges (e.g., age BETWEEN 0 AND 120) on insert/update.

54. What is a surrogate key?

A surrogate key is an artificial, system-generated identifier (e.g., auto-increment id).
It has no business meaning and simplifies joins and key stability.

55. What is a natural key?

A natural key is a real-world attribute (or set) used to uniquely identify records.
It carries domain meaning but can change and thus may be less stable than a surrogate key.

56. What is normalization?

Normalization is the process of organizing data to reduce redundancy and anomalies.
It uses normal forms (1NF, 2NF, 3NF, BCNF, etc.) to structure relations properly.

57. What is 1NF (First Normal Form)?

1NF requires that each attribute value be atomic and each tuple consistent with the schema.
It forbids repeating groups and nested relations in a single table.

58. What is 2NF (Second Normal Form)?

2NF requires a table be in 1NF and that every non-key attribute is fully functionally dependent on the whole primary key.
It eliminates partial dependencies in tables with composite primary keys.

59. What is 3NF (Third Normal Form)?

3NF requires 2NF and that non-key attributes depend only on the primary key, not on other non-key attributes.
It removes transitive dependencies to reduce redundancy.

60. What is BCNF (Boyce–Codd Normal Form)?

BCNF is a stricter form where every determinant must be a candidate key.
It resolves certain anomalies that 3NF doesn't fully address in complex dependency cases.

61. What are insertion, update, and deletion anomalies?

These anomalies occur when redundancy and poor design cause inconsistency on insert/update/delete.
Normalization reduces such anomalies by separating concerns into related tables.

62. Define functional dependency.

A functional dependency X → Y means attribute set X uniquely determines attribute set Y.
It is the basis for identifying keys and normalizing relations.

63. What is denormalization and why use it?

Denormalization intentionally adds redundancy to improve read performance or simplify queries.
It trades storage and potential anomalies for faster joins and reduced query complexity.

64. What is a join in SQL?

A join combines rows from two or more tables based on related columns (usually keys).
Common types include INNER JOIN, LEFT/RIGHT OUTER JOIN, FULL OUTER JOIN, and CROSS JOIN.

65. What is an INNER JOIN?

INNER JOIN returns rows where there is a match in both joined tables based on the join condition.
Unmatched rows from either table are excluded from the result set.

66. What is a LEFT OUTER JOIN?

LEFT JOIN returns all rows from the left table and matched rows from the right table; unmatched right rows are NULL.
It's used to keep parent rows even when children are absent.

67. What is a RIGHT OUTER JOIN?

RIGHT JOIN returns all rows from the right table and matched rows from the left; unmatched left rows are NULL.
It mirrors LEFT JOIN but retains all rows from the right-side table.

68. What is a FULL OUTER JOIN?

FULL JOIN returns matched rows plus unmatched rows from both tables with NULLs where there is no match.
Not all DBMSs implement it efficiently, but it shows complete set union of both sides.

69. What is a CROSS JOIN?

CROSS JOIN returns the Cartesian product of the two tables (every combination of rows).
Use carefully—result size multiplies rows from both tables and can be huge.

70. What is a self-join?

A self-join joins a table to itself to compare rows within the same table.
It requires aliasing the table to differentiate the two roles in the query.

71. What is a correlated subquery?

A correlated subquery references outer query columns and executes per outer row.
It can be less efficient than equivalent joins but is useful for row-by-row logic.

72. What is a scalar subquery?

A scalar subquery returns a single value (one row, one column) usable in expressions.
DBMS will throw an error if it returns more than one row in that context.

73. What are set operations in SQL?

Set operations combine result sets: UNION, UNION ALL, INTERSECT, and EXCEPT (or MINUS).
They require compatible column counts and types across the combined queries.

74. What is the difference between UNION and UNION ALL?

UNION removes duplicate rows from the combined result set; UNION ALL preserves duplicates.
UNION ALL is faster because it skips duplicate elimination.

75. What are transactions?

A transaction is a sequence of DB operations treated as a single logical unit of work.
It ensures atomicity: either all operations succeed (commit) or none take effect (rollback).

76. Define ACID properties.

ACID stands for Atomicity, Consistency, Isolation, Durability—core transaction guarantees.
They ensure reliable processing even under failures and concurrency.

77. What is isolation level?

Isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE) control visibility of concurrent transactions.
Higher isolation reduces anomalies but can decrease concurrency and throughput.

78. What is a deadlock?

A deadlock is a cyclic wait where two or more transactions block each other indefinitely.
DBMSs detect and resolve deadlocks by aborting one transaction to break the cycle.

79. What is locking?

Locking controls concurrent access to data with shared (read) and exclusive (write) locks.
Proper locking prevents corruption but can cause contention and reduce parallelism.

80. What is an index?

An index is a data structure that speeds up data retrieval by providing fast lookup paths.
Indexes improve SELECT performance but add overhead on INSERT/UPDATE/DELETE and consume space.

81. Difference between clustered and non-clustered index?

A clustered index defines the physical order of rows in a table; there is typically one per table.
Non-clustered indexes store pointers to rows and can be many per table for different access patterns.

82. What is a covering index?

A covering index contains all columns needed to satisfy a query, avoiding access to the base table.
It greatly improves performance for those specific queries at the cost of larger index size.

83. What is a view?

A view is a virtual table defined by a stored query; it presents data from one or more tables.
Views can simplify complex queries, encapsulate logic, and provide a security layer.

84. When is a view updatable?

A view is updatable if DBMS can map INSERT/UPDATE/DELETE operations to underlying base tables unambiguously.
Complex views with joins, aggregates, or GROUP BY are often non-updatable.

85. What is a materialized view?

A materialized view stores the query result physically and must be refreshed periodically.
It speeds up reads for costly aggregations at the cost of storage and maintenance overhead.

86. What are triggers?

Triggers are automatic procedures executed in response to table events (INSERT/UPDATE/DELETE).
They enforce complex integrity rules or auditing but can introduce hidden behavior and performance costs.

87. What is a stored procedure?

A stored procedure is a precompiled set of SQL statements stored in the DB for reuse.
It encapsulates logic close to the data, can accept parameters, and improves performance and maintainability.

88. What is a function in SQL?

A SQL function returns a single value and can be used in expressions; it should be deterministic and side-effect free ideally.
Functions differ from procedures in that they return values and are callable in queries.

89. What is data integrity?

Data integrity ensures accuracy, consistency, and validity of data across the database.
It is enforced through keys, constraints, transactions, and application logic.

90. What is schema evolution?

Schema evolution is the process of changing a database schema over time (add/remove columns, alter types).
It requires migration strategies to preserve data, minimize downtime, and maintain compatibility.

91. What is sharding?

Sharding partitions data across multiple machines by a shard key to scale horizontally.
It improves performance and capacity but complicates queries, transactions, and rebalancing.

92. What is replication?

Replication copies data across multiple servers for availability, read scaling, or fault tolerance.
It can be synchronous or asynchronous, with trade-offs between consistency and latency.

93. What is eventual consistency?

Eventual consistency means updates will propagate and converge across replicas given enough time.
It favors availability and partition tolerance over immediate consistency in distributed systems.

94. What is ACID vs BASE?

ACID emphasizes strict transactional guarantees; BASE (Basically Available, Soft state, Eventual consistency) relaxes them for scalability.
BASE is common in distributed NoSQL systems where high availability matters more than strict consistency.

95. What is an ER diagram?

An Entity-Relationship (ER) diagram models entities, attributes, and relationships for database design.
It helps convert conceptual models into relational schemas during database design.

96. What is cardinality in relationships?

Cardinality indicates the number of related entities (one-to-one, one-to-many, many-to-many).
It guides schema design and whether to use foreign keys or associative (join) tables.

97. How are many-to-many relationships represented?

Many-to-many relationships are implemented using a junction table containing foreign keys to both related tables.
The junction table may have its own primary key or a composite primary key of the two foreign keys.

98. What is an associative entity?

An associative entity is a table that represents a relationship and can hold attributes about the relationship itself.
It converts many-to-many relationships into two one-to-many relationships for normalization.

99. What is query optimization?

Query optimization is the DBMS process of choosing an efficient execution plan for SQL statements.
It uses statistics, indexes, and rewrite rules to minimize I/O and CPU for query execution.