Are you getting ready for a database administrator position? Is SQL knowledge a prerequisite for the position that you are applying for? If the answer is yes, then you probably need to brush up your database skills once again. This post is all about getting you ready for the interview process. The following are 10 common yet challenging SQL interview questions that you must get ready for. The question set is broken into 3 sections – easy, intermediate, and hard.
SQL is the most widely used language for relational database systems. SQL stands for Structured Query Language that is used for manipulating data residing in the database. SQL can be used for manipulating data using a plethora of different operations including (but not limited to) union, intersection, deletion, update, and division.
The tables are the most basic unit of an RDBMS. Tables consist of rows (typically called a record) and columns (typically called a field).
A DBMS (Database Management System) is responsible for efficient and systematic storage, retrieval, manipulation, and update of data stored in the databases. Some examples include Microsoft SQL Server, Oracle, MongoDB, Apache Cassandra, etc.
An RDBMS (Relational Database Management System) is a special type of DBMS that uses the relational model to store the data. It uses the traditional tabular structure to store data. Two or more tables are connected (or related) by common fields. The majority of the database management systems (including Oracle, Microsoft SQL Server, and MySQL) used by the organizations are of this type.
The uniqueness constraint (responsible for the creation of a unique key) ensures that all the values belonging to a single column are unique i.e. different. This constraint helps identify each record uniquely. The main difference between the primary key and a unique key is that you can define one and only one primary key for a table, whereas there can be any number of unique keys in the table.
The join relational operation is used to combine the records of two or more tables that are related by some common field(s). Keys play a huge role in how the records are joined. SQL supports mainly 4 kinds of joins namely the inner join (or simply, join), left outer join (or left join), right outer join (or right join), and full outer join (or outer join). Each of these joins usually yields different results and are preferred one to another depending on the needs.
Intermediate questions
A foreign key is a set of one or more keys that essentially refer to the primary key in another table. Foreign keys are crucial to maintaining the referential integrity of a database.
Self-join is a special type of join where the join is applied to itself. The join is applied on the basis of some relation defined in terms of the fields. It usually uses the inner or left outer join.
A query is a self-contained and runnable SQL command responsible for performing a specific task (e.g. retrieve, delete, or update some data). A query usually cannot be broken further and still be runnable.
Clauses are used to give the users more power to fine-tune their query and filter or rearrange the desired records. ‘Group by’, ‘where’, and ‘having’ are some popular clauses available in SQL.
An alias is a temporary name that can be assigned to a table (or view) or a field. In most of the cases, an alias is used for reducing the visibility of sensitive data and increasing the query readability.
Views are virtual tables consisting of a subset of data of some tables. Views provide, as the name suggests, an efficient way to represent intermediate data possibly generated from the result-set of a SQL query. A view can contain rows and fields just like any other table in the DBMS. The fields present in the view are usually from one or more real tables.
Hard questions
Normalization is a systematic and usually an iterative process of removing dependency and redundancy from the tables. Normalization helps immensely inefficient addition, deletion, update (manipulation as a whole), and storage of data. Being an iterative process, the level of normalization of a table(s) is usually denoted by the normal forms.
Denormalisation is the complete opposite of the normalization. Using denormalization, the normal form of the data is lowered and some systematic redundancy is fed into the database. This plays a major role in improving the performance of the query processor.
Data integrity plays a huge role in maintaining the consistency, accuracy, and reliability of the data. The data integrity is usually maintained through defining a number of hand-picked constraint(s) which usually guarantee the integrity of the data. It is critical for any system related to storing, manipulation, or retrieval of data. The enforced constraint(s) make sure that the database remains consistent and accurate right from the very start (usually from the very beginning when the data is fed into the database).
Cursors are a special type of control structure that is used for manipulation of records. A cursor acts as a pointer to a record. They are crucial to traversal, retrieval, and removal of records.
Stored Procedures are very much like a subroutine or function in a traditional programming language – they are created by the database administrator to perform some very specific task. A stored procedure consists of one or more SQL commands. They are key to providing a layer of insulation between the end-user (or any user that does not have the administrative permission) and the core database. It makes sure that the user does not get direct access to the data – presumably sensitive or confidential.
A recursive stored procedure is a special type of stored procedure where the stored procedure uses recursion. The procedure calls itself until some boundary condition is fulfilled which is defined in the procedure.
‘Like’ is a common operation that is used for pattern matching. There are some other characters such as ‘%’ and ‘_’ (both without the single quotes) that have special meaning in this context. ‘_’ stands for exactly 1 and ‘%’ stands for 0 or more characters.
Online Transaction Processing, or OLTP in short, is a class of software that is used primarily for efficient data insertion, retrieval, manipulation, and processing. One key distinctive feature of OLTP software is its ability to perform concurrent operations. Decentralization is one of the preferred modes of action for avoiding a single-point failure. This ensures both a more secure and reliable database system.
OLTP is related to another class of software called Online Analytical Processing (OLAP) that are used for similar tasks, although there is a fundamental difference between these two. OLTP involves usually short, high-frequency transactions whereas OLAP is built specifically for tackling low-frequency, but complex (often involving aggregation operations) transactions.