SQL SERVER – Sample Interview Questions and Answers (Series One) – Updated

I will be sharing a series of interview questions and answers. The topics will be focused on:

  1. Data Modeling
  2. Transact SQL
  3. Data Warehouse Design
  4. SQL Server Integration Services (SSIS)
  5. SQL Server Reporting Services (SSRS)
  6. SQL Server Analysis Services (SSAS)

Please feel free to share your ideas with me on how to improve this further. I will also update this as I come up with more questions and answers.

Data Modeling

Describe a primary key.

  • A primary key is a column (or combination of columns) that uniquely identifies a row in a table
  • A table can only have one primary key
  • A primary key cannot contain any NULL values
  • You can combine multiple columns to form a composite primary key
  • By default, a unique clustered index is created when a primary key is created

Describe a unique key

  • A unique key is a column (or combination of columns) that uniquely identifies a row in a table
  • Unlike a primary key, a unique key can accept a single NULL value
  • Unlike a primary key, a table can have more than one unique key
  • By default, a unique non-clustered index is created when a unique key is created

Describe a foreign key

  • A foreign key helps to enforce referential integrity between related tables
  • It is a column in a table that references the primary key of another table
  • A foreign key can reference the primary key of its own table, in the case of a unary (self-referencing) relationship
  • NULLs are allowed in a foreign key column, though this is not recommended (see the sketch below)
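
As a quick illustration, here is a minimal T-SQL sketch (the table and constraint names are hypothetical) showing a primary key, a unique key, and both a standard and a self-referencing foreign key:

-- Primary key and unique key
CREATE TABLE dbo.Department
(
    DepartmentID   INT         NOT NULL,
    DepartmentName VARCHAR(50) NOT NULL,
    CONSTRAINT PK_Department PRIMARY KEY CLUSTERED (DepartmentID), -- unique clustered index by default
    CONSTRAINT UQ_Department_Name UNIQUE (DepartmentName)          -- unique non-clustered index by default
);

-- Foreign keys, including a self-referencing (unary) one
CREATE TABLE dbo.Employee
(
    EmployeeID   INT NOT NULL,
    DepartmentID INT NULL, -- NULLs allowed in a foreign key column (not recommended)
    ManagerID    INT NULL, -- references this same table (unary relationship)
    CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID),
    CONSTRAINT FK_Employee_Department FOREIGN KEY (DepartmentID)
        REFERENCES dbo.Department (DepartmentID),
    CONSTRAINT FK_Employee_Manager FOREIGN KEY (ManagerID)
        REFERENCES dbo.Employee (EmployeeID)
);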

What are the types of relationships in database design?

  • One-to-one
  • One-to-many
  • Many-to-many
  • One-to-fixed

What is Normalization?

Normalization is a step-by-step process of efficiently organizing data in a database according to a set of normalization rules.

Advantages of normalizing include:

  • Eliminating insert, update, and delete anomalies
  • Improving the performance of DML operations (INSERT, UPDATE, DELETE)
  • Eliminating data redundancy

Normalization also comes with some drawbacks, such as reducing the performance of read operations (SELECT), since queries tend to require more joins.

What are the most common degrees of normalization?

  • First Normal Form (1NF): First normal form sets the very basic rules for an organized database. It is the most basic form of normalization.

      o   There should be no repeating rows. To ensure this, the table must have a primary key.

      o   There should be no repeating columns. This means there cannot be two different columns holding the same information.

      o   All attributes should be single-valued.

      o   All composite attributes should be broken down into their minimal components.

  • Second Normal Form (2NF): A table is in its second normal form when:

      o   It satisfies the conditions for first normal form.

      o   There are no partial dependencies (attributes that depend on only part of a composite key). If any exist, they must be removed. A short example follows after this list.

  • Third Normal Form (3NF): A table is in its third normal form when:

      o   It satisfies the conditions for second normal form.

      o   There are no transitive dependencies (non-key attributes that depend on other non-key attributes). If any exist, they must be removed.
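
As a quick illustration of the second normal form, here is a minimal T-SQL sketch with hypothetical tables. In the first table, ProductName depends only on ProductID, which is just part of the composite key, so the table is not in 2NF; moving the product attribute into its own table removes the partial dependency:

-- Not in 2NF: ProductName depends only on ProductID,
-- a partial dependency on the composite key (OrderID, ProductID)
CREATE TABLE dbo.OrderDetail_Unnormalized
(
    OrderID     INT         NOT NULL,
    ProductID   INT         NOT NULL,
    ProductName VARCHAR(50) NOT NULL,
    Quantity    INT         NOT NULL,
    PRIMARY KEY (OrderID, ProductID)
);

-- In 2NF: the partial dependency has been removed
CREATE TABLE dbo.Product
(
    ProductID   INT         NOT NULL PRIMARY KEY,
    ProductName VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.OrderDetail
(
    OrderID   INT NOT NULL,
    ProductID INT NOT NULL REFERENCES dbo.Product (ProductID),
    Quantity  INT NOT NULL,
    PRIMARY KEY (OrderID, ProductID)
);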


Removing duplicate records from a table in SQL Server

Sometimes as a SQL developer, you might find a need to detect and remove duplicate records from a table. Duplicates often occur due to improper database design. There are quite a few methods of eliminating duplicate records, and the method you choose typically depends on the situation you are faced with. In this article, I will share two methods of removing duplicate records: the first removes duplicates from a table without a UniqueID, while the second removes duplicates from a table with a UniqueID.

Method One: Removing Duplicates from a table without a UniqueID

This approach typically suits smaller datasets where database space is not a concern, as it involves moving the distinct records into a temporary table, truncating the original table, and then moving the records from the temporary table back into the original table. Before we continue, let’s create our sample dataset:
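
(The column names and most values below are assumptions; the key points, per the results discussed next, are that the table has 12 rows, 5 distinct employees, and that John Brown appears three times.)

CREATE TABLE dbo.Employee
(
    FirstName VARCHAR(50),
    LastName  VARCHAR(50)
);

INSERT INTO dbo.Employee (FirstName, LastName)
VALUES ('John', 'Brown'), ('John', 'Brown'), ('John', 'Brown'),
       ('Mary', 'Smith'), ('Mary', 'Smith'), ('Mary', 'Smith'),
       ('Pete', 'Dane'),  ('Pete', 'Dane'),
       ('Ann',  'Lee'),   ('Ann',  'Lee'),
       ('Sam',  'Cole'),  ('Sam',  'Cole');

SELECT FirstName, LastName FROM dbo.Employee;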

Taking a closer look at the table, you will realize there are some duplicate records. For example, employee John Brown appears three times.

Steps (a sketch of the full script follows the list):

1. Select the DISTINCT records into a temporary table.

2. TRUNCATE the original table.

3. Reinsert the records from the temporary table into the original table.

4. Check the original table to verify that the duplicate records are gone.
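
Here is a minimal sketch of the full script, assuming the dbo.Employee sample table created earlier:

-- 1. Select the DISTINCT records into a temporary table
SELECT DISTINCT FirstName, LastName
INTO #EmployeeDistinct
FROM dbo.Employee;

-- 2. TRUNCATE the original table
TRUNCATE TABLE dbo.Employee;

-- 3. Reinsert the distinct records into the original table
INSERT INTO dbo.Employee (FirstName, LastName)
SELECT FirstName, LastName
FROM #EmployeeDistinct;

-- 4. Verify that the duplicates are gone (5 rows instead of 12)
SELECT FirstName, LastName
FROM dbo.Employee;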

You will notice that our original dataset had 12 rows of data; this has been reduced to just 5 rows because the duplicate records have been removed.

Again, this method typically suits smaller datasets where space is not much of a concern. You can also replace the temporary table with a permanent table to reduce the risk of losing all the data if you close out of the session.

Method Two: Removing duplicates from a table with a UniqueID 

This is my preferred approach, since most tables contain some kind of uniqueID anyway. To demonstrate this method, I will recreate the employee table (feel free to drop or rename the table we created in the first step). This new table will include an identity column, which will serve as our uniqueID.
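
Here is a sketch of the recreated table, keeping the names consistent with the earlier sample:

DROP TABLE IF EXISTS dbo.Employee; -- SQL Server 2016+; on older versions use a plain DROP TABLE

CREATE TABLE dbo.Employee
(
    EmployeeID INT IDENTITY(1,1) NOT NULL, -- our uniqueID
    FirstName  VARCHAR(50),
    LastName   VARCHAR(50)
);

INSERT INTO dbo.Employee (FirstName, LastName)
VALUES ('John', 'Brown'), ('John', 'Brown'), ('John', 'Brown'),
       ('Mary', 'Smith'), ('Mary', 'Smith'), ('Mary', 'Smith'),
       ('Pete', 'Dane'),  ('Pete', 'Dane'),
       ('Ann',  'Lee'),   ('Ann',  'Lee'),
       ('Sam',  'Cole'),  ('Sam',  'Cole');

SELECT DISTINCT * FROM dbo.Employee; -- still 12 rows: the uniqueID makes every row distinct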

Looking at the resultset, you will realize that using SELECT DISTINCT no longer eliminates the duplicates. This is because the uniqueID makes every record unique. To handle this, we’ll use one of the ranking functions, ROW_NUMBER(), together with a Common Table Expression (CTE).

Steps:

1. Create a CTE called Emp_CTE
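
A sketch of the CTE, using ROW_NUMBER() partitioned by the columns that define a duplicate:

WITH Emp_CTE AS
(
    SELECT EmployeeID,
           FirstName,
           LastName,
           -- numbers each occurrence within a group of identical name values
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
                              ORDER BY EmployeeID) AS RecordCount
    FROM dbo.Employee
)
SELECT *
FROM Emp_CTE;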

You will notice that there is a new derived column called RecordCount, which numbers each occurrence of a similar set of records in the table. Looking at the resultset, we are only interested in the records with RecordCount = 1; the rest need to go because they are duplicate records.

2. DELETE the duplicate records from the CTE

I will run the above CTE again, but this time, instead of a SELECT statement, I will perform a DELETE operation on the records that appear more than once.
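
A sketch of the delete, reusing the same CTE definition:

WITH Emp_CTE AS
(
    SELECT EmployeeID,
           FirstName,
           LastName,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName
                              ORDER BY EmployeeID) AS RecordCount
    FROM dbo.Employee
)
DELETE FROM Emp_CTE
WHERE RecordCount > 1; -- keep the first occurrence of each set, remove the rest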

3. Run a SELECT statement to verify that the duplicates are gone

From the resultset below, notice that the duplicates are all gone now.

Summary

Delete operations can be expensive, so it might be helpful to avoid duplicates in the first place by defining proper constraints and relationships when creating your tables.

If you like what you read or if you have some suggestions, kindly comment on the post; I’d be very happy to hear from you. You can also visit www.eklipseconsult.com if you want to learn more about SQL Server and Business Intelligence.

Thanks for reading!


SSIS – Handling Commas in CSV Exports

At work, I often find myself exporting data in .csv format for our clients. I am talking about thousands of rows of data, to the point that I never really noticed this issue.

The issue involves data that contains special characters, such as commas.

Every once in a while, I’d encounter names with commas in them: names with suffixes like “Smith, Jr”, “Donald, Jr”, or “Elizabeth, III”, and so on.

When exporting to a comma separated value (.csv) file using SQL Server Integration Services (SSIS), the flat file destination connection manager treats this comma as a column delimiter, assumes it marks the end of the field, and therefore “spills” the rest of the data into the next column. The problem is that this disrupts the arrangement of the output data, causing inconsistent and even incorrect output.

In the following example, I will be showing how to correct this issue using a text qualifier:

Before I continue, I will create my sample data below:
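
(This is a simplified sketch; the table and column names are assumptions, and the real table has more columns. The key point is the comma inside the last names.)

CREATE TABLE dbo.Employee
(
    EmployeeID INT IDENTITY(1,1) NOT NULL,
    FirstName  VARCHAR(50),
    LastName   VARCHAR(60),
    MiddleName VARCHAR(50),
    BirthDate  DATE
);

INSERT INTO dbo.Employee (FirstName, LastName, MiddleName, BirthDate)
VALUES ('John',  'Smith, Jr',      'A', '1980-05-14'),
       ('Mary',  'Donald, Jr',     'B', '1975-09-02'),
       ('James', 'Elizabeth, III', 'C', '1988-01-23');

SELECT * FROM dbo.Employee;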

My result is shown below:

Notice how the last names have suffixes in them.

Next, I will show the typical approach to exporting data from an OLE DB source into a .csv file in a series of steps.

Step One: Create a data flow task

Step Two: Create your Source and destination tasks

Step Three: Connect the data source. In my case, I am connecting to the employee table I created earlier which contains the last names including commas.

Step Four: Configure the flat file connection manager.

Notice that the highlighted text qualifier field has a default value of <none>.

Step Five: Complete and run your package

Step Six: Review the generated .csv file

You will notice that everything looks fine until column D (MiddleName). The suffix has “spilled” over into that column, and the middle name has shifted over to the BirthDate column. You will also notice that a new column has been added (column I) which has no header and was not part of the original employee table.
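
To make the spill concrete, here is roughly what a row from the simplified sample table looks like in the unqualified output (illustrative values):

EmployeeID,FirstName,LastName,MiddleName,BirthDate
1,John,Smith, Jr,A,1980-05-14

The comma inside Smith, Jr is read as a delimiter, so “Jr” lands in the MiddleName position and every value after it shifts one column to the right.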

This is a problem, but it is quite easy to solve.

The key is to utilize the text qualifier option in the flat file connection manager as shown in a couple of steps below.

Step Seven: Repeat the same steps as above (Step One to Step Three).

Step Eight: This time around, when you want to configure the flat file connection manager, replace the <none> in the text qualifier option with a quote (“) as highlighted in the image below.

Step Nine: Complete and run your package

Step Ten: Review the generated .csv file

You will notice that the data is returned just as it appears in the source table. The last name is displayed correctly without any spilling over.
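
For comparison, the same row in the qualified output looks roughly like this (illustrative; with a text qualifier set, SSIS wraps the string columns in it, so the embedded comma is preserved as data):

EmployeeID,FirstName,LastName,MiddleName,BirthDate
1,"John","Smith, Jr","A",1980-05-14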

Although this is a simple fix, it is often overlooked by many developers. This is sometimes due to the volume of data we work with, sometimes because we rarely encounter names with suffixes, or maybe because we just never noticed.

I will be very happy to hear your feedback on this. Let me know your thoughts by adding them in the comment section.

Thanks for reading and I hope it helps someone out there.

 


OUTPUT Clause (Transact-SQL)

When you perform a DML operation using an INSERT, UPDATE, DELETE or MERGE statement, an OUTPUT clause can be used to return information regarding the rows affected. You can find this useful when you need to archive, audit or even troubleshoot the result of DML operations performed on a table.

There are two keywords that can be “activated” when you specify the OUTPUT clause: inserted and deleted.

1. Inserted:
The inserted keyword is “activated” when you use INSERT with OUTPUT.
For example, I will create a temporary table called “employee” and insert some records into it.
With the OUTPUT clause, we’ll be able to display the records that were inserted into the table.
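
Here is a sketch of the example (column names and values are assumptions, kept consistent with the delete and update examples further down):

CREATE TABLE #employee
(
    EmployeeID INT         NOT NULL,
    FirstName  VARCHAR(50) NOT NULL,
    Salary     MONEY       NOT NULL
);

INSERT INTO #employee (EmployeeID, FirstName, Salary)
OUTPUT inserted.* -- returns each row exactly as it was inserted
VALUES (1, 'John',  55000),
       (2, 'Pete',  60000),
       (3, 'Mary',  70000),
       (4, 'James', 52000),
       (5, 'Susan', 48000);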

The OUTPUT clause in this case basically worked as a SELECT statement, giving us information about the data that was inserted into the table.

2. Deleted:

When the OUTPUT clause is used in conjunction with a DELETE statement, the deleted keyword is enabled.

For example:

Based on the sample we created above, I am going to delete the record belonging to EmployeeID 5.
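
A sketch, continuing from the #employee table above:

DELETE FROM #employee
OUTPUT deleted.* -- returns only the row(s) that were removed
WHERE EmployeeID = 5;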

You can see that the resultset only shows us the data that was deleted and nothing more.

3. Inserted and Deleted:

Both the inserted and deleted keywords are enabled when the OUTPUT clause is used alongside an UPDATE statement.
For example:
I am going to update the salary of Pete, who has EmployeeID 2, from $60,000 to $65,000. Here we will see how the OUTPUT clause behaves with an UPDATE statement.
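
A sketch of the update, returning both the old and the new values:

UPDATE #employee
SET Salary = 65000
OUTPUT deleted.EmployeeID,
       deleted.Salary  AS OldSalary, -- from the deleted pseudo table
       inserted.Salary AS NewSalary  -- from the inserted pseudo table
WHERE EmployeeID = 2;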


From the resultset above, you can see that I was able to utilize both the inserted keyword and the deleted keyword. This is because, logically, an UPDATE is simply SQL Server deleting the old record and inserting a new record in its place.

CONCLUSION
There are many uses for the OUTPUT clause, such as archiving, auditing, and informational purposes. You can even insert the result from the OUTPUT clause into a table or use it in a stored procedure.


Testing Crayon Syntax Highlighter

Since I will be writing a lot of SQL queries on this blog, I decided to try out one of the syntax highlighters out there. The one I am using is called Crayon Syntax Highlighter, and it seems to do what I want quite well. So let’s go!
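
Below is a sketch of the query described, assuming the AdventureWorks schema (HumanResources.Employee joined to Person.Person on BusinessEntityID):

SELECT p.FirstName,
       p.LastName,
       e.BirthDate,
       e.HireDate,
       e.JobTitle
FROM HumanResources.Employee AS e
INNER JOIN Person.Person AS p
    ON p.BusinessEntityID = e.BusinessEntityID;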

Above is a sample query from the Microsoft AdventureWorks database. I am simply performing a join between the employee table and the person table in order to get the first name, last name, birth date, hire date, and job title of all the employees working in the company.


Hello world!

Yes, Hello world!

I am excited to finally start this blog this year, 2017. I got the SQLAmerica.net domain in 2015 but had not been able to get anything started. I got a lot of motivation from Kenneth Fisher (www.sqlstudies.com), who told me to just start.

Now we are here. I am hoping that by this time next year, I will look back at this first post and be amazed at how much progress has been made. I will be super excited to see this happen.

So, what is SQL America all about? Honestly, I don’t know. But one thing is sure: this will be an opportunity for me to share the little I know as I learn on a daily basis. My blog might not become something big or extraordinary, but I will be very happy if even one person is helped by reading it.

With all that said, I want to welcome you to 2017 and welcome to my world of SQL America.

Hello World!
