SQL Server Database Keys: PrimaryKey, NaturalKey, ForeignKey, UniqueKey, CandidateKey, AlternateKey, CompositeKey, SuperKey, SurrogateKey
Hi Guys,
How are you doing?
Many a times during interviews I notice that candidates know lots of things but they really hesitate while talking about “SQL Server Database Keys“. And even if they know about keys, they fail to provide proper answers with examples. So, today I would like to summarize few definitions which every database developer should be aware of.
I am not going to explain each in details but this will sure give you enough confidence to generate your own examples during interviews π
I am going to use the following table for example reference in below definitions. Forget about the proper table design because we need only these columns for demo.
Table name : Employee (Parent Table)
Columns :
- Id (Identity column of table )
- Emp_Id (Employee’s Id Number given by a company/organization)
- Emp_SSN (Employee’s Social Security Number)
Table Name : EmployeeDetails (Child Table)
Columns :
- id_EmployeeDetails (PrimaryKey of table)
PrimaryKey : A PrimaryKey is made up of single or multiple columns so it can uniquely identify a record from a table. A table can have only 1 PrimaryKey. This key does not accept NULL values. Best practice suggests that every table in database should have PrimaryKey. We need to do analysis while creating a table that what column should be PrimaryKey. That depends of what values that table is going to contain. E.g. A table which contains employee details of a company then Employee_Id is considered to be a good PrimaryKey for this table. But there are some cases when we never know the variance of data each column could contain, then we can use a column with simply auto-increment number E.g. Identity Column. Clustered index is created only on PrimaryKey column. PrimaryKey is by default UniqueKey. When you declare a column as PrimaryKey then “NOT NULL” constraint is applied to that column implicitly.
NaturalKey: A column value which is a real-world object as well as implemented as PrimaryKey of a table and if it can uniquely identifies each record/details of table then that kind of PrimaryKey is NaturalKey of table. E.g. in real-world we know that SSN in a country or Emp_Id in company/organization/firm will be unique, so such columns are considered as NaturalKeys.
ForeignKey : A ForeignKey is made up of one or more columns in a table and is referenced in another table as PrimaryKey. That means, if parent table has ForeignKey id_EmployeeDetails, then id_EmployeeDetails is the PrimaryKey of EmployeeDetails table (child/details table). During the normalization process of database, it plays big role. It is also used to implement business constraints on the data we store in tables.
UniqueKey : A UniqueKey provides similar motive i.e. to identify records uniquely from table. But unlike PrimaryKey, UniqueKey accept 1 NULL value (because if you see, if a column have 1 value as NULL then it is unique automatically in the whole column. isnβt it ? :)). UniqueKey can be applied to one more columns hence you can have multiple UniqueKeys for a table. Non-clustered indexes can be created on these type of key columns. Generally these type of keys are used on tables to enforce general or business constraints. E.g. a general constraint – value in the column should not contain duplicate values. A business constraint – the value (amount) of a column should be >= 500 USD. When you declare a column as UniqueKey then “NOT NULL” constraint is NOT applied to that column implicitly because UniqueKey can accept 1 NULL value in the entire column.
CandidateKey : A CandidateKey is made up of one or more columns which can uniquely identify records in a table. A table can have multiple CandidateKeys. CandidateKeys are the successors of PrimaryKey. And one of these CandidateKeys is considered for PrimaryKey of a table. E.g. CandidateKeys for an employee table could be Identity column, Emp_Id, Employee_SSN etc.
AlternateKey : All available CandidateKeys apart from the PrimaryKey in a table are considered as AlternateKeys. Functionally, Alternatekey is same as CandidateKey. E.g. Emp_Id, Emp_SSN are CandidateKeys but if we use Emp_Id as PrimaryKey of a table then Emp_SSN will be AlternateKey for the table.
CompositeKey : A CompositeKey is nothing but a PrimaryKey/UniqueKey combined with other column(s) of table in order to identify records uniquely from a table. CompositeKey may or may not be a SuperKey.
SuperKey : When a PrimaryKey is made up of more than one columns of table then it is considered as SuperKey of the table. SuperKey is a superset of PrimaryKey. Therefore a PrimaryKey is known as “Minimum SuperKey” of a table. When we combine one or more columns to identify unique records from a table, then that combination is called SuperKey. Table can have multiple SuperKeys. (Sometimes people say, a combination of UniqueKey and non-UniqueKey creates SuperKey of table. But if you think properly then you will find out that any column of table, combined with UniqueKey will be unique only.)
SurrogateKey : The motive of SurrogateKey is same as PrimaryKey/CandidateKey i.e. to identify records uniquely from a table. But SurrogateKey column does not bear any business meaning as a key. Identity column of a table is the best example of SurrogateKey. This column uniquely identifies the records from a table but does not have a business meaning for the records it identifies. Sometimes we choose identity column as a PrimaryKey which is beneficial in many ways. But choosing SurrogateKey as PrimaryKey is purely a database design decision. For relational databases, we should use meaningful PrimaryKeys like Emp_Id or Emp_SSN rather than just identity column. But for Data warehousing/OLAP storage, we should use SurrogateKeys like identity column for a table. In OLAP databases, SurrogateKey of a dimension table is referred in fact table. If we choose identity column as a SurrogateKey in a dimension table then it will be easy to maintain relationships between dimension and fact tables. But if we use PrimaryKey (other than identity column of dimension table) of dimension table in fact table for reference then we may lose efficiency of joins as well as key management while loading the fact tables would be tedious operation. As a best practice we let SurrogateKey be auto-generated and managed by SQL Server itself. Sometimes SurrogateKeys are also referred as MeaningLessKeys, SubstituteKeys, Non-NaturalKeys, or ArtificialKeys π
HouseKey : Dude, this is the key of you house which is used to unlock your house-door π
Please feel free to explore more details on Google and MSDN. I hope you have enjoyed this article.
Thanks, Khilit
Thank you for awesome post. it helped me a lot π Thanks, Janet
Really useful post. Nicely written. Thanks, ron
Really it was very useful me.