Before beginning this tutorial, it is assumed that:
- You have installed a Relational Database Management System (RDBMS)
- This system uses the Structured Query Language
- You have configured the RDBMS
- You have read the documentation and know how to access the RDBMS
- You know how to execute a query statement within the RDBMS
Part I – What is SQL?
The goal of these tutorials is to teach you everything you need to know about SQL. We will start off easy, the first few tutorials will be extremely simple, and somewhat short. As we progress into the more detailed features of SQL, they will grow in both length and knowledge. I used Microsoft SQL Server for this tutorial. The queries presented within this tutorial, however, should work on any ANSI SQL-92 RDBMS. I hope you enjoy.
Abridged Background of SQL
The acronym, SQL, stands for Structured Query Language. It is commonly pronounced as ‘Sequel’, though this is not the proper way to pronounce it. SQL is simply pronounced, ‘S Q L’. You say each letter individually. It is an acronym, not an abbreviation; there by it has no spoken form. Now that you know how to pronounce it, let’s discuss what it is.
SQL is a Relational Calculus developed by Codd in the early 70’s. It is commonly used in Relational Database Management Systems (known throughout the rest of this tutorial as RDBMS). It is, without a doubt, the easiest ‘language’ you will ever learn. Actually knowing the language, and being able to use it, however, are two separate issues. There are very few keywords, phrases or ‘methods’ within SQL. The power behind your queries is not in the language, but the developer using the language. You will learn, throughout this tutorial and in general practice, that no matter how much you already know about relational data … their is always more to learn.
New concepts and theories are derived all the time, and with each new concept comes a new way of querying your data. You may start off by storing data in one design, and then later on move onto a more complex, yet fitting, design. This tutorial series will work with basic storage designs and rather simple queries. I will discuss advanced material later on, but it will only be for the particular purpose of showing you how to use a special feature.
There is no set standard for how your data should be stored; the only real standard is that the data be relational. Non-relational data will cause performance issues down the line. Though Non-relational designs are some times better suited for the job then relational designs; they are usually frowned upon. Larger databases, storing thousands of records or more, should always be relational. If the database is only storing a few records for simple data retrieval, the relational aspect of it may be very relaxed. I will discuss how to store and work with your data in both relational and non-relational methods. The purpose of doing so will be to show you the difference between a relational design, and a non-relational design.
Part II – Introduction to SQL
The Basics
I’d like you to imagine yourself as a fruit farmer for this introduction. As a fruit farmer, you produce Apples, Oranges, Bananas, Grapefruit, Strawberries and Raspberries. You harvest each fruit during a different season. Your seasons are Spring, Summer, Fall, Winter, Warm, Cold and Yearly. You have two fields, one is located at coordinates (1,1) on your farm map, the other being located at coordinates (2,1) on your farm map. Now, as a farmer, you are a businessman. You need to keep records of what you produced, what you harvested, what was good and what was bad.
You have an existing filing system that you use, and it works. The problem with the filing system though is, it uses too much paper to track items. You wish to consolidate your storage onto a computer. In doing this, you are now able to store your data using a smaller amount of space, while at the same time being able to retrieve your data much faster. You need to keep track of all these items, down to the last apple that fell from the basket on your way to the truck. To accomplish this marvelous feat of data storage, you’ve decided to purchase a new server machine. This server runs a Relational Database Management System (refered to as RDBMS throughout the rest of this text). Now you have to learn how to actually store your data in it, without creating an even larger mess.
Before we begin, we need to create a database. We do so by executing a query called a CREATE statement. What this statement does is create an entity (?? element, object ??) in the database. Our first CREATE statement is going to actually create the database itself, all of our CREATE statements, and all of our further queries will be actually be run within this database after it is created. To create the database, we execute the following statement:
CREATE DATABASE MyFarmData; |
After executing this statement, we now have a database named “MyFarmData” and are able to access it, and work within it. Most RDBMS’s have there own methods for changing the current database. Some common methods are listed below:
SQL Server – USE MyFarmData;PostgreSQL – \c MyFarmData |
After changing the currently active database to the one we just created, we are left with the dilema of “What to do next?”. Well, thats actually quite simple. We need to sit down and think about what we are going to have our database do for us. And guess what, you lucked out this time. I already sat down and thought about what we need our database to do for us, so let me share this with you;
- Our database needs to track all of our products, namely the fruits we harvest.
- Our database needs to track the quantity of the harvests
- Our database needs to track the location the fruits were harvest from (which field)
- Our database needs to track the season we harvested the fruits
Now, knowing that we need to keep track of all this stuff, we need to figure out “How”. Well, simply put. We need a location for this data to reside in. RDBMS’s store there information in what is called a Table. For lack of a better explanation, a Table is simply just that, a Table. Picture your old filing system, where did you put the documents when you needed to review them? You placed them on a table and spread them out so that you could view them easier. Well, RDBMS’s do the same thing, they place all the records into a table and lay them out so that it can find the data it needs, when it needs it, and how it needs it.
Given the above requirements for our database, and the knowledge that we need to store our data in a table, let’s start working on our CREATE TABLE statements. A CREATE TABLE statement is very similar in actions to a CREATE DATABASE statement, in the fact that it creates an element within your Database. The difference between CREATE TABLE and CREATE DATABASE is simply the syntax, and the parameters. A CREATE TABLE statement takes a series of parameters, and follows a set syntax. An example of such a statement follows:
CREATE TABLE fruits (fruit_id int,name varchar(64),field_x int,
field_y int, harvest varchar(64) ); |
Now, looking at that statement, we know that it is creating a Table named “fruits”, but what we do not know is what the information between the ( ) is for. Well, they are the column names and the column data types that the table contains. A column within a table is simply an item in which we are storing data. For instance, if you look at some of your old filing system records, you will notice that you have information which is similar throughout all the records, and is also placed in the same location in each record. These items could be the name of the fruit, the field that the fruit is located in, and the time in which you are to harvest this fruit. As such, the columns in the table mentioned above are also used for this purpose. The name column for example is where we store the name of the fruit. The field_x and field_y columns are where we store the X and Y coordinates of the field the fruit is located on. The harvest column is where we store the harvesting season for the fruit. Now, what we did not keep track of in our old filing system as an ID for the fruit, simply because each fruit had its own name, so we filed the fruits alphabetically in our cabinet. RDBMS’s like to file there records numerically, so, we are going to store our records in numeric order. We assigned a unique numeric digit to each of our fruits and we store that digit in the fruit_id column.
Ok, so now your sitting there going “yeah, yeah … we have a table and some columns … but what do we do with them?”. So lets move onto the INSERT INTO statement. Wait, INSERT INTO? We discussed the use of, and the purpose of the CREATE statement. It creates DATABASES, TABLES and a few other RDBMS dependant elements. Well, now we are going to discuss the INSERT statement. INSERT INTO does simply what the word says, it “inserts” data into a table. An example of an INSERT statement follows:
INSERT INTO fruits (fruit_id, name, field_x, field_y, harvest)VALUES (1, ‘Apple’, 1, 1, ‘Fall’); |
The above statement simply takes the data 1, Apple, 1, 1 and Fall and places it into the Table fruits. Now, what we did to make sure the data was placed into the proper columns was assigned the column name to the data before giving it the data. We did this by listing the columns in the fruits table that we wanted to assign values to directly after naming the table. I refer to this as the column list. So, basically, we have the keywords INSERT INTO, a table name, following by a list of columns which we want to assign values to within the record, followed by the keyword VALUES and then finally, the values list. Note the placement of the ( and ) within the statement. We ‘blocked’ our column list and our values list within an opening and closing parenthesis. Those of you familiar with programming in just about any language are familiar with the use and the purpose of the parenthesis. Those of you who are not, well, the parenthesis’s are basically statement that the column list belongs to the table name mentioned directly before the opening parenthesis and the closing parenthesis simply states the end of that block. We did the same for the values list.
NOTE: Strings need to be escaped using single quotes. To escape a single quote within a string, either double the quote or back-slash (\) it.
Now, if you were to execute the above query, you would have a single record in your fruits table, which had a fruit_id value of 1. The name column would then contain the value ‘Apple’, and so on and so forth. I believe this statement is quite simple, and does not need much explanation. Further information about this statement can be found in later chapters (?? tutorials ??) of this book. For now, I would like you to take the rest of our crops and “insert” them into the fruits table using the above as a template for your queries. Keep in mind, the value of fruit_id must be unique for each fruit. If you forgot what your crops were, then I would have to say you need to find a new day job. But, alas, you’d probably like me to give you the list again anyway, so here it is:
Apples, Oranges, Bananas, Grapefruit, Strawberries and Raspberries |
After you’ve inserted a record into the fruits table for each of the above crops, stop back and we’ll discuss how to work with that data a little bit.
Introduction to Select
Back so soon? Ok, so, let’s recap … we’ve created a table, and we’ve placed data inside the table … what next? Well, we need to retrieve the data, don’t we? I mean, what’s the point of putting it there if we can’t get it back later on when we need it, right? Let’s move onto the SELECT statement.
SELECT statements will play a large role in your use of the RDBMS. They are probably going to be the statement’s which you run the most, simply because it is the statement which allows you to retrieve the data you’ve placed in your tables. The simplest SELECT statement which you can write is quite easy to follow, and an example is shown below:
SELECT * FROM fruits; |
Those of you who are familiar with console based operating systems (DOS, Linux, BSD, UNiX, etc) and/or searching for files, are aware that the Asterisk
has a very simple purpose. The Asterisk simply means ‘all’, or ‘anything’. So, now that we know what an Asterisk is for and what it means, let’s look at the query. SELECT is telling the RDBMS what type of operation to perform, FROM is telling the RDBMS where to perform this operation, fruits is where we are performing the operation, and Asterisk is telling it to grab all columns. The above query could also be re-written as:
SELECT fruit_id, name, field_x, field_y, harvest FROM fruits; |
This would return the exact same results as the previous statement using the Asterisk. The latter of the two, uses a column list to determine what columns we want to retrieve. This allows us to retrieve some columns, while not retrieving ‘all’ of them. This is helpful for when, say, we want to find out what fruits we produce, but we do not care about there ID, there field location, or the harvest time. Now, say we wanted to find out what fruits needed to be harvested in the Fall season. We could write something like:
SELECT * FROM fruits WHERE harvest = ‘Fall’; |
This would return the same results as the two queries before it, except that it would limit the results quite a bit. It would return all the columns for all the records where the value of the harvest column was ‘Fall’. Now, let’s try a query on our own. Retrieve only the records from the fruits table that are to be harvested in Spring, limit the display to showing only the name and the field location (we know the harvest, and care not for the unique identifier).
If you came up with:
SELECT name, field_x, field_y FROM fruits WHERE harvest = ‘Spring’; |
You are correct, if not, re-read the Introduction to Select and try again. No cheating now!
Introduction to Delete
Ok, so, let’s recap again … we can create databases and tables, we can place data within our tables, and now we know how to retrieve the data. We also know how to retrieve only the data which we want (in the simplest form). Now, what if we stop producing a particular crop because our fields can no longer support growing it, or because our customers are tired of Banana’s? Well, if that were the case, we may want to remove these records from the database. Correct? Ok, we’ll lets assume you want to remove a record from the fruits table. We can do this using the DELETE statement. The DELETE statement shares a similarities with the SELECT statement. DELETE also utilizes the WHERE clause. This allows you to identify exactly which records you wish to remove. An example is as follows:
DELETE FROM fruits WHERE fruit_id = 1; |
Now, given our knowledge of SELECT, and our short intro to DELETE we know that the statement above is going to remove records from the fruits that contain the value 1 in the fruit_id table. Now, given that we know fruit_id contains unique values (or it should, I told you to make sure they were unique), we know that only one record will be removed. This is the record that contains all of the information for our ‘Apple’ crop. Now, why did we just remove the Apple crop? Gee, I wonder, I personally like apples, let’s remedy this and add apple’s back into the fruits table by running an INSERT statement. After you’ve placed Apples back into the fruits table, stop back and we’ll go into further detail as to how to write more complex WHERE clauses.
WHERE my data?
Yeah, yeah, the title makes me laugh too, let’s move on. Simply put, the WHERE clause (mentioned briefly earlier) is going to be the most important part of your queries. The WHERE clause is what ensures your data is being dealt with properly. It limits your SELECT, DELETE and your UPDATE (we’ll get to this next…). If you remove the WHERE clause from your SELECT or DELETE statements, the statement will execute on every single record in the table. This is usually not something you want to do. Simply put;
DELETE FROM fruits; |
The above statement will remove every single record in your fruits table. When will you ever want to do this? Maybe when you decide to stop being a fruit farmer and start getting into the IT industry, thats a possibility, but hey, that’s a long shot, right? So, let’s get into the nitty gritty with the WHERE clause.
The WHERE clause in its simplest form is quite basic, you simply say WHERE, and then list your ‘requirements’. A requirement would be something like name = ‘Apple’. That’s a basic requirement, but in the event you want to be a bit more specific, like say, you have three fruits located on the field in the X and Y offset of (1,0). Now, lets picture two crops, each of the same fruit, but each is also located on both fields. Now, how would we retrieve all crops;
WHERE name = ‘Apple’ AND fiel_x = 1 AND field_y = 0 |
Well, lets see … I think we just did. That’s it, you simply say AND. If, say, you wanted to OR the record, you could also say;
WHERE name = ‘Apple’ OR name = ‘Orange’ AND fiel_x = 1 AND field_y = 0; |
What do you think that will return? If you guessed that it would return all records where the name was either Apple or Orange, and the crop was located in the field with X/Y offset (1,0) then you are correct. If you thought it would return something else, well, your wrong.
To continue on further, we need a bit more data in your database. We simply don’t have enough to allow us to continue further, so, lets add more data to the system.
Part III – Working with the proper Data
A Closer look at WHERE
As mentioned previously, the WHERE clause plays a crucial role in any database developers work. The proper usage and understanding of the WHERE clause is critical. This chapter will focus completely on the WHERE clause, and in the end, will, hopefully, give you the proper understanding of the WHERE clause that you need.
To start, let’s discuss the exact purpose of the WHERE clause. When you are querying the RDBMS to retrieve, modify, or remove data records from tables, you almost certainly do not want it to perform the given actions on every single record it finds, so what you do is define the records to alter simply by giving it a WHERE clause. The WHERE clause limit the records which are affected to those which return true in nature to the constraints held within the WHERE clause. For instance, if you wanted all the data that contained an A in the column, then you would perform something such as this:
WHERE field LIKE ‘%A%’ |
Now, the above example uses something new to us, the LIKE statement. LIKE is very useful when you want to find string references within a small character field. LIKE will, and usually does, tend to be quite a slow operation when you are working with large character fields, or large text objects. Some RDBMS’s prohibit the use of LIKE on a large text object (such as the text datatype, or a BLOB datatype — both explain later).
So, now we know we can use both mathematical and string comparison’s within a WHERE clause, let’s look at how we could use them together:
WHERE field_x = 1 and field_y = 0 and name LIKE ‘A%’ |
Now, the above is performing two mathematical operations and a single string comparison. The difference between this string comparison and the one previous is that there is only one % symbol. The % in a LIKE statement is similar in context to an Asterisk (*) in a DOS environment, it stands for ‘anything after’, so ‘A%’ would be an A and then anything following it, so all records where the name column contained an A as the first character of the string, would be returned as true.
Take a look at this example;
WHERE x = 1 or x = 2 or x = 3 or x = 4 |
Thats quite a short list of possibilities, but what if we wanted to have a possible 150 matching numeric values for the x column? Are we to place 150 OR’s into the query? Of course not, who would ever want to write that many OR’s? To over come this dilemma, we use something called an IN statement. IN simple allows us to do the following:
WHERE x IN (1,2,3,4,5,6,7,8,9,100,101,102,103,104,105,106,107,108,109) |
Now that looks a lot cleaner, doesn’t it?
Are you asking yourself if IN is for numeric comparison’s only? Well, if so, tell yourself no. IN can be used with character columns as well, for example;
WHERE name in (‘Apples’,’Bananas’,’Oranges’) |
That would return true for all records where the name column contained either Apples, Bananas or Oranges. Better yet, it would return all columns where Apples, Bananas, or Oranges was IN the name column.
One Response
I receive these blog posts trough google alerts. always great articles and introduction for non techies. excellent work.