SQL allows you to access only certain parts of your data at a time so you don’t have to download the data into a CSV, manipulate it, and possibly overload Excel.
In other words, SQL takes care of the data analysis that you may be used to doing in Excel.
(If you want to dig into this aspect of SQL a bit more, here is a blog post to get you started.
)How to Write Simple SQL QueriesUnderstand the hierarchy of your databaseBefore you get started, it’s important to become accustomed to your database and its hierarchy.
If you have multiple databases of data, you’ll need to zero in on the location of the data you want to work with.
For example, let’s pretend we’re working with multiple databases about people in the United States.
Type in the query “SHOW DATABASES;”.
Our results may show that you have a couple of databases for different locations, including one for New England.
Within your database, you’ll have different tables containing the data you want to work with.
Using the same example above, let’s say we want to find out which information is contained in one of the databases.
If we use the query“SHOW TABLES in New-England;”we’ll find that we have tables for each state in New England:people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Finally, you need to find out which fields are in the tables.
Fields are the specific pieces of data that you can pull from your database.
For example, if you want to pull someone’s address, the field name may not just be “address” — it may be separated into address_city, address_state, address_zip.
In order to figure this out, use the query “Describe people_massachusetts;”.
That will provide a list of all of the data that you can pull using SQL.
Let’s do a quick review of the hierarchy using our New England example:Our database is: NewEngland.
Our tables within that database are: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Our fields within the people_massachusetts table include: address_city, address_state, address_zip, hair_color, first_name, and last_name.
Now, to learn how to write a simple SQL query, let’s use the following example:Who are the people who have red hair in Massachusetts and were born in 2003 organized in alphabetical order?SELECTSELECT chooses the fields that you want to be displayed in your chart.
This is the specific piece of information that you want to pull from your database.
In the example above, we want to find the people who fit the rest of the criteria.
Here is our SQL query:SELECT first_name, last_nameFROMFROM pinpoints the table that you want to pull the data from.
In the earlier section, we found that there were six tables for each of the six states in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Because we’re looking for people in Massachusetts specifically, we’ll pull data from that specific table.
Here is our SQL query:SELECT first_name, last_nameFROM people_massachusettsWHEREWHERE allows you to filter your query to be more specific.
In our example, we want to filter our query to include only people with red hair who were born in 2003.
Let’s start with the red hair filter.
Here is our SQL query:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”hair_color could have been part of your initial SELECT statement if you’d wanted to look at all of the people in Massachusetts along with their specific hair color.
But if you want to filter to see only people with red hair, you can do so in the WHERE statement.
ANDAND allows you to add additional criteria to your WHERE statement.
Remember, we want to filter by people who had red hair in addition to people who were born in 2003.
Since our WHERE statement is taken up by the red hair criteria, how can we filter by a specific year of birth as well?That’s where the AND statement comes in.
In this case, the AND statement is a date property — but it doesn’t necessarily have to be.
(Note: Be to check the format of your dates with your product team to make sure it is in the correct format.
)Here is our SQL query:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’ORDER BYWhen you create SQL queries, you shouldn’t have to export the data to Excel.
The calculation and organization should be done within the query.
That’s where the “ORDER BY” and “GROUP BY” functions come in.
First, we’ll look at our SQL queries with the ORDER BY and then GROUP BY functions, respectively.
Then, we’ll take a brief look at the difference between the two.
Your ORDER BY clause will allow you to sort by any of the fields that you have specified in the SELECT statement.
In this case, let’s order by last name.
Here is our SQL query:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’ORDER BY last_name;GROUP BY“GROUP BY” is similar to “ORDER BY,” but it will aggregate data that has similarities.
For example, if you have any duplicates in your data, iyou can use “GROUP BY” to count the number of duplicates in your fields.
Here is your SQL query:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’GROUP BY last_name;ORDER BY VS.
GROUP BYTo clearly show you the difference between an “ORDER BY” statement and a “GROUP BY” statement, let’s step outside our Massachusetts example briefly to look at a very simple dataset.
Below is a list of four employees’ ID numbers and names.
If we were to use an ORDER BY statement on this list, the names of the employees would get sorted in alphabetical order.
The results would look like this:If we were to use a GROUP BY statement, the employees would be counted based on the number of times they appeared in the initial table.
Note that Peter appeared twice in the initial table.
The results would look like this:With me so far?.Okay.
Let’s return to the SQL query we’ve been creating about red-haired people in Massachusetts who were born in 2003.
LIMITDepending on the amount of data you have in your database, it may take a long time to run the queries.
It can be frustrating if you find yourself waiting a long time to run a query that you didn’t really want to begin with.
If you want to test our query, the LIMIT function is a great one to use because it allows you to limit the number of results you get.
For example, if we suspect there are millions of people who have red hair in Massachusetts, we may want to test out our query using LIMIT before we run it in full to make sure we’re getting the information we want.
Let’s say, for instance, we only want to see the first 100 people.
Here is our SQL query:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’ORDER BY last_nameLIMIT 100;That’s it for the basics!Feeling good?.Here are a few other ways to take your SQL queries up a notch.
Bonus: Advanced SQL TipsNow that you have mastered how to create a SQL query, let’s walk through some other tricks that you can use to take it up a notch, starting with the asterisk.
*When you add an asterisk to one of your SQL queries, it tells the query that you want to include all the columns of data in your results.
In the example we’ve been using, we’ve only had two column names: first_name and last_name.
But let’s say we had 15 columns’ worth of data that we want to see in our results — it would be kind of a pain to type out all 15 column names in the SELECT statement.
Instead, if you replace the names of those columns with an asterisk, the query will know to pull all of the columns into the results.
Here’s what the SQL query would look like:SELECT *FROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’ORDER BY last_nameLIMIT 100;LAST 30 DAYSOnce I started using SQL regularly, I found that one of my go-to queries involved trying to find which people took an action or fulfilled a certain set of criteria within the last 30 days.
Since this type of query was so useful for me, I wanted to share that capability with you.
Let’s pretend today is December 1, 2014.
You could create these parameters by making the birth_date span between November 1, 2014 and November 30, 2014.
That SQL query would look like this:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date BETWEEN ‘2014–11–01’ AND ‘2014–11–30’ORDER BY last_nameLIMIT 100;But that would require thinking about which dates cover the last 30 days, and it would mean you’d have to constantly update this query.
Instead, to make the dates automatically span the last 30 days no matter which day it is, you can type this under AND:birth_date >= (DATE_SUB(CURDATE(),INTERVAL 30))Your SQL query would, therefore, look like this:SELECT first_name, last_nameFROM people_massachusettsWHERE hair_color = “red”AND birth_date >= (DATE_SUB(CURDATE(),INTERVAL 30))ORDER BY last_nameLIMIT 100;COUNTIn some cases, you may want to count the number of times that a criterium of a field appears.
For example, let’s say you want to count the number of times the different hair colors appear for the people you are tallying up from Massachusetts.
In this case, COUNT will come in handy so you don’t have to manually add up the number of people who have different hair colors or export that information to Excel.
Here’s what that SQL query would look like:SELECT hair_color, COUNT(hair_color)FROM people_massachusettsAND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’GROUP BY hair_color;JOINThere may be a time where you need to access information from two different tables in one SQL query.
In SQL, you can use a JOIN clause to do this.
(For those of you familiar with Excel formulas, this is similar to how you would use the VLOOKUP formula when you need to combine information from two different sheets in Excel.
)For example, let’s say we have one table that has data of all Massachusetts residents’ user IDs and their birthdates.
Let’s say we also have an entirely separate table that has data of all Masachusetts residents’ user IDs and their hair color.
If we want to figure out the hair color of Massachusetts residents born in the year 2003, we’d need to access information from both tables and combine them.
This works because both tables share a matching column: the Massachusetts residents’ user IDs.
Because we’re calling out fields from two different tables, our SELECT statement is also going to change slightly.
Instead of just listing out the fields we want to include in our results, we’ll need to specify which table they’re coming from.
(Note: The asterisk function may come in handy here so your query includes both tables in your results.
)To specify a field from a specific table, all we’d have to do is combine the name of the table with the name of the field.
For example, our SELECT statement would say “table.
field” — with the period separating the table name and the field name.
Let’s take a look at what this looks like in action.
We’re assuming a few things in this case:The Massachusetts birthdate table includes the following fields: first_name, last_name, user_id, birthdateThe Massachusetts hair color table includes the following fields: user_id, hair_colorYour SQL query would, therefore, look like:SELECT birthdate_massachusetts.
last_nameFROM birthdate_massachusetts JOIN haircolor_massachusetts USING (user_id)WHERE hair_color = “red”AND birth_date BETWEEN ‘2003–01–01’ AND ‘2003–12–31’ORDER BY last_name;This query would join the two tables using the field “user_id” which appears in both the birthdate_massachusetts table and the haircolor_massachusetts table.
You would then be able to see a table of people born in 2003 who have red hair.
Congratulations: You’re ready to get started with your own SQL queries!.While there’s a lot more you can do with SQL, I hope you found this overview of the basics helpful so you can get your hands dirty.
With a strong foundation of the basics, you’ll be able to navigate SQL better and work toward some of the more complex examples.
.. More details