Welcome to SQL 2: Working With Data Values

Explore the many flavors of SQL data manipulation in part 2 of our series.
Todd Birchard, Feb 21

Now that we’ve gotten the fundamentals of creating databases and tables out of the way, we can start getting into the meat and potatoes of SQL interactions: selecting, updating, and deleting data.
We’ll start with the basic structure of these queries and then break into the powerful operations with enough detail to make you dangerous.
Selecting Data From a Table

As mentioned previously, SQL operations have a rather strict order of operations which clauses have to respect in order to make a valid query. We’ll begin by dissecting a common SELECT statement:

```sql
SELECT column_name_1, column_name_2
FROM schema_name.table_name
WHERE column_name_1 = "Value";
```

This is perhaps the most common structure of SELECT queries.
First, we list the names of the columns we’d like to select separated by commas.
To receive all columns, we can simply say SELECT *.
These columns need to come from somewhere, so we specify the table we’re referring to next.
This either takes the form of FROM table_name (non-PostgreSQL) or FROM schema_name.table_name (PostgreSQL).
In theory, a semicolon here would result in a valid query, but we usually want to select rows that meet certain criteria.
This is where the WHERE clause comes in: only rows which return "true" for our WHERE conditional will be returned.
In the above example, we're validating that a string matches exactly "Value".
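If you'd like to follow along without standing up a database server, the same shape of query runs against SQLite via Python's standard library. The people table and its contents here are invented for the demo (and note SQLite doesn't use schema prefixes):

```python
import sqlite3

# In-memory database so the example is fully self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Jane", "Boston"), ("Todd", "New York"), ("Max", "Boston")])

# SELECT specific columns, filtered by a WHERE conditional
rows = conn.execute(
    "SELECT name FROM people WHERE city = 'Boston'"
).fetchall()
print(rows)  # [('Jane',), ('Max',)]
```

Only the rows whose WHERE conditional evaluates to true come back.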
Selecting Only Distinct Values

Something that often comes in handy is selecting distinct values in a column.
In other words, if a value exists in the same column in 100 rows, running a DISTINCT query will only show us that value once.
This is a good way of seeing the unique content of a column without yet diving into the distribution of said value.
The effect is similar to the United States Senate, or the Electoral College: forget the masses, and prop up Wyoming 2020:

```sql
SELECT DISTINCT column_name FROM table_name;
```

Offsetting and Limiting Results in our Queries

When selecting data, the combination of OFFSET and LIMIT is critical at times.
If we're selecting from a database with hundreds of thousands of rows, we would be wasting an obscene amount of system resources to fetch all rows at once; instead, we can have our application or API paginate the results.
LIMIT is followed by an integer, which in essence says "return no more than X results." OFFSET is also followed by an integer, which denotes a numerical starting point for returned results, aka "return all results which occur after the Xth result":

```sql
SELECT *
FROM table_name
LIMIT 50 OFFSET 0;
```

The above returns the first 50 results.
If we wanted to build paginated results on the application side, we could construct our query like this:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Set up a SQLAlchemy session
engine = create_engine('sqlite:///example.db')
Session = sessionmaker()
Session.configure(bind=engine)
session = Session()

# Application variables
page_number = 3
page_size = 50
results_subset = page_number * page_size

# Query (Post stands in for whatever mapped model we're paginating)
results = session.query(Post).limit(page_size).offset(results_subset)
```

Such an application could increment page_number by 1 each time the user clicks through to the next page, which would then appropriately modify our query to return the next page of results.
Another use for OFFSET could be to pick up where a failed script left off.
If we were to write an entire database to a CSV and experience a failure partway through, we could pick up where the script left off by setting OFFSET equal to the number of rows already in the CSV, to avoid running the entire script all over again.
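Here's a minimal sketch of that pagination pattern using SQLite from the standard library (table name and contents invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(1, 121)])  # 120 rows

page_size = 50

def fetch_page(page_number):
    # Page 0 starts at OFFSET 0, page 1 at OFFSET 50, and so on
    offset = page_number * page_size
    return conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

print(len(fetch_page(0)))   # 50 rows on the first page
print(fetch_page(2)[0][0])  # 101: the third page picks up at row 101
```

Each page fetches only page_size rows, so memory stays flat no matter how big the table grows.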
Sorting Results

Last to consider for now is sorting our results by using the ORDER BY clause. We can sort our results by any specified column, and state whether we'd like the results to be ascending (ASC) or descending (DESC):

```sql
SELECT *
FROM schema_name.table_name
WHERE column_name_1 = "Value"
ORDER BY updated_date DESC
LIMIT 50 OFFSET 10;
```

Sophisticated SELECT Statements

Of course, we can select rows with WHERE logic that goes much deeper than an exact match.
One of the most versatile of these operations is LIKE.
Using Regex with LIKE

LIKE is perhaps the most powerful way to select columns with string values. With LIKE, we can leverage wildcard patterns (a lighter-weight cousin of regular expressions) to build highly complex logic. Let's start with some of my favorites:

```sql
SELECT *
FROM people
WHERE name LIKE "%Wade%";
```

Passing a string to LIKE with percentage signs on both sides is essentially a "contains" statement.
% is equivalent to a wildcard, thus placing % on either side of our string will return true whether the person's first name, middle name, or last name is Wade.
Check out other useful combinations for %:

- a%: Finds any values that start with "a".
- %a: Finds any values that end with "a".
- %or%: Finds any values that have "or" in any position.
- _r%: Finds any values that have "r" in the second position.
- a_%_%: Finds any values that start with "a" and are at least 3 characters in length.
- a%o: Finds any values that start with "a" and end with "o".
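To sanity-check a couple of those patterns, here's a quick SQLite demo (names invented; note that SQLite's LIKE is case-insensitive for ASCII characters by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)",
                 [("Wade Wilson",), ("Dwayne Wade",), ("Bruce Banner",)])

# %Wade% acts as a "contains" test: it matches at the start, middle, or end
contains = conn.execute(
    "SELECT name FROM people WHERE name LIKE '%Wade%'"
).fetchall()
print(contains)  # [('Wade Wilson',), ('Dwayne Wade',)]

# _r% requires an 'r' in the second position
second_r = conn.execute(
    "SELECT name FROM people WHERE name LIKE '_r%'"
).fetchall()
print(second_r)  # [('Bruce Banner',)]
```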
Finding Values Which are NOT LIKE

The opposite of LIKE is of course NOT LIKE, which runs the same conditional, but returns the opposite true/false value of LIKE:

```sql
SELECT *
FROM people
WHERE name NOT LIKE "%Wade%";
```

DateTime columns are extremely useful for selecting data.
Unlike plain strings, we can easily extract numerical values for month, day, and year from a DateTime by using MONTH(column_name), DAY(column_name), and YEAR(column_name) respectively.
For example, using MONTH() on a column that contains a DateTime of 2019-01-26 05:42:34 would return 1, aka January.
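One caveat: MONTH(), DAY(), and YEAR() are MySQL functions. SQLite, for example, spells the same extraction with strftime(), which returns text we can cast to an integer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# strftime('%m', ...) pulls the zero-padded month out of a DateTime string
row = conn.execute(
    "SELECT CAST(strftime('%m', '2019-01-26 05:42:34') AS INTEGER)"
).fetchone()
print(row[0])  # 1, aka January
```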
Because the values come back as integers, it is then trivial to find results within a date range:

```sql
SELECT * FROM posts WHERE YEAR(created_at) < 2018;
```

Finding Rows with NULL Values

NULL is a special datatype which essentially denotes the "absence of something," therefore no conditional will ever equal NULL.
Instead, we find rows where a value IS NULL:

```sql
SELECT * FROM posts WHERE author IS NULL;
```

This should not come as a surprise to anybody familiar with validating datatypes.
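A quick demonstration of why IS NULL is required (table contents invented): an equality comparison against NULL evaluates to NULL, which is never true:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, author TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [("Draft", None), ("Published", "Todd")])

# IS NULL correctly finds the row missing an author
null_rows = conn.execute(
    "SELECT title FROM posts WHERE author IS NULL"
).fetchall()
print(null_rows)  # [('Draft',)]

# An equality check against NULL matches nothing, not even NULL itself
eq_rows = conn.execute(
    "SELECT title FROM posts WHERE author = NULL"
).fetchall()
print(eq_rows)  # []
```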
The reverse of this, of course, is NOT NULL:

```sql
SELECT * FROM posts WHERE author IS NOT NULL;
```

Inserting Data

An INSERT query creates a new row, and is rather straightforward: we state the columns we'd like to insert data into, followed by the values to insert into said columns:

```sql
INSERT INTO table_name (column_1, column_2, column_3) VALUES ("value1", "value2", "value3");
```

Many things could result in a failed insert.
For one, the number of values must match the number of columns we specify; otherwise we've provided either too few or too many values. Second, values must respect a column’s data type.
If we try to insert an integer into a DateTime column, we’ll receive an error.
Finally, we must consider the keys and constraints of the table.
If keys exist that specify certain columns must not be empty, or must be unique, those constraints must also be respected.
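As a sketch of that last point, SQLite will refuse an insert that violates a NOT NULL constraint rather than silently create the row (table modeled on the friends example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birthday TEXT)"
)
conn.execute(
    "INSERT INTO friends (id, name, birthday) VALUES (1, 'Jane Doe', '1990-05-30')"
)

# Omitting the NOT NULL name column raises an error instead of inserting
try:
    conn.execute("INSERT INTO friends (id, birthday) VALUES (2, '1992-08-01')")
    error_message = ""
except sqlite3.IntegrityError as err:
    error_message = str(err)

print(error_message)  # NOT NULL constraint failed: friends.name
row_count = conn.execute("SELECT COUNT(*) FROM friends").fetchone()[0]
print(row_count)  # 1: the bad insert never landed
```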
As a shorthand trick, if we’re inserting values into all of a table’s columns, we can skip the part where we explicitly list the column names:

```sql
INSERT INTO table_name VALUES ("value1", "value2", "value3");
```

Here’s a quick example of an insert query with real data:

```sql
INSERT INTO friends (id, name, birthday) VALUES (1, 'Jane Doe', '1990-05-30');
```

Updating rows is where things get interesting.
There’s so much we can do here, so let’s work our way up:

```sql
UPDATE table_name SET column_name_1 = 'value' WHERE column_name_2 = 'value';
```

That’s as simple as it gets: we set the value of a column in any row that matches our conditional.
Note that SET always comes before WHERE.
Here's the same query with real data:

```sql
UPDATE celebs SET twitter_handle = '@taylorswift13' WHERE id = 4;
```

UPDATE Records: Useful Logic

Joining Strings Using CONCAT

You will find that it’s common practice to update rows based on data which already exists in said rows: in other words, sanitizing or modifying data.
A great string operator is CONCAT().
CONCAT("string_1", "string_2") joins all the strings passed to it into a single string.
Below is a real-world example of using CONCAT() in conjunction with NOT LIKE to determine which post excerpts don't end in punctuation.
If the excerpt does not end with a punctuation mark, we add a period to the end:

```sql
UPDATE posts
SET custom_excerpt = CONCAT(custom_excerpt, '.')
WHERE custom_excerpt NOT LIKE '%.'
  AND custom_excerpt NOT LIKE '%!'
  AND custom_excerpt NOT LIKE '%?';
```

Using REPLACE

REPLACE() works in SQL as it does in nearly every programming language.
We pass REPLACE() three values:

- The string to be modified.
- The substring within the string which will be replaced.
- The value of the replacement.
We can do plenty of clever things with REPLACE().
This is an example that changes the featured image of blog posts to contain the “retina image” suffix:

```sql
UPDATE posts
SET feature_image = REPLACE(feature_image, '.jpg', '@2x.jpg');
```

I came across a fun exercise the other day when dealing with a nightmare situation involving changing CDNs.
It touches on everything we’ve reviewed thus far and provides an excellent example of what can be achieved in SQL alone.
The challenge in moving hundreds of images for hundreds of posts came in the form of a file structure.
Ghost likes to save images in a dated folder structure, like 2019/02/image.jpg. Our previous CDN did not abide by this at all, so we had a dump of all images in a single folder.
Thankfully, we can leverage the metadata of our posts to discern this file structure.
Because images are added to posts when posts are created, we can use the created_at column from our posts table to figure out the right dated folder:

```sql
UPDATE posts
SET feature_image = CONCAT(
  "https://cdn.com/posts/",
  YEAR(created_at),
  "/",
  LPAD(MONTH(created_at), 2, '0'),
  "/",
  SUBSTRING_INDEX(feature_image, '/', -1)
);
```

Let’s break down the contents in our CONCAT:

- https://cdn.com/posts/: The base URL of our new CDN.
- YEAR(created_at): Extracts the year from our post creation date (corresponds to a folder).
- LPAD(MONTH(created_at), 2, '0'): MONTH(created_at) returns a single digit for early months, but our folder structure always wants double-digit months (i.e. 2018/01/ as opposed to 2018/1/). We can use LPAD() here to 'pad' our dates so that months are always two digits long, with single digits padded by the number 0.
- SUBSTRING_INDEX(feature_image, '/', -1): Gets the filename of each post's image by finding everything that comes after the last slash in our existing image URL.
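LPAD() and SUBSTRING_INDEX() are MySQL functions, but the transformations they perform are easy to sanity-check with plain Python string methods (the URL below is invented for illustration):

```python
# LPAD(MONTH(created_at), 2, '0'): left-pad a month number to two digits
month = 1
padded = str(month).zfill(2)
print(padded)  # '01'

# SUBSTRING_INDEX(feature_image, '/', -1): everything after the last slash
feature_image = "https://old-cdn.example.com/all-images/image.jpg"
filename = feature_image.rsplit("/", 1)[-1]
print(filename)  # image.jpg
```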
The result for every image will now look like this: https://cdn.com/posts/2019/02/image.jpg

DELETE Records

Let’s wrap up for today with our last type of query, deleting rows:

```sql
DELETE FROM celebs
WHERE twitter_handle IS NULL;
```

Originally published at hackersandslackers.com on February 22, 2019.