Database System
1 SQL Part Ⅰ
1.1 SQL Introduction
SQL = Structured Query Language
Although over 40 years old, it keeps re-emerging as the standard.
Features:
Declarative!
Specify what you want, not how to get it.
Implemented widely
With varying levels of efficiency, completeness.
Constrained
Not targeted at Turing-complete tasks.
General-purpose and feature-rich
Many years of added features.
Extensible: Callouts to other languages, databases.
1.2 Relational Terminology
Definitions:
Database: Set of named relations.
Relation (aka Table):
Schema: description ( "metadata").
Instance: set of data satisfying the schema.
Attribute (aka Column, Field)
Tuple (aka Record, Row)
Cardinality: Number of rows in a table.
Properties:
Schema is fixed, unique attribute names, atomic (aka primitive) types.
Tables are NOT ordered, they are sets or multisets (bags).
Tables are FLAT, no nested attributes.
Tables DO NOT prescribe how they are implemented/stored on disk, which is called physical data independence.
1.3 SQL Language
SQL Language:
Two Sublanguages:
DDL (Data Definition Language): Define and modify schema.
DML (Data Manipulation Language): Queries can be written intuitively.
Relational Database Management System (RDBMS): responsible for efficient evaluation, choose and run algorithms for declarative queries (choice of algorithm must not affect query answer).
SQL DDL Examples:
Sailors
sid | sname | rating | age |
---|---|---|---|
1 | Fred | 7 | 22 |
2 | Jim | 2 | 39 |
3 | Nancy | 8 | 27 |
Boats
bid | bname | color |
---|---|---|
101 | Nina | red |
102 | Pinta | blue |
103 | Santa Maria | red |
Reserves
sid | bid | day |
---|---|---|
1 | 102 | 9/12 |
2 | 102 | 9/13 |
1.4 SQL Queries
Basic Single Table Queries:
Produce all tuples in the table that satisfy the predicate.
Output the expressions in the SELECT list,
Expression can be a column reference, or an arithmetic expression over column refs.
Example:
SELECT, FROM, WHERE lines:
Return all unique (name, GPA) pairs from students.
DISTINCT specifies removal of duplicate rows before output.
Can refer to the students table as S, this is called an alias.
ORDER line:
ORDER BY clause specifies output to be sorted.
Sorts the output by GPA in descending order, then by name in ascending order.
Also computes the value of age*2 and names it a2.
Obviously must refer to columns in the output.
Note the AS clause for naming output columns!
LIMIT line:
Only produces the first 3 output rows.
Typically used with ORDER BY.
Otherwise the output is non- deterministic.
Not a "pure" declarative construct in that case – output set depends on algorithm for query processing.
Aggregates:
Before producing output, compute a summary (aka an aggregate) of some arithmetic expression.
Produces 1 row of output, with one column of average in this case.
Other aggregates: SUM, COUNT, MAX, MIN (and others).
Group By:
Partition table into groups with same GROUP BY column values, can group by a list of columns.
Produce an aggregate result per group, cardinality (rows) of output = number of distinct group values.
Note: can put grouping columns in SELECT list (in this case, average GPA + department).
Having:
The HAVING predicate filters groups
HAVING is applied after grouping and aggregation
Hence can contain anything that could go in the SELECT list
i.e., aggs or GROUP BY columns
HAVING can only be used in aggregate queries
It's an optional clause
2 SQL Part Ⅱ
2.1 Join Queries
Join: Form cross product of the tables, output all tuples and select specific columns.
Example:
Sailors
sid | sname | rating | age |
---|---|---|---|
1 | Popeye | 10 | 22 |
2 | OliveOyl | 11 | 39 |
3 | Garfield | 1 | 27 |
4 | Bob | 5 | 19 |
Reserves
sid | bid | day |
---|---|---|
1 | 102 | 9/12 |
2 | 102 | 9/13 |
1 | 101 | 10/01 |
Output
sid | sname | bid |
---|---|---|
1 | Popeye | 102 |
2 | OliveOyl | 102 |
Self-Joins
Output
sname1 | age1 | sname2 | age2 |
---|---|---|---|
Popeye | 22 | Bob | 19 |
OliveOyl | 39 | Popeye | 22 |
OliveOyl | 39 | Garfield | 27 |
OliveOyl | 39 | Bob | 19 |
Garfield | 27 | Popeye | 22 |
Garfield | 27 | Bob | 19 |
Inner Join
2.2 Select & Where Advanced
Arithmetic Expressions
SQL Calculator
three | two | one | sanity |
---|---|---|---|
3.0 | 2.0 | 1.0 | 1 |
String Comparison
Boolean Logic vs. Set Operators
Reserve a red or a green boat
Reserve a red and a green boat
Set Semantics
Default (Think of each letter as being a tuple in a relation):
R = {A, A, A, A, B, B, C, D}
S = {A, A, B, B, B, C, E}
UNION: {A, B, C, D, E}
INTERSECT: {A, B, C}
EXCEPT: {D}
"ALL": Multiset Semantics
R = {A, A, A, A, B, B, C, D} = {A(4), B(2), C(1), D(1)}
S = {A, A, B, B, B, C, E} = {A(2), B(3), C(1), E(1)}
UNION ALL (sum of all cardinalities): {A(6), B(5), C(2), D(1), E(1)}
INTERSECT ALL (min of cardinalities): {A(min(4,2)), B(min(2,3)), C(min(1,1)), D(min(1,0)), E(min(0,1))} = {A, A, B, B, C}
EXCEPT ALL (subtract cardinalities): {A(4-2), B(2-3), C(1-1), D(1-0), E(0-1)} = {A, A, D}
2.3 Nested Queries
In
Example
SELECT S.sname FROM Sailors S WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 102); -- SubqueryNot In
Example
SELECT S.sname FROM Sailors S WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103); -- SubqueryExists
Example 1 (bit odd, but legal)
SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid = 103); /* The query will output all the sailors if there is at least one sailor who reserves boat 103 or return nothing if there isn't */Example 2 (Nested Queries with Correlation)
SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid = 103 AND R.sid = S.sid); /* The query will output all the sailors who reserve boat 103. * In this example, the subquery will take every tuple of the sailors and run it, * tuple by tuple. */
Example for Division
ARGMAX
Example: Sailor with the Highest Rating