Block 2: Programming and Database Skills
Topic 2.4 · 6 Objectives
The SELECT statement is the foundation of SQL data retrieval. It specifies which columns to return. FROM identifies the table, and WHERE filters rows based on conditions.
JOINs combine rows from two or more tables based on a related column. Understanding the different types is critical for data analysis.
| JOIN Type | Returns |
|---|---|
INNER JOIN | Only rows with matching values in both tables |
LEFT JOIN | All rows from the left table, plus matches from the right (NULL if no match) |
RIGHT JOIN | All rows from the right table, plus matches from the left (NULL if no match) |
FULL OUTER JOIN | All rows from both tables (NULL where there is no match on either side) |
GROUP BY aggregates rows that share a value. HAVING filters groups (like WHERE but for aggregated data). ORDER BY sorts results, and LIMIT restricts the number of rows returned.
WHERE filters individual rows before grouping. HAVING filters groups after aggregation. You cannot use aggregate functions in a WHERE clause.
Aggregate functions perform calculations across a set of rows and return a single value.
| Function | Description | Example |
|---|---|---|
COUNT() | Number of rows | COUNT(*) or COUNT(column) |
SUM() | Total of numeric values | SUM(salary) |
AVG() | Mean of numeric values | AVG(salary) |
MIN() | Smallest value | MIN(salary) |
MAX() | Largest value | MAX(salary) |
SQL provides operators and functions for advanced data filtering and manipulation:
DISTINCT — removes duplicate rows from resultsIS NULL / IS NOT NULL — checks for missing valuesUPPER(), LOWER(), LENGTH() — string functionsROUND(), ABS() — numeric functionsCASE WHEN — conditional logic within queriesCRUD stands for Create, Read, Update, Delete — the four fundamental operations for persistent data storage.
Defines a new table structure with columns and their data types.
Adds new rows to an existing table.
Retrieves data from one or more tables. This is the most commonly used SQL statement.
Modifies existing rows in a table. Always use a WHERE clause to target specific rows — omitting it updates every row.
UPDATE or DELETE without a WHERE clause will affect every row in the table. This is one of the most common and dangerous SQL mistakes.
Removes rows from a table.
Python's built-in sqlite3 module provides a lightweight way to interact with SQLite databases without any additional installation. This is the most commonly tested database module on the exam.
with statement (context manager) to ensure the connection is properly closed, even if an error occurs:
| Method | Returns | Use Case |
|---|---|---|
fetchone() | A single row as a tuple, or None | When you expect one result |
fetchall() | A list of all remaining rows as tuples | When the result set is small |
fetchmany(n) | A list of the next n rows | When processing in batches |
pymysql is a third-party library for connecting to MySQL databases. It follows a similar pattern to sqlite3 but requires connection parameters.
The general workflow for any Python database interaction follows the DB-API 2.0 pattern:
connect()conn.cursor()cursor.execute() or cursor.executemany()fetchone(), fetchall(), or fetchmany()conn.commit() (for INSERT/UPDATE/DELETE)conn.close()| Issue | Cause | Solution |
|---|---|---|
OperationalError | Database file not found or permissions error | Check file path and permissions |
| Data not persisted | Forgot to call conn.commit() | Always commit after INSERT/UPDATE/DELETE |
ProgrammingError | SQL syntax error or table doesn't exist | Verify SQL syntax and table names |
| Resource leak | Connection not closed | Use with statement or finally block |
InterfaceError | Operating on a closed cursor/connection | Ensure connection is still open before querying |
Parameterized queries (also called prepared statements) separate SQL logic from user-supplied data. Instead of inserting values directly into a query string, you use placeholders and pass values separately. This is the single most important defense against SQL injection attacks.
The sqlite3 module uses the ? character as its placeholder. Parameters are passed as a tuple.
The pymysql module uses %s as its placeholder (regardless of data type). Do not confuse this with Python's % string formatting operator — they are handled differently by the database driver.
sqlite3 uses ? (or :name for named params). pymysql uses %s. Both prevent SQL injection when used correctly.
SQL injection occurs when an attacker manipulates a query by inserting malicious SQL through user input. Parameterized queries prevent this because the database engine treats parameter values as data, never as executable SQL.
When working with SQLite through Python, values are automatically converted between SQL types and Python types. Understanding this mapping is essential for the exam.
| SQLite Type | Python Type | Description | Example |
|---|---|---|---|
INTEGER | int | Whole numbers | 42 |
REAL | float | Floating-point numbers | 3.14 |
TEXT | str | Text strings | "hello" |
BLOB | bytes | Binary data | b"\x00\x01" |
NULL | None | Missing/absent value | None |
INTEGER can technically store a string. However, Python's sqlite3 module returns values in their natural Python types based on what SQLite actually stored.
SQL injection is a code injection technique that exploits security vulnerabilities in an application's database layer. It occurs when user input is incorporated into SQL queries without proper sanitization or parameterization.
An attacker crafts input that changes the intended SQL query's logic:
? as its placeholder (also supporting :name for named parameters). %s is used by pymysql. The other options are not used by standard Python database modules.REAL column map to?float. INTEGER maps to int, TEXT maps to str, BLOB maps to bytes, and NULL maps to None. Python's Decimal type is not natively used by sqlite3.cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))cursor.execute("SELECT * FROM users WHERE id = :id", {"id": user_id})cursor.fetchone() return when no rows match the query?[]()NoneStopIteration exceptionfetchone() returns None. fetchall() returns an empty list [] in the same scenario. No exception is raised.COUNT(*) counts all rows; COUNT(column) counts non-NULL values. SUM() adds numeric values. LEN() is not a standard SQL function (LENGTH() is). TOTAL() exists in SQLite but returns the sum as a float, not a row count.conn.commit() necessary after INSERT, UPDATE, or DELETE?conn.commit() is called, changes exist only in memory and will be lost if the connection is closed or the program crashes. Committing writes the changes permanently to disk.