All Articles

MySQL for Data Analytics

Disclaimer: This tutorial uses a fresh MySQL 5.7 instance. Certain functions from this tutorial may behave differently or be deprecaited in other versions of MySQL.

Creating a Running Total

Say you wanted to create a running total (also referred to as a cumulative sum) you can do so by utilizing a variable and a derived view, via subquery. The following examples create a running total for the table and assign it’s value to the variable running_total. Note that a dervied view must have an alias, in the example below I chose “v”.

-- As a single query with a JOIN
SELECT purchase_date, purchase_amount,
@running_total := @running_total + purchase_amount AS running_total
FROM purchases
JOIN (SELECT @running_total := 0) v
ORDER BY purchase_date;
-- As two separate queries with SET
SET @running_total := 0;
SELECT purchase_date,purchase_amount,
  @running_total := @running_total + purchase_amount AS running_total
FROM purchases
ORDER BY purchase_date;

Implementing Conditional Logic Using Cases

In some scenarios you may need to introduce conditional logic into your queries. While a simple IF() will suffice in many situations, if you need to account for multiple conditions then the CASE statement will be a better alternative. In the example below we use case conditionals to create a query which will show us all of the subscribers whose monthly payments are over due (have not been paid in the last 30 days).

-- Payments that have not been made within the last 30 days
SELECT *, 
CASE WHEN purchase_date < (SELECT DATE_SUB(CURRENT_DATE, INTERVAL 30 DAY)) THEN 'overdue'
  ELSE 'on time'
END AS payment_status
FROM purchases;
-- Categorizing purchase prices with multiple categories
SELECT *, 
CASE
  WHEN purchase_amount >= 1000 THEN 'large'
  WHEN purchase_amount BETWEEN 500 AND 999 THEN 'medium'
  ELSE 'small'
END AS purchase_category
FROM purchases;

Maintaining Statistics Using Triggers

Triggers can be useful for maintaining business logic in a centralized fashion and they also can help you with reporting by keeping real time statistics. One of the major benefits of triggers is that they execute automatically without requiring human intervention.

-- Create a trigger for qualified leads
DELIMITER //
CREATE TRIGGER update_lead_qualification
AFTER INSERT ON sales_calls
FOR EACH ROW
BEGIN
  UPDATE sales_leads SET qualified = NEW.fit
END //
DELIMITER ;
-- Update stats on sales
DELIMITER //
CREATE TRIGGER update_purchase_stats
AFTER INSERT ON sales
FOR EACH ROW
BEGIN
  SET @team = SELECT team_id FROM sales_person WHERE sales_person_id = NEW.sales_person_id;
  INSERT INTO sales_statistics
  (team_id, sales_total, updated)
  VALUES
  (
    (SELECT @team FROM dual),
    (SELECT sales_total 
      FROM sales_statistics 
      WHERE team_id = @team AND updated BETWEEN DATE_SUB(CURRENT_DATE INTERVAL 30 DAYS) AND NOW()
    ),
    NOW()
  );
END //
DELIMITER ;

Making Queries Convenient With Stored Procedures

Stored procedures are useful when you have a piece of logic you need to frequently utilize such as looking up specific information or clearing out tables. Procedures differ from triggers in several ways including the manner in which they are invoked using the CALL keyword. Ultimately, a procedure is a stored program and provides a great deal of

-- Return countries and their exports
DELIMITER //
CREATE PROCEDURE country_exports()
BEGIN
  SELECT c.country_name, e.export_name
  FROM exports AS e
  INNER JOIN countries AS c ON (e.country_id = c.country_id);
END //
DELIMITER ;
-- Call the country_exports procedure
CALL country_exports();
-- Truncate a table, removing all data
DELIMITER //
CREATE PROCEDURE clean(table_name CHAR(100))
BEGIN
  TRUNCATE TABLE table_name;
END //
DELIMITER ;
-- Call the truncate 
CALL clean('sales_statistics');

Saving Yourself Time With Functions

If there are queries or calculations which you frequently perform you can define a function to quickly retrieve the desired values. In the example below

-- format a product number
DELIMITER //
CREATE FUNCTION format_product_number(num INT, typ VARCHAR(15))
RETURNS STRING
BEGIN
  SET @prefix = ""
  SELECT
  CASE 
    WHEN typ = 'machinery' THEN 'mac-'
    WHEN typ = 'consumable' THEN 'cons-'
    WHEN typ = 'material' THEN 'mat-'
    ELSE 'misc-'
  END
  FROM dual;
  RETURN SELECT CONCAT(prefix, num) FROM dual;
END //
DELIMITER ;
-- get the highest sale from a given team
DELIMITER //
CREATE FUNCTION highest_sale(team INT)
RETURNS DOUBLE
BEGIN
  RETURN SELECT MAX(sale) FROM sales WHERE team_id = @team;
END //
DELIMITER ;
-- formatting our product number
SELECT format_product_number(10, 'machinery') FROM dual;
-- using the highest sale query
SELECT highest_sale(14) FROM dual;
-- using the 

Formatting And Manipulating Dates

Date formatting is one of the least exciting but more valuable things to be knowledgeable of in MySQL when it comes to creating reports. Chances are that if you are creating a report you will need to provide some sort of date field and possibly perform calculations or grouping based on that field. Below are some examples of selecting the year, month, day and other data from a date field.

-- Select all purchases for the month of January
SELECT * 
FROM purchases
WHERE MONTH(purchase_date) = 1;
-- Select all purchases from 2018
SELECT *
FROM purchases
WHERE YEAR(purchase_date) = 2018;
-- Format the purchase dates with day of the week, name of the month, and numeric day and year
SELECT DATE_FORMAT(purchase_date, '%W %M %e %Y')
FROM purchases;
-- Get the date with 24 hour AM/PM formatting
SELECT DATE_FORMAT(purchase_date, '%r')
FROM purchases;
-- Numeric day of the year
SELECT DATE_FORMAT(CURRENT_DATE, '%j')
FROM dual;
-- Subtracting 2 months from a date
SELECT DATE_SUB(CURRENT_DATE, INTERVAL 2 MONTH)
FROM dual;
-- Find the number of days between today and the most recent purchase
SELECT DATEDIFF(purchase_date, CURRENT_DATE) AS last_purchase
FROM purchases
ORDER BY purchase_date DESC
LIMIT 1;
-- Find the date for next Wednesday
SELECT CURRENT_DATE + INTERVAL 2 - WEEKDAY(CURRENT_DATE) DAY
FROM dual
LIMIT 1;
-- Get the day of the week from a date
SELECT DAYOFWEEK(CURRENT_DATE)
FROM dual;
-- Get the day of the month
SELECT DAYOFMONTH(CURRENT_DATE)
FROM dual;

Documenting Schema With Comments

Documentation is one of the most crucial aspects to any development project however, it’s often the first thing to go out the window when deadlines are concerned. Below are some examples for documenting at the table and column levels.

-- Add a comment to a table for future reference
ALTER TABLE Orders
MODIFY OrderNumber int NOT NULL COMMENT 'order id';
SHOW FULL COLUMNs FROM Orders;
-- Add a comment to a column to help identify the data
ALTER TABLE Orders
COMMENT 'order tbl';