본문 바로가기

Coding 공부기록

[SQL edge case] 문제를 정확하게 파악하기

Questions from leetcode

1. Customers placing the largest number of orders

=> 단순하게 MAX(order_number)라고 했는데 여기서 order_number는 주문번호이기에 크기 비교용이 아님. 따라서 각 customer마다 몇 건의 order record가 있는지를 count해서 그 빈도가 가장 큰 사람의 customer_id를 제시해야함.

SELECT customer_number
FROM orders
GROUP BY customer_number
ORDER BY COUNT(*) DESC
LIMIT 1;
SELECT customer_number
FROM orders
GROUP BY customer_number
HAVING COUNT(customer_number) >= all
(SELECT COUNT(customer_number)
 FROM orders GROUP BY customer_number); 

COUNT(*): count(*) / count(1) 과 count(변수이름) 은 NULL 값을 포함하는 여부가 다르다.

count(*) 과 count(1) 은 NULL 값인 컬럼도 포함하지만, count(name) 은 NULL 값인 컬럼은 포함하지 않는다.

ALL/ANY: The operator must be a standard comparison operator (=, <>, !=, >, >=, <, or <=). 

SELECT column_name(s)
FROM table_name
WHERE column_name operator ALL
  (SELECT column_name
  FROM table_name
  WHERE condition);

HAVING: HAVING에서는 위의 SELECT파트에서 만든 새로운 변수 이름을 쓸 수 없다.

*Order of clause: SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY 

 

Follow-up: What if more than one customer has the largest number of orders, can you find all the customer_number in this case?

SELECT customer_number
FROM Orders
GROUP BY customer_number
HAVING COUNT(order_number) = (
            SELECT COUNT(order_number) AS order_cnt FROM Orders
            GROUP BY customer_number
            ORDER BY 1 DESC
            LIMIT 1)

order_number의 빈도수가 데이터에서 나오는 최고 값과 같은 모든 customer_number를 리턴함

 

2. Several friends at a cinema ticket office would like to reserve consecutive available seats.
Can you help to query all the consecutive available seats order by the seat_id using the following cinema table?

=> Idea: Self-join을 활용하여 풀어보자

SELECT DISTINCT a.seat_id
FROM cinema a JOIN cinema b
ON ABS(a.seat_id - b.seat_id) = 1
AND a.free = true and b.free = true
ORDER BY a.seat_id;

SELECT DISTINCT seat_id
FROM cinema
WHERE free = 1 AND
                     (seat_id - 1 IN (SELECT seat_id FROM cinema WHERE free = 1) OR
                      seat_id + 1 IN (SELECT seat_id FROM cinema WHERE free = 1))
ORDER BY seat_id;

To find the consecutive available seats, the value in the a.seat_id should be more(or less) than the value b.seat_id, and both of them should be free.

seat_id                              free                                 seat_id                              free

1 1 1 1
2 0 1 1
3 1 1 1
4 1 1 1
5 1 1 1

*ABS() = 1 을 ON의 조건으로 걸 수 있다는 점!

*알리아스.변수명 = 변수값 이렇게 쓸 수 있다는 점.

*DISTINCT를 안쓸 경우 4 - 3, 4 - 5 모두가 잡혀서 좌석은 4 하나인데 레코드가 두 개 잡히게 됨. 

*IN: IN is a logical operator in SQL that allows you to specify a list of values that you'd like to include in the results.

ex)  SELECT *
      FROM tutorial.billboard_top_100_year_end
      WHERE "group" IN ('M.C. Hammer', 'Hammer', 'Elvis Presley')

 

3. #1511 Customer Order Frequency

Write an SQL query to report the customer_id and name of customers who have spent at least $100 in each month of June and July 2020.

* Table1: Customers (customer_id | name | country)

* Table2: Product (product_id | description | price)

* Table3: Orders (order_id | customer_id | product_id | order_date | quantity)

SELECT customer_id, name
FROM Customers JOIN Orders USING (customer_id)
               JOIN Product USING (product_id)
GROUP BY customer_id
HAVING SUM(IF(LEFT(order_date, 7)='2020-06', quantity, 0) * price)>=100
            AND SUM(IF(LEFT(order_date,7)='2020-07', quantity, 0)*price)>=100

JOIN하는 두 테이블의 Column 이름이 같으면 USING 사용 가능 (간단), 다르면 ON 사용해야함

여러개의 JOIN을 연달아 하기 가능

문제에서 >=$100라는 조건을 주었는데 이는 SUM값으로 구할 수 있다. 중요한 부분은 GROUP BY가 HAVING보다 먼저온다는 것! month함수를 쓰기보다 LEFT로 간단하게 제약을 주었고, quantity 또는 0으로 equation을 간결하게 한 것이 멋진 쿼리이다. 

 

4. #1322. Ads Performance

A company is running Ads and wants to calculate the performance of each Ad.

Performance of the Ad is measured using Click-Through Rate (CTR) where:

Click-Through Rate (CTR)

Write an SQL query to find the CTR of each Ad. Round CTR to two decimal points.

Return the result table ordered by CTR in descending order and by ad_id in ascending order in case of a tie.

 

SELECT 

for ad_id = 3, ctr = (1/(1+1)) * 100 = 50.00 for ad_id = 5, ctr = 0.00, Note that ad_id = 5 has no clicks or views. Note that we do not care about Ignored Ads.

SELECT ad_id,
IFNULL(ROUND(SUM(CASE WHEN action='Clicked' THEN 1 ELSE 0 END)/
                     SUM(CASE WHEN action='Clicked' OR action='Viewed' THEN 1 ELSE 0 END)*100 ,2),0) AS ctr
FROM Ads
GROUP BY ad_id
ORDER ctr DESC

*IFNULL statement: SELECT IFNULL(Column, "Null일 경우 대체 값") FROM 테이블명; 

예제: "SELECT keyword, IFNULL( common_rep, IFNULL( fur_rep, makeup_rep ) ) as syn_2 FROM syn_color";
출처: https://solbel.tistory.com/1074 [개발자의 끄적끄적]

*COALESCE 지정한 표현식들 중에 NULL이 아닌 첫 번째 값을 반환한다.

*SUM(action='Clicked') 같이 category 변수의 SUM? COUNT와의 차이는? ENUM type이라 index를 사용한다고 하는데, 안전하게 CASE WHEN 사용하는 것으로!

 

5. #1083. Sales Analysis II

Write an SQL query that reports the buyers who have bought S8 but not iPhone. Note that S8 and iPhone are products present in the Product table. Return the result table in any order.

The query result format is in the following example.

-- Product: product_id | product_name | unit_price

-- Sales: seller_id | product_id | buyer_id | sale_date | quantity | price

SELECT buyer_id
FROM Sales
LEFT JOIN Product USING(product_id)
GROUP BY buyer_id
HAVING SUM(CASE WHEN product_name = 'S8' THEN 1 ELSE 0 END) > 0
     AND SUM(CASE WHEN product_name ='iPhone' THEN 1 ELSE 0 END) = 0;

6. #610. Triangle Judgement

SELECT x, y, z, 
       CASE WHEN x+y > z AND x+z > y AND z+y>x THEN 'Yes' ELSE 'No' END AS 'triangle'
FROM triangle

7. #607. Sales Person

Given three tables: salesperson, company, orders.
Output all the names in the table salesperson, who didn’t have sales to company 'RED'.

salesperson: sales_id | name | salary | commission_rate | hire_date

company: com_id | name | city

orders: order_id | order_date | com_id | sales_id | amount

SELECT name 
FROM salesperson
WHERE sales_id NOT IN (SELECT sales_id
                                 FROM orders
                                 JOIN company
                                 USING(com_id)
                                 WHERE name = 'red')

*NOT: NOT is a logical operator in SQL that you can put before any conditional statement to select rows for which that statement is false.

ex) SELECT * FROM tutorial.billboard_top_100_year_end WHERE year = 2013 AND year_rank NOT BETWEEN 2 AND 3

=> In the above case, you can see that results for which year_rank is equal to 2 or 3 are not included. (No 2, No 3)

https://mode.com/sql-tutorial/sql-not-operator/

 

SQL NOT | Basic SQL - Mode

Starting here? This lesson is part of a full-length tutorial in using SQL for Data Analysis. Check out the beginning. In this lesson we'll cover: The SQL NOT operator NOT is a logical operator in SQL that you can put before any conditional statement to sel

mode.com

8. #1303. Find the Team Size

Write an SQL query to find the team size of each of the employees.

Return result table in any order. The query result format is in the following example.

Employee: employee_id | team_id

SELECT employee_id,
       team_size
FROM Employee
JOIN (SELECT team_id, 
       COUNT(*) AS team_size
       FROM Employee 
       GROUP BY team_id) team USING(team_id)
SELECT employee_id,
          COUNT(*) OVER (PARTITION BY team_id) AS team_size
FROM Employee

*AS 다음에 소문자하고 _ 로만 변수명을 하면 ' 안해도 됨

*team_id 별로 COUNT 한 것을 team_size라고 하자는 뜻 (window function)

 

9. #511. Game Play Analysis I

Write an SQL query to report the first login date for each player. Return the result table in any order.

Activity: player_id | device_id | event_date | games_played

SELECT player_id,
          MIN(event_date) AS first_login
FROM Activity
GROUP BY player_id

10. #1068. Product Sales Analysis I

Write an SQL query that reports the product_name, year, and price for each sale_id in the Sales table.

Sales: sale_id | product_id | year | quantity | price

Product: product_id | product_name

SELECT product_name, year, price
FROM Sales LEFT JOIN Product USING (product_id)

11. #1251. Average Selling Price

Write an SQL query to find the average selling price for each product. average_price should be rounded to 2 decimal places.

Prices: product_id | start_date | end_date | price

UnitsSold: product_id | purchase_date | units

SELECT u.product_id,
          ROUND(SUM(units * price)/SUM(units),2) AS average_price 
FROM UnitsSold u
LEFT JOIN Prices p
ON u.product_id = p.product_id AND u.purchase_date >= p.start_date AND u.purchase_date <= p.end_date
GROUP BY 1

 

SELECT u.product_id,
          ROUND(SUM(u.units*p.price)/SUM(u.units),2) AS average_price
FROM UnitsSold u
JOIN Prices p
ON u.product_id=p.product_id AND (u.purchase_date BETWEEN p.start_date AND p.end_date)
GROUP BY product_id;

*GROUP BY에서 product_id는 공통된 이름이어서 하나의 칼럼 => no need to specify

*JOIN을 주로 사용하는데 LEFT JOIN을 사용안하는 이유는?

 

12. #1113. Reported Posts

Write an SQL query that reports the number of posts reported yesterday for each report reason. Assume today is 2019-07-05. Return the result table in any order.

Actions: user_id | post_id | action_date | action | extra

SELECT 
       extra AS report_reason,
       COUNT(extra) AS report_count
FROM Actions
WHERE action_date = '2019-07-04'
          AND action = 'report'
GROUP BY extra
SELECT 
  extra AS report_reason,
  COUNT(DISTINCT post_id) as report_count
FROM Actions
WHERE action_date = '2019-07-04'
          AND action = 'report'
GROUP BY extra

** 왼쪽 내 코드에서의 실수: 한 포스팅당 action을 여러개 할 수 있다는 점을 간과함 (ex. user_id=1이 작성한 post_id=1에서 3가지 action이 발생하여 row가 3개가 됨 (view/like/share). 데이터를 꼼꼼히 살펴봐야함.  

 

13. #613. Shortest Distance in a Line

Write an SQL query to report the shortest distance between any two points from the Point table.

Point: x

SELECT MIN(ABS(a.distance)) AS shortest
FROM
(SELECT p1.x - p2.x AS distance
FROM Point p1
JOIN Point p2
 ON p1.x != p2.xa

*Alias is required! 이름붙이기 싫어도 붙이도록. 

*처음에 p1와 p2가 다른 조건에서 붙이라는 것을 빼먹었었다.

 

14. #1084. Sales Analysis III

Write an SQL query that reports the products that were only sold in the spring of 2019. That is, between 2019-01-01 and 2019-03-31 inclusive.

Product: product_id | product_name | unit_price

Sales: seller_id | product_id | buyer_id | sale_date | quantity | price

SELECT product_id, product_name
FROM Sales 
JOIN Product 
Using(product_id)
GROUP BY product_id
HAVING MIN(sale_date) >= '2019-01-01' AND MAX(sale_date) <= '2019-03-31'