SQL Combining duplicate rows
Is there a quick method to combine rows together where they are the same (based on some ordering).
I have a history table, but alot of the rows are effectively duplicated. I.e. ID |Status |Value |update_date 1 |Open | $100 |01/01/2019 1 |Open | $100 |02/01/2019 1 |Pending| $100 |21/01/2019 1 |Open | $100 |25/01/2019 So I want a table that would be have rows 1 & 2 combined 1 |Open | $100 |01/01/2019 1 |Pending| $100 |21/01/2019 1 |Open | $100 |25/01/2019 Basic group by doesnt work, because it joins the first and the last rows together. I have examples where there are 50 updates with no changes. Do i just need to do comparison. Or I could do a delete I guess? if the next row fields = current row fields delete?? |
You could use something like this:
SELECT a.* FROM table AS a WHERE a.Status <> (SELECT b.Status FROM table AS b WHERE a.Value = b.Value AND a.update_date < b.update_date ORDER BY b.update_date DESC LIMIT 1) [stolen/modified from stackoverflow, money back guarantee. fixing the ordering is your job] It might be quicker for you up front, but slow for the database. On a table with 1M+ rows and no indexes, it'll be doing 1M+ subqueries. 10K rows might be fine. Pi vs Xeon, etc. YMMV. |
Thanks, I have several fields I want to check to see if they have changed.
I have attached a row number to each row partitioning by all the fields I want to check ordering it by the date. Then I will select the first row |
Couldn't you just build your table with a DISTINCT clause?
|
So what did you come up with? This isn't Stack Overflow here!
|
Quote:
Hey Blynk: Why do you say the first two rows are duplicated? Different date! |
Haha, how many times have I or a colleague puzzled over some code or query, wondering why we can't get something so obvious to work, only to discover our query was fine, we just hadn't paid enough attention to the data.
|
Quote:
I have a history table that tracks changes. However for some reason, there are rows that are "Updated" but nothing is different. So all the values are the same except the update date. Based on that, I was not interested in the 2nd row as that causes issues with tracking of what happened. In that example, nothing actually happened on the 2nd, but the data looked like it did. |
OK... is there additional data that we're missing? Is there another table you can cross-reference it to, or a code other than "ID = 1"? If you do, then you have a chance of filtering it using basic SQL
If not, you'll need to start mucking with sub-selects to filter out items that have an identical transaction from the previous day - I assume it's only the previous day you're talking about and not an ordered thing. If it is purely an order thing and not a next-day-only thing, you may need to build a result-set in a stored procedure or stored function (depending on the DB type and how you want to access it) to run across the DB using a cursor and retain the previous line, compare the line excluding date before adding to result-set and if it matches don't add it. |
blynk has a "history table", or a chronologically ordered list.
Basically they want to record the first row, then exclude any subsequent row that matches all columns in this row with the exception of the "update_date" column, recording the next row that does not match all columns (i.e. a genuine update), and so on. I think it can be done, but I'm too lazy to think through the nested structures to figure it out. |
Quote:
|
I've done it through staging tables.
Can't remember extactly but I ordered them and created a rank based on the fields I wanted to track. Which meant #1 was always the first change. Everything else was ignored. |
All times are GMT +13. The time now is 03:30. |
Powered by Trololololooooo
© Copyright NZGames.com 1996-2024
Site paid for by members (love you guys)