NZGames.com Forums (https://forums.nzgames.com/index.php)
-   Technology & Troubleshooting (https://forums.nzgames.com/forumdisplay.php?f=9)
-   -   SQL Combining duplicate rows (https://forums.nzgames.com/showthread.php?t=87920)

blynk 27th November 2019 12:46

SQL Combining duplicate rows
 
Is there a quick method to combine rows together where they are the same (based on some ordering)?
I have a history table, but a lot of the rows are effectively duplicated.

For example:
ID | Status  | Value | update_date
 1 | Open    | $100  | 01/01/2019
 1 | Open    | $100  | 02/01/2019
 1 | Pending | $100  | 21/01/2019
 1 | Open    | $100  | 25/01/2019

So I want a table that would have rows 1 & 2 combined:

 1 | Open    | $100  | 01/01/2019
 1 | Pending | $100  | 21/01/2019
 1 | Open    | $100  | 25/01/2019

A basic GROUP BY doesn't work, because it joins the first and the last rows together even though a different status sits in between.

I have examples where there are 50 updates with no changes. Do I just need to do a row-by-row comparison?

Or I could do a delete, I guess? If the next row's fields = the current row's fields, delete the next row?
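
Something like this, maybe? A rough sketch, guessing at the dialect (Postgres-style; MySQL won't let you select from the table you're deleting from without a derived-table workaround) and using "history" as a stand-in for the real table, with Status and Value as the fields that matter:

-- delete any row whose immediately preceding row (same ID) has identical
-- Status and Value, leaving only the genuine changes
DELETE FROM history AS cur
WHERE EXISTS (
    SELECT 1
    FROM history AS prev
    WHERE prev.ID = cur.ID
      AND prev.Status = cur.Status
      AND prev.Value = cur.Value
      AND prev.update_date = (
            SELECT MAX(p.update_date)
            FROM history AS p
            WHERE p.ID = cur.ID
              AND p.update_date < cur.update_date)
);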

DrTiTus 27th November 2019 13:57

You could use something like this:

-- keep rows whose Status differs from the immediately preceding row for the same ID (or that have no preceding row)
SELECT a.* FROM history AS a
WHERE a.Status <>
      (SELECT b.Status FROM history AS b
       WHERE b.ID = a.ID AND b.update_date < a.update_date
       ORDER BY b.update_date DESC LIMIT 1)
   OR NOT EXISTS
      (SELECT 1 FROM history AS b
       WHERE b.ID = a.ID AND b.update_date < a.update_date)

[stolen/modified from stackoverflow, money back guarantee. extending it to your other columns is your job]

It might be quicker for you up front, but slow for the database. On a table with 1M+ rows and no indexes, it'll be doing 1M+ subqueries. 10K rows might be fine. Pi vs Xeon, etc. YMMV.

blynk 27th November 2019 14:52

Thanks. I have several fields I want to check to see if they have changed.

I have attached a row number to each row, partitioning by all the fields I want to check and ordering by the date.
Then I will select the first row of each partition.
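
Something along these lines (sketch from memory - "history" stands in for the real table, and Status/Value stand in for the full list of fields):

SELECT ID, Status, Value, update_date
FROM (
    SELECT h.*,
           ROW_NUMBER() OVER (
               PARTITION BY ID, Status, Value   -- the fields being checked
               ORDER BY update_date
           ) AS rn
    FROM history AS h
) AS ranked
WHERE rn = 1;

One thing to watch: if a value reverts (like the third "Open" row in the example above), it lands in the same partition as the first "Open" row, so only the earliest occurrence survives.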

Lightspeed 27th November 2019 17:36

Couldn't you just build your table with a DISTINCT clause?

Lightspeed 25th January 2020 14:04

So what did you come up with? This isn't Stack Overflow here!

crocos 27th January 2020 20:00

Quote:

Originally Posted by Lightspeed (Post 2008983)
Couldn't you just build your table with a DISTINCT clause?

No, because in his example they're not actually duplicated.

Hey Blynk: Why do you say the first two rows are duplicated? Different date!

Lightspeed 28th January 2020 13:44

Haha, how many times have I or a colleague puzzled over some code or query, wondering why we can't get something so obvious to work, only to discover the query was fine; we just hadn't paid enough attention to the data.

blynk 28th January 2020 20:30

Quote:

Originally Posted by crocos (Post 2009562)
No, because in his example they're not actually duplicated.

Hey Blynk: Why do you say the first two rows are duplicated? Different date!

Maybe a better explanation was needed at the start.
I have a history table that tracks changes. However, for some reason, there are rows that are "Updated" but where nothing is different.
So all the values are the same except the update date.

Based on that, I was not interested in the 2nd row as that causes issues with tracking of what happened.

In that example, nothing actually happened on the 2nd, but the data looked like it did.

crocos 31st January 2020 00:43

OK... is there additional data that we're missing? Is there another table you can cross-reference it to, or a code other than "ID = 1"? If there is, then you have a chance of filtering it using basic SQL.

If not, you'll need to start mucking with sub-selects to filter out items that have an identical transaction from the previous day - I assume it's only the previous day you're talking about and not an ordered thing.

If it is purely an order thing and not a next-day-only thing, you may need to build a result set in a stored procedure or stored function (depending on the DB type and how you want to access it) that runs across the DB using a cursor: retain the previous line, compare the current line (excluding the date) before adding it to the result set, and if it matches, don't add it.
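
Very roughly, something like this - MySQL syntax assumed, "history" and "history_clean" are stand-ins, and I'm guessing Status and Value are the only fields worth comparing:

CREATE TABLE IF NOT EXISTS history_clean LIKE history;   -- assumed target table

DELIMITER //
CREATE PROCEDURE dedupe_history()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE cur_id INT;
    DECLARE cur_status VARCHAR(20);     -- types are guesses
    DECLARE cur_value VARCHAR(20);
    DECLARE cur_date DATE;
    DECLARE prev_id INT DEFAULT NULL;
    DECLARE prev_status VARCHAR(20) DEFAULT NULL;
    DECLARE prev_value VARCHAR(20) DEFAULT NULL;
    DECLARE hist CURSOR FOR
        SELECT ID, Status, Value, update_date
        FROM history
        ORDER BY ID, update_date;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN hist;
    read_loop: LOOP
        FETCH hist INTO cur_id, cur_status, cur_value, cur_date;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- keep the row only if it starts a new ID or differs from the
        -- previous row on something other than update_date
        IF prev_id IS NULL
           OR cur_id <> prev_id
           OR cur_status <> prev_status
           OR cur_value <> prev_value THEN
            INSERT INTO history_clean (ID, Status, Value, update_date)
            VALUES (cur_id, cur_status, cur_value, cur_date);
        END IF;
        SET prev_id = cur_id;
        SET prev_status = cur_status;
        SET prev_value = cur_value;
    END LOOP;
    CLOSE hist;
END //
DELIMITER ;

Then CALL dedupe_history(); and check the cleaned table before doing anything destructive with it.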

Lightspeed 31st January 2020 01:33

blynk has a "history table", or a chronologically ordered list.

Basically they want to record the first row, then exclude any subsequent row that matches all columns in this row with the exception of the "update_date" column, recording the next row that does not match all columns (i.e. a genuine update), and so on.

I think it can be done, but I'm too lazy to think through the nested structures to figure it out.
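
Actually, with window functions it might not need much nesting at all. Something like this, maybe (assuming MySQL 8+ or Postgres, "history" as the table name, and Status/Value as the columns that matter):

SELECT ID, Status, Value, update_date
FROM (
    SELECT h.*,
           LAG(Status) OVER w AS prev_status,
           LAG(Value)  OVER w AS prev_value
    FROM history AS h
    WINDOW w AS (PARTITION BY ID ORDER BY update_date)
) AS t
WHERE prev_status IS NULL            -- first row for this ID
   OR prev_status <> Status          -- a genuine change
   OR prev_value <> Value;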

crocos 31st January 2020 13:01

Quote:

Originally Posted by Lightspeed (Post 2009605)
I think it can be done, but I'm too lazy to think through the nested structures to figure it out.

I'd already thought of three different ways to do it above, in increasing complexity depending on how his data is structured :p

blynk 31st January 2020 15:08

I've done it through staging tables.
Can't remember exactly, but I ordered them and created a rank based on the fields I wanted to track, which meant #1 was always the first change.
Everything else was ignored.

