Tabularly

5 ways to deduplicate Excel rows — and why most online tools get it wrong

May 28, 2026

“Remove duplicates” sounds like a single well-defined operation. It isn’t. There are at least five distinct things you might mean by it, and the difference between them is the difference between cleaning your data correctly and corrupting it silently.

Here are the five, in roughly increasing order of usefulness.

1. Full-row duplicate removal

The simplest case: drop any row that’s an exact duplicate of another row, with every column matching. Excel’s built-in Remove Duplicates with all columns selected does this. It’s also what most “remove duplicates online” tools do by default.

This is fine when your data is already normalized and you just got bitten by an accidental copy-paste. It’s almost never what you actually want when working with messy real-world data, because real duplicates rarely match byte-for-byte. Different timestamps, slightly different addresses, one column with a typo — and the “duplicate” survives.
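To make that concrete, here is a minimal Python sketch of full-row deduplication, assuming each row is a plain tuple of cell values. The function name and sample rows are hypothetical, not taken from Excel or any particular tool:

```python
def dedupe_full_rows(rows):
    """Keep the first occurrence of each row whose cells all match exactly."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # every column participates in the comparison
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    ("ada@example.com", "Ada Lovelace", "2026-01-05"),
    ("ada@example.com", "Ada Lovelace", "2026-01-05"),  # accidental copy-paste
    ("ada@example.com", "Ada Lovelace", "2026-03-12"),  # same person, newer timestamp
]

print(dedupe_full_rows(rows))
```

The copy-paste disappears, but the row with the newer timestamp survives: one differing cell is enough to make it “unique”.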

2. By-key duplicate removal, keep first

You pick one or more columns that define a duplicate (typically email for a customer list, or order_id for sales data), and drop subsequent rows that share the same key. Excel’s Remove Duplicates with a subset of columns selected does this — and it always keeps the first occurrence.

This is what most people mean when they say “dedupe my customer list.” It’s also a trap: “first occurrence” depends on how the data happens to be sorted, and that’s usually not meaningful. If your data is sorted by signup_date ascending, you keep the oldest version of each customer. If it’s sorted descending, you keep the newest. The result depends on incidental order.
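A minimal sketch of keep-first in Python, assuming rows are dicts keyed by column name (the function and sample data are hypothetical). Running the same function on two sort orders of the same data shows how the result depends on incidental order:

```python
def dedupe_keep_first(rows, keys):
    """Keep the first row seen for each key; drop later rows that share it."""
    seen = set()
    result = []
    for row in rows:
        k = tuple(row[c] for c in keys)  # composite key over one or more columns
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


customers = [
    {"email": "ada@example.com", "plan": "free", "signup_date": "2025-11-02"},
    {"email": "ada@example.com", "plan": "pro", "signup_date": "2026-02-14"},
]

# Same data, two sort orders: the "deduped" result differs.
oldest_first = sorted(customers, key=lambda r: r["signup_date"])
newest_first = sorted(customers, key=lambda r: r["signup_date"], reverse=True)

print(dedupe_keep_first(oldest_first, ["email"]))  # keeps the "free" row
print(dedupe_keep_first(newest_first, ["email"]))  # keeps the "pro" row
```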

3. By-key, keep last

Same as above, but keep the last occurrence. Excel doesn’t expose this directly: you have to reverse the row order first (for example, by sorting descending on a helper column of row numbers), then run Remove Duplicates. Awkward, but doable.

Same fragility as #2: the answer depends on whatever the most recent sort happened to be.
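The Excel workaround (reverse, then keep first) translates directly into a short Python sketch. As before, rows are assumed to be dicts and the names are hypothetical:

```python
def dedupe_keep_last(rows, keys):
    """Keep the last row seen for each key by walking the rows in reverse,
    keeping the first hit, then restoring the original relative order."""
    seen = set()
    kept = []
    for row in reversed(rows):
        k = tuple(row[c] for c in keys)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    kept.reverse()  # back to original relative order
    return kept


orders = [
    {"order_id": "A1", "status": "pending"},
    {"order_id": "B7", "status": "pending"},
    {"order_id": "A1", "status": "shipped"},
]

print(dedupe_keep_last(orders, ["order_id"]))
```

Here A1 keeps its “shipped” row only because that row happens to come later in the list. Move it earlier and the stale “pending” row wins.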

4. By-key, keep the row with max value in another column

This is the one most people actually want. You pick the dedup key (email), and you pick a tiebreaker column (updated_at), and you keep the row with the maximum value in the tiebreaker. Translation: “for each unique customer, keep the most recently updated record.”

Excel can’t do this directly. You can simulate it with a pivot table or a series of helper columns and INDEX/MATCH. It works, but it’s painful, and the setup has to be rebuilt or re-checked whenever your dataset changes.

Tabularly’s Dedupe tool defaults to this mode: pick the dedup key, choose the “keep by column” mode, pick the tiebreaker column, and choose ascending or descending. Done.
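For comparison, here is a minimal Python sketch of keep-by-column. It is an illustration, not Tabularly’s actual implementation. The function name and the keep_max flag (standing in for the ascending/descending choice) are hypothetical, and it assumes updated_at holds ISO 8601 strings, which compare correctly as plain text. Other date formats would need parsing first:

```python
def dedupe_keep_by_column(rows, keys, tiebreaker, keep_max=True):
    """For each key, keep the row with the largest tiebreaker value
    (or the smallest when keep_max=False). On an exact tie, the row
    that appeared earlier in the input is kept."""
    best = {}
    for row in rows:
        k = tuple(row[c] for c in keys)
        current = best.get(k)
        if current is None:
            best[k] = row
            continue
        if keep_max:
            better = row[tiebreaker] > current[tiebreaker]
        else:
            better = row[tiebreaker] < current[tiebreaker]
        if better:
            best[k] = row
    return list(best.values())


customers = [
    {"email": "ada@example.com", "phone": "555-0100", "updated_at": "2026-01-05T09:00:00"},
    {"email": "grace@example.com", "phone": "555-0199", "updated_at": "2026-02-01T12:30:00"},
    {"email": "ada@example.com", "phone": "555-0142", "updated_at": "2026-04-20T16:45:00"},
    {"email": "ada@example.com", "phone": "555-0111", "updated_at": "2025-12-30T08:15:00"},
]

for row in dedupe_keep_by_column(customers, ["email"], "updated_at"):
    print(row["email"], row["phone"])
# ada@example.com 555-0142
# grace@example.com 555-0199
```

Because the winner is chosen by comparing values rather than by position, reversing or shuffling customers keeps the same row for each email. The only place input order still matters is an exact tie on the tiebreaker, where this sketch keeps the earlier row.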

5. By-key with normalized comparison

Sometimes the duplicates only show up if you normalize first: trim whitespace, lowercase, strip punctuation. “Ada Lovelace” and “ada lovelace” are the same person in any reasonable interpretation, but byte-equal? No.

Most online tools don’t do this for you. The general solution is to pre-clean the data with a normalization pass before running the dedupe. In Tabularly you can use the Columns op to add a normalized helper column, then dedupe by it.
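A sketch of that two-step approach in Python: a normalization pass writes a helper column, then the dedupe runs on the helper instead of the raw value. The normalize rules here (lowercase, strip ASCII punctuation, collapse and trim whitespace) are one assumed choice, and the column names are hypothetical:

```python
import re
import string

# Translation table that deletes ASCII punctuation (string.punctuation).
# Unicode quotes and dashes are NOT in this set and are left in place.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(value):
    """Lowercase, drop ASCII punctuation, collapse runs of whitespace, trim."""
    value = value.lower().translate(_STRIP_PUNCT)
    return re.sub(r"\s+", " ", value).strip()


people = [
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "  ada   lovelace ", "city": "London"},
    {"name": "Ada Lovelace.", "city": "London"},
    {"name": "Grace Hopper", "city": "New York"},
]

# Step 1: the normalization pass adds a helper column.
for row in people:
    row["name_key"] = normalize(row["name"])

# Step 2: dedupe on the helper column (keep first, for brevity).
seen = set()
deduped = []
for row in people:
    if row["name_key"] not in seen:
        seen.add(row["name_key"])
        deduped.append(row)

print([row["name"] for row in deduped])
```

The original name column is left untouched, so you keep the human-readable value. These rules are deliberately blunt: stripping all punctuation turns “Smith-Jones” and “SmithJones” into the same key, so check that the normalization matches what “same” means for your data.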

So which one do I want?

If your data is already clean and ordered, #2 works. For anything else, you want #4 — keep by another column. The fact that Excel doesn’t offer #4 out of the box is a real gap, and it’s why people end up with stale data in supposedly “deduped” customer lists.

Tabularly’s Dedupe tool makes #4 a first-class option. Drop your file, pick the dedup column, pick the tiebreaker, choose direction. The whole thing runs in your browser — no upload, no Excel required.

The general principle

When tooling makes only the easy case easy, people use only the easy case. Then their data ends up in worse shape than if they hadn’t deduped at all — they got rid of duplicates but kept the wrong ones. The fix isn’t more clever scripts; it’s making the right case as easy as the wrong case.