tutorials bash

Remove duplicate lines from text files (with sort)

A quick method to remove duplicates from text files - including for example CSV files - where multiple records have been added (perhaps automatically) at different times resulting in multiple copies of the same record scattered throughout the file. Here is a simple one-liner bash command to remove duplicates using sort.

This method is sensitive to the line endings of the file. If you have files editing in a combination of Unix/Linux/Mac/Windows you may have a variety of line-endings in place.

The simplest route is to run the file through dos2unix before attempting the sort/unique filter.

Requirements

sort

Method

In a bash shell enter:

python
sort -u file.csv -o file.csv

This takes your file, sorts it (using sort), gets the unique entries (-u) and writes it to the outfile (-o) which here is the same as the initial file.

Over 10,000 developers have bought Create GUI Applications with Python & Qt!
Create GUI Applications with Python & Qt6
More info Get the book

Downloadable ebook (PDF, ePub) & Complete Source code

To support developers in [[ countryRegion ]] I give a [[ localizedDiscount[couponCode] ]]% discount on all books and courses.

[[ activeDiscount.description ]] I'm giving a [[ activeDiscount.discount ]]% discount on all books and courses.