Question:
I have a large archive of recordings on a well-used mythbox and I know I have accumulated a lot of duplicates. I need to free some space, but going through and manually getting rid of the duplicates would be a monumental task.
Is there any mechanism or script available to track down these duplicates, and preferably delete all but the newest recording? I've searched the web and poked at my interface with no luck.
Answer:
You're welcome to use the script I wrote; I've been running it regularly to remove the duplicates.
$ wget http://evuraan.info/evuraan/stuff/myth-remove-duplicates.sh.txt -O myth-remove-duplicates.sh
$ chmod +x myth-remove-duplicates.sh

Edit to replace RECORDINGDIR, user_name and pass_word with what's applicable to your setup and run myth-remove-duplicates.sh.
Running it:
$ ./myth-remove-duplicates.sh
1091_20131221090100.mpg is duplicate
removed `/var/lib/mythtv/recordings/1091_20131221090100.mpg'
1091_20131221070100.mpg is duplicate
removed `/var/lib/mythtv/recordings/1091_20131221070100.mpg'
Here's the script, myth-remove-duplicates.sh:
#!/bin/bash
# Authored by Evuraan_AT_gmail_DOT_com
# ABSOLUTELY NO WARRANTY, to the extent permitted by
# applicable law.
# YMMV.
# Use at your own risk.

user_name="mythtv"
pass_word="yourpassword"
RECORDINGDIR="/var/lib/mythtv/recordings"
list="/tmp/recordings.txt-$RANDOM-$RANDOM"
SQLSCRIPT="/tmp/recordings.sql-txt-$RANDOM-$RANDOM"

gen_lists(){
    mysql -u "$user_name" -p"$pass_word" -e "select starttime,basename,title,description from recorded order by starttime" mythconverg | tac > $list
}

verify_duplicate(){
    # if verbatim repeats..
    sum_a=$(egrep "${a:0:16}" $list | awk -F".mpg\t" 'NR>1 {print $NF}' |md5sum)
    [ ! -z "$sum_a" ] && egrep "${a:0:16}" $list | awk -F".mpg\t" 'END {print $NF}' | md5sum |egrep -q ${sum_a//-}
}

remove_duplicates(){
    [ -s "${RECORDINGDIR}/${a}" ] && ( rm -v "${RECORDINGDIR}/${a}" ; echo "DELETE FROM mythconverg.recorded WHERE basename = '$a';" > $SQLSCRIPT )
    [ -s $SQLSCRIPT ] && mysql -u "$user_name" -p"$pass_word" mythconverg < $SQLSCRIPT
}

gen_lists
awk {'print $3'} $list |grep mpg$ | while read a ; do
    [[ $(egrep -c ${a:0:16} $list ) -ge 2 ]] && verify_duplicate && sed -i /"${a:0:16}"/d $list && echo $a is duplicate && remove_duplicates
done

rm $SQLSCRIPT $list 1>/dev/null 2>&1 || :
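
Since the script deletes files and database rows, it can help to preview what would count as a duplicate first. The following is only a rough dry-run sketch, not part of the script above. It uses a simpler rule than the script does: two recordings are treated as duplicates when their title and description are identical. The function name list_older_duplicates and the sample rows are made up for illustration. Nothing is deleted; it only prints the basenames of the older copies.

```shell
#!/bin/bash
# Dry-run sketch: print basenames of older duplicates, delete nothing.
# Assumed input on stdin: tab-separated starttime, basename, title, description,
# sorted newest first. list_older_duplicates is a hypothetical helper name.
list_older_duplicates() {
    awk -F'\t' '
        {
            key = $3 FS $4                # title + description identifies a programme
            if (key in seen) print $2     # a newer row with the same key was already seen
            else seen[key] = 1            # first (newest) occurrence is kept
        }'
}

# Made-up sample rows, newest first (printf reuses the format for every four arguments).
printf '%s\t%s\t%s\t%s\n' \
    "2013-12-21 09:01:00" "1091_20131221090100.mpg" "News" "Morning bulletin" \
    "2013-12-21 07:01:00" "1091_20131221070100.mpg" "News" "Morning bulletin" \
    "2013-12-20 20:00:00" "1005_20131220200000.mpg" "Film" "A feature" \
  | list_older_duplicates

# Against a real database the input might come from something like (untested on your setup):
#   mysql -N -B -u "$user_name" -p"$pass_word" \
#     -e "select starttime,basename,title,description from recorded order by starttime desc" \
#     mythconverg | list_older_duplicates
```

With the sample rows this prints only 1091_20131221070100.mpg, the older of the two "News" recordings. Recordings with an empty description will all collapse onto their title under this rule, so check the list by eye before deleting anything based on it.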