Tuesday 8 January 2019

Source code filters

Every now and then I stumble upon source code that is not UTF-8 encoded or has DOS line endings. Build tools don't like that very much. So instead of waiting for builds to fail, I like to check the source code every now and then.

Find files that are not compatible with UTF-8 encoding
$ find . -path ./.git -prune -o -type f -exec file -i {} \; | awk -F'[:;]' '$3 !~ /charset=(us-ascii|utf-8|binary)/ { print $3, $1 }'
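
The field split on ':' and ';' misreads file names that themselves contain a colon or semicolon. A slightly more defensive filter could look like the sketch below. It assumes `file -i` prints lines of the form `name: mime/type; charset=...`, and `non_utf8_filter` is a hypothetical name, not part of any tool.

```shell
# Hypothetical helper: reads `file -i` output on stdin and prints
# "charset<TAB>filename" for every file that is not ASCII, UTF-8, or binary.
non_utf8_filter() {
    awk '{
        # Assumption: the charset is the trailing "charset=..." token.
        if (match($0, /charset=[^ ;]+$/) == 0) next
        charset = substr($0, RSTART + 8)   # skip the 8 characters of "charset="
        if (charset ~ /^(us-ascii|utf-8|binary)$/) next
        # Strip the trailing ": mime/type; charset=..." to recover the name.
        name = $0
        sub(/: +[^:;]*; *charset=[^ ;]+$/, "", name)
        printf "%s\t%s\n", charset, name
    }'
}
```

It would be used as a drop-in replacement for the awk part:

$ find . -path ./.git -prune -o -type f -exec file -i {} \; | non_utf8_filter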

Count the total number of DOS line endings across all files (dos2unix -id prints the number of DOS line breaks per file)
$ find . -path ./.git -prune -o -type f -exec dos2unix -id {} \; | awk '{s+=$1} END {print s+0}'
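
If the number of affected files matters more than the number of line endings, a grep-based sketch could look like this. It assumes a grep that supports `-I` to skip binary files (GNU and BSD grep do), and `count_crlf_files` is a hypothetical name chosen for illustration.

```shell
# Hypothetical helper: count the files under a directory (default: .)
# that contain at least one line ending in CR LF, skipping .git.
count_crlf_files() {
    dir=${1:-.}
    cr=$(printf '\r')   # literal carriage return
    # grep -l lists each matching file once; -I ignores binary files (assumed available).
    find "$dir" -path "$dir/.git" -prune -o -type f -exec grep -lI "${cr}\$" {} + | wc -l
}
```

Running `count_crlf_files .` from the top of a checkout prints a single number. File names containing newlines would skew the count, since it is based on `wc -l`.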