As I discussed yesterday, I’m currently working through the dependencies in John Graham-Cumming’s hncomments bash script. By the end of this I’m hoping to have all the pieces I need to actually run it myself.

The recode tool, a utility I had never even heard of, converts files between various character sets. Given all the troubles I’ve had over the years with crawling the different app stores and encodings, I’m glad to learn about this.

If you’re on a Debian-based Linux such as Ubuntu then you can install recode with:

sudo apt-get install recode

If you’re on OS X and have Homebrew then install recode with:

brew install recode

Use recode --help to get assistance (this is only a subset of the help):

recode --help
Free `recode' converts files between various character sets and surfaces.

Usage: recode [OPTION]... [ [CHARSET] | REQUEST [FILE]... ]

If a long option shows an argument as mandatory, then it is mandatory
for the equivalent short option also.  Similarly for optional arguments.

Listings:
  -l, --list[=FORMAT]        list one or all known charsets and aliases
  -k, --known=PAIRS          restrict charsets according to known PAIRS list
  -h, --header[=[LN/]NAME]   write table NAME on stdout using LN, then exit
  -F, --freeze-tables        write out a C module holding all tables
  -T, --find-subsets         report all charsets being subset of others
  -C, --copyright            display Copyright and copying conditions
      --help                 display this help and exit
      --version              output version information and exit

Operation modes:
  -v, --verbose           explain sequence of steps and report progress
  -q, --quiet, --silent   inhibit messages about irreversible recodings
  -f, --force             force recodings even when not reversible
  -t, --touch             touch the recoded files after replacement
  -i, --sequence=files    use intermediate files for sequencing passes
      --sequence=memory   use memory buffers for sequencing passes
  -p, --sequence=pipe     use pipe machinery for sequencing passes

Fine tuning:
  -s, --strict           use strict mappings, even loose characters
  -d, --diacritics       convert only diacritics or alike for HTML/LaTeX
  -S, --source[=LN]      limit recoding to strings and comments as for LN
  -c, --colons           use colons instead of double quotes for diaeresis
  -g, --graphics         approximate IBMPC rulers by ASCII graphics
  -x, --ignore=CHARSET   ignore CHARSET while choosing a recoding path
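
The REQUEST in the usage line takes the form BEFORE..AFTER, which is easier to see with a concrete run. Here is a minimal sketch, assuming recode is on your PATH and that the latin1 and utf8 aliases are available in your build; the sample word is made up for illustration.

```shell
#!/usr/bin/env bash
# Feed recode one Latin-1 line: \351 is octal for 0xE9, which is "é" in ISO-8859-1.
# With no FILE argument recode works as a filter (stdin to stdout);
# given a FILE argument it rewrites that file in place.
if command -v recode >/dev/null 2>&1; then
  hex=$(printf 'caf\351\n' | recode latin1..utf8 | od -An -tx1 | tr -d ' \n')
  # In UTF-8 the single byte e9 becomes the two bytes c3 a9.
  echo "bytes after recoding: $hex"
else
  echo "recode is not installed" >&2
fi
```

Dumping the bytes with od rather than printing the text keeps the result independent of whatever encoding your terminal happens to use.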

If we go back to the source on hncomments then we can see how recode fits in:

jq -r '.hits | .[] | .author + "\nhttps://news.ycombinator.com/item?id=" + .objectID + "\n\n" + .comment_text + "\n\n---\n\n"' <(echo $j) | sed -e 's/<[^>]*>/ /g;' | recode -f html..ascii | mail -s "Latest $q HN comments" $e

Here recode takes the output from sed and converts HTML character entities (such as &amp;) into plain ASCII. The -f flag forces the conversion even when it is not reversible.
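
To see that step in isolation, here is a small sketch that runs only the sed and recode stages on a made-up comment. The sample string is my own invention, not real API output, and the script assumes recode is installed.

```shell
#!/usr/bin/env bash
# Hypothetical comment text, shaped like the HTML that comment_text might contain.
sample='<p>Tom &amp; Jerry said &quot;hi&quot; &lt;3</p>'

if command -v recode >/dev/null 2>&1; then
  # Same two stages as in hncomments: replace each tag with a space,
  # then turn HTML entities into plain ASCII characters.
  decoded=$(printf '%s\n' "$sample" | sed -e 's/<[^>]*>/ /g;' | recode -f html..ascii)
  printf '%s\n' "$decoded"
else
  echo "recode is not installed" >&2
fi
```

Note that sed runs first, so the tags are already gone by the time recode turns &lt; back into a literal <. In the other order, the decoded angle brackets could be mistaken for tags and stripped.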