Interesting Data Science Utilities
Hacker News had an excellent article on utilities for working with large-scale CSV / TSV / etc. data. If you do this type of work a lot, or look at sizable amounts of raw data, I'd be surprised if you didn't find a new tool here. The ones I'm looking at are visidata, octosql, and gron.
Here are some of the interesting takeaways on the tool front:
- http://jmespath.org/
- https://github.com/BurntSushi/xsv
- https://github.com/dinedal/textql
- https://github.com/n3mo/data-science
- https://stedolan.github.io/jq/
- https://gitlab.redox-os.org/redox-os/parallel
- https://github.com/willghatch/racket-rash
- https://visidata.org/
- https://github.com/tomnomnom/gron (makes JSON greppable)
- https://github.com/dflemstr/rq
- https://www.gnu.org/software/datamash/
- https://github.com/johnkerl/miller (originally written in C; Miller 6 is a Go rewrite)
- https://github.com/mechatroner/RBQL
- https://github.com/shellbound/jwalk/
- https://www.rdocumentation.org/packages/plyr/
- https://github.com/google/crush-tools
- https://github.com/python-mario/mario (python for manipulation)
- https://github.com/cube2222/octosql/ (sql for manipulation)
- https://github.com/dkogan/vnlog
- https://csvkit.readthedocs.io/
- https://github.com/eBay/tsv-utils-dlang
- http://harelba.github.io/q/
- https://github.com/BatchLabs/charlatan
- https://github.com/dbohdan/sqawk
- https://github.com/benbernard/RecordStream
- https://github.com/noyesno/awka (awk)
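To give a feel for why a tool like gron is handy, here is a rough Python sketch of the general idea: flatten nested JSON into one "path = value" line per leaf, so that an ordinary text filter can find things. This is not gron's implementation; the `flatten` helper and the sample document are made up for illustration. It also skips details the real tool handles, such as keys that are not valid identifiers and empty containers.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per leaf value (simplified, gron-like)."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes keys are simple identifiers; no quoting is attempted.
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Leaves are rendered as JSON literals (strings quoted, true/false/null).
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample document, purely for illustration.
doc = json.loads('{"user": {"name": "ada", "tags": ["x", "y"]}, "active": true}')

lines = list(flatten(doc))

# A plain substring filter now plays the role of grep.
matches = [line for line in lines if "tags" in line]
print("\n".join(matches))
```

Each matching line carries its full path, which is what makes the flattened form pleasant to search with line-oriented tools.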
Posted In: #data_science #machine_learning