A Python and Data Science Tooling Vocabulary for a Rubyist

I am unabashedly a Ruby guy. I’ve now spent 13 years immersed in Ruby on a daily basis. But a new consulting gig has me delving into Python both as a light implementer and as a likely remote manager of some Python folk in a Data Science / Machine Learning context.

I keep this as a regularly updated document so I have a place to put new vocabulary items as I learn them.

A lot of this is the names of libraries and tools, because learning any language isn’t just learning the language; it is learning the constellation of stuff that makes it useful for a given task. The focus here is clearly on scientific computing and machine learning.

You will likely also want to see the Python Glossary.

A

  • Anaconda - A packaged distribution of Python and R aimed at Data Science. Includes multiple bits of tooling such as Jupyter. More…
  • Anaconda Cloud - A platform for sharing notebooks and packages.

B

  • BERT - BERT, or Bidirectional Encoder Representations from Transformers, is a language model for natural language processing (NLP) that was state of the art in 2018. BERT is based on a Google paper which shows that a bidirectionally trained language model can have a deeper sense of language context and flow than single-direction language models. More…
  • Bokeh - Bokeh is an interactive visualization library that targets modern web browsers for presentation. More…

C

  • Conda - an open source package manager for “any language” but originally for Python. More…

D

  • Dask - Dask is a flexible library for parallel computing in Python. Dask is composed of two parts: … “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. More…
  • Datashader - Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. More…

E

  • Egg - an older packaging format for distributing Python code, now largely superseded by wheels. Think Ruby gem.

F

G

H

  • H2O.ai - H2O.ai describes itself as the creator of the leading open source machine learning and artificial intelligence platform, trusted by hundreds of thousands of data scientists. More…
  • HoloViews - HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. More…

I

J

  • Jupyter Notebook - The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Install it with: python3 -m pip install jupyter and then run it with: jupyter notebook

K

L

M

  • Matplotlib - Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. More…

N

  • Numba - Numba is an open-source JIT compiler that translates a subset of Python and NumPy into fast machine code using LLVM. More…
  • NumPy - The standard numerical library for Python. It adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. More…
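
To make the NumPy entry above a bit more concrete, here is a minimal sketch of what "multi-dimensional arrays plus functions that operate on them" looks like in practice. The array contents and variable names are made up for illustration, and it assumes NumPy is already installed (python3 -m pip install numpy).

```python
import numpy as np

# A 2x3 array: two rows, three columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic applies element by element, with no explicit loop.
# In Ruby you would reach for nested map calls to do the same thing.
doubled = matrix * 2

# Reductions can run over the whole array or along one axis.
total = matrix.sum()                 # sum of every element
column_means = matrix.mean(axis=0)   # one mean per column

print(matrix.shape)           # (2, 3)
print(doubled.tolist())       # [[2, 4, 6], [8, 10, 12]]
print(total)                  # 21
print(column_means.tolist())  # [2.5, 3.5, 4.5]
```

The thing to notice as a Rubyist is the absence of loops: operations are written against the whole array, and NumPy does the iteration in compiled code.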

O

P

  • Pandas - pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. More…
  • Pip - A Python package installer. Example for Python 3: python3 -m pip install --upgrade pip
  • PyCharm - an IDE for Python from the JetBrains folk.
  • Pythonic - something that is done in a very Python-like way.
  • PyTorch - PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is primarily developed by Facebook’s artificial intelligence research group. It is free and open-source software released under the Modified BSD license. More…
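
As a rough illustration of the Pythonic entry above, here is a small sketch contrasting an index-based loop with the idioms most Python code uses instead. The word list and variable names are made up for the example, and the Ruby equivalents mentioned in the comments are approximate.

```python
words = ["ruby", "python", "numpy"]

# Index-based loop: it works, but it reads like a translation from another language.
lengths_loop = []
for i in range(len(words)):
    lengths_loop.append(len(words[i]))

# List comprehension: the usual Python spelling of Ruby's words.map(&:length).
lengths = [len(word) for word in words]

# enumerate() plays the role of Ruby's each_with_index.
numbered = [f"{index}: {word}" for index, word in enumerate(words, start=1)]

# A dict comprehension builds a lookup table in one expression.
length_by_word = {word: len(word) for word in words}

print(lengths)         # [4, 6, 5]
print(numbered)        # ['1: ruby', '2: python', '3: numpy']
print(length_by_word)  # {'ruby': 4, 'python': 6, 'numpy': 5}
```

Both loops produce the same list; calling the second one Pythonic is a statement about readability and convention, not correctness.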

Q

R

  • REPL - Short for Read-Eval-Print Loop; it is what you get when you type python at your command prompt. A REPL gives you a place to type Python code you want executed. Type quit() to exit the Python REPL. The () are required because quit is a callable object, not a statement (Python 3.13 and later also accept a bare quit or exit). Think irb or “rails c”.

  • RoBERTa - A robustly optimized method for pretraining natural language processing (NLP) systems that improves on Bidirectional Encoder Representations from Transformers, or BERT, the self-supervised method released by Google in 2018. More…
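
The read, evaluate, print cycle from the REPL entry above can be sketched in a few lines. This is a toy, not how the real interpreter is implemented: the run_repl function is a hypothetical helper invented for this example, and it takes a list of strings in place of keyboard input so it can run unattended.

```python
def run_repl(lines):
    """Read each line, evaluate it, and collect what a REPL would print."""
    namespace = {}   # variables persist between lines, as in a real session
    printed = []
    for line in lines:                     # Read
        try:
            value = eval(line, namespace)  # Evaluate an expression
        except SyntaxError:
            # Statements such as "x = 20" are not expressions and have no value.
            exec(line, namespace)
            continue
        if value is not None:
            printed.append(repr(value))    # Print
    return printed                         # Loop ends when the input runs out


output = run_repl(["x = 20", "x + 1", "'irb'.upper()"])
print(output)  # ['21', "'IRB'"]
```

Like irb, the loop echoes the value of each expression, which is why typing a bare expression at the python prompt shows its result without a print call.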

S

  • Scikit-learn - Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. More…
  • SciPy - SciPy is a free and open-source Python library used for scientific computing and technical computing. More…

T

  • TensorFlow - TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. It is used for both research and production at Google. More…
  • Torch - Torch is an open-source machine learning library, a scientific computing framework, and a script language based on the Lua programming language. It provides a wide range of algorithms for deep learning, and uses the scripting language LuaJIT, and an underlying C implementation. As of 2018, Torch is no longer in active development. However, PyTorch is actively developed as of August 2019. More…

U

V

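  • VirtualEnv - a virtual environment manager that lets you keep more than one isolated Python setup, each with its own installed packages, on a machine. Think RbEnv or RVM (gemsets are probably the closest match). More… Install it with: pip install virtualenv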
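
One practical question with virtual environments is "am I actually inside one right now?". Here is a minimal sketch of a check, using only the standard library. The in_virtual_environment function is a hypothetical helper written for this example, not part of virtualenv itself.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the Python it was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Very old virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("interpreter:", sys.executable)
print("in a virtual environment:", in_virtual_environment())
```

Run it once with your system Python and once after activating an environment; the interpreter path and the True/False answer should both change.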

W

X

Y

Z