Python Libraries

  • AutoViz - AutoViz performs automatic visualization of any dataset with a single line of Python code. Give it a .csv, .txt, or .json file of any size and AutoViz will visualize it and save the results as HTML files automatically.

  • Numba - Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.

  • scikit-learn - scikit-learn is an open source machine learning library that provides simple and efficient tools for predictive data analysis.

  • NetworkX - NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

  • pandas - pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.

  • Vaex - Vaex combines memory mapping, a sophisticated expression system, and fast out-of-core algorithms to efficiently visualize and explore big datasets and to build machine learning models on a single machine.

  • SciPy - SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems.

  • XGBoost - XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.

  • PyMC - PyMC is a probabilistic programming library for Python that allows users to build Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC) methods.

  • statsmodels - statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data.

  • Bokeh - Bokeh is an interactive visualization library for the browser. Python has an incredible ecosystem of powerful analytics tools: NumPy, SciPy, pandas, Dask, scikit-learn, OpenCV, and more. With a wide array of widgets, plot tools, and UI events that can trigger real Python callbacks, the Bokeh server is the bridge that lets you connect these tools to rich, interactive visualizations in the browser.

  • Blaze - The Blaze ecosystem is a set of libraries that help users store, describe, query and process data.

  • SparklingPandas - SparklingPandas aims to make it easy to use the distributed computing power of PySpark to scale your data analysis with pandas. SparklingPandas builds on Spark’s DataFrame class to give you a polished, Pythonic, and pandas-like API.

  • Superset - Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

  • PyCM - PyCM is a multi-class confusion matrix library written in Python that accepts either input data vectors or a direct matrix. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters.

  • Plotly Dash - Dash is the original low-code framework for rapidly building data apps in Python, R, Julia, and F# (experimental). Written on top of Plotly.js and React.js, Dash is ideal for building and deploying data apps with customized user interfaces. It’s particularly suited for anyone who works with data.
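
To make the PyCM entry above more concrete, here is a minimal sketch of what a multi-class confusion matrix and its class and overall statistics look like. It uses only the Python standard library and is not PyCM's API; the function names (confusion_matrix, per_class_stats, overall_accuracy) and the sample labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix); matrix[i][j] counts items whose actual
    label is labels[i] and whose predicted label is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def per_class_stats(labels, matrix):
    """Compute precision and recall for each class (None if undefined)."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the actual row
        fp = sum(row[i] for row in matrix) - tp   # rest of the predicted column
        stats[label] = {
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
        }
    return stats


def overall_accuracy(matrix):
    """Fraction of items on the diagonal (None for an empty matrix)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else None


# Hypothetical sample data: the two "input data vectors".
actual = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
predicted = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

labels, matrix = confusion_matrix(actual, predicted)
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(per_class_stats(labels, matrix))
print(overall_accuracy(matrix))
```

Rows are actual classes and columns are predicted classes in this sketch; a dedicated library such as PyCM covers far more statistics than the three shown here.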

