Apache Spark - A unified analytics engine for large-scale data processing
Apache Flink
MLeap: Deploy ML Pipelines to Production
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Development repository for sparkMeasure, a tool and library for analyzing and troubleshooting Apache Spark jobs. It simplifies the collection and examination of Spark metrics for developers and data engineers.
A cluster computing framework for processing large-scale geospatial data
Use the world of Python from the comfort of Scala!
A Spark library for Amazon SageMaker.
A library that provides useful extensions to Apache Spark and PySpark.
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks, with support for Python, R, Java, and C++. Supports computation on CPU and GPU.
GeoTrellis for PySpark
Static facades for using TensorFlow in ScalaPy
The Vizier kernel-free notebook programming environment
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Helpers for setting up an embedded Python interpreter
Make Structs Easy (MSE)
Spark library for approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Apache Spark-based framework for analyzing A/B experiments
Online latent state estimation with Spark
Apache Spark data source for Adobe Analytics Data Feed