A community index of third-party packages for Apache Spark.

Showing packages 51 - 100 out of 517

Spark SQL IBM Cloudant External Datasource

@cloudant / No release yet /

(1)

  • 1data source
  • 1sql


Docker container for a Spark standalone cluster.

@epahomov / No release yet /

(0)

  • 1tools
  • 1deployment


An implementation of Factorization Machines (LibFM)

@zhengruifeng / No release yet /

(0)

  • 1ml
  • 1mllib
  • 1machine learning


Package adding dropout regularization to the Apache Spark MLlib project

@rakeshchalasani / No release yet /

(1)

  • 1machine learning
  • 1mllib
  • 1scala


Feature Selection framework based on Information Theory that includes: mRMR, InfoGain, JMI and other commonly used FS filters.

@sramirez / Latest release: 1.4.4 (2017-09-25) / Apache-2.0 /

(8)

  • 3feature-selection
  • 3mllib
  • 3machine learning


Spark implementation of Fayyad's discretizer based on Minimum Description Length Principle (MDLP)

@sramirez / Latest release: 1.4.1 (2017-09-25) / Apache-2.0 /

(7)

  • 2discretization
  • 2mllib
  • 1machine learning


Coursera Machine Learning class examples in Spark

@zinniasystems / No release yet /

(0)

  • 2ml
  • 2machine learning
  • 1example


Maven archetype used to bootstrap a Spark Scala project

@mbonaci / Latest release: 0.9 (2015-04-24) / MIT /

(0)

  • 1Maven
  • 1tools
  • 1scala


Deprecated, please see couchbase/couchbase-spark-connector

@couchbaselabs / Latest release: 1.0.0 (2015-10-20) / Apache-2.0 /

(1)

  • 1streaming
  • 1library
  • 1sql


Using JPMML Evaluator to validate the PMML models exported from Spark

@selvinsource / No release yet /

(1)

  • 1ml
  • 1mllib
  • 1machine learning


SBT plugin for spark-ec2

@pishen / No release yet /

(0)

  • 1tools
  • 1sbt
  • 1deployment


Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL

@saurfang / Latest release: 3.0.0-s_2.12 (2020-09-13) / Apache-2.0 /

(1)

  • 1sas
  • 1tools
  • 1sql


Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

@LucidWorks / Latest release: 2.0.1 (2016-06-09) / Apache-2.0 /

(1)

  • 1ml
  • 1data sources
  • 1solr


Spark and Spark SQL integration for Succinct

@amplab / Latest release: 0.1.8 (2019-07-10) / Apache-2.0 /

(1)

  • 1application
  • 1data source


PySpark + Scikit-learn = Sparkit-learn

@lensacom / No release yet /

(2)

  • 1python
  • 1scikit-learn
  • 1machine learning


RabbitMQ Spark Streaming receiver

@Stratio / Latest release: 0.4.0 (2016-12-20) / Apache-2.0 /

(10)

  • 4streaming


Streaming Recommendation Engine using matrix factorization with user and product bias

@brkyvz / Latest release: 0.1.0 (2015-05-26) / Apache-2.0 /

(2)

  • 1streaming
  • 1ml
  • 1machine learning


Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.

@cloudml / No release yet /

(2)

  • 1ml
  • 1mllib
  • 1machine learning


ElasticSearch integration for Apache Spark

@SHSE / Latest release: 1.0.7 (2016-02-04) / Apache-2.0 /

(1)

  • 1analytics
  • 1search
  • 1elasticsearch


Test Project

@EronWright / Latest release: 0.0.13 (2015-06-11) / Apache-2.0 /

(0)


PySpark support for Elasticsearch

@TargetHolding / Latest release: 0.4.2 (2016-03-22) / Apache-2.0 /

(1)

  • 1python
  • 1spark
  • 1database


A machine learning package built for humans.

@airbnb / No release yet /

(1)

  • 1machine learning


Distributed solver library for large-scale structured output prediction

@dalab / No release yet /

(0)

  • 1Support Vector Machine
  • 1Structured Prediction
  • 1machine learning


Manipulate Apache Spark Streaming by SQL

@Intel-bigdata / No release yet /

(1)

  • 1streaming
  • 1sql


Two-way association analysis

@mfawadalam / No release yet /

(0)


Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data/Compute Engine

@ddf-project / No release yet /

(11)

  • 3API
  • 2tools
  • 2machine learning


A library for exposing dateTime functions from the Joda library as SQL functions, with a DSL to build dateTime Catalyst expressions.

@SparklineData / Latest release: 0.0.2 (2015-10-29) / Apache-2.0 /

(1)

  • 1spark
  • 1sql
  • 1dateTime


A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.

@bigdatagenomics / No release yet /

(1)


Deploy Spark cluster in an easy way.

@pishen / Latest release: 0.5.1 (2015-06-25) / Apache-2.0 /

(0)

  • 1tools
  • 1sbt
  • 1deployment


A distributed implementation of AdaBoost.MH and MP-Boost using Apache Spark

@tizfa / Latest release: 0.6 (2015-07-01) / Apache-2.0 /

(0)

  • 1adaboost
  • 1classification
  • 1machine learning


A Hivemall wrapper for Spark

@maropu / Latest release: 0.0.6 (2016-04-07) / Apache-2.0 /

(0)

  • 1sql
  • 1hive
  • 1machine learning


Official integration between Apache Spark and Elasticsearch real-time search and analytics

@elastic / Latest release: 5.3.1 (2017-04-21) / Apache-2.0 /

(3)

  • 1search
  • 1elasticsearch
  • 1sql


Highly Scalable Grid-Density Clustering Algorithm for Spark MLlib

@thomastriplet / No release yet /

(0)

  • 1clustering
  • 1spark
  • 1machine learning


Spark package with multiple LDA implementations

@EntilZha / No release yet /

(0)


RESTful service for running Spark SQL/Shark queries on top of Spark, with Mesos and Tachyon support.

@Atigeo / No release yet /

(0)


RESTful service that enables support for multiple Spark contexts created from the same server.

@Atigeo / No release yet /

(0)


WIP Demo Package

@brkyvz / No release yet /

(0)


Alternative to Spark machine learning pipeline feature extractors, focused on building sparse feature vectors.

@collectivemedia / No release yet /

(1)

  • 2feature extraction
  • 2machine learning


Spark algorithms for building and processing k-nn graphs

@tdebatty / Latest release: 0.13 (2016-02-17) / MIT /

(1)

  • 1graph
  • 1machine learning


Native, optimized access to HBase Data through Spark SQL/Dataframe Interfaces

@Huawei-Spark / Latest release: 1.0.0 (2015-07-17) / Apache-2.0 /

(1)

  • 1hbase


Simplified tabular data processing library for Spark

@Atigeo / No release yet /

(0)


Geo Spatial Data Analytics on Spark

@harsha2010 / Latest release: 1.0.5-s_2.11 (2017-08-14) / Apache-2.0 /

(1)

  • 2geospatial
  • 2data source
  • 2sql


Scala library for converting Spark rows to case classes

@ypg-data / Latest release: 0.2.0-s_2.11 (2016-03-01) / Apache-2.0 /

(0)

  • 1sql
  • 1library
  • 1scala


An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 /

(1)

  • 1streaming
  • 1sbt
  • 1scala


PySpark Notebook With Docker.

@prabeesh / Latest release: 0.1.0 (2015-08-04) / Apache-2.0 /

(1)

  • 2python
  • 1docker
  • 1pyspark


SparkListener that converts SparkListenerEvents to JSON and forwards them to an external service via RPC.

@hammerlab / Latest release: 2.0.1 (2015-10-12) / Apache-2.0 /

(0)


An Apache Spark utility for pulling Tweets from Gnip's PowerTrack in real time

@knoldus / No release yet /

(1)

  • 1streaming
  • 1data source
  • 1scala


Generic solution for scanning, joining and mutating HBase tables to and from Spark RDDs.

@michal-harish / No release yet /

(0)


Spark Salesforce Wave Connector

@springml / Latest release: 1.2.0 (2018-04-25) / Apache-2.0 /

(2)

  • 1salesforce
  • 1data source


Library for computing centrality for graph nodes

@webgeist / Latest release: 0.11 (2015-08-09) / LGPL-3.0 /

(3)

  • 2graph