Track NLP with Ruby Updates Daily
Curated List: Practical Natural Language Processing done in Ruby
arbox/nlp-with-ruby · ⭐ 1.1K · Computer Science
Aug 27, 2022
Multipurpose Engines
- ruby-spacy (⭐65) - Wrapper module for the spaCy NLP library via PyCall (⭐1.1k).
Mar 30, 2021
Language Aware String Manipulation / Constituency Parsing
- iuliia (⭐10) - Transliteration of Cyrillic to Latin in many possible ways (defined by the reference implementation (⭐72)).
Jan 17, 2019
Spelling and Error Correction / Constituency Parsing
- gingerice (⭐478) - Spelling and Grammar corrections via the Ginger API.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2018
- Natural Language Processing and Tweet Sentiment Analysis by Cassandra Corrales [post]
- 2017
- The Google NLP API Meets Ruby by Aja Hammerly [post]
- Syntax Isn't Everything: NLP For Rubyists by Aja Hammerly [slides]
- Scientific Computing on JRuby by Prasun Anand [slides | video | slides | slides]
- Unicode Normalization in Ruby by Starr Horne [post]
Related Resources / Constituency Parsing
Sep 02, 2017
Machine Learning Libraries / Constituency Parsing
- rblearn (⭐2) - Feature Extraction and Crossvalidation library.
Jul 27, 2017
Language Aware String Manipulation / Constituency Parsing
- regex_sample - Sample string generation from a given Regular Expression.
Jul 24, 2017
Language Aware String Manipulation / Constituency Parsing
- translit_kit (⭐7) - Transliterate Hebrew & Yiddish text into Latin characters.
- re2 (⭐146) - High-speed Regular Expression library for Text Mining and Text Extraction.
Jun 14, 2017
Related Resources / Constituency Parsing
May 24, 2017
Community / Constituency Parsing
May 19, 2017
Dialog Agents, Assistants, and Chatbots / Constituency Parsing
- chatterbot (⭐490) - Straightforward Ruby-based Twitter Bot Framework, using OAuth to authenticate.
- lita (⭐1.7k) - Highly extensible chat operation bot framework with persistent storage on Redis.
May 17, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2011
- Ruby one-liners by Benoit Hamelin [post]
- Clustering in Ruby by Colin Drake [post]
May 16, 2017
Multipurpose Engines / On-line APIs
- google-cloud-language (⭐1.4k) - Google's Natural Language service API for Ruby.
May 07, 2017
Pipeline Generation
- parallel (⭐4.2k) - Supervisor for parallel execution on multiple CPUs or in many threads.
- pwrake (⭐57) - Rake extensions to run local and remote tasks in parallel.
May 02, 2017
Projects and Code Examples / Constituency Parsing
- RSyntaxTree - Web based demonstration of the syntactic tree visualization.
Apr 21, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2006
- Speak My Language: Natural Language Processing With Ruby by Michael Granger [slides | write-up | write-up]
Apr 16, 2017
Lexical Processing / Lexical Statistics: Counting Types and Tokens
- words_counted (⭐162) - Pure Ruby library counting word statistics with different custom options.
Full Text Search, Information Retrieval, Indexing / Constituency Parsing
- google-api-client (⭐2.8k) - Ruby API library for Google services.
Machine Translation / Constituency Parsing
- zipf (⭐3) - implementation of BLEU and other base algorithms.
Text Extraction / Constituency Parsing
- yomu (⭐502) - library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit.
Projects and Code Examples / Constituency Parsing
- Words Counted - examples of customizable word statistics powered by words_counted (⭐162).
Apr 14, 2017
Optical Character Recognition / Constituency Parsing
- tesseract-ocr (⭐630) - FFI based wrapper over the Tesseract OCR Engine (⭐68k).
Language Aware String Manipulation / Constituency Parsing
- fuzzy_match (⭐680) - Fuzzy string comparison with Distance measures and Regular Expression.
- fuzzy_tools (⭐23) - Toolset for fuzzy searches in Ruby tuned for accuracy.
Needs your Help! / Constituency Parsing
- summarize (⭐205) - Ruby native wrapper for Open Text Summarizer (⭐243).
Apr 11, 2017
Machine Learning Libraries / Constituency Parsing
- ruby-fann (⭐505) - Ruby bindings to the Fast Artificial Neural Network Library (FANN).
Full Text Search, Information Retrieval, Indexing / Constituency Parsing
- rsolr (⭐420) - Ruby and Rails client library for Apache Solr.
- sunspot (⭐3k) - Rails centric client for Apache Solr.
- thinking-sphinx (⭐1.6k) - Active Record plugin for using Sphinx in (not only) Rails based projects.
- elasticsearch (⭐2k) - Ruby client and API for Elasticsearch.
- elasticsearch-rails (⭐3.1k) - Ruby and Rails integrations for Elasticsearch.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2009
- Porting the UEA-Lite Stemmer to Ruby by Jason Adams [post]
- NLP Resources for Ruby by Jason Adams [post]
Projects and Code Examples / Constituency Parsing
- Going the Distance (⭐61) - Implementations of various distance algorithms with example calculations.
- Named entity recognition with Stanford NER and Ruby (⭐20) - NER Examples in Ruby and Java with some explanations.
Needs your Help! / Constituency Parsing
- ferret (⭐280) - Information Retrieval in C and Ruby.
Apr 10, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2016
- Quickly Create a Telegram Bot in Ruby by Ardian Haxha [tutorial]
- Deep Learning: An Introduction for Ruby Developers by Geoffrey Litt [slides]
- How I made a pure-Ruby word2vec program more than 3x faster by Kei Sawada [slides]
- Dōmo arigatō, Mr. Roboto: Machine Learning with Ruby by Eric Weinstein [slides | video]
Apr 06, 2017
Machine Learning Libraries / Constituency Parsing
- decisiontree (⭐1.5k) - Decision Tree ID3 Algorithm in pure Ruby [post].
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2007
- Decision Tree Learning in Ruby by Ilya Grigorik [post]
Apr 05, 2017
Multipurpose Engines / On-line APIs
- wlapi (⭐19) - Ruby client library for Wortschatz Leipzig web services.
Lexical Processing / Filtering Stop Words
- stopwords-filter (⭐78) - Filter and Stop Word Lexicon based on the SnowBall lemmatizer.
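Stop-word filtering comes down to dropping every token that appears in a lexicon. The following is a minimal pure-Ruby sketch of that idea, not the API of stopwords-filter; the `STOP_WORDS` list and the `remove_stop_words` helper are hypothetical names made up for illustration.

```ruby
require "set"

# Hypothetical, tiny stop-word lexicon for illustration only;
# real lexicons are much larger and language specific.
STOP_WORDS = Set.new(%w[a an and the of in on is are to]).freeze

# Split the text into lowercase word tokens and drop those found in the lexicon.
def remove_stop_words(text, stop_words = STOP_WORDS)
  text.downcase.scan(/[[:alpha:]]+/).reject { |token| stop_words.include?(token) }
end

p remove_stop_words("The cat is on the mat") # => ["cat", "mat"]
```

A `Set` keeps the membership test cheap even when the lexicon grows to thousands of entries.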
Sentiment Analysis / Constituency Parsing
- stimmung (⭐20) - Semantic Polarity based on the SentiWS lexicon.
Mar 03, 2017
Numbers, Dates, and Time Parsing / Constituency Parsing
- numerizer (⭐38) - Ruby parser for English number expressions.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2015
- N-gram Analysis for Fun and Profit by Jesus Castello [tutorial]
- Machine Learning made simple with Ruby by Lorenzo Masini [tutorial]
- Using Ruby Machine Learning to Find Paris Hilton Quotes by Rick Carlino [tutorial]
- Exploring Natural Language Processing in Ruby by Kevin Dias [slides]
- Machine Learning made simple with Ruby by Lorenzo Masini [post]
- Practical Data Science in Ruby by Bobby Grayson [slides]
Feb 24, 2017
Spelling and Error Correction / Constituency Parsing
- hunspell-i18n (⭐4) - Ruby bindings to the standard Hunspell Spell Checker.
- ffi-hunspell (⭐49) - FFI based Ruby bindings for Hunspell.
- hunspell (⭐35) - Ruby bindings to Hunspell via Ruby C API.
Feb 17, 2017
Pipeline Generation
- phobos (⭐219) - Simplified Ruby Client for Apache Kafka.
Feb 15, 2017
Language Identification / On-line APIs
- scylla (⭐36) - Language Categorization and Identification.
Feb 10, 2017
Linguistic Resources / Constituency Parsing
- rwordnet (⭐90) - Pure Ruby self contained API library for the Princeton WordNet®.
- wordnet (⭐138) - Performance tuned bindings for the Princeton WordNet®.
Feb 03, 2017
Multipurpose Engines
- open_nlp (⭐11) - JRuby Bindings for the OpenNLP Toolkit.
Jan 29, 2017
Multipurpose Engines
- nlp_toolz (⭐2) - Wrapper over some OpenNLP classes and the original Berkeley Parser (⭐182).
Machine Learning Libraries / Constituency Parsing
- lda-ruby (⭐133) - Ruby implementation of the LDA (Latent Dirichlet Allocation) for automatic Topic Modelling and Document Clustering.
Jan 23, 2017
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2010
- bayes_motel – Bayesian classification for Ruby by Mike Perham [post]
Jan 10, 2017
Multipurpose Engines / On-line APIs
- monkeylearn-ruby (⭐80) - Sentiment Analysis, Topic Modelling, Language Detection, Named Entity Recognition via a Ruby based Web API client.
Jan 09, 2017
Multipurpose Engines
- open-nlp (⭐91) - Ruby Bindings for the OpenNLP Toolkit.
- stanford-core-nlp (⭐434) - Ruby Bindings for the Stanford CoreNLP (⭐9.9k) tools.
Numbers, Dates, and Time Parsing / Constituency Parsing
- chronic (⭐3.2k) - Pure Ruby natural language date parser.
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2014
- Natural Language Parsing with Ruby by Glauco Custódio [tutorial]
- Demystifying Data Science: Analyzing Conference Talks with Rails and Ngrams by Todd Schneider [video | code (⭐34)]
- Natural Language Processing with Ruby by Konstantin Tennhard [video | video | video | slides]
Jan 06, 2017
Multipurpose Engines
- treat (⭐1.4k) - Natural Language Processing framework for Ruby (like NLTK for Python).
Multipurpose Engines / On-line APIs
- wit-ruby (⭐282) - Ruby client library for the Wit.ai Language Understanding Platform.
Related Resources / Constituency Parsing
- Awesome TensorFlow (⭐17k) - Machine Learning with TensorFlow libraries.
Jan 04, 2017
Multipurpose Engines / On-line APIs
- alchemyapi_ruby (⭐36) - Legacy Ruby SDK for AlchemyAPI/Bluemix.
Related Resources / Constituency Parsing
- Awesome Ruby (⭐14k) - Among other awesome items, a short list of NLP related projects.
- Awesome OCR (⭐3k) - Multitude of OCR (Optical Character Recognition) resources.
Dec 19, 2016
Pipeline Generation
- ruby-spark (⭐227) - Spark bindings with an easy to understand DSL.
Dec 12, 2016
Pipeline Generation
- composable_operations (⭐46) - Definition framework for operation pipelines.
Dec 08, 2016
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2012
- Machine Learning with Ruby, Part One by Vasily Vasinov [tutorial]
Dec 07, 2016
Articles, Posts, Talks, and Presentations / Constituency Parsing
- 2013
- How to parse 'go' - Natural Language Processing in Ruby by Tom Cartwright [slides | video]
- Natural Language Processing in Ruby by Brandon Black [slides | video]
- Natural Language Processing with Ruby: n-grams by Nathan Kleyn [tutorial | code (⭐33)]
- Seeking Lovecraft, Part 1: An introduction to NLP and the Treat Gem by Robert Qualls [tutorial]
Related Resources / Constituency Parsing
- Speech and Natural Language Processing (⭐2.2k) - General List of NLP related resources (mostly not for Ruby programmers).
Dec 06, 2016
Related Resources / Constituency Parsing
- Ruby NLP (⭐1.3k) - State-of-the-art collection of Ruby libraries for NLP.
- Scientific Ruby - Linear Algebra, Visualization and Scientific Computing for Ruby.
- iRuby (⭐917) - IRuby kernel for Jupyter (formerly IPython).
Nov 30, 2016
Community / Constituency Parsing
Nov 29, 2016
Language Aware String Manipulation / Constituency Parsing
- active_support (⭐57k) - RoR ActiveSupport gem has various string extensions that can handle case.
Books / Constituency Parsing
- Miller, Rob. Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Programmers, 2015. [link]
- Watson, Mark. Practical Semantic Web and Linked Data Applications. Lulu, 2010. [link]
Nov 27, 2016
Segmentation / On-line APIs
- tokenizer (⭐45) - Simple multilingual tokenizer. [tutorial]
- pragmatic_tokenizer (⭐91) - Multilingual tokenizer to split a string into tokens.
- textoken (⭐31) - Simple and customizable text tokenization library.
- pragmatic_segmenter (⭐572) - Sentence Boundary Disambiguation with many goodies.
- punkt-segmenter (⭐92) - Pure Ruby implementation of the Punkt Segmenter.
- tactful_tokenizer (⭐80) - RegExp based tokenizer for different languages.
- scalpel (⭐51) - Sentence Boundary Disambiguation tool.
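For orientation, regular-expression tokenization can be sketched in a few lines of plain Ruby. This is a naive illustration and not the API of any gem listed above; the `tokenize` helper and its pattern are assumptions, and real tokenizers handle abbreviations, URLs, and language-specific rules that this sketch ignores.

```ruby
# Naive tokenizer: a token is either a word (optionally containing inner
# apostrophes or hyphens) or a single punctuation character.
def tokenize(text)
  text.scan(/[[:alnum:]]+(?:['-][[:alnum:]]+)*|[^[:space:][:alnum:]]/)
end

p tokenize("Don't panic, it's well-tested!")
# => ["Don't", "panic", ",", "it's", "well-tested", "!"]
```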
Lexical Processing / Stemming
- ruby-stemmer (⭐250) - Ruby-Stemmer exposes the SnowBall API to Ruby.
- uea-stemmer (⭐54) - Conservative stemmer for search and indexing.
Lexical Processing / Lemmatization
- lemmatizer (⭐108) - WordNet based Lemmatizer for English texts.
Lexical Processing / Lexical Statistics: Counting Types and Tokens
- wc (⭐6) - Facilities to count word occurrences in a text.
- word_count (⭐5) - Word counter for String and Hash objects.
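Counting types and tokens needs nothing beyond the Ruby core library. The sketch below assumes Ruby 2.7 or later (for `Enumerable#tally`) and a deliberately naive tokenization; `word_frequencies` is a hypothetical helper, not a method from the gems above.

```ruby
# Count how often each word type occurs in a text.
# Tokenization is a naive assumption: lowercase runs of letters and apostrophes.
def word_frequencies(text)
  text.downcase.scan(/[[:alpha:]']+/).tally
end

freqs  = word_frequencies("the quick fox and the lazy dog and the cat")
tokens = freqs.values.sum # total number of word tokens
types  = freqs.size       # number of distinct word types

puts "tokens=#{tokens} types=#{types}" # prints "tokens=10 types=7"
```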
Phrasal Level Processing / Filtering Stop Words
- n_gram (⭐37) - N-Gram generator.
- ruby-ngram (⭐12) - Break words and phrases into ngrams.
- raingrams (⭐69) - Flexible and general-purpose ngrams library written in pure Ruby.
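An n-gram is a sliding window of n consecutive items, which the core method `Enumerable#each_cons` produces directly. A minimal sketch follows; the `ngrams` helper is a hypothetical name and not part of the gems listed here.

```ruby
# Return all n-grams of a token list as arrays of n consecutive tokens.
def ngrams(tokens, n)
  tokens.each_cons(n).to_a
end

words = "to be or not to be".split
p ngrams(words, 2)
# => [["to", "be"], ["be", "or"], ["or", "not"], ["not", "to"], ["to", "be"]]

# The same helper yields character n-grams when given the characters of a word.
p ngrams("ruby".chars, 3).map(&:join) # => ["rub", "uby"]
```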
Semantic Analysis / Constituency Parsing
- amatch (⭐384) - Set of five distance types between strings (including Levenshtein, Sellers, Jaro-Winkler, 'pair distance').
- damerau-levenshtein (⭐146) - Calculates edit distance using the Damerau-Levenshtein algorithm.
- hotwater (⭐80) - Fast Ruby FFI string edit distance algorithms.
- levenshtein-ffi (⭐151) - Fast string edit distance computation, using the Damerau-Levenshtein algorithm.
- tf_idf (⭐35) - Term Frequency / Inverse Document Frequency in pure Ruby.
- tf-idf-similarity (⭐765) - Calculate the similarity between texts using TF/IDF.
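As a reference point for the edit-distance libraries above, here is a pure-Ruby sketch of plain Levenshtein distance using two rows of the dynamic-programming table. It does not count transpositions as single edits (that is the Damerau variant), and the native gems will be considerably faster; the `levenshtein` helper is illustrative only.

```ruby
# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn `a` into `b`.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  # previous[j] is the distance between the prefix of `a` processed so far
  # and the first j characters of `b`.
  previous = (0..b_chars.size).to_a

  a_chars.each_with_index do |ca, i|
    current = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [
        previous[j + 1] + 1, # deletion
        current[j] + 1,      # insertion
        previous[j] + cost   # substitution or match
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein("kitten", "sitting") # prints 3
```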
Pragmatical Analysis / Constituency Parsing
- SentimentLib (⭐14) - Simple extensible sentiment analysis gem.
Text Alignment / Constituency Parsing
- alignment (⭐1) - Alignment routines for bilingual texts (Gale-Church implementation).
Machine Translation / Constituency Parsing
- microsoft_translator (⭐21) - Ruby client for the Microsoft Translator API.
- termit (⭐508) - Google Translate with speech synthesis in your terminal.
Numbers, Dates, and Time Parsing / Constituency Parsing
- chronic_between (⭐28) - Simple Ruby natural language parser for date and time ranges.
- chronic_duration (⭐355) - Pure Ruby parser for elapsed time.
- kronic (⭐149) - Methods for parsing and formatting human readable dates.
- nickel (⭐117) - Extracts date, time, and message information from naturally worded text.
- tickle (⭐82) - Parser for recurring and repeating events.
Named Entity Recognition / Constituency Parsing
- ruby-ner (⭐20) - Named Entity Recognition with Stanford NER and Ruby.
- ruby-nlp (⭐92) - Ruby Binding for the Stanford POS Tagger and Named Entity Recognizer.
Text-to-Speech-to-Text / Constituency Parsing
- espeak-ruby (⭐196) - Small Ruby API for utilizing 'espeak' and 'lame' to create text-to-speech mp3 files.
- tts (⭐94) - Text-to-Speech conversion using the Google translate service.
- att_speech (⭐20) - Ruby wrapper over the AT&T Speech API for speech to text.
- pocketsphinx-ruby (⭐258) - Pocketsphinx bindings.
Machine Learning Libraries / Constituency Parsing
- rb-libsvm (⭐278) - Support Vector Machines with Ruby.
- rtimbl (⭐5) - Memory based learners from the Timbl framework.
- classifier-reborn (⭐555) - General classifier module to allow Bayesian and other types of classifications.
- liblinear-ruby-swig (⭐83) - Ruby interface to LIBLINEAR (much more efficient than LIBSVM for text classification).
- linnaeus (⭐38) - Redis-backed Bayesian classifier.
- maxent_string_classifier (⭐9) - JRuby maximum entropy classifier for string data, based on the OpenNLP Maxent framework.
- naive_bayes (⭐49) - Simple Naive Bayes classifier.
- nbayes (⭐154) - Full-featured, Ruby implementation of Naive Bayes.
- omnicat (⭐11) - Generalized rack framework for text classifications.
- omnicat-bayes (⭐31) - Naive Bayes text classification implementation as an OmniCat classifier strategy.
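Several of the classifiers above are built on Naive Bayes. To show the idea, here is a small multinomial Naive Bayes sketch with add-one (Laplace) smoothing in plain Ruby. The `TinyNaiveBayes` class, its method names, and the toy training data are all hypothetical and do not reflect the API of any listed gem.

```ruby
# Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
# Class and method names are hypothetical, not taken from any gem.
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |hash, label| hash[label] = Hash.new(0) }
    @doc_counts  = Hash.new(0)
    @vocabulary  = {}
  end

  # Record one labelled training document.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each do |word|
      @word_counts[label][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the label with the highest log-probability for the text
  # (nil if nothing has been trained yet).
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)
    @doc_counts.keys.max_by do |label|
      total_words = @word_counts[label].values.sum
      prior = Math.log(@doc_counts[label] / total_docs)
      # Sum log-likelihoods of the words, starting from the log-prior.
      words.sum(prior) do |word|
        Math.log((@word_counts[label][word] + 1.0) / (total_words + @vocabulary.size))
      end
    end
  end

  private

  # Naive tokenization: lowercase runs of letters.
  def tokenize(text)
    text.downcase.scan(/[[:alpha:]]+/)
  end
end

nb = TinyNaiveBayes.new
nb.train(:spam, "win money now")
nb.train(:spam, "cheap money offer")
nb.train(:ham,  "meeting schedule for monday")
nb.train(:ham,  "lunch on monday")

p nb.classify("money offer now") # => :spam
```

Working with log-probabilities avoids numeric underflow on longer texts, and the smoothing keeps unseen words from zeroing out a class.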
Language Aware String Manipulation / Constituency Parsing
- fuzzy-string-match (⭐284) - Fuzzy string matching library for Ruby.
- u - U extends Ruby’s Unicode support.
- unicode (⭐81) - Unicode normalization library.
- CommonRegexRuby (⭐80) - Find a lot of kinds of common information in a string.
- regexp-examples (⭐521) - Generate strings that match a given regular expression.
- verbal_expressions (⭐571) - Make difficult regular expressions easy.
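Unicode normalization matters because the same visible character can be encoded in more than one way. The sketch below uses Ruby's built-in `String#unicode_normalize` (available since Ruby 2.2) rather than any of the gems above, and assumes UTF-8 strings.

```ruby
composed   = "\u00E9"  # "é" as a single code point
decomposed = "e\u0301" # "e" followed by a combining acute accent

puts composed == decomposed                         # false: different code points
puts composed == decomposed.unicode_normalize(:nfc) # true after NFC normalization
puts composed.unicode_normalize(:nfd).length        # 2 code points in NFD form
```

Normalizing both sides to the same form before comparing, indexing, or counting strings avoids treating visually identical words as different.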
Apr 22, 2016
Segmentation / On-line APIs
- nlp-pure (⭐20) - Natural language processing algorithms implemented in pure Ruby with minimal dependencies.
Machine Learning Libraries / Constituency Parsing
- weka (⭐64) - JRuby bindings for Weka, different ML algorithms implemented through Weka.