Case studies
Below are some case studies (demos) that I have developed outside of my normal project work. If you are interested in descriptions of actual customer projects, please go to the projects page.
This is a library implemented in Scala which takes a set of data points, each represented as a dense vector of dimension 100-300, and indexes them. The index can then be queried to find the vectors closest to a query vector by cosine similarity.
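To make the query semantics concrete, here is a minimal sketch in Python. It is not the library's implementation (which is written in Scala and builds a real index); it is a brute-force scan that only illustrates what "closest by cosine similarity" means. The class name `BruteForceCosineIndex` and its `query` method are hypothetical names chosen for this illustration.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class BruteForceCosineIndex:
    """Hypothetical stand-in for an index: stores unit vectors, scans all of them per query."""

    def __init__(self, vectors):
        # Normalizing once at build time keeps each query down to plain dot products.
        self._unit = [normalize(v) for v in vectors]

    def query(self, vec, k=3):
        q = normalize(vec)
        scored = (
            (sum(a * b for a, b in zip(q, u)), i)
            for i, u in enumerate(self._unit)
        )
        # Return the k (similarity, row id) pairs with the highest similarity.
        return heapq.nlargest(k, scored)


# Toy data in 2 dimensions; the library targets dimensions of 100-300.
index = BruteForceCosineIndex([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
top = index.query([2.0, 0.1], k=2)
```

A scan like this costs time linear in the number of stored vectors for every query, which is the cost an index is meant to avoid on large datasets.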
This is the output of a scalable clustering algorithm that I have implemented. Its purpose is to answer the question: "How can one navigate a dataset where objects are represented as vectors?"
Technical Blog Post: Hierarchical Clustering That Works.
TextDrill aims to provide an easy-to-use, flexible, faithful and fast interface for knowledge discovery in unstructured text. TextDrill is centered around the common tasks of word, ngram, sentence and document clustering, as well as entity extraction. Everything, including entities, can be represented and linked in the same "knowledge" graph. At the moment, it is still early days for the tool, and only a technical presentation is available for some of the features. I am driving the features based on user feedback.
Textdrill Tour (on textdrill.io).
One can extend Spark SQL with additional functions. To amortize the implementation effort, these should be frequently used functions, most likely related to the business domain. This approach makes it possible to a) get very good performance, similar to native Spark, while coding in Python; b) minimize noise in the code; c) speak the language of the business via Spark SQL.
A bare-bones Python application (with tests and demos) that explores the foundational underpinnings of (immutable) infrastructure as code.