Simple, Scalable, and Responsive Data Retrieval with ElasticSearch

2019-05-24

DataStore > Category

Distributed systems are popular tools in the ‘big data’ market, and ElasticSearch has become one of the major players. It fills a specific niche: it scales to store large amounts of data and lets you query that data quickly. Over the last ten years it has grown to provide a wide variety of functionality. While it serves its primary purpose well, teams should resist the urge to use it in other roles, such as advanced analytics.

Solving Textual Problems with Regular Expressions

2019-05-19

Process > DataScience

Regular expressions provide an important foundation for learning systems. They offer a quick and direct approach to solving problems without creating mounds of training data or building the infrastructure for deploying a model. While they are a common programming technique and simple enough to employ, they tend to be used so infrequently that you must re-learn them each time you wish to apply them. This post summarizes basic regex syntax, strategies, and workflow in hopes of decreasing the time needed to implement them.
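As a small illustration of the "quick and direct" approach described above, here is a minimal sketch using Python's standard `re` module. The pattern, the `find_dates` helper, and the sample text are hypothetical, chosen only for illustration, and are not taken from the post itself.

```python
import re

# Hypothetical pattern: ISO-style dates such as 2019-05-19.
# \b anchors the match at word boundaries so longer digit runs are skipped.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(text):
    """Return every date found in text as a (year, month, day) tuple of ints."""
    return [
        tuple(int(part) for part in match.groups())
        for match in DATE_PATTERN.finditer(text)
    ]


sample = "Posted 2019-05-19, revised 2019-05-24."
print(find_dates(sample))  # [(2019, 5, 19), (2019, 5, 24)]
```

Compiling the pattern once and reusing it keeps the workflow short: write the pattern, try it on a few sample strings, then tighten it (for example with `\b`) when it matches too much.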

Generalizing the Machine Learning Process

2019-05-09

Process > DataScience

This work describes a general approach to follow when performing machine learning (ML) manually and when automating it in a deployment setting. Unlike a classical statistical analysis, standard machine learning projects typically follow a general and repeatable process. While the practitioner should be aware of the details of each step and the reasons for choosing it, there is much less of the design thinking and assumption checking that are necessary components of more mathematical modeling fields.

Formatting for Jupyter (.ipynb) Notebooks

2019-05-08

Blog > Category

This is a test post for formatting Jupyter Notebooks for Hugo. The workflow makes use of the code in the nb2hugo repository, as well as the beakerx Jupyter kernel. This post tests a complicated workflow. Begin by running the newest version of JupyterLab. Run through the basic markdown sections. Next, try working with the R kernel by using the rpy2 library. Run these cells to ensure functionality. Then close the notebook and re-open it in Beakerx with the beakerx Groovy kernel.

Formatting a Markdown Post

2019-05-02

Blog > Category

Cupiditate voluptas sunt velit. Accusantium aliquid expedita excepturi quis laborum autem. Quas occaecati et atque est repellat dolores. Laudantium in molestiae consequatur voluptate ipsa. Nulla quia non qui sed. Voluptatem et enim nesciunt sunt pariatur. Libero eius excepturi voluptatibus reprehenderit. Facere enim neque dolorem sed ullam non. Dolor sit molestias repellendus. Awesome-1 Facilis maiores doloribus similique sint quaerat reiciendis quia. Autem nemo voluptas rerum. Eos odio aut omnis. Adipisci voluptas nihil autem recusandae.

List of ToDo Posts

2019-05-01

category1 > category2

This is a list of blog posts that are referenced, but not yet complete.

List of Future Posts

General Machine Learning: scoring metrics, model performance metrics, and graphs; imbalanced data; feature engineering; models in-depth

Natural Language Processing: functional programming; numpy; nltk / spaCy; gensim; word2vec; fasttext; nlp overview

General Data Concepts: datawarehousing; infrastructure architectures; serverless deployments

Business and Sales: pricing pyramid; matching problems and solutions; marketing -> business development -> sales cycle and org structure