The R and Pandas Dataframe

Although Pandas uses the Dataframe as its primary data structure, just as R does, the Pandas syntax and underlying fundamentals can be disorienting for R users. This post describes some basic comparisons and inconsistencies between the two languages. It also provides examples of very non-intuitive solutions to common problems. Introduction: The R versus Pandas debate in data science is really an apples-and-oranges comparison. R is a domain-specific language for statistics, analytics, and data visualization.

Working through a Progressive Python Application

This post walks the developer through a Python application as it progresses through development. It uses Linux, Docker, VS Code, pyenv, pipenv, and other tools for developing, building, and deploying an application. Environment: Two tools, pyenv and pipenv, can help you set up your local development environment. Pyenv is good for getting the correct Python version. Pipenv is quite good at setting up your virtual environment so that your versions of Python and dependencies are kept separate from those on your actual machine.

Ranking Text With Word Embeddings

I recently implemented search functionality for my Hugo site, which can be seen at: https://imtorgdemo.github.io/pages/search/. The search uses lunr.js, a lightweight search library similar in spirit to Solr. While it works well enough, the metadata used for ranking queries could be improved. It would also be nice to visually locate results by where the posts fit among the three fundamental data science disciplines: mathematics, computer science, and business. This narrative provides a quick solution for ranking posts by each discipline, then reducing the dimensions to three axes drawn in the xy-plane.
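
The idea of ranking posts by each discipline and then placing them in the plane can be sketched roughly as follows. This is only an illustration, not the post's implementation: the three-element vectors are toy stand-ins for real word embeddings, and the triangular layout (one corner per discipline) is an assumed reading of "three axes in the xy-plane". The names `DISCIPLINES`, `CORNERS`, `rank_disciplines`, and `to_xy` are hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for real word embeddings (hypothetical values).
DISCIPLINES = {
    "mathematics": [1.0, 0.0, 0.0],
    "computer science": [0.0, 1.0, 0.0],
    "business": [0.0, 0.0, 1.0],
}

# Corners of an equilateral triangle in the xy-plane, one per discipline (assumed layout).
CORNERS = {
    "mathematics": (0.0, 0.0),
    "computer science": (1.0, 0.0),
    "business": (0.5, math.sqrt(3) / 2),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_disciplines(post_vector):
    """Return (discipline, score) pairs sorted from most to least similar."""
    scores = {name: cosine(post_vector, vec) for name, vec in DISCIPLINES.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def to_xy(ranked):
    """Place a post inside the triangle, weighting each corner by its score."""
    # Negative similarities are clipped to zero so the weights stay valid.
    weights = {name: max(score, 0.0) for name, score in ranked}
    total = sum(weights.values())
    if total == 0:
        # No affinity to any discipline: fall back to the triangle's centre.
        weights = {name: 1.0 for name in CORNERS}
        total = float(len(CORNERS))
    x = sum(w * CORNERS[name][0] for name, w in weights.items()) / total
    y = sum(w * CORNERS[name][1] for name, w in weights.items()) / total
    return x, y


post = [0.9, 0.3, 0.1]  # hypothetical embedding of one post
ranked = rank_disciplines(post)
point = to_xy(ranked)
```

With real embeddings, each discipline vector might be the average of a few seed words, and `post` the average of the words in a post; those choices are assumptions and would need tuning.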

The Insecurities of the 'Great Men'

While it may be tough to see past people's outward appearance to their true nature, doing so can provide insight into all people, including yourself. The ‘Great Man’ theory, that leaders are born and not made, came about in the 19th century when British aristocrats attained powerful positions through nepotism. It was primarily concerned with explaining leaders such as Walpole, Wellington, and Caesar. Much of what is learned in history is their propaganda; getting at the reality is much more difficult.

Distributing Code in Python

Distributing and deploying products is a necessary step in the solution development process, and an imperative in business. Each solution must be thoughtfully analyzed for strengths and weaknesses, especially from the perspective of security. The decisions you make are largely based on the language employed to create the solution. This post describes the steps taken in distributing a Python solution. Introduction to Bytecode: Simple Python scripts are a terrific approach to getting work done quickly.
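
The excerpt introduces bytecode without showing a command. As a minimal sketch, assuming the standard library's `py_compile` module is an acceptable way to illustrate the idea (the post itself may use a different tool), a script can be compiled to a `.pyc` file like this. The file name `hello.py` and the helper `compile_to_bytecode` are hypothetical.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_to_bytecode(source: Path) -> Path:
    """Compile one .py file and return the path of the resulting .pyc file."""
    # Passing cfile writes the .pyc next to the source instead of into __pycache__.
    target = source.with_suffix(".pyc")
    # doraise=True raises py_compile.PyCompileError on a syntax error
    # instead of only printing a message.
    py_compile.compile(str(source), cfile=str(target), doraise=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "hello.py"  # hypothetical script for illustration
    script.write_text("print('hello from bytecode')\n")
    pyc_path = compile_to_bytecode(script)
    pyc_exists = pyc_path.exists()
```

Note that a `.pyc` file is tied to the Python version that produced it and can be decompiled, so it should not be treated as a security measure on its own.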

Incorporating an Initial Training Sample into a Project

During a data science project, data is often provided incrementally. Some customer files are harder to obtain than others, for example when lengthy unarchiving processes are required. To ensure no time is wasted, the data already available can be put to use for initial analyses and model training as a Training Sample. The same data is later folded into the Training data when the full dataset is formally split into Training and Holdout sets.
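
One way to picture the final step is the sketch below, which keeps every record from the initial Training Sample in the Training set and draws the Holdout set only from the records that arrived later. It is an illustration under assumptions, not the post's method: the function name `split_with_initial_sample`, the 20% holdout fraction, and the use of integer record IDs are all hypothetical.

```python
import random


def split_with_initial_sample(all_ids, initial_sample_ids, holdout_fraction=0.2, seed=0):
    """Split record IDs into (training, holdout), keeping the initial sample in training."""
    all_ids = list(all_ids)
    initial = set(initial_sample_ids)

    # Only records that were NOT used for early model training may go to holdout,
    # so the holdout set stays unseen.
    remaining = [record_id for record_id in all_ids if record_id not in initial]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(remaining)

    # Size the holdout relative to the full dataset, capped by what is available.
    n_holdout = min(int(len(all_ids) * holdout_fraction), len(remaining))
    holdout = remaining[:n_holdout]
    training = sorted(initial) + remaining[n_holdout:]
    return training, holdout


# 100 records in total, of which the first 10 arrived early as the Training Sample.
training_ids, holdout_ids = split_with_initial_sample(range(100), range(10))
```

Drawing the holdout only from later records avoids evaluating a model on data it was already trained on during the early analyses.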

Syntax Comparisons Across Languages

Scripting languages are quite popular for getting work done effectively. But their similarities make it hard to remember each language's syntax and common idioms. This post serves as a cheatsheet describing fundamental differences in how the languages are used. Introduction: On any given day, I may program in five or six different languages. This is enjoyable when the syntax is different enough that there is no confusion. Domain Specific Languages, including R, SQL, Bash, and HTML, are orthogonal in their approach to being productive.

Processing Natural Language with Python and Friends

Python is a typical language chosen for Data Science work, and its strengths with strings make it especially useful for working with natural language. While the nltk library opened up this work for Python users, the newer spaCy improves processing speed by implementing its code in Cython. Tests display its power in production when compared with more traditional approaches, such as Stanford’s CoreNLP. This post outlines examples drawn from the spaCy coursework.

Historical Background of NLP with Deep Learning

The automated linguistic annotation of natural languages kept linguists and computer scientists hard at work for many decades. This work focused on the syntax of language as well as basic understanding, and includes part-of-speech tagging, named entity categorization, and syntactic dependency parsing. Language metadata tagging became much more accurate with the introduction of neural network and deep learning models. Because pairing metadata with more powerful models is sure to allow for an explosion of new applications, it is important to understand the developments that allowed for the creation of this technology.

Prototyping an Interactive Frontend with Vue

This post assumes you are comfortable with HTML, CSS, and JS. It covers creating SPAs with webpack and Vue basics; it does not cover mobile or advanced Vue. Improvements will make frontend development a more imperative programming approach as compared to the typical declarative methods found in more basic work. Introduction: Once you understand the basics of browser languages, such as HTML, CSS, and JS, you quickly want more interactive elements.
