Cluster analysis, or clustering, refers to the segmentation of items or observations into groups based on their similarity to one another. This is typically accomplished using an iterative algorithm that repeatedly assigns and reassigns items to groups in order to maximize similarity within each cluster while keeping the clusters distinct from one another. …
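The assign-and-reassign loop described above can be illustrated with k-means, one common clustering algorithm. This is only a minimal sketch: the excerpt does not say which algorithm the article uses, and the function name `k_means`, the sample points, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


# Hypothetical data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

In practice a library implementation with better initialization would usually be preferable to seeding with the first points, which is done here only to keep the sketch deterministic.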


In Part 1 of this article, we looked at some introductory topics in the domain of time series analysis. Topics covered in Part 1 included exploratory analysis, visualizations, seasonal decomposition, stationarity and ARIMA models. This included an evaluation of autoregression, moving averages, and differencing for development of time series forecasting models.

In Part 2, we will expand on what we covered in Part 1 by looking at some alternative time series forecasting methods, including:

  • Simple Exponential Smoothing
  • Triple Exponential Smoothing (Holt-Winters Method)
  • Long Short-Term Memory (LSTM) Neural Networks
  • Prophet
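As a rough illustration of the first method in the list, simple exponential smoothing updates a single level as a weighted average of the newest observation and the previous level. The sketch below is not taken from the article; the function name, the sample series, the value of `alpha`, and the choice to initialize the level with the first observation are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Assumption: initialize the level with the first observation.
    level = series[0]
    levels = [level]
    for y in series[1:]:
        # New level = weighted average of the new value and the old level.
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical series, for illustration only.
series = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(series, alpha=0.5)
forecast = levels[-1]  # the one-step-ahead forecast equals the last level
print(levels, forecast)
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one smooths more heavily.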

Code samples are included in this article, but…


Choosing an evaluation metric to assess model performance is an important element of the data analysis pipeline. An evaluation metric is an equation used to objectively assess a model’s performance, and selecting one carefully gives us a good idea of how closely the results produced by our model match real-world observations. We can then use the metric to determine whether model parameters should be improved or whether another modeling method should be considered.
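To make the idea of an evaluation metric concrete, here is a minimal sketch of two widely used regression metrics, mean absolute error (MAE) and root mean squared error (RMSE). The excerpt does not name the metrics the article covers, so this choice, the function names, and the sample values are assumptions.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses more."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observations and model outputs, for illustration only.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

Here RMSE comes out larger than MAE because squaring gives the single two-unit miss more weight, which is one example of how two metrics can rank the same model differently.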

While choosing an evaluation metric seems like a simple task, there are a number of different metrics available. Each metric has strengths and weaknesses depending on the…


Sentiment analysis refers to classification of a sample of text based on the sentiment or opinion it expresses. Whenever we write text, it contains some encoded information that conveys the attitude or feelings of the writer to the reader. This information is “encoded” in our understanding of how language is used as a tool of expression, and while this may seem like a very subjective concept, the goal of using machine learning and natural language processing to categorize a piece of text’s sentiment is to turn it into an objective task.

At its most basic level, sentiment analysis is used…


Project Description

Automated generation of coherent text is an area of Natural Language Processing (NLP) that has garnered a great deal of attention over the past several years. Several state-of-the-art language models have been developed that are capable of automatically generating text at a quality that approaches that of human-generated text. The possibilities for automated text generation are endless, and while many potential use cases are seemingly benign (e.g., automated summarization of long texts, generation of sporting event recaps, generation of text for entertainment purposes, etc.), …


Pitchfork is an online music review website that has been actively reviewing albums and individual songs since the mid-1990s. The site started as a platform for reviewing independent, lo-fi, and underground artists that typically did not receive attention from mainstream music review publications. Over time, it gradually expanded to include reviews of more mainstream releases as well as classic album reissues. It still maintains a reputation as both a tastemaker and a bastion of musical snobbery, although it now more closely resembles a traditional music review platform than it did in its early days.

The site gained popularity in…


This article provides a basic introduction to audio classification using deep learning. We will build a Convolutional Neural Network (CNN) that takes Mel spectrograms generated from the UrbanSound8K dataset as input and attempts to classify each audio file based on human annotations of the files. Code for this article can be found in this Git repository.

Audio Classification

Audio classification describes the process of using machine learning algorithms to analyze raw audio data and identify the type of audio data that is present. In most applications, this is done using annotated data with target classes selected by human listeners.

There is a…


An Introduction to Time Series Analysis and Forecasting Using Python

Time Series Analysis & Forecasting

Time series data refers to a set of observations collected at different points in time, often at a regular interval. Analysis of time series data is crucial in a wide array of industries, including finance, epidemiology, meteorology, social sciences and many others.

In this article, we will look at the following topics within the domain of time series analysis and forecasting:

  • Exploratory Analysis
  • Visualizations
  • Seasonal Decomposition
  • Stationarity
  • ARIMA Models

These topics provide a very basic introduction to time series analysis and forecasting. More advanced forecasting methods will be discussed in…


Project Overview

The use of machine learning and artificial intelligence for detection and prevention of crimes has increased dramatically over the past few decades. Law enforcement agencies have access to large volumes of crime data stretching back decades, and they are looking for ways that this data can be leveraged for prediction of crime patterns and types.

This project focuses on the use of historical crime data in San Francisco to predict the category of a crime event given only information about the event’s time and location. The project is based on the San Francisco Crime Classification Kaggle Competition, which concluded in…

Scott Duda

Water/Wastewater Engineer ♦ Data Nerd
