Data Analysis, Programming

Performing EDA on NY Taxi Fare Dataset to see PySpark in action — because cloud computing is the next big thing!

Photo by C Dustin on Unsplash

Introducing the Technologies

What is Spark and PySpark — Spark SQL and Spark MLlib?

According to Wikipedia, Apache Spark is an open-source analytics engine for large-scale data processing. It lets programmers work on data distributed across a cluster of machines, with implicit data parallelism and fault tolerance.

At the base of the Spark engine are Resilient Distributed Datasets (RDDs): collections of data items maintained across a cluster of machines in a fault-tolerant manner. RDDs were developed to overcome a limitation of the MapReduce paradigm, which forces a linear data flow on programs: read from disk, map, reduce, and write back to disk. …
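The linear map-then-reduce flow described above can be illustrated with a small word count. This is a minimal plain-Python sketch, not Spark or Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, everything runs in memory on one machine, and the disk reads and writes between stages are left out.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in grouped.items()}


def word_count(lines):
    """Run the stages strictly in sequence: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["taxi fare taxi", "fare"]))
```

Each stage must finish before the next one starts, which is the one-directional data flow the paragraph refers to. Any analysis that needs several passes over the data has to repeat the whole pipeline.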


A brief overview of the application I built, in which I have employed data analysis to power my FPL team up the charts

Fantasy Premier League: A Phenomenon

Readers who don’t follow football, or sports in general, may ask: what exactly is Fantasy Premier League (FPL)? Let’s start with what FPL is and how the game is played before diving into the data analysis code.

Photo by Jack Monach on Unsplash

Overview of the Game

According to Wikipedia, fantasy football (and fantasy sports in general) is a game in which participants assemble an imaginary team of real-life footballers (sportsmen) and score points based on those players’ actual statistical performance or their perceived contribution on the field of play. Usually, in a particular fantasy game, players are selected from one specific division in a particular country.


Everyone knows Linear Regression, but do you know Kernel Regression?

Every beginner in Machine Learning starts by studying what regression means and how the linear regression algorithm works. In fact, its ease of understanding, its explainability and its wide range of effective real-world use cases are what make the algorithm so famous. However, there are some situations to which linear regression is not suited. In this article, we will see what these situations are, what the kernel regression algorithm is and how it fits into the scenario. Finally, we will code the kernel regression algorithm with a Gaussian kernel from scratch. …
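As a rough idea of what kernel regression with a Gaussian kernel can look like, here is a minimal sketch. It assumes the Nadaraya-Watson form of kernel regression (a weighted average of the training targets) and one-dimensional inputs; the function names and the `bandwidth` parameter are hypothetical choices for illustration, not necessarily what the full article uses.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel: K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(x_train, y_train, x_query, bandwidth=1.0):
    """Predict y at x_query as a kernel-weighted average of y_train.

    Training points close to x_query (relative to the bandwidth) get
    large weights; distant points get weights near zero.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the query is too far from the data.
        raise ValueError("query point is too far from all training points")
    return sum(w * y for w, y in zip(weights, y_train)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.0, 0.8, 0.9, 0.1, -0.8]  # made-up, roughly sine-shaped targets
    print(kernel_regression(xs, ys, 1.5, bandwidth=0.5))
```

No coefficients are fitted here: each prediction is computed directly from the training data. A smaller bandwidth makes the estimate follow nearby points more closely, and a larger one smooths it out.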


The best way to dive into ML is to see it in action. Here it is!

Anyone who has studied Machine Learning (ML) can attest to being overwhelmed, at some point, by the sheer amount of math, equations and symbols that make up ML. As interesting as ML is, it is very hard for a beginner to digest all the concepts being thrown at them. I believe that to understand any ML algorithm, we have to first understand the principle or intuition behind it, then see it in action to understand how it works and then understand the math behind it (if you want to master the algorithm, you can go a step ahead and…

Kunj Mehta

22. Business Analyst at Quantiphi. Data Science and Machine Learning enthusiast. https://www.linkedin.com/in/kunjmehta
