Category: Machine Learning

Data Preprocessing in Python

In this blog post, we’re going to take a look at the common Data Preprocessing tools used in Python. Download the “Titanic” data to follow along.

Libraries we will use:

  • Pandas
  • NumPy
  • scikit-learn (imported as sklearn)
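
As a rough idea of what preprocessing with these three libraries can look like, here is a minimal sketch. It does not load the real Titanic file. Instead it builds a tiny stand-in table, and the column names (Age, Sex, Fare) and values are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for the Titanic data; column names and values are assumptions.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# Fill the missing age with the median of the observed ages.
df["Age"] = df["Age"].fillna(df["Age"].median())

# One-hot encode the categorical column into Sex_female / Sex_male.
df = pd.get_dummies(df, columns=["Sex"])

# Scale the numeric columns to zero mean and unit variance.
df[["Age", "Fare"]] = StandardScaler().fit_transform(df[["Age", "Fare"]])

print(df)
```

The same three steps (filling missing values, encoding categories, scaling numbers) should carry over to the real file once it is loaded with pd.read_csv.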
Continue reading

Malware Detection Through Intelligence

With attacks on both enterprise and community networks increasing exponentially, malware detection is a growing problem, especially on mobile platforms. Since the official app stores host millions of mobile apps, it is almost impossible to examine each of them manually for malicious behavior. Traditional approaches to malware detection rely on manual methods, such as examining the behavior and/or decompiled code of malware programs in order to design malware signatures by hand. However, these methods do not scale to a large number of applications, and new malware can be designed to evade existing signatures. For that reason, there has recently been a great deal of work on automatic malware detection using Machine Learning techniques.

Continue reading

Predicting House Prices Using Linear Regression along with GraphLab Create

In this post we are going to talk about Linear Regression, which is one of the most widely used statistical tools in Machine Learning. The idea is very simple: we have some features, and we want to know how our predictions change as we change the values of those features. Here the features are the square footage of the house, the number of bathrooms, the number of bedrooms, etc., and the observation is the price of the house.
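
To make the idea concrete, here is a minimal sketch that fits a linear model with NumPy's least-squares solver rather than GraphLab Create. The houses and the coefficients used to generate their prices are made up for illustration, so the fit simply recovers the numbers that were put in.

```python
import numpy as np

# Made-up houses: square footage and number of bedrooms (illustrative only).
sqft = np.array([1000, 1500, 2000, 2500, 1200], dtype=float)
bedrooms = np.array([2, 3, 3, 4, 2], dtype=float)

# Prices generated from an assumed rule so the expected fit is known:
# 150 per square foot + 10000 per bedroom + 20000 base price.
price = 150.0 * sqft + 10000.0 * bedrooms + 20000.0

# Design matrix: one column per feature plus a column of ones for the intercept.
X = np.column_stack([sqft, bedrooms, np.ones_like(sqft)])

# Least-squares fit of price = X @ coef.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# Predict the price of a new house with 1800 sq ft and 3 bedrooms.
predicted = np.array([1800.0, 3.0, 1.0]) @ coef
print(coef, predicted)
```

Each fitted coefficient says how much the predicted price changes when that feature increases by one unit while the others stay fixed.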

Continue reading

Getting Started with GraphLab and SFrames

GraphLab is a Python library that offers many features out of the box. It is a great library for learning the foundations of Machine Learning. Many courses out there teach several algorithms with a bunch of tools and non-real-world examples. However, if you are new to Machine Learning, GraphLab (powered by Dato) is a great library to start with.

Continue reading

Stanford University Machine Learning Course – Problem on Sending Assignments with Octave

Note: In January 2017, the patches were built into the programming exercise scripts, so the patch files are no longer needed and are no longer available.

Since Octave 4.2.1 is now being distributed, there is no reason to maintain support for Octave 4.0.0. It has been obsolete for several years. Students should at the very least install Octave 4.0.1 or newer.

If you are using Octave 4.0.0, you will most likely get an error when you submit your homework. The error will probably look like this:

Submission failed: unexpected error: urlread: HTTP response code said error

The solution is easy, so there is no need to worry. Click this link and download the patch.

Extract the contents of the patch. The folder structure is like ML_Octave_400_patch -> lib.

Copy the files under ML_Octave_400_patch -> lib, and paste them into machine-learning-ex1 -> ex1 -> lib.

Make sure you overwrite only makeValidFieldName.m, xxNumToHexStr.m and jsonlab -> loadjson.m.
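
If you would rather script the copy step than do it by hand, the following is a minimal sketch in Python. The function name apply_patch and the folder paths in the usage comment are assumptions for illustration. Only the three file names come from the steps above.

```python
import shutil
from pathlib import Path

# The only files the patch should overwrite (taken from the steps above).
PATCHED_FILES = [
    "makeValidFieldName.m",
    "xxNumToHexStr.m",
    "jsonlab/loadjson.m",
]


def apply_patch(patch_lib, exercise_lib):
    """Copy only the three patched files from patch_lib into exercise_lib."""
    patch_lib, exercise_lib = Path(patch_lib), Path(exercise_lib)
    copied = []
    for rel in PATCHED_FILES:
        src, dst = patch_lib / rel, exercise_lib / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # overwrites the existing file, if any
        copied.append(rel)
    return copied


# Hypothetical usage; adjust the paths to where you extracted things:
# apply_patch("ML_Octave_400_patch/lib", "machine-learning-ex1/ex1/lib")
```

Any other file in the patch folder is ignored, which matches the instruction to leave the rest of the lib folder untouched.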

The rest of the files stay as they were downloaded. Now type exit in the terminal to quit Octave. Start Octave again and try to submit your work. If you did not miss any steps, the submission should go through.

Octave Plotting Problems on Mac OS

Here is the command that I tried to plot in Octave:

w = -6 - sqrt(10) * (rand(1,10000))
hist(w)

and this was the error that I got:

error: __init_gnuplot__: the gnuplot program is not available, see 'gnuplot_binary'
error: called from
    graphics_toolkit at line 85 column 5
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: failed to load gnuplot graphics toolkit
error: called from
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: base_graphics_toolkit::initialize: invalid graphics toolkit
error: called from
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: evaluating argument list element number 1
error: called from
    gca at line 56 column 7
    hist at line 174 column 11

Then I created a .octaverc file in my home directory with the following content:

setenv GNUTERM x11

After that, I reran the hist(w) command in Octave, and this time it printed the following error:

set terminal aqua enhanced title "Figure 2" size 560 420  font "*,6.66667" dashlength 1
                      ^
line 0: unknown or ambiguous terminal type; type just 'set terminal' for a list

Finally, I reinstalled the gnuplot package using Homebrew with the following commands:

brew uninstall gnuplot
brew install gnuplot --with-aquaterm

After that, restart Octave and try to plot your function again. It should work fine.
If you run into any problems, please comment below.

Data Mining

What is data mining?

Data mining is the process of analyzing data from different perspectives in order to simplify a massive dataset and extract a useful summary from it. (Discovery of models for data.)

What is a data model?

Approaches to data modeling:

  1. Summarizing the data.
  2. Extracting the most prominent features of the data and ignoring the rest.

Summarization

PageRank by Google

One of the best examples of summarization is the PageRank idea by Google. This is a form of web mining.

With that idea, the entire structure of the web is summarized into a single number (0-10) per page. The PageRank number expresses the importance of a webpage: the higher the number, the more likely it is that typical searchers would want that page returned as an answer to their search query.
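
To show how link structure can be boiled down to one score per page, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production system. The four-page graph, the damping factor of 0.85 and the iteration count are assumptions for illustration, and the raw scores sum to 1 instead of being mapped onto the 0-10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a score per page for a dict mapping page -> list of linked pages."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page gets a small base score from the "random jump" term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while page D, which nobody links to, ends up with the lowest.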

To be continued...

Linear Regression with One Variable

Remember that in “regression problems” we take input variables and try to map them onto a “continuous” output function.

Linear regression with one variable is also called “univariate linear regression”. That is just a fancier name for it.

Linear regression with one variable is used when you want to predict a single output value from a single input value. That means you have only one x as input (attribute) and one y as output.
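
As a small illustration, here is a sketch that fits a line y = theta0 + theta1 * x to a handful of points using the closed-form least-squares formulas. The names fit_line, theta0 and theta1 and the sample points are assumptions for illustration. The points were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (theta0, theta1) for the least-squares line y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    theta1 = covariance / variance
    # Intercept: the fitted line passes through the point of means.
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


# Sample points on the line y = 2x + 1 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit_line(xs, ys)
prediction = theta0 + theta1 * 5.0  # predict y for a new input x = 5
print(theta0, theta1, prediction)
```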

Continue reading

Getting Started with Machine Learning

What is Machine Learning?

As an informal definition, according to Arthur Samuel, machine learning is:

The field of study that gives computers the ability to learn without being explicitly programmed.

As a more modern definition, Tom Mitchell says:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Let’s explain it with the example of playing checkers:

Continue reading