Author: mburakergenc

Data Preprocessing in Python

In this blog post, we’re going to take a look at the common Data Preprocessing tools used in Python. Download the “Titanic” data to follow along.

Libraries we will use:

  • Pandas
  • NumPy
  • scikit-learn (sklearn)

Malware Detection Through Intelligence

With attacks on both enterprise and community networks increasing exponentially, Malware Detection is a growing problem, especially on mobile platforms. Since the official app stores host millions of mobile apps, it is almost impossible to examine each of them manually for malicious behavior. Traditional approaches to malware detection rely on manual methods, such as examining the behavior and/or decompiled code of malware programs in order to design malware signatures by hand. However, these methods do not scale to a large number of applications, and new malware can be designed to evade existing signatures. For that reason, there has recently been a lot of work on automatic malware detection using Machine Learning techniques.


Predicting House Prices Using Linear Regression along with GraphLab Create

In this post we are going to talk about Linear Regression, which is one of the most widely used statistical tools in Machine Learning. The idea is very simple. We have some features, and we want to know how our predictions change as we change the values of those features. Here the features are the square footage of the house, the number of bathrooms, the number of bedrooms, and so on, and the observation is the price of the house.
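
To make the idea concrete, here is a minimal sketch in plain Python rather than GraphLab Create. It assumes a single feature (square footage) and uses the closed-form least-squares fit. The function names and the data are made up for illustration and do not come from any real housing dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predict the observation (price) for a feature value x (square footage)."""
    return intercept + slope * x


# Hypothetical, made-up data: square footage and sale price.
sqft = [1000, 1500, 2000, 2500]
price = [200000, 300000, 400000, 500000]

intercept, slope = fit_simple_linear_regression(sqft, price)
print("price change per extra square foot:", slope)
print("predicted price for 1800 sq ft:", predict(intercept, slope, 1800))
```

The slope answers the question from the paragraph above: it tells us how much the predicted price changes when the square footage changes by one unit. With more features (bathrooms, bedrooms), the same idea applies, but each feature gets its own coefficient.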


Getting Started with GraphLab and SFrames

GraphLab is a Python library that offers many features out of the box. It is a great library for learning the foundations of Machine Learning. Many courses teach several algorithms with a bunch of different tools and non-real-world examples. However, if you are new to Machine Learning, GraphLab (powered by Dato) is a great library to start with.


The Practicing Mind – Book Summary

The Book in a few sentences

In school, while learning to program, learning to drive or giving a speech, we learned the importance of paying attention from a very young age. Nowadays, however, we are more focused on multitasking than on what we are actually doing. We work more on the result than on the process of achieving that result. Doing what we should be doing, and being completely aware of what we are doing, leads us to a level of performance that feels complete. Focusing on a task and actively practicing it is very different from, and more beneficial than, passively learning.


Introduction to Artificial Intelligence

Artificial Intelligence, or AI, is one of the newest fields in science and engineering. A student in chemistry, physics or biology may feel that all the good ideas have already been taken by Galileo, Einstein or other famous scholars. Artificial Intelligence, on the other hand, is still very open to new Einsteins and Galileos.


Stanford University Machine Learning Course – Problem on Sending Assignments with Octave

Note: In January 2017, the patches were built into the programming exercise scripts, so the patch files are no longer needed and are no longer available.

Since Octave 4.2.1 is now being distributed, there is no reason to maintain support for Octave 4.0.0. It has been obsolete for several years. Students should at the very least install Octave 4.0.1 or newer.

If you are using Octave 4.0.0, you will most likely run into an error when you submit your homework. The error will probably look like this:

Submission failed: unexpected error: urlread: HTTP response code said error

The solution is very easy, so you don’t need to worry about it. Click this link and download the patch.

Extract the contents of the patch. The folder structure is ML_Octave_400_patch -> lib.

Copy the files under ML_Octave_400_patch -> lib and paste them into machine-learning-ex1 -> ex1 -> lib.

Make sure you only overwrite makeValidFieldName.m, xxNumToHexStr.m and jsonlab -> loadjson.m.

The rest of the files stay as they were downloaded. Now type exit in the terminal to quit Octave. Start Octave again and try to submit your work. If you didn’t miss any steps, the solution should work.

Octave Plotting Problems on Mac OS

Here is the command that I tried to plot in Octave:

w = -6 - sqrt(10) * (rand(1,10000))
hist(w)

and this was the error that I got:

error: __init_gnuplot__: the gnuplot program is not available, see 'gnuplot_binary'
error: called from
    graphics_toolkit at line 85 column 5
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: failed to load gnuplot graphics toolkit
error: called from
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: base_graphics_toolkit::initialize: invalid graphics toolkit
error: called from
    figure at line 86 column 7
    gcf at line 63 column 9
    gca at line 56 column 7
    hist at line 174 column 11
error: evaluating argument list element number 1
error: called from
    gca at line 56 column 7
    hist at line 174 column 11

Then I created a .octaverc file in my home directory with the following content:

setenv GNUTERM x11

After that, I reran the hist(w) command in Octave, and this time it printed the following error:

set terminal aqua enhanced title "Figure 2" size 560 420  font "*,6.66667" dashlength 1
                      ^
line 0: unknown or ambiguous terminal type; type just 'set terminal' for a list

Finally, I reinstalled the gnuplot package using Homebrew with the following commands:

brew uninstall gnuplot
brew install gnuplot --with-aquaterm

After that, restart Octave and try to plot your function again. It should work fine.
If you face any problems, please comment below.

Why Should You Learn to Program?

Writing programs (also known as programming) is a very rewarding and fun way of writing poetry. Well, maybe not writing poetry, but it is a fun and rewarding activity. 🙂

There are many reasons for writing programs. You can make a living from it, you can solve a difficult data analysis problem just for fun, or, once you have learned enough, you can help others solve difficult problems. We are pretty much surrounded by computers in our daily lives. Most of us use laptops, desktops or cellphones every day. We can think of those devices as personal assistants, because they can take care of many things for us with just a few clicks.

Operating systems and sets of applications turn the hardware into a Personal Digital Assistant. These operating systems and applications are added to the hardware by programmers.

The only thing we need to do is learn how to talk to those assistants. We need to know what language they speak. After learning how to talk to these devices, we can tell a computer to do tasks on our behalf.

Data Mining

What is data mining?

Data mining is the process of analyzing data from different perspectives in order to simplify a massive dataset and extract a useful summary from it. (Discovery of models for data.)

What is a data model?

Steps for approaching data modeling:

  1. Summarizing the data.
  2. Extracting the most prominent features of the data and ignoring the rest.

Summarization

PageRank by Google

One of the best examples of summarization is the PageRank idea by Google. This is a form of web mining.

With that idea, the entire structure of the web is summarized into one number per page (0-10). The PageRank number determines the importance of a webpage: the higher the number a webpage has, the higher the chance that typical searchers would like that page returned as an answer to their search query.
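
To see how a link structure can be boiled down to one score per page, here is a minimal sketch of the commonly described power-iteration form of PageRank in plain Python. It is an illustration, not Google's actual implementation. The tiny four-page "web", the damping factor of 0.85 and the number of iterations are assumptions chosen for the example, and the scores are raw values that sum to 1 rather than being rescaled to 0-10.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its damped score evenly among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and D all link to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy example, page C is linked to by three other pages and ends up with the highest score, while page D, which nobody links to, ends up with the lowest.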

To be continued...