Moving From Python to R

It's been three months since I started with machine learning. Coming from a web development background, my first choice of tool for data analysis was Python, mainly because I was already familiar with the language. But soon enough I started to realize it's not an optimal environment for data analytics. Because my programming background heavily influenced my workflow, I kept shuffling across multiple tools: the terminal for running scripts, Sublime Text for writing Python code and matplotlib chart viewers for visualizing the data. I wanted a better workflow for working with data, one that would let me easily try out different visualizations, navigate through data without writing much code and keep a history of the operations I performed.

Since I knew R is one of the most widely used programming languages for data science, I decided to give it a try. I downloaded the R distribution from the website and started by executing some commands in the terminal. Initial impressions were good. However, I could not find a more efficient way to explore the data than before. Then I came across RStudio. RStudio is awesome. It allowed me to pin down all points of interest within a single window. It shows me plots, help for R functions, a data explorer and a simple editor for writing code. I loved the way I could execute commands in the R console and see the output immediately on the plots tab. This allowed me to keep iterating quickly. Later, I started learning more about the language, and the built-in help for R functions has been the most useful feature for me. I’m discovering more and more as I learn about the language and its ecosystem of packages. But as of now, R is definitely the way to go for my machine learning explorations.


Getting Started With Machine Learning

Machine learning is without a doubt the most interesting field I have come across so far. However, I found it extremely difficult to get started with the basic concepts of machine learning without spending hours and hours scrolling through the World Wide Web. Coming from a web development background, I found it hard to get started with statistics, probability and other concepts related to machine learning. In this post, I will try to list the resources that helped me and might help someone else in a similar situation.

Courses

  • Andrew Ng’s Coursera course - Coursera
    If you haven’t heard of this course yet, just go ahead and take it.
    This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition.

  • Andrew Ng’s Stanford course CS 229 - YouTube
    This is a much more detailed version of the Coursera course.
    This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.

  • Machine learning course CS 156 by Yaser Abu-Mostafa - YouTube
    This course helps clear up many basic concepts. Also, I liked the presentation by Yaser Abu-Mostafa!
    This is an introductory course by Caltech Professor Yaser Abu-Mostafa on machine learning that covers the basic theory, algorithms, and applications.

  • Probabilistic graphical models - Coursera
    A somewhat advanced course, but worth taking.
    In this class, you will learn the basics of the PGM representation and how to construct them, using both human knowledge and machine learning techniques.

  • In-depth introduction to machine learning - R-bloggers
    Awesome introduction. This lecture series is provided by Stanford professors Dr. Hastie and Dr. Tibshirani.

  • Mathematical Biostatistics Boot Camp 1 - Coursera
    This class helps refresh the basics of mathematics.
    Topics include probability, random variables, distributions, expectations, variances, independence, conditional probabilities, likelihood and some basic inferences based on confidence intervals.

Web resources

Language

Even though language is more of a personal choice, machine learning algorithms are not equally easy to implement in every language out there. Thus, some languages are favoured by the ML community. The primary languages used, as per Kaggle and ML Mastery, are R, MATLAB and Python. I personally prefer Python, because it fits nicely into a web stack. But R and MATLAB are surely great tools as well.

For Python lovers, SciPy is the place to go. In case you’re new to Python, I recommend learning the language first.

For learning R, there are loads of videos on YouTube. As a beginner in R, it is hard to find one that works for you. I found R Programming Tutorials by MarinStatsLectures quite helpful.

Thank you for reading and I hope this has been an informative post.


Passion and Pragmatism

In the world of startups, passion is considered the most important quality of a founder. For a long time I wondered why passion is so important. Why can’t someone just be good at what they do and figure out the rest? I thought the most important thing was to earn money. Coming from India, where most of the business is software services, I saw that making money essentially meant putting more people on the project. Where does passion come into this situation? Can you really be passionate about increasing the team size? About hiring more people without understanding what their roles and functions are? That did not make much sense to me. So I dug deeper. If companies in India are founded for the sole purpose of making money, why on earth would anyone care whether you are passionate about it or not? After coming to Silicon Valley I noticed many people talking their jaws off about passion. Any random act of writing code is coupled with passion; even if a programmer writes a program that adds two integers and nothing else, he must have done it with passion. Now this situation was even more mysterious and confusing to me.

I could not see any reasoning in putting passion first for literally everything. I thought of it as another Silicon Valley idiosyncrasy. Afterwards, I got the opportunity to meet folks who do startups on a regular basis. Without a doubt, these guys were extremely passionate about their products. Every one of them was so convinced that their product was going to change the world that they forgot the often-quoted statistic that 90% of companies in Silicon Valley fail. Things got even more interesting at this point. Everyone is passionate, everyone loves their product, so why can’t people build great companies? If everyone here loves their product so much, why can’t everyone succeed? Isn’t passion the most important thing? I tried to understand this reasoning for a while. One possible answer is that if the company failed, the founders were not passionate enough. This sounds like pure bullshit to me. I feel passion is binary: either you are passionate about something or you are not. End of story.

Bewildered by this situation, I started thinking that there must be some parameter I was missing. Continuing my search on how to build great companies, I came across the concept of the lean startup. The principles that lean startup focuses on reflect one simple idea: practicality. This struck me as the most important aspect of day-to-day life. Pragmatism is probably the most desirable skill; one has to be practical in both personal and professional life. Similarly, in startups it is important to focus on what is right more than on what you’re passionate about, and practicality is a tool to get there. It is extremely common to be passionate about the wrong idea, and lean startup methodologies teach us how to determine, in a practical way, the vital components of starting a company and shaping ideas. However, pragmatism alone is not sufficient. Pragmatism cannot be the fuel that drives a founder’s vision. It is extremely hard to start and build a company from an idea, and passion is the thing that will keep you going. Coupled with passion, pragmatism can make a huge impact.


Adapting to Changes

Change is an inevitable part of life. No matter what you do, the situations around you are destined to change. Change can be perceived as good or bad depending upon the situation. I love this quote; it neatly captures how short-lived your plans can be once the situation changes.

Environment eats your plans for breakfast

Once you realize that change is inevitable, you should start looking for ways to best adapt to it. Based on my personal experience, I follow these guidelines when dealing with change.

Change is neutral, no hard feelings
Attaching sentiment to a change gets in the way of evaluating it. I personally believe categorizing change is okay, but letting it affect your behavior is bad.

Change is inevitable, plan for it
If you think hard and boil down your plans, they will always come down to two alternatives: desirable and not desirable. Change might force you to switch tracks, so be prepared for it.

Focus on the next thing; what had to happen has happened
Dwelling too much on the past often distorts the ability to think straight. An individual has to be practical while perceiving changes in life. Life is only going to move forward, so it is better to focus on what comes next.

This set of guidelines is bound to evolve with time. If there are any other factors you can imagine, suggestions are most welcome :)


Herds on a Mountain Slope

Humans are competitive creatures; we always like to move forward in life. Comparing your progress with that of others is one way to keep the competitive spirit alive; the other is to compare your progress with your own past. This self-comparison is usually made with respect to time and position. Over the years, I have observed that each year I think I have improved slightly compared to the last. This happens almost always. If I extrapolate this thought forward, I realize that even though I am better than last year, I am still preparing myself for the next one. In other words, I am better than I was last year but probably not as good as I will be next year.

Another observation I made is that everyone has role models, and these role models have role models of their own. This chain keeps on going; it may or may not be circular, but it keeps on going. Simply put, at each point in time, almost every determined person has a place where they want to be and a place where they currently are. Considering the betterment of a being as upward progress, we can imagine the entire human community split into herds standing on a mountain slope. The herds above your position are where you want to move eventually, and the herds below you are where you have been.