This repository was archived by the owner on Sep 12, 2018. It is now read-only.
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}{\f1\fnil\fcharset2 Symbol;}}
{\colortbl ;\red0\green0\blue255;}
{\*\generator Riched20 10.0.16299}\viewkind4\uc1
\pard\sa200\sl276\slmult1\qc\b\f0\fs36\lang9 Introduction to Machine Learning using Python\par
\pard\sa200\sl276\slmult1\qj\b0\fs24 As the title suggests, this article is aimed at new developers who want to be part of the data science revolution but have minimal knowledge of machine learning and Python.\par
\pard\sl240\slmult1\qj What is Machine Learning?\par
Machine learning is a field of computer science that often uses statistical techniques to give computers the ability to "learn" from data without being explicitly programmed. It is an application of Artificial Intelligence (AI). In practice, this means we feed data into an algorithm and use the resulting model to make predictions about what might happen in the future.\par
The name '\i machine learning\i0 ' was coined in 1959 by Arthur Samuel.\par
\par
In 1997, Tom Mitchell gave a \ldblquote well-posed\rdblquote  definition that has proven more useful to engineering types: \ldblquote A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.\rdblquote \par
So if you want your program to predict, for example, traffic patterns at a busy intersection (task T), you can run it through a machine learning algorithm with data about past traffic patterns (experience E) and, if it has successfully \ldblquote learned\rdblquote , it will then do better at predicting future traffic patterns (performance measure P).\par
\par
Among the different types of ML tasks, a crucial distinction is drawn between supervised and unsupervised learning:\par
Supervised machine learning: The program is \ldblquote trained\rdblquote  on a pre-defined set of \ldblquote training examples\rdblquote , which then help it reach an accurate conclusion when given new data.\par
Unsupervised machine learning: The program is given a bunch of data and must find patterns and relationships therein.\par
\par
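The distinction between supervised and unsupervised learning can be sketched on a toy dataset. This is only an illustration under stated assumptions: the data points are randomly generated, and LogisticRegression and KMeans from scikit-learn are simply convenient stand-ins for "a supervised model" and "an unsupervised model", not methods prescribed by this article.

```python
# Supervised vs. unsupervised learning on the same toy data (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated groups of 2-D points; the values are made up.
rng = np.random.RandomState(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised model

# Supervised: the model is trained on examples together with their labels.
classifier = LogisticRegression()
classifier.fit(X, y)
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
predicted_labels = classifier.predict(new_points)
print("Supervised predictions:", predicted_labels)

# Unsupervised: the model sees only X and has to find the grouping itself.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)
print("Cluster sizes:", np.bincount(cluster_ids))
```

Note that the cluster numbers found by KMeans are arbitrary identifiers; nothing ties cluster 0 to label 0, because the unsupervised model never saw the labels.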
\pard\sl276\slmult1\qj Machine learning has a vast range of applications across domains such as:\par
\pard\sl240\slmult1\qj\tab Healthcare (\i e.g., \i0 personalised treatments and medications, drug manufacturing)\par
\pard\sl240\slmult1\qj\tab Finance (\i e.g., \i0 fraud detection)\par
\tab Retail (\i e.g., \i0 product recommendations, improved customer service)\par
\tab Travel (\i e.g., \i0 dynamic pricing, such as how Uber determines the price of your ride, and sentiment analysis, such as TripAdvisor collecting the photos and reviews that travellers share on social media and using them to improve its service)\par
\tab Media (\i e.g., \i0 Facebook: from personalizing the news feed to serving targeted ads, machine learning is at the heart of social media platforms, for their own benefit and that of their users)\par
\par
Unlike R, Python is a general-purpose language and platform that you can use both for research and development and for building production systems. Choosing among its many libraries and modules can feel overwhelming at first.\par
\par
So, let's walk through the step-by-step procedure a beginner can follow to start machine learning using Python.\par
\par
\pard{\pntext\f1\'B7\tab}{\*\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\fi-360\li720\sl240\slmult1\qj Our first step shall be to learn Python.\par
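\pard\li710\sl240\slmult1\qj Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum, who began working on it in the late 1980s and first released it in 1991. Python source code is available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).\par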
\par
You can use the following resources to build your Python skills:\par
{{\field{\*\fldinst{HYPERLINK https://developers.google.com/edu/python/ }}{\fldrslt{https://developers.google.com/edu/python/\ul0\cf0}}}}\f0\fs24\par
{{\field{\*\fldinst{HYPERLINK https://www.youtube.com/playlist?list=PLfZeRfzhgQzTMgwFVezQbnpc1ck0I6CQl }}{\fldrslt{https://www.youtube.com/playlist?list=PLfZeRfzhgQzTMgwFVezQbnpc1ck0I6CQl\ul0\cf0}}}}\f0\fs24\par
\par
Python has an amazing ecosystem of libraries that make machine learning easy to get started with. It is one of the most popular and in-demand languages in the job market today, which is why plenty of learning resources are available online and learners should find little difficulty getting started.\par
\pard\sl240\slmult1\qj\par
\pard{\pntext\f1\'B7\tab}{\*\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\fi-360\li720\sl240\slmult1\qj The next step is installing Anaconda from the given link, {{\field{\*\fldinst{HYPERLINK https://docs.anaconda.com/anaconda/install/ }}{\fldrslt{https://docs.anaconda.com/anaconda/install/\ul0\cf0}}}}\f0\fs24 . \par
\pard\sl240\slmult1\qj\tab Follow the installation instructions given on the site. The Anaconda distribution bundles the packages required to explore machine learning.\par
\par
\pard{\pntext\f1\'B7\tab}{\*\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\fi-360\li720\sl240\slmult1\qj Next, learn the basic machine learning skills.\par
\pard\sl240\slmult1\qj\tab If you want an overall idea of machine learning from scratch, you might want to follow this crash course by Google:\par
\tab {{\field{\*\fldinst{HYPERLINK https://developers.google.com/machine-learning/crash-course/?utm_source=keyword-blog&utm_medium=referral&utm_campaign=ica-practicum&utm_term=&utm_content=mlcc }}{\fldrslt{https://developers.google.com/machine-learning/crash-course/?utm_source=keyword-blog&utm_medium=referral&utm_campaign=ica-practicum&utm_term=&utm_content=mlcc\ul0\cf0}}}}\f0\fs24\par
\par
\tab Andrew Ng's Machine Learning course is also a great option for learners: {{\field{\*\fldinst{HYPERLINK https://www.coursera.org/learn/machine-learning }}{\fldrslt{https://www.coursera.org/learn/machine-learning\ul0\cf0}}}}\f0\fs24\par
\pard\ri-1800\sl240\slmult1\qj\par
\pard{\pntext\f1\'B7\tab}{\*\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\fi-360\li720\ri-1800\sl240\slmult1\qj Once we are comfortable with Python and Machine Learning, we shall shift to Python libraries.\b\par
\pard\ri-1800\sl240\slmult1\qj\b0\par
\pard
{\pntext\f0 a.\tab}{\*\pn\pnlvlbody\pnf0\pnindent0\pnstart1\pnlcltr{\pntxta.}}
\fi-568\li1278\ri-1800\sl240\slmult1\qj\tx426\b Pandas\b0 : \line Our first step is to read in the data and compute some quick, relevant summary statistics, for which we shall use the Pandas library. Pandas provides data structures and data analysis tools that make manipulating data in Python much quicker and more effective.\line We'll read our data from a CSV file into a Pandas DataFrame using the \b\i read_csv\b0\i0  function.\line\par
{\pntext\f0 b.\tab}\b NumPy: \line\b0 The most common data structure in Pandas is the DataFrame, which can be thought of as an extension of a matrix.\line A matrix is a two-dimensional data structure with rows and columns. Matrices in Python can be used via the \b NumPy\b0  library. With a matrix, we can't easily access columns and rows by name, and every element has to have the same data type. Hence we use DataFrames, which can hold a different data type in each column and have many built-in features for analyzing data.\line\par
{\pntext\f0 c.\tab}\b Matplotlib: \b0\line Matplotlib is the main plotting infrastructure in Python, and many other plotting tools, like \i seaborn\i0  and the plotting methods built into Pandas, are built on top of it. We import Matplotlib's plotting functions with \b\i import matplotlib.pyplot as plt\b0\i0 . We can then draw and show plots.\b\line\par
\pard
{\pntext\f0 d.\tab}{\*\pn\pnlvlbody\pnf0\pnindent0\pnstart1\pnlcltr{\pntxta.}}
\fi-568\li1278\ri-1800\sl240\slmult1\qj\tx426\tx2556 Scikit-learn : \b0\par
\pard\li1278\ri-1800\sl240\slmult1\qj\tx426\tx2556 The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. The SciPy stack includes:\par
\pard\li1278\ri-1800\sl240\slmult1\qj\tx426\tx1278\tx2556 NumPy: Base n-dimensional array package\par
SciPy: Fundamental library for scientific computing\par
Matplotlib: Comprehensive 2D/3D plotting\par
IPython: Enhanced interactive console\par
SymPy: Symbolic mathematics\par
Pandas: Data structures and analysis\par
Extensions or modules for SciPy are conventionally named SciKits. As such, the module that provides learning algorithms is named scikit-learn.\par
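The difference between a NumPy matrix and a Pandas DataFrame can be sketched in a few lines. The column names and values here are hypothetical and chosen only for illustration. With a real CSV file you would call pd.read_csv with the file's path instead of building the frame by hand.

```python
import numpy as np
import pandas as pd

# A NumPy matrix: two dimensions, a single data type, access by position only.
matrix = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]])
first_column = matrix[:, 0]
print(matrix.dtype, first_column)

# A DataFrame: named columns, and each column may have its own data type.
# The column names and values are hypothetical.
frame = pd.DataFrame(matrix, columns=["length", "width"])
frame["species"] = ["a", "a", "b"]
print(frame.dtypes)
print(frame["length"].mean())

# Quick summary statistics for the numeric columns.
summary = frame.describe()
print(summary)
```

Accessing a column by name, as in frame["length"], is what the plain matrix cannot offer; there the same column is only reachable by its position.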
\pard\ri-1800\sl240\slmult1\qj\tx426\b\par
\b0 Now that you have a grip on the basics of Python, its libraries, and machine learning algorithms, it's best to start with a small end-to-end project. Here are the steps for such a project:\par
\pard
{\pntext\f0 I.\tab}{\*\pn\pnlvlbody\pnf0\pnindent0\pnstart1\pnucrm{\pntxta.}}
\fi-360\li720\ri-1800\sl240\slmult1\qj\tx426 Define a Problem\par
{\pntext\f0 II.\tab}Prepare the Data\par
{\pntext\f0 III.\tab}Evaluate the Algorithms\par
{\pntext\f0 IV.\tab}Improve the Results\b\par
{\pntext\f0 V.\tab}\b0 Present the Results\b\par
\pard\ri-1800\sl240\slmult1\qj\tx426\par
\b0 To start with machine learning using Python, once Anaconda is installed as described above, first check the version of Python you are using, then:\par
\par
\pard
{\pntext\f0 1.\tab}{\*\pn\pnlvlbody\pnf0\pnindent0\pnstart1\pndec{\pntxta.}}
\fi-360\li720\ri-1800\sl240\slmult1\qj\tx426\b Import the libraries, \b0 such as sklearn, pandas, matplotlib, scipy, and numpy.\line\par
{\pntext\f0 2.\tab}\b Load the dataset \b0 : We load the data using \i pandas\i0 .\line\par
{\pntext\f0 3.\tab}\b Summarize the dataset\b0 : This includes:\line Dimensions of the dataset: \i print(dataset.shape)\i0\line A peek at the first rows: \i print(dataset.head(20))\i0\line Statistical summary: \i print(dataset.describe())\i0\line\par
{\pntext\f0 4.\tab}\b\i0 Visualization of the dataset: \b0 Data visualization uses two kinds of plots: \i univariate\i0  and \i multivariate\i0 .\line Univariate plots are used to understand each attribute better. Here we can create box plots and histograms.\line Multivariate plots are used to understand the relationships between attributes. Here a scatter plot can show the correlation between attributes.\line\par
{\pntext\f0 5.\tab}\b Evaluation of some algorithms:\b0\line First, separate out a validation set from the dataset, say 20% of it, which the algorithms won't be able to see or access during training.\line Next, split the remaining data into two parts, training (80%) and test (20%). Now set a scoring metric on which the models are to be evaluated, say accuracy.\line Accuracy is the number of correctly predicted instances divided by the total number of instances in the dataset, multiplied by 100 to give a percentage (\i e.g., \i0 95% accuracy).\line With everything set up, we build the models.\line To get good accuracy, we fit several different models on the training data and measure the accuracy of each on the test data. The model with the highest accuracy is then considered the best fit for the given problem.\line\par
{\pntext\f0 6.\tab}\b Make Predictions: \b0 After choosing the best model, we want an idea of how it performs on the validation set. We run the best model directly on the validation set and summarize the results as a final accuracy score. It's always good practice to keep a validation set, as it helps us find out whether the model has overfitted the training data and is giving overly optimistic results.\par
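The six steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions that are not part of this article: it uses the iris dataset bundled with scikit-learn (so no file needs to be downloaded), three arbitrarily chosen classifiers, and 5-fold cross-validation on the training portion in place of a single training/test split. The plotting step is left out to keep the sketch short.

```python
import sys

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

print(sys.version)  # check the Python version first

# Steps 2 and 3: load a small bundled dataset as a DataFrame and summarize it.
dataset = load_iris(as_frame=True).frame
print(dataset.shape)
print(dataset.head(20))
print(dataset.describe())

X = dataset.drop(columns=["target"])
y = dataset["target"]

# Step 5: hold back 20% as a validation set the models never see in training.
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# Compare candidate models by mean cross-validated accuracy on the training part.
candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("k-nearest neighbours", KNeighborsClassifier()),
    ("decision tree", DecisionTreeClassifier(random_state=7)),
]
scores = []
for name, model in candidates:
    accuracy = cross_val_score(
        model, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
    scores.append((accuracy, name, model))
    print("%s: %.1f%% accuracy" % (name, 100 * accuracy))

# Step 6: refit the most accurate model and score it once on the validation set.
best_accuracy, best_name, best_model = max(scores, key=lambda item: item[0])
best_model.fit(X_train, y_train)
validation_accuracy = best_model.score(X_validation, y_validation)
print("best: %s, validation accuracy: %.1f%%" % (best_name, 100 * validation_accuracy))
```

If the validation accuracy comes out much lower than the cross-validated accuracy, that gap is the sign of overfitting that the validation set is kept to reveal.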
\pard\ri-1800\sl240\slmult1\qj\tx426\par
\par
}