Course Content

Python for Data Science

Module 1: Introduction to Python

  • Overview of Python
  • Companies using Python
  • Different Applications where Python is used
  • Python Scripts on UNIX/Windows
  • Values, Types, Variables
  • Operands and Expressions
  • Conditional Statements
  • Loops
  • Command Line Arguments
  • Writing to the screen
  • Hands On/Demo:
    • Creating “Hello World” code
    • Variables
    • Demonstrating Conditional Statements
    • Demonstrating Loops
  • Skills:
    • Fundamentals of Python programming

Module 2: Sequences and File Operations

  • Python File I/O Functions
  • Numbers
  • Strings and related operations
  • Tuples and related operations
  • Lists and related operations
  • Dictionaries and related operations
  • Sets and related operations
  • Hands On/Demo:
    • Tuple - properties, related operations, compared with a list
    • List - properties, related operations
    • Dictionary - properties, related operations
    • Set - properties, related operations
  • Skills:
    • File Operations using Python
    • Working with data types of Python

Module 3: Deep Dive – Functions, OOP, Modules, Errors and Exceptions

  • Functions
  • Function Parameters
  • Global Variables
  • Variable Scope and Returning Values
  • Lambda Functions
  • Object-Oriented Concepts
  • Standard Libraries
  • Modules Used in Python
  • The Import Statements
  • Module Search Path
  • Ways to Install Packages
  • Errors and Exception Handling
  • Handling Multiple Exceptions
  • Hands On/Demo:
    • Functions - Syntax, Arguments, Keyword Arguments, Return Values
    • Lambda - Features, Syntax, Options, Comparison with Regular Functions
    • Sorting - Sequences, Dictionaries, Limitations of Sorting
    • Errors and Exceptions - Types of Issues, Remediation
    • Packages and Modules - Modules, Import Options, sys.path
  • Skills:
    • Error and Exception management in Python
    • Working with functions in Python
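
As a taste of the "Handling Multiple Exceptions" topic, here is a minimal sketch. The helper name `safe_divide` and its return-`None`-on-failure behaviour are illustrative assumptions, not part of the course material.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: divide two values, handling several failure modes."""
    try:
        return float(numerator) / float(denominator)
    except ZeroDivisionError:
        # The denominator converted cleanly but was zero.
        return None
    except (TypeError, ValueError) as exc:
        # One clause can catch several exception types by listing them in a tuple.
        print(f"Bad input: {exc}")
        return None


print(safe_divide(10, 4))
print(safe_divide(10, 0))
print(safe_divide("ten", 2))
```

Separate `except` clauses let each kind of failure be handled differently, while the tuple form groups failures that share the same handling.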

Module 4: Introduction to NumPy, Pandas and Matplotlib

  • NumPy - arrays
  • Operations on arrays
  • Indexing, slicing and iterating
  • Reading and writing arrays to and from files
  • Pandas - data structures & index operations
  • Reading and writing data in Excel/CSV formats with Pandas
  • Matplotlib library
  • Grids, axes, plots
  • Markers, colours, fonts and styling
  • Types of plots - bar graphs, pie charts, histograms
  • Contour plots
  • Hands On/Demo:
    • NumPy library - Creating NumPy arrays, operations performed on NumPy arrays
    • Pandas library - Creating series and dataframes, importing and exporting data
    • Matplotlib - Using scatter plots, histograms, bar graphs and pie charts to show information, styling plots
  • Skills:
    • Probability Distributions in Python
    • Python for Data Visualization

Module 5: Data Manipulation

  • Basic Functionalities of a data object
  • Merging of Data objects
  • Concatenation of data objects
  • Types of Joins on data objects
  • Exploring a Dataset
  • Analyzing a dataset
  • Hands On/Demo:
    • Pandas functions and attributes - ndim, axes, values, head(), tail(), sum(), std(), items() (formerly iteritems(), removed in pandas 2.0), iterrows(), itertuples()
    • GroupBy operations
    • Aggregation
    • Concatenation
    • Merging
    • Joining
  • Skills:
    • Python in Data Manipulation

Module 6: Introduction to Machine Learning with Python

  • Python Revision (NumPy, Pandas, scikit-learn, Matplotlib)
  • What is Machine Learning?
  • Machine Learning Use-Cases
  • Machine Learning Process Flow
  • Machine Learning Categories
  • Linear regression
  • Gradient descent
  • Hands On/Demo:
    • Linear Regression – Boston Dataset
  • Skills:
    • Machine Learning concepts
    • Machine Learning types
    • Linear Regression Implementation

Module 7: Supervised Learning - I

  • What is Classification and what are its use cases?
  • What is a Decision Tree?
  • Algorithm for Decision Tree Induction
  • Creating a Perfect Decision Tree
  • Confusion Matrix
  • What is a Random Forest?
  • Hands On/Demo:
    • Implementation of Logistic Regression
    • Decision tree
    • Random forest
  • Skills:
    • Supervised Learning concepts
    • Implementing different types of Supervised Learning algorithms
    • Evaluating model output

Module 8: Dimensionality Reduction

  • Introduction to Dimensionality
  • Why Dimensionality Reduction
  • PCA
  • Factor Analysis
  • Scaling dimensional model
  • LDA
  • Hands On/Demo:
    • PCA
    • Scaling
  • Skills:
    • Implementing Dimensionality Reduction Techniques

Module 9: Supervised Learning - II

  • What is Naïve Bayes?
  • How does Naïve Bayes work?
  • Implementing Naïve Bayes Classifier
  • What is a Support Vector Machine?
  • How does a Support Vector Machine work?
  • Hyperparameter Optimization
  • Grid Search vs Random Search
  • Implementation of Support Vector Machine for Classification
  • Hands On/Demo:
    • Implementation of Naïve Bayes, SVM
  • Skills:
    • Supervised Learning concepts
    • Implementing different types of Supervised Learning algorithms
    • Evaluating model output

Module 10: Unsupervised Learning

  • What is Clustering and what are its use cases?
  • What is K-means Clustering?
  • How does the K-means algorithm work?
  • How to do optimal clustering
  • What is C-means Clustering?
  • What is Hierarchical Clustering?
  • How does Hierarchical Clustering work?
  • Hands On/Demo:
    • Implementing K-means Clustering
    • Implementing Hierarchical Clustering
  • Skills:
    • Unsupervised Learning
    • Implementation of Clustering – various types

Module 11: Association Rules Mining and Recommendation Systems

  • What are Association Rules?
  • Association Rule Parameters
  • Calculating Association Rule Parameters
  • Recommendation Engines
  • How do Recommendation Engines work?
  • Collaborative Filtering
  • Content-Based Filtering
  • Hands On/Demo:
    • Apriori Algorithm
    • Market Basket Analysis
  • Skills:
    • Data Mining using Python
    • Recommender Systems using Python
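
The "Calculating Association Rule Parameters" topic can be illustrated with a small sketch. It assumes the three standard parameters are support, confidence and lift; the toy `transactions` data and the function names are made up for illustration and are not taken from the course.

```python
# Toy market-basket data (hypothetical): each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from the transaction list."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


print(support({"bread"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

A lift below 1 (as for bread and milk in this toy data) suggests the two items appear together less often than they would if they were independent.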

Module 12: Reinforcement Learning

  • What is Reinforcement Learning?
  • Why Reinforcement Learning?
  • Elements of Reinforcement Learning
  • Exploration vs Exploitation dilemma
  • Epsilon Greedy Algorithm
  • Markov Decision Process (MDP)
  • Q values and V values
  • Q-Learning
  • α values
  • Hands On/Demo:
    • Calculating Reward
    • Discounted Reward
    • Calculating Optimal quantities
    • Implementing Q-Learning
    • Setting up an Optimal Action
  • Skills:
    • Implement Reinforcement Learning using Python
    • Developing Q-Learning model in Python
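
Several of this module's topics (discounted reward, the epsilon-greedy rule, and the Q-Learning update with learning rate α) can be sketched in a few lines of plain Python. The function names, the nested-list Q-table and the numeric values below are illustrative assumptions, not the course's own implementation.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-Learning step: move Q(s, a) toward the bootstrapped target."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical two-state, two-action Q-table.
q = [[0.0, 0.0], [0.0, 1.0]]
q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
rng = random.Random(0)
print(q[0])
print(epsilon_greedy(q[0], epsilon=0.0, rng=rng))
print(discounted_return([1, 1, 1], gamma=0.5))
```

Setting `epsilon` to 0 makes the policy purely greedy (exploitation only); larger values trade some immediate reward for exploration.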

Module 13: Time Series Analysis

  • What is Time Series Analysis?
  • Importance of TSA
  • Components of TSA
  • White Noise
  • AR model
  • MA model
  • ARMA model
  • ARIMA model
  • Stationarity
  • ACF & PACF
  • Hands On/Demo:
    • Checking Stationarity
    • Converting non-stationary data to stationary
    • Implementing Dickey-Fuller Test
    • Plot ACF and PACF
    • Generating the ARIMA plot
    • TSA Forecasting
  • Skills:
    • TSA in Python

Module 14: Model Selection and Boosting

  • What is Model Selection?
  • The need for Model Selection
  • Cross-Validation
  • What is Boosting?
  • How do Boosting Algorithms work?
  • Types of Boosting Algorithms
  • Adaptive Boosting
  • Hands On/Demo:
    • Cross-Validation
    • AdaBoost
  • Skills:
    • Model Selection
    • Boosting algorithm using Python
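
To make the Cross-Validation topic concrete, the sketch below shows one way the index splitting behind k-fold cross-validation could work, assuming contiguous, unshuffled folds. The generator name `k_fold_indices` is hypothetical; in practice a library routine would normally be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(n_samples=5, k=2):
    print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining folds.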