Data Science Training
( 2 )  195 Enrolled
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply that knowledge and those actionable insights across a broad range of application domains.
Data Science Training Process
All training sessions are delivered as live online meetings using WebEx or GoToMeeting, with one-on-one trainer-student interaction in real time.
We also provide high-quality prerecorded videos of real-time Data Science training.
About Trainers
Our trainers are experts in implementation and support projects. The training covers real-time scenarios, which helps job seekers handle projects more easily.
Who should join?
 Freshers
 Consultants
 End users
 IT/Business analysts
 Project Managers
 Project team members
Data Science Training Course Content
 Introduction to Big Data Analytics
 Data and its uses – a case study (Grocery store)
 Interactive marketing using data & IoT – A case study
 Course outline, road map, and takeaways from the course
 Stages of Analytics – Descriptive, Predictive, Prescriptive, etc.
 Cross-Industry Standard Process for Data Mining (CRISP-DM)
 Machine Learning project management methodology
 Data Collection – Surveys and Design of Experiments
 Data types (Continuous, Discrete, Categorical, Count, Qualitative, Quantitative) and their identification and application
 Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
 Balanced versus Imbalanced datasets
 Cross-Sectional vs Time Series vs Panel / Longitudinal Data
 Batch Processing vs Real-Time Processing
 Structured vs Unstructured vs Semi-Structured Data
 Big vs Not-Big Data
 Data Cleaning / Preparation – Outlier Analysis, Missing Values Imputation Techniques, Transformations, Normalization / Standardization, Discretization
 Sampling techniques for handling Balanced vs. Imbalanced Datasets
 The Sampling Funnel: its application and components
 Population
 Sampling frame
 Simple random sampling
 Sample
 Measures of Central Tendency & Dispersion
 Population
 Mean/Average, Median, Mode
 Variance, Standard Deviation, Range
 Measure of Skewness
 Measure of Kurtosis
 Spread of the Data
 Various graphical techniques to understand data
 Bar Plot
 Histogram
 Boxplot
 Scatter Plot
 Line Chart
 Pair Plot
 Sample Statistics
 Population Parameters
 Inferential Statistics
 Random Variable and its definition
 Probability & Probability Distribution
 Continuous Probability Distribution / Probability Density Function
 Discrete Probability Distribution / Probability Mass Function
 Normal Distribution
 Standard Normal Distribution / Z distribution
 Z scores and the Z table
 Q-Q Plot / Quantile-Quantile plot
 Sampling Variation
 Central Limit Theorem
 Sample size calculator
 Confidence interval – concept
 Confidence interval with sigma
 t-distribution / Student's t-distribution
 Confidence interval
 Population parameter with Standard deviation known
 Population parameter with Standard deviation not known
 A complete recap of Statistics
 Formulating a Hypothesis
 Choosing Null and Alternative Hypothesis
 Type I or Alpha Error and Type II or Beta Error
 Confidence Level, Significance Level, Power of Test
 Comparative study of sample proportions using Hypothesis testing
 2-Sample t-test
 ANOVA
 2-Proportion test
 Chi-Square test
Scatter diagram
 Correlation analysis
 Correlation coefficient
 Ordinary least squares
 Principles of regression
 Simple Linear Regression
 Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
 Confidence Interval versus Prediction Interval
 Heteroscedasticity / Unequal Variance
LINE assumption
 Linearity
 Independence
 Normality
 Equal Variance / Homoscedasticity
 Collinearity (Variance Inflation Factor)
 Multiple Linear Regression
 Model Quality metrics
 Deletion Diagnostics
 Understanding Overfitting (Variance) vs. Underfitting (Bias)
 Generalization error and Regularization techniques
 Different Error functions or Loss functions or Cost functions
 Lasso Regression
 Ridge Regression
 Principles of Logistic regression
 Types of Logistic regression
 Assumption & Steps in Logistic regression
 Analysis of Simple logistic regression results
 Multiple Logistic regression
 Confusion matrix
 False Positive, False Negative
 True Positive, True Negative
 Sensitivity, Recall, Specificity, F1
 Receiver operating characteristics curve (ROC curve)
 Precision Recall (PR) curve
 Lift charts and Gain charts
 Logit and Log-Likelihood
 Category Baselining
 Modeling Nominal categorical data
 Handling Ordinal Categorical Data
 Interpreting the results of coefficient values
 Poisson Regression
 Poisson Regression with Offset
 Negative Binomial Regression
 Treatment of data with Excessive Zeros
 Zero-inflated Poisson
 Zero-inflated Negative Binomial
 Hurdle Model
 Supervised vs Unsupervised learning
 Data Mining Process
 Hierarchical Clustering / Agglomerative Clustering
 Dendrogram
 Measure of distance
Numeric
Euclidean, Manhattan, Mahalanobis
Categorical
Binary Euclidean
 Simple Matching Coefficient
 Jaccard's Coefficient
Mixed
 Gower’s General Dissimilarity Coefficient
Types of Linkages
 Single Linkage / Nearest Neighbour
 Complete Linkage / Farthest Neighbour
 Average Linkage
 Centroid Linkage
K-Means Clustering
Measurement metrics of clustering
 Within-cluster Sum of Squares
 Between-cluster Sum of Squares
 Total Sum of Squares
Choosing the ideal K value using Scree Plot / Elbow Curve
Other Clustering Techniques
 K-Medians
 K-Medoids
 K-Modes
 Clustering Large Applications (CLARA)
 Partitioning Around Medoids (PAM)
 Density-based spatial clustering of applications with noise (DBSCAN)
 Why Dimension Reduction
 Advantages of PCA
 Calculation of PCA weights
 2D Visualization using Principal components
 Basics of Matrix Algebra
 Factor Analysis
 What is Market Basket / Affinity Analysis
Measure of Association
 Support
 Confidence
 Lift Ratio
 Apriori Algorithm
 Sequential Pattern Mining
 User-based Collaborative Filtering
 A measure of distance/similarity between users
 Driver for Recommendation
 Computation Reduction Techniques
 Search-based methods / Item-to-Item Collaborative Filtering
 SVD in recommendation
 The vulnerability of recommendation systems
 Definition of a network (the LinkedIn analogy)
Measures of node strength in a network
 Degree centrality
 Closeness centrality
 Eigenvector centrality
 Adjacency matrix
 Betweenness centrality
 Clustering coefficient
Introduction to Google PageRank
 Deciding the K value
 Thumb rule in choosing the K value
 Building a KNN model by splitting the data
 Checking for Underfitting and Overfitting in KNN
 Generalization and Regularization Techniques to avoid overfitting in KNN
 Elements of classification tree – Root node, Child Node, Leaf Node, etc.
 Greedy algorithm
 Measure of Entropy
 Attribute selection using Information gain
 Ensemble techniques – Stacking, Boosting and Bagging
 Decision Tree C5.0 and understanding various arguments
 Checking for Underfitting and Overfitting in Decision Tree
 Generalization and Regularization Techniques to avoid overfitting in Decision Tree
 Random Forest and understanding various arguments
 Checking for Underfitting and Overfitting in Random Forest
 Generalization and Regularization Techniques to avoid overfitting in Random Forest
 Overfitting
 Underfitting
 Pruning
 Boosting
 Bagging or Bootstrap aggregating
 AdaBoost / Adaptive Boosting Algorithm
 Checking for Underfitting and Overfitting in AdaBoost
 Generalization and Regularization Techniques to avoid overfitting in AdaBoost
 Gradient Boosting Algorithm
 Checking for Underfitting and Overfitting in Gradient Boosting
 Generalization and Regularization Techniques to avoid overfitting in Gradient Boosting
 Extreme Gradient Boosting (XGB) Algorithm
 Checking for Underfitting and Overfitting in XGB
 Generalization and Regularization Techniques to avoid overfitting in XGB
 Sources of data
 Bag of words
 Preprocessing, corpus, Document Term Matrix (DTM) & Term Document Matrix (TDM)
 Word Clouds
Corpus level word clouds
 Sentiment Analysis
 Positive Word clouds
 Negative word clouds
 Unigram, Bigram, Trigram
 Semantic network
 Clustering
 Extract user reviews of products/services from Amazon, Snapdeal and TripAdvisor
 Install Libraries from Shell
 Extraction and text analytics in Python
 LDA / Latent Dirichlet Allocation
 Topic Modelling
 Sentiment Extraction
 Lexicons & Emotion Mining
 Probability – Recap
 Bayes Rule
 Naïve Bayes Classifier
 Text Classification using Naive Bayes
 Checking for Underfitting and Overfitting in Naive Bayes
 Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
 Neurons of a Biological Brain
 Artificial Neuron
 Perceptron
 Perceptron Algorithm
 Use case: classifying linearly separable data
 Multilayer Perceptron to handle nonlinear data
 Integration functions
 Activation functions
 Weights
 Bias
 Learning Rate (eta) – Shrinking Learning Rate, Decay Parameters
 Error functions – Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.
 Artificial Neural networks
 ANN structure
 Gradient Descent Algorithms – Batch GD, SGD, Mini-batch SGD
 Backward propagation
 Network Topology
 Principles of Gradient descent (Manual Calculation)
 Momentum, Nesterov Momentum
 Optimization methods: Adam, Adagrad, Adadelta, RMSProp
 CNN – Convolutional Neural Network
 RNN – Recurrent Neural Network
 Classification Hyperplanes
 Best fit “boundary”
 Kernel Tricks – Linear, RBF, etc.
 Non-Linear Kernel Tricks
 Avoiding overfitting in SVM
 Regularization techniques in SVM
 Examples of Survival Analysis
 Time to event
 Censoring
 Survival, Hazard, Cumulative Hazard Functions
 Introduction to Parametric and nonparametric functions
 Introduction to time series data
 Steps to forecasting
 Components of time series data
 Scatter plot and Time Plot
 Lag Plot
 ACF – AutoCorrelation Function / Correlogram
 Visualization principles
 Naïve forecast methods
 Forecast errors and their metrics – ME, MAD, MSE, RMSE, MPE, MAPE
 Model-Based approaches
 Linear Model
 Exponential Model
 Quadratic Model
 Additive Seasonality
 Multiplicative Seasonality
 Model-Based approaches (continued)
 AR (AutoRegressive) model for errors
 Random walk
 ARMA (AutoRegressive Moving Average), Order p and q
 ARIMA (AutoRegressive Integrated Moving Average), Order p, d, and q
 A data-driven approach to forecasting
Smoothing techniques
 Moving Average
 Exponential Smoothing
 Holt’s / Double Exponential Smoothing
 Winters / Holt-Winters
 Deseasonalizing and detrending
 Econometric Models
 Forecasting using Python
 Forecasting using R
Srinivasa Rao:
I completed the Data Science with Python course at NextIT Vision. NextIT Vision is the best place to excel in the next phase of your career. The trainers and faculty here were very friendly and approachable. My trainer explained every topic clearly and would not move on to another topic until
Archana Gaddam:
Good class, amazing learning experience. Very clear explanation of all the topics with clear practical examples and demos.