#### Data Science Training

Data science is an **interdisciplinary field** that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from noisy, structured, and unstructured data, and applies those insights across a broad range of application domains.

## Data Science Online Training

**Data Science Training Process**

All training sessions are delivered live online using **WebEx** or **GoToMeeting**, with one-on-one trainer-student interaction in real time.

We also provide high-quality pre-recorded Data Science training videos.

**About Trainers**

Our trainers are experts in implementation and support projects. The training covers real-time scenarios, which helps job seekers handle projects with ease.

**Who should join?**

- Freshers
- Consultants
- End users
- IT/Business analysts
- Project Managers
- Project team members

**Data Science** Training Course Content

- Introduction to Big Data Analytics
- Data and its uses – a case study (Grocery store)
- Interactive marketing using data & IoT – A case study
- Course outline, road map, and takeaways from the course
- Stages of Analytics – Descriptive, Predictive, Prescriptive, etc.
- Cross-Industry Standard Process for Data Mining
- Machine Learning project management methodology
- Data Collection – Surveys and Design of Experiments
- Data types (Continuous, Discrete, Categorical, Count, Qualitative, Quantitative): identification and application
- Further classification of data in terms of Nominal, Ordinal, Interval & Ratio types
- Balanced versus Imbalanced datasets
- Cross Sectional versus Time Series vs Panel / Longitudinal Data
- Batch Processing vs Real Time Processing
- Structured versus Unstructured vs Semi-Structured Data
- Big vs Not-Big Data
- Data Cleaning / Preparation – Outlier Analysis, Missing Values Imputation Techniques, Transformations, Normalization / Standardization, Discretization
- Sampling techniques for handling Balanced vs. Imbalanced Datasets
- The Sampling Funnel: its application and components
- Population
- Sampling frame
- Simple random sampling
- Sample

- Measures of Central Tendency & Dispersion
- Population
- Mean/Average, Median, Mode
- Variance, Standard Deviation, Range
- Measure of Skewness
- Measure of Kurtosis
- Spread of the Data
- Various graphical techniques to understand data
- Bar Plot
- Histogram
- Boxplot
- Scatter Plot

- Line Chart
- Pair Plot
- Sample Statistics
- Population Parameters
- Inferential Statistics
- Random Variable and its definition
- Probability & Probability Distribution
- Continuous Probability Distribution / Probability Density Function
- Discrete Probability Distribution / Probability Mass Function

- Normal Distribution
- Standard Normal Distribution / Z distribution
- Z scores and the Z table
- QQ Plot / Quantile – Quantile plot
- Sampling Variation
- Central Limit Theorem
- Sample size calculator
- Confidence interval – concept
- Confidence interval with sigma
- T-distribution / Student’s-t distribution
- Confidence interval
- Population parameter with Standard deviation known
- Population parameter with Standard deviation not known
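
As a hedged illustration of the "confidence interval with sigma known" topic above, the sketch below computes a z-based interval for a population mean using only the Python standard library. The sample values, the assumed `sigma`, and the function name `z_confidence_interval` are made up for illustration and are not part of the course material.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(sample)
    margin = z * sigma / sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical measurements; sigma = 2.0 is an assumed known value.
sample = [48, 52, 50, 49, 51, 50, 47, 53]
low, high = z_confidence_interval(sample, sigma=2.0)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

When sigma is not known, the same structure applies, but the critical value comes from the t-distribution and sigma is replaced by the sample standard deviation.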

- A complete recap of Statistics
- Formulating a Hypothesis
- Choosing Null and Alternative Hypothesis
- Type I or Alpha Error and Type II or Beta Error
- Confidence Level, Significance Level, Power of Test
- Comparative study of sample proportions using Hypothesis testing
- 2 Sample t-test
- ANOVA
- 2 Proportion test
- Chi-Square test

Scatter diagram

- Correlation analysis
- Correlation coefficient

- Ordinary least squares
- Principles of regression
- Simple Linear Regression
- Exponential Regression, Logarithmic Regression, Quadratic or Polynomial Regression
- Confidence Interval versus Prediction Interval
- Heteroscedasticity / Unequal Variance

LINE assumption

- Linearity
- Independence
- Normality
- Equal Variance / Homoscedasticity

- Collinearity (Variance Inflation Factor)
- Multiple Linear Regression
- Model Quality metrics
- Deletion Diagnostics
- Understanding Overfitting (Variance) vs. Underfitting (Bias)
- Generalization error and Regularization techniques
- Different Error functions or Loss functions or Cost functions
- Lasso Regression
- Ridge Regression
- Principles of Logistic regression
- Types of Logistic regression
- Assumption & Steps in Logistic regression
- Analysis of Simple logistic regression results
- Multiple Logistic regression
- Confusion matrix
- False Positive, False Negative
- True Positive, True Negative
- Sensitivity, Recall, Specificity, F1
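
The confusion-matrix metrics listed above can be sketched in a few lines of Python. The counts are made-up illustration values and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for a binary classifier evaluated on 100 cases.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```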

- Receiver operating characteristics curve (ROC curve)
- Precision Recall (P-R) curve
- Lift charts and Gain charts
- Logit and Log-Likelihood
- Category Baselining
- Modeling Nominal categorical data
- Handling Ordinal Categorical Data
- Interpreting the results of coefficient values
- Poisson Regression
- Poisson Regression with Offset
- Negative Binomial Regression
- Treatment of data with Excessive Zeros
- Zero-inflated Poisson
- Zero-inflated Negative Binomial
- Hurdle Model

- Supervised vs Unsupervised learning
- Data Mining Process
- Hierarchical Clustering / Agglomerative Clustering
- Dendrogram
- Measure of distance

Numeric

Euclidean, Manhattan, Mahalanobis

Categorical

Binary Euclidean

- Simple Matching Coefficient
- Jaccard’s Coefficient

Mixed

- Gower’s General Dissimilarity Coefficient

Types of Linkages

- Single Linkage / Nearest Neighbour
- Complete Linkage / Farthest Neighbour
- Average Linkage
- Centroid Linkage

K-Means Clustering

Measurement metrics of clustering

- Within Sum of Squares
- Between Sum of Squares
- Total Sum of Squares
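
A minimal sketch of how the three sums of squares relate, assuming one-dimensional data and a cluster assignment that is already given. The data points are invented for illustration; the check at the end shows that the total equals the within-cluster plus between-cluster components.

```python
def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values, center):
    """Sum of squared deviations of values from a given center."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical 1-D data, already assigned to two clusters.
clusters = [[1.0, 2.0, 3.0], [8.0, 9.0, 10.0]]
all_points = [v for cluster in clusters for v in cluster]
grand_mean = mean(all_points)

# Within: spread of each point around its own cluster mean.
wss = sum(sum_of_squares(c, mean(c)) for c in clusters)
# Between: spread of cluster means around the grand mean, weighted by size.
bss = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in clusters)
# Total: spread of every point around the grand mean.
tss = sum_of_squares(all_points, grand_mean)

print(wss, bss, tss)
```

A good clustering keeps the within component small relative to the total, which is the quantity an elbow curve tracks as K grows.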

Choosing the ideal K value using Scree Plot / Elbow Curve

Other Clustering Techniques

- K-Medians
- K-Medoids
- K-Modes
- Clustering Large Applications (CLARA)
- Partitioning Around Medoids (PAM)
- Density-based spatial clustering of applications with noise (DBSCAN)

- Why Dimension Reduction
- Advantages of PCA
- Calculation of PCA weights
- 2D Visualization using Principal components
- Basics of Matrix Algebra
- Factor Analysis
- What is Market Basket / Affinity Analysis

Measure of Association

- Support
- Confidence
- Lift Ratio
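
The three measures of association can be illustrated with a toy basket dataset. The transactions and the helper names `support`, `confidence`, and `lift` are assumptions made for this sketch, not part of the course material.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
print(lift({"bread"}, {"milk"}))
```

A lift above 1 suggests the items appear together more often than independence would predict; a lift below 1, as in this toy data, suggests the opposite.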

- Apriori Algorithm
- Sequential Pattern Mining
- User-based Collaborative Filtering
- A measure of distance/similarity between users
- Driver for Recommendation
- Computation Reduction Techniques
- Search based methods/Item to Item Collaborative Filtering
- SVD in recommendation
- The vulnerability of recommendation systems

- Definition of a network (the LinkedIn analogy)

Measures of node strength in a network

- Degree centrality
- Closeness centrality
- Eigenvector centrality
- Adjacency matrix
- Betweenness centrality
- Clustering coefficient

Introduction to Google PageRank

- Deciding the K value
- Thumb rule in choosing the K value
- Building a KNN model by splitting the data
- Checking for Underfitting and Overfitting in KNN
- Generalization and Regularization Techniques to avoid overfitting in KNN
- Elements of classification tree – Root node, Child Node, Leaf Node, etc.
- Greedy algorithm
- Measure of Entropy
- Attribute selection using Information gain
- Ensemble techniques – Stacking, Boosting and Bagging
- Decision Tree C5.0 and understanding various arguments
- Checking for Underfitting and Overfitting in Decision Tree
- Generalization and Regularization Techniques to avoid overfitting in Decision Tree
- Random Forest and understanding various arguments
- Checking for Underfitting and Overfitting in Random Forest
- Generalization and Regularization Techniques to avoid overfitting in Random Forest
- Overfitting
- Underfitting
- Pruning
- Boosting
- Bagging or Bootstrap aggregating
- AdaBoost / Adaptive Boosting Algorithm
- Checking for Underfitting and Overfitting in AdaBoost
- Generalization and Regularization Techniques to avoid overfitting in AdaBoost
- Gradient Boosting Algorithm
- Checking for Underfitting and Overfitting in Gradient Boosting
- Generalization and Regularization Techniques to avoid overfitting in Gradient Boosting
- Extreme Gradient Boosting (XGB) Algorithm
- Checking for Underfitting and Overfitting in XGB
- Generalization and Regularization Techniques to avoid overfitting in XGB
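
The entropy and information-gain items in the list above can be sketched as follows. This is a minimal illustration with invented labels and a hypothetical two-way split, not the procedure of any particular decision-tree library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" labels, split on some attribute.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
gain = information_gain(parent, split)
print(f"parent entropy: {entropy(parent):.3f}, gain: {gain:.3f}")
```

A greedy tree builder would evaluate this gain for each candidate attribute and split on the one with the highest value.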

- Sources of data
- Bag of words
- Pre-processing, corpus, Document-Term Matrix (DTM) & Term-Document Matrix (TDM)
- Word Clouds

Corpus level word clouds

- Sentiment Analysis
- Positive Word clouds
- Negative word clouds
- Unigram, Bigram, Trigram

- Semantic network
- Clustering
- Extract user reviews of products/services from Amazon, Snapdeal, and TripAdvisor
- Install Libraries from Shell
- Extraction and text analytics in Python
- LDA / Latent Dirichlet Allocation
- Topic Modelling
- Sentiment Extraction
- Lexicons & Emotion Mining
- Probability – Recap
- Bayes Rule
- Naïve Bayes Classifier
- Text Classification using Naive Bayes
- Checking for Underfitting and Overfitting in Naive Bayes
- Generalization and Regularization Techniques to avoid overfitting in Naive Bayes
- Neurons of a Biological Brain
- Artificial Neuron
- Perceptron
- Perceptron Algorithm
- Use case: classifying linearly separable data
- Multilayer Perceptron to handle non-linear data
- Integration functions
- Activation functions
- Weights
- Bias
- Learning Rate (eta) – Shrinking Learning Rate, Decay Parameters
- Error functions – Entropy, Binary Cross Entropy, Categorical Cross Entropy, KL Divergence, etc.
- Artificial Neural networks
- ANN structure
- Gradient Descent Algorithms – Batch GD, SGD, Mini-batch SGD
- Backward propagation
- Network Topology
- Principles of Gradient descent (Manual Calculation)
- Momentum, Nesterov Momentum
- Optimization methods: Adam, Adagrad, Adadelta, RMSProp
- CNN – Convolutional Neural Network
- RNN – Recurrent Neural Network

- Classification Hyperplanes
- Best fit “boundary”
- Kernel Tricks – Linear, RBF, etc.
- Non-Linear Kernel Tricks
- Avoiding overfitting in SVM
- Regularization techniques in SVM

- Examples of Survival Analysis
- Time to event
- Censoring
- Survival, Hazard, Cumulative Hazard Functions
- Introduction to Parametric and non-parametric functions
- Introduction to time series data
- Steps to forecasting
- Components of time series data
- Scatter plot and Time Plot
- Lag Plot
- ACF – Auto-Correlation Function / Correlogram
- Visualization principles
- Naïve forecast methods
- Forecast errors and their metrics – ME, MAD, MSE, RMSE, MPE, MAPE
- Model-Based approaches

- Linear Model
- Exponential Model
- Quadratic Model
- Additive Seasonality
- Multiplicative Seasonality

- Model-Based approaches Continued
- AR (Auto-Regressive) model for errors
- Random walk
- ARMA (Auto-Regressive Moving Average), Order p and q
- ARIMA (Auto-Regressive Integrated Moving Average), Order p, d, and q
- A data-driven approach to forecasting

Smoothing techniques

- Moving Average
- Exponential Smoothing
- Holt’s / Double Exponential Smoothing
- Winters / Holt-Winters
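
As a rough sketch of the first two smoothing techniques above, the code below implements a simple moving average and single exponential smoothing in plain Python. The demand series, the window size, and the smoothing factor `alpha` are illustrative assumptions.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha):
    """Single exponential smoothing, seeded with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        # Blend the new observation with the previous smoothed level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical demand series.
demand = [10, 12, 14, 16, 18]
ma = moving_average(demand, window=3)
es = exponential_smoothing(demand, alpha=0.5)
print(ma)
print(es)
```

Holt's and Holt-Winters methods extend the same recursion with additional terms for trend and seasonality.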

- De-seasoning and de-trending
- Econometric Models
- Forecasting using Python
- Forecasting using R

5 out of 5 – Srinivasa Rao: I completed my Data Science with Python course at NextIT Vision. NextIT Vision is the best place to excel in the next phase of your career. The trainers and faculty here were very friendly and comfortable. My trainer explained each and every topic clearly and would not move to another topic until

5 out of 5 – Archana Gaddam: Good class, amazing learning experience. Very clear explanation of all the topics with clear practical examples and demos.