National Institute of Technology Rourkela

An Institute of National Importance

Syllabus

Course Details

Subject {L-T-P / C}: CS6355: Applied Data Science {3-0-0 / 3}

Subject Nature: Theory

Coordinator: Sibarama Panigrahi

Syllabus

Module 1:

Mathematical Foundations: Linear Algebra and Vector Calculus, Probability and Statistics.
Introduction and Motivation for Data Science. Data: Definition, Types and Facets of Data, Data Quality. Data Preprocessing: Aggregation, Sampling, Dimensionality Reduction, Feature Subset Selection, Feature Creation, Discretization and Binarization, Variable Transformation. Measures of Similarity and Dissimilarity. Data Science Process.
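The measures of similarity and dissimilarity listed above can be illustrated with a minimal pure-Python sketch. The syllabus does not say which measures are taught, so the choice of Euclidean distance, cosine similarity and Jaccard similarity (plus min-max scaling as one example of a variable transformation) is an assumption, and all function names are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Dissimilarity: straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Similarity: cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard_similarity(x, y):
    """Similarity for binary (presence/absence) vectors: |x AND y| / |x OR y|."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0

def min_max_scale(values):
    """Variable transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Cosine similarity ignores vector magnitude, so `[1, 2]` and `[2, 4]` score 1.0 even though their Euclidean distance is non-zero; which measure is appropriate depends on whether magnitude carries meaning in the data.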
Data Warehousing: Data Preprocessing, Warehouse Architecture, ETL, OLAP, Data Lakes, Big Data Pipeline.

Module 2:

Descriptive Statistics: Introduction, Data Preparation. Exploratory Data Analysis: Summarizing Data and Plotting, Scatter Plot, Pair Plot, Histogram, Probability Density Function (PDF), Univariate Analysis using PDF, Cumulative Distribution Function (CDF), Percentiles and Quantiles, Inter-Quartile Range (IQR), Median Absolute Deviation (MAD), Box Plot with Whiskers, Violin Plot, Univariate, Bivariate and Multivariate Analysis, Contour Plot. Outlier Treatment. Measuring Asymmetry: Skewness and Pearson's Median Skewness Coefficient. Kernel Density Estimation. Sample and Estimated Mean, Variance and Standard Scores, Covariance, Pearson's and Spearman's Rank Correlation.
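Several of the descriptive statistics named above reduce to a few lines of code. The sketch below is a minimal illustration using only the Python standard library. Its conventions are assumptions rather than anything the syllabus prescribes: the quantile uses linear interpolation between order statistics (one of several conventions in use), the skewness coefficient uses the sample standard deviation, and outliers are flagged with the usual box-plot fence of 1.5 IQR beyond the quartiles. Function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

def quantile(data, q):
    """Quantile by linear interpolation between order statistics (0 <= q <= 1)."""
    xs = sorted(data)
    pos = (len(xs) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def iqr(data):
    """Inter-quartile range: Q3 - Q1."""
    return quantile(data, 0.75) - quantile(data, 0.25)

def mad(data):
    """Median absolute deviation from the median."""
    m = median(data)
    return median([abs(x - m) for x in data])

def iqr_outliers(data, k=1.5):
    """Points beyond the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    spread = q3 - q1
    return [x for x in data if x < q1 - k * spread or x > q3 + k * spread]

def pearson_median_skewness(data):
    """Pearson's median skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations.
    The 1/(n - 1) factors cancel, so plain sums are used."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))
```

On a sample such as `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, the single large value pulls the mean well above the median, so the skewness coefficient is positive, while the median-based MAD and IQR stay small; this robustness is why they are preferred for outlier treatment.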
Predictive Modeling: Regression, Decision Tree, Support Vector Machine (SVM), Ensemble Models: Bagging, Boosting.
Deep Learning: Introduction to Neural Networks, Exploding and Vanishing Gradient Problems, Dropout, Regularization, Weight Initialization, Batch Normalization, Optimizers. Convolutional Neural Network (CNN): Convolution, Padding, Strides, Pooling, Convolution over RGB Images, Transfer Learning. Fusion: Early Fusion (Feature-Level Fusion), Late Fusion (Decision-Level Fusion), and Intermediate Fusion. Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU).
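Convolution, padding, strides and pooling interact in a simple way that a short sketch can show. The code below is a minimal pure-Python illustration, assuming a single-channel image stored as a list of lists and a square kernel; the output size along each axis is floor((n + 2p - f) / s) + 1 for input size n, kernel size f, padding p and stride s. As in most deep-learning libraries, the kernel is not flipped (strictly, this is cross-correlation). For an RGB image, the kernel would have one f x f slice per channel and the three per-channel results would be summed. Function names are hypothetical.

```python
def conv_output_size(n, f, p, s):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def conv2d(image, kernel, padding=0, stride=1):
    """Single-channel 2-D convolution (cross-correlation) with zero padding."""
    n_rows, n_cols = len(image), len(image[0])
    f = len(kernel)  # assumes a square f x f kernel
    # Zero-pad the image on all four sides.
    padded = [[0] * (n_cols + 2 * padding) for _ in range(n_rows + 2 * padding)]
    for i in range(n_rows):
        for j in range(n_cols):
            padded[i + padding][j + padding] = image[i][j]
    out_rows = conv_output_size(n_rows, f, padding, stride)
    out_cols = conv_output_size(n_cols, f, padding, stride)
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            top, left = r * stride, c * stride  # window's top-left corner
            out[r][c] = sum(
                kernel[a][b] * padded[top + a][left + b]
                for a in range(f)
                for b in range(f)
            )
    return out

def max_pool2d(image, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out_rows = (len(image) - size) // stride + 1
    out_cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[r * stride + a][c * stride + b]
                for a in range(size) for b in range(size))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]
```

With a 4 x 4 input and a 3 x 3 kernel, no padding gives a 2 x 2 output, while a padding of 1 with stride 1 preserves the 4 x 4 size ("same" padding); raising the stride to 2 halves it again.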

Module 3:

Time Series Data Analysis: Introduction to Time Series. Univariate Time Series Forecasting using Statistical, Deep Learning and Hybrid Models: Autoregressive Moving Average (ARMA) models, Autoregressive Integrated Moving Average (ARIMA) models, Seasonal ARIMA (SARIMA), Exponential Smoothing, LSTM, GRU, Hybrid Statistical and Deep Learning Models. Multivariate Time Series Forecasting, Point Forecasting, Interval Forecasting, Spatio-Temporal Forecasting.
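Of the forecasting methods listed, simple exponential smoothing is the shortest to write down: the level is updated as l_t = alpha * y_t + (1 - alpha) * l_{t-1}, and every future point forecast equals the last level. The sketch below assumes no trend or seasonality and initializes the level with the first observation; Hyndman and Athanasopoulos instead estimate the initial level together with alpha by minimizing the sum of squared errors, so treat this as an illustration rather than a fitted model. The function name is hypothetical.

```python
def simple_exponential_smoothing(series, alpha, horizon=1):
    """Simple exponential smoothing (no trend, no seasonality).

    Returns (fitted, forecast): fitted[t] is the one-step-ahead forecast of
    series[t] made before observing it; forecast repeats the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialize the level with the first observation
    fitted = []
    for y in series:
        fitted.append(level)  # forecast for this step, made from past data only
        level = alpha * y + (1 - alpha) * level  # update the level with y
    return fitted, [level] * horizon  # flat point forecasts
```

Larger values of `alpha` make the level follow recent observations closely; smaller values give more weight to older observations and produce smoother fitted values.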

Module 4:

Text Mining and Analytics: Introduction, Data Cleaning. Text Mining Techniques: Stemming, Stop-Word Removal, Tokenization, Lemmatization, Unigram, Bigram, n-gram, tf-idf, Word2Vec, Bag of Words. Case Study.
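Several of the text-mining steps above can be chained into a small pipeline. The sketch below assumes a regular-expression tokenizer, a tiny hand-written stop-word list, and the unsmoothed weighting tf = count / document length and idf = log(N / df), where N is the number of documents and df the number containing the term; libraries such as scikit-learn default to smoothed variants, so their numbers will differ. Stemming and lemmatization are omitted because they need a language-specific resource such as a stemmer or a dictionary. All names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Contiguous n-grams as space-joined strings (n=2 gives bigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(documents):
    """Per-document tf-idf weights: (count / doc length) * log(N / df)."""
    docs = [[t for t in tokenize(d) if t not in STOP_WORDS] for d in documents]
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words representation
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets idf = log(1) = 0, which is how tf-idf down-weights words that do not discriminate between documents.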
Recommender System: Introduction, Content-Based Filtering, Collaborative Filtering, Hybrid Recommenders, Modelling User Preferences, Evaluating Recommenders.
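One way to make collaborative filtering concrete is user-based prediction: estimate a user's rating for an item as a similarity-weighted average of other users' ratings for that item. The sketch assumes ratings stored as nested dictionaries and cosine similarity computed over co-rated items; mean-centred (Pearson) similarity is a common alternative, since raw cosine scores tend to be high when all ratings are positive. Names and data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = [i for i in u if i in v]
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict_rating(ratings, user, item):
    """User-based CF: similarity-weighted average of other users' ratings.

    Returns None when no other user with non-zero similarity rated the item.
    """
    num, den = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue  # skip the target user and users who did not rate the item
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None
```

For example, with `ratings = {"alice": {"m1": 5, "m2": 3}, "bob": {"m1": 5, "m2": 3, "m3": 4}, "carol": {"m1": 1, "m2": 5, "m3": 2}}`, the prediction for Alice on `m3` lands between Carol's 2 and Bob's 4, closer to Bob's rating because Bob's past ratings match Alice's exactly.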

Course Objective

1. To mine valuable information from data for strategic decision-making under uncertainty, product development, trend analysis, and forecasting.
2. To employ quantitative and descriptive modeling to develop solutions for real-world business problems.
3. To employ cutting-edge tools and technologies to analyze Big Data.
4. To communicate findings and present results effectively using data visualization techniques.

Course Outcome

1. Develop an in-depth understanding of the key technologies in data science and business analytics: data mining, machine learning, visualization techniques, predictive modeling, and statistics.
2. Practice problem analysis and decision-making.
3. Gain practical, hands-on experience with statistics, programming languages, and tools through applied research experiences.
4. Apply data science concepts and methods to solve problems in real-world contexts and communicate these solutions effectively.

Essential Reading

1. Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Elsevier
2. Laura Igual, Santi Seguí, Introduction to Data Science, Springer
3. Rob J Hyndman, George Athanasopoulos, Forecasting: Principles and Practice, OTexts
4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Pearson

Supplementary Reading

1. Davy Cielen, Arno Meysman, Mohamed Ali, Introducing Data Science, Manning
2. Andreas, Practical Data Science, Apress
