National Institute of Technology Rourkela

राष्ट्रीय प्रौद्योगिकी संस्थान राउरकेला

ଜାତୀୟ ପ୍ରଯୁକ୍ତି ପ୍ରତିଷ୍ଠାନ ରାଉରକେଲା

An Institute of National Importance

Syllabus

Course Details

Subject {L-T-P / C} : CS4109 : Advanced Database {3-0-0 / 3}

Subject Nature : Theory

Coordinator : Sambit Bakshi

Syllabus

Big Data introduction: Definition and taxonomy, Value for the enterprise, Setting up the demo environment, First steps with the Hadoop ecosystem

The Hadoop ecosystem: Introduction to Hadoop, Hadoop components: MapReduce/Pig/Hive/HBase, Loading data into Hadoop, Handling files in Hadoop, Getting data from Hadoop, Querying big data with Hive, Introduction to the SQL language, From SQL to HiveQL, NoSQL

Big Data & Machine Learning: Quick intro to Machine Learning, Big Data & Machine Learning, Machine Learning tools, Spark & SparkML, H2O, Azure ML

Case studies in NLP and healthcare

Course Objectives

  • introduce students to the concepts and challenges of big data
  • teach students to apply skills and tools to manage and analyze big data

Course Outcomes

Upon completion of the subject, students will be able to:
(a) understand the concepts and challenges of big data and why existing technology is inadequate to analyze it,
(b) collect, manage, store, query, and analyze various forms of big data,
(c) gain hands-on experience with large-scale analytics tools to solve some open big data problems, and
(d) understand the impact of big data on business decisions and strategy.

Essential Reading

  • Dipanjan Sarkar, Text Analytics with Python, Apress
  • Anil Maheshwari, Big Data, First Edition, McGraw Hill

Supplementary Reading

  • EMC Education Services, Data Science and Big Data Analytics, Wiley
  • Vikas Kumar, Healthcare Analytics Made Simple, Packt Publishing Limited, ISBN: 9781787286702, 1787286703