National Institute of Technology Rourkela

राष्ट्रीय प्रौद्योगिकी संस्थान राउरकेला

ଜାତୀୟ ପ୍ରଯୁକ୍ତି ପ୍ରତିଷ୍ଠାନ ରାଉରକେଲା

An Institute of National Importance

Syllabus

Course Details

Subject {L-T-P / C} : CS4109 : Advanced Database {3-0-0 / 3}

Subject Nature : Theory

Coordinator : Prof. Sambit Bakshi

Syllabus

Big Data introduction: definition and taxonomy, value for the enterprise, setting up the demo environment, first steps with the Hadoop ecosystem.

The Hadoop ecosystem: introduction to Hadoop, Hadoop components (MapReduce, Pig, Hive, HBase), loading data into Hadoop, handling files in Hadoop, getting data from Hadoop, querying big data with Hive, introduction to the SQL language, from SQL to HiveQL, NoSQL.

Big Data & Machine Learning: quick intro to machine learning, big data and machine learning, machine learning tools, Spark & SparkML, H2O, Azure ML.

Case studies in NLP and healthcare.

Course Objectives

  • Introduce students to the concepts and challenges of big data
  • Teach students to apply skills and tools to manage and analyze big data

Course Outcomes

Upon completion of the subject, students will be able to:
(a) understand the concepts and challenges of big data and why existing technology is inadequate for analyzing it;
(b) collect, manage, store, query, and analyze various forms of big data;
(c) gain hands-on experience with large-scale analytics tools to solve some open big data problems; and
(d) understand the impact of big data on business decisions and strategy.

Essential Reading

  • Dipanjan Sarkar, Text Analytics with Python, Apress
  • Anil Maheshwari, Big Data, First Edition, McGraw Hill

Supplementary Reading

  • EMC Educational Services, Data Science and Big Data Analytics, Wiley
  • Vikas Kumar, Healthcare Analytics Made Simple, Packt Publishing Limited, ISBN: 9781787286702, 1787286703