Course Details
Subject {L-T-P / C} : CS4109 : Advanced Database { 3-0-0 / 3}
Subject Nature : Theory
Coordinator : Sambit Bakshi
Syllabus
Big Data introduction: definition and taxonomy, value for the enterprise, setting up the demo environment, first steps with the Hadoop ecosystem
The Hadoop ecosystem: Introduction to Hadoop, Hadoop components: MapReduce/Pig/Hive/HBase, Loading data into Hadoop, Handling files in Hadoop, Getting data from Hadoop, Querying big data with Hive, Introduction to the SQL language, From SQL to HiveQL, NoSQL
Big data & Machine learning: Quick intro to machine learning, Big data & machine learning, Machine learning tools, Spark & SparkML, H2O, Azure ML
Case studies in NLP and health care
Course Objectives
- introduce students to the concepts and challenges of big data
- teach students to apply skills and tools to manage and analyze big data
Course Outcomes
Upon completion of the subject, students will be able to:
(a) understand the concepts and challenges of big data and why existing technology is inadequate for analyzing it;
(b) collect, manage, store, query, and analyze various forms of big data;
(c) gain hands-on experience with large-scale analytics tools to solve some open big data problems; and
(d) understand the impact of big data on business decisions and strategy.
Essential Reading
- Dipanjan Sarkar, Text Analytics with Python, Apress
- Anil Maheshwari, Big Data, First Edition, McGraw Hill
Supplementary Reading
- EMC Educational Services, Data Science and Big Data Analytics, Wiley
- Vikas Kumar, Healthcare Analytics Made Simple, Packt Publishing Limited, ISBN: 9781787286702, 1787286703