Course Details
Subject {L-T-P / C} : CS4109 : Advanced Database {3-0-0 / 3}
Subject Nature : Theory
Coordinator : Prof. Sambit Bakshi
Syllabus
Big Data introduction: definition and taxonomy, value for the enterprise, Setting up the demo environment, First steps with the Hadoop ecosystem
The Hadoop ecosystem: Introduction to Hadoop, Hadoop components: MapReduce/Pig/Hive/HBase, Loading data into Hadoop, Handling files in Hadoop, Getting data from Hadoop, Querying big data with Hive, Introduction to the SQL Language, From SQL to HiveQL, NoSQL
Big data & Machine learning: Quick intro to Machine learning, Big Data & Machine Learning, Machine learning tools, Spark & SparkML, H2O, Azure ML
Case Studies in NLP and Healthcare
Course Objectives
- introduce students to the concepts and challenges of big data
- teach students to apply skills and tools to manage and analyze big data
Course Outcomes
Upon completion of the subject, students will be able to:
(a) understand the concepts and challenges of big data and why existing technology is inadequate for analyzing it;
(b) collect, manage, store, query, and analyze various forms of big data;
(c) gain hands-on experience with large-scale analytics tools to solve some open big data problems; and
(d) understand the impact of big data on business decisions and strategy.
Essential Reading
- Dipanjan Sarkar, Text Analytics with Python, Apress
- Anil Maheshwari, Big Data, First Edition, McGraw Hill
Supplementary Reading
- EMC Educational Services, Data Science and Big Data Analytics, Wiley
- Vikas Kumar, Healthcare Analytics Made Simple, Packt Publishing Limited, ISBN: 9781787286702, 1787286703