Course Info

  • Category: Trending Courses
  • Trainer: Silvia Priyadharshini
  • Duration: 50 Hours
  • Timing: 10 AM to 1 PM (Weekdays)

Big Data Spark and Scala

Big Data

• Understanding Data & Hadoop: Basic Concepts
• What is Big Data?
• Characteristics of Big Data
• Challenges with Traditional Systems
• Problems with Big Data
• Handling Big Data

HADOOP Core Concepts

• Problems with Existing Distributed Systems in Dealing with Big Data
• Why Hadoop: An Overview and History of Hadoop
• Requirements of a New Approach
• The Hadoop Project and Hadoop Components

Scala Basics

• Scala Installation
• Concepts of classes in Scala
• Object orientation in Scala
• Primitive data types
• Scala's Simple Build Tool (SBT)
• Functional programming in Scala – closures, currying, anonymous functions
• Exploring mutable and immutable variables
• Executing Scala code through the REPL or CLI
• Working with basic programming constructs
• Collections – Array, Set

Apache Spark

• Introduction to Apache Spark
• Hadoop vs Spark
• Why Spark
• Spark vs. MapReduce
• Batch vs. Real-Time Big Data Analytics
• Spark Installation and Configuration
• Spark Execution Architecture
• Components of Spark – SQL, Streaming, MLlib, GraphX
• Understanding Spark Context
• Resilient Distributed Dataset (RDD) – partitions, features, parallelism

Working with RDDs

• RDD operations – Transformations and Actions
• RDD deep dive – persistence/caching, lineage
• Types of RDD – pair RDD, chain RDD
• Spark API programming
• Executing a Spark program with SBT and spark-assembly
• Understanding spark-submit
• Running a Spark program in local mode and on a cluster

Spark SQL – structured data (Hive with Spark SQL) – batch processing

• Spark SQL overview
• Understanding DataFrames and Datasets
• DataFrames vs. RDDs
• Processing data using DataFrames
• Hive Context
• Custom case classes
• Temp tables vs. persistent tables
• Inferring schema programmatically
• Querying files as tables – CSV, text, JSON, Parquet
• Standard transformations in querying
• Analytic and window functions in SQL
• How Spark SQL works in native and Hive contexts

Spark Streaming – unstructured data, real-time processing

• Features of Spark Streaming
• Understanding DStreams
• Use case 1: Streaming data from a netcat server
• Use case 2: Flume and Spark Streaming integration
• Use case 3: Kafka and Spark Streaming integration (Kafka is a messaging service)
• Sliding window operations
• Transformers and Estimators