Hadoop and Spark Developer at Quad/Graphics
Greater Milwaukee Area
Quad/Graphics
Hadoop and Spark Developer (Data Engineer)
Sussex
Community Health Systems
Hadoop and Spark Developer
February 2017 to March 2018
Franklin
Atlantis University
Research Analyst - Big Data Developer (Internship)
February 2016 to November 2016
Miami/Fort Lauderdale Area
📖 Summary
Hadoop and Spark Developer (Data Engineer) @ Quad/Graphics
• Developing applications and analytic tools using Apache Spark with Scala and PySpark. Technologies used in the project include Sqoop, HDFS, YARN, Spark 2.0, Spark SQL, Flume, Hive, Impala, Kafka, Spark Streaming, Kudu, Oozie, and Airflow.
• Owned end-to-end development, including coding, testing, debugging, and deployment.
• Strong analytical, troubleshooting, and problem-solving skills; experience in analyzing and understanding business/technology system architectures.
• Importing and exporting data between relational databases and HDFS, both by developing Spark applications and by using Sqoop.
• Developed continuous data ingestion pipelines with StreamSets to pull data from Elasticsearch into HDFS.
• Built an ETL system in Hadoop to process raw data arriving over SFTP, using custom Spark 2.0 applications in Scala that apply complex transformations at different levels before storing the results in HDFS.
• Developing Spark applications in Scala and PySpark to support analytics, predictive, and ETL/ELT workloads.
• Developing, deploying, and monitoring application and server performance for the developed applications (YARN, Spark).
• Performance tuning of Spark, MapReduce, and other applications within Hadoop.
• Experience using different file formats: SequenceFile, Avro, and Parquet.
• Managing and reviewing Hadoop log files.
• Experience developing Delta Lake (Databricks) applications to purge data (e.g., delete, update, merge, vacuum).
• Design and development of applications utilizing data streaming (Spark Streaming, Kafka, Flume).
• Automating applications with shell scripts and scheduling them with Rundeck, Airflow, and cron jobs.
• Experience in the design, development, and implementation of large-scale data systems based on Hadoop (Cloudera).
• Experience with Cloudera Data Science Workbench (CDSW).
• Experience with data visualization tools (Arcadia Data, Tableau, Alteryx).
• Experience with Git and Jenkins.
• Familiar with Agile software development (Scrum).
Sussex

Hadoop and Spark Developer @ Community Health Systems
• Experienced in development using the Cloudera distribution.
• As a Hadoop developer, responsible for managing the data pipelines and the data lake.
• Performed Hadoop ETL using Hive on data at different stages of the pipeline.
• Imported data from different source systems with Sqoop and automated the imports with Oozie workflows.
• Generated business reports from the data lake using Impala (SQL on Hadoop) according to business needs.
• Automated business reports on the data lake using Bash scripts on Unix, sending them to business owners.
• Developed Spark Scala code to cleanse and perform ETL on the data at different stages of the data pipeline.
• Worked in different environments (DEV, QA, data lake, and analytics cluster) as part of Hadoop development.
• Snapped the cleansed data to the analytics cluster for business reporting.
• Developed Pig and Python scripts to perform streaming and created tables on top of it using Hive.
• Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark and SQL.
• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
• Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
• Handled importing of data from various data sources, performed transformations using Hive and Spark, and loaded data into HDFS.
• Developed Pig, Hive, Sqoop, Hadoop Streaming, and Spark actions in Oozie for workflow management.
• Supported MapReduce programs running on the cluster.
• Experienced in collecting, aggregating, and moving large amounts of streaming data into HDFS using Flume.
• Good understanding of the workflow management process and its implementation.
• Knowledge of HL7 protocols and of parsing HL7 messages.
• Involved in developing frameworks used in data pipelines, and coordinated with a Cloudera consultant.
From February 2017 to March 2018 (1 year 2 months)
Franklin

Research Analyst - Big Data Developer (Internship) @ Atlantis University
• Involved in creating, partitioning, and bucketing tables in Hive.
• Created Hive tables and worked with them using HiveQL.
• Worked with NoSQL (HBase) to support enterprise production, loading data into HBase using Hive and Sqoop.
• Responsible for managing data coming from different sources.
• Ran multiple MapReduce jobs through Pig and Hive for data cleaning and pre-processing.
• Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
• Experience importing and exporting data into HDFS and Hive using Sqoop.
From February 2016 to November 2016 (10 months)
Miami/Fort Lauderdale Area
What company does Srikanth Kondabathini work for?
Srikanth Kondabathini works for Quad/Graphics
What is Srikanth Kondabathini's role at Quad/Graphics?
Srikanth Kondabathini is a Hadoop and Spark Developer (Data Engineer) at Quad/Graphics.
What industry does Srikanth Kondabathini work in?
Srikanth Kondabathini works in the industry.
Who are Srikanth Kondabathini's colleagues?
Srikanth Kondabathini's colleagues are Charlene Warras, Christian Hummel, Corey Lone, Denise Campbell, Sergio Javier Slep, Greg Gohr, Teresa Price, Steve Jaeger, David Moffat, and Jason Poteet