Sr. Developer, Health Plan and Analytics
Senior Big Data Engineer (Contractor)
October 2018 to February 2019
Senior Consultant | Federal Health Care| Information Management (Full Time)
December 2014 to September 2018
Washington D.C. Metro Area
ETL Developer EDWH (Full Time)
August 2013 to December 2014
Senior Informatica Consultant
June 2012 to June 2013
Bank Alfalah Limited
Asst Manager ETL Developer (Full Time)
August 2003 to June 2011
Senior Informatica Consultant
July 2011 to May 2012
Saginaw, Michigan Area
Technologies: Hadoop, HBase, Hive, Scala, Spark, Sqoop, Flume, Kafka, and Python.
• Analyze large datasets (structured and unstructured data: XML, JSON, PDF) using the Hadoop platform.
• Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data.
• Manage and implement data processes (data quality reports).
• Develop data profiling, deduping, and matching logic for analyses.
• Programming experience in Python, PySpark, and Spark for data ingestion.
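The deduping and matching logic mentioned above is done in PySpark on this profile; as a minimal, tool-agnostic sketch of the idea in plain Python (the record fields and normalization rules here are hypothetical, for illustration only):

```python
import re
from collections import OrderedDict

def match_key(record, fields=("name", "email")):
    """Build a normalized matching key from selected fields:
    lowercase, trim, and strip non-alphanumeric characters."""
    parts = []
    for f in fields:
        value = str(record.get(f, "")).lower().strip()
        parts.append(re.sub(r"[^a-z0-9]+", "", value))
    return "|".join(parts)

def dedupe(records, fields=("name", "email")):
    """Keep the first record seen for each matching key."""
    seen = OrderedDict()
    for rec in records:
        key = match_key(rec, fields)
        if key not in seen:
            seen[key] = rec
    return list(seen.values())

rows = [
    {"name": "Jane Doe", "email": "JANE@X.COM"},
    {"name": "jane doe ", "email": "jane@x.com"},  # duplicate after normalization
    {"name": "John Roe", "email": "john@x.com"},
]
unique = dedupe(rows)
print(len(unique))  # 2
```

In a Spark job the same idea maps to deriving a normalized key column and applying `dropDuplicates` (or a window function to pick a survivor per key), which distributes the grouping across the cluster.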
What company does Salman Khan work for?
Salman Khan works for Kaiser Permanente
What is Salman Khan's role at Kaiser Permanente?
Salman Khan is a Sr. Developer, Health Plan and Analytics at Kaiser Permanente.
What industry does Salman Khan work in?
Salman Khan works in the Information Technology and Services industry.
Sr. Developer, Health Plan and Analytics @ Kaiser Permanente — Rockville, Maryland
Technologies: Hadoop, HBase, Hive, Scala, Spark, Sqoop, Flume, Kafka, and Python.
• Analyze large datasets (structured and unstructured data: XML, JSON, PDF) using the Hadoop platform.
• Develop Python, PySpark, and Spark scripts to filter, cleanse, map, and aggregate data.
• Manage and implement data processes (data quality reports).
• Develop data profiling, deduping, and matching logic for analyses.
• Programming experience in Python, PySpark, and Spark for data ingestion.

Senior Big Data Engineer (Contractor) @ M&T Bank — From October 2018 to February 2019 (5 months), Baltimore, Maryland
Responsible for building and maintaining data pipelines and data products that ingest and process large volumes of structured and unstructured data from various sources: analyzing data needs, migrating data into an enterprise data lake, and building data products and reports.
• Build real-time and batch ETL pipelines, with a strong understanding of big data technologies, distributed processing frameworks, and big data cluster architecture.
• Experience building and optimizing big data ETL pipelines; advanced programming skills in Python and Scala.
• Good knowledge of Spark internals and performance tuning of Spark jobs.
• Strong SQL skills; comfortable working with relational data models and structures.
• Access data via a variety of API/RESTful services.
• Experience with messaging systems such as Kafka, and with NoSQL databases such as Neo4j and MongoDB.
• Expertise with continuous integration/continuous delivery workflows and supporting applications; exposure to cloud environments and architectures (preferably Azure).
• Experience with containerization tools such as Docker; strong knowledge of Linux and Bash.
• Ability to work collaboratively with other teams.
• Design and develop ETL workflows to migrate data from varied sources, including SQL Server, Netezza, and Kafka, in batch and real time.
• Develop checks and balances to ensure the integrity of ingested data.
• Design and develop Spark jobs for data processing needs.
• Work with analysts and data scientists to help them build scalable data products.
• Design systems, alerts, and dashboards to monitor data products in production.

Senior Consultant | Federal Health Care | Information Management (Full Time) @ Deloitte — From December 2014 to September 2018 (3 years 10 months), Washington D.C. Metro Area
• Technical lead for ETL and MDM design, development, and data quality processes.
• Developed conceptual and physical models on CDH for budget and actuals.
• Analyzed data with Hive and Pig.
• Partitioned Hive tables using static and dynamic partitioning.
• Developed, validated, and maintained HiveQL queries.
• Wrote test cases for data migration from Hive to Hive and from mainframe to HDFS.
• Carried out all releases through the SDLC process.
• Designed Hive tables to load data to and from external tables.
• Wrote MapReduce jobs; imported and exported data in HBase.
• Drove workshops and discussions with clients.
• Designed the ETL architecture: a business model showing end-to-end data flow and the DWH/DM models, ensuring data existed with minimal redundancy and good relationships, reducing query times for reporting and increasing the feasibility of cube creation.
• Developed the ETL sourcing strategy, data validation, and data reconciliation processes.
• Developed process flows, technical design documents, and deployment cutover checklists.
• Established the ETL framework, enterprise naming standards, and best practices.
• Installed, configured, and administered Informatica PowerCenter.
• Coordinated with the delivery team to ensure high quality and timely delivery.

ETL Developer EDWH (Full Time) @ GE Capital — From August 2013 to December 2014 (1 year 5 months), New Orleans
• Developed new workflows to extract data from the EDW and deliver XML, flat files, and CSV to end customers over FTP by writing scripts at the session level.
• Translated business processes into Informatica mappings.
• Managed a team of offshore and onshore resources for the successful completion of the project.
• Created tables, keys (unique and primary), and indexes in SQL Server.
• Implemented Slowly Changing Dimension Type 1 and Type 2 for inserting and updating staging/target EDW tables to maintain history.
• Responsible for migrating code from development to staging and to production (deployments).
• Worked on workflow tasks such as sessions, event raise, event wait, decision, email, command, worklets, assignment, timer, and workflow scheduling.
• Used Workflow Manager for creating, validating, testing, and running sequential, parallel, initial, and incremental loads.
• Deployed objects; monitored and tracked applications against the scheduled runs.
• Prepared low-level designs (mapping documents) by understanding the CPMS system.
• Developed ETL logic and data maps for loading tables; designed one-time and incremental loads.
• Analyzed functional requirements provided by business analysts for code changes, and created workflows based on dependencies in Informatica.
• Unit tested data mappings and workflows; validated the data loaded into the database.
• Provided status reports for application monitoring and tracking; executed test cases for code changes.
• Provided extensive support in UAT (user acceptance testing) and deployment of mappings.
Environment: Informatica PowerCenter 9.5, Teradata, UNIX, Oracle 11g, mainframes, flat files, XML, VSAM files, sequential files, Hadoop ecosystem (Hive, Pig, MapReduce, and HDFS).

Senior Informatica Consultant @ Boeing — From June 2012 to June 2013 (1 year 1 month), Tulsa, Oklahoma
• Worked closely with all internal and external business partners (stakeholders) to deliver data warehouse solutions.
• Interacted with SAP functional/business analysts to understand business requirements; consulted on interface tools and methodology; defined fact and dimension tables.
• Converted mainframe files into flat files by defining their structure, and performance-tuned the workflows.
• Formulated ETL mappings to implement business logic using transformations such as Lookup, Update Strategy, Expression, Filter, Router, Aggregator, and Sequence Generator; implemented performance tuning at the mapping and session levels; used lookup cache and persistent cache extensively; used Sorter transformations to remove duplicate records before aggregation.
• Extensively used Informatica transformations such as Source Qualifier, Rank, SQL, Router, Filter, Lookup, Joiner, Aggregator, Normalizer, and Sorter, along with all transformation properties.
• Followed the required client security policies and approvals to move code between environments.
• Developed ETL mappings and transformations using Informatica PowerCenter 8.6.
• Deployed Informatica code and merged code between two different development teams.
• Identified and resolved bottlenecks in sources, targets, mappings, and sessions.
• Created automated scripts for data cleansing and data loading.
Environment: Windows XP/2008, mainframe, UNIX, Oracle 10g, SQL Server 2008, Informatica PowerCenter 9.1.0, PowerExchange, Visiprise HMS, SAP BusinessObjects.

Asst Manager ETL Developer (Full Time) @ Bank Alfalah Limited — From August 2003 to June 2011 (7 years 11 months), Punjab
• Interacted with data modelers and business analysts to understand the requirements and the ETL's impact on the business.
• Designed ETL specification documents for all projects.
• Extracted data from flat files, DB2, SQL Server, and Oracle to build an operational data source; applied business logic to load the data into the global data warehouse.
• Worked extensively on fact and Slowly Changing Dimension (SCD) tables.
• Maintained source and target mappings, transformation logic, and processes to reflect the changing business environment over time.
• Used transformations such as Filter, Router, Expression, Lookup (connected and unconnected), Aggregator, Sequence Generator, Update Strategy, Joiner, Normalizer, Sorter, and Union to develop robust mappings in Informatica Designer.
• Used shortcuts to reuse objects without creating multiple objects in the repository and to inherit source changes automatically.
• Implemented Informatica recommendations, methodologies, and best practices.
• Implemented performance tuning on targets, sources, mappings, and sessions for maximum efficiency.
• Involved in unit, integration, system, and performance testing.
• Wrote documentation describing program development, logic, coding, testing, changes, and corrections.
• Migrated code into QA (testing) and supported the QA and UAT (user) teams.
• Created detailed unit test documents with all possible test cases/scripts.
• Conducted code reviews of teammates' code before moving it into QA.
• Provided support in developing the entire warehouse architecture and planning the ETL process.
• Modified existing mappings to accommodate new business requirements.
Environment: Informatica PowerCenter, SSIS, Import Wizard, SQL Server, Windows, UNIX, Oracle, Card Pro, Bank Smart.

Senior Informatica Consultant @ Trinity Healthcare — From July 2011 to May 2012 (11 months), Saginaw, Michigan Area
• Interacted actively with business analysts and data modelers on mapping documents and the design process for various sources and targets.
• Provided technical, business, and management expertise to support the Department of Health and Human Services and the Centers for Medicare and Medicaid Services (CMS) in building and maintaining a comprehensive enterprise architecture program.
• Created BTEQ scripts for loading stage tables; developed a reconciliation process.
• Handled various flat-file issues successfully.
• Participated in code walkthroughs with end users and clients.
• Created FastExport/BTEQ scripts for report generation for downstream applications; used FastLoad and MultiLoad utilities to load staging-area tables.
• Generated and validated XMLs.
• Promoted code from one level to the next.
• Developed test plans at different levels (UT and SIT) using Autosys.
• Attended to production issues, resolving application problems or escalating database issues to the DBA.
• Contributed to complex ETL mappings and workflows for three mapping types: source to staging, dimension loading, and fact loading.
• Performance-tuned the ETL mappings and wrote SQL queries to extract data from source systems.
• Designed unit test cases for each ETL map; involved in the integration and system testing phases of the project.
• Developed shell scripts that populate parameter files automatically before the ETL process starts.
• Worked closely with data-population source systems and warehouse components.
• Defined target load order plans and constraint-based loading to load data correctly into different target tables.
• Used Lookup transformations to access data from tables that are not a source for the mapping, and unconnected Lookups to improve performance.
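Several of the roles above implemented Slowly Changing Dimension Type 1 and Type 2 loads in Informatica. As a minimal, tool-agnostic sketch of the Type 2 pattern in plain Python (the table, key, and column names here are hypothetical, not taken from the profile):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # sentinel end date for the current row version

def scd2_merge(dimension, incoming, key, tracked, load_date):
    """SCD Type 2 merge: when a tracked attribute changes, expire the
    current row version and insert a new one; brand-new keys are inserted.
    Unchanged rows are left untouched (history is preserved)."""
    current = {row[key]: row for row in dimension if row["end_date"] == HIGH_DATE}
    for rec in incoming:
        existing = current.get(rec[key])
        if existing is None:
            # New key: insert as the current version.
            dimension.append({**rec, "start_date": load_date, "end_date": HIGH_DATE})
        elif any(existing[a] != rec[a] for a in tracked):
            # Changed: close out the old version, open a new one.
            existing["end_date"] = load_date
            dimension.append({**rec, "start_date": load_date, "end_date": HIGH_DATE})
    return dimension

dim = [{"cust_id": 1, "city": "Lahore",
        "start_date": date(2020, 1, 1), "end_date": HIGH_DATE}]
scd2_merge(dim,
           [{"cust_id": 1, "city": "Karachi"}, {"cust_id": 2, "city": "Multan"}],
           key="cust_id", tracked=("city",), load_date=date(2021, 6, 1))
print(len(dim))  # 3: expired old version, new version, new customer
```

A Type 1 load is the degenerate case: overwrite the tracked attributes in place instead of expiring the row, so no history rows accumulate. In Informatica the same branching is typically expressed with a Lookup on the target plus an Update Strategy transformation.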