BS, Computer Science, Economics (minor), Graduated Highest Honors (Summa Cum Laude) @ Georgia Institute of Technology
I serve as the architect and manager for Yahoo's next-generation streaming and batch processing systems. The problems I tackle center on increasing scale, reducing latency, improving operability, and raising customer satisfaction, while driving data quality and engineering best practices. The systems I've worked on span intervals from streaming to monthly. The current scale is 100 billion events (40 TB of compressed input data) per day.
Senior Manager/Principal Engineer @ Yahoo
- Chosen to lead both the next-generation streaming and batch processing teams
- Prototyped batch processing with Spark
- Merged many fragmented data feeds into a single unified feed
- Designed the sessionization, join, and partitioning framework
- Created a single transformation library shared by the streaming and batch systems
- Created SQL-like DSL (select ... where ...) for creating Kafka Topics in Storm
- Handled planning and managed quarterly expectations for each team
- Was an author on four team submissions accepted for Tech Pulse (Yahoo's internal, annual technical conference)
- Defensive publication on smart partitioning
- Patent filed for sessionization methodology
- Received the 2014 Yahoo! Excellence Award for being in the top 10% of the company based on performance

From January 2014 to Present (1 year 10 months)

Principal Software Engineer @ Yahoo
- Prototyped processing to publish aggregates for various dimensions to Nielsen
- Handle training on how to use the team's data feeds and tools
- Redesigned large portions of the data generation pipeline
- Led a cross-team group to design and implement a new event model
- Leveraged HBase for large dimensional data look-ups for fact data
- Most importantly, I was our department's NCAA pool winner in 2012!

From March 2012 to January 2014 (1 year 11 months)

Senior Software Engineer @ Yahoo
- Migrated the entire legacy processing stack from in-house MapReduce processing to Hadoop MapReduce; procured an 800-node Hadoop cluster and maintained backward compatibility by publishing into an off-grid, filer-based warehouse
- Handle customer and product support
- Handle hardware procurement and capacity planning
- Extensive distributed data processing with Pig, Hadoop, and Yahoo-developed systems
- Participate in college recruiting
- 2011 Q4 "You Rock" team award
- 2011 college recruitment award

From March 2008 to March 2012 (4 years 1 month)

Software Engineer @ Yahoo
- Created Yahoo's first hourly audience data pipeline, which processed all of Yahoo's user traffic
- Created a dependency management tool to help schedule scripts, MapReduce-style jobs, and Unix commands

From March 2007 to March 2008 (1 year 1 month)

Associate Software Engineer @ Yahoo
- Migrated Yahoo's data processing stack from legacy daily data collection to an in-house near-real-time system (think Flume/Scribe)
- Reduced the runtimes of daily processing jobs (10 billion events, 2 TB of data) by several hours

From January 2006 to March 2007 (1 year 3 months)

Intern @ Yahoo
- Using Perl and Java, created a daily job that sampled Yahoo's Apache logs to analyze user traffic and discover new OSes and browsers from user-agent strings
- Evaluated third-party OS and browser classification tools

From May 2005 to August 2005 (4 months)

Intern @ Yahoo
- Using PHP and MySQL, created a web-based OLAP tool to analyze Yahoo's data warehouse metadata

From June 2004 to August 2004 (3 months)
NDO, Statistics, Computer Science, Management Science and Engineering, 4.075 @ Stanford University, 2009 to 2012
BS, Computer Science, Economics (minor), Graduated Highest Honors (Summa Cum Laude) @ Georgia Institute of Technology, 2001 to 2005
High School, International Baccalaureate, Graduated Salutatorian @ Campbell High School, 1997 to 2001

Michael Natkovich is skilled in: Hadoop, Distributed Systems, Perl, MapReduce, Unix, MySQL, Java, Web Analytics, Apache Pig, Shell Scripting, Scalability, HBase, Grid Computing, Agile Methodologies, Data Management, Apache Storm, Apache Kafka, Hive