Deeply interested in Big Data and Cloud Computing; I enjoy learning new technologies and applying them in real life.
In-depth knowledge and experience building "big data" analytics, processing petabytes of activity data: Kafka, Cassandra, and Spark for real-time analytics; HBase and Hive for ad-hoc queries; Hadoop MapReduce, Pig, and Scalding for offline batch jobs.
Areas of Interest:
* Massive Open Online Course (MOOC), Large-scale Data Analytics, Real-Time Stream Processing, Distributed Systems, Web Search.
Open Source enthusiast; work seamlessly with Scala, Cassandra, Node.js, Memcached, Couchbase, Solr, Spring MVC, Rest.li, etc.
Senior Software Engineer @ Work on the Ads engineering team, building a high-performance, scalable online advertisement platform and delivering reliable solutions for end users.
Built Ads intelligence in a hybrid batch/real-time mode: batch processing over historical data runs on Hadoop in the offline layer, an in-house real-time index serves the speed layer, and the results from both layers are merged into a final analytic. Also built a real-time monitoring system for Ads tracking that validates the quality and reliability of the data pipeline.
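The batch/speed-layer merge described above follows the general lambda-architecture pattern. A minimal sketch of that merge step, with hypothetical data and function names (not the actual in-house system):

```python
# Lambda-architecture merge sketch (hypothetical, illustrative only):
# the batch layer holds metrics precomputed over historical data, the
# speed layer holds metrics for events since the last batch run.

def merge_views(batch_view, speed_view):
    """Combine per-key counts from the offline and speed layers."""
    merged = dict(batch_view)
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch_view = {"ad_1": 1000, "ad_2": 500}   # e.g. from a Hadoop batch job
speed_view = {"ad_2": 25, "ad_3": 7}       # e.g. from a real-time index

print(merge_views(batch_view, speed_view))
# {'ad_1': 1000, 'ad_2': 525, 'ad_3': 7}
```

The design keeps the two layers independent: the batch view can be recomputed from scratch at any time, while the speed layer only covers the window since the last batch run.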
Work on Ads APIs for campaign management, reporting, targeting, and sponsored content (https://developer.linkedin.com/ads). From October 2013 to Present (2 years 3 months), San Francisco Bay Area.

Data Engineer @ Help scale out the data infrastructure for Coursera, an education platform that partners with top universities and organizations worldwide, with the mission of universal access to the world's best education.
Promote the effective and responsible use of data for:
- improving educational outcomes
- informing business decisions
- shaping company culture and communications
Built reliable data warehousing on the AWS cloud using RDS, EC2, Redshift, S3, EMR, and Data Pipeline. From September 2014 to Present (1 year 4 months), San Francisco Bay Area.

Software Engineer @ - Built an Ads analytics system using Hadoop MapReduce, Pig, Azkaban, and Kafka; completed the near-real-time workflow for tracking, reporting, and monitoring, and provided comprehensive dashboard metrics.
- Worked on "Sponsored Updates" (https://www.linkedin.com/ads/), a project driving a revolution in content marketing; designed and implemented the end-to-end platform.
- Contributed to the "LinkedIn Targeted Sponsored Content Platform" and integrated it with the Ads RESTful APIs. From August 2012 to October 2013 (1 year 3 months), San Francisco Bay Area.

Research Assistant @ Worked on "Starfish", a self-tuning analytics system on Hadoop.
Optimized Hadoop performance on Amazon EC2.
Built a cost model for MapReduce jobs and an optimizer for automatic job-configuration tuning.
Implemented a visualizer to demonstrate the profiler, what-if engine, and optimizer.
Applied optimizations to MapReduce workflows of data-parallel jobs with Pig and Cascading. From September 2010 to July 2012 (1 year 11 months).

Software Engineer Intern @ Worked on the High Performance Computing team in the EC2 group.
Designed and implemented an automatic regression-test framework for the cloud stack using S3, SQS, Elastic MapReduce, RDS, Redshift, and DynamoDB. From May 2011 to August 2011 (4 months).

Software Engineer @ Worked on the Infrastructure Team.
Core developer of Baidu App Engine: supported the PHP runtime in a sandbox, providing an elastic and secure environment for cloud computing.
Web server consultant: developed modules for firewalling, embedded scripting, and load balancing. From January 2010 to August 2010 (8 months).

Software Engineer @ Worked at a startup on an evolution of the real-time search engine.
Developed vertical search products for real estate search, ticket search, and job search.
Built a distributed crawler system with feature extraction and text classification. From July 2007 to December 2009 (2 years 6 months).
Education:
MS (Master's degree) @ Duke University, from 2010 to 2012
BS @ From 2003 to 2007
Websites:
http://www.cs.duke.edu/~dongfei/,
http://dongfeiwww.com/,
http://www.zhihu.com/people/dongfei