• Active Apache Spark contributor, especially to MLlib (machine learning). Expertise in distributed data processing and the Apache Hadoop ecosystem, including Spark, HBase, YARN and Flink.
• Rich experience implementing sophisticated machine learning algorithms on distributed platforms (LDA, neural networks/deep learning, distributed linear algebra, etc.). Practical understanding of machine learning fundamentals and graph analytics, with fast prototyping skills.
• Expertise in large-scale database system design and development, including data warehouse and reporting services. In-depth knowledge of index management, locking/concurrency/transactions, data partitioning and replication. Skilled at locating bottlenecks and tuning performance.
• Solid Java/Scala programming skills (6 years).
• Experience with C#/C++, Python, ASP.NET MVC, and Azure (cloud) SQL/Storage/Cache development.
• Hands-on experience with JavaScript, JSON, Node.js and AngularJS.
Software Engineer @ Provided extensive implementation, consulting and tuning advice to external industry Spark users. Drove and delivered the implementations of:
1) Artificial Neural Network and Convolutional Neural Network.
2) Recommendation system (collaborative filtering and PageRank), based on Spark GraphX.
3) HDP (Hierarchical Dirichlet Process) for topic modeling.
Active involvement in the development of Apache Spark, especially for MLlib (machine learning library), Streaming and core Spark.
1) Primary author of online LDA in MLlib.
2) Primary author of distributed linalg (matrix/vector) algorithms like QR Decomposition.
3) Primary author of ML (pipeline) feature components, including MinMaxScaler, CountVectorizer and several other NLP pre-processors.
4) Contributions to clustering, general linear models, feature transformation, pipelines and many other components.
Public speaking
1) Delivered Spark MLlib training at AMPCamp 2015 Shanghai.
2) Presentation on topic modeling at China Hadoop Summit 2015.
3) Presentation about Spark MLlib 2015 in Spark forum for global Intel.
From December 2014 to Present (11 months) Shanghai City, China
SDE @ I was one of the two primary developers of Curah! (www.curah.com/), a technical sharing site based on Windows Azure. Committed core features as well as critical infrastructure improvements.
Previously I was a member of the Commerce Transaction Platform, focusing on massive data processing and a robust, large-scale financial ETL system.
From April 2012 to December 2014 (2 years 9 months) China
Developer Intern of Jazz Process Authoring Team at Rational Software @ Short-term intern; helped develop Eclipse plugins and an administration website.
From July 2010 to August 2010 (2 months) Beijing
Intern & Campus Ambassador @ Learned and promoted Sun technologies (Java, MySQL, Solaris) on campus. Hosted and participated in open source activities at local sites.
From July 2008 to December 2009 (1 year 6 months)
Master's degree, Computer Science @ Zhejiang University From 2009 to 2012
Bachelor of Engineering (BEng), Computer Science @ Zhejiang University From 2005 to 2009
Yuhao Yang is skilled in: Apache Spark, Machine Learning, Big Data Analytics, Hadoop, Data Warehousing, SQL Tuning, Scala, Java, C#, HBase, Solaris 8/9/10, Data Mining, Windows Azure, JavaScript, Website Development, ASP.NET MVC, AngularJS, Node.js