Software Development Engineer @
1. Media Analytics, Lambda Architecture (Yahoo Recommends, Internal Content Insights, etc.)
• Designed and implemented a real-time data pipeline using open-source technologies (Storm, Kafka, and Druid), powering Yahoo Recommends (Recommendation as a Service), Internal Content Insights (content performance metrics), and Photon (page latency monitoring), and exposed RESTful APIs for users
• Designed and implemented a metadata service to join with fact data using Redis-based technology
• Migrated a major high-latency Hadoop ETL pipeline to a new source feed
2. Yahoo Gemini, Native/Search Ads Platform (Application, Reporting and Forecasting)
• Joined the team as demand grew 3x within one month and 10x within three months due to the Microsoft search deal
• Designed and implemented a rich entity model for controlling the ad quality review flow and an emergency unclogging system
• Stabilized the existing Oracle-based reporting system under high demand (8 million queries per day) by designing and implementing server-side async support, traceable error handling, and separation of sync and async use cases
• Experimenting with alternative technologies (Druid, Spark, Presto, Drill, etc.) for building a new architecture with a tiered storage layer that supports separation of use cases to accommodate future growth in demand
From July 2014 to Present (1 year 4 months) Sunnyvale, CA
Graduate Assistant Researcher @
• Designed and implemented a large-scale geodata processing tool in Python for highway network conflation (48 states and over 100K links per state), geocoding, and quality assurance, using geometric algorithms to achieve high efficiency while maintaining accuracy
• Designed and maintained a geospatial database of geographical locations, speed stations, highway polylines, etc.
• Conducted statistical programming, data mining, and visualization for congestion measurement and highway performance analysis from large-scale traffic data
From September 2010 to April 2014 (3 years 8 months) College Station, TX
Master of Science (M.S.), Computer Science, 4.0/4.0 @ Texas A&M University, College Station
From 2012 to 2013
BS, Land Resources Management @ Zhejiang University
From 2005 to 2009
Jian Shen is skilled in: Java, Spring Framework, Hibernate, Web Development, Software Engineering, Algorithms, Linux, Big Data Analytics, MySQL, Apache Storm, Apache Kafka, Apache ZooKeeper, Druid, Apache Spark, Scala, Django, Python, C++, Eclipse, GitHub, Socket Programming, Machine Learning, Hadoop, Scrum, Amazon Web Services..., Information Retrieval, ArcGIS, Google App Engine