*** Data Scientist in the making. ***
Currently, Kapil is working as a Research Engineer/Data Scientist in the Maps Search team at Apple.
He graduated from the Computer Science department of The Johns Hopkins University.
Prior to Apple, he worked in the Data Discovery/Data Science team at Eventbrite. He worked on various big data and machine learning problems, including recommendation and classification of events. He also worked on improving the relevancy of search results.
He has strong fundamentals in data structures and algorithms. He loves working on big data and solving search and machine learning problems.
He has taken formal courses in the fields of Machine Learning, Natural Language Processing, and Information Retrieval and Extraction, and is ready to apply the knowledge gained in his coursework to real-life problems.
Before Eventbrite, he worked at AT&T Interactive, also known as YellowPages.com. He worked in the Data Insight group, where his role was to mine user query logs. The team was also responsible for finding patterns in user searches that help improve search relevancy. He gained substantial experience working with Hadoop and Hive. He also gained experience with SOLR/Lucene.
His interests lie in applying machine learning, natural language processing and information retrieval methodologies to big data by leveraging the power of Hadoop, and he sees himself working as a Data Scientist in the future.
Specialties:
SOLR,
Lucene,
Cassandra,
HBase,
Hadoop,
Hive,
Natural Language Processing,
NLP,
Information retrieval,
Machine learning,
Java,
Python,
Big data,
Recommendations,
Scikit Learn,
Neo4j,
Gephi
Research Engineer/ Machine Learning @ Improving Apple Maps and helping people find places by training one query at a time.
Working on machine learning methodology at various layers of result retrieval, ranking and selection. Also working on the Lucene backend pipeline to improve relevancy and ranking of results.
From July 2014 to Present (1 year 6 months)
Big Data, Search and Data Science Engineer @ Big Data, Search, Recommendation, Machine Learning. Data discovery.
Improving event classification using Machine Learning techniques.
Working on SOLR to improve search relevancy.
Recommending events using social graph.
Big data Hadoop pipeline: building data pipelines using Hadoop for query completion, spell checking and predicting user locations.
Writing Hadoop pipelines to solve data mining problems.
Logistic Regression,
LibLinear, Octave
Hadoop
Java
Python
Django
Redis
Hive
SOLR
Lucene
Machine Learning,
Cassandra,
Scikit learn,
Neo4j,
Gephi
From July 2012 to July 2014 (2 years 1 month) San Francisco Bay Area
Senior Software Engineer with Data Insight Team @ Working with the Data Insight Team on query log analysis, click-through rates, etc.
Worked on projects to mine user query/search logs and suggest spelling corrections and related queries to users.
Kapil extended an existing data library, making it useful for reading Hive tables inside map-reduce jobs.
Currently, Kapil is implementing the next version of the Search Offline Simulator (SOS). SOS is built upon the Hadoop map/reduce paradigm and mines users' query logs. It is a useful product that outputs complex Dimensions and Metrics to the user, converting business requirements into useful information. SOS is also used as a pre-step to A/B testing, where a comparison can be made between the current production version and a new candidate search version. Using a probabilistic CTR model, it helps determine how well (or badly) the new version will perform.
On an ad-hoc basis, Kapil is responsible for providing useful insights into user query logs and click-through data using Hive.
Worked on improving ranking algorithms for local search. Worked on an end-to-end search engine using SOLR.
Hadoop, Map reduce
Hive
Scribe
From January 2011 to July 2012 (1 year 7 months) San Francisco Bay Area
Software Engineer for Search @ I am mainly handling the search aspect of the product. I am working on SOLR and Lucene.
I designed and implemented the search and hashtag functions on the site.
I write a lot of code in PHP on the server side. I am also working on some ad-hoc Python code deployment and mailer projects.
I also write a lot of stored procedures and SQL queries in MySQL.
XMPP and JS are some other cool technologies I am working on.
From December 2009 to September 2010 (10 months)
Software Intern at Aleph Point @ I am working as a software developer for Aleph Point. The work is challenging and involves a great deal of knowledge of SOLR/Lucene.
From June 2009 to February 2010 (9 months)
Team member at CLSP'09 workshop @ During the summer of 2009, I volunteered to work in the CLSP workshop. I was part of the n-gram team, and under the tutelage of Prof. Satoshi Sekine we built an n-gram search engine.
More details can be found here:
http://www.cs.jhu.edu/~kapild/files/projects.html#nsearch
Publications:
1) N-gram Search Engine with Patterns Combining Token, POS, Chunk and NE Information, Proceedings of LREC, 2010
by Satoshi Sekine, Kapil Dalwani
2) New Tools for Web-Scale N-grams, Proceedings of LREC, 2010
by Dekang Lin, Ken Church, Heng Ji, Satoshi Sekine, David Yarowsky, Shane Bergsma, Kailash Patil, Emily Pitler, Rachel Lathbury, Vikram Rao, Kapil Dalwani and Sushant Narsale
From June 2009 to July 2009 (2 months)
Product Engineer @ I worked on 2 projects at Coreobjects. Both projects were built from scratch.
The first was an audio sharing and streaming website built on a small social network. It was built on an MVC architecture with Spring, Hibernate and Struts as its major components.
The second project was an Eclipse plugin that took input from the user to generate code on the fly using FreeMarker templates. The final product was a website built on Adobe Flex.
From December 2006 to January 2008 (1 year 2 months)
Software Engineer @
From August 2004 to November 2006 (2 years 4 months)
MS, Computer Science @ The Johns Hopkins University
From 2008 to 2009
B.E., Electrical and Electronics Engineering @ Punjab Engineering College
From 2000 to 2004
Kapil Dalwani is skilled in: Java, Data Mining, Machine Learning, Natural Language Processing, Information Retrieval, Solr, Hadoop, MapReduce, Algorithms, Big Data, scikit, Hive, Recommendation, Recommender Systems, Panda
Websites:
http://www.github.com/kapild,
http://kapilddatascience.wordpress.com,
http://www.apple.com