An Interactive 3D Recommender for TED Talks

Imagine that all TED talks form a universe, where each galaxy is a cluster of talks on similar topics. Wouldn’t it be fun to create a map of the galaxies so that we can navigate visually? Using topic modeling and Google’s Embedding Projector, I created an interactive 3D map of the TED talks, in which people can explore topics they are interested in and get intuitive recommendations. Tools applied include NLTK, Gensim, Word2Vec, scikit-learn, and pymongo. (Click here to jump to the recommender demo.)

Edible Mushrooms - Classifier Comparison

What are the most important features that distinguish edible mushrooms from poisonous ones? Can classifiers that perfectly divide a dataset be our safe guide in the field? Here’s a story of data analysis meeting practical needs, with human error taken into account. Tools applied include scikit-learn, AWS, PostgreSQL, D3, and Flask.

Predicting Sediment Concentration in the Mississippi River

The second project has two major components: web scraping (BeautifulSoup, Selenium) and linear regression (scikit-learn, StatsModels). My topic came from my previous experience as a geomorphologist and coastal engineer: is it possible to predict sediment concentration in the lower Mississippi River using upstream hydrological data?

Metis Project 1 - First Week & Pandas

It has been a short and productive week. For our first project, we dove right into Pandas and provided recommendations for an NGO in New York City using MTA turnstile data and more.
