
DataZCW-Final-Project

Final project for ZCW Data's course - NLP Covid-19 Sentiment Pipeline/Dashboard


NLP Covid-19 Sentiment Pipeline

For our final project at Zip Code Wilmington, we chose to analyze sentiment about COVID-19 from two sources: articles from News API and tweets from Twitter's API. For News API, we use Airflow to gather new articles about COVID-19 every hour. For the Twitter API, we use Kafka to produce a stream of tweets about COVID-19.
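The hourly News API pull can be illustrated with a small helper that computes the previous one-hour window and builds the request URL for it. This is a minimal sketch, not the project's DAG code: it assumes News API's `/v2/everything` endpoint with its `q`, `from`, `to`, `language`, `sortBy` and `apiKey` query parameters, and the function names `hourly_window` and `build_news_query` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# News API "everything" search endpoint (assumed; the project may query a different one).
NEWS_API_URL = "https://newsapi.org/v2/everything"


def hourly_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the one-hour UTC window ending at the top of the hour of run_time.

    run_time should be timezone-aware so the conversion to UTC is unambiguous.
    """
    end = run_time.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def build_news_query(run_time: datetime, api_key: str, query: str = "COVID-19") -> str:
    """Build a News API request URL for articles published in the previous hour (hypothetical helper)."""
    start, end = hourly_window(run_time)
    params = {
        "q": query,
        "from": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "to": end.strftime("%Y-%m-%dT%H:%M:%S"),
        "language": "en",
        "sortBy": "publishedAt",
        "apiKey": api_key,
    }
    # urlencode percent-escapes reserved characters such as ':' in the timestamps.
    return f"{NEWS_API_URL}?{urlencode(params)}"
```

In an hourly Airflow task, the run's logical time could be passed as `run_time`, so each run requests only the articles published since the previous run; the resulting URL can then be fetched with any HTTP client.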

After acquiring this data, we run it through a VADER model to analyze the sentiment of both the news articles and the tweets, then store the results in a SQL database. We use Airflow to continuously clean the data and show our results with various visualization tools. Check out our pipeline below.
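VADER's `polarity_scores(text)` returns a dict whose `"compound"` entry is a normalized score between -1 and 1. The sketch below shows one way to turn that score into a label and store it, assuming the conventional VADER cut-offs of ±0.05 and using Python's built-in `sqlite3` as a stand-in for the project's SQL database. The table name, columns, and the helpers `label_sentiment` and `store_scores` are hypothetical.

```python
import sqlite3

# Conventional VADER cut-offs on the compound score (assumed; the project may use others).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_sentiment(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def store_scores(conn: sqlite3.Connection, rows: list[tuple[str, str, float]]) -> None:
    """Insert (source, text, compound) rows plus their label into a hypothetical table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sentiment (
               source TEXT, text TEXT, compound REAL, label TEXT)"""
    )
    conn.executemany(
        "INSERT INTO sentiment (source, text, compound, label) VALUES (?, ?, ?, ?)",
        [(src, txt, score, label_sentiment(score)) for src, txt, score in rows],
    )
    conn.commit()
```

In the real pipeline, each compound value would come from something like `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` (from the `vaderSentiment` package), and the connection would point at the project's SQL server rather than SQLite.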


Pipeline Flow

Pipeline


Meet the team

Apoorva Shukla

GitHub
Connect on LinkedIn

James Kocher

GitHub
Connect on LinkedIn

James Thompson

GitHub
Connect on LinkedIn


APIs Used

News API
Twitter API

Frameworks Used

Apache Airflow
Apache Kafka
VADER

Where to start

To run this project, follow these steps.

-Set up a dotenv file with the appropriate keys for News API and Twitter API.

-Change your directory to StartFile and run the following commands on your command line:

-From the airflow_dag directory, copy the file "final_project_dag.py" into the dags folder of your Airflow home, and set up your dotenv file in the same folder. Start the Airflow webserver and scheduler, then turn on final_project_dag.

-Change directory to the twitter_kafka folder and start your Kafka ZooKeeper and server. Then run consumer.py and producer.py simultaneously.

-Open the visualization software and watch as results pour in on national sentiment toward COVID-19.
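Kafka stores message values as raw bytes, so producer.py and consumer.py have to agree on an encoding. The sketch below assumes JSON over UTF-8 and keeps only a few tweet fields; the field selection and the names `serialize_tweet` and `deserialize_tweet` are hypothetical. If the project uses the kafka-python client (an assumption), functions like these can be passed as `value_serializer=` to `KafkaProducer` and `value_deserializer=` to `KafkaConsumer`.

```python
import json
from typing import Any

# Tweet fields forwarded through Kafka (hypothetical selection).
TWEET_FIELDS = ("id", "created_at", "text")


def serialize_tweet(tweet: dict[str, Any]) -> bytes:
    """Producer side: keep the selected fields and encode them as UTF-8 JSON bytes."""
    payload = {key: tweet.get(key) for key in TWEET_FIELDS}
    # ensure_ascii=False keeps non-ASCII text readable; UTF-8 encoding makes it bytes.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def deserialize_tweet(raw: bytes) -> dict[str, Any]:
    """Consumer side: decode UTF-8 JSON bytes back into a dict."""
    return json.loads(raw.decode("utf-8"))
```

Keeping the encoding in two small functions means both scripts can import the same definitions, so a change to the message format cannot silently break only one side of the stream.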