This article describes a frequency and sentiment analysis of real-time tweet streams about the four main candidates in the US presidential election.
The main objective was to deploy and test the available connector between Apache NiFi and Apache Spark, so I decided to implement the following use case:
- Using Apache NiFi, get filtered tweets related to the US presidential election
- Using Apache Spark, get the stream of tweets from Apache NiFi
- Use Spark Streaming to transform the data and store it in an Apache Hive table
- Use an Apache Zeppelin notebook to display real-time results
In the end, I get real-time analytics such as:
- frequency of tweets over time per candidate
- percentage of negative, positive and neutral tweets per candidate
- opinion trends over time for each candidate
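
To make the first two metrics in the list above concrete, here is a minimal sketch in plain Python, not the Spark and Hive implementation described in the article. It assumes each tweet has already been reduced to a timestamp, a candidate, and a sentiment label. The record layout, the placeholder candidate names, the timestamps, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, made-up records: (timestamp, candidate, sentiment label).
tweets = [
    (datetime(2000, 1, 1, 12, 0, 5), "candidate_a", "positive"),
    (datetime(2000, 1, 1, 12, 0, 40), "candidate_a", "negative"),
    (datetime(2000, 1, 1, 12, 1, 10), "candidate_a", "neutral"),
    (datetime(2000, 1, 1, 12, 1, 30), "candidate_a", "positive"),
    (datetime(2000, 1, 1, 12, 0, 20), "candidate_b", "negative"),
    (datetime(2000, 1, 1, 12, 1, 50), "candidate_b", "negative"),
]

SENTIMENTS = ("negative", "neutral", "positive")


def tweets_per_minute(records):
    """Count tweets per (candidate, minute) bucket."""
    counts = Counter()
    for timestamp, candidate, _ in records:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(candidate, minute)] += 1
    return counts


def sentiment_percentages(records):
    """Return the share of each sentiment label per candidate, in percent."""
    by_candidate = defaultdict(Counter)
    for _, candidate, sentiment in records:
        by_candidate[candidate][sentiment] += 1
    result = {}
    for candidate, counts in by_candidate.items():
        total = sum(counts.values())
        result[candidate] = {
            label: 100.0 * counts[label] / total for label in SENTIMENTS
        }
    return result


frequency = tweets_per_minute(tweets)
percentages = sentiment_percentages(tweets)
```

In a streaming setup the same grouping would typically run over each micro-batch or time window rather than over a fixed list, and computing the sentiment share per time bucket instead of over the whole data set would give the opinion trend.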
The article is available on the Hortonworks Community Connection website. As always, please feel free to comment or ask questions.