DataSpeaks: COVID-19 Communication Analytics by Africa CDC Event.

COVID-19 Communication Analytics by Africa CDC
This project analyzes how Africa CDC used Facebook and Instagram to communicate COVID-19 information across Africa. By collecting 224 posts with data scraping tools and analyzing engagement patterns in RStudio, the study evaluates the effectiveness of multimedia content, messaging strategies, and audience interaction. The insights offer recommendations for enhancing public health communication through social media.
By Mathew Shem
Biostatistician | Data Scientist | Founder, StatQuestJourney Hub
📌 Introduction
In the heat of the COVID-19 pandemic, effective communication was not just a need—it was a lifeline. As the African continent battled misinformation, poor health infrastructure, and a rising infection curve, Africa CDC turned to social media—Facebook and Instagram—to disseminate vital health messages.
This project, from scratch to insights, analyzes 224 Africa CDC social media posts to evaluate engagement, multimedia strategy, and public response. I collected and cleaned the data, performed in-depth analysis using RStudio, and interpreted the results in light of public health communication.
👇 Here’s a complete walkthrough of the process and insights.
🗃️ 1. Data Collection
📌 Sources:
Facebook: Posts scraped from Africa CDC's official page
Instagram: Posts scraped from @AfricaCDC
🔧 Tools Used:
BeautifulSoup4 (Python) – for parsing HTML data.
Instaloader (Python) – for scraping Instagram posts.
ParseHub (GUI) – for collecting Facebook posts with timestamps, likes, comments, and shares.
🧪 Fields Collected:
Post text, Date, Likes, Comments, Shares, Hashtags (#COVID19)
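As an illustration of how the Hashtags field can be derived from the post text, here is a minimal Python sketch. It is not the project's actual extraction code; the function names `extract_hashtags` and `is_covid_post` and the sample text are hypothetical.

```python
import re

# Matches a '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in a post, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def is_covid_post(text, target="covid19"):
    """True when the post carries exactly the target hashtag (case-insensitive)."""
    return target in extract_hashtags(text)


# Hypothetical post text, for illustration only.
sample = "Wear a mask and wash your hands. #COVID19 #AfricaCDC"
print(extract_hashtags(sample))  # ['covid19', 'africacdc']
print(is_covid_post(sample))     # True
```

Matching on the whole tag rather than a substring means a tag such as #COVID19Vaccine would not count as #COVID19; whether that is the desired behavior depends on the inclusion rule chosen for the study.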

🧹 2. Data Cleaning & Integration
After extraction, I encountered:
Missing shares values on Instagram (platform limitation)
Some engagement fields missing from Facebook
Steps Taken:
Removed duplicates
Handled nulls using complete case analysis
Merged datasets into a single data frame
📌 The cleaned dataset included only #COVID19-tagged posts published between 2020 and 2022.
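The cleaning steps above can be sketched in plain Python as follows. This is an assumption-laden illustration, not the project's pipeline: the field names and sample records are hypothetical, and the complete-case check is applied per platform (Instagram posts are not dropped for lacking shares), which is one possible reading of the steps listed.

```python
from datetime import date

# Hypothetical records; field names are assumptions, not the project's schema.
facebook = [
    {"platform": "facebook", "text": "Mask up #COVID19", "date": date(2020, 5, 1),
     "likes": 120, "comments": 45, "shares": 15},
    {"platform": "facebook", "text": "Mask up #COVID19", "date": date(2020, 5, 1),
     "likes": 120, "comments": 45, "shares": 15},   # exact duplicate
    {"platform": "facebook", "text": "Update #COVID19", "date": date(2021, 3, 2),
     "likes": None, "comments": 3, "shares": 1},    # missing likes
]
instagram = [
    {"platform": "instagram", "text": "Get tested #COVID19", "date": date(2021, 7, 9),
     "likes": 40, "comments": 1, "shares": None},   # shares unavailable
    {"platform": "instagram", "text": "World Health Day", "date": date(2021, 4, 7),
     "likes": 55, "comments": 2, "shares": None},   # no #COVID19 tag
]

# Assumption: a post is "complete" if the fields its platform provides are present.
REQUIRED = {"facebook": ("likes", "comments", "shares"),
            "instagram": ("likes", "comments")}
START, END = date(2020, 1, 1), date(2022, 12, 31)


def clean(posts):
    """Drop duplicates, incomplete cases, untagged posts, and out-of-range dates."""
    seen, kept = set(), []
    for post in posts:
        key = (post["platform"], post["text"], post["date"])
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        if any(post[field] is None for field in REQUIRED[post["platform"]]):
            continue  # incomplete case
        if "#covid19" not in post["text"].lower():
            continue  # simple substring check for the hashtag
        if not START <= post["date"] <= END:
            continue  # outside the 2020 to 2022 window
        kept.append(post)
    return kept


# Merging is a simple concatenation because both lists share one schema.
merged = clean(facebook + instagram)
print(len(merged), [p["platform"] for p in merged])
```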
🔍 3. Data Exploration (EDA)
Using ggplot2, dplyr, and lubridate, I performed initial exploration to understand:
Posting trends over time
Engagement distribution across platforms
Type of multimedia and framing
Key Metrics:
Engagement Score = Likes + Comments + Shares
Engagement over time using line plots
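A minimal Python sketch of the engagement score and a monthly series suitable for a line plot is shown below. The analysis itself was done in R; this version only illustrates the arithmetic. The field names and sample values are hypothetical, and treating a missing count as 0 is an assumption.

```python
from collections import defaultdict
from datetime import date


def engagement_score(post):
    """Likes + comments + shares; a missing count (None) is treated as 0."""
    return sum(post.get(key) or 0 for key in ("likes", "comments", "shares"))


def monthly_engagement(posts):
    """Sum engagement scores per (year, month), in chronological order."""
    totals = defaultdict(int)
    for post in posts:
        totals[(post["date"].year, post["date"].month)] += engagement_score(post)
    return dict(sorted(totals.items()))


# Hypothetical posts, for illustration only.
posts = [
    {"date": date(2020, 5, 1), "likes": 120, "comments": 45, "shares": 15},
    {"date": date(2020, 5, 20), "likes": 40, "comments": 1, "shares": None},
    {"date": date(2020, 6, 3), "likes": 10, "comments": 0, "shares": 2},
]
print(monthly_engagement(posts))  # {(2020, 5): 221, (2020, 6): 12}
```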
Findings:
Facebook had consistently higher engagement
Engagement spiked during major health announcements (e.g., PACT initiative)
Instagram engagement came mostly from likes, with few comments and no share counts
🧠 4. Content Analysis
Categorization based on:
Visual Type: Infographics, Motivational images, Instructional videos
Message Type: Protective, Scientific, Emotional
Coding Framework:
Developed a structured Codebook to tag:
Multimedia presence
Crisis framing
Influencer use
Emotional tone
🔬 5. Statistical Analysis & Results
🧪 Inter-Rater Reliability
Posts were coded with the help of a second coder
Fleiss’ Kappa = 0.89 → Excellent agreement
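For readers who want to see how such a statistic is computed, here is a minimal Python sketch of Fleiss' kappa using the standard formula. The labels and ratings are hypothetical and do not reproduce the 0.89 reported above. Note that with exactly two coders, Cohen's kappa is the more commonly reported statistic; Fleiss' kappa is shown here because it is the one named in the results.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    per_item = [Counter(item) for item in ratings]

    # Expected agreement: sum of squared overall category proportions.
    category_totals = Counter()
    for counts in per_item:
        category_totals.update(counts)
    p_e = sum((total / (n_items * n_raters)) ** 2
              for total in category_totals.values())

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for counts in per_item
    ) / n_items

    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical codes from two coders for four posts (message type).
codes = [
    ["protective", "protective"],
    ["scientific", "scientific"],
    ["emotional", "emotional"],
    ["protective", "scientific"],  # the one disagreement
]
print(round(fleiss_kappa(codes), 3))
```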
📊 Comparative Analysis
Facebook engagement: Avg. 120 likes, 45 comments, 15 shares
Instagram: Avg. 40 likes, 0.6 comments
📈 Engagement Trends Over Time
Peaks during campaigns, announcements
Posts with infographics and videos had 40–70% higher interaction
🖼️ (Insert Screenshot: Line chart of likes/comments over time)
📢 6. Research Questions Answered
✅ Q1: How did Africa CDC use multimedia on Facebook and Instagram?
Used infographics and videos (Facebook), text-heavy posts on Instagram
No posts featured influencers
✅ Q2: Differences in use of multimedia across platforms?
Facebook: More interactive, used visuals better
Instagram: Visually appealing but less engaging due to lack of text detail
✅ Q3: Effectiveness of multimedia?
Infographics boosted engagement by ~40%
Crisis posts on Facebook had 25% more shares
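Percentage figures like these are relative differences in mean engagement between two groups of posts. The Python sketch below shows that calculation under stated assumptions: the function name `relative_lift` is hypothetical and the scores are made-up values chosen only to illustrate the arithmetic, not the study's data.

```python
from statistics import mean


def relative_lift(treated, baseline):
    """Percentage change of the treated group's mean relative to the baseline mean."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative lift is undefined")
    return (mean(treated) - base) * 100 / base


# Hypothetical engagement scores, for illustration only.
infographic_scores = [150, 130, 140]   # posts with infographics
other_scores = [100, 90, 110]          # posts without infographics
print(relative_lift(infographic_scores, other_scores))  # 40.0
```

A difference in means alone does not show that the media type caused the higher engagement; posting date and topic may also differ between the groups.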
🖼️ (Insert Screenshot: Faceted plots of media type vs engagement)
💡 7. Interpretation & Insights

Key Insight: Public health agencies need platform-specific strategies—visual storytelling for Instagram, interactive informative content for Facebook.
🚧 8. Limitations
Share counts were not available for Instagram (platform limitation), which skews cross-platform engagement comparisons
Focused only on two platforms
Multimedia content (images, videos) was not extracted due to scraping limits
🔮 9. Recommendations
Use influencers to boost engagement (predicted 200% increase in likes)
Add multimedia to Instagram (engagement could rise by 70%)
Conduct sentiment analysis in future for deeper audience understanding
Include platforms like X (Twitter) and YouTube for broader insight
📚 Tools & Libraries Used
Python: beautifulsoup4, instaloader, pandas
R: tidyverse, ggplot2, dplyr, lubridate
Scraping: ParseHub (Facebook GUI extraction)
Statistical Testing: Fleiss’ Kappa, Engagement ratio calculation
Documentation: RMarkdown, PowerPoint for presentation
🎤 About the Author
I am a passionate Biostatistician and Data Scientist, founder of StatQuestJourney Hub, and currently presenting this research at the DataSpeaks event. My journey from raw data scraping to detailed insights reflects my mission: using data to tell stories that matter.
📨 Contact: mathewshem90@gmail.com
🔗 Follow the journey: [LinkedIn - Mathew Shem]
