Title: Statistical models in text analytics and an application to measuring and forecasting polarization using topic flow modeling
Abstract: Statistical analysis typically starts with numerical data; however, text can be a rich source of useful information in many problems. In this talk, I will present the popular Latent Dirichlet Allocation (LDA) model for topic modeling in text analytics. This hierarchical model can be extended to analyze different types of text data, including spatial and temporal data. I will then present our work on adapting the model to study polarization in a time series of news articles. We scrape thousands of articles from online sources (more than 1,000 articles per day) and use topic modeling and sentiment analysis to build a measure of polarization across topics. We show how this polarization trends over time by matching topics across multiple days to identify "topic flows". We show that this measure correlates with real-life events, and we are in the process of conducting a simulation study to validate the methodology based on adaptations of a dynamic linear model. We have also performed an external validation of the basic concepts by administering a social survey to over 1,000 households in different parts of Virginia to understand the correlation between news consumption, political stance, and polarization. I will also briefly present a separate project in which we build a model on reviews and ratings of Amazon products to identify how these can be leveraged for automatic processing of such data.
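For readers unfamiliar with the idea of topic matching, the following is a minimal illustrative sketch, not the speaker's method. It assumes each day's topics are already available as word-probability dictionaries (for example, from an LDA fit) and links each topic to its most similar topic on the next day by cosine similarity. The function names `cosine` and `match_topics`, the 0.5 threshold, and the toy data are all hypothetical; the abstract does not specify how matching is actually done.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def match_topics(day_a, day_b, threshold=0.5):
    """Link each topic in day_a to its most similar topic in day_b.

    Returns a dict mapping the index of a day_a topic to the index of the
    best-matching day_b topic, or to None when no similarity reaches the
    (assumed, hypothetical) threshold.
    """
    links = {}
    for i, topic_a in enumerate(day_a):
        best_j, best_sim = None, threshold
        for j, topic_b in enumerate(day_b):
            sim = cosine(topic_a, topic_b)
            if sim >= best_sim:
                best_j, best_sim = j, sim
        links[i] = best_j
    return links


# Toy topic-word distributions for two days (invented for illustration).
day1 = [{"election": 0.6, "vote": 0.4}, {"vaccine": 0.7, "health": 0.3}]
day2 = [
    {"vaccine": 0.5, "mandate": 0.5},
    {"election": 0.5, "ballot": 0.3, "vote": 0.2},
]

links = match_topics(day1, day2)
print(links)
```

Chaining such links from day to day would give one simple notion of a "topic flow"; a topic mapped to None would mark the end of a flow under this assumed rule.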
Join Zoom Meeting: https://clemson.zoom.us/j/98628483405?pwd=ZHV3Vi9zdFBoTVpnMUllaG5zMGxEQT09
Meeting ID: 986 2848 3405
Passcode: 766240
Wednesday, October 13 at 11:15am to 12:15pm
Martin Hall, M-103
220 Parkway Dr., Clemson, SC 29634, USA