Monday, May 2, 2022

Russian Troll Tweet Data: Machine Learning with 99.6% Test Accuracy

You can read the motivation for my work here: https://myabakhova.blogspot.com/2022/01/russian-troll-data-investigation.html I published the EDA, Feature Engineering and Machine Learning on my Kaggle account in 3 parts because of Kaggle's resource constraints:

Part 1. EDA

Part 2. Feature Engineering

Part 3. Machine Learning, with a test accuracy of 99.6%

Conclusion

Analyzing the behavior of a Twitter account helps a lot in identifying paid trolls. The most helpful features for detection are the ones related to propaganda methods. Apparently trolls have specific guidelines for their posts, and they stick to them. I find this convenient, because we can set up filters to catch the most significant patterns and then check an account's entire activity.

In addition, the most important features for prediction turned out to depend mostly on troll account activity rather than on language. Thus the same approach can be applied to other languages, and is not limited to Russian trolls posting English texts.
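
To make the idea of language-independent features concrete, here is a minimal sketch of aggregating per-account activity without ever reading the tweet text. It is not the code from my Kaggle notebooks: the record layout, the sample rows and the feature names (`retweet_share`, `link_share`, `mean_gap_minutes`) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (account, ISO timestamp, is_retweet, has_link).
# The layout and the values are made up for illustration only.
tweets = [
    ("acct_a", "2016-10-01T08:00:00", True, True),
    ("acct_a", "2016-10-01T08:05:00", True, True),
    ("acct_a", "2016-10-01T08:10:00", False, True),
    ("acct_b", "2016-10-01T09:00:00", False, False),
    ("acct_b", "2016-10-03T21:30:00", False, True),
]


def account_features(rows):
    """Aggregate per-account activity features that never look at tweet text."""
    grouped = defaultdict(list)
    for account, ts, is_retweet, has_link in rows:
        grouped[account].append((datetime.fromisoformat(ts), is_retweet, has_link))

    features = {}
    for account, items in grouped.items():
        items.sort(key=lambda item: item[0])  # order tweets by time
        times = [t for t, _, _ in items]
        # Minutes between consecutive tweets of the same account.
        gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
        n = len(items)
        features[account] = {
            "n_tweets": n,
            "retweet_share": sum(r for _, r, _ in items) / n,
            "link_share": sum(l for _, _, l in items) / n,
            "mean_gap_minutes": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


features = account_features(tweets)
for account, values in features.items():
    print(account, values)
```

Because none of these numbers depend on the words in a tweet, the same table of features could be built for accounts posting in any language and then passed to a classifier.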

Please upvote it on Kaggle if you like it!