In recent years, the spread of misinformation, or "Fake News", has grown rapidly with the increasing use of social media and online platforms. Because many people rely on information from the Internet, it is necessary to combat information that is untrue or misleading, since malicious information can affect a large number of people. Misleading information is not limited to individuals or places; it also extends to scientific facts. Even where ample evidence exists, plenty of inaccurate information remains accessible on social media and online platforms. In this project, we developed and evaluated various types of systems that distinguish whether or not a given text contains "Fake News".
- Fake and real news dataset: Classifying the news
- Getting Real about Fake News: Text & metadata from fake & biased news sources around the web
- Fake-News-Dataset: Two fake news datasets covering seven different news domains
- BBC News Summary: Extractive Summarization of BBC News Articles
- All the news: 143,000 articles from 15 American publications
| Package | Version |
|---|---|
| numpy | 1.18.1 |
| pandas | 1.0.3 |
| tqdm | 4.46.0 |
| scikit-learn | 0.22.1 |
| keras | 2.3.1 |
| nltk | 3.4.5 |
| spacy | 2.2.3 |
| h5py | 2.10.0 |
| tensorflow | 2.2 |
| Flask | 0.12.2 |
To install requirements:

    cd Fake-News-Detection
    pip install -r requirements.txt

To run app locally:

    cd Fake-News-Detection/app/
    python app.py

- Analysis : View
- Logistic Regression : View
- Linear Support Vector Classification : View
- XGBoost : View : Optimized
- LightGBM : View
- Sequential : View
- RNN + GloVe : View
- Final Sequential and RNN + GloVe trial : View
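
As a rough illustration of the simplest entry in the list above, the following is a minimal sketch of a TF-IDF plus logistic regression text classifier in scikit-learn. It is not the code from the project's notebooks: the function name `train_baseline`, the toy sentences, the label encoding (1 = fake, 0 = real), and the choice of TF-IDF features are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-written toy examples for illustration only; they do not come
# from any of the datasets listed in this README.
TOY_TEXTS = [
    "Scientists confirm miracle cure that doctors do not want you to know",
    "Shocking secret proves the moon landing was staged in a studio",
    "Celebrity reveals one weird trick to erase all debt overnight",
    "City council approves budget for road repairs next year",
    "Central bank leaves interest rates unchanged after meeting",
    "Local university opens new research laboratory for students",
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = fake, 0 = real


def train_baseline(texts, labels):
    """Fit a TF-IDF + logistic regression pipeline (hypothetical helper)."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)
    return model


model = train_baseline(TOY_TEXTS, TOY_LABELS)
prediction = model.predict(["Council approves new budget for the university"])
print("Predicted label:", prediction[0])
```

Wrapping the vectorizer and classifier in one pipeline means raw strings can be passed directly to `predict`. A linear SVM could be tried by replacing `LogisticRegression()` with `sklearn.svm.LinearSVC()`.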
Working demo: msa.datascience.app

Note: The server might be slow to respond depending on the load on the system. The models have not been trained on the test data (available on the website). For more data, use the data from here; the models have not been trained on it either.
- The dataset is mostly based on data from the USA
- The hosted server may be slow or non-responsive; a local server will work fine
- Because of the dataset, coverage is limited to news from the years 2004 to 2005 and 2011 to 2018
- Output of the `tqdm` progress bar is not visible in GitHub's notebook viewer; it might show an error or `HBox(children=(FloatProgress(value=0.0, max=83.0), HTML(value='')))`
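
The `HBox(...)` text in the last item suggests that the notebooks use a widget-based progress bar, although the exact import is not shown here, so that is an assumption. One possible workaround is the plain-text bar from `tqdm`, which is written as ordinary output and so tends to survive in static notebook viewers. The function `count_tokens` and its input are hypothetical and only serve to give the loop something to do.

```python
import sys

from tqdm import tqdm  # plain-text bar rather than the widget-based notebook bar


def count_tokens(docs):
    """Count whitespace-separated tokens while showing a text progress bar."""
    total = 0
    # file=sys.stdout sends the bar to standard output instead of stderr
    for doc in tqdm(docs, desc="Counting tokens", file=sys.stdout):
        total += len(doc.split())
    return total


print(count_tokens(["fake news spreads fast", "real news takes time"]))
```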
