Under the Spotlight: Web Tracking in Indian Partisan News Websites


Abstract

India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis of Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of Reporters Without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized on Left websites than on Right or Centre websites. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense on Centre-leaning websites than on Right, and 25% more than on Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third parties track users on most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on the overall privacy of users visiting partisan news websites in India.

Dataset and Code

An anonymized version of the dataset and the code used in our paper are available to the research community.

  1. Methodology Dataset: The lists of websites used in our methodology: (a) the list of websites with their leanings and (b) the Disconnect list with categories. For more details, see the GitHub page.

  2. Cookies and HTTP Logs Dataset: SQLite dumps of our crawls using OpenWPM. (a) Stateful crawls (b) Stateless crawls

  3. Code: The code and additional information about the above-mentioned files are available on GitHub.

The format of the dataset is described here.
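For readers who want a first look at one of the crawl files, the following is a minimal sketch of how its contents could be listed with Python's standard `sqlite3` module. It assumes the downloaded file is a SQLite database file (not a textual SQL dump), and the function name `summarize_tables` and the file name in the usage note are hypothetical. The sketch deliberately makes no assumption about which tables OpenWPM wrote; it only reports whatever tables are present.

```python
import sqlite3
from contextlib import closing


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Assumption: db_path points to a SQLite database file. Table names are
    discovered at run time from sqlite_master, not hard-coded.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        counts = {}
        for (name,) in rows:
            # Skip SQLite's internal bookkeeping tables.
            if name.startswith("sqlite_"):
                continue
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
```

A call such as `summarize_tables("stateful_crawl.sqlite")` (hypothetical file name) returns a dictionary mapping each table to its number of rows, which is a quick way to check that a download is complete before running any analysis.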


Contact Us


If you are interested in using this data, please fill in the form to request specific data and get the link where you can download the data.

We are sharing the dataset under the terms and conditions specified below. Please note that submitting the form indicates that you accept these terms and conditions. In the form, please indicate which part of the dataset you need. If you do not receive an email notification for your request within 24 hours, please e-mail us at netsys.noreply[at]gmail.com.

Dataset Terms and Conditions

  1. You will use the data solely for the purpose of non-profit research or non-profit education.

  2. You will respect the privacy of end users and organizations that may be identified in the data. You will not attempt to reverse engineer, decrypt, de-anonymize, derive or otherwise re-identify anonymized information.

  3. You will not distribute the data beyond your immediate research group.

  4. If you create a publication using our datasets, please cite our paper as follows.


@inproceedings{agarwal2021under,
  title={Under the Spotlight: Web Tracking in Indian Partisan News Websites},
  author={Agarwal, Vibhor and Vekaria, Yash and Agarwal, Pushkal and Mahapatra, Sangeeta and Set, Shounak and Muthiah, Sakthi Balan and Sastry, Nishanth and Kourtellis, Nicolas},
  booktitle={Proceedings of the International AAAI Conference on Web and Social Media},
  year={2021}
}