Format

====================
1. Posts Dataset
====================
This is in CSV file format. It contains basic information about each post, its language, and its post type, as listed here:

permalink: ShareChat post index ID.
createdAt: date of the post.
itemType: type of post (text, image, video, etc.).
likes: likes on the post when fetched.
views: views on the post when fetched.
shares: shares to WhatsApp when fetched.
tagText: similar to a hashtag, used while making a post, e.g. #GoodMorning.
tagHandle: ID of the tagText.
language: language of the user, selected from one of the 14 language forums while registering.

====================
2. Images Dataset
====================
Image hash codes and permalink (the ID used to look up post information in (1) above)
------
This is in CSV file format. It contains the Facebook PDQ hash and clustering columns for images. For each image post, the dataset contains the following:

id: cluster ID assigned to the image from PDQ (similar images have the same cluster ID). See the paper for the distance measure and clustering method we chose.
size: size of the cluster to which the image has been assigned.
hash: hash code generated by PDQ.
norm, delta, quality, dims, readSec, hashSec: parameters set and values obtained during computation of the hash codes.
permalink: index ID linking the image data to the posts data in (1) above.
ocrText: text extracted from the image by optical character recognition (OCR).
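
As a usage sketch for the two files described above, the following Python snippet joins image rows to post rows on the shared permalink field and then groups the result by cluster id. It assumes the CSV header names match the field names listed above. The sample rows, date format, hash values, and helper function names are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up samples standing in for the two CSV files. Real files would be
# opened with open(path, newline="", encoding="utf-8") instead of StringIO.
POSTS_CSV = """permalink,createdAt,itemType,likes,views,shares,tagText,tagHandle,language
p1,2019-01-01,image,10,100,3,GoodMorning,t1,Hindi
p2,2019-01-02,image,5,40,1,GoodMorning,t1,Tamil
p3,2019-01-02,text,2,15,0,News,t2,Hindi
"""

IMAGES_CSV = """id,size,hash,norm,delta,quality,dims,readSec,hashSec,permalink,ocrText
c7,2,aaaa,0,0,100,64x64,0.01,0.02,p1,good morning
c7,2,aaab,0,0,100,64x64,0.01,0.02,p2,good morning
"""


def read_rows(handle):
    """Return the CSV rows as a list of dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def join_images_to_posts(image_rows, post_rows):
    """Attach post fields to each image row via the shared permalink."""
    posts_by_permalink = {row["permalink"]: row for row in post_rows}
    joined = []
    for image in image_rows:
        post = posts_by_permalink.get(image["permalink"])
        if post is None:
            continue  # skip images with no matching post record
        joined.append({**post, **image})
    return joined


def languages_per_cluster(joined_rows):
    """Map each cluster id to the set of languages its images were posted in."""
    clusters = defaultdict(set)
    for row in joined_rows:
        clusters[row["id"]].add(row["language"])
    return dict(clusters)


posts = read_rows(io.StringIO(POSTS_CSV))
images = read_rows(io.StringIO(IMAGES_CSV))
joined = join_images_to_posts(images, posts)
clusters = languages_per_cluster(joined)
print(clusters)
```

Text and video posts have no row in the images file, so they simply do not appear in the joined result. Indexing the posts by permalink first keeps the join to a single pass over the image rows.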