Hi, I would like to run the training exercises on a Google Compute Engine cluster, as I don't have an Amazon AWS account. I was able to copy the Wikipedia pagecounts data successfully to Google Compute Engine's equivalent of S3, but I noticed that the data had been preprocessed to insert the date stamp as the first field in the input files. Can you provide a pointer to the code you used to do this, or show me where I can copy the modified pagecounts data from?
I copied the raw data from here:
http://dumps.wikimedia.org/other/pagecounts-raw/2009/
Any help you can provide would be much appreciated.
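For anyone hitting the same question, the transformation being asked about could be sketched roughly as below. The filename pattern (`pagecounts-YYYYMMDD-HHMMSS`) is taken from the raw dump naming at dumps.wikimedia.org; the function name and the space-separated output format are assumptions for illustration, not the actual script used by the training exercises:

```python
import re

def prepend_datestamp(line: str, filename: str) -> str:
    """Prefix a raw pagecounts line with the date encoded in its filename.

    Raw dump files are named like pagecounts-20090501-000000.gz; this
    extracts the YYYYMMDD portion and inserts it as a new first
    whitespace-separated field. A guess at the preprocessing, not the
    repo's actual code.
    """
    m = re.search(r"pagecounts-(\d{8})-\d{6}", filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    return f"{m.group(1)} {line}"

# Raw lines look like: "<project> <page_title> <count> <bytes>"
print(prepend_datestamp("en Main_Page 42 12345",
                        "pagecounts-20090501-000000.gz"))
# → 20090501 en Main_Page 42 12345
```

Applied line by line over each downloaded file, this would reproduce the date-stamped layout described above.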