A suitable corpus for training skip-thought vectors

by user1767774   Last Updated May 16, 2018 14:19 PM

To train a variant of skip-thought vectors, I need a large corpus of consecutive (related) sentences. The original skip-thought paper used BookCorpus, but it is no longer available. Is there a similar dataset available online? I know of Project Gutenberg, but its data is notoriously difficult to pre-process, and I'd also prefer more contemporary texts.

Thanks


