pomsquare.blogg.se

Download BBC cricket

Two news article datasets, originating from BBC News, are provided for use as benchmarks for machine learning research. These datasets are made available for non-commercial and research purposes only. If you make use of these datasets, please consider citing the publication:

D. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.

All rights, including copyright, in the content of the original articles are owned by the BBC.

The first dataset consists of 2225 documents from the BBC News website, corresponding to stories in five topical areas from 2004-2005. Class labels: 5 (business, entertainment, politics, sport, tech).

The second dataset consists of 737 documents from the BBC Sport website, corresponding to sports news articles in five topical areas from 2004-2005. Class labels: 5 (athletics, cricket, football, rugby, tennis).

The datasets have been pre-processed as follows: stemming (Porter algorithm), stop-word removal (stop word list) and low term frequency filtering (count < 3) have already been applied to the data. Each dataset archive contains files in the following formats:

*.mtx: Original term frequencies stored in a sparse data matrix in Matrix Market format.
*.terms: List of content-bearing terms in the corpus, with each line corresponding to a row of the sparse data matrix.
*.docs: List of document identifiers, with each line corresponding to a column of the sparse data matrix.
*.classes: Assignment of documents to natural classes, with each line corresponding to a document.
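The pre-processing steps described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code used to build the datasets. The stop-word list is a tiny hypothetical stand-in for the original one. Stemming is a pluggable function (identity by default, since a Porter stemmer such as NLTK's `PorterStemmer` is an external dependency). "count < 3" is read here as a term's total count across the whole corpus. The names `preprocess`, `to_mtx` and `STOP_WORDS` are hypothetical. As in the datasets, rows are terms and columns are documents.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the list actually used
# for the BBC datasets is not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(docs, stem=lambda w: w, min_count=3):
    """Build a term-by-document count matrix from raw document texts.

    docs: mapping of document id -> raw text.
    stem: stemming function, identity by default (assumption); pass a
          Porter stemmer to mirror the described setup.
    min_count: terms whose total corpus count is below this value are
               dropped (assumed reading of "count < 3").
    Returns (terms, doc_ids, entries), where entries maps
    (term_index, doc_index) -> count: rows are terms, columns documents.
    """
    doc_ids = list(docs)
    per_doc = []
    for doc_id in doc_ids:
        # Stop words are removed before stemming, on the raw lowercase tokens.
        tokens = [stem(t) for t in tokenize(docs[doc_id]) if t not in STOP_WORDS]
        per_doc.append(Counter(tokens))

    # Total frequency of each term across the whole corpus.
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    terms = sorted(t for t, c in totals.items() if c >= min_count)
    index = {t: i for i, t in enumerate(terms)}

    entries = {}
    for col, counts in enumerate(per_doc):
        for term, c in counts.items():
            if term in index:
                entries[(index[term], col)] = c
    return terms, doc_ids, entries


def to_mtx(terms, doc_ids, entries):
    """Serialize the counts in Matrix Market coordinate format (1-based)."""
    lines = ["%%MatrixMarket matrix coordinate integer general",
             f"{len(terms)} {len(doc_ids)} {len(entries)}"]
    for (row, col), value in sorted(entries.items()):
        lines.append(f"{row + 1} {col + 1} {value}")
    return "\n".join(lines) + "\n"
```

To apply Porter stemming, one option (assuming NLTK is installed) is `from nltk.stem import PorterStemmer` followed by `preprocess(docs, stem=PorterStemmer().stem)`. The `.terms` and `.docs` files would then simply be `terms` and `doc_ids` written one per line.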

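A matching sketch for reading the files back, using only the Python standard library, might look like this. It assumes the `.mtx` file uses the Matrix Market coordinate (sparse) layout with 1-based indices, and that `.terms` and `.docs` hold one entry per line, as described above. The layout assumed for `.classes` (`%`-prefixed header lines followed by `doc_index class_index` pairs, 0-based) is an assumption and should be checked against the actual files. The functions `read_mtx`, `read_lines`, `read_classes` and `load_corpus`, and the `prefix` argument (the shared path stem of the four files), are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_mtx(path):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries).

    entries maps 0-based (row, col) -> float value. Only the sparse
    'coordinate' layout is handled; 'array' and symmetric storage are not.
    """
    with open(path) as f:
        header = f.readline()
        if not header.startswith("%%MatrixMarket"):
            raise ValueError("missing %%MatrixMarket banner")
        if "coordinate" not in header.lower():
            raise ValueError("only the coordinate (sparse) layout is supported")
        # Skip comment lines, then read the size line: rows, columns, non-zeros.
        line = f.readline()
        while line.startswith("%"):
            line = f.readline()
        n_rows, n_cols, nnz = (int(x) for x in line.split())
        entries = {}
        for line in f:
            if not line.strip() or line.startswith("%"):
                continue
            i, j, value = line.split()
            # Matrix Market indices are 1-based; convert to 0-based.
            entries[(int(i) - 1, int(j) - 1)] = float(value)
    if len(entries) != nnz:
        raise ValueError(f"expected {nnz} entries, found {len(entries)}")
    return n_rows, n_cols, entries


def read_lines(path):
    """Read one non-empty item per line (used for *.terms and *.docs)."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def read_classes(path):
    """Parse *.classes, ASSUMING '%' header lines and '<doc> <class>' pairs."""
    labels = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip() or line.startswith("%"):
            continue
        doc, cls = line.split()
        labels[int(doc)] = int(cls)
    return labels


def load_corpus(prefix):
    """Load <prefix>.mtx/.terms/.docs into {doc_id: {term: frequency}}.

    Rows of the matrix are terms and columns are documents, so the entry
    at (row, col) is the frequency of terms[row] in docs[col].
    """
    n_rows, n_cols, entries = read_mtx(f"{prefix}.mtx")
    terms = read_lines(f"{prefix}.terms")
    docs = read_lines(f"{prefix}.docs")
    if len(terms) != n_rows or len(docs) != n_cols:
        raise ValueError("matrix shape does not match .terms/.docs lengths")
    counts = defaultdict(dict)
    for (row, col), value in entries.items():
        counts[docs[col]][terms[row]] = value
    return {doc: counts.get(doc, {}) for doc in docs}
```

Where SciPy is available, `scipy.io.mmread` is a more complete alternative for the `.mtx` file, since it also handles the other Matrix Market layouts.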