Monday, August 01, 2011

Data Sets

1. TIGER/Line dataset
2.2 million California roads from the TIGER/Line dataset, which is widely used in spatial database research.

2. Kristina Lerman at USC ISI

a. Digg 2009
This anonymized data set consists of the voting records for 3,553 stories promoted to the front page over a one-month period in 2009. The voting record for each story contains the ID of each voter and the timestamp of the vote. In addition, the friendship links of the voters were collected from Digg.

Download Digg 2009 data set
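A minimal Python sketch of how the two parts of the Digg data (votes and friendship links) might be combined. It assumes each vote is a `(story_id, voter_id, timestamp)` tuple and that friendship links are a dict mapping each user to the set of users they link to; the real file format and link direction may differ, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed (hypothetical) record layout for one vote: (story_id, voter_id, timestamp).
Vote = Tuple[int, int, int]


def votes_by_story(votes: List[Vote]) -> Dict[int, List[Tuple[int, int]]]:
    """Group votes per story as (timestamp, voter_id) pairs in time order."""
    grouped: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for story_id, voter_id, ts in votes:
        grouped[story_id].append((ts, voter_id))
    for story_votes in grouped.values():
        story_votes.sort()  # earliest vote first
    return dict(grouped)


def votes_after_friend(story_votes: List[Tuple[int, int]],
                       friends: Dict[int, Set[int]]) -> int:
    """Count votes cast by users who link to someone who already voted on the story.

    `friends[u]` is the (assumed) set of users that u has a friendship link to.
    """
    earlier_voters: Set[int] = set()
    count = 0
    for _, voter in story_votes:
        if friends.get(voter, set()) & earlier_voters:
            count += 1
        earlier_voters.add(voter)
    return count


votes = [(1, 10, 100), (1, 11, 105), (1, 12, 103), (2, 10, 200)]
friends = {11: {10}, 12: {99}}
per_story = votes_by_story(votes)
print(per_story[1])                               # [(100, 10), (103, 12), (105, 11)]
print(votes_after_friend(per_story[1], friends))  # 1
```

Sorting each story's votes by timestamp first is what makes "voted earlier" well defined when the friendship links are checked.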

b. Flickr personal taxonomies
This anonymized data set contains personal taxonomies constructed by 7,000+ Flickr users to organize their photos, as well as the tags they associated with the photos. Personal taxonomies are shallow hierarchies (trees) of collections, where each collection can contain sets (aka photo albums) and other collections.

Download Flickr data set
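The tree structure of a personal taxonomy can be modeled roughly as follows. This is a minimal Python sketch, not the data set's actual schema: the class names, the `photo_tags` field (one tag list per photo), and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PhotoSet:
    """A set (photo album); `photo_tags` holds the tag list of each photo."""
    name: str
    photo_tags: List[List[str]] = field(default_factory=list)


@dataclass
class Collection:
    """A collection may contain sets and further collections."""
    name: str
    sets: List[PhotoSet] = field(default_factory=list)
    collections: List["Collection"] = field(default_factory=list)


Node = Union[Collection, PhotoSet]


def depth(node: Node) -> int:
    """Levels in the tree rooted at `node`; a set counts as one level."""
    if isinstance(node, PhotoSet):
        return 1
    children: List[Node] = [*node.sets, *node.collections]
    return 1 + max((depth(child) for child in children), default=0)


def all_tags(node: Node) -> List[str]:
    """All tags on photos anywhere under `node`, sets before sub-collections."""
    if isinstance(node, PhotoSet):
        return [tag for tags in node.photo_tags for tag in tags]
    result: List[str] = []
    for child in [*node.sets, *node.collections]:
        result.extend(all_tags(child))
    return result


travel = Collection(
    "Travel",
    sets=[PhotoSet("Misc", [["beach"]])],
    collections=[Collection("Europe", sets=[PhotoSet("Paris", [["eiffel", "night"]]),
                                             PhotoSet("Rome")])],
)
print(depth(travel))     # 3
print(all_tags(travel))  # ['beach', 'eiffel', 'night']
```

Because the hierarchies are shallow, plain recursion over the tree is sufficient here.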

c. Wrapper maintenance
Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. When a wrapper stops working because of changes in the layout of web pages, our task is to automatically re-induce the wrapper. The data sets used for the experiments in our JAIR 2003 paper contain web pages downloaded from two dozen sources over a one-year period.

Download wrapper maintenance data set
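The wrapper idea can be illustrated with a minimal Python sketch of a left-right (LR) delimiter wrapper, a simple wrapper class from the wrapper induction literature. This is not the learning method from the JAIR 2003 paper: the "extracted nothing means broken" check and the re-induction from a single previously extracted value (taking the k characters around it as new delimiters) are naive illustrative assumptions, and all function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def lr_extract(page: str, left: str, right: str) -> List[str]:
    """Extract every substring enclosed by the `left` and `right` delimiters."""
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, page, flags=re.DOTALL)


def wrapper_looks_broken(page: str, left: str, right: str) -> bool:
    """Naive check (assumption): treat the wrapper as broken if it extracts nothing."""
    return not lr_extract(page, left, right)


def reinduce_lr(page: str, known_value: str, k: int = 8) -> Optional[Tuple[str, str]]:
    """Re-learn delimiters from one previously extracted value found on the new page.

    Takes the k characters on each side of the value's first occurrence.
    Returns None if the value no longer appears on the page.
    """
    i = page.find(known_value)
    if i < 0:
        return None
    j = i + len(known_value)
    return page[max(0, i - k):i], page[j:j + k]


old_page = "<li><b>Widget</b> $5</li><li><b>Gadget</b> $7</li>"
new_page = "<li><strong>Widget</strong> $5</li><li><strong>Gadget</strong> $7</li>"
print(lr_extract(old_page, "<b>", "</b>"))            # ['Widget', 'Gadget']
print(wrapper_looks_broken(new_page, "<b>", "</b>"))  # True
left, right = reinduce_lr(new_page, "Widget")
print(lr_extract(new_page, left, right))              # ['Widget', 'Gadget']
```

Using data extracted before the layout change as anchors on the new pages is what lets the wrapper be rebuilt without manual labeling; a practical system would need sturdier verification and delimiter selection than this sketch.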

