Thursday, August 19, 2004

19 August 2004: Citeseer

So I made it back in one piece, and with a copy of Citeseer cunningly hidden inside the toaster in my hand-luggage.

For the uninitiated (which, frankly, most people in the world are when it comes to Citeseer), this is an autonomous citation indexing system and digital library with web-portal search functionality. Or, more simply, Google, but for academic papers. Invented and (after much discussion with NEC) now once again maintained by Penn State University, it's a system that has garnered a huge amount of interest from users across the world. The boys at Penn State had scraped around 700,000 relevant scientific documents off the web by the time of my visit and they see no reason why they can't reach 2 million before too long. And they're giving it all to us.

Which is very nice, especially since we don't quite know what we're going to do with it yet. We could do the standard Semantic Web thing with it (see blog on Edinburgh), which would be to transfer the data to RDF and call it 'Semantic Citeseer', but there may yet be more to it than that. Half the problem is getting the data out of the system (the code is, well, proprietary, let's put it like that); the more fun half is analysing the network graph that is created by all these documents referencing each other, and seeing what patterns emerge. Laney's nodal points, except we know what we're looking for. Pattern Recognition. It always comes back to William Gibson, doesn't it?
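
For the curious, here is roughly what the simplest possible pass over that citation graph might look like. This is only a toy sketch in Python, not anything to do with how Citeseer itself works: the paper IDs, the `citations` dictionary and the function names `citation_counts` and `most_cited` are all made up for illustration, and it assumes the data has already been extracted into a plain mapping from each paper to the papers it cites.

```python
from collections import Counter

# Hypothetical toy data: each paper ID maps to the IDs of the papers it cites.
# Real data would have to be exported from Citeseer first.
citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_a"],
}


def citation_counts(graph):
    """Count how many times each paper is cited (its in-degree)."""
    # Start every known paper at zero so uncited papers still appear.
    counts = Counter({paper: 0 for paper in graph})
    for cited_list in graph.values():
        counts.update(cited_list)
    return counts


def most_cited(graph, n=3):
    """Return the n most-cited papers as (paper, count) pairs."""
    counts = citation_counts(graph)
    # Sort by descending count, breaking ties alphabetically by paper ID.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for paper, count in most_cited(citations):
        print(paper, count)
```

Counting inbound citations is about the crudest pattern there is, but it is the obvious first thing to look at before trying anything cleverer on the graph.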

A pain getting home again though. State College airport isn't the largest in the world, so when someone decided to land a private plane without putting the wheels down, the ensuing mess took the rest of the day to clear up. The news article (linked to above) sums up the scale of the airport: "dozens of travellers" being stranded for hours. When we eventually took off, the plane we were in was so tiny that people sitting near the back were asked to move further forwards to help balance the thing. But we made it to DC in one piece, and back to Heathrow from there, so a successful trip overall.

Installing Citeseer comes next. And that could be the real challenge.

