Mining internet history unearths hidden gems

It may seem that the internet is all about the here and now, but a new project from Adobe and a US university shows that there's gold to be mined in the accumulated layers of websites as they age.

Researchers at Adobe's Advanced Technologies Lab and the University of Washington have come up with a project they call Zoetrope that makes the history of the Web useful in a way sites like the Internet Archive can never be.

Crawling archives

The software stores the content of selected websites every hour and uses that to create searchable views of how they change over time.

Users can simply scroll through versions of a site – a news page, for example – to see what happened and how it developed or can focus on specific parts of a page.
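As a rough illustration of the idea, and not Zoetrope's actual code, the sketch below keeps timestamped copies of a page and looks up the version that was live at a given moment. The class name `SnapshotStore`, its methods, the `news.example` address and the headline text are all hypothetical, made up for this example.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


class SnapshotStore:
    """Hypothetical in-memory store of timestamped page copies, per URL."""

    def __init__(self):
        self._times = {}     # url -> capture times, oldest first
        self._contents = {}  # url -> page contents, parallel to _times

    def capture(self, url, when, content):
        # Assumes captures arrive in chronological order (e.g. once an hour).
        self._times.setdefault(url, []).append(when)
        self._contents.setdefault(url, []).append(content)

    def version_at(self, url, when):
        """Return the most recent capture at or before `when`, or None."""
        times = self._times.get(url, [])
        i = bisect_right(times, when)
        return self._contents[url][i - 1] if i else None

    def search(self, url, term):
        """Return the capture times whose content mentions `term`."""
        times = self._times.get(url, [])
        contents = self._contents.get(url, [])
        return [t for t, c in zip(times, contents) if term in c]


# Made-up hourly captures of an imaginary news front page.
url = "news.example/front"
start = datetime(2008, 7, 1, 9, 0)
store = SnapshotStore()
for hour, text in enumerate(["Talks begin", "Talks stall", "Deal reached"]):
    store.capture(url, start + timedelta(hours=hour), text)

# What did the page say at 10:30? The 10:00 capture is the latest one before it.
print(store.version_at(url, start + timedelta(minutes=90)))
```

Scrolling through a site's history then amounts to calling `version_at` with a moving timestamp, while `search` picks out the hours in which a given phrase appeared.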

Possibilities include tracking product prices to establish trends or even comparing specific variables on different sites.

Simpler processes

Zoetrope might easily show correlations between the number of goals scored in football matches and the amount of money spent on players' salaries, for example.
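To give a sense of what such a comparison involves, here is a minimal sketch that lines up two series by date and computes their Pearson correlation. It is not how Zoetrope works internally, which the researchers do not describe here; the function names and every figure in the sample data are placeholders invented for the example.

```python
from math import sqrt


def align(series_a, series_b):
    """Pair up values from two {date: value} series on the dates they share."""
    shared = sorted(set(series_a) & set(series_b))
    return [series_a[d] for d in shared], [series_b[d] for d in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Placeholder numbers only: season -> value, as if read from two different sites.
goals_scored = {"2005": 48, "2006": 55, "2007": 61, "2008": 70}
wage_bill_millions = {"2006": 30, "2007": 34, "2008": 41, "2009": 45}

goals, wages = align(goals_scored, wage_bill_millions)
print(round(pearson(goals, wages), 3))
```

The alignment step matters because two sites rarely cover exactly the same dates, so only the seasons present in both series are compared.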

While such comparisons could be done by hand, the point of the project is to make it simple and intuitive to do so using much of what we already have scattered across different sites.

Indexing everything?

Zoetrope currently has a four-month database of 1,000 popular websites as its starting point. Researcher Eytan Adar explains: "It's impossible to crawl and capture some of these things at the rate at which they're changing.

"But for something like Zoetrope, it's a smaller percentage of the Web that we want to track. We don't actually need to get every single page that's out there."

Nevertheless, if sites like the Internet Archive come good on plans to share their data with Zoetrope, we could soon be looking at the internet in a very different way.

J Mark Lytle was an International Editor for TechRadar, based out of Tokyo, who now works as a Script Editor, Consultant at NHK, the Japan Broadcasting Corporation. Writer, multi-platform journalist, all-round editorial and PR consultant with many years' experience as a professional writer, their bylines include CNN, Snap Media and IDG.