How scraping changed SEO
The impact of web scraping on SEO
SEO has always been a unique endeavor. Not only are the participants pitted against one another as they wrestle for the top spot on Search Engine Results Pages (SERPs), but, in nearly all cases, data-driven guesswork is the best tool they have.
Well, nowadays it might not be as much guesswork as it used to be. Since the advent of web scraping, SEOs have gained the ability to gather tons of data - data that is constantly being converted into the much-celebrated "organic channel strategy". Hopes of reaching #1 in SERPs are usually included.
Unsurprisingly, having tons of data helps a lot if your primary line of work is reverse engineering a close-to-magic black box. Today we could scarcely imagine starting SEO from scratch. It’s now an intertwined web of data, practical experience, and assumptions. Mostly data, though.
But how did we get here?
According to the Search Engine Journal, SEO, as a practice, began sometime in the late 1990s. The fun genesis story is that the manager of Jefferson Airplane (a rock band) was unhappy that their website was on the 4th page rather than the 1st. The boring genesis story is that the words “search engine optimization” were first used by John Audette, the owner of Multimedia Marketing Group, as part of some marketing strategy.
Whichever we choose to believe, SEO back then wasn’t all that similar to its current form. Search engines had yet to achieve dominance and the landscape of the internet was still peppered with human-edited catalogs such as Yahoo Directory or DMOZ.
People would, as the era put it, “surf the web”, going from one website to another. Finding what you were looking for could take more than just a few clicks. As a result, a lot of then-SEO revolved around getting websites into catalogs the right way.
Eventually, search engines replaced the human-edited catalogs. While some of the latter were still being updated as late as 2017, it has been far longer since I’ve heard of anyone actually using them. I doubt you have, either.
There was one problem, though. Human-driven cataloging is fairly predictable; search engines, not so much, especially when their inner workings are a closely guarded secret. SEO would become less of an accounting exercise and more of an engineering activity.
SEOs were in luck, though. In the early days, search engines weren’t nearly as complex. Some may remember the days when stuffing a keyword into a page ten zillion times would land it on the first page of SERPs. Such ranking algorithms were swiftly replaced by more sophisticated ones - most notably Google’s PageRank, introduced in the late 1990s.
Google’s entry onto the scene was a step towards true complexity. Websites were now being rated on the number of incoming and outgoing links instead of just keywords - an idea Google admits was lifted from the academic world. Of course, that’s not very difficult to abuse either. And that's exactly what happened.
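For illustration only, here is a minimal power-iteration sketch of that link-based idea over a toy link graph; the graph, damping factor, and iteration count are assumptions for the example, and Google’s production ranking is far more elaborate than the original PageRank paper.

```python
# Minimal PageRank power iteration over a toy link graph.
# The graph, damping factor, and iteration count are illustrative assumptions.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {page: 1 / len(pages) for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for other in pages:
                    new_rank[other] += damping * rank[page] / len(pages)
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Hypothetical three-page site: pages with more incoming links score higher.
toy_web = {
    "home": ["blog", "about"],
    "blog": ["home"],
    "about": ["home", "blog"],
}
print(pagerank(toy_web))
```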
Since then, the ranking algorithms have only increased in complexity with Google divulging less and less about them. On rare occasions, John Mueller, Google’s Search Advocate, emerges from the alien-technology-driven black box to answer some questions, mostly in a vague fashion.
As a result of the constant changes, updates, and tweaks, ranking today is influenced by a humongous number of factors. Outsiders might even feel that SEO specialists speak a different tongue. No wonder there was a “Level 9001 SEO Wizard” job title trend back in the day.
But if ranking algorithms quickly escalated in intricacy, how did SEOs keep up with the trends? They kept up, mostly, through reverse engineering. SEOs shared knowledge between themselves, tested hypotheses, and wrote their findings on blogs.
A revolution happened around the 2010s, when web scraping became more ubiquitous. “Household” names such as Ahrefs and Mangools were founded on the promise that the technology would change SEO.
The all-seeing eye
When you have to figure out how something works without inside knowledge, the best way is to attempt to break it. If that’s not possible, gathering lots of data, analyzing it, and drawing conclusions is the second-best way.
That’s what a lot of SEOs did. Following in the footsteps of Google, developers created the web scraping and crawling tools that collected and indexed vast swaths of data.
Put simply, web scrapers are applications that work through a set of URLs (some of which they may discover themselves), download the data, and present it in a readable format. Data is collected from a variety of sources, from the homepages of websites to SERPs themselves; scrapers that target the latter are often referred to as SERP scrapers.
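As a rough sketch of what such an application does (not any particular vendor’s implementation), the example below walks a list of URLs, downloads each page, and pulls out a few readable fields. The URLs, the selectors, and the use of the requests and BeautifulSoup libraries are assumptions for the example; a real SERP scraper also has to handle JavaScript rendering, proxies, and rate limits.

```python
# A bare-bones web scraper: walk a list of URLs, download each page,
# and extract a few readable fields. URLs and selectors are illustrative.
import requests
from bs4 import BeautifulSoup

def scrape(urls):
    results = []
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        results.append({
            "url": url,
            "title": soup.title.string if soup.title else "",
            # Collect visible heading text as a rough content summary
            "headings": [h.get_text(strip=True) for h in soup.find_all(["h1", "h2"])],
        })
    return results

if __name__ == "__main__":
    for page in scrape(["https://example.com"]):
        print(page["url"], "-", page["title"])
```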
SERP scrapers are used more frequently than most SEO specialists expect. While smaller marketing agencies may look elsewhere for data, many of the tools used to develop insights daily rely on SERP scrapers. Our own SERP Scraper API usage has been growing steadily, with year-over-year requests up 36%.
The idea is quite brilliant, really. SERPs are the best objective metric available. While Google might shake the results up once in a while, they mostly remain static unless something about the website changes - and that’s exactly when you want to be looking at the results.
Unsurprisingly, SERP scrapers provide access to an all-seeing eye. Small shifts are noticed and sent to some cold, dark place to be analyzed. In turn, SEO tools provide recommendations based on the data collected and specialists write long essays about them.
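To illustrate what “noticing small shifts” can look like in practice, here is a minimal sketch that compares two hypothetical SERP snapshots and reports how each domain moved. The snapshot format and the domains are made up for the example and don’t reflect any specific tool’s data model.

```python
# Compare two hypothetical SERP snapshots (ranked lists of domains)
# and report how each domain's position shifted. Data is illustrative.

def rank_changes(previous, current):
    changes = {}
    prev_positions = {domain: i + 1 for i, domain in enumerate(previous)}
    for position, domain in enumerate(current, start=1):
        old = prev_positions.get(domain)
        changes[domain] = (old, position, (old - position) if old else None)
    return changes

yesterday = ["siteA.com", "siteB.com", "siteC.com"]
today = ["siteB.com", "siteA.com", "siteD.com"]

for domain, (old, new, delta) in rank_changes(yesterday, today).items():
    movement = "new entry" if delta is None else f"{delta:+d}"
    print(f"{domain}: {old} -> {new} ({movement})")
```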
And so, the concerted effort to reverse engineer a black box continues into eternity (or, at least, into the foreseeable future). Scraping has delivered such a boon to SEOs that the better part of the profession now revolves around it. You won’t find many SEOs who work without dedicated tools - and without scraping, those tools couldn’t exist.
Conclusion
SEOs and search engines are engaged in a constant, but rather friendly, tug-of-war. The former are always trying to figure out the newest changes in ranking algorithms; the latter make those algorithms more complex over time, partly to deliver better results, partly to limit the potential for abuse.
Unfortunately for most SEOs, the tugs from the search engine side are usually fairly substantial. The only thing letting them keep up is the Skynet-esque robots (only good) that make predicting and recovering from those tugs easier.
Rasa Sosnovskytė is Head of SEO at Oxylabs.io.