Yandex caught scraping Google SEO code

As TechRadar Pro reported earlier in January 2023, a former Yandex employee with a “political” motive has allegedly leaked a wide-ranging repository of source code for many of the web portal’s products, potentially shedding light on the dark art of search engine optimization.

BleepingComputer reports the employee leaked git sources totalling 44.7GB of files, containing “all of” Yandex’s source code except for its anti-spam rules, that were obtained in July 2022.

While the raw source code won’t be of interest to everyone, Search Engine Land’s report that 17,854 search ranking factors have been uncovered as part of the leak should interest any person, business or publication looking to see their pages ranked highly in search engines.

Yandex leak SEO insights

A partial list of ranking factors used by the Yandex search engine, taken from one file in the codebase and shared by Martin MacDonald, CEO of SEO consultancy MOG Media, does shed some light on the aspects of copy that Yandex gives weight to.

Per Russian Search News, these include PageRank and several aspects of links such as age and relevancy, the perceived relevance of copy, host reliability, and innate preferences towards specific sites with perceived authority, such as Wikipedia.

A longer, more technical deep dive by Search Engine Land shows that these preferences also include a “NEWS_AGENCY_RATING” factor, allowing Yandex’s search engine to show preference to certain news organizations.

Others include the number of unique visitors, percentages of organic traffic, and average domain rankings across queries.

However, it’s perhaps melodramatic, or a little desolate, for MacDonald to describe it as “the most interesting thing to have happened in SEO in years.”

While the leaked codebase certainly offers a raft of insights, it’s worth noting that many websites will be looking to rank well on Google over Yandex, purely because the former is far better known. 

The two companies have shared web engineers over the years, Yandex uses many of Google’s open source technologies, such as TensorFlow and BERT, and references to Google data appear in the leaked codebase.

However, while Search Engine Land’s deep dive argues that the Yandex leak can give general insight into the anatomy of a modern search engine, many of Yandex’s leaked search ranking factors go unused or are officially considered deprecated, per Russian Search News.

Even the technical deep dive admits that many known aspects of Google’s search engine, such as its crawler and index systems, differ from Yandex’s.

All of this, combined with the age of the leaked codebase, makes it unclear how well assumptions about how both Yandex and Google rank pages will hold up.

Luke Hughes
Staff Writer

 Luke Hughes holds the role of Staff Writer at TechRadar Pro, producing news, features and deals content across topics ranging from computing to cloud services, cybersecurity, data privacy and business software.
