
AI Web Crawlers Are Ruining the Internet



Jason Dookeran

AI web crawlers seem like a great idea on paper. Who doesn’t want a web crawler that can automatically index content and dynamically adjust its SEO rules? While this sounds like a dream, the overhead is crushing web servers and frustrating system admins.


What Are AI Web Crawlers?

Web crawlers, also known as web spiders or bots, are automated programs designed to browse the internet and gather information from various websites. They systematically visit web pages, read their content, and index relevant data for search engines like Google. By following links from one page to another, crawlers ensure that search engines have up-to-date information, allowing users to find the content they need quickly and efficiently. This process is essential for maintaining the functionality of search engines.
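To make that concrete, here’s a minimal sketch of how a traditional, well-behaved crawler works, using nothing but the Python standard library. The seed URL, page limit, and one-second delay are illustrative assumptions, not what any real search engine uses:

```python
# A minimal sketch of a "polite" traditional crawler (stdlib only).
# Real crawlers are vastly more sophisticated; this just shows the
# fetch -> extract links -> follow loop described above.
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10, delay=1.0):
    seen, queue = set(), [seed]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            # Stay on the same site for this sketch.
            if urlparse(absolute).netloc == urlparse(seed).netloc:
                queue.append(absolute)
        # Staggering requests keeps the crawler from hammering the server.
        time.sleep(delay)
    return seen

if __name__ == "__main__":
    print(crawl("https://example.com/"))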


In addition to search engines, companies use web crawlers for various purposes, including data analysis and market research. These bots can collect information about competitors, track prices, and gather user-generated content. However, not all crawlers operate responsibly; some may ignore website guidelines or overload servers with excessive requests. So, if web crawlers are so important in our digital infrastructure, how can making them better with AI be a bad thing? It all comes from the impact these AI web crawlers have on the back-end infrastructure of websites.


How AI Web Crawlers Overload Servers

When any client visits a website, it generates a series of data requests. Normally, a web server can handle thousands of these requests without breaking a sweat. Traditional crawlers also stagger their requests, ensuring that they don’t overload and crash the server. AI web crawlers, on the other hand, don’t take the server’s limitations into account.

AI web crawlers usually access the same content repeatedly, and instead of caching the content, they stream it through several filters to build a picture of what’s on the website. Moreover, they tend to ignore the instructions in the robots.txt file, indexing pages that the website doesn’t want to be indexed.
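For contrast, here’s what respecting robots.txt looks like in practice. Python’s standard library ships a parser for it, and a polite crawler checks it before every fetch (the crawler name and URLs below are made up for illustration):

```python
# Sketch: how a well-behaved crawler consults robots.txt before fetching.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

# "MyCrawler/1.0" is a hypothetical user agent for this example.
if rp.can_fetch("MyCrawler/1.0", "https://example.com/private/report.html"):
    print("Allowed to fetch")
else:
    print("robots.txt disallows this page; a polite crawler skips it")
```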


Typically, web crawlers use the User-Agent header to identify themselves. AI web crawlers usually don’t, making them even harder for websites to detect and block. Website system administrators are having a hard time limiting these AI web crawler requests and have to rely on reverse DNS lookups to figure out which requests to block.
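A reverse DNS check works by resolving the client IP back to a hostname, then resolving that hostname forward again; if the round trip doesn’t land on the same IP, the “Googlebot” knocking at your door probably isn’t one. A hedged sketch (the sample IP falls in Google’s published Googlebot ranges; the suffix list is an assumption you’d adjust per crawler):

```python
# Sketch: forward-confirmed reverse DNS, the check admins use to verify
# whether a client IP really belongs to the crawler it claims to be.
import socket

def verify_crawler(ip, expected_suffixes=(".googlebot.com", ".google.com")):
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False  # no PTR record at all: suspicious
    if not hostname.endswith(expected_suffixes):
        return False  # hostname doesn't match the claimed operator
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
    except OSError:
        return False
    return ip in forward_ips  # the round trip must come back to the same IP

print(verify_crawler("66.249.66.1"))  # an IP in Google's documented ranges
```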

How AI Web Crawlers Are Destroying the Internet From the Inside Out

Why are AI web crawlers such a menace? It comes from how they overload web traffic on pages. When a traditional web crawler indexes a page, it usually sends a single request and collects data based on that request. AI web crawlers can send as many as sixty (or more) requests for the same web page, causing the server to hang as it processes all those requests.

When these requests swamp the server, things start moving slowly. Users start getting 503 Service Unavailable errors as the bots suck up all the resources. Larger websites on expensive hosting packages can absorb this load by reallocating resources. But the couple that just spun up a hobby WordPress site over the weekend? Nope, that site is going to crash.
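This is why per-IP rate limiting sits in front of most serious deployments. Here’s a minimal sketch of a sliding-window limiter; the window size and request budget are illustrative assumptions, and real setups usually enforce this at the reverse proxy rather than in application code:

```python
# Sketch: a simple per-IP sliding-window rate limiter.
import time
from collections import defaultdict, deque

WINDOW = 10.0      # seconds (illustrative)
MAX_REQUESTS = 20  # per window per IP (illustrative)

_history = defaultdict(deque)

def allow_request(ip):
    now = time.monotonic()
    window = _history[ip]
    # Drop timestamps that have fallen out of the window.
    while window and now - window[0] > WINDOW:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return False  # over budget: answer with 429/503 instead of serving
    window.append(now)
    return True
```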


Why Are There So Many AI Crawlers?

[Image: A robot coming out of the ChatGPT logo, heading toward the Claude AI logo. Credit: Lucas Gouveia / How-To Geek]

Search engines still use traditional web crawlers since they’ve perfected their algorithms around these tools. So, where do the new AI web crawlers come from? It has a lot to do with the AI tech bubble that’s been taking the world by storm. Most startups are looking for unique and exciting ways to use AI, and bolting it onto web crawlers that siphon data from the open internet is an obvious place to start.

AI-powered web scraping is a game-changer for the wider world. From a business perspective, it takes far fewer resources to collect relevant insights about potential customers. From a system admin’s point of view, it means their websites get slammed with traffic by bots that take their data and give nothing in return. For small online businesses, it’s an entirely one-sided exchange.


These small businesses stand to lose the most. By using AI web crawlers to scrape their pages, larger companies can extract insights about their customers and tailor products to cater to them. The result is that these small businesses can’t compete against the onslaught of AI web crawlers. Their sites go down, making them look unreliable. All the while, their data is being siphoned away.

There’s also a knock-on effect for buyers like you and me. Once products appear on larger websites, many consumers abandon smaller stores in favor of the shipping and same-day delivery the big retail suppliers offer. The result is that smaller stores close down, leaving us with fewer choices. When there’s only one place to get what you want, you pay whatever price they set.

How Webmasters and System Admins Are Fighting Back

Luckily, all is not lost just yet. Some system admins are fighting back. Quite a few AI web crawlers ignore the robots.txt file, but for those that still respect it, webmasters are excluding the pages that would give AI models the most data. Other webmasters are blocking requests by User-Agent string outright, hurting their SEO but making their sites more usable for you and me.
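For crawlers that do honor it, those robots.txt rules look something like this. GPTBot and CCBot are documented crawler tokens for OpenAI and Common Crawl; the disallowed path is a made-up example, and whether any given bot obeys these rules is, as noted above, entirely up to the bot:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /drafts/
```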


Another strategy is using CAPTCHAs, which require users to solve a challenge before accessing specific parts of a website. This deters less sophisticated bots while allowing legitimate users to navigate without difficulty. Webmasters also monitor server logs to identify and block troublesome bots that ignore guidelines. By combining these methods, webmasters and sysadmins can safeguard their websites and promote a healthier online environment focused on user experience.
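Log monitoring can be as simple as counting requests per IP and eyeballing the outliers. A quick sketch, assuming an nginx-style access log where the client IP is the first field (the log path and threshold are illustrative assumptions):

```python
# Sketch: scan an access log and flag IPs with bot-like request volumes.
from collections import Counter

THRESHOLD = 1000  # requests; tune for your own traffic levels

counts = Counter()
with open("/var/log/nginx/access.log") as log:
    for line in log:
        ip = line.split(" ", 1)[0]  # first field is the client IP
        counts[ip] += 1

for ip, n in counts.most_common(10):
    flag = "  <- candidate for blocking" if n > THRESHOLD else ""
    print(f"{ip}: {n} requests{flag}")
```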

AI Web Crawlers Are Turning the Internet Into a Mess

As someone who has used AI extensively in my own projects, I know how useful it can be. However, the bad always comes along with the good. AI web crawlers are a sign of a deteriorating internet. These agents collect and parse data, then use it to churn out generic, unhelpful articles that seem interesting on the surface but offer no real benefit to us readers.


The battle between system admins and AI web crawlers might be the most important battle of the modern internet, yet few people see or hear about it. This could even be bigger than YouTube and its struggle against ad blockers. As an avid user of the internet, I hope the system admins win, and I can go back to reading interesting articles written by real people.


