SnykeBot

Snykebot: Snyke’s Web Monitoring Bot

Please note that our robot is still under development. Some tests are currently being performed, which is why you may see our bot hitting your website.

Snykebot is Snyke’s web-monitoring robot. It collects documents from the web for further analysis in order to provide our Snyke service. Once the documents have been checked, they are deleted from our databases. On this page, you’ll find answers to the most commonly asked questions about how our web bot works. A full list of bots is available at http://www.robotstxt.org/db.html

As one of the premier monitoring services on the web, Snyke focuses on providing the highest-quality service for our own users and for corporate partners. You can learn more about Snyke via our website, or you can register for the Snyke service right here:

Frequently Asked Questions

How often will Snykebot access my web pages?
How do I request that Snyke not crawl parts or all of my site?
Why is Snykebot asking for a file called robots.txt which isn’t on my server?
Why is Snykebot trying to download incorrect links from my server? Or from a server that doesn’t exist?
Why is Snykebot downloading information from our “secret” web server?
Why isn’t Snykebot obeying my robots.txt file?
Why are there hits from multiple machines at Snyke.com all with user-agent Snykebot?
Why is Snykebot downloading the same page on my site multiple times?
What kinds of links does Snykebot follow?
My Snykebot question is not answered here. Where do I send my question?

Answers:

How often will Snykebot access my web pages?

For most sites, Snykebot should not access your site more than once every few hours on average. Because network delays are involved, the rate may appear slightly higher over short periods. If you find that we are placing too high a load on your site, please let us know through our contact page.

How do I request that Snyke not crawl parts or all of my site?

Snykebot will never crawl your entire site. It only checks specific pages that our users consider useful and want to monitor. Snykebot is not required to comply with the exclusion standard for spiders. It acts like a regular user and fetches some of your pages on a periodic basis.

However, robots.txt is a standard document that can tell future versions of Snykebot not to download some or all information from your web server. The format of the robots.txt file is specified in the Robot Exclusion Standard. When deciding which pages to fetch on a particular host, Snykebot will obey the first record in the robots.txt file with a User-Agent starting with “snykebot”. If no such record exists, it will obey the first record with a User-Agent of “*”.
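The record selection described above can be sketched as follows. This is only an illustration under stated assumptions, not Snykebot's actual code: the function name select_record is hypothetical, records are assumed to be separated by blank lines, and only User-agent and Disallow fields are read.

```python
def select_record(robots_txt, bot_name="snykebot"):
    """Return the Disallow paths of the record the bot would obey, or None.

    Hypothetical helper: a simplified reading of the selection rule,
    not Snykebot's real implementation.
    """
    records = []              # list of (user_agents, disallow_paths)
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # A blank line closes the current record.
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue                          # comment-only line
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value.lower())
        elif field == "disallow":
            rules.append(value)
    if agents:
        records.append((agents, rules))

    # The first record whose User-Agent starts with the bot name wins.
    for agents, rules in records:
        if any(agent.startswith(bot_name) for agent in agents):
            return rules
    # Otherwise fall back to the first "*" record.
    for agents, rules in records:
        if "*" in agents:
            return rules
    return None
```

With this sketch, a file that lists a “*” record first and a “SnykeBot” record second still yields the “SnykeBot” record, because the bot-specific match is tried before the wildcard.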

There is a standard for robot exclusion at http://www.robotstxt.org/wc/exclusion.html#robotstxt. You can put a file on your server called robots.txt that can exclude Snykebot or other “web crawlers.” Snykebot has a user-agent of “SnykeBot”.
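As a rough way to check what a robots.txt file would say to a bot with the user-agent “SnykeBot”, the sketch below feeds a sample file to Python's standard urllib.robotparser module. The sample rules, the /private/ path, the www.example.com host, and the is_allowed helper are assumptions made for illustration. Python's parser applies its own matching rules, which may differ in detail from Snykebot's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep SnykeBot out of /private/,
# and leave everything open to all other robots.
ROBOTS_TXT = """\
User-agent: SnykeBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/page.html", "/index.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "SnykeBot", url))
```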

Why is Snykebot asking for a file called robots.txt which isn’t on my server?

robots.txt is a standard document that can tell Snykebot not to download some or all information from your web server. For information on how to create a robots.txt file, see The Robot Exclusion Standard. If you just want to prevent the “file not found” error messages in your web server log, create an empty file named robots.txt.

Why is Snykebot trying to download incorrect links from my server? Or from a server that doesn’t exist?

It is a property of the web that many links will be broken or outdated at any given time. Whenever someone asks our bot to monitor a misspelled link that points to your site, or fails to update their configuration to reflect changes in your server, Snykebot will try to download an incorrect link from your site. This is also why you may get hits on a machine that is not even a web server.

Why is Snykebot downloading information from our “secret” web server?

It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your “secret” server to another web server, your “secret” URL is likely to be sent in the HTTP Referer header, and the other web server can store it and possibly publish it in its referrer log. So, if there is a link to your “secret” web server or page anywhere on the web, it is likely that web users will find it and then tell Snykebot to check these pages.
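To illustrate how a “secret” URL can end up in another server's referrer log, here is a minimal sketch that extracts the referrer from one access-log line. It assumes the common “combined” log format; the sample line, the host names, and the extract_referrer helper are made up for this example.

```python
import re

# Request, status, size, referrer and user-agent fields of a
# "combined" format access-log line (assumed format).
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical log entry on the *other* web server.
SAMPLE_LINE = (
    '203.0.113.7 - - [10/Oct/2005:13:55:36 +0000] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://secret.example.com/internal/page.html" "Mozilla/4.0"'
)


def extract_referrer(line):
    """Return the referrer URL recorded in a log line, or None if absent."""
    match = LOG_PATTERN.search(line)
    if match is None or match.group("referrer") in ("", "-"):
        return None
    return match.group("referrer")


if __name__ == "__main__":
    print(extract_referrer(SAMPLE_LINE))
```

Any server that keeps or publishes such logs exposes the referring URL, even though no page ever linked to it directly.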

Why isn’t Snykebot obeying my robots.txt file?

Snykebot is not required to comply with the exclusion standard for spiders. Snykebot will not crawl your website. It will only fetch one or several pages on a regular basis.

Why are there hits from multiple machines at Snyke.com all with user-agent Snykebot?

Snykebot was designed to be distributed across several machines to improve performance and to scale as the web grows. To cut down on bandwidth usage, we would also like to run many bots on machines that are close, in network terms, to the sites they are monitoring. In addition, Snykebot might use dynamic IP addresses.

Why is Snykebot downloading the same page on my site multiple times?

Snykebot is interested in dynamic websites and checks for changes within the page. Snykebot fetches fewer pages than classical bots, but may come back more often to read specific pages of your site.
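Checking a page for changes requires fetching it again on each visit. The sketch below shows one possible way such a check could work, assuming that only the text of the page is compared and that a hash of that text is kept between visits. The names TextExtractor, text_fingerprint and has_changed are hypothetical, and this is not a description of how Snykebot is actually implemented.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Note: this simple version also collects script and style text.
        if data.strip():
            self.chunks.append(data.strip())


def text_fingerprint(html):
    """Return a SHA-256 hex digest of the page's text content."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(old_html, new_html):
    """Return True if the text of the page differs between two visits."""
    return text_fingerprint(old_html) != text_fingerprint(new_html)


if __name__ == "__main__":
    before = "<p>Price: 10</p>"
    after = "<p>Price: 12</p>"
    print(has_changed(before, after))
```

Comparing fingerprints of the text rather than the raw HTML means that markup-only changes, such as a new CSS class, would not count as a change in this sketch.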

Also, Snykebot will only fetch the textual part of your pages. Images are not retrieved, so that we can cut down on our bandwidth usage. If you still think Snykebot is not “polite”, please contact us at: snykebot(@snyke.com)

What kinds of links does Snykebot follow?

Snykebot follows manually entered links. It does not crawl web sites.

My Snykebot question is not answered here. Where do I send my question?

Please send questions regarding our Snykebot technology via our contact form.