One thing that is essential for any crawler-based search engine is a reasonable policy for your crawler.
Crawlers can consume a lot of resources from other people, so you have to be careful what you do. The most efficient way for a crawler to fetch documents would be to start with the robots.txt file, analyze it, and then download all documents from that host as fast as possible. That would minimize crawling time, because you only have to download the robots.txt file once and DNS requests are kept to a minimum.
Obviously the webmaster of the host wouldn’t be too happy about such a crawler policy. So I’m doing it like this: The crawler gets a chunk (about ten or twenty) of URLs from a single host in a row. It fetches the appropriate robots.txt file and downloads all of the ten or twenty URLs from that host with a minimum time lag of one second between requests. Once those URLs are downloaded, the crawler moves on to a different host.
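To make that concrete, here is a minimal Python sketch of the per-host chunk loop. The chunk size, the user-agent string "MyCrawler", and the use of urllib's robotparser are my own assumptions for illustration; my actual crawler may handle the details differently.

```python
import time
from urllib.parse import urlparse, urlunparse
from urllib.robotparser import RobotFileParser
from urllib.request import urlopen

CHUNK_SIZE = 20       # URLs taken from one host in a row (assumed value)
REQUEST_DELAY = 1.0   # minimum time lag between requests to the same host

def crawl_host_chunk(urls):
    """Fetch a chunk of URLs that all belong to the same host,
    honouring robots.txt and waiting between requests."""
    parts = urlparse(urls[0])
    robots_url = urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))

    # Download and parse robots.txt once for the whole chunk.
    robots = RobotFileParser()
    robots.set_url(robots_url)
    robots.read()

    documents = []
    for url in urls:
        if not robots.can_fetch("MyCrawler", url):
            continue                      # skip disallowed URLs
        with urlopen(url) as response:    # one fetch per allowed URL
            documents.append((url, response.read()))
        time.sleep(REQUEST_DELAY)         # be nice: at least one second between hits
    return documents
```

The crawl frontier would then hand the next chunk of ten or twenty URLs from a different host to this function, so no single server sees more than one request per second.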
I have found this policy to be a good compromise between keeping DNS and robots.txt traffic low and being nice to the web hosts out there.