Monday, September 13, 2010

Search Engine Spam

Spam is usually associated with email -- and for a few folks a pork-based meat cube in a can. We all agree that email spam is a bad thing. And email programs around the world are taking more and more steps every day to exclude spam.

There is such a thing as search engine spam as well. In email spam, the spam is volumes of unwanted emails. In search engine spam, the spam is volumes of unwanted content on a web page. If you dwell on this notion of search engines such as Google defining something like "unwanted content on a web page," you might get a little annoyed at the prospect of Google (or anyone else) trying to tell you what you can and cannot put on your website.

Here is the deal. At the moment, Google is not passing judgment on all the content on your website. Google is passing judgment on the content that is there specifically to influence Google's ranking of your website and how high in the search engine results page (SERP) your website appears.

So what is search engine spam?

Sorry, there is no simple definitive answer. Every search engine has a set of constantly evolving rules used to define what constitutes search engine spam. Basically, search engine spam is whatever the programmers working for the search engine companies define it to be today. Today's rules may be different from yesterday's, and tomorrow's rules may be different from today's. Hence a definitive answer is not so simple.

But, if we look toward a general definition of search engine spam the answers are very simple. There is widespread agreement among search engine professionals as to the basics of what constitutes spam.

The most obvious and agreed upon search engine spam tactics are:
  • hidden text
  • mirror sites
  • doorway pages

Hidden text

Hang on to your hats, this part might get tricky. Hidden text is text that is hidden. There you have it.

If you have text that human readers of your website cannot see, then search engine programmers assume that you are doing something tricky directed at the search engine robots. This might seem a bit egocentric on the part of the search engine companies. But, the internet is their world and we only live in it. Also, I suppose the fact that zillions of black hat and grey hat web designers have used hidden text in exactly this way might explain why they feel it is directed at them.

How can text be made invisible to humans -- but viewable by robots?

  • white text on a white background
  • css style sheet tricks
  • microscopic font sizes
  • floating images over the top of the text

Text hidden in this way is usually just a bingy-bunch of keywords -- keywords repeated, and keywords in multiple variations. Some webmasters use this trick to accomplish keyword stuffing without making the page look ridiculous to a human reader.

If you try tricks like this you might get away with it for a short while, but eventually the search engine geeks will find a way to detect your tricks and then slap you silly for trying to out-smart them.
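
If you want to check your own pages for the tricks listed above, here is a minimal sketch in Python. It is not how any search engine actually detects hidden text (nobody outside those companies knows the real rules). The function names, the 4px threshold, and the assumption that the page background is white are all made up for illustration. It only looks at inline styles and compares color names literally, so "white" and "#fff" will not match each other, and it does nothing about images floated over text.

```python
def parse_style(style):
    """Turn an inline CSS string into a dict of lowercase property -> value."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            decls[prop.strip().lower()] = value.strip().lower()
    return decls


def hidden_text_reasons(style, page_background="white"):
    """Return the reasons an inline style looks like hidden text.

    page_background is an assumed default for illustration only.
    """
    decls = parse_style(style)
    reasons = []

    # White text on a white background (literal comparison only).
    background = decls.get("background-color", page_background)
    if decls.get("color") == background:
        reasons.append("text color matches background")

    # Common style sheet tricks.
    if decls.get("display") == "none" or decls.get("visibility") == "hidden":
        reasons.append("hidden with css")

    # Microscopic font sizes; the 4px cutoff is an arbitrary assumption.
    size = decls.get("font-size", "")
    if size.endswith("px"):
        try:
            if float(size[:-2]) < 4:
                reasons.append("microscopic font size")
        except ValueError:
            pass

    return reasons


print(hidden_text_reasons("color: white; font-size: 1px"))
print(hidden_text_reasons("color: black; font-size: 12px"))
```

An empty list means none of these particular checks fired. It does not mean the page is clean.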

Mirror sites

A mirror site is a site that mirrors another site -- basically a duplicate site.

Now it is time to worry a little more about being accidentally slapped down. Ever noticed how GoDaddy and other registrars encourage you to register the .net, .org, .cc, .info, and dot everything as a means of protecting your branding? If you register the .com and .net for the same domain name and set up your httpd.conf so that both names serve the same website, you run the risk of having a search engine interpret this as mirror sites. They might think you are doing this as a trick or gimmick.

Before we continue along this line of discussion, let's look at the types of mirror sites that search engine manipulators (SEM) use. SEM will duplicate (mirror) a website's content so that they can attach different keywords to the same content. One site will display the content with a particular set of keywords and another site will display the exact same content with a different set of keywords.

On the face of it, this is not a bad idea. Because different demographics search for the same (or similar) items using different keywords, it makes sense to have different sites that are directed to each of the target demographic groups. Search engines do not want you to do this -- it clutters up their SERPs. The search engine result pages are their pages and they do not like people doing stuff that clutters them up or has a negative impact on user experience of their pages.

Once upon a time search engines were a collection of links to internet websites indexed so that users could search for specific content. These listings were monetized by placing ads and other services on the margins. Now, however, search engines have become content.

Let me give you an analogy. A library has an index card catalog giving reference to the various books contained within its shelves. A library will typically exert no influence on the content of the index. An editor assembling an anthology of short stories does something very different. The editor will incorporate the stories directly in the anthology book. Thus the editor views the stories as content and will be very selective about which stories appear in the anthology. This is natural since each story included in an anthology reflects upon the anthology as a whole. That is the editor's job.

Well, once upon a time search engine companies were like librarians. Somewhere along the line these search engine companies have developed into editors that view the presented pages as content, not as a card catalog.

Even if you think you have valid reasons (e.g. demographic targeting) it is best to never mirror your site. And, if you have registered multiple domain names as a branding ploy or a means to protect your trademark, do not point those domains toward the same website. It is risky. Search engines might be smart enough today to tell that you are using different names for the same content for legitimate reasons. However, in the future there could be a glitch that causes your site to look like you are "up to something." Also, having multiple links to the same content will dilute your page ranking. So don't do it.
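
To see whether you have accidentally mirrored yourself, you can compare the text served at each of your URLs. The sketch below is a rough self-audit, not a picture of how search engines detect mirrors. The function name and the example URLs are hypothetical, and it only catches pages whose text is identical after collapsing whitespace and case; near-duplicates with a few swapped keywords would slip through.

```python
import hashlib
from collections import defaultdict


def group_duplicates(pages):
    """Group URLs that serve identical text.

    pages: dict mapping url -> page text (fetched however you like).
    Returns a list of sorted URL lists, one per set of duplicates.
    """
    by_digest = defaultdict(list)
    for url, text in pages.items():
        # Collapse whitespace and case so trivial differences do not hide a mirror.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]


# Hypothetical example data.
pages = {
    "http://example.com/": "Buy  widgets today",
    "http://example.net/": "buy widgets today",
    "http://example.com/about": "About us",
}
print(group_duplicates(pages))
```

Any group it reports is a set of URLs worth collapsing into one.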

Doorway pages

In order to define what a doorway page is, let's look at an example of why one might want to use a doorway page.

Let's say you have a product to sell. Let's say that your marketing guys have created a killer web page with a super high conversion rate. I mean this page is a total killer. Anyone who gets to this page is totally compelled to buy. That is a very good thing. But, what if this page totally sucks at getting placement in the search engine rankings?

So now you have a page that converts like a king, but the page is not optimized for search engine placement. A natural idea would be to make a page that is optimized well for the search engines that will act as a doorway between the search engine and your killer sales page. This is a doorway page.

Problem is, search engine manipulators have used doorway pages in an abusive manner. Consequently, search engines now penalize websites that use this tactic. Not to worry though. If you truly have a killer conversion page just use ads to direct traffic to your page.

Please note, one of the methods used to mark a doorway page as a doorway page is website internal links. If you have a page with zero links to it from the rest of your website it starts to look like a doorway page.

The rule of thumb is "if you can't access a page from the interior of your website, it looks like a doorway page."
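
That rule of thumb is easy to test against your own site if you have a list of which pages link to which. Here is a minimal sketch, assuming you have already collected the internal links into a dictionary. The function name, the "/" home page default, and the sample site are hypothetical. It walks outward from the home page and reports every page it never reaches.

```python
from collections import deque


def unreachable_pages(links, home="/"):
    """Return pages that cannot be reached by following internal links from home.

    links: dict mapping each page -> list of pages it links to.
    """
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Anything we know about but never visited looks like a doorway page.
    return sorted(set(links) - seen)


# Hypothetical site: /landing links into the site, but nothing links to it.
site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": ["/"],
    "/landing": ["/products"],
}
print(unreachable_pages(site))
```

In this made-up site only /landing is reported, because no page inside the site links to it.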

In conclusion

The search engine world is changing. It is your responsibility to look squeaky clean to the search engine robots. If you deliberately, or accidentally, use search engine spamming techniques you will be labeled as a search engine spammer and your page ranking will suffer the consequences.

Some of these so-called tactics are innocent -- such as accidental mirror sites. But, search engine robots have little ability to detect your intention. As far as the search engine robots are concerned, if it looks like a duck, walks like a duck, and quacks like a duck, then it is a duck. It is your job to make sure you do not look like a duck.

If multiple URLs point to the same content, then take down all but one.

If you have registered multiple domain names for the same website, serve the site from only one of them and use a permanent (301) redirect from the others to inform search engines of the change.
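
In practice the permanent redirect is usually set up in your web server configuration, but the logic behind it is simple enough to sketch. The following is an illustration only: www.example.com is a placeholder for whichever name you decide to keep, and the function name is made up. Given a requested URL, it returns the 301 status and the matching address on the one name you kept, or nothing if the request is already on that name.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder for the one domain name you decide to keep.
CANONICAL_HOST = "www.example.com"


def redirect_for(url):
    """Return (301, new_url) for a request on the wrong host, else None.

    The path and query string are preserved; any port is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:
        return None
    new_url = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, new_url


print(redirect_for("http://www.example.net/shop?item=7"))
print(redirect_for("http://www.example.com/shop?item=7"))
```

Keeping the path and query string intact matters: each old address should land on its matching page, not just on the home page.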

If hidden text has crept into your website, for legitimate reasons or not, remove it. Otherwise, you will eventually be slapped down.

A final thought

There are more than a few sensible marketing ploys that can be misinterpreted as spamming. Be careful. You will never receive a warning, you will never be issued a citation, you will never have access to a defined system of arbitration. Your pages will simply be slammed. It is your job to not only stay within the spirit of the law, but also within the appearance of the law. It's not our world, we just live in it.