Hi Jill,

I have a question for you.  I know you are busy, so I’ll try to keep it to the point.

A lot of my clients pull data/products from semi-public databases to populate their websites, similar to how real estate agents show listings of homes on their sites.  I’ve been ensuring that each client has unique, valuable, and professionally edited content whenever possible on the rest of the site, but I’m afraid that if I make the portions of the site that use the semi-public data accessible to the search engines, they will find duplicate information on other sites and my client site(s) would not be indexed.

So, based on that “fear,” I have blocked the robots (as much as can be done) to keep them from indexing the pages that have these data feeds and the corresponding details.

So my question is...

Should I go to the extra length to create unique content for each product, or would I just be spinning my wheels because the engines would detect the similarity of the pages anyway?


Thank you for your time and I have always enjoyed your articles.

Regards,
Alexander

Jill's Response

Hi Alexander,

So many people have a misunderstanding of the whole duplicate content issue.

It’s fine to allow the search engines to index that content. The search engines wouldn't drop or refuse to index an entire site just because some pages had information that was also contained on other pages.  That's a common scenario that they know how to deal with appropriately (for the most part).  

The worst that will happen is that the search engines simply won’t index just those particular duplicated pages, or, if they do index them, that those pages won’t show up in the search results for their optimized keyword phrases. However, if you block the search engines from indexing any of the content via robots.txt, they definitely won’t index that content.
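
If you want to see exactly what a robots.txt block shuts out, here is a minimal sketch using Python's standard urllib.robotparser module. The /listings/ path, the example.com URLs, and the is_crawlable helper are assumptions made up for illustration; substitute whatever rules and pages your client sites actually use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the data-feed pages.
# The /listings/ path is an assumption, not taken from any real site.
BLOCKING_RULES = """\
User-agent: *
Disallow: /listings/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one data-feed page and one ordinary page.
feed_page = "https://www.example.com/listings/item-123"
about_page = "https://www.example.com/about"

feed_allowed = is_crawlable(BLOCKING_RULES, feed_page)
about_allowed = is_crawlable(BLOCKING_RULES, about_page)
print("feed page crawlable:", feed_allowed)
print("about page crawlable:", about_allowed)
```

With the Disallow line in place the feed pages are off limits to compliant robots, which is why they have no chance of being indexed. Removing that line at least gives them a chance.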

What I would recommend is wrapping your own unique content around the database-pulled content.  In other words, you’d add some copy before the listings (or whatever happens to be in the data feed), and perhaps after the feed info as well.  This would provide you with the best chance of having those pages indexed and possibly showing up in the rankings for the keyword phrases for which you choose to optimize.
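
As a rough sketch of that wrapping idea, the Python below assembles a page from your own intro copy, the feed items, and optional closing copy. The build_listing_page function, the HTML layout, and the sample text are all hypothetical; your templates will look different, but the shape (unique copy, then feed, then unique copy) is the point.

```python
from html import escape


def build_listing_page(intro: str, feed_items: list[str], outro: str = "") -> str:
    """Wrap database-pulled feed items in unique copy before and after."""
    parts = [f"<p>{escape(intro)}</p>", "<ul>"]
    # Feed content goes in the middle, untouched apart from HTML escaping.
    parts += [f"  <li>{escape(item)}</li>" for item in feed_items]
    parts.append("</ul>")
    if outro:
        parts.append(f"<p>{escape(outro)}</p>")
    return "\n".join(parts)


# Hypothetical sample data standing in for a real feed and real copy.
page = build_listing_page(
    intro="Our hand-picked guide to this week's listings.",
    feed_items=["Feed item A", "Feed item B"],
    outro="Contact us for details on any listing above.",
)
print(page)
```

The intro and outro are where your keyword-focused, professionally edited copy would go, so each page carries something the other sites using the same feed don't have.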

Hope this helps!
Jill