Posts Tagged ‘caffeine’

Google to Upgrade its Memory? Assigned Startup MetaRAM’s Memory Chip Patents 20 November 2009 at 1:40 pm by admin

In August, the Official Google Blog announced an upgrade to Google’s infrastructure code-named Caffeine, aimed at making the search engine faster, and Google opened the system up for testing to people who might want to provide feedback. An interview with one of the developers behind the upgrade described it as an upgrade to the [...]


Ask the SEOs - SMX East 2009 06 October 2009 at 1:48 pm by admin

Danny’s assembled a grizzled group of veteran SEOs. They’re running through intros. Here’s yours:

Moderator: Danny Sullivan, Editor-in-Chief, Search Engine Land

Speakers:
Greg Boser, President and CEO, 3 Dog Media
Bruce Clay, President, Bruce Clay, Inc.
Vanessa Fox, Contributing Editor, Search Engine Land
Todd Friesen, VP of Search, Position Technologies
Rae Hoffman, Owner, Sugarrae Internet Consulting
Stephan Spencer, President & CEO, Netconcepts
Aaron Wall, Author, SEO Book

Ask The SEOs

Q: The Canadian portion of our site is under a different root folder but has the same content as the U.S. portion of the site. Is that duplication a problem?

Vanessa: Is it on a .ca site?

No.

Vanessa: Oh…

Stephan: Did you set the geographic region in Webmaster Central?

No.

Vanessa: Do that first. But you may want to move it to a .ca domain.

Aaron: And redirect the Canada folder to the new domain.

Q: How do you deal with content when it’s been scraped from your site?

Stephan: In the bio of the article, link to the original URL of that article. When it’s ripped off it will be linking to the original version. Try to get all the duplicates to point to the canonical version of that article.

Greg: If you have your site ripped off from a site with more authority, then that site will probably rank for it. But typically that’s not the case.

Rae: If a scraper site is outranking you, you have bad SEO.

Vanessa: File a DMCA notice.

Aaron: You can use it to your advantage as a name and blame.

Vanessa: Yeah, like, look at this big brand that obviously likes my content.

Bruce: Copyright register your content so you have recourse.

Q: Will bounce rate or time on site ever become a factor in the ranking algorithm?

Vanessa: If you’re talking about analytics data, and Google using analytics data to rank your site, it would be hard for Google to rank things that way because not everyone has Google Analytics on their site. If you think about why you want to rank on search engines, it’s to get people engaged on your site. High bounce rate is a signal that you’re not doing that. That’s a bigger issue to me.

Todd: There are sites where a user clicks through and gets what they want in 10 seconds, then hits the back button. There are too many cases like this for Google to roll bounce rate into the algorithm. If you have a ridiculously high bounce rate, don’t worry about your ranking. Worry about why you have a ridiculously high bounce rate.

Rae: I would just assume that they use everything. They say they don’t use the analytics data, and maybe they don’t, but the things they can tell about your site are beyond anything we can comprehend. So just assume they know everything.

Greg: There’s a time when your page is around the 12 to 17 position and it’ll jump onto the first page sometimes. We call that the audition period. It’s when Google’s testing your site out in place of something else on the front page position. We’ve found that during that time, if you concentrate on improving bounce rate, it’ll stick on the front page faster.

Bruce: It would be an easily spammed factor.

Danny: Google has lots of ways to cross check the data, though. So they can tell if there’s behavior out of the ordinary.

Q: My site recovered from a Google penalty six months ago for spammy backlinks. Is it safe to launch a sub-domain now?

Unanimous: Adding content won’t hurt you.

Q: What do you think is the value of sub-domains in terms of ranking power? In the last few months I’ve noticed that Google has devalued keyword value on sub-domains.

Vanessa: I think that value is mostly in the anchor text of links.

Todd: Think of why you’d use a sub-domain. I don’t think using it just to add another keyword is that useful.

Vanessa: When monitoring someone else’s site, it can be hard to know what really caused the effect you’re noticing. You don’t know what else is happening on the site.

Greg: Any large scale project we do, we typically sub-domain. You don’t want to be overly granular with it — engines don’t like that. It’s seen as spam. But if you have a site with a very large topic base, you’ll find you can rank for head-related terms by breaking top-level categories into sub-domains. It’s also good for brand protection. For your brand name it will give you more listings.

Todd: For your brand, you should have at least two listings: your site and an indented listing. With a couple sub-domains you can have 40 percent of the page.

Q: Any insights on Google Caffeine?

Danny: Google hasn’t really said anything interesting about it yet. They’ve said they’re trying some new crawling, maybe some new ranking things.

Vanessa: I know that Google said it’s primarily an infrastructure change, not a ranking change. Though some people I’ve talked to have noticed a ranking change.

Todd: I’ve seen Universal Search results missing, like video results missing. But generally with ranking, we haven’t seen any big changes.

Greg: I’ve got a tool that collects data on both Google standard and Caffeine every day and have noticed a lot of changes. A trend toward home pages — trying to return the best site, not necessarily the best page.

Bruce: I’ve found pages in Caffeine that are older than the standard index. So it appears that the regular index is updating faster than the Caffeine index. If you see ranking changes, it may be that your competitor has been spidered more recently than yours. It’s hard to compare the two indices because they’re not in sync.

Todd: Keep in mind that the only people that know about Caffeine and are clicking on things are a bunch of search marketers.

Danny: I think Google’s also feeling a lot of pressure over Bing. No one thinks Google does search anymore. This is something they can point to and say, look, we’re still working on search.

Q: Any tips for Google Suggest?

Danny: In some cases you’ll get sites that come up. Typing “rhyme” into the search box suggests www.rhymezone.com.

Aaron: We actually were able to create phrases. If your domain name matches your keyword, you probably get a boost. A lot of Suggest looks to be navigational, as well.

Todd: I’d like to be able to clean up the suggestions from a brand management standpoint. Google Suggest is hurting you before the user even gets to the results page.

Vanessa: You can take advantage of the knowledge of the intent of the query. For instance, Bing breaks the query down into categories. You can drive interest by speaking to those categories.

Q: What’s the difference between Alexa ranking and Google PageRank?

Vanessa: Well, they’re the same by both being mythical numbers.

Danny: It’s like two independent critics reviewed a movie. There’s no relation.

Vanessa: Alexa is skewed because it only counts users that have a toolbar. A better way of gauging your site than PageRank is looking at your inbound links.

Rae: Don’t waste your time monitoring that number. Spend your time improving your site.

Todd: The two things to be concerned about with toolbar PageRank are whether it’s white or grey. If there’s even a pixel of green, you’re fine.

Q: What do you recommend for Sitemaps and making sure your site is crawled well?

Vanessa: I think you may as well submit an XML Sitemap because it doesn’t hurt you and it’s good to give them a better picture of your site. It doesn’t replace good information architecture. If it’s crawlable, it can also help you with diagnostics: submit it to Google and you’ll get more accurate metrics than the search operator will give.
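
For readers who haven’t built one, here is a minimal sketch of generating an XML Sitemap in Python. It is not something the panel showed. The example.com URLs and the build_sitemap helper are made up for illustration, and a real Sitemap would usually carry more than the bare loc entries used here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap document as a string (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, not taken from the session.
pages = ["http://www.example.com/", "http://www.example.com/about"]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The output would be saved as a file on the site and submitted through Webmaster Tools, as Vanessa suggests.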

Aaron: If you block a page in robots.txt and someone links to it, it can still end up in the index. Instead you should include a noindex in the head of the page. But if you include a noindex in the head and also exclude it through robots.txt, it won’t be crawled to see that it’s supposed to be noindexed.
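
The conflict Aaron describes can be checked mechanically. The sketch below uses only Python’s standard library and is an illustration, not a tool anyone on the panel mentioned. The noindex_is_hidden function, the sample robots.txt rules, and the example.com URLs are all assumptions. It flags a page whose noindex tag can never be seen because robots.txt blocks the crawler from fetching it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindex_is_hidden(robots_txt, url, html, agent="Googlebot"):
    """True if the page asks for noindex but robots.txt stops the crawler reading it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    meta = RobotsMetaParser()
    meta.feed(html)
    return meta.noindex and not rules.can_fetch(agent, url)


# Hypothetical inputs for illustration only.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'

blocked = noindex_is_hidden(robots_txt, "http://www.example.com/private/page.html", page)
reachable = noindex_is_hidden(robots_txt, "http://www.example.com/public/page.html", page)
print(blocked, reachable)
```

When the function returns True, the likely fix is the one Aaron gives: drop the robots.txt rule for that page so the noindex can be crawled and honored.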

Q: Should you pull old URLs from Sitemaps after you’ve redirected pages?

Wait for the redirects to get picked up. It’ll tell you in Webmaster Tools.

Q: For Flash, when should I use switch objects?

Vanessa: The biggest problem is that you need a separate URL for each interaction, otherwise it’s a mass of interactions.

Todd: You’ll come across sites that paid $2 million for an amazing Flash site. And they come to you and say we need SEO help. You can’t look at them and say, “You’re going to have to scratch that.” Copy the site in HTML.

Vanessa disagrees that you should just cloak your site as the solution.

Bruce: You can copy your top nav as links in the footer.

Q: How do you determine if a site is authoritative?

Rae: Does it rank well for main keywords?

Stephan: Does it have Sitelinks for non-brand keywords?

Vanessa: If you’re asking about sites that are linking to you, does the site have active stuff happening on it? Is it referring lots of visitors?

Greg: The time factor is one that you can’t fudge. Sometimes you may want to acquire a domain.

Vanessa: Acquiring a site and not changing the whois info is a little risky.

Q: Are the guidelines the same for optimizing dynamic sites?

Yes.

Vanessa: Google wrote a post on their blog this morning about dynamic parameters.

Q: What are the dumbest SEO mistakes you’ve each seen?

Todd: Pier One Imports launched a new site with breadcrumbs that followed the click path. It was totally unspiderable. They ended up just shutting off the ability to buy products on the site altogether because they didn’t want to fix it. You still can’t buy anything from their site.

Rae: At a site clinic several years ago, one site someone submitted was a scraper site.

Bruce: During a site review, one site had linked to all their other sites in the footer. In one of the service phrases, each letter linked to a different site.

Todd: I bring this up at every conference hoping Bed Bath & Beyond will be at the conference. Every title is “Bed Bath & Beyond (Product)”. They also have 2400 lines of JavaScript at the top of the page.

Vanessa: One site had been around a year and wasn’t indexed but they didn’t know why. The host had actually blocked the site with robots.txt.

Greg: In the gaming industry there’s a lot of geo-targeting. More than once, a site will bounce Googlebot to the U.S. site because it comes from a U.S. IP, and so their UK site will never get indexed.


Most Common Words in Google, Yahoo, Bing, and Ask, with Google Caffeine 14 August 2009 at 10:04 pm by admin

Just which words show up most frequently on the Web? I’m not sure that question can be answered, but it’s something I’ve wondered for a while.

With a beta version of Google’s future update, code-named Caffeine, recently released for people to experiment with, I thought I would do a few comparisons.

I found a few lists of the most common words in the English language, and came up with a top 50 to see how frequently those were estimated to show up in Google, Yahoo, Bing, Ask, and Google Caffeine. Those are shown in a table and a chart below.

I’m not sure how informative this might be, even after looking at it. It’s not a very scientific test, either. There are a few reasons for that:

One of them is that when you search at one of the search engines, you’ll see a message that says something like:

Results 1 – 100 of about xxx,xxx,xxx for [query term]

From at least one previous Google patent filing, we can guess that the total number (xxx,xxx,xxx) of results listed is likely only an estimate, and not an actual count. That patent application told us that the number shown might be estimated based upon a look at anywhere from 2 percent to 10 percent of Google’s index. Since the Caffeine update is a complete infrastructure/database update, we can’t even assume that the estimates shown for present-day Google are created in the same way as the estimates shown for Caffeine.

We also can’t be sure that the numbers for Yahoo, Bing, and Ask are calculated in the same manner either.
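
To see why a sampled estimate can move around, here is a rough sketch of the arithmetic. It assumes the simplest possible approach, scaling a hit count from a sample up to the whole index. The patent filing’s actual method isn’t described here, and the estimate_total function and every number below are hypothetical.

```python
def estimate_total(sample_hits, sample_percent):
    """Scale a hit count from a sample up to the whole index (assumed method)."""
    return sample_hits * 100 // sample_percent


# Hypothetical counts from a 2 percent sample of an index.
estimate_a = estimate_total(200_000, 2)
estimate_b = estimate_total(201_000, 2)

# A difference of 1,000 hits in the sample is magnified in the estimate.
difference = estimate_b - estimate_a
print(estimate_a, estimate_b, difference)
```

Under this assumption, any small difference in the sample is multiplied by 50 at a 2 percent sample and by 10 at a 10 percent sample, which is one more reason to treat the totals as rough figures.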

Another is that while I may see one total count at Google for each term, if you looked up the same terms at Google, you might see different numbers because you may be searching at a different data center, and it’s quite possible that there are differences from one data center to another.

A third thing to keep in mind is that when we search at one of the search engines, we aren’t actually searching the Web. Instead, we’re searching the indexes of the Web that the search engines have created. That means that some pages may be indexed more than once under different URLs, that many pages on the Web may not be included since they haven’t been indexed yet, and that words that might appear on the Web as text in images or which are presented in Flash or hidden behind JavaScript or log-in screens aren’t going to be counted.

The table below shows the total number of results, in millions. I sorted the terms by how frequently they appeared in Google Caffeine.

| Query | Google Caffeine | Google | Yahoo | Bing | Ask |
| a | 19,320 | 17,570 | 31,200 | 7,800 | 1,280 |
| in | 15,850 | 13,980 | 30,200 | 7,850 | 900 |
| to | 15,220 | 13,500 | 27,500 | 8,920 | 1,740 |
| the | 14,850 | 13,900 | 28,800 | 8,170 | 747 |
| of | 14,760 | 12,990 | 28,000 | 7,310 | 794 |
| and | 13,980 | 12,950 | 28,000 | 7,490 | 789 |
| for | 12,110 | 10,720 | 26,800 | 7,740 | 769 |
| by | 12,080 | 10,420 | 27,000 | 6,120 | 956 |
| on | 11,260 | 9,940 | 25,100 | 5,610 | 598 |
| is | 9,580 | 8,870 | 22,600 | 4,250 | 699 |
| I | 9,220 | 8,250 | 18,600 | 3,860 | 686 |
| all | 9,110 | 7,580 | 27,200 | 6,990 | 1,020 |
| this | 8,890 | 7,870 | 21,500 | 5,790 | 585 |
| with | 8,490 | 6,300 | 20,900 | 2,440 | 636 |
| it | 7,700 | 6,860 | 19,300 | 4,190 | 542 |
| at | 7,410 | 6,600 | 20,800 | 3,930 | 552 |
| from | 7,340 | 6,920 | 18,400 | 4,160 | 521 |
| or | 7,030 | 6,210 | 19,500 | 3,940 | 567 |
| you | 6,760 | 5,930 | 19,900 | 5,080 | 543 |
| as | 6,460 | 5,750 | 15,400 | 3,550 | 884 |
| your | 6,360 | 5,470 | 19,500 | 3,790 | 495 |
| an | 6,260 | 5,520 | 16,500 | 3,780 | 489 |
| are | 6,260 | 5,760 | 18,100 | 163 | 578 |
| be | 6,120 | 5,460 | 17,100 | 3,990 | 473 |
| that | 5,780 | 5,260 | 15,200 | 5,650 | 405 |
| do | 5,500 | 5,020 | 13,000 | 2,090 | 410 |
| not | 5,500 | 4,870 | 15,600 | 4,550 | 418 |
| have | 4,870 | 4,390 | 14,500 | 4,130 | 468 |
| one | 4,330 | 3,870 | 12,300 | 2,750 | 375 |
| can | 4,150 | 3,690 | 13,300 | 3,030 | 367 |
| was | 3,930 | 3,610 | 10,400 | 2,960 | 361 |
| if | 3,810 | 3,500 | 11,200 | 2,660 | 345 |
| we | 3,780 | 3,370 | 12,400 | 3,430 | 358 |
| but | 3,610 | 3,340 | 10,100 | 1,680 | 327 |
| what | 3,290 | 2,850 | 11,600 | 3,080 | 322 |
| which | 3,020 | 2,810 | 7,750 | 1,810 | 300 |
| there | 2,970 | 2,770 | 8,340 | 1,450 | 262 |
| when | 2,850 | 2,600 | 8,360 | 1,580 | 306 |
| use | 2,730 | 2,250 | 12,300 | 1,830 | 327 |
| their | 2,690 | 2,680 | 8,210 | 1,650 | 254 |
| they | 2,650 | 2,440 | 8,260 | 1,670 | 293 |
| how | 2,470 | 2,170 | 9,050 | 1,730 | 289 |
| he | 2,200 | 2,040 | 6,060 | 1,420 | 190 |
| were | 2,130 | 2,100 | 5,320 | 2,770 | 203 |
| his | 2,030 | 1,880 | 5,310 | 858 | 182 |
| had | 1,860 | 2,240 | 5,090 | 966 | 191 |
| each | 1,370 | 1,290 | 4,150 | 1,090 | 164 |
| said | 1,210 | 1,350 | 4,060 | 857 | 128 |
| she | 953 | 882 | 3,030 | 1,200 | 95 |
| word | 780 | 685 | 2,280 | 469 | 80 |
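
One way to read the table is to compare the Google Caffeine and Google columns row by row. The sketch below does that for a handful of rows copied from the table. The choice of rows and the percent_change helper are only for illustration, and the results inherit all the caveats about estimated counts described earlier.

```python
# (Google Caffeine, Google) result estimates in millions, copied from the table.
rows = {
    "a": (19_320, 17_570),
    "the": (14_850, 13_900),
    "with": (8_490, 6_300),
    "had": (1_860, 2_240),
    "said": (1_210, 1_350),
    "word": (780, 685),
}


def percent_change(caffeine, google):
    """Percent difference of the Caffeine estimate relative to the Google estimate."""
    return round((caffeine - google) / google * 100, 1)


changes = {word: percent_change(c, g) for word, (c, g) in rows.items()}

# Words whose Caffeine estimate is lower than the Google estimate.
lower_in_caffeine = [word for word, pct in changes.items() if pct < 0]

for word, pct in changes.items():
    print(f"{word:>5}: {pct:+.1f}%")
print("Lower in Caffeine:", lower_in_caffeine)
```

Among these six rows, only “had” and “said” come out with a lower estimate in Caffeine than in Google.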

I thought it would be helpful to present this information in a visually different manner as well. The chart that follows is in reverse order of the table above.

[Chart comparing estimates of the number of results for common words in Google Caffeine, Google, Yahoo, Bing, and Ask.]

As I mentioned above, this is a completely unscientific view.

One thing that it definitely won’t do is provide an idea of how large the databases might be for each of the search engines. According to a post about Bing on the Cuil blog, there is a way to try to make that comparison, but it relies upon looking at the number of search results for terms that are rare, rather than looking at the most frequently appearing words, as I have.

