Posted by rjonesx.
Alright, so here’s the situation. You have a million-product website. Your competitors have a lot of the same products. You need unique content. What do you do? The same thing everyone does — you turn to user-generated content. Problem solved, right?
User-generated content (UGC) can be an incredibly valuable source of content and organization, helping you build natural language descriptions and human-driven organization of site content. One common feature sites use to take advantage of user-created content is tags, found everywhere from e-commerce sites to blogs. Webmasters can leverage tags to power site search, create taxonomies and categories of products for browsing, and provide rich descriptions of site content.
This is a logical and practical approach, but can cause intractable SEO problems if left unchecked. For mega-sites, manually moderating millions of user-submitted tags can be cumbersome (if not wholly impossible). Leaving tags unchecked, though, can create massive problems with thin content, duplicate content, and general content sprawl. In our case study below, three technical SEOs from different companies joined forces to solve a massive tag sprawl problem. The project was led by Jacob Bohall, VP of Marketing at Hive Digital, while computational statistics services were provided by J.R. Oakes of Adapt Partners and Russ Jones of Moz. Let’s dive in.
What is tag sprawl?
We define tag sprawl as the unchecked growth of unique, user-contributed tags, resulting in a large number of near-duplicate pages and unnecessary crawl space. Tag sprawl generates URLs likely to be classified as doorway pages: pages that appear to exist only to build an index across an exhaustive array of keywords. You’ve probably seen this in its most basic form in the tagging of posts across blogs, which is why most SEOs recommend a blanket “noindex, follow” on tag pages in WordPress sites. This simple approach can be an effective solution for small blogs, but it is often not the right solution for major e-commerce sites that rely more heavily on tags to categorize products.
The following three tag clouds represent lists of user-generated terms associated with different stock photos. Note: users generally add as many tags as possible in an attempt to ensure maximum exposure for their products.
- USS Yorktown, Yorktown, cv, cvs-10, bonhomme richard, revolutionary war-ships, war-ships, naval ship, military ship, attack carriers, patriots point, landmarks, historic boats, essex class aircraft carrier, water, ocean
- ship, ships, Yorktown, war boats, Patriot pointe, old war ship, historic landmarks, aircraft carrier, war ship, naval ship, navy ship, see, ocean
- Yorktown ship, Warships and aircraft carriers, historic military vessels, the USS Yorktown aircraft carrier
As you can see, each user has generated valuable information for the photos, which we would want to use as a basis for creating indexable taxonomies for related stock images. However, at any scale, we face immediate threats of:
- Thin content: Only a handful of products share the user-generated tag when a user creates a more specific/defining tag, e.g. “cvs-10”
- Duplicate and similar content: Many of these tags will overlap, e.g. “USS Yorktown” vs. “Yorktown,” “ship” vs. “ships,” “cv” vs. “cvs-10,” etc.
- Bad content: Created by improper formatting, misspellings, verbose tags, hyphenation, and similar mistakes made by users.
Now that you understand what tag sprawl is and how it negatively affects your site, how can we address this issue at scale?
The proposed solution
In correcting tag sprawl, we have some problems to solve that are basic, at least on the surface. We need to effectively review each tag in our database and place it in a group so further action can be taken. First, we determine the quality of a tag (how likely someone is to search for it, whether it is spelled correctly, whether it is commercial, whether it is used for many products), and second, we determine whether there is a very similar tag of higher quality.
- Identify good tags: We defined a good tag as a term capable of contributing meaning and easily justifiable as an indexed page in search results. This also entailed identifying a “master” tag to represent each group of similar terms.
- Identify bad tags: We wanted to isolate tags that should not appear in our database due to misspellings, duplication, poor formatting, high ambiguity, or a likelihood of producing a low-quality page.
- Relate bad tags to good tags: We assumed many of our initial “bad tags” would be duplicates of one kind or another, e.g. plural/singular, technical/slang, hyphenated/non-hyphenated, conjugations, and other stems. There could also be two phrases that refer to the same thing, like “Yorktown ship” vs. “USS Yorktown.” We needed to identify these relationships for every “bad” tag.
For the project inspiring this post, our sample tag database comprised over 2,000,000 “unique” tags, making this a nearly impossible feat to accomplish manually. While in theory we could have leveraged Mechanical Turk or a similar platform to get “manual” review, early tests of this method proved unsuccessful. We needed a programmatic method (several methods, in fact) that we could later reproduce when adding new tags.
Keeping in mind the goals of identifying good tags, labeling bad tags, and relating bad tags to good tags, we employed more than a dozen methods, including: spell correction, bid value, tag search volume, unique visitors, tag count, Porter stemming, lemmatization, Jaccard index, Jaro-Winkler distance, Keyword Planner grouping, Wikipedia disambiguation, and K-Means clustering with word vectors. Each method helped us either determine whether a tag was valuable or, if it was not, identify an alternative tag that was.
Spell correction
- Method: One of the obvious issues with user-generated content is the occurrence of misspellings. We regularly found misspellings where a semicolon had been typed in place of the letter “L” (the keys are adjacent on a QWERTY keyboard) or where words had unintended characters at the beginning or end. Luckily, GNU Aspell, an open-source spell checker readily available on Linux, let us fix a large volume of these issues.
- Benefits: This offered a quick, early win, since it was fairly easy to identify bad tags when they were composed of words that weren’t in the dictionary or included characters that were simply inexplicable (like a semicolon in the middle of a word). Moreover, if the corrected word or phrase occurred in the tag list, we could trust the corrected phrase as a potentially good tag and relate the misspelled term to it. Thus, this method helped us both filter bad tags (misspelled terms) and find good tags (the spell-corrected terms).
- Limitations: The biggest limitation of this methodology was that combinations of correctly spelled words or phrases aren’t necessarily useful for users or the search engine. For example, many of the tags in the database were concatenations of multiple tags where the user had space-delimited rather than comma-delimited their submitted tags. Thus, a tag might consist of correctly spelled terms but still be useless in terms of search value. Moreover, there were substantial dictionary limitations, especially with domain names, brand names, and Internet slang. To accommodate this, we added a personal dictionary that included the top 10,000 domains according to Quantcast, several thousand brands, and a slang dictionary. While this was helpful, there were still several false recommendations that needed to be handled. For example, we saw “purfect” corrected to “perfect,” despite its being a pop-culture reference for cat images. We also noticed some users render this saying as “purrfect,” “purrrfect,” “purrrrfect,” “purrfeck,” etc. Ultimately, we had to rely on other metrics to determine whether we trusted the spelling recommendations.
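A minimal Python sketch of the “relate a misspelling to an existing tag” step follows. It does not call Aspell: the standard library’s `difflib.get_close_matches` stands in for the spell checker, `DICTIONARY` is a hypothetical word list playing the role of Aspell’s dictionary plus the custom additions, and the 0.8 similarity cutoff is an assumption rather than a value from the project.

```python
import difflib

# Hypothetical word list standing in for Aspell's dictionary plus the custom
# domain/brand/slang additions. Sorted so matching is deterministic.
DICTIONARY = sorted({"boat", "facebook", "historic", "perfect", "pontoon", "ship", "yorktown"})


def correct_word(word, cutoff=0.8):
    """Return the word itself if known, its closest dictionary match, or None."""
    if word in DICTIONARY:
        return word
    matches = difflib.get_close_matches(word, DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def relate_misspellings(tags):
    """Map each misspelled tag to a spell-corrected tag that already exists in the tag list."""
    tag_set = set(tags)
    remap = {}
    for tag in tags:
        corrected = [correct_word(w) for w in tag.lower().split()]
        if None in corrected:
            continue  # no confident correction; leave the tag for other methods
        fixed = " ".join(corrected)
        # Only trust the correction if it is itself a tag somebody already used.
        if fixed != tag and fixed in tag_set:
            remap[tag] = fixed
    return remap


tags = ["pontoon boat", "pontoon boaat", "facebook", "facbook", "purrfect"]
print(relate_misspellings(tags))
```

Note that difflib can still suggest “perfect” for “purrfect”; the tag survives here only because “perfect” is not already in the tag list, which is exactly the kind of false recommendation that other metrics had to catch.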
Bid value
- Method: While a tag might be good in the sense that it is descriptive, we wanted tags that were commercially relevant. Using the estimated cost-per-click (CPC) of the tag or tag phrase proved useful in making sure that the term could attract buyers, not just visitors.
- Benefits: One of the great features of this methodology is that it tends to have a high signal-to-noise ratio. Most tags that have high CPCs tend to be commercially relevant and searched frequently enough to warrant inclusion as “good tags.” In many cases we could feel confident that a tag was good just on this metric alone.
- Limitations: However, the bid value metric comes with some pretty big limitations, too. For starters, Google Keyword Planner’s disambiguation problem is readily apparent. Google combines related keywords when reporting search volume and CPC data, which means a tag like “facbook” returns the same data as “facebook.” Obviously, we would prefer to map “facbook” to “facebook” rather than keep both tags, so in some cases the CPC metric wasn’t sufficient to identify good tags. A further limitation of the bid value was the difficulty of acquiring CPC data. Google now requires running active AdWords campaigns to get access to CPC values. It is no simple feat to look up 5,000,000 keywords in Google Keyword Planner, even if you have a qualifying account. Luckily, we felt comfortable that historical data would be trustworthy enough, so we didn’t need to acquire fresh data.
Tag search volume
- Method: Similar to CPC, we could use search volume to determine the potential value of a tag. We had to be careful not to rely on the tag alone, though, since a tag could be so generic that it earns traffic unrelated to the product itself. For example, the tag “USS Yorktown” might get a few hundred searches a month, but “USS Yorktown T-shirt” gets 0. For all of the tags in our index, we tracked down the search volume for the tag plus the product name to make sure we had good estimates of potential product traffic.
- Benefits: Like CPC, this metric did a very good job of consolidating our tag data set to just keywords that were likely to deliver traffic. In the vast majority of cases, if “tag + product” had search volume, we could feel confident that it was a good term.
- Limitations: Unfortunately, this method fell victim to the same disambiguation problem that CPC presents. Because Google groups terms together, two tags will sometimes be given the same metrics. For example, “pontoons boat,” “pontoonboat,” “pontoon boats,” “pontoon boat,” “pontoon boating,” and “pontoons boats” were in the same traffic volume group, which also included tags like “yacht” and “yachts.” Moreover, this metric does not account for keyword difficulty. Some tags, when combined with product types, produce keywords that receive substantial traffic but will always be out of reach for a templated tag page.
Unique visitors
- Method: This method was a no-brainer: protect the tags that already receive traffic from Google. We exported from Google Analytics all of the tags that had received search traffic from Google in the last 12 months. Generally speaking, this should be a fairly safe list of terms.
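The “tag + product” check can be sketched in a few lines, assuming the volumes have already been exported from Keyword Planner into a dictionary. `VOLUME` and its numbers are invented for illustration, and the product types passed in are hypothetical.

```python
# Hypothetical keyword -> average monthly searches, as it might look after a
# Keyword Planner export. The numbers are invented for illustration.
VOLUME = {
    "uss yorktown": 300,
    "uss yorktown t-shirt": 0,
    "uss yorktown poster": 20,
    "pontoon boat t-shirt": 10,
}


def product_keywords(tag, product_types):
    """Build the 'tag + product' phrases whose volume we actually care about."""
    return [f"{tag} {product}" for product in product_types]


def tags_with_product_volume(tags, product_types, volume):
    """Keep tags whose tag + product phrases have any estimated search volume."""
    kept = {}
    for tag in tags:
        total = sum(volume.get(kw, 0) for kw in product_keywords(tag, product_types))
        if total > 0:
            kept[tag] = total
    return kept


print(tags_with_product_volume(["uss yorktown", "cvs-10", "pontoon boat"], ["t-shirt", "poster"], VOLUME))
```

The bare “uss yorktown” volume (300) never enters the decision; only the product-qualified phrases count.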
- Benefits: When doing experimental work with a client, it is always nice to be able to give them a scenario that almost guarantees improvement. Because we were able to protect tags that already receive traffic by labeling them as good (in the vast majority of cases), we could ensure that the client had a high probability of profiting from the changes we made and minimal risk of any traffic loss.
- Limitations: Unfortunately, even this method wasn’t perfect. If a product (or set of products) with high enough authority included a poor variation of a tag, then the bad variant would rank and receive traffic. We had to use other strategies to verify our selections from this method and devise a method to encourage a tag swap in the index for the correct version of a term.
Tag count
- Method: The frequency with which a tag was used on the site was often a strong signal that we could trust the tag, especially when compared with other similar tags. By counting the number of times each tag was used on the site, we could bias our final set of trusted tags in favor of these more popular terms.
- Benefits: This was a great tie-breaker metric when we had two tags that were very similar but needed to choose just one. For example, sometimes two variants of a phrase were completely acceptable (such as a version with and without a hyphen). We could simply defer to the one with a higher tag count.
- Limitations: The clear limitation of tag frequency is that many of the most frequent tags were too generic to be useful. The tag “blue” isn’t particularly useful when it just helps people find “blue t-shirts.” The term is too generic and too competitive to warrant inclusion. Additionally, including too broad a tag would simply create a poor ratio of crawl space to traffic potential. A common tag will have hundreds if not thousands of matching products, creating many pages of products for a single tag. If a tag produces 50 paginated product listings but only has the potential to drive 10 visitors a year, it might not be worth it.
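Both uses of tag count, the tie-breaker and the crawl-versus-traffic sanity check, can be sketched as follows. The counts in `TAG_COUNTS` and the 24-products-per-page listing size are assumptions for illustration; with them, a 1,200-product tag yields the 50 listing pages from the example above.

```python
import math

# Hypothetical usage counts for two acceptable variants of the same tag.
TAG_COUNTS = {"t-shirt": 1200, "tshirt": 340}


def choose_master(variants, tag_counts):
    """Tie-breaker: among acceptable variants, keep the one used most often."""
    return max(variants, key=lambda tag: tag_counts.get(tag, 0))


def listing_pages(product_count, per_page=24):
    """Number of paginated listing pages a tag page would generate."""
    return math.ceil(product_count / per_page)


def visitors_per_page(product_count, annual_visitors, per_page=24):
    """Rough crawl-vs-traffic check: expected yearly visitors per crawlable page."""
    return annual_visitors / listing_pages(product_count, per_page)


print(choose_master(["tshirt", "t-shirt"], TAG_COUNTS))  # t-shirt
print(listing_pages(1200, per_page=24))                  # 50
print(visitors_per_page(1200, 10, per_page=24))          # 0.2
```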
Porter stemming
- Method: Stemming is a method used to identify the root of a word by scanning it from right to left and using pattern-matching rules to remove characters (suffixes) until you arrive at the word’s stem. There are a couple of popular stemmers available, but we found the Porter stemmer to be more accurate than the alternatives as a tool for seeing alternative word forms. You can geek out by reading the Porter stemming algorithm as implemented in Snowball, or play with a JavaScript version.
- Benefits: Plural and possessive terms can be grouped by their stem for further analysis. Running Porter stemming on the terms “pony” and “ponies” returns “poni” as the stem, which can then be used to group terms for further analysis. You can also run Porter stemming on phrases. For example, “boating accident,” “boat accidents,” “boating accidents,” etc. share the stem “boat accid.” This can be a crude and quick method for grouping variations. Porter stemming is also gentler in cleaning text, whereas other stemmers can be too aggressive for our purposes; e.g., the Lancaster stemmer reduces “woman” to “wom,” while the Porter stemmer leaves it as “woman.”
- Limitations: Stemming is intended for finding a common root for terms and phrases and gives no indication of the proper form of a term. The Porter stemming method applies a fixed set of rules to the English language, removing trailing “s,” “e,” “ance,” “ing,” and similar word endings across the board to try to find the stem. For this to work well, you need all of the correct rules (and exceptions) in place to get the correct stems in all cases. This can be particularly problematic with words that end in S but are not plural, like “billiards” or “Brussels.” Additionally, this method does not help with mapping related terms such as “boat crash,” “crashed boat,” and “boat accident,” which stem to “boat crash,” “crash boat,” and “boat accid.”
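A minimal sketch of grouping tag variants by their Porter stem, using NLTK’s `PorterStemmer` (a third-party package, installed with `pip install nltk`; the stemmer itself needs no corpus download). It reproduces the groupings described above and also shows the word-order limitation: “boat crash” and “crashed boat” land in different groups.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer  # third-party: pip install nltk

stemmer = PorterStemmer()


def stem_phrase(phrase):
    """Stem every word in a tag, e.g. 'boating accidents' -> 'boat accid'."""
    return " ".join(stemmer.stem(word) for word in phrase.lower().split())


def group_by_stem(tags):
    """Group tag variants that share the same stemmed form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[stem_phrase(tag)].append(tag)
    return dict(groups)


tags = ["pony", "ponies", "boating accident", "boat accidents",
        "boating accidents", "boat crash", "crashed boat"]
for stem, members in group_by_stem(tags).items():
    print(stem, "->", members)
```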
Lemmatization
- Method: Lemmatization works similarly to stemming. However, instead of using a rule set that edits words by removing letters to arrive at a stem, lemmatization attempts to map the term to its simplest dictionary form, using a lexical database such as WordNet, and return a canonical “lemma” of the word. A crude way to think about lemmatization is as simplifying a word.
- Benefits: This method often works better than stemming. Terms like “ship,” “shipped,” and “ships” are all mapped to “ship” by this method, while “shipping” and “shipper,” which have distinct meanings despite sharing a stem, are retained. You can create an array of lemmas from a phrase and compare it to other phrases, resolving word-order issues. This proved to be a more reliable method for grouping variations than stemming.
- Limitations: As with many of the methods, context for mapping related terms can be difficult. Lemmatization can provide better filters for context, but to do so it generally relies on identifying the word form (noun, adjective, etc.) to map appropriately to a root term. Given the inconsistency of user-generated content, it is inaccurate to assume that all words are in adjective form (describing a product) or noun form (the product itself). This inconsistency can produce wild results. For example, “strip socks” could be intended as a tag for socks with a strip of color on them, such as “striped socks,” or it could mean “stripper socks” or some other type of leggings; the correct match could only be found if there were other products and tags to compare for context. Additionally, lemmatization doesn’t create associations between all related words, just textual derivatives, so you are still left seeking a canonical among “mailman,” “courier,” “shipper,” etc.
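The “array of lemmas” comparison can be sketched by lemmatizing each word and sorting the result, so phrases that differ only in word order or inflection share a key. To keep the example self-contained, `LEMMAS` is a tiny hypothetical lookup table standing in for a real WordNet-based lemmatizer with part of speech already resolved.

```python
from collections import defaultdict

# Hypothetical lemma table standing in for a WordNet lookup with the part of
# speech already resolved. A real pipeline would call a lemmatizer instead.
LEMMAS = {"boats": "boat", "crashed": "crash", "ships": "ship", "shipped": "ship"}


def lemma_key(phrase):
    """Lemmatize each word, then sort so word order no longer matters."""
    lemmas = [LEMMAS.get(word, word) for word in phrase.lower().split()]
    return tuple(sorted(lemmas))


def group_by_lemmas(tags):
    """Group tags whose sorted lemma arrays are identical."""
    groups = defaultdict(list)
    for tag in tags:
        groups[lemma_key(tag)].append(tag)
    return dict(groups)


tags = ["crashed boat", "boat crash", "boats crashed", "shipped boats", "shipping boat"]
for key, members in group_by_lemmas(tags).items():
    print(key, "->", members)
```

Unlike the stemming sketch, “crashed boat” and “boat crash” now share a key, while “shipping boat” stays separate from “shipped boats” because “shipping” keeps its own lemma.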
Jaccard index
- Method: The Jaccard index is a similarity coefficient measured by Intersection over Union. Now, don’t run off just yet; it is actually quite straightforward.
Imagine you had two piles with 3 marbles in each: Red, Green, and Blue in the first, Red, Green and Yellow in the second. The “Intersection” of these two piles would be Red and Green, since both piles have those two colors. The “Union” would be Red, Green, Blue and Yellow, since that is the complete list of all the colors. The Jaccard index would be 2 (Red and Green) divided by 4 (Red, Green, Blue, and Yellow). Thus, the Jaccard index of these two piles would be .5. The higher the Jaccard index, the more similar the two sets.
So what does this have to do with tags? Well, imagine we have two tags: “ocean” and “sea.” We can get a list of all of the products that have the tag “ocean” and a list of all of the products that have the tag “sea.” Then we compute the Jaccard index of those two sets. The higher the score, the more related the tags are. Perhaps we find that the two sets overlap heavily, with 70% of the products tagged “ocean” also tagged “sea”; we now know that the two are fairly well related. However, when we run the same measurement to compare “basement” and “casement,” we find that they have a Jaccard index of only .02. Even though they are very similar in terms of characters, they mean quite different things, so we can rule out mapping the two terms together.
- Benefits: The greatest benefit of the Jaccard index is that it allows us to find highly related tags that may have no textual characteristics in common but are likely to produce overly similar or duplicate result sets. While most of the metrics we have considered so far help us find “good” or “bad” tags, the Jaccard index helps us find “related” tags without any complex machine learning.
- Limitations: While certainly useful, the Jaccard index methodology has its own problems. The biggest issue we ran into had to do with tags that were used together nearly all the time but weren’t substitutes for one another. For example, consider the tags “babe ruth” and his nickname, “sultan of swat.” The latter tag only occurred on products that also had the “babe ruth” tag (since this was one of his nicknames), so the pair had quite a high Jaccard index. However, Google doesn’t map these two terms together in search, so we would prefer to keep the nickname rather than simply redirect it to “babe ruth.” We needed to dig deeper to determine when to keep both tags and when to redirect one to the other. On its own, this method was also insufficient for identifying cases where a user consistently misspelled tags or used incorrect syntax, since their products would essentially be orphans with little or no intersection with correctly tagged products.
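The Jaccard index is a one-liner over sets. The sketch below reproduces the marble example and then scores tag pairs by the products they share; the product IDs in `TAG_PRODUCTS` and the 0.5 relatedness threshold are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection over union of two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# The marble example: 2 shared colors out of 4 total.
print(jaccard({"red", "green", "blue"}, {"red", "green", "yellow"}))  # 0.5

# Hypothetical tag -> product-ID sets; the IDs are invented for illustration.
TAG_PRODUCTS = {
    "ocean": {1, 2, 3, 4, 5},
    "sea": {2, 3, 4, 5, 6},
    "basement": {10, 11, 12, 13},
    "casement": {13, 20, 21, 22},
}


def related_pairs(tag_products, threshold=0.5):
    """Return tag pairs whose product sets overlap at or above the threshold."""
    pairs = []
    for t1, t2 in combinations(sorted(tag_products), 2):
        score = jaccard(tag_products[t1], tag_products[t2])
        if score >= threshold:
            pairs.append((t1, t2, round(score, 2)))
    return pairs


print(related_pairs(TAG_PRODUCTS))  # [('ocean', 'sea', 0.67)]
```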
Jaro-Winkler distance
- Method: There are several edit distance and string similarity metrics that we used throughout this process. Edit distance is simply a measurement of how difficult it is to change one word into another. For example, the most basic edit distance metric, Levenshtein distance, between “Russ Jones” and “Russell Jones” is 3 (you have to add “e,” “l,” and “l” to transform “Russ” into “Russell”). This can help us find similar words and phrases. In our case, we used a particular string similarity measure called “Jaro-Winkler distance,” which gives higher precedence to words and phrases that are similar at the beginning. For example, “Baseball” would be closer to “Baseballer” than to “Basketball” because the differences are at the very end of the term.
- Benefits: Edit distance metrics helped us find many very similar variants of tags, especially when the variants were not necessarily misspellings. This was particularly valuable when used in conjunction with the Jaccard index metrics, because we could apply a character-level metric on top of a character-agnostic metric (i.e. one that cares about the letters in the tag and one that doesn’t).
- Limitations: Edit distance metrics can be kind of stupid. According to Jaro-Winkler distance, “Baseball” and “Basketball” are far more related to one another than “Baseball” and “Pitcher” or “Catcher.” “Round” and “Circle” score terribly on edit distance, while “Round” and “Pound” look very similar. Edit distance simply cannot be used in isolation to find similar tags.
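For reference, here are plain-Python versions of both metrics mentioned above: Levenshtein distance and Jaro-Winkler similarity (Jaro-Winkler distance is simply 1 minus the similarity). The scaling factor `p=0.1` and the 4-character prefix cap are the conventional Winkler defaults; this is a sketch, not the implementation used in the project.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if not matches:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    out_of_order, k = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score for strings that share a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(levenshtein("Russ Jones", "Russell Jones"))  # 3
print(jaro_winkler("baseball", "baseballer") > jaro_winkler("baseball", "basketball"))  # True
print(jaro_winkler("round", "pound") > jaro_winkler("round", "circle"))  # True
```

The last line illustrates the limitation above: on characters alone, “round” looks far more like “pound” than like “circle.”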
Keyword Planner grouping
- Method: While Google’s choice to combine similar keywords in Keyword Planner has been problematic for predicting traffic, it has actually offered us a new method to identify highly related terms. Whenever two tags share identical metrics from Google Keyword Planner (average monthly traffic, historical traffic, CPC, and competition), we can conclude that there is an increased chance the two are related to one another.
- Benefits: This method is extremely useful for acronyms (which are particularly difficult to detect). While Google groups together COO and Chief Operating Officer, you can imagine that standard methods like those outlined above might have problems detecting the relationship.
- Limitations: The greatest drawback of this methodology was that it created numerous false positives among less popular terms. There are just too many keywords that average 10 searches a month over the year, show 10 searches in every month of their history, and have a CPC and competition of 0. Thus, we had to limit this methodology to more popular terms, where there were only a handful of matches.
Wikipedia disambiguation
- Benefits: When a tag could be mapped to a Wikipedia entry, this method proved highly effective at validating that a tag had potential value, or at creating a point of reference for related tags. If the Wikipedia community felt a tag or tag phrase was important enough to have an article dedicated to it, then the tag was more likely to be a valuable term than a random entry or keyword stuffing by the user. Further, the methodology allows for grouping related terms without any bias on word order. Searching Wikipedia either returns a search results page (“pontoon boats”) or redirects you to the corrected article (“disneyworld” becomes “Walt Disney World”). Wikipedia also tends to have entries for some pop-culture references, so things that would get flagged as misspellings, such as “lolcats,” can be vindicated by the existence of a matching Wikipedia article.
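Grouping by identical Keyword Planner metrics amounts to using the whole metrics row as a dictionary key. The rows below are invented, and the 1,000-searches floor is an assumed cutoff for “more popular terms.”

```python
from collections import defaultdict

# Hypothetical Keyword Planner rows:
# (avg monthly searches, 12-month history, CPC, competition). Numbers are invented.
METRICS = {
    "coo": (2400, (2400,) * 12, 3.10, 0.21),
    "chief operating officer": (2400, (2400,) * 12, 3.10, 0.21),
    "cfo": (5400, (5400,) * 12, 2.75, 0.18),
    "rare tag a": (10, (10,) * 12, 0.0, 0.0),
    "rare tag b": (10, (10,) * 12, 0.0, 0.0),
}


def identical_metric_groups(metrics, min_volume=1000):
    """Group tags whose Keyword Planner metrics match exactly.

    Low-volume terms are skipped because many unrelated keywords share the
    same 10-searches-a-month, zero-CPC profile (false positives).
    """
    groups = defaultdict(list)
    for tag, row in metrics.items():
        if row[0] >= min_volume:
            groups[row].append(tag)
    return [sorted(tags) for tags in groups.values() if len(tags) > 1]


print(identical_metric_groups(METRICS))  # [['chief operating officer', 'coo']]
```

Dropping `min_volume` to 1 would also pair the two unrelated “rare tag” rows, which is the false-positive problem described above.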
- Limitations: While Wikipedia is effective at delivering a consistent formal tag for disambiguation, it can at times be more sterile than user-friendly. This can run counter to other signals such as CPC or traffic volume methods. For example, “pontoon boats” becomes “Pontoon (Boat)”, or “Lily” becomes “lilium.” All signals indicate the former case as the most popular, but Wikipedia disambiguation suggests the latter to be the correct usage. Wikipedia also contains entries for very broad terms, like each number, year, letter, etc. so simply applying a rule that any Wikipedia article is an allowed tag would continue to contribute to tag sprawl problems.
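A lookup like the one described can be sketched against the MediaWiki Action API (`action=query` with `redirects`), which follows redirects such as a misspelling to its article and marks non-existent titles as `missing`. The sample responses are illustrative, hand-written in the API’s JSON shape so the example runs offline, and the User-Agent string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def lookup_url(tag):
    """MediaWiki query that follows redirects (e.g. a misspelling -> the real article)."""
    params = {"action": "query", "titles": tag, "redirects": 1, "format": "json"}
    return f"{API_URL}?{urlencode(params)}"


def fetch(tag):
    """Network call (not run below); replace the placeholder User-Agent with your own."""
    req = Request(lookup_url(tag), headers={"User-Agent": "tag-audit-sketch/0.1 (contact@example.com)"})
    with urlopen(req) as resp:
        return json.load(resp)


def canonical_title(response):
    """Return the article title the query resolved to, or None if no article exists."""
    for page in response.get("query", {}).get("pages", {}).values():
        if "missing" not in page and "invalid" not in page:
            return page["title"]
    return None


# Illustrative responses in the API's shape (page IDs are made up).
resolved = {"query": {"redirects": [{"from": "Disneyworld", "to": "Walt Disney World"}],
                      "pages": {"123": {"pageid": 123, "ns": 0, "title": "Walt Disney World"}}}}
not_found = {"query": {"pages": {"-1": {"ns": 0, "title": "Pontoon boats", "missing": ""}}}}

print(canonical_title(resolved))   # Walt Disney World
print(canonical_title(not_found))  # None
```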
K-Means clustering with word vectors
- Method: Finally, we attempted to transform the tags into a smaller set of more meaningful tags using word embeddings and k-means clustering. Generally, the process involved splitting the tags into tokens (individual words), refining them by part of speech (noun, verb, adjective), and finally lemmatizing the tokens (“blue shirts” becomes “blue shirt”). From there, we converted each tag into a single vector by adding together the vectors of its tokens from a custom Word2Vec embedding model. We created a label array and a vector array for the tags in the dataset, then ran k-means with the number of centroids set to 10 percent of the total tag count. We first tested on 30,000 tags and obtained reasonable results.
Once k-means had completed, we pulled all of the centroids, looked up each one’s nearest neighbor in the custom Word2Vec model, and assigned each tag to its centroid’s category in the main dataset.
| Tag tokens | Tag POS | Tag lemmas | Categorization |
|---|---|---|---|
| ['beach', 'photographs'] | [('beach', 'NN'), ('photographs', 'NN')] | ['beach', 'photograph'] | beach photo |
| ['seaside', 'photographs'] | [('seaside', 'NN'), ('photographs', 'NN')] | ['seaside', 'photograph'] | beach photo |
| ['coastal', 'photographs'] | [('coastal', 'JJ'), ('photographs', 'NN')] | ['coastal', 'photograph'] | beach photo |
| ['seaside', 'photographs'] | [('seaside', 'NN'), ('photographs', 'NN')] | ['seaside', 'photograph'] | beach photo |
| ['seaside', 'posters'] | [('seaside', 'NN'), ('posters', 'NNS')] | ['seaside', 'poster'] | beach photo |
| ['coast', 'photographs'] | [('coast', 'NN'), ('photographs', 'NN')] | ['coast', 'photograph'] | beach photo |
| ['beach', 'photos'] | [('beach', 'NN'), ('photos', 'NNS')] | ['beach', 'photo'] | beach photo |
The Categorization column above shows the centroid label selected by k-means. Notice how it matched both “seaside” and “coastal” to “beach.”
- Benefits: This method seemed to do a good job of finding associations between tags and categories that were more semantic than character-driven. “Blue shirt” might be matched to “clothing.” This would not have been possible without the semantic relationships captured in the vector space.
- Limitations: Ultimately, the chief limitation we encountered came when we tried to run k-means on the full two million tags, which meant ending up with 200,000 categories (centroids). Sklearn for Python allowed multiple concurrent jobs, but only across the centroid initializations, of which there were 11 in this case. This meant that even on a 60-core processor, the number of concurrent jobs was capped at 11. We tried PCA (principal component analysis) to reduce the vector size (from 300 dimensions to 10), but the results were poor overall. Also, because embeddings are generally built on the probabilistic closeness of terms in the corpus on which they were trained, some matches were understandable but obviously not the correct category (e.g. “19th century art” was picked as the category for “18th century art”). Finally, context matters, and word embeddings struggle to distinguish “duck” (the animal) from “duck” (the action).
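To make the pipeline concrete, here is a standard-library-only sketch of the same idea at toy scale: sum each tag’s token vectors, cluster with k-means (Lloyd’s algorithm), and name each cluster. Several pieces are assumptions: the 3-dimensional `WORD_VECTORS` stand in for a trained 300-dimensional Word2Vec model, a deterministic farthest-point start replaces scikit-learn’s k-means++ initialization, and each cluster is named after its member tag nearest the centroid rather than the nearest entry in a Word2Vec vocabulary.

```python
# Hypothetical 3-d word vectors standing in for a trained 300-d Word2Vec model.
WORD_VECTORS = {
    "beach": (1.0, 0.0, 0.0), "seaside": (0.9, 0.1, 0.0), "coastal": (0.9, 0.0, 0.1),
    "photograph": (0.0, 1.0, 0.0), "poster": (0.2, 0.8, 0.0),
    "blue": (0.1, 0.1, 0.4), "shirt": (0.0, 0.0, 1.0), "tee": (0.0, 0.1, 0.8),
}


def tag_vector(lemmas):
    """Add up the vectors of a tag's lemmatized tokens (all tokens assumed known)."""
    return tuple(sum(dims) for dims in zip(*(WORD_VECTORS[t] for t in lemmas)))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    return tuple(sum(dims) / len(points) for dims in zip(*points))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no tag changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return assignment, centroids


def label_clusters(tags, points, assignment, centroids):
    """Name each cluster after the member tag closest to its centroid."""
    labels = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            labels[c] = " ".join(tags[min(members, key=lambda i: sq_dist(points[i], centroid))])
    return labels


tags = [["beach", "photograph"], ["seaside", "photograph"], ["coastal", "photograph"],
        ["seaside", "poster"], ["blue", "shirt"], ["shirt"], ["tee", "shirt"]]
points = [tag_vector(t) for t in tags]
k = max(2, round(0.1 * len(tags)))  # 10% of the tag count, floored at 2 for this tiny set
assignment, centroids = kmeans(points, k)
labels = label_clusters(tags, points, assignment, centroids)
for tag, cluster in zip(tags, assignment):
    print(" ".join(tag), "->", labels[cluster])
```

On this toy data, the four beach and photo tags form one cluster and the three shirt tags form another. At two million tags and 200,000 centroids, a pure-Python loop like this would be far too slow; it only shows the mechanics.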
Bringing it all together
Using a combination of the methods above, we were able to develop a series of methodology confidence scores that could be applied to any tag in our dataset, generating a heuristic for how to consider each tag going forward. These were case-level strategies to determine the appropriate methodology. We denoted these as follows:
- Good Tags: This mostly started as our “do not touch” list of terms which already received traffic from Google. After some confirmation exercises, the list was expanded to include unique terms with rankings potential, commercial appeal, and unique product sets to deliver to customers. For example, a heuristic for this category might look like this:
  - If tag is identical to a Wikipedia entry and
  - Tag + product has estimated search traffic and
  - Tag has CPC value, then
  - Mark as “Good Tag”
- Okay Tags: These are terms that we wanted to keep associated with products and their descriptions, since they could be used within the site to add context to a page, but that do not warrant their own indexable space. These tags were mapped to be redirected or canonicalized to a “master,” but were still included on the page for topical relevancy, natural language queries, long-tail searches, etc. For example, a heuristic for this category might look like this:
  - If tag is identical to a Wikipedia entry but
  - Tag + product has no search volume and
  - Tag’s word vector matches a “Good Tag,” then
  - Mark as “Okay Tag” and redirect to “Good Tag”
- Bad Tags to Remap: This grouping represents bad tags that were mapped to a replacement. These tags would literally be deleted and replaced with a corrected version. These were most often misspellings or terms discovered through stemming, lemmatization, etc., for which a dominant replacement was identified. For example, a heuristic for this category might look like this:
  - If tag is not identical to either a Wikipedia entry or a vector-space entry and
  - Tag + product has no search volume and
  - Tag has no search volume and
  - Tag’s Wikipedia entry matches a “Good Tag,” then
  - Mark as “Bad Tag to Remap”
- Bad Tags to Remove: These are tags that were flagged as bad tags that could not be related to a good tag. Essentially, these needed to be removed from our database completely. This final group represented the worst of the worst, in the sense that the existence of the tag would likely be considered a negative indicator of site quality. We considered the character length of tags, lack of Wikipedia entries, inability to map to word vectors, no previous traffic, no predicted traffic or CPC value, etc. In many cases, these were nonsense phrases.
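The example heuristics above can be encoded as a small rule function. This is a sketch: the `TagSignals` fields are hypothetical names for the signals gathered by the methods described earlier, the rules mirror only the example heuristics listed for each bucket, and anything that matches none of them falls through to “remove,” which simplifies the broader set of considerations described for that bucket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagSignals:
    """Signals gathered for one tag (field names are hypothetical)."""
    tag: str
    had_organic_traffic: bool = False       # got Google traffic in the last 12 months
    wikipedia_exact: bool = False           # tag is identical to a Wikipedia entry
    in_vector_vocab: bool = False           # tag has its own entry in the word-vector model
    tag_product_volume: int = 0             # search volume for "tag + product"
    tag_volume: int = 0                     # search volume for the tag alone
    cpc: float = 0.0                        # historical cost-per-click
    vector_match: Optional[str] = None      # Good Tag matched via word vectors
    wikipedia_target: Optional[str] = None  # Good Tag the Wikipedia lookup resolved to


def classify(s, good_tags):
    """Return (bucket, target); target is the Good Tag to redirect or remap to."""
    if s.had_organic_traffic:
        return "good", None  # the original "do not touch" list
    if s.wikipedia_exact and s.tag_product_volume > 0 and s.cpc > 0:
        return "good", None
    if s.wikipedia_exact and s.tag_product_volume == 0 and s.vector_match in good_tags:
        return "okay", s.vector_match
    if (not s.wikipedia_exact and not s.in_vector_vocab
            and s.tag_product_volume == 0 and s.tag_volume == 0
            and s.wikipedia_target in good_tags):
        return "remap", s.wikipedia_target
    return "remove", None  # simplification of the many factors weighed for this bucket


good_tags = {"uss yorktown", "pontoon boat", "walt disney world"}
examples = [
    TagSignals("uss yorktown", wikipedia_exact=True, tag_product_volume=20, tag_volume=300, cpc=0.45),
    TagSignals("pontoons", wikipedia_exact=True, in_vector_vocab=True, vector_match="pontoon boat"),
    TagSignals("disneyworld", wikipedia_target="walt disney world"),
    TagSignals("zxq boat boat boat"),
]
for s in examples:
    print(s.tag, "->", classify(s, good_tags))
```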
Altogether, we were able to reduce the number of tags by 87.5%, consolidating the site down to a reasonable, targeted, and useful set of tags that properly organized the corpus without wasting crawl budget or limiting user engagement.
Conclusions: Advanced white hat SEO
It was nearly nine years ago that a well-known black hat SEO called out white hat SEO as being simple, stale, and bereft of innovation. He claimed that “advanced white hat SEO” was an oxymoron — it simply did not exist. I was proud at the time to respond to his claims with a technique Hive Digital was using which I called “Second Page Poaching.” It was a great technique, but it paled in comparison to the sophistication of methods we now see today. I never envisioned either the depth or breadth of technical proficiency which would develop within the white hat SEO community for dealing with unique but persistent problems facing webmasters.
I sincerely doubt most of the readers here will have the specific tag sprawl problem described above. I’d be lucky if even a few of you have run into it. What I hope is that this post might disabuse us of any caricatures of white hat SEO as facile or stagnant and inspire those in our space to do their best work.
Posted by Dr-Pete
It’s hardly surprising that Google Home is an extension of Google’s search ecosystem. Home is attempting to answer more and more questions, drawing those answers from search results. There’s an increasingly clear connection between Featured Snippets in search and voice answers.
For example, let’s say a hedgehog wanders into your house and you naturally find yourself wondering what you should feed it. You might search for “What do hedgehogs eat?” On desktop, you’d see a Featured Snippet like the following:
Given that you’re trying to wrangle a strange hedgehog, searching on your desktop may not be practical, so you ask Google Home: “Ok, Google — What do hedgehogs eat?” and hear the following:
Google Home leads with the attribution to Ark Wildlife (since a voice answer has no direct link), and then repeats a short version of the desktop snippet. The connection between the two answers is, I hope, obvious.
Anecdotally, this is a pattern we see often on Google Home, but how consistent is it? How does Google handle Featured Snippets in other formats (including lists and tables)? Are some questions answered wildly differently by Google Home compared to desktop search?
Methodology (10K → 1K)
To find out the answer to these questions, I needed to start with a fairly large set of searches that were likely to generate answers in the form of Featured Snippets. My colleague Russ Jones pulled a set of roughly 10,000 popular searches beginning with question words (Who, What, Where, Why, When, How) from a third-party “clickstream” source (actual web activity from a very large set of users).
I ran those searches on desktop (automagically, of course) and found that just over half (53%) had Featured Snippets. As we’ve seen in other data sets, Google is clearly getting serious about direct answers.
The overall set of popular questions was dominated by “What?” and “How?” phrases:
Given the prevalence of “How to?” questions, I’ve broken them out in this chart. The purple bars show how many of these searches generated Featured Snippets. “How to?” questions were very likely to display a Featured Snippet, with other types of questions displaying them less than half of the time.
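The post doesn’t say exactly how phrases were bucketed by question word. Here is a minimal sketch that assumes a query’s type is just its first word, with “how to” split out from “how.” Under that assumption, a breakdown like the one above could be reproduced from any keyword list. `question_type` and `breakdown` are hypothetical helpers and are not part of any Moz tool.

```python
import re
from collections import Counter
from typing import Optional

# Question words used to pull the sample, per the post.
QUESTION_WORDS = ("who", "what", "where", "why", "when", "how")


def question_type(query: str) -> Optional[str]:
    """Return the leading question word of a query, or None if it has none.

    "How to" is split out from plain "How", mirroring the breakdown in the post.
    """
    words = re.findall(r"[a-z']+", query.lower())
    if not words or words[0] not in QUESTION_WORDS:
        return None
    if words[0] == "how" and len(words) > 1 and words[1] == "to":
        return "How to"
    return words[0].capitalize()


def breakdown(queries: list) -> Counter:
    """Count queries per question type, skipping anything that isn't a question."""
    types = (question_type(q) for q in queries)
    return Counter(t for t in types if t is not None)
```

Keying on the first word keeps the rule simple and predictable. It will miss questions phrased any other way, such as “best way to…”, which is consistent with the study’s choice to start from searches that begin with a question word.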
Of the roughly 5,300 searches in the full data set that had Featured Snippets, those snippets broke down into four types, as follows:
Text snippets — paragraph-based answers like the one at the top of this post — accounted for roughly two-thirds of all of the Featured Snippets in our original data set. List snippets accounted for just under one-third — these are bullet lists, like this one for “How to draw a dinosaur?”:
Step 1 – Draw a small oval. Step 5 – Dinosaur! It’s as simple as that.
Table snippets made up less than 2% of the Featured Snippets in our starting data set. These snippets contain a small amount of tabular data, like this search for “What generation am I?”:
If you throw your money recklessly at your avocado toast habit instead of buying a house, you’re probably a millennial (sorry, content marketing joke).
Finally, video snippets are a special class of Featured Snippet with a large video thumbnail and direct link (dominated by YouTube). Here’s one for “Who is the spiciest memelord?”:
I’m honestly not sure what commentary I can add to that result. Since there’s currently no way for a video to appear on Google Home, we excluded video snippets from the rest of the study.
Google has also been testing some hybrid Featured Snippets. In some cases, for example, they attempt to extract a specific answer from the text, such as this answer for “When was 1984 written?” (Hint: the answer is not 1984):
For the purposes of this study, we treated these hybrids as text snippets. Given the concise answer at the top, these hybrids are well-suited to voice results.
From the 5.3K questions with snippets, I selected 1,000, excluding video but purposely including a disproportionate number of list and table types (to better see if and how those translated into voice).
Why only 1,000? Because, unlike desktop searches, there’s no easy way to do this. Over the course of a couple of days, I had to run all of these voice searches manually on Google Home. It’s possible that I went temporarily insane. At one point, I saw a spider on my Google Home staring back at me. Fearing that I was hallucinating, I took a picture and posted it on Twitter:
I was assured that the spider was, in point of fact, not a figment of my imagination. I’m still not sure about the half-hour when the spider sang me selections from the Hamilton soundtrack.
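The 1,000-query selection described above is a form of stratified sampling: it draws a fixed quota per snippet type instead of sampling in proportion to the full 5.3K set. The sketch below assumes each query has already been labeled with its snippet type. The function name is hypothetical, and so are any quotas you pass in, because the post doesn’t give the exact split.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def stratified_sample(
    labeled: Iterable[Tuple[str, str]],
    quotas: Dict[str, int],
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw a fixed number of queries per snippet type.

    labeled: (query, snippet_type) pairs.
    quotas: snippet_type -> how many to draw. Types missing from quotas
            (e.g. "video") are excluded entirely.
    """
    by_type = defaultdict(list)
    for query, snippet_type in labeled:
        by_type[snippet_type].append(query)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for snippet_type, n in quotas.items():
        pool = by_type[snippet_type]
        # If a type is rarer than its quota (tables were under 2%), take all of it.
        picked = rng.sample(pool, min(n, len(pool)))
        sample.extend((query, snippet_type) for query in picked)
    return sample
```

As an illustration only, a call like `stratified_sample(labeled, {"text": 600, "list": 300, "table": 100})` would leave out video entirely and give lists and tables more weight than their natural share.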
From snippets to voice answers
So, how many of the 1,000 searches yielded voice answers? The short answer is: 71%. Diving deeper, it turns out that this percentage is strongly dependent on the type of snippet:
Text snippets in our 1K data set yielded voice answers 87% of the time. List snippets dropped to just under half, and table snippets only generated voice answers one-third of the time. This makes sense — long lists and most tables are simply harder to translate into voice.
In the case of tables, some of these results were from different sites or in a different format. In other words, the search generated a Featured Snippet and a voice answer, but the voice answer was of a different type (text, for example) and attributed to a different source. Only 20% of Featured Snippets in table format generated voice answers that came from the same source.
From a search marketing standpoint, text snippets are going to generate a voice answer almost 9 out of 10 times. Optimizing for text/paragraph snippets is a good starting point for ranking on voice search and should generally be a win-win across devices.
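The 71% overall figure is a weighted average of the per-type rates, so it depends on the snippet mix in the sample. List and table snippets were deliberately oversampled, which pulls the overall rate toward their lower numbers. The minimal sketch below shows that tally. It assumes each manually run query was logged as a record containing the snippet type, whether Google Home answered, and whether the answer came from the same source as the desktop snippet. The field and function names are hypothetical.

```python
from collections import defaultdict


def voice_rates(results):
    """Per snippet type: share of queries with a voice answer, and share
    whose voice answer came from the same source as the desktop snippet.

    results: iterable of dicts with keys "type", "voice_answer" (bool),
             and "same_source" (bool).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [queries, answered, same source]
    for r in results:
        t = totals[r["type"]]
        t[0] += 1
        t[1] += r["voice_answer"]
        t[2] += r["voice_answer"] and r["same_source"]
    return {
        snippet_type: {"answered": a / n, "same_source": s / n}
        for snippet_type, (n, a, s) in totals.items()
    }


def overall_rate(results):
    """Share of all queries with a voice answer (weighted by the sample mix)."""
    results = list(results)
    return sum(r["voice_answer"] for r in results) / len(results)
```

Tracking the same-source rate separately matters most for tables. There, a voice answer often existed but came from a different site.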
Special: Knowledge Graph
What about the Featured Snippets that didn’t generate voice answers? It turns out there was quite a variety of exceptions in play. One exception was answers that came directly from the Knowledge Graph on Google Home, without any attribution. For example, the question “What is the nuclear option?” produces this Featured Snippet (for me, at least) on desktop:
On Google Home, though, I get an unattributed answer that seems to come from the Knowledge Graph:
It’s unclear why Google has chosen one over the other for voice in this particular case. Across the 1,000 keyword set, there were about 30 keywords where something similar happened.
Special: Device help
Google Home seems to translate some searches as device-specific help. For example, “How to change your name?” returns desktop results about legally changing your name as an individual. On Google Home, I get the following:
Other searches from our list that triggered device help include:
- How to contact Google?
- How to send a fax online?
- What are you up to?
Special: Easter eggs
Google Home has some Easter eggs that seem unique to voice search. One of my personal favorites — the question “What is best in life?” — generates the following:
Here’s a list of the other Easter eggs in our 1,000 phrase data set:
- How many letters are in the alphabet?
- What are your strengths?
- What came first, the chicken or the egg?
- What generation am I?
- What is the meaning of life?
- What would you do for a Klondike bar?
- Where do babies come from?
- Where in the world is Carmen Sandiego?
- Where is my iPhone?
- Where is Waldo?
- Who is your daddy?
Easter eggs are a bit less predictable than device help. Generally speaking, though, both are rare and shouldn’t dissuade you from trying to rank for Featured Snippets and voice answers.
Special: General confusion
In a handful of cases, Google simply didn’t understand the question or couldn’t answer the exact question. For example, I could not get Google to understand the question “What does MAGA mean?” The answer I got back (maybe it’s my Midwestern accent?) was:
On second thought, maybe that’s not entirely inaccurate.
One interesting case is when Google decides to answer a slightly different question. On desktop, if you search for “How to become a vampire?”, you might see the following Featured Snippet:
On Google Home, I’m asked to clarify my intent:
I suspect both of these cases will improve over time, as voice recognition continues to advance and Google becomes better at surfacing answers.
Special: Recipe results
Back in April, Google launched a new set of recipe functions across search and Google Home. Many “How to?” questions related to cooking now generate something like this (the question I asked was “How to bake chicken breast?”):
You can opt to find a recipe on Google search and send it to your Google Home, or Google can simply pick a recipe for you. Either way, it will guide you through step-by-step instructions.
Special: Health conditions
A half-dozen or so health questions, from general questions to diseases, generated results like the following. This one is for the question “Why do we sneeze?”:
This has no clear connection to desktop search results, and I’m not sure whether it signals future, expanded functionality. It seems to be of limited use right now.
A handful of “How to?” questions triggered an unusual response. For example, if I ask Google Home “How to write a press release?” I get back:
If I say “yes,” I’m taken directly to a wikiHow assistant that uses a different voice. The wikiHow answers are much longer than text-based Featured Snippets.
How should we adapt?
Voice search and voice appliances (including Google Assistant and Google Home) are evolving quickly right now, and it’s hard to know where any of this will be in the next couple of years. From a search marketing standpoint, I don’t think it makes sense to drop everything to invest in voice, but I do think we’ve reached a point where some forward momentum is prudent.
First, I highly recommend simply being aware of how your industry and your major keywords/questions “appear” on Google Home (or Google Assistant on your mobile device). Look at the recipe situation above — for 99%+ of the people reading this article, that’s a novelty. If you’re in the recipe space, though, it’s game-changing, and it’s likely a sign of more to come.
Second, I feel strongly that Featured Snippets are a win-win right now. Almost 90% of the text-only Featured Snippets we tracked yielded a voice answer. These snippets are also prominent on desktop and mobile searches. Featured Snippets are a great starting point for understanding the voice ecosystem and establishing your foothold.
Posted by MiriamEllis
My father, a hale and hearty gentleman in his seventies, simply won’t dine at a new restaurant these days before he checks its reviews on his cell phone. Your 23-year-old nephew, who travels around the country for his job as a college sports writer, has devoted 233 hours of his young life to writing 932 reviews on Yelp (932 reviews × roughly 15 minutes per review).
Yes, our local SEO industry knows that my dad and your nephew need to find accurate NAP on local business listings to actually find and get to business locations. This is what makes our historic focus on citation data management totally reasonable. But reviews are what help a business to be chosen. Phil Rozek kindly highlighted a comment of mine as being among the most insightful on the Local Search Ranking Factors 2017 survey:
“If I could drive home one topic in 2017 for local business owners, it would surround everything relating to reviews. This would include rating, consumer sentiment, velocity, authenticity, and owner responses, both on third-party platforms and native website reviews/testimonials pages. The influence of reviews is enormous; I have come to see them as almost as powerful as the NAP on your citations. NAP must be accurate for rankings and consumer direction, but reviews sell.”
I’d like to take a few moments here to dive deeper into that list of review elements. It’s my hope that this post is one you can take to your clients, team or boss to urge creative and financial allocations for a review management campaign that reflects the central importance of this special form of marketing.
Ratings: At-a-glance consumer impressions and impactful rankings filter
Whether they’re stars or circles, the majority of rating icons send a 1–5 point signal to consumers that can be instantly understood. This symbol system has been around since at least the 1820s; it’s deeply ingrained in all our brains as a judgement of value.
So, when a modern Internet user is making a snap decision, like where to grab a taco, the food truck with 5 Yelp stars is automatically going to look more appealing than the one with only 2. Ratings can also catch the eye when Schema (or Google serendipity) causes them to appear within organic SERPs or knowledge panels.
All of the above is well-understood, but while the exact impact of high star ratings on local pack rankings has long been speculative (it’s only factor #24 in this year’s Local Search Ranking Factors), we may have just reached a new day with Google. The ability to filter local finder results by rating has been around for some time, but in May, Google began testing the application of a “highly rated” snippet on hotel rankings in the local packs. Meanwhile, searches with the format of “best X in city” (e.g. best burrito in Dallas) appear to be defaulting to local results made up of businesses that have earned a minimum average of 4 stars. It’s early days yet, but totally safe for us to assume that Google is paying increased attention to numeric ratings as indicators of relevance.
Because we can now comfortably speculate that high ratings will increasingly tend to correlate with high local rankings, it’s imperative for local businesses to view low ratings as the serious impediments to growth that they truly are. Big brands, in particular, must stop ignoring low star ratings. Otherwise they may find themselves not only having to close multiple store locations, but also losing out on rankings for their open stores when smaller competitors surpass their standards of cleanliness, quality, and employee behavior.
Consumer sentiment: The local business story your customers are writing for you
Here is a randomly chosen Google 3-pack result when searching just for “tacos” in a small city in the San Francisco Bay Area:
We’ve just been talking about ratings, and you can look at a result like this to get that instant gut feeling about the 4-star-rated eateries vs. the 2-star place. Now, let’s open the book on business #3 and see precisely what kind of story its consumers are writing. This is the first step towards doing a professional review audit for any business whose troubling reviews may point to future closure if problems aren’t fixed. A full audit would look at all relevant review platforms, but we’ll be brief here and just look at Google and Yelp and sort negative sentiments by type:
It’s easy to ding fast food chains. Their business model isn’t commonly associated with fine dining or the kind of high wages that tend to promote employee excellence. In some ways, I think of them as extreme examples. Yet, they serve as good teaching models for how even the most modest-quality offerings create certain expectations in the minds of consumers, and when those basic expectations aren’t met, it’s enough of a story for consumers to share in the form of reviews.
This particular restaurant location has an obvious problem with slow service, orders being filled incorrectly, and employees who have not been trained to represent the brand in a knowledgeable, friendly, or accessible manner. Maybe a business you are auditing has pain points surrounding outdated fixtures or low standards of cleanliness.
Whatever the case, when the incoming consumer turns to the review world, their eyes scan the story as it scrolls down their screen. Repeat mentions of a particular negative issue can create enough of a theme to turn the potential customer away. One survey says only 13% of people will choose a business that has wound up with a 1–2 star rating based on poor reviews. Who can afford to let the other 87% of consumers go elsewhere?
There are 20 restaurants showing up in Google’s local finder for my “tacos” search, highlighted above. Taco Bell is managing to hold the #3 spot in the local pack right now, perhaps due to brand authority. My question is, what happens next, particularly if Google is going to amplify ratings and review sentiment in the overall local ranking mix? Will this chain location continue to beat out 4-star restaurants with 100+ positive reviews, or will it slip down as consumers continue to chronicle specific and unresolved issues?
No third-party brand controls Google, but your brand can open the book right now and make maximum use of the story your customers are constantly publishing — for free. By taking review insights as real and representative of all the customers who don’t speak up, and by actively addressing repeatedly cited issues, you could be making one of the smartest decisions in your company’s history.
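The audit described above sorts negative sentiments by type by hand. Here is a minimal sketch of an automated first pass. It assumes you’ve exported (star rating, review text) pairs from each platform and uses keyword matching to count how often each theme recurs among 1–2 star reviews. The theme list and trigger words are hypothetical. Plain substring matching is also crude (“wait” also matches “waiter”), so treat the counts as a guide to what to read, not as a verdict.

```python
from collections import Counter

# Hypothetical themes and trigger phrases; a real audit would build these
# from actually reading the reviews, not from a fixed list.
THEMES = {
    "slow service": ("slow", "wait", "forever"),
    "wrong order": ("wrong order", "missing", "forgot"),
    "staff attitude": ("rude", "unfriendly", "ignored"),
    "cleanliness": ("dirty", "filthy", "sticky"),
}


def negative_themes(reviews, max_stars=2):
    """Count how many low-star reviews mention each theme, most common first.

    reviews: iterable of (stars, text) pairs. A review can count toward
    several themes, but at most once per theme.
    """
    counts = Counter()
    for stars, text in reviews:
        if stars > max_stars:
            continue  # only audit the negative end of the scale
        lowered = text.lower()
        for theme, phrases in THEMES.items():
            if any(p in lowered for p in phrases):
                counts[theme] += 1
    return counts.most_common()
```

Counting each theme once per review means one long rant can’t outweigh several customers independently raising the same issue. Repeated independent mentions are the signal the audit is looking for.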
Velocity/recency: Just enough of a timely good thing
This is one of the easiest aspects of review management to teach clients. You can sum it up in one sentence: don’t get too many reviews at once on any given platform but do get enough reviews on an ongoing basis to avoid looking like you’ve gone out of business.
For a little more background on the first part of that statement, watch Mary Bowling describing in this LocalU video how she audited a law firm that went from zero to thirty 5-star reviews within a single month. Sudden gluts of reviews like this not only look odd to alert customers, but they can trip review platform filters, resulting in removal. Remember, reviews are a business lifetime effort, not a race. Get a few this month, a few next month, and a few the month after that. Keep going.
The second half of the review timing paradigm relates to not running out of steam in your acquisition campaigns. One survey found that 73% of consumers don’t believe that reviews that are older than 3 months are still relevant to them, yet you will frequently encounter businesses that haven’t earned a new review in over a year. It makes you wonder if the place is still in business, or if it’s in business but is so unimpressive that no one is bothering to review it.
While I’d argue that review recency may be more important in review-oriented industries (like restaurants) vs. those that aren’t quite as actively reviewed (like septic system servicing), the idea here is similar to that of velocity, in that you want to keep things going. Don’t run a big review acquisition campaign in January and then forget about outreach for the rest of the year. A moderate, steady pace of acquisition is ideal.
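The sketch below turns the two halves of that advice into a check: flag sudden gluts of reviews, and flag a profile that has gone quiet. It assumes you’ve exported the review dates for a single platform. The thresholds are hypothetical and are not rules published by any review platform. The 90-day default only loosely echoes the 3-month survey figure mentioned earlier.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def review_pacing(
    review_dates: Iterable[date],
    today: date,
    spike_threshold: int = 10,
    stale_days: int = 90,
) -> dict:
    """Flag suspicious bursts of reviews and a profile that has gone quiet.

    review_dates: dates of the reviews on ONE platform.
    spike_threshold and stale_days are hypothetical defaults.
    """
    dates = sorted(review_dates)
    # Count reviews per calendar month to spot sudden gluts.
    per_month = Counter((d.year, d.month) for d in dates)
    spikes = sorted(month for month, n in per_month.items() if n >= spike_threshold)
    # No reviews at all, or none recently, both read as "stale".
    stale = not dates or (today - dates[-1]).days > stale_days
    return {"spike_months": spikes, "stale": stale}
```

A profile that jumps from zero to thirty reviews in one month and then goes silent, like the law firm in Mary Bowling’s audit, would trip both flags. That is exactly the pattern a steady acquisition pace avoids.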
Authenticity: Honesty is the only honest policy
For me, this is one of the most prickly and interesting aspects of the review world. Three opposing forces meet on this playing field: business ethics, business education, and the temptations engendered by the obvious limitations of review platforms to police themselves.
I recently began a basic audit of a family-owned restaurant for a friend of a friend. Within minutes, I realized that the family had been reviewing their own restaurant on Yelp (a glaring violation of Yelp’s policy). I felt sorry to see this, but being acquainted with the people involved (and knowing them to be quite nice!), I highly doubted they had done this out of some dark impulse to deceive the public. Rather, my guess was that they may have thought they were “getting the ball rolling” for their new business, hoping to inspire real reviews. My gut feeling was that they simply lacked the necessary education to understand that they were being dishonest with their community and how this could lead to them being publicly shamed by Yelp, if caught.
In such a scenario, there is definitely opportunity for the marketer to offer the necessary education to describe the risks involved in tying a brand to misleading practices, highlighting how vital it is to build trust within the local community. Fake positive reviews aren’t building anything real on which a company can stake its future. Ethical business owners will catch on when you explain this in honest terms and can then begin marketing themselves in smarter ways.
But then there’s the other side. Mike Blumenthal recently wrote of his discovery of the largest review spam network he’d ever encountered and there’s simply no way to confuse organized, global review spam with a busy small business making a wrong, novice move. Real temptation resides in this scenario, because, as Blumenthal states:
“Review spam at this scale, unencumbered by any Google enforcement, calls into question every review that Google has. Fake business listings are bad, but businesses with 20, or 50, or 150 fake reviews are worse. They deceive the searcher and the buying public and they stain every real review, every honest business, and Google.”
When a platform like Google makes it easy to “get away with” deception, companies lacking ethics will take advantage of the opportunity. All we can do, as marketers, is to offer the education that helps ethical businesses make honest choices. We can simply pose the question:
Is it better to fake your business’ success or to actually achieve success?
On a final note, authenticity is a two-way street in the review world. When spammers target good businesses with fake, negative reviews, this also presents a totally false picture to the consumer public. I highly recommend reading about Whitespark’s recent successes in getting fake Google reviews removed. No guarantees here, but excellent strategic advice.
Owner responses: Your contributions to the consumer story
In previous Moz blog posts, I’ve highlighted the five types of Google My Business reviews and how to respond to them, and I’ve diagrammed a real-world example of how a terrible owner response can make a bad situation even worse. If the world of owner responses is somewhat new to you, I hope you’ll take a gander at both of those. Here, I’d like to focus on a specific aspect of owner responses, as it relates to the story reviews are telling about your business.
We’ve discussed above the tremendous insight consumer sentiment can provide into a company’s pain points. Negative reviews can be a roadmap to resolving repeatedly cited problems. They are inherently valuable in this regard, and by dint of their high visibility, they carry the inherent opportunity for the business owner to make a very public showing of accountability in the form of owner responses. A business can state all it wants on its website that it offers lightning-quick service, but when reviews complain of 20-minute waits for fast food, which source do you think the average consumer will trust?
The truth is, the hypothetical restaurant has a problem. They’re not going to be able to resolve slow service overnight. Some issues are going to require real planning and real changes to overcome. So what can the owner do in this case?
- Whistle past the graveyard, claiming everything is actually fine now, guaranteeing further disappointed expectations and further negative reviews resulting therefrom?
- Be gutsy and honest, sharing exactly what realizations the business has had due to the negative reviews, what the obstacles are to fixing the problems, and what solutions the business is implementing to do their best to overcome those obstacles?
Let’s look at this in living color:
In yellow, the owner response is basically telling the story that the business is ignoring a legitimate complaint, and frankly, couldn’t care less. In blue, the owner has jumped right into the storyline, having the guts to take the blame, apologize, explain what happened and promise a fix — not an instant one, but a fix on the way. In the end, the narrative is going to go on with or without input from the owner, but in the blue example, the owner is taking the steering wheel into his own hands for at least part of the road trip. That initiative could save not just his franchise location, but the brand at large. Just ask Florian Huebner:
“Over the course of 2013 customers of Yi-Ko Holding’s restaurants increasingly left public online reviews about “broken and dirty furniture,” “sleeping and indifferent staff,” and “mice running around in the kitchen.” Per the nature of a franchise system, to the typical consumer it was unclear that these problems were limited to this individual franchisee. Consequently, the Burger King brand as a whole began to deteriorate and customers reduced their consumption across all locations, leading to revenue declines of up to 33% for some other franchisees.”
Positive news for small businesses working like mad to compete: You have more agility to put initiatives into quick action than the big brands do. Companies with 1,000 locations may let negative reviews go unanswered because they lack a clear policy or hierarchy for owner responses, but smaller enterprises can literally turn this around in a day. Just sit down at the nearest computer, claim your review profiles, and jump into the story with the goal of hearing, impressing, and keeping every single customer you can.
Big brands: The challenge for you is larger, by dint of your size, but you’ve also likely got the infrastructure to make this task no problem. You just have to assign the right people to the job, with thoughtful guidelines for ensuring your brand is being represented in a winning way.
NAP and reviews: The 1–2 punch combo every local business must practice
When traveling salesman Duncan Hines first published his 1935 review guide Adventures in Good Eating, he was pioneering what we think of today as local SEO. Here is my color-coded version of his review of the business that would one day become KFC. It should look strangely familiar to every one of you who has ever tackled citation management:
No phone number on this “citation,” of course, but then again telephones were quite a luxury in 1935. Barring that element, this simple and historic review has the core earmarks of a modern local business listing. It has location data and review data; it’s the 1–2 punch combo every local business still needs to get right today. Without the NAP, the business can’t be found. Without the sentiment, the business gives little reason to be chosen.
Are you heading to a team meeting today? Preparing to chat with an incoming client? Make the winning combo as simple as possible, like this:
- We’ve got to manage our local business listings so that they’re accessible, accurate, and complete. We can automate much of this (check out Moz Local) so that we get found.
- We’ve got to breathe life into the listings so that they act as interactive advertisements, helping us get chosen. We can do this by earning reviews and responding to them. This is our company heartbeat — our story.
From Duncan Hines to the digital age, there may be nothing new under the sun in marketing, but when you spend year after year looking at the sadly neglected review portions of local business listings, you realize you may have something to teach that is new news to somebody. So go for it — communicate this stuff, and good luck at your next big meeting!
Posted by randfish
With the ubiquity of blogs, one of the questions we hear the most is how to come up with the right topics for new posts. In today’s episode of Whiteboard Friday, Rand explores six different paths to great blog topic ideas, and tells you what you need to keep in mind before you start.
Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week, we’re going to chat about blog post ideas, how to have great ones, how to make sure that the topics that you’re covering on your blog actually accomplish the goals that you want, and how to not run out of ideas as well.
The goals of your blog
So let’s start with the goals of a blog and then what an individual post needs to do, and then I’ll walk you through kind of six formats for coming up with great ideas for what to blog about. But generally speaking, you have created a blog, either on your company’s website or your personal website or for the project that you’re working on, because you want to:
- Attract a certain audience, which is great.
- Capture the attention and amplification, the sharing of certain types of influencers, so that you can grow that audience.
- Rank highly in search engines. That’s not just necessarily a goal for the blog’s content itself. But one of the reasons that you started a blog is to grow the authority, the ranking signals, the ability to rank for the website as a whole, and the blog hopefully is helping with that.
- Inspire some trust, some likeability, loyalty, and maybe even some evangelism from your readers.
- Provide a reference point for their opinions. So if you are a writer, an author, a journalist, a contributor to all sorts of sources, a speaker, whatever it is, you’re trying to provide a home for your ideas and your content, potentially your opinions too.
- Convert your audience into taking an action. Finally, many times a blog is crafted as a first step in capturing an audience that will then take an action. That could be buying something from you, signing up for an email list, or taking a free trial of something. A political blog might want readers to “Call your Congress person.” Those types of actions.
What should an individual post do?
From there, we get into an individual post. An individual post is supposed to help with these goals, but on its own it doesn’t do all of them, and it certainly doesn’t need to. Generally speaking, a great blog post will do at least one of these four things, and hopefully two or even three.
I. Help readers to accomplish a goal that they have.
So if I’m trying to figure out which hybrid electric vehicle I should buy, and I read a great blog post from someone who’s very, very knowledgeable in the field and gives two or three recommendations to help me narrow down my search, that is wonderful. It helps me accomplish my goal of figuring out which hybrid car to buy. That accomplishment of a goal, that helping of people, hits a bunch of these very, very nicely.
II. Designed to inform people and/or entertain them.
So it doesn’t have to be purely informational. It doesn’t have to be purely entertainment, but some combination of those, or one of the two, about a particular topic. So you might be trying to make someone excited about something or give them knowledge around it. It may be knowledge that they didn’t previously know that they wanted, and they may not actually be trying to accomplish a goal, but they are interested in the information or interested in finding the humor.
III. Inspiring some amplification and linking.
So you’re trying to earn signals to your site that will help you rank in search engines, that will help you grow your audience, that will help you reach more influencers. Thus, inspiring that amplification behavior by creating content that is designed to be shared, designed to be referenced and linked to is another big goal.
IV. Creating a more positive association with the brand.
So you might have a post that doesn’t really do any of these things. Maybe it touches a little on informational or entertaining. But it is really about crafting a personal story, or sharing an experience that then draws the reader closer to you and creates that association of what we talked about up here — loyalty, trust, evangelism, likeability.
6 paths to great blog topic ideas
So knowing what our blog needs to do and what our individual posts are trying to do, what are some great ways that we can come up with the ideas, the actual topics that we should be covering? I have kind of six paths. These six paths actually cover almost everything you will read in every other article about how to come up with blog post ideas. But I think that’s what’s great. These frameworks will get you into the mindset that will lead you to the path that can give you an infinite number of blog post ideas.
1. Are there any unanswered or poorly answered questions that are in your field, that your audience already has/is asking, and do you have a way to provide great answers to those?
So that’s basically this process of I’m going to research my audience through a bunch of methodologies, going to come up with topics that I know I could cover. I could deliver something that would answer their preexisting questions, and I could come up with those through…
- Surveys of my readers.
- In-person meetings or emails or interviews.
- Informal conversations just in passing around events, or if I’m interacting with members of my audience in any way, social settings.
- Keyword research, especially questions.
So if you’re using a tool like Moz’s Keyword Explorer, or I think some of the other ones out there, Ahrefs might have this as well, where you can filter by only questions. There are also free tools like Answer the Public, which many folks like, that show you what people are typing into Google, specifically in the form of questions, “Who? What? When? Where? Why? How? Do?” etc.
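Under the hood, a question filter like this mostly comes down to checking the first word of each keyword. Below is a minimal Python sketch. It assumes you already have a plain list of keyword strings, and it is not how Keyword Explorer, Ahrefs, or Answer the Public actually implement their filters. `question_keywords` is a hypothetical helper.

```python
# Question words from the list above; extend the tuple ("does", "can", "is", ...) as needed.
QUESTION_WORDS = ("who", "what", "when", "where", "why", "how", "do")


def question_keywords(keywords):
    """Return only the keywords whose first word is a question word."""
    matches = []
    for keyword in keywords:
        words = keyword.lower().split()
        if words and words[0] in QUESTION_WORDS:
            matches.append(keyword)
    return matches


ideas = [
    "how much do hydraulic doors cost",
    "hydraulic doors",
    "do remote workers need their own health insurance",
    "Why work remotely",
]
print(question_keywords(ideas))
```

Matching only the first word avoids false positives such as "hydraulic doors", which is not a question. The trade-off is that question phrasings that don't start with one of these words are missed.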
So I’m not just going to walk you through the ideas. I’m also going to challenge myself to give you some examples. So I’ve got two — one less challenging, one much more challenging. Two websites, both have blogs, and coming up with topic ideas based on this.
So one is called Remoters. It’s remoters.net. It’s run by Aleyda Solis, who many of you in the SEO world might know. They talk about remote work, so people who are working remotely. It’s a content platform for them and a service for them. Then, the second one is a company, I think, called Schweiss Doors. They run hydraulicdoors.com. Very B2B. Very, very niche. Pretty challenging to come up with good blog topics, but I think we’ve got some.
Remote Worker: I might say here, “You know what? One of the questions that’s asked very often by remote workers, but is not well-answered on the internet yet is: ‘How do I conduct myself in a remote interview and present myself as a remote worker in a way that I can be competitive with people who are actually, physically on premises and in the room? That is a big challenge. I feel like I’m always losing out to them. Remote workers, it seems, don’t get the benefits of being there in person.’” So a piece of content on how to sell yourself on a remote interview or as a remote worker could work great here.
Hydraulic doors: One of the big things I see many people asking about online is cost. Questions about prices for hydraulic doors get asked in forums, and those forum threads actually rank well. That tells me this is something many companies are uncomfortable answering online. But if you can be transparent where no one else is, I think these Schweiss Doors guys have a shot at doing really well with that. So how much do hydraulic doors cost versus alternatives? There you go.
2. Do you have access to unique types of assets that other people don’t?
That could be research. It could be data. It could be insights. It might be stories or narratives, experiences that can help you stand out in a topic area. This is a great way to come up with blog post content. So basically, the idea is you could say, “Gosh, for our quarterly internal report, we had to prepare some data on the state of the market. Actually, some of that data, if we got permission to share it, would be fascinating.”
We can see through keyword research that people are talking about this or querying Google for it already. So we’re going to transform it into a piece of blog content, and we’re going to delight many, many people, except for maybe this guy. He seems unhappy about it. I don’t know what his problem is. We won’t worry about him. Wait. I can fix it. Look at that. So happy. Ignore that he kind of looks like the Joker now.
We can get these through a bunch of methodologies:
- Research, so statistical research, quantitative research.
- Crowdsourcing. That could be through audiences that you’ve already got through email or Facebook or Twitter or LinkedIn.
- Insider interviews, interviews with people on your sales team or your product team or your marketing team, people in your industry, buyers of yours.
- Proprietary data, like what you’ve collected for your internal annual reports.
- Curation of public data. So if there’s stuff out there on the web and it just needs to be publicly curated, you can figure out what that is. You can visit all those websites. You could use an extraction tool, or you could manually extract that data, or you could pay an intern to go extract that data for you, and then synthesize that in a useful way.
- Multimedia talent. Maybe you have someone, like we happen to here at Moz, who has great talent with video production, or with audio production, or with design of visuals or photography, or whatever that might be in the multimedia realm that you could do.
- Special access to people, information, or experiences that no one else has, which you can then present.
Those assets can become the topic of great content that can turn into really great blog posts and great post ideas.
Remote Workers: They might say, “Well, gosh, we have access to data on the destinations people go and the budgets that they have around those destinations when they’re staying and working remotely, because of how our service interacts with them. Therefore, we can craft things like the most and least expensive places to work remotely on the planet,” which is very cool. That’s content that a lot of people are very interested in.
Hydraulic doors: We can look at, “Hey, you know what? We actually have a visual overlay tool that helps an architect or a building owner visualize what it will look like if a hydraulic door were put into place. We can use that in our downtime to show how notable locations in the city, or around the world, might look with hydraulic doors. We could potentially even create a tool where you could upload your own photograph and then see how the hydraulic door looks on it.” So now we can create images that people will want to share.
3. Relating a personal experience or passion to your topic in a resonant way.
I like this and I think that many personal bloggers use it well. I think far too few business bloggers do, but it can be quite powerful, and we’ve used it here at Moz, which is relating a personal experience you have or a passion to your topic in some way that resonates. So, for example, you have an interaction that is very complex, very nuanced, very passionate, perhaps even very angry. From that experience, you can craft a compelling story and a headline that draws people in, that creates intrigue and that describes something with an amount of emotion that is resonant, that makes them want to connect with it. Because of that, you can inspire people to further connect with the brand and potentially to inform and entertain.
There’s a lot of value from that. Usually, it comes from your own personal creativity around experiences that you’ve had. I say “you,” you, the writer or the author, but it could be anyone in your organization too. Some resources I really like for that are:
- Photos. Especially if you are someone who photographs a reasonable portion of your life on your mobile device, photos can help you remember things.
- A journal can also do the same thing.
- Conversations that you have can do that, conversations in person, over email, on social media.
- Travel. I think any time you are outside your comfort zone, that tends to produce those unique experiences.
Remote workers: I visited an artist collective in Santa Fe, New Mexico, and I realized that, “My gosh, one of the most frustrating parts of remote work is that if you’re not just about remote working with a laptop and your brain, you’re almost removed from the experience. How can you do remote work if you require specialized equipment?” But in fact, there are ways. There are maker labs and artist labs in cities all over the planet at this point. So I think this is a topic that potentially hasn’t been well-covered, has a lot of interest, and that personal experience that I, the writer, had could dig into that.
Hydraulic doors: So I’ve had some conversations with do-it-yourselfers, people who are very, very passionate about DIY stuff. It turns out, hydraulic doors, this is not a thing that most DIYers can do. In fact, this is a very, very dramatic investment. That is an intense type of project. Ninety-nine percent of DIYers will not do it, but it turns out there’s actually search volume for this.
People do want to, or at least want to learn how to, DIY their own hydraulic doors. One of my favorite things, after realizing this, I searched, and then I found that Schweiss Doors actually created a product where they will ship you a DIY kit to build your own hydraulic door. So they did recognize this need. I thought that was very, very impressive. They didn’t just create a blog post for it. They even served it with a product. Super-impressive.
4. Covering a topic that is “hot” in your field or trending in your field or in the news or on other blogs.
The great part about this is it builds in the amplification piece. Because you’re talking about something that other people are already talking about and potentially you’re writing about what they’ve written about, you are including an element of pre-built-in amplification. Because if I write about what Darren Rowse at ProBlogger has written about last week, or what Danny Sullivan wrote about on Search Engine Land two weeks ago, now it’s not just my audience that I can reach, but it’s theirs as well. Potentially, they have some incentive to check out what I’ve written about them and share that.
So I could see that someone posted something very interesting, or inflammatory, or wrong, or really right on Twitter, and then I could say, “Oh, I agree with that,” or, “disagree,” or, “I have nuance,” or, “I have some exceptions to that.” Or, “Actually, I think that’s an interesting conversation to which I can add even more value,” and then I create content from that. Good places to find those conversations include:
- Social networks, certainly.
- Subreddits.
- Pocket. I really like Pocket for this, where I’ll save a bunch of articles, and then I’ll see which one might be very interesting to cover or write about in the future.
- News aggregators. These are great for this too. So that could be Techmeme in the technology space, or Memeorandum in the political space, or many others.
Remote workers: You might note that health care, last week in the United States and for many months now, has been very hot in the political arena. For remoters, that is a big problem and a big question, because if your health insurance is tied to your employer again, as it was before the Affordable Care Act, then you could be in real trouble. So what does the politics of health care mean for remote workers? Great. Now, you’ve created a real connection, and that could be something that other outlets would cover and that people who’ve written about health care might be willing to link to your piece.
Hydraulic doors: One of the things that you might note is that Eater, which is a big blog in the restaurant space, has written about indoor and outdoor space trends in the restaurant industry. So you could, with the data that you’ve got and the hydraulic doors that you provide, which are very, very common, well moderately common, at least in the restaurant indoor/outdoor seating space, potentially cover that. That’s a great way to tie in your audience and Eater’s audience into something that’s interesting. Eater might be willing to cover that and link to you and talk about it, etc.
The last two, I’m not going to go too into depth, because they’re a little more basic.
5. Pure keyword research-driven.
So this is using Google AdWords or keywordtool.io, or Moz’s Keyword Explorer, or any of the other keyword research tools that you like to figure out: What are people searching for around my topic? Can I cover it? Can I make great content there?
6. Readers who care about my topics also care about ______________?
Essentially taking any of these topics, but applying one level of abstraction. What I mean by that is there are people who care about your topic, but also there’s an overlap of people who care about this other topic and who also care about yours.
Hydraulic doors: People who care about restaurant building trends and people who care about hydraulic doors overlap considerably, and that is quite interesting.
Remote workers: It could be something like, “I care about remote work. I also care about the gear that I use, my laptop and my bag, and those kinds of things.” So gear trends could be a very interesting intersection. Then, you can apply any of the other four or five processes to that intersection, that one level of abstraction.
All right, everyone. We have done a tremendous amount here to cover a lot about blog topics. But I think you will have some great ideas from this, and I look forward to hearing about other processes that you’ve got in the comments. Hopefully, we’ll see you again next week for another edition of Whiteboard Friday. Take care.
Posted by jocameron
Reporting can be the height of tedium. You spend your time making those reports, your client may (or may not) spend their time trying to understand them. And then, in the end, we’re all left with some unanswered questions and a rumble in the tum of dissatisfaction.
I’m going to take some basic metrics, throw in some culinary metaphors, and take your client reporting to the next level.
By the end of this article you’ll know how to whip up intelligent SEO reports for your clients (or potential clients) that will deliver actionable insights any search chef worth their salt would be proud of.
[Part one] Freshly foraged keywords on sourdough to power your campaign
I’ve got intel on some really tasty keywords; did you know you can scoop these up like wild porcini mushrooms using your website categories? The trick is to find the keywords that you can use to make a lovely risotto, and discard the ones that taste nasty.
The overabundance of keywords has become a bit of a challenge for SEOs. Google keeps getting better at gauging user intent (it’s kind of their thing, right?), so the range of keywords that send traffic to your clients keeps expanding, and it’s becoming trickier to track every. single. keyword. Of course, with a big enough budget almost anything is possible, but why hemorrhage cash on tracking the keyword minutiae when you can wrangle intelligent data by tracking a sample of keywords from a few pots?
With Keyword Explorer, you can save your foraged terms to lists. By bundling together similar “species,” you’ll get a top-level view of the breadth and depth of search behavior within the categories of your niche. Easily compare volume, difficulty, opportunity, and potential to instigate a data-driven approach to website architecture. You’ll also know, at a glance, where to expand on certain topics and apply more resources to content creation.
With these metrics in hand and your client’s industry knowledge, you can cherry-pick keywords to track ranking positions week over week and add them to your Moz Pro campaign with the click of a button.
What’s the recipe?
Step 1: Pluck keywords from the category pages of your client’s site.
Step 2: Find keyword suggestions in Keyword Explorer.
Step 3: Group by low lexicon to bundle together similar keywords to gather up that long tail.
Step 4: Analyze and save relevant results to a list.
Step 5: Head to the Keyword Lists and compare the metrics: where is the opportunity? Can you compete with the level of difficulty? Is there a high-volume long tail that you can dig into?
Step 6: Add sample keywords from your pots directly to your campaign.
Bonus step: Repeat for products or other topic segments of the niche.
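If you have your foraged keywords in hand as data, the per-pot comparison and sampling in steps 5 and 6 can be roughed out in a few lines of Python. This is a sketch under assumptions: the `(keyword, category, volume, difficulty)` row format, the sample keywords and numbers, and the `per_pot` and `max_difficulty` thresholds are all hypothetical. A difficulty ceiling is just one possible way to decide what you can compete for.

```python
from collections import defaultdict
from statistics import mean


def summarize_pots(rows):
    """Group (keyword, category, volume, difficulty) rows into per-category 'pots'."""
    pots = defaultdict(list)
    for keyword, category, volume, difficulty in rows:
        pots[category].append((keyword, volume, difficulty))
    return {
        category: {
            "keywords": len(items),                                  # breadth
            "total_volume": sum(volume for _, volume, _ in items),   # depth
            "avg_difficulty": mean(difficulty for _, _, difficulty in items),
        }
        for category, items in pots.items()
    }


def sample_to_track(rows, per_pot=2, max_difficulty=50):
    """Pick up to `per_pot` highest-volume keywords per category at or below a difficulty ceiling."""
    pots = defaultdict(list)
    for keyword, category, volume, difficulty in rows:
        if difficulty <= max_difficulty:
            pots[category].append((volume, keyword))
    return {
        category: [kw for _, kw in sorted(items, key=lambda item: item[0], reverse=True)[:per_pot]]
        for category, items in pots.items()
    }


keywords = [
    ("porcini risotto", "recipes", 900, 40),
    ("mushroom risotto", "recipes", 2400, 55),
    ("wild mushroom risotto recipe", "recipes", 300, 20),
    ("mushroom foraging tour", "tours", 500, 30),
    ("foraging walk near me", "tours", 150, 10),
]
print(summarize_pots(keywords))
print(sample_to_track(keywords))
```

The summary puts breadth (keyword count) and depth (total volume) side by side for each pot. The sampler then keeps the highest-volume terms you could plausibly compete for, which is the short list you would add to the campaign.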
Don’t forget to drill into the keywords that are turning up here to see if there are categories and subcategories you hadn’t thought of. These can be targeted in existing content to further extend the relevancy and reach of your client’s content. Or it may inspire new content which can help to grow the authority of the site.
Why your client will be impressed
Through solid, informed research, you’ll be able to demonstrate why their site should be structured with certain categories on the top-level navigation right down to product pages. You’ll also be able to prioritize work on building, improving, or refining content on certain sections of the site by understanding the breakdown of search behavior and demand. Are you seeing lots of keywords with a good level of volume and lower difficulty? Or more in-depth long tail with low search volume? Or fewer different keywords with high search volume but stronger competition?
Let the demand drive the machine forward and make sure you’re giving the hordes what they want.
All this helps to further develop your understanding of the ways people search so you can make informed decisions about which keywords to track.
[Part two] Palate-cleansing lemon keyword label sorbet
Before diving into the next course you need to cleanse your palate with a lemon “label” sorbet.
In Part One, we talked about the struggle of maintaining gigantic lists of keywords. We’ve sampled keywords from our foraged pots, keeping these arranged and segmented in our Moz Pro campaign.
Now you want to give those tracked keywords a more defined purpose in life. This will help to reinforce to your client why you’re tracking these keywords, what the goal is for tracking them, and in what sort of timeframe you’re anticipating results.
Types of labels may include:
- Local keywords: Is your business serving local people, like a mushroom walking tour? You can add geo modifiers to your keywords and label them as such.
- Long-tail keywords: Might have lower search volume, but focused intent can convert well for your client.
- High-priority keywords: These are where you’re shoveling more resources, and they’re more likely to impact the other keyword segments.
- Brand keywords: Mirror, mirror on the wall… yeah, we all want those vanity keywords, don’t lie. You can manage brand keywords automatically through “Manage Brand Rules” in Moz Pro.
A generous scoop of tasty lemon “label” sorbet will make all the work you do and progress you achieve infinitely easier to report on with clear, actionable focus.
What’s the recipe?
Step 1: Label your keywords like a pro.
Step 2: Filter by labels in the Ranking tab to analyze Search Visibility for your keyword segments.
In this example, I’m comparing our visibility for “learn” keywords against “guide” keywords.
Step 3: Create a custom report for your keyword segments.
Step 4: Add a drizzle of balsamic vinegar by triggering the Optimize button — now you can send the latest on-page reporting with your super-focused ranking report.
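To see why labels make segment reporting so much easier, here's a toy version of visibility per label. Moz Pro's Search Visibility uses its own model, which this sketch does not reproduce. The `CTR_BY_POSITION` curve below is an invented illustration, and `visibility_by_label` is a hypothetical helper that averages estimated click-through across each label's keywords.

```python
from collections import defaultdict

# Hypothetical click-through rates by ranking position (illustrative numbers only).
CTR_BY_POSITION = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.03, 9: 0.02, 10: 0.02}


def visibility_by_label(tracked):
    """tracked: iterable of (keyword, labels, position), with position None if not ranking.

    Returns {label: score}: the average estimated CTR of that label's keywords,
    as a fraction of a #1 ranking's CTR (1.0 means every keyword ranks #1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    top_ctr = CTR_BY_POSITION[1]
    for _keyword, labels, position in tracked:
        # Anything beyond page one (or not ranking at all) contributes zero.
        share = CTR_BY_POSITION.get(position, 0.0) / top_ctr
        for label in labels:  # a keyword can carry several labels
            totals[label] += share
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}


tracked = [
    ("learn seo", {"learn"}, 1),
    ("learn local seo", {"learn", "local"}, 3),
    ("seo guide", {"guide"}, 11),
    ("link building guide", {"guide"}, 2),
    ("keyword research guide", {"guide"}, None),
]
print(visibility_by_label(tracked))
```

Because each keyword can carry several labels, one ranking change shows up in every segment it belongs to. That is what lets a segment report answer "how are the learn keywords doing?" without wading through the whole list.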
Why your client will be impressed
Your ranking reports will be like nothing your client has ever tasted. They will be tightly focused on the segments of keywords you’re working on, so they aren’t bamboozled by a new slew of keywords or a sudden downward trend. By clearly segmenting your piles of lovely keywords, you’ll be proactively answering those inevitable queries about why, when, and in what form your client will begin to see results.
With the on-page scores updating automatically and shipping out to your client’s inbox every month via a custom report, you’ll be effortlessly highlighting what your team has achieved.
[Part three] Steak sandwich links with crispy competitor bacon
You’re working with your client to publish content, amplifying it through social channels and driving brand awareness through PR campaigns.
Now you want to keep them informed of the big wins you’ve had as a result of that grind. Link data in Moz Pro focuses on the highest-quality links with our Mozscape index, coming from the most prominent pages of authoritative sites. So, while you may not see every link for a site within our index, we’re reporting the most valuable ones.
Alongside our top-quality steak sarnie, we’re adding some crispy competitor bacon so you can identify what content is working for the other sites in your industry.
What’s the recipe?
Step 1: Check that you have direct competitors set up on your campaign.
Step 2: Compare link metrics for your site and your competitors.
Step 3: Head to Top Pages to see what those competitors are doing to get ahead.
Step 4: Compile a delicious report sandwich!
Step 5: Make another report for Top Pages for the bacon-filled sandwich experience.
Why your client will be impressed
Each quality established link gives your client a clear idea of the value of their content and the blood, sweat, and tears of your team.
These little gems are established and more likely to have an impact on their ranking potential. Don’t forget to have a chat with your client where you explain that a link’s impact on rankings takes time.
By comparing this directly with the other sites battling it out for top SERP property, it’s easier to identify progress and achievements.
By highlighting those pesky competitors and their top pages by authority, you’re also getting ahead of that burning question of: How can we improve?
[Part four] Cinnamon-dusted ranking reports with cherry-glazed traffic
Rankings are a staple ingredient in the SEO diet. Much like the ever-expanding keyword list, reporting on rankings has become something we do without thinking enough about what clients can do with that information.
Dish up an all-singing, all-dancing cinnamon-dusted rankings report with cherry-glazed traffic by illustrating the direct impact these rankings have on organic traffic. Real people, coasting on through the search results to your client’s site.
Landing Pages in Moz Pro compares rankings with organic landing pages, imparting not just the ranking score but the value of those pages. Compliments to the chef, because that good work is down to you.
What’s the recipe?
Step 1: Track your target keywords in Moz Pro.
Step 2: Check you’ve hooked up Google Analytics for that tasty traffic data.
Step 3: Discover landing pages and estimated traffic share.
As your SEO work drives more traffic to those pages and your keyword rankings steadily increase, you’ll see your estimated traffic share go up.
If your organic traffic from search is increasing but your ranking is dropping off, it’s an indication that this keyword isn’t the driving force.
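The diagnosis above can be expressed as a simple check on two snapshots of data. This is a hypothetical sketch. The page paths, the numbers, and the `{page: (position, sessions)}` format are assumptions. `traffic_share` is a plain share of organic sessions, not Moz Pro's estimated traffic share calculation.

```python
def traffic_share(sessions_by_page):
    """Each landing page's share of total organic sessions."""
    total = sum(sessions_by_page.values())
    if total == 0:
        return {}
    return {page: sessions / total for page, sessions in sessions_by_page.items()}


def keyword_not_driving_traffic(before, after):
    """before/after: {page: (tracked keyword position, organic sessions)}.

    Flags pages whose organic sessions went up while the tracked keyword's
    position got worse (a larger number), i.e. that keyword probably isn't the driver.
    """
    flagged = []
    for page, (position_after, sessions_after) in after.items():
        if page not in before:
            continue  # no baseline to compare against
        position_before, sessions_before = before[page]
        if sessions_after > sessions_before and position_after > position_before:
            flagged.append(page)
    return flagged


before = {"/risotto": (3, 200), "/tours": (5, 80)}
after = {"/risotto": (6, 260), "/tours": (4, 120)}
print(traffic_share({page: sessions for page, (_, sessions) in after.items()}))
print(keyword_not_driving_traffic(before, after))
```

A flagged page is a prompt to dig into which other queries are sending it traffic. It is not proof on its own, since seasonality or other pages' changes can move sessions too.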
Why your client will be impressed
We all send ranking reports, and I’m sure clients just love it. But now you can dazzle them with an insight into what those rankings mean for the lifeblood of their site.
You can also take action by directing more energy towards those well-performing keywords, or investigate what worked well for those pages and replicate it across other keywords and pages on your site.
It’s time to say “enough is enough” and inject some flavor into those bland old SEO reports. Your team will save time and your clients will thank you for the tasty buffet of reporting delight.
Next Level is our educational series combining actionable SEO tips with tools you can use to achieve them. Check out any of our past editions below:
- Hunting Down SERP Features to Understand Intent & Drive Traffic
- I’ve Optimized My Site, But I’m Still Not Ranking—Help!
- Diving for Pearls: A Guide to Long Tail Keywords
- Be Your Site’s Hero: An Audit Manifesto
- How to Defeat Duplicate Content
- Conquer Your Competition with These Three Moz Tools
- 10 Tips to Take the Moz Tools to the Next Level
Posted by JoyHawkins
Recently, I’ve had a lot of people ask me how to deal with duplicate listings in Google My Business now that MapMaker is dead. Having written detailed instructions outlining different scenarios for the advanced local SEO training manual I started selling over at LocalU, I thought it’d be great to give Moz readers a sample of 5 pages from the manual outlining some best practices.
What you need to know about duplicate GMB listings
Before you start, you need to find out if the listing is verified. If the listing has an “own this business” or “claim this business” option, it is not currently verified. If that label is missing, the listing is verified, and there is nothing you can do until you get ownership or have it unverified (if you’re the one who owns it in GMB). This should be your first step before you proceed with anything below.
- Do the addresses on the two listings match? If the unverified duplicate has the same address as the verified listing, you should contact Google My Business support and ask them to merge the two listings.
- If the addresses do not match, find out if the business used to be at that address at some point in time.
  - If the business has never existed there:
    - Pull up the listing on Maps
    - Press “Suggest an edit”
    - Switch the toggle beside “Place is permanently closed” to Yes
    - Select “Never existed” as the reason and press submit. *Note: If there are reviews on the listing, you should get them transferred before doing this.
  - If the duplicate lists an address that is an old address (they were there at some point but have moved), you will want to have the duplicate marked as moved.
Service area businesses
- Is the duplicate listing verified? If it is, you will first have to get it unverified or gain access to it. Once you’ve done that, contact Google My Business and ask them to merge the two listings.
- If the duplicate is not verified, you can have it removed from Maps since service area businesses are not permitted on Google Maps. Google My Business allows them, but any unverified listing would follow Google Maps rules, not Google My Business. To remove it:
  - Pull up the listing on Maps
  - Press “Suggest an edit”
  - Switch the toggle beside “Place is permanently closed” to Yes
  - Select “Private” as the reason and press submit. *Note: If there are reviews on the listing, you should get them transferred before doing this.
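If you're triaging many duplicates, for example from a spreadsheet, the decision flow for storefront and service area businesses can be written down as a small helper. This is only an illustrative sketch. `Duplicate` and `next_step` are hypothetical names, no Google API is involved, and the returned strings simply paraphrase the manual steps described above.

```python
from dataclasses import dataclass


@dataclass
class Duplicate:
    """Hypothetical record describing one duplicate listing found on Maps."""
    verified: bool              # no "own/claim this business" option is shown
    same_address_as_main: bool  # address matches the verified listing
    ever_at_address: bool       # the business was at that address at some point
    has_reviews: bool


def next_step(dup: Duplicate, service_area: bool = False) -> str:
    """Return the recommended action for one duplicate listing."""
    review_note = " (transfer its reviews first)" if dup.has_reviews else ""
    if service_area:
        if dup.verified:
            return "get it unverified or gain access, then ask GMB support to merge"
        # Unverified service area listings follow Google Maps rules and can be removed.
        return "Maps: Suggest an edit > permanently closed > 'Private'" + review_note
    if dup.verified:
        return "get ownership or have it unverified before doing anything else"
    if dup.same_address_as_main:
        return "ask GMB support to merge the two listings"
    if not dup.ever_at_address:
        return "Maps: Suggest an edit > permanently closed > 'Never existed'" + review_note
    return "have the duplicate marked as moved"


print(next_step(Duplicate(verified=False, same_address_as_main=False,
                          ever_at_address=False, has_reviews=True)))
```

The checks run in the same order as the manual process: verification status first, then address match, then whether the business ever operated at the duplicate's address.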
Public-facing professionals (doctors, lawyers, dentists, realtors, etc.) are allowed their own listings separate from the office they work for, unless they’re the only public-facing professional at that office. In that case, they are considered a solo practitioner and there should only be one listing, formatted as “Business Name: Professional Name.”
Solo practitioner with two listings
This is probably one of the easiest scenarios to fix because solo practitioners are only supposed to have one listing. If you have a scenario where there’s a listing for both the practice and the practitioner, you can ask Google My Business to merge the two and it will combine the ranking strength of both. It will also give you one listing with more reviews (if each individual listing had reviews on it). The only scenario where I don’t advise combining the two is if your two listings both rank together and are monopolizing two of the three spots in the 3-pack. This is extremely rare.
If the business has multiple practitioners, you are not able to get these listings removed or merged provided the practitioner still works there. While I don’t generally suggest creating listings for practitioners, they often exist already, leaving people to wonder what to do with them to keep them from competing with the listing for the practice.
A good strategy is to work on having multiple listings rank if you have practitioners that specialize in different things. Let’s say you have a chiropractor who also has a massage therapist at his office. The massage therapist’s listing could link to a page on the site that ranks highly for “massage therapy” and the chiropractor could link to the page that ranks highest organically for chiropractic terms. This is a great way to make the pages more visible instead of competing.
Another example would be a law firm. You could have the main listing for the law firm optimized for things like “law firm,” then have one lawyer who specializes in personal injury law and another lawyer who specializes in criminal law. This would allow you to take advantage of the organic ranking for several different keywords.
Keep in mind that if your goal is to have three of your listings all rank for the exact same keyword on Google, thus monopolizing the entire 3-pack, this is an unrealistic strategy. Google has filters that keep the same website from appearing too many times in the results and unless you’re in a really niche industry or market, it’s almost impossible to accomplish this.
Practitioners who no longer work there
It’s common to find listings for practitioners who no longer work for your business but did at some point. If you run across a listing for a former practitioner, you’ll want to contact Google My Business and ask them to mark the listing as moved to your practice listing. It’s extremely important that you get them to move it to your office listing, not the business the practitioner now works for (if they have been employed elsewhere). Here is a good case study that shows you why.
If the practitioner listing is verified, things can get tricky since Google My Business won’t be able to move it until it’s unverified. If the listing is verified by the practitioner and they refuse to give you access or remove it, the second-best thing would be to get them to update the listing to have their current employer’s information on it. This isn’t ideal and should be a last resort.
Listings for employees (not public-facing)
If you find a listing for a non-public-facing employee, it shouldn’t exist on Maps. For example: an office manager of a law firm, a paralegal, a hygienist, or a nurse. You can get the listing removed:
- Pull up the listing on Maps
- Press “Suggest an edit”
- Switch the toggle beside “Place is permanently closed” to Yes
- Select “Never existed” as the reason and press submit.
Listings for deceased practitioners
This is always a terrible scenario to have to deal with, but I’ve run into lots of cases where people don’t know how to get rid of listings for deceased practitioners. The solution is similar to what you would do for someone who has left the practice, except you want to add an additional step. Since the listings are often verified and people usually don’t have access to the deceased person’s Google account, you want to make sure you tell Google My Business support that the person is deceased and include a link to their obituary online so the support worker can confirm you’re telling the truth. I strongly recommend using either Facebook or Twitter to do this, since you can easily include the link (it’s much harder to do on a phone call).
Creating practitioner listings
If you’re creating a practitioner listing from scratch, you might run into issues if you’re trying to do it from the Google My Business dashboard and you already have a verified listing for the practice. The error you would get is shown below.
There are two ways around this:
- Create the listing via Google Maps. Do this by searching the address and then clicking “Add a missing place.” Do not include the firm/practice name in the title of the listing or your edit most likely won’t go through, since it will be too similar to the listing that already exists for the practice. Once you get an email from Google Maps stating the listing has been successfully added, you will be able to claim it via GMB.
- Contact GMB support and ask them for help.
We hope you enjoyed this excerpt from the Expert’s Guide to Local SEO! The full 160+-page guide is available for purchase and download via LocalU below.
Posted by BenjaminEstes
Google’s PageSpeed Insights is an easy-to-use tool that tests whether a web page might be slower than it needs to be. It gives a score to quantify page performance. Because this score is concrete, the PageSpeed Insights score is often used as a measure of site performance. Similarly to PageRank years back, folks want to optimize this number just because it exists. In fact, Moz has a popular article on this subject: How to Achieve 100/100 with the Google Page Speed Test Tool.
For small sites on common CMSes (think WordPress), this can be accomplished. If that’s you, PageSpeed Insights is a great place to start. For most sites, a perfect score isn’t realistic. So where do we start?
That’s what this post is about. I want to make three points:
- Latency can hurt load times more than bandwidth
- PageSpeed Insights scores shouldn’t be taken at face value
- Improvement starts with measurement, goal setting, and prioritization
I’m writing with SEO practitioners in mind. I’ll skip over some of the more technical bits. You should walk away with enough perspective to start asking the right questions. And you may make better recommendations as a result.
Disclaimer: HTTP/2 mitigates some of the issues discussed in this post. Specifically, multiple requests to the same server are less problematic, because HTTP/2 can multiplex them over a single connection. It is not a panacea.
Latency can hurt load times more than bandwidth
A first look at PageSpeed Insights’ rules could make you think it’s all about serving fewer bytes to the user. Minify, optimize, compress. But size is only half the story. It also takes time for your request simply to reach a server. And then it takes time for the server to respond to you!
What happens when you make a request?
If a user types a URL into a browser address bar and hits enter, a request is made, and lots of things happen as a result. The very last step is transferring the requested content. Only this last step is affected by bandwidth and the size of the content.
Fulfilling a request requires (more or less) these steps:
- Find the server
- Connect to the server
- Wait for a response
- Receive response
Each of these steps takes time, not just the last. The first three are independent of file size; they are effectively constant costs. These costs are incurred with each request regardless of whether the payload is a tiny, minified CSS file or a huge uncompressed image.
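To make the fixed-versus-variable split concrete, here is a minimal Python sketch. It assumes a request’s time is simply the sum of the four steps above; the `request_time_ms` helper, the millisecond values, and the file sizes are all hypothetical, chosen for illustration rather than measured.

```python
# Rough model of a single request: fixed per-request costs plus transfer time.
# The millisecond values used below are illustrative assumptions, not measurements.


def request_time_ms(dns_ms: float, connect_ms: float, wait_ms: float,
                    size_kb: float, bandwidth_mbps: float) -> float:
    """Estimate how long one request takes, in milliseconds."""
    # Find, connect, and wait: constant costs, independent of payload size.
    fixed_ms = dns_ms + connect_ms + wait_ms
    # Receive: the only step that depends on size and bandwidth.
    # 1 KB = 8 kilobits, and 1 Mbps = 1 kilobit per millisecond.
    transfer_ms = size_kb * 8 / bandwidth_mbps
    return fixed_ms + transfer_ms


latency = dict(dns_ms=40, connect_ms=80, wait_ms=100)  # assumed values

print(request_time_ms(**latency, size_kb=5, bandwidth_mbps=20))     # 222.0  small CSS file
print(request_time_ms(**latency, size_kb=2.5, bandwidth_mbps=20))   # 221.0  same file, halved by minifying
print(request_time_ms(**latency, size_kb=2000, bandwidth_mbps=20))  # 1020.0 large image
```

With 220ms of assumed fixed cost, halving a 5KB file saves about 1ms; only the very large payload makes transfer time the dominant factor.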
Why does it take time to get a response?
The factor we can’t avoid is that network signals can’t travel faster than the speed of light. That’s a theoretical maximum; in reality, it will take longer than that for data to transfer. For instance, it takes light about 40ms for a round trip between Paris and New York. If it takes twice that time for data to actually cross the Atlantic, then the minimum time it will take to get a response from a server is 80ms.
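The speed-of-light bound is simple arithmetic: the round-trip time is twice the distance divided by the speed of light. This sketch assumes a great-circle distance of roughly 5,837km between Paris and New York. `min_round_trip_ms` and its `slowdown` factor are illustrative names; `slowdown=2.0` stands in for the “twice that time” scenario.

```python
# Lower bound on round-trip time imposed by the speed of light.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum

# Approximate great-circle distance (an assumption for this example).
PARIS_NEW_YORK_KM = 5_837


def min_round_trip_ms(distance_km: float, slowdown: float = 1.0) -> float:
    """Minimum round-trip time in ms for a given one-way distance.

    slowdown > 1 models real networks being slower than light in a vacuum.
    """
    return 2 * distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000 * slowdown


print(round(min_round_trip_ms(PARIS_NEW_YORK_KM)))       # 39
print(round(min_round_trip_ms(PARIS_NEW_YORK_KM, 2.0)))  # 78
```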
This is why CDNs are commonly used. CDNs put servers physically closer to users, which is the most direct way to reduce the time it takes to reach the server.
How much does this matter?
Check out this chart (from Chrome’s DevTools):
All of the values in the red box are what I’m considering “latency.” They total about 220ms. The actual transfer of content took 0.7ms. No compression or file-size reduction could help this; the only way to reduce the time taken by the request is to reduce latency.
Don’t we need to make a lot of requests to load a page?
Fortunately, once a server has been found (“DNS Lookup” in the image above), the browser won’t need to look it up again. It may still need to open a new connection, and it will always have to wait for a response.
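Here is a simplified sketch of what repeated requests to one host cost. It assumes the requests run one after another (real browsers run several in parallel) and uses made-up timings; `host_latency_ms` is a hypothetical helper. The `reuse_connection` flag models HTTP keep-alive, which lets a browser skip the connect step on later requests to the same host.

```python
def host_latency_ms(num_requests: int, dns_ms: float, connect_ms: float,
                    wait_ms: float, reuse_connection: bool) -> float:
    """Total latency for sequential requests to a single host (simplified).

    DNS lookup: paid once, because the browser remembers the result.
    Connection: paid once if it is kept alive and reused, else per request.
    Waiting for a response: paid on every request.
    """
    connections = 1 if reuse_connection else num_requests
    return dns_ms + connections * connect_ms + num_requests * wait_ms


# Ten requests with assumed costs of 40ms DNS, 80ms connect, 100ms wait.
print(host_latency_ms(10, 40, 80, 100, reuse_connection=False))  # 1840
print(host_latency_ms(10, 40, 80, 100, reuse_connection=True))   # 1120
```

Even in the best case, the waiting cost grows with every request, which is why the number of requests matters as much as their size.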
A skeptical read of PageSpeed Insights tests
All of the PageSpeed Insights evaluations cover things that can impact site speed. For large sites, some of them aren’t so easy to implement. And depending on how your site is designed, some may be more impactful than others. That’s not to say you have an excuse not to do these things: they’re all best practices, and they all help. But they don’t represent the whole site speed picture.
With that in mind, here’s a “skeptical reading” of each of the PageSpeed Insights rules.
Tests focusing on reducing bandwidth use
Unless you have huge images, this might not be a big deal. This is only measuring whether images could be further compressed — not whether you’re loading too many.
Will likely reduce overhead only by tens of KB. Latency will have a bigger impact than response size.
Will likely reduce overhead only by tens of KB. Latency will have a bigger impact than response size.
Probably not as important as consolidating JS into a single file to reduce the number of requests that have to be made.
Tests focusing on reducing latency
Leverage browser caching
Definitely cache your own files. However, many of the files that could benefit from caching are probably hosted on 3rd-party servers; you’d have to host them yourself to change their cache times.
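For files you host yourself, caching is controlled with the `Cache-Control` response header. The helper below is a hypothetical sketch of one common pattern, not a recommendation for every site: fingerprinted assets (a content hash in the filename) get a one-year `max-age` plus `immutable`, and everything else gets `no-cache`, which lets the browser store the file but requires revalidation before reuse. The filename pattern is an assumption about how your build tool names assets.

```python
import re

# Assumed naming scheme: "name.<hex hash>.ext", e.g. "app.3f9a2c1b.js".
FINGERPRINTED = re.compile(
    r"\.[0-9a-f]{6,}\.(?:js|css|png|jpg|jpeg|gif|svg|woff2?)$"
)


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a file you serve yourself."""
    if FINGERPRINTED.search(path):
        # The URL changes whenever the content does, so browsers can keep
        # this file for a year (31536000 seconds) without revalidating.
        return "public, max-age=31536000, immutable"
    # HTML and unversioned files: cacheable, but revalidated before each reuse.
    return "no-cache"


print(cache_control_for("/static/app.3f9a2c1b.js"))
print(cache_control_for("/index.html"))
```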
Reduce server response time
The PSI threshold is too high (that is, too lenient). It also tries to exclude the physical latency of the server, looking instead only at how long the server takes to respond once it receives a request.
Avoid landing page redirects
A valid concern, but can be frustratingly difficult. Having zero requests on top of the initial page load to render above-the-fold content isn’t necessary to meet most performance goals.
Prioritize visible content
Actually kind of important.
Don’t treat these as the final word on site performance! Independent of these tests, here are some things to think about. Some aren’t covered at all by PageSpeed Insights, and some are only covered halfway:
- Caching content you control.
- Reducing the amount of content you’re loading from 3rd-party domains.
- Reducing server response time beyond the minimum required to pass PageSpeed Insights’ test.
- Moving the server closer to the end user. Basically, use a CDN.
- Reducing blocking requests. Ensuring you’re using HTTP/2 will help here.
How to start improving
The screenshots in this post were created with Chrome DevTools. It’s built into the browser and lets you inspect exactly what happens when a page loads.
Instead of trusting the PageSpeed Insights tool, load your page in Chrome and check out how it performs. Look at which requests actually take the most time. Often the answer will be obvious: for instance, too much time may be spent loading ads.
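DevTools can export a page load from the Network panel as a HAR file. Assuming the standard HAR 1.2 layout, a short script can rank requests and split each one into latency and transfer time, mirroring the red-box breakdown discussed earlier. `load_har` and `summarize_har` are hypothetical helpers, and counting `blocked`, `dns`, `connect`, `send`, and `wait` as “latency” is an approximation of that breakdown.

```python
import json

# Phases that happen before any content arrives. In HAR 1.2, "ssl" time is
# already included in "connect", so it is not added separately.
LATENCY_PHASES = ("blocked", "dns", "connect", "send", "wait")


def load_har(path: str) -> dict:
    """Read a HAR file exported from the DevTools Network panel."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_har(har: dict, top: int = 5) -> list[tuple[str, float, float]]:
    """Return (url, latency_ms, receive_ms) for the slowest requests."""
    rows = []
    for entry in har["log"]["entries"]:
        timings = entry["timings"]
        # HAR uses -1 for phases that did not apply (e.g. a reused connection).
        latency = sum(max(timings.get(phase, -1), 0) for phase in LATENCY_PHASES)
        receive = max(timings.get("receive", -1), 0)
        rows.append((entry["request"]["url"], latency, receive))
    # Slowest first, by total time spent on the request.
    rows.sort(key=lambda row: row[1] + row[2], reverse=True)
    return rows[:top]


# Usage (assumes "page.har" was exported from DevTools):
# for url, latency, receive in summarize_har(load_har("page.har")):
#     print(f"{latency:8.1f} ms latency  {receive:8.1f} ms transfer  {url}")
```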
If a perfect PageSpeed Insights score isn’t your goal, you need to know what your goal will be. This is important, because it allows you to compare current performance to that goal. You can see whether reducing bandwidth requirements will actually meet your goal, or whether you also need to do something to reduce latency (use a CDN, handle fewer requests, load high-priority content first).
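Once you have a goal, a back-of-the-envelope model can show which kind of fix could reach it. This sketch assumes load time is (round trips × latency) + (bytes ÷ bandwidth), which ignores parallelism and rendering; the `PageProfile` class, the 2,000ms goal, and every number in it are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PageProfile:
    round_trips: int       # sequential round trips on the critical path (simplification)
    latency_ms: float      # fixed cost of each round trip
    total_kb: float        # bytes that must be transferred
    bandwidth_mbps: float

    def load_time_ms(self) -> float:
        transfer_ms = self.total_kb * 8 / self.bandwidth_mbps  # 1 Mbps = 1 kilobit/ms
        return self.round_trips * self.latency_ms + transfer_ms


GOAL_MS = 2000  # hypothetical goal
current = PageProfile(round_trips=12, latency_ms=150, total_kb=1500, bandwidth_mbps=20)

scenarios = {
    "current": current,
    "halve bytes": replace(current, total_kb=current.total_kb / 2),
    "CDN halves latency": replace(current, latency_ms=current.latency_ms / 2),
    "fewer requests": replace(current, round_trips=8),
}
for name, profile in scenarios.items():
    t = profile.load_time_ms()
    print(f"{name:20} {t:6.0f} ms  {'meets' if t <= GOAL_MS else 'misses'} goal")
```

With these assumed numbers, the current page takes 2,400ms. Halving the bytes only gets it to 2,100ms, still missing the goal, while halving latency (1,500ms) or cutting round trips from 12 to 8 (1,800ms) meets it.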
Prioritizing page speed “fixes” is important, but it’s not the only type of prioritization. There’s also the question of what actually needs to be loaded. PageSpeed Insights does try to figure out whether you’re prioritizing above-the-fold content. This is a great target, but it’s not a perfect assessment; it might be easier to split content into “critical” and “non-critical” paths, regardless of what is ostensibly above the fold.
For instance: If your site relies on ad revenue, you might load all content on the page and only then begin to load ads. Figuring out how to serve less is a challenge best tackled by you and your team. After all, PageSpeed Insights is a one-size-fits-all solution.
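The “content first, ads later” ordering can be sketched as two phases: load every critical resource in parallel, and start non-critical ones only after that phase completes. In a browser this would be done in JavaScript; the Python `asyncio` simulation below only illustrates the ordering, and `simulated_fetch`, `load_page`, and the resource names and delays are hypothetical.

```python
import asyncio


async def simulated_fetch(name: str, delay_s: float, log: list[str]) -> None:
    """Hypothetical stand-in for a network request: it just sleeps."""
    await asyncio.sleep(delay_s)
    log.append(name)


async def load_page(critical: list[tuple[str, float]],
                    deferred: list[tuple[str, float]],
                    log: list[str]) -> None:
    # Critical resources load in parallel with one another...
    await asyncio.gather(*(simulated_fetch(n, d, log) for n, d in critical))
    # ...and non-critical ones (such as ads) start only after all of them finish.
    await asyncio.gather(*(simulated_fetch(n, d, log) for n, d in deferred))


log: list[str] = []
asyncio.run(load_page(
    critical=[("article.html", 0.02), ("main.css", 0.01)],
    deferred=[("ad-slot-1", 0.001)],
    log=log,
))
print(log[-1])  # ad-slot-1: last, even though it is the fastest request
```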
The story so far has been that PageSpeed Insights can be useful, but there are smarter ways to assess and improve site speed. A perfect score doesn’t guarantee a fast site.
If you’re interested in learning more, I highly recommend checking out Ilya Grigorik’s site and specifically this old-but-good introduction deck. Grigorik is a web performance engineer at Google and a very good communicator about site speed issues.