I’ve written over 3,000 words walking you through the process of scraping the “People Also Ask” Google feature to mine keyword opportunities.
Feel free to read it all; it’s thorough and should help you avoid errors along the way. However, it’s quite long, so here’s a quick TL;DR summary:
To Scrape “People Also Ask” (using Scrapebox):
- Formulate your seed Google keyword URLs (e.g. https://www.google.com/search?q=Keyword)
- Fire up Scrapebox’s “Custom Data Grabber” and use this pattern: before_after=<div class="ygGdYd related-question-pair">|</div>
- NOTE! – This was up to date as of May 1, 2021. Google’s search result pages change all the time. This DIV class WILL CHANGE (it may already have changed). To figure out the best DIV class to use, Right Click + Inspect and see what the current class is.
- Run the data grabber over your seed list; it will spit out keywords for you.
- Use those keywords to create new URLs and feed them back into the same process.
To Scrape “People Also Ask” (using Screaming Frog):
- Formulate your seed Google keyword URLs (e.g. https://www.google.com/search?q=Keyword)
- Fire up Screaming Frog “Custom Extraction”, select “CSSPath”, and use the pattern: .related-question-pair
- Make sure to set your User Agent to a “normal” user agent such as Chrome, IE, etc.
- Run the crawler. If the pattern is correct (see the note above), it will extract each related question into its own column. This can be exported to a CSV and manipulated in Excel.
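If you want to sanity-check the selector before starting a crawl, here is a minimal Python sketch (standard library only) that does roughly what the `.related-question-pair` CSSPath does: it collects the text of every element whose class list contains that class. It assumes you have already saved a SERP’s HTML into a string, and the class name carries the same caveat as above: it will change. `ClassTextExtractor` and `extract_by_class` are hypothetical helpers, not part of Screaming Frog.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # nesting depth inside a matched element (0 = outside)
        self.current = []     # text fragments of the element being captured
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:                      # already inside a match: track nesting
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:    # start capturing this element
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:                 # matched element just closed
            text = " ".join(" ".join(self.current).split())
            if text:
                self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class="related-question-pair"):
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Fragments from nested tags are joined with spaces and whitespace is collapsed, so each question comes back as one clean string, much like one cell per question in the Screaming Frog export.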
Voila! That’s it.
Now, if you need any help with the various steps in the (Scrapebox) process, it’s all laid out below. I also wax poetic about content marketing a bit (forgive me).
Content Marketing and Keyword Research
The combination of SEO and content marketing has created tremendous potential for reaching new customers, users and eyeballs.
Really, people take to Google to find information about anything and everything, so the depth of opportunity for targeting new keywords, questions or pain points is virtually unlimited.
But in a world with unlimited potential, how can you go about finding the best topics to cover for your content? Is there a more informed approach than just “publish and pray”?
Of course there is. It’s called keyword research…
However, most keyword research sticks to the higher volume keywords. This is great, but only one part of the puzzle.
Search suggest tools help find longer-tail keywords and have become a staple for many SEOs. However, a slightly different approach, which is entirely “question-centric”, is to look at “People Also Ask”.
I’m going to dive into a great way to pull as many of those juicy questions as you need to help power your content marketing and keyword research, and keep yourself one step ahead of your competition.
What You Will Learn
- Introduction to Keyword Research
- Search Suggest Tools
- Why “People Also Ask” is Different
- Step-by-Step How-to: Scraping “People Also Ask”
- What now? Post Scrape Evaluation
Introduction to Keyword Research
For experienced SEOs and Internet Marketers, keyword research is nothing new. The standard process is this:
- Leverage available tools (most commonly Keywords Everywhere or Ahrefs, or, if you’ve got an active AdWords campaign, you can still leverage keyword data from Google AdWords Keyword Planner).
- Enter main product/service/topic.
- Collect relevant related terms/topics.
- Prioritize based on relevance, search volume, etc.
This is a nice approach when just starting out. You want to create a foundational keyword set that your site can target and eventually rank for over time. All your link building and promotional efforts will help drive ranking for those valuable keywords.
Let’s walk through this example in the blogging niche. If we want to start a site about blogging, we might go to the Google AdWords keyword tool and get a handful of keywords.
Starting with the extremely generic keyword “blogging”, we’d get this result:
Still generic, but there are some neat keywords to drill down into. Let’s go for “How to start a blog”, with 135,000 searches per month (not too shabby).
Feeding that back into Keyword Planner, we get this as related searches:
Perhaps we are starting to drill down, but you’ll start to see a lot of the same high volume, high competition keywords.
What becomes apparent when you use Keyword Planner is that “long tail” keywords are not readily available. This is a problem for a few reasons:
- Limits content targeting to high competition keywords.
- Doesn’t allow insights into discrete customer needs.
- Provides some semantic flexibility, but generally stays within the proposed topic.
Keep in mind that this tool is meant to help develop AdWords campaigns, and long-tail terms often don’t have enough search volume to warrant bidding on AdWords. It makes sense that a tool like this wouldn’t be the perfect source for tracking down content marketing keywords.
So, what are some ways to surface longer tail terms? Enter Search Suggest tools:
Search Suggest Tools
Google introduced search suggest all the way back in 2008, so as you probably know, it provides suggestions for your keyword before you finish typing it in, like this screenshot:
While this provides a great user experience and saves time typing/misspelling long queries for regular search users, it also provides a wonderful source of keyword research for SEOs looking to get lots of longer tail keywords that don’t necessarily get surfaced in the keyword planner.
(When Google introduced its search suggest tool, I wonder if it had the foresight to see that it would be the target of so much scraping by SEOs.)
There are lots of different tools that scrape this data and present it to the user:
- Ubersuggest.io (the originator).
- AnswerthePublic.com (uses the “five W’s” to suggest questions, along with other phrases).
These all generally use the same process, something along the lines of:
- Original Query (10 suggestions)
- Original Query + “a” through “z” (10 suggestions each)
- Original Query + “0” through “9” (10 suggestions each)
- Original Query + miscellaneous other terms (prepositions, five W’s, etc.).
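To make the pattern concrete, here is a minimal Python sketch of the query expansion these tools appear to perform. It only builds the query variants; the modifier list is an illustrative assumption (each tool uses its own), and actually fetching suggestions for each variant is left out. `expand_query` is a hypothetical helper.

```python
import string

# Illustrative "miscellaneous" modifiers (prepositions, question words); an assumption.
EXTRA_MODIFIERS = ["for", "with", "without", "who", "what", "when", "where", "why", "how"]


def expand_query(seed, extra_modifiers=EXTRA_MODIFIERS):
    """Build suggest-style query variants for one seed keyword."""
    variants = [seed]                                               # original query
    variants += [f"{seed} {ch}" for ch in string.ascii_lowercase]   # seed + "a" .. "z"
    variants += [f"{seed} {d}" for d in string.digits]              # seed + "0" .. "9"
    variants += [f"{seed} {m}" for m in extra_modifiers]            # seed + misc terms
    return variants
```

For “blogging” this produces 46 variants (1 + 26 + 10 + 9), which at up to 10 suggestions each is up to 460 raw suggestions before duplicates are removed.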
This approach does a pretty good job of grabbing an extremely large number of suggestions, but they need to be combed through for relevant terms.
Taking our “blogging” example, if we fed that into Ubersuggest we’d get 329 terms. Here’s a sample:
Not bad, but there is a lot of sorting to do, and obviously a lot of irrelevant keywords for starting a content marketing initiative.
What if there was a better way?
Why “People Also Ask” is different
Google recently launched the “Featured Snippet” which takes up “Position 0” (i.e. above Position 1 in the SERPs) and attempts to answer a user’s search question.
If we go back to “how to start a blog”, you’ll see the answer box pop up:
While this presents a unique SEO opportunity to rank above all the other results (more on that later), what is more pertinent to our current discussion is the handy suggestions that accompany this answer box, i.e. “People Also Ask”, see below:
Pretty neat, huh?
Wouldn’t it be great if we could pull together all the questions that users ask about our niche? Wouldn’t that make a great source of content topics, marketing opportunities, and all kinds of insights into our audience?
Well you can! I’m going to walk you through the exact process, leveraging the simple tools of Scrapebox and Excel, to pull together all those questions into a handy spreadsheet for analysis.
But first, let’s go over exactly what separates this from the run of the mill search suggest tools mentioned above…
Doesn’t require the keyword
The suggestions are not limited to keyword matches; they are sourced from other questions users ask. This means you get a wider understanding of all the semantically related questions, not just the ones that contain your target keyword.
Semantic Graph
Because we aren’t simply pulling keyword matches, you can get a full understanding of the semantic graph around your niche. Google’s powerful semantic analysis will detect this, allowing you to establish comprehensive relevance.
Multiple tiers of depth
The process I outline below feeds questions back into Google to pull more and more suggestions. This gives you multiple layers of depth, delving fully into the niche.
Position 0 and the Answer Box
As mentioned above, not only does each of these keywords present an opportunity to reach your audience, but each also has an Answer Box of its own. This means that by targeting a question specifically, you can jump to position 0!
Step by Step How-To: Scraping “People Also Ask”
I have not come across a tool that scrapes these terms for you, so until one becomes publicly available, we will need to take the scraping into our own hands.
For this we will need a few tools:
- A working copy of Scrapebox.
- Some dedicated proxies (5-10 will do, but more if you want to go faster).
- Excel or another spreadsheet that you are comfortable using.
If you’ve got all those handy, let’s continue!
Web Scraping 101
So that you understand a little of what we’re doing here, let’s cover what “scraping” means and how we leverage it for our purposes.
Web scraping is a simple process, it only has 2 steps:
- A webpage is loaded (e.g. a Google search result page).
- The page data is processed, and target data is extracted (e.g. the questions in the “People Also Ask” section).
Simple, right? The tricky part comes in the following ways:
- Google hates scrapers and will hit any suspicious traffic with CAPTCHAs or an outright ban. Therefore, we need to take it slow and use proxy IPs.
- Extracting the data we want can be complicated, but knowing how Google SERPs are coded, we can formulate a regex-like pattern that extracts it. (Don’t worry, I’ve handled this for you.)
Intro to Scrapebox
As its name states, Scrapebox is all about scraping. It’s a go-to tool for spammers, black hats and white hats alike who want to pull data and automate tasks. If you are serious about SEO, you should really learn your way around the program.
When you open it up, it might seem overwhelming, but don’t fear. It’s quite simple.
Let’s walk through the sections below:
Harvester and Keywords (top left)
We won’t be using this in the process, so feel free to ignore.
However, if you’re curious: This would be where you formulate a giant list of keywords which can be Google’d/Bing’d/etc., and Scrapebox would pull all the URLs that Google/Bing returns.
This is a very common use of Scrapebox. It can be useful for pulling URL sources for spam tools or finding outreach opportunities; anything that requires a lot of Googling can be automated here.
URLs Harvested (top center)
As the name states, this is where URLs get populated when harvested by the harvester. However, it’s also a repository for URLs that we want to use for other purposes (e.g. grabbing data, checking indexation, etc.).
Think of this like a URL bucket that acts as a source for various other tools.
Manage Lists (right side)
This menu contains all the various tools that Scrapebox contains. There’s a ton of options here, so don’t worry if you don’t understand all of them.
We’ll be interested in the “Custom Data Grabber”, which is listed in the “Grab/Check” menu.
Comment Poster
I’ve never used this section and probably never will. If you want to spam blog comments, GSA Search Engine Ranker is the gold standard, not Scrapebox.
Now that we’ve done the grand tour of your Scrapebox install, let’s dive into the actual scrape for our “People Also Ask” questions.
The Scrape Process
Here’s what we will be doing:
- Getting a list of “seed” questions and creating Google search query URLs for those questions.
- Setting up our Custom Data Grabber to pull the “People Also Ask” suggestions.
- Export questions, turn them into a new round of search queries.
- Rinse and repeat.
Source keywords
How do we come up with source keywords? Depends on your niche.
For larger sites what I usually do is take a full list of products, services, topics or keywords and prefix them with “What is”, “What are”, “How to” and other question terms.
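If your list of products or topics is long, a few lines of Python can do the prefixing for you. This is a sketch; `make_seed_questions` is a hypothetical helper, and the prefix list simply mirrors the examples above.

```python
from itertools import product

# Question prefixes from the text; extend with "Why", "Can I", etc. as needed.
QUESTION_PREFIXES = ["What is", "What are", "How to"]


def make_seed_questions(topics, prefixes=QUESTION_PREFIXES):
    """Cross every topic with every question prefix (topics in the outer loop)."""
    return [f"{prefix} {topic}" for topic, prefix in product(topics, prefixes)]
```

Expect some clunky combinations (“What are blogging”), so give the output a quick manual pass before turning it into URLs.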
If we wanted to start smaller, we can come up with just a handful of questions that are pertinent to a niche, and go from there.
Taking our “blogging” niche from above, let’s start with a handful that I’ve pulled together manually:
- How do I blog?
- How do you start a successful blog?
- How do you start a blog post?
- How do you start a writing blog?
- How can I create my personal blog?
- how to content marketing
Turning that into a Google search URL is simple. Google search result URLs are formatted like this:
https://www.google.com/search?q=keyword
Make use of the handy “Concatenate” function in Excel to match up these strings.
Then, make sure you swap out spaces with the plus (+) sign, or you might get some errors when scraping.
A common annoyance here is that you need to run find-and-replace on plain text, NOT on the CONCATENATE formula. So, do a quick Copy + Paste Special (Values) first to turn the query URLs into plain text.
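If you’d rather skip the Excel formula juggling, here is a minimal Python sketch of the same step. It uses the standard library’s `urllib.parse.quote_plus`, which turns spaces into `+` and also percent-encodes characters such as `?` (the Excel find-and-replace leaves those alone). `to_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote_plus

BASE_URL = "https://www.google.com/search?q="


def to_search_url(query):
    """Turn a question into a Google search URL (spaces become '+')."""
    return BASE_URL + quote_plus(query.strip())


seed_questions = [
    "How do I blog?",
    "How do you start a successful blog?",
    "how to content marketing",
]
seed_urls = [to_search_url(q) for q in seed_questions]
```

For example, `to_search_url("How do I blog?")` returns `https://www.google.com/search?q=How+do+I+blog%3F`, ready to paste into Scrapebox.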
You’ll get the following after all is said and done:
Ok, we are ready to feed these into Scrapebox and finally start scraping something.
Copy these URLs to the clipboard and head over to your Scrapebox menu.
Get these URLs into our “Harvester” by going to the menu:
“Import URL List” -> “Paste/Replace from Clipboard”.
You’ll see it appear in the Harvester URL section, like below:
We are now set with our Google SERP URLs, ready to grab any data that appear on them. But how do we do that?
Easy, custom data grabber!
Custom Data Grabber
Ok, now we get a little nerdy.
In Scrapebox, navigate to:
“Grab/Check” -> “Custom Data Grabber” -> “Create / Edit Custom Grabber”
If this is your first time with this tool you will see all blanks, fill them in as you see below:
Then click “Save as New Module”.
Once that is done, highlight the “People Also Ask” module you just created on the left-hand pane and click “Edit Module Masks”.
This “Mask” is where we will identify a pattern to look for/extract from our Google SERP.
See screen below:
Yours will be blank; fill in the fields as above and click “Save as new mask”.
Here is the “regex” portion (although not technically regex), so you can copy/paste:
before_after=<div class="ygGdYd related-question-pair">|</div>
NOTE: As mentioned above, this pattern will change. To find the new DIV class, Right Click + Inspect the code of the “People Also Ask” section of the Google SERP.
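For the curious, here is a rough Python equivalent of what a before_after mask does, assuming Scrapebox returns the text between each occurrence of the “before” string and the next occurrence of the “after” string (a non-greedy match). This is a sketch of the idea, not Scrapebox’s actual implementation; `before_after` is a hypothetical function name.

```python
import re


def before_after(page, before, after):
    """Return every substring found between `before` and the next `after`."""
    # re.escape treats the HTML markers as literal text; (.*?) is non-greedy,
    # and DOTALL lets a match span line breaks in the page source.
    pattern = re.escape(before) + r"(.*?)" + re.escape(after)
    return [m.strip() for m in re.findall(pattern, page, flags=re.DOTALL)]
```

One thing this makes obvious: if the question text sits inside nested DIVs, the first `</div>` cuts the match short, which is another reason to inspect the page and check your output.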
You’ve now created the Custom Data Grabber that you need to use to grab those questions. Close out back to the Scrapebox main screen.
Now to start the scraping!
Navigate to:
“Grab/Check” -> “Custom Data Grabber” -> “People Also Ask”. You should see the following screen:
Set the “Delay” to 10 seconds or more with 5-10 proxies. With more proxies you can use a shorter delay, but there’s no need to rush: better slow and steady than burned proxies.
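The trade-off between proxies and delay is easier to see in numbers. The sketch below is a hypothetical plan for a DIY Python scraper, not a description of how Scrapebox schedules requests internally: URLs are handed to proxies round-robin, and each proxy waits `delay_seconds` before its next request. `plan_requests` is a hypothetical helper.

```python
def plan_requests(urls, proxies, delay_seconds=10):
    """Return (start_second, proxy, url) tuples for a hypothetical DIY scraper.

    URLs are assigned to proxies round-robin; each proxy is reused only after
    `delay_seconds`, so adding proxies shortens the total run time without
    making any single IP hit Google faster.
    """
    if not proxies:
        raise ValueError("need at least one proxy")
    plan = []
    for i, url in enumerate(urls):
        proxy = proxies[i % len(proxies)]              # round-robin assignment
        start = (i // len(proxies)) * delay_seconds    # one batch per delay window
        plan.append((start, proxy, url))
    return plan
```

With 6 URLs, 3 proxies and a 10-second delay, the last requests start at 10 seconds; double the proxies and everything starts at 0, while each IP still makes only one request.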
Also, if you have a large set of seed URLs, you should check “Save URLs with extracted data”.
I do this to keep track of where certain questions originated from (i.e. what category, etc.)
Click “Start” and watch the scraping happen before your eyes!
The Results (Round 1)
This first batch of results I like to call “Round 1”, because it is just the beginning. Often, I don’t know whether any of the seed questions will result in an Answer Box/People Also Ask section, so this is more of a test round.
Once the scraping is complete click “Show Data Folder”. This will open a folder with text files inside. Click the most recent text file to see the juicy scraped data.
You should see something like this:
Pretty cool, no?
How can we make sense of this and give it a more presentable format? If you are handy at Excel the answer may be obvious, but for everyone else I will walk through my process.
Wrangling Data in Excel
Copy + Paste the text from above into Excel.
Navigate to the Top Menu, under Data -> “Text to Columns”
(Text to columns is a super useful tool for parsing data based on simple delimiters. In this case we have the “pipe” symbol (“|”) which separates our origin URLs from our Questions.)
A simple 3 step process:
Step 1: Choose “Delimited”. Click Next.
Step 2: Uncheck the other “Delimiters” and fill in “Other” with the pipe symbol (“|”). Click Next.
Step 3: Keep the “Column data format” as “General” and click “Finish”.
And boom, you are left with a nicely formatted list of original URLs in one column and questions in the next.
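If you’d rather not do this in Excel, here is a small Python sketch of the same split. It assumes the output file has one `URL|question` pair per line, which is what the pipe-delimited format above suggests; `parse_grabber_output` and `to_csv` are hypothetical helper names.

```python
import csv
import io


def parse_grabber_output(text):
    """Split each 'URL|question' line into an (origin_url, question) pair."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        url, sep, question = line.partition("|")
        if sep:                            # skip lines without a pipe
            rows.append((url.strip(), question.strip()))
    return rows


def to_csv(rows):
    """Render the pairs as CSV text with a header row, ready for Excel."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Origin URL", "Question"])
    writer.writerows(rows)
    return buf.getvalue()
```

Write the CSV string to a file and it opens in Excel with the same two columns that Text to Columns produces.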
Feedback Loop
Now, the real value from this process comes when you feed your results back into the search engine to grab even more questions. Use the same process as before for creating the Google search URLs and replacing spaces with +’s.
I also like to mark each set with its appropriate “Round” number. This makes it easier to avoid feeding duplicate questions into each round.
After doing this (and removing duplicate Questions) I’m left with the following pretty little spreadsheet:
Now you can take all the “New Query URLs” and pump them back into Scrapebox for Round 2 (and then Round 3 and so on)!
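The dedupe-and-requeue step can also be scripted. Here is a minimal Python sketch, assuming you have the latest round as (origin URL, question) pairs and a running set of questions you’ve already seen. Matching duplicates by lower-cased text is a simplifying assumption; near-duplicates with different wording will still slip through. `next_round` is a hypothetical helper.

```python
from urllib.parse import quote_plus


def next_round(results, seen, round_number):
    """Turn one round's scraped questions into the next round's query URLs.

    results      -- (origin_url, question) pairs from the latest scrape
    seen         -- lower-cased questions already collected (updated in place)
    round_number -- label stored with each new question
    """
    new_rows = []
    for _origin, question in results:
        question = question.strip()
        key = question.lower()
        if not key or key in seen:
            continue                       # drop blanks and cross-round duplicates
        seen.add(key)
        url = "https://www.google.com/search?q=" + quote_plus(question)
        new_rows.append((round_number, question, url))
    return new_rows
```

Each returned row carries its round number, the question, and the new query URL, which matches the “Round” column approach described above.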
The final Product
Following this exact process, beginning with 6 seed questions and running 4 rounds, I was able to generate over 275 unique questions.
See screenshot:
Not bad!
Starting with a large seed set, such as products, services or categories/topics would likely lead to a much larger set.
What Now? Post-Scrape Evaluation
Ok, great, now we’ve got a ton of questions that users are asking about our niche. What now?
While it would be great to just dump this over to a content writer, the truth of the matter is that it requires a bit more thought than that. We’ll need to evaluate the questions on a few points.
Topical Relevance
Are the questions relevant to our space? Sometimes this process can spiral off on a tangent, pulling in irrelevant questions.
It’s easy enough to eliminate these through a cursory manual review.
FAQs vs. 1-off articles
Of the relevant questions that remain, what is the best way to address them?
Are they significant enough to warrant an article dedicated to them? Or are they minor enough to be placed in an FAQ/Top Questions-style article with only a paragraph dedicated to them?
A few points here:
- Check search volume – Higher volume keywords will likely require a fuller article to rank.
- Check competition – Do a quick search for the keyword and see what ranks. If it’s a paragraph from within a larger article, that’s a good indication of what Google is rewarding, and can help steer your strategy.
Market Opportunities
These questions are more than just keywords, they are insights into the inquisitive mind of your audience. It’s possible that these questions can surface unexploited market opportunities and user stories.
Keep an open mind when reading through these questions; you never know what gold nuggets you might find.
Ranking in the Answer Box
Ah yes, back to shrewd, tactical SEO. These questions are not only great insight into your audience, but each of them also has an Answer Box in position 0. This means that with the correct formatting you can potentially rank in Position 0 just by ranking somewhere on the 1st page.
There are lots of guides online that dissect how to best rank in the Answer Box, but just a few tips here:
- Use the exact question in an H1/H2 tag.
- Include bullet points, lists or tables when possible (the answer box/featured snippet loves structured content).
- Use lots of “LSI” or synonyms within and around your answer.
- Include images with relevant alt text.
Wrapping it up
Keyword research is the foundation of any online marketing or content marketing initiative. Without knowledge of your audience’s pain points, you won’t be able to properly address them. This goes for the high-volume keywords all the way to the microscopic long-tail.
When Google introduced search suggest, it provided unprecedented insight into the very long tail: the highly specific queries users search for. The “People Also Ask” section is another example of that, only now in question form, which is brilliantly applicable to many different content and marketing initiatives.
Combining these question insights with the tactical elements of ranking in the Featured Snippet/Answer Box, “People Also Ask” brings a wonderful layer of strategy and relevance that can complement any competitive digital initiative, large or small.