Can't access innerText property using Puppeteer - .$$eval and .$$ is not yielding results - JavaScript

I am working on a web scraper that searches Google for certain things and then pulls text from the results page, and I am having trouble getting Puppeteer to return the text I need. What I want to return is an array of strings.

Let's say I have a couple of nested divs within a div, each containing text, like so:

 <div class='mainDiv'>
   <div>Mary Doe </div>
   <div> James Dean </div>
 </div>

In the DOM, I can do the following to get the result I need:

document.querySelectorAll('div.mainDiv')[0].innerText.split('\n')

This yields: ["Mary Doe", "James Dean"].

I understand that Puppeteer doesn't return NodeLists, and instead it uses JSHandles, but I still can't figure out how to get any information using the prescribed methods. See below for what I have tried in Puppeteer and the corresponding console output:

In every scenario, I do await page.waitFor('selector') to start.

Scenario 1 (using .$$eval()):

const genreElements = await page.$$eval('div.mainDiv', el => el);
console.log(genreElements) // [] 

Scenario 2 (using evaluate):

function extractItems() {
   const extractedElements = document.querySelectorAll('div.mainDiv')[0].innerText.split('\n')
   return extractedElements
}

let items = await page.evaluate(extractItems)
console.log(items) // UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'innerText' of undefined

Scenario 3 (using evaluateHandle):

const selectorHandle = await page.evaluateHandle(() => document.querySelectorAll('div.mainDiv'))
const resultHandle = await page.evaluate(x => x[0], selectorHandle)
console.log(resultHandle) // undefined

Any help or guidance on what is wrong with my implementation, or on how to achieve what I am looking for, is much appreciated. Thank you!


3 Answers

Using page.$eval:

const names = await page.$eval('.mainDiv', (element) => {
    return element.innerText
});

Here the element is retrieved by selector and directly passed to the function to be evaluated.
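Because the container's innerText comes back as a single string, getting the array the question asks for still needs a split. A minimal sketch, assuming `page` is an already-open Puppeteer Page and that each name sits on its own line; `splitLines` and `getNamesFromContainer` are hypothetical helper names, not part of Puppeteer:

```javascript
// Hypothetical helper: turn the container's innerText into a clean array.
function splitLines(text) {
  return text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0); // drop blank lines
}

// Assumes `page` is a Puppeteer Page that has already loaded the document.
async function getNamesFromContainer(page) {
  const text = await page.$eval('.mainDiv', element => element.innerText);
  return splitLines(text);
}
```

The split happens on the Node side here, so `splitLines` can be checked on its own without launching a browser.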

Using page.evaluate:

const namesElem = await page.$('.mainDiv');
const names = await page.evaluate(namesElem => namesElem.innerText, namesElem);

This is basically the first method split up into two steps. The interesting part is that ElementHandles can be passed as arguments in page.evaluate() and can be evaluated like JSHandles.
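One thing to watch in the two-step form: page.$() resolves to null when nothing matches the selector, so a guard avoids errors similar to the "innerText of undefined" one in the question. A sketch, assuming `page` is an open Puppeteer Page; `getContainerText` is a made-up name for illustration:

```javascript
// Assumes `page` is a Puppeteer Page; resolves to null if the selector matches nothing.
async function getContainerText(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) {
    return null; // nothing matched in the page context
  }
  // The ElementHandle is passed as an argument and arrives as the DOM element.
  return page.evaluate(element => element.innerText, handle);
}
```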

Note that for simplicity and clarity I used the methods that retrieve a single element. page.$$() and page.$$eval() work the same way, except that they select all matching elements: page.$$() resolves to an array of ElementHandles, and page.$$eval() passes an array of elements to the callback.
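For the multi-element case, a rough sketch with page.$$() might look like the following. It assumes `page` is an open Puppeteer Page and that the names live in direct child divs of .mainDiv, as in the question's markup; `getNamesViaHandles` is a hypothetical name:

```javascript
// Assumes `page` is a Puppeteer Page; the selector follows the question's markup.
async function getNamesViaHandles(page) {
  const handles = await page.$$('.mainDiv > div'); // array of ElementHandles
  // Evaluate each handle separately and collect the trimmed text.
  return Promise.all(
    handles.map(handle => page.evaluate(element => element.innerText.trim(), handle))
  );
}
```

This makes one evaluate call per element, so page.$$eval() is usually the lighter choice when only the text is needed.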

1 week ago

Try it like this:

let names = await page.evaluate(() => [...document.querySelectorAll('.mainDiv div')].map(div => div.innerText))

That way you can first test the callback directly in the Chrome DevTools console.
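To make the console testing concrete, the callback can be written as a standalone function. This is only a sketch: `collectNames` and `getNames` are hypothetical names, and the `root = document` default is an assumption that lets the same function run in the DevTools console, inside page.evaluate(), or against a stand-in object:

```javascript
// Hypothetical helper; `root` defaults to the page's document when called without arguments.
function collectNames(root = document) {
  return [...root.querySelectorAll('.mainDiv div')].map(div => div.innerText.trim());
}

// In Puppeteer (assuming `page` is an open Page): the function is serialized
// and executed in the page, where `document` exists.
async function getNames(page) {
  return page.evaluate(collectNames);
}
```

Pasting `collectNames` into the console and calling `collectNames()` on the target page should show the same array that `getNames(page)` resolves to.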

1 week ago

Use page.$$eval() or page.evaluate():

You can use page.$$eval() or page.evaluate() to run Array.from(document.querySelectorAll()) within the page context and map() the innerText of each element to the result array:

const names_1 = await page.$$eval('.mainDiv > div', divs => divs.map(div => div.innerText));
const names_2 = await page.evaluate(() => Array.from(document.querySelectorAll('.mainDiv > div'), div => div.innerText));
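
The page.evaluate() call relies on the two-argument form of Array.from(), where the second argument is a mapping function applied to each item. A small stand-alone illustration, using a plain array-like object as a stand-in for the NodeList (the `fakeNodeList` data is invented for the example):

```javascript
// Stand-in for a NodeList: array-like, but without Array methods such as map().
const fakeNodeList = {
  length: 2,
  0: { innerText: 'Mary Doe' },
  1: { innerText: 'James Dean' },
};

// Array.from(arrayLike, mapFn) converts and maps in one pass.
const names = Array.from(fakeNodeList, div => div.innerText);
console.log(names); // [ 'Mary Doe', 'James Dean' ]
```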

Note: Keep in mind that if you use Puppeteer to automate searches on Google, you may be temporarily blocked and end up with an "Unusual traffic from your computer network" notice, requiring you to solve a reCAPTCHA. This may break your web scraper, so proceed with caution.

1 week ago