June 11, 2020 Jonny Steiner

Introducing Puppeteer from Installation to Demonstration


Automation tools play an important role in software testing because they ease the testing process. One such tool is Puppeteer. Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless (without a UI) by default, but can be configured to run full (non-headless) Chrome or Chromium. Let’s take a look at how to install and start using Puppeteer, and then demonstrate several ways to use it. The examples in this article were run on a workstation running Ubuntu Linux, but you can also install Puppeteer on Windows or macOS if you prefer.

Setup

Before we begin, we need to set up Node.js. You can install it using this command:

~ sudo apt-get install nodejs

 

Next, install the npm package manager.

~ sudo apt-get install npm

 

Then we can use npm to install the Puppeteer library. Note that the package name is lowercase.

~ npm i puppeteer

 

You must also install the system dependencies that Chromium needs to run correctly. They are listed in the command below:

~sudo apt-get install gconf-service libasound2 libatk1.0-0 libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget

 

For the examples in this tutorial, we also need to install Puppeteer recorder, an extension for Google Chrome that records your activity on any site as Node.js code. You can download the extension from the Chrome Web Store. We will use it for demonstration purposes later in the article. The screenshots below show the installation of Puppeteer recorder.

puppeteer recorder

Everything is ready, and you can now use Puppeteer.
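
A quick way to confirm the installation is to launch the browser and print its version. The following is a minimal sketch, not part of the original setup steps: the function name checkInstall is hypothetical, and the Puppeteer module is passed in as a parameter so that the function can be exercised without a real browser.

```javascript
// Hypothetical helper: launches the browser, reads its version string, and closes it.
// `puppeteer` is the module returned by require('puppeteer').
async function checkInstall(puppeteer) {
  const browser = await puppeteer.launch();
  try {
    // browser.version() resolves to a string describing the browser build.
    return await browser.version();
  } finally {
    // Always close the browser, even if reading the version fails.
    await browser.close();
  }
}

// To run it for real (assumes `npm i puppeteer` succeeded):
//   checkInstall(require('puppeteer')).then(console.log).catch(console.error);
```

If the launch fails with a message about missing shared libraries, one of the system dependencies listed above is probably not installed.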

Creating the script with the Puppeteer recorder extension

After installing all the necessary libraries and components, we are going to record a simple script that opens a browser and navigates to the Experitest website. We will then modify the recorded code and add a method call that generates a PDF document. Activate the Puppeteer recorder from the Chrome extensions.

puppeteer recorder 2

Click on the “Record” button and open the site “experitest.com”. Scroll down to the bottom of the page and click on the “News” button in the page footer.

footer

Click on the stop button in the Puppeteer extension.

puppeteer stop

After that, you can see the generated Node.js script.

Then create the file example.js and paste the Node.js code into it, using the copy to clipboard link in the recorder form.

Script analysis and modification

Let’s examine this code line by line. Here we load the Puppeteer library as a dependency.

const Puppeteer = require('puppeteer');

 

The main function starts on this line. It is an asynchronous function that contains all the automated browser actions.

(async () => {

 

Then we launch the browser and store a reference to it in the constant “browser”.

const browser = await Puppeteer.launch();

 

If we want to launch Chromium in non-headless mode (with a UI), we need to pass the option “headless: false” to the launch method. However, this can cause a problem, because Puppeteer does not support generating PDF files in non-headless mode. Therefore, we leave the launch call unchanged. The following line of code creates a new page in the browser.

 

const page = await browser.newPage();
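
The headless setting discussed above is passed as an option to the launch method. The sketch below shows one way to do that. The “headless” launch option is part of Puppeteer, while the helpers launchOptions and openPage and the showUi parameter are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: returns the options object for puppeteer.launch().
// showUi = true opens a visible browser window; false keeps headless mode.
function launchOptions(showUi) {
  return { headless: !showUi };
}

// Launches the browser with or without a UI and opens a blank page.
// `puppeteer` is the module returned by require('puppeteer').
async function openPage(puppeteer, showUi = false) {
  const browser = await puppeteer.launch(launchOptions(showUi));
  const page = await browser.newPage();
  return { browser, page };
}

// Example (assumes Puppeteer is installed):
//   openPage(require('puppeteer'), true).then(({ browser }) => browser.close());
```

A visible window is handy while debugging a recorded script. For the PDF example in this article, showUi should stay false.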

 

Then we call “page.waitForNavigation” and store the promise it returns in the constant “navigationPromise”. The promise resolves when the page navigates to a new URL or reloads.

const navigationPromise = page.waitForNavigation();

 

Then we navigate to experitest.com.

await page.goto('https://experitest.com/');

 

This method sets the size of the page viewport explicitly, here 1920 by 563 pixels as recorded by the extension.

await page.setViewport({width: 1920, height: 563 });

 

Then the script waits for the element we need to click to appear.

await page.waitForSelector('.row > .col-xl-2:nth-child(7) > .footer-menu-column > li:nth-child(5) > a');

 

Here the script clicks the element.

await page.click('.row > .col-xl-2:nth-child(7) > .footer-menu-column > li:nth-child(5) > a');
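
When a click triggers a page load, the navigation promise has to be created before the click and awaited afterwards. Otherwise the script may continue before the new page is ready. A common pattern is sketched below, assuming the link loads a new page in the same tab. The helper name clickAndWait is hypothetical, and “page” is assumed to be a Puppeteer page object.

```javascript
// Hypothetical helper: waits for an element, clicks it, and waits for the
// navigation that the click triggers.
async function clickAndWait(page, selector) {
  // Wait until the element is present in the DOM.
  await page.waitForSelector(selector);
  // Start waiting for navigation before clicking so the event is not missed,
  // then wait for both the navigation and the click to complete.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.click(selector),
  ]);
}
```

Calling clickAndWait(page, selector) with the footer selector above would replace the separate waitForSelector and click calls.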

 

And the browser is closed.

await browser.close();

 

Let’s modify the code of the test by adding the page.pdf method to create a pdf document.

await page.pdf({
    path: 'example.pdf',
    format: 'A4',
    printBackground: true
});

 

The “path” option specifies the name and location of the PDF file. For the document to render correctly, also add the option “waitUntil: 'networkidle0'” to the waitForNavigation call. With this option, navigation is considered finished once there have been no network connections for at least 500 ms.

const navigationPromise = page.waitForNavigation({ waitUntil: 'networkidle0' });

 

After all modifications our code takes the following form:
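
The complete listing is not reproduced in this text. A sketch assembled from the steps above might look like the following. The function name run, the placement of the page.pdf call after the click, and passing the Puppeteer module in as a parameter are assumptions made for illustration, not the recorder's exact output.

```javascript
// Selector recorded by the extension for the "News" link in the footer.
const NEWS_LINK =
  '.row > .col-xl-2:nth-child(7) > .footer-menu-column > li:nth-child(5) > a';

// Sketch of the modified script. `puppeteer` is the module from require('puppeteer').
async function run(puppeteer, pdfPath = 'example.pdf') {
  // Default headless mode, which the article keeps for PDF output.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1920, height: 563 });
    await page.goto('https://experitest.com/', { waitUntil: 'networkidle0' });

    await page.waitForSelector(NEWS_LINK);
    // Create the navigation promise before the click, then await both.
    const navigationPromise = page.waitForNavigation({ waitUntil: 'networkidle0' });
    await Promise.all([navigationPromise, page.click(NEWS_LINK)]);

    // Assumption: the PDF is taken of the page reached after the click.
    await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
  } finally {
    // Close the browser even if one of the steps above fails.
    await browser.close();
  }
}

// In example.js the function would be invoked like this:
//   run(require('puppeteer')).catch(console.error);
```

To use this sketch as example.js, uncomment the last line so that the function is actually called when the file is executed.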

So our script is ready. Go to the folder with the script and launch it with the following command.

node example.js

 

After the script finishes, the file example.pdf will appear next to the script. Open it and you will see a PDF document with the site’s contents.

puppeteer example

Creation of the script for scraping information from a site

Suppose we need to extract the first news headline from the page “https://experitest.com/mobile-application-testing-news/”. Let’s create the file examplescrap.js and write the code into it step by step. The beginning of this script is the same as the previous one: we load the Puppeteer library as a dependency, launch the browser, and navigate to the news page.

const Puppeteer = require('puppeteer');
(async () => {
    const url = 'https://experitest.com/mobile-application-testing-news/';
    const browser = await Puppeteer.launch();
    const page = await browser.newPage();
    // Navigate to the news page before reading anything from it.
    await page.goto(url);

 

Then we call page.evaluate(), which runs a function in the context of the page, and store its result in the constant “data”.

const data = await page.evaluate(() => {

Inside that function, document.querySelector(locator).innerText reads the text of the element found by the given locator and stores it in the constant “title”. An object containing this constant is then returned from page.evaluate():

 

    const title = document.querySelector('#content2 > div > ul > li.news-link > h3').innerText;
    return {
        title
    };
});

 

 

The returned constant “data” is printed to the log, and then the browser closes.

 

    console.log(data);
    await browser.close();
})();
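
If the selector matches nothing, document.querySelector returns null and reading innerText throws an error inside page.evaluate. A more defensive variant is sketched below. The function names extractTitle and scrapeFirstHeadline are hypothetical, and the Puppeteer module is passed in as a parameter. The sketch relies on page.evaluate accepting extra arguments that are forwarded to the function it runs in the page.

```javascript
// Runs in the browser context when passed to page.evaluate.
// Returns { title: null } instead of throwing when nothing matches.
function extractTitle(selector) {
  const el = document.querySelector(selector);
  return { title: el ? el.innerText.trim() : null };
}

// Hypothetical helper: opens the page, extracts the headline, and always
// closes the browser. `puppeteer` is the module from require('puppeteer').
async function scrapeFirstHeadline(puppeteer, url, selector) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // The selector is passed as an argument to extractTitle inside the page.
    return await page.evaluate(extractTitle, selector);
  } finally {
    await browser.close();
  }
}

// Example (assumes Puppeteer is installed):
//   scrapeFirstHeadline(
//     require('puppeteer'),
//     'https://experitest.com/mobile-application-testing-news/',
//     '#content2 > div > ul > li.news-link > h3'
//   ).then(console.log).catch(console.error);
```

Passing the selector as an argument keeps the function reusable for other pages, and the try/finally block makes sure the browser does not stay open if navigation fails.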

 

 

Everything is ready, so we can now run this script.

puppeteer example

Great, this script gave us the title of the first news article on the site experitest.com.

Conclusion

In this article, we have become familiar with the Puppeteer Node.js library. Namely, we covered the installation of Node.js and the library, and the structure of the code. We looked at methods for PDF document creation and web scraping. We also looked at the Puppeteer recorder extension for Google Chrome, which records your actions in the form of JavaScript code.

Guy Arieli, CTO
