
Dealing with infinite scroll

Last Updated: Aug 11, 2016 08:17AM PDT

What do we mean by "infinite scroll"?


On some websites, instead of links to page 1, page 2, page 3, etc., more items appear as you scroll down the page.


What's the problem?


Infinite scroll can be a tricky one, because the URL generally remains static (it doesn't change, even when you're on a different page).

Websites also handle this in many different ways structurally, so there isn't always a way around it.

But... here are some tips and tricks to hack the URL.
 

How can you get around this?

 
If we can find out the underlying URL pattern for the different pages or 'pagination', then we are on to a winner.

There are a few tips and tricks to finding these URL patterns.
 

Tip 1.

 

Using the example:

http://www.pinko.com/en-gb/catalog/index/springsummer

First, before you scroll down, right-click the page, choose Inspect, and click the Network tab.

Then, clear any existing activity by hitting the clear button next to the red circle on the left-hand side.
 

Now, scroll down the page until more items appear.

Look for a request with the type "xhr".
 

Click on it and look at the Headers tab on the right-hand side.
 

You can see that when you scroll down the page, the site makes a GET request to:

http://www.pinko.com/en-gb/catalog/index/springsummer?pg=4

The ?pg=4 part is the URL parameter that corresponds to the page number.

If you go directly to this URL, it skips straight to those items. Now we know how the website really paginates!
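If the Network tab gets crowded, you can also export the recorded activity as a HAR file and look for the pattern there. Here is a minimal Python sketch of that idea. It assumes a Chrome-style HAR export, where each entry carries a non-standard `_resourceType` field. The helper names (`load_har`, `xhr_urls`, `changing_params`) are hypothetical, and the sample data is hand-written. The second request, for `?pg=5`, is made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlsplit


def load_har(path):
    """Read a HAR file exported from the browser's Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def xhr_urls(har):
    """Return the URLs of XHR entries, in the order they were recorded.

    Assumes a Chrome-style export, where each entry has a non-standard
    "_resourceType" field such as "xhr", "document" or "image".
    """
    return [
        entry["request"]["url"]
        for entry in har.get("log", {}).get("entries", [])
        if entry.get("_resourceType") == "xhr"
    ]


def changing_params(urls):
    """Return the query parameter names whose values differ across the URLs."""
    queries = [parse_qs(urlsplit(u).query) for u in urls]
    names = set().union(*(q.keys() for q in queries))
    # A parameter "changes" if it takes more than one distinct value.
    return sorted(n for n in names if len({tuple(q.get(n, [])) for q in queries}) > 1)


# Hand-written, trimmed-down HAR data for illustration; pg=5 is hypothetical.
sample_har = {
    "log": {
        "entries": [
            {"_resourceType": "document",
             "request": {"method": "GET",
                         "url": "http://www.pinko.com/en-gb/catalog/index/springsummer"}},
            {"_resourceType": "xhr",
             "request": {"method": "GET",
                         "url": "http://www.pinko.com/en-gb/catalog/index/springsummer?pg=4"}},
            {"_resourceType": "xhr",
             "request": {"method": "GET",
                         "url": "http://www.pinko.com/en-gb/catalog/index/springsummer?pg=5"}},
        ]
    }
}

urls = xhr_urls(sample_har)
print(urls)
print(changing_params(urls))
```

With a real export, you would call `load_har("your-export.har")` (a made-up file name) instead of using `sample_har`. A parameter that changes from one scroll to the next, like `pg` here, is a good candidate for the page number.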
 

So create your Extractor using:

http://www.pinko.com/en-gb/catalog/index/springsummer?pg=1
 
and add the URLs:
 
http://www.pinko.com/en-gb/catalog/index/springsummer?pg=2
http://www.pinko.com/en-gb/catalog/index/springsummer?pg=3
http://www.pinko.com/en-gb/catalog/index/springsummer?pg=4
http://www.pinko.com/en-gb/catalog/index/springsummer?pg=5
http://www.pinko.com/en-gb/catalog/index/springsummer?pg=6
etc.
 
to the settings page on the dashboard.
 
Save, and Run URLs!
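Typing the list out by hand gets tedious for long catalogs. Here is a small Python sketch that builds it by setting a query parameter on the base URL. The function name `paged_urls` is made up for illustration. The last page (6 here) is also an assumption: you would need to check on the site where the results actually run out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def paged_urls(base_url, param, first, last):
    """Return copies of base_url with query parameter `param` set to first..last.

    Any other query parameters already in base_url are kept as they are.
    """
    parts = urlsplit(base_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    urls = []
    for page in range(first, last + 1):
        query[param] = [str(page)]  # overwrite (or add) the page parameter
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


for url in paged_urls("http://www.pinko.com/en-gb/catalog/index/springsummer", "pg", 1, 6):
    print(url)
```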
 

Tip 2.


As I mentioned, websites are built in many different ways. Here is another example:

https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/
 
Beginning with the same methodology:

  • Right-click the page, choose Inspect, and click the Network tab.

  • Clear any existing activity by hitting the clear button next to the red circle on the left-hand side.

  • Scroll down the page until more items appear.

  • Look for a request with the type "xhr".
     



This time, the Request URL points to some kind of PHP script, which uses a form to make a POST request. You don't need to understand the details; the point is that we can't use this URL.
 
Instead, you can look through the elements of the HTML and see if there is any reference to page numbers in there.

Right-click on one of the items that appears after scrolling down, click Inspect, and look at the Elements tab rather than the Network tab.



This should take you to the location of that item in the HTML. You're looking for anything that says page, pagination, scroll, scrolling, lazyload, etc. (Alternatively, press Command+F, or Ctrl+F on Windows, and search for these terms directly.)
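If you have saved a copy of the page's HTML, a short script can run the same search. This is a minimal Python sketch. `pagination_hints` is a hypothetical helper, and the sample HTML is invented for illustration. It is not taken from the Boggi site.

```python
import re

# Search terms from the tip above; longer terms come first so that
# "scrolling" is reported instead of the shorter "scroll".
HINT_TERMS = ["pagination", "lazyload", "scrolling", "scroll", "page"]


def pagination_hints(html, terms=HINT_TERMS, context=30):
    """Return (term, snippet) pairs for every case-insensitive match in the HTML."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = []
    for match in pattern.finditer(html):
        start = max(match.start() - context, 0)
        end = match.end() + context
        hits.append((match.group(0).lower(), html[start:end]))
    return hits


# Invented HTML snippet, for illustration only.
sample_html = '<div class="infinite-scrolling" data-lazyload="true"><a href="/page/2/">Next</a></div>'

for term, snippet in pagination_hints(sample_html):
    print(term, "->", snippet)
```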
 
You can see that the "li" tags are the elements that hold the details of each item, and that this list grows as you scroll down.
 
Further down, you can see references to URLs that end in /page/ followed by a page number.

Bingo.
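To pull page URLs out of saved HTML without reading through the Elements tab, here is a minimal sketch using Python's built-in `html.parser`. It assumes the page URLs appear in tag attributes (such as `href`) and contain `/page/` followed by a number. The class name, the function name and the sample HTML are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches "/page/<number>/" anywhere in an attribute value.
PAGE_PATTERN = re.compile(r"/page/(\d+)/")


class PageUrlCollector(HTMLParser):
    """Collects attribute values that contain /page/<number>/, keyed by number."""

    def __init__(self):
        super().__init__()
        self.by_page = {}

    def handle_starttag(self, tag, attrs):
        for _name, value in attrs:
            match = PAGE_PATTERN.search(value or "")  # value is None for bare attributes
            if match:
                # setdefault keeps the first URL seen for each page number.
                self.by_page.setdefault(int(match.group(1)), value)


def page_urls(html):
    """Return the page URLs found in the HTML, sorted by page number."""
    collector = PageUrlCollector()
    collector.feed(html)
    collector.close()
    return [collector.by_page[n] for n in sorted(collector.by_page)]


# Invented HTML snippet, for illustration only.
sample_html = """
<ul class="products"><li>Jacket A</li><li>Jacket B</li></ul>
<a href="https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/3/">3</a>
<a href="https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/2/">2</a>
"""

for url in page_urls(sample_html):
    print(url)
```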
 

You can use these URLs to define your pages! So...
 

Create your Extractor using:

https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/1/


and add the URLs:
 
https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/1/
https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/2/
https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/3/
https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/page/4/
to the settings page on the dashboard.
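This site puts the page number in the path rather than in a query parameter, so building the list is simple string formatting. Here is a minimal Python sketch with a hypothetical `path_paged_urls` helper. As in Tip 1, you would need to confirm the last page number on the site.

```python
def path_paged_urls(category_url, first, last):
    """Return category_url with "page/<n>/" appended, for n from first to last."""
    base = category_url.rstrip("/") + "/page/"  # avoid a double slash
    return [f"{base}{n}/" for n in range(first, last + 1)]


category = "https://shop.boggi.com/categoria-prodotto/pe16/giacche-classiche-pe16/"
for url in path_paged_urls(category, 1, 4):
    print(url)
```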


Great, take me to the Extractor!
 