Sometimes things don’t work as expected. Here we outline some common issues to look out for and how to resolve them without having to contact support!
- My Extraction was successful, but 0 rows were extracted.
- My Extractor is not pulling data from a specific column (or columns).
Main cause: The HTML structure of the website has changed since the Extractor was trained.
How do I know? Check the log file.
If our assumption is correct, the log file will look something like this:
The URL reports success because the page was rendered correctly by the extraction engine; it simply no longer matches the Extractor's training, so there was no data in the locations it was looking in.
How do I fix it?
Go to “Edit” mode of the Extractor.
If the website no longer matches the training, you will see the following message.
When you press save and close, you will most likely see a blank page.
Go to the “Edit” tab and clear the selected column - this will remove the current training.
Now use point-and-click to re-train the column for the URL you have re-added.
Repeat for each column and then Save and run.
Now we should see the Run return rows! Download the CSV to check.
We have data!
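As a quick sanity check outside the tool, you can count the rows in the downloaded CSV with a few lines of Python. The column names below are hypothetical placeholders for whatever your Extractor actually outputs:

```python
import csv
import io

# In practice you would open the downloaded file, e.g. open("results.csv");
# here we inline a tiny sample (hypothetical column names) to keep it runnable.
sample = "title,price\nWidget,9.99\nGadget,4.50\n"

rows = list(csv.DictReader(io.StringIO(sample)))
print(f"Extracted {len(rows)} rows")
# 0 rows would mean the page still does not match the Extractor's training.
```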
But why did it look right in the Extractor editor?
Because the webpage inside the Extractor is a cached version of the page as it was when the Extractor was originally trained - a snapshot. This is why we need to add the page again, so that we have a fresh view of its current state.
Main cause: Similar to the above, a small change has been made to the HTML around the data trained for a column, and that data now lies outside the original training.
How do I know? The CSV or JSON will look normal, but data will be missing from one column.
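A quick way to confirm which column is affected is to scan the downloaded CSV for columns that are entirely empty. This is a sketch assuming a simple CSV output; the column names are hypothetical:

```python
import csv
import io

# Inline sample standing in for the downloaded CSV; note the empty 'review' column.
sample = "title,review\nWidget,\nGadget,\n"

rows = list(csv.DictReader(io.StringIO(sample)))
empty_columns = [
    column for column in (rows[0] if rows else {})
    if all(not row[column].strip() for row in rows)
]
print(empty_columns)  # any column listed here likely needs re-training
```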
How do I fix it?
Go to 'Edit' mode of the Extractor
Notice that inside the Extractor everything looks fine: data is populating the column that was missing data when the Extractor was run.
So why doesn't this data come out when I run the Extractor?
This is because the webpage inside the Extractor is a cached version of the page from when the Extractor was trained. We need to re-add this page to see what the latest version looks like!
Go to 'Add URL', re-enter the original URL and press Go.
Because most of the data is returned in the CSV, we expect this page to load successfully in the Extractor and should see the following message:
Save and Close, then in the top left-hand corner change 'Auto rows' to 'Single row'. Notice how the new page has updated data and is missing data from the review column. Now we understand why the CSV is missing this output too: the data for this column now lies outside of the original training. We need to re-train that column so that the training aligns with where the data is on the page.
NB. If you have multiple rows per URL, do not use 'Single row' as this will compress all of your rows into one.
Go to the 'Edit' View and select the column with missing data. Clear the data in the column.
Now re-train the required data point via point-and-click (or via XPath, as in this example). We can now see that the column captures the data on page 2, which is what we want!
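If you prefer to draft an XPath before pasting it into the Extractor, you can sanity-check it against a saved copy of the page. The sketch below uses Python's built-in ElementTree, which supports a limited XPath subset; the `review` class name is a hypothetical stand-in for the real markup on your target site:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; real class names depend on the site.
page = """
<html><body>
  <div class="product">
    <span class="name">Widget</span>
    <div class="review">Great value</div>
  </div>
</body></html>
"""

root = ET.fromstring(page)
# ElementTree supports a limited XPath subset, including attribute predicates.
reviews = [element.text for element in root.findall(".//div[@class='review']")]
print(reviews)
```

Note that ElementTree only parses well-formed markup; for messy real-world HTML you would typically use a forgiving parser such as lxml, but the XPath idea is the same.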
Go back to the 'Data' view to check that all desired columns are populated, then Save and run the Extractor.