
Building an Extractor

Last Updated: Aug 11, 2016 09:23AM PDT

Everything you need to know about creating an import.io Extractor. Start from the top, or jump to a heading if you know what you're looking for.

Enter a URL
Data View
Website View
Adding data to columns
Add or Manage URLs
Undo/Redo
Save Extractor


Enter a URL


Head to import.io and enter a URL that contains the data you want to extract.



In most cases, the Extractor finds the data and organises it for you automatically! You will be taken to the Data view, where you can see a table of data. If the table is completely different from the data you wanted, click "create a blank table" to start from scratch.



You can either start from scratch, or begin with a predefined template of columns.


Data View




The Data view shows the current state of your data as a table, so you can check that each column (field) has been populated.

You can add, remove, and rename columns from this view (see Columns). You can also apply regular expressions, set default column values, make a column required, or choose to output HTML.
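To illustrate what these column options do to the data, here is a minimal Python sketch. It is not import.io's implementation: the rows, column names, and helper functions are hypothetical, and it assumes that a required column causes rows without a value in that column to be discarded.

```python
import re

# Hypothetical rows as they might look before any column options are applied.
rows = [
    {"name": "Red mug", "price": "Price: $8.00"},
    {"name": "Blue mug", "price": "Now only $9.50!"},
    {"name": "Green mug", "price": "Out of stock"},
    {"name": "", "price": "$3.00"},
]

# Regular expression that captures only the number after the dollar sign.
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?)")

def apply_regex(value, pattern, default):
    """Return the first capture group, or `default` when the pattern does not match."""
    match = pattern.search(value)
    return match.group(1) if match else default

def clean(rows, required):
    """Apply the regex and default to "price", then drop rows missing `required`."""
    cleaned = []
    for row in rows:
        if not row.get(required):
            continue  # assumption: an empty required column removes the whole row
        cleaned.append({**row, "price": apply_regex(row["price"], PRICE_PATTERN, "N/A")})
    return cleaned

print(clean(rows, required="name"))
```

Here the regular expression trims each price to its number, "N/A" acts as the default value when nothing matches, and the row with no name is dropped because "name" is treated as required.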

Use Add or Manage URLs to test your Extractor's training against similarly structured URLs.


Website View




In Website view, you show the Extractor where your data is on the page by clicking elements, and the data appears in the column as you click.

You can add, remove, and rename columns in this view (see Columns). You can also write an XPath manually or toggle CSS off here.
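If you have not written an XPath before, the Python sketch below shows the general idea of a path expression that selects elements by tag and attribute. It uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and requires well-formed markup; the page and class names are hypothetical, and the exact XPath features the Extractor accepts may differ.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed stand-in for a product listing page (hypothetical markup).
PAGE = """
<html>
  <body>
    <div class="product"><h2>Red mug</h2><span class="price">$8.00</span></div>
    <div class="product"><h2>Blue mug</h2><span class="price">$9.50</span></div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# Select every <h2> that sits directly inside a <div class="product">.
names = [h2.text for h2 in root.findall(".//div[@class='product']/h2")]

# Same idea for the price column.
prices = [span.text for span in root.findall(".//div[@class='product']/span[@class='price']")]

print(names)
print(prices)
```

Real web pages are often not well-formed XML, so this parser is only suitable for illustration.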

 

Adding data to columns

 

When you create a column, you will be taken to the Website view, where you can add data by simply clicking on it. As you hover over the page, elements are highlighted to help you pick out the data you want.


Single item pages



If you’re working with only one item (single item page), there is only one row to capture, so things are a little simpler. When you click on the element that represents the data you want in a particular column, it will automatically be added to that column and you can simply move on to the next column.

For a full list of column functions, see Column functions in Website view.


Multiple item pages 

If the page contains more than one item (multiple item page), then as you click the data for one item (e.g. its “name”), the Extractor automatically adds the corresponding element from every other item, and each appears in the column as a new row.

If the Extractor picks incorrect elements, you can reject them: hover the mouse over a selected element and click it.

Likewise, if the Extractor misses items, you can add them by clicking on them.

Once you are happy with the data in a column, save it and move on to the next one!
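One way to picture how a single click can expand to every item on a multiple item page is to take the path of the clicked element and drop its positional index, so the path matches the same spot in every repeated block. The Python sketch below shows that heuristic. It is a hypothetical illustration, not how the import.io Extractor actually works: the page, the `generalize` helper, and the rejected value are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical listing page: two products plus an ad that shares their layout.
PAGE = """
<html><body>
  <div class="product"><h2>Red mug</h2></div>
  <div class="product"><h2>Blue mug</h2></div>
  <div class="ad"><h2>Sponsored</h2></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

def generalize(path):
    """Drop positional predicates such as [1] so the path matches every repeat."""
    return re.sub(r"\[\d+\]", "", path)

clicked = "./body/div[1]/h2"   # path to the single element the user clicked
general = generalize(clicked)  # "./body/div/h2" matches the same spot in every div

single = [e.text for e in root.findall(clicked)]
candidates = [e.text for e in root.findall(general)]

# Rejecting an element: the user clicks the wrong match to remove it.
rejected = {"Sponsored"}
column = [text for text in candidates if text not in rejected]

print(single)
print(candidates)
print(column)
```

The ad block shows why rejecting matches is useful: a generalized path can pick up elements that share the layout but are not the data you want.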


Add or Manage URLs


Before you configure your Extractor with your complete URL list, you can use this option to check that your training picks up the correct data on similar URLs.

NB: This is only for testing your training. If you have a list of URLs to use with the Extractor, you can configure this from the dashboard. See Adding URLs to your Extractor.



You can choose which URL to view using the dropdown selection. You can do this in both the Data view and Website view.


If any data points are missing, you can apply further training in the Website view.
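The check described above amounts to applying the same column definitions to several similarly structured pages and looking for gaps. The Python sketch below illustrates that idea. The URLs, markup, column paths, and the `check_training` helper are hypothetical, and it is not how import.io performs the check internally.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages with the same structure; the second lacks a price.
PAGES = {
    "https://example.com/item/1": "<html><body><h1>Red mug</h1><span class='price'>$8.00</span></body></html>",
    "https://example.com/item/2": "<html><body><h1>Blue mug</h1></body></html>",
}

# Hypothetical column definitions: column name -> ElementTree path expression.
COLUMNS = {"name": ".//h1", "price": ".//span[@class='price']"}

def check_training(pages, columns):
    """Return {url: [missing column names]} for every page with gaps."""
    report = {}
    for url, html in pages.items():
        root = ET.fromstring(html)
        missing = [name for name, path in columns.items() if root.find(path) is None]
        if missing:
            report[url] = missing
    return report

print(check_training(PAGES, COLUMNS))
```

A page listed in the report is the kind of URL where you would go back to the Website view and add more training.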


Undo/Redo 


If you mis-clicked and accidentally deleted a column, or want to undo some training, it's easy: just click Undo. Or click Redo if you change your mind!


Save Extractor


When you have finished training your Extractor, saving it will add it to your dashboard, where you can configure your Extractor to run with multiple URLs, set up a schedule, or download your data!

For advanced training options, see the Advanced article list.


Great, let's get started!

Any questions? Ask the community!
