Everything you need to know about creating an import.io Extractor. Start from the top, or jump to a heading if you know what you're looking for.
Enter a URL
Adding data to columns
Add or Manage URLs
Head to import.io and enter a URL that contains the data you want to extract.
Most of the time, the Extractor detects the data on the page and organises it into a table for you. You will be taken to the Data view, where you can see that table. If the table is completely different from the data you wanted, click "create a blank table" to start from scratch.
You can either start from scratch, or begin with a predefined template of columns.
The Data view will show you the current state of your data in a table view, enabling you to see that data has been populated for each column/field.
You can add, remove, and rename columns from this view - see columns. You can also apply regular expressions, set default column values, mark a column as required, or choose to output HTML.
Test the training of your Extractor against similarly structured URLs using Add or Manage URLs.
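As a rough illustration of what these column options do to a row of extracted values, the sketch below applies a regular expression, a default value, and a required flag to raw text. It is not import.io's implementation: the `ColumnRule` class and `apply_rules` function are hypothetical names, and the behaviours assumed here (a regular expression keeps the first match or capture group, a default fills an empty value, a row with an empty required column is dropped) are simplifications.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnRule:
    """Hypothetical per-column settings, loosely modelled on the Data view options."""
    name: str
    pattern: Optional[str] = None   # regular expression applied to the raw value
    default: Optional[str] = None   # value used when nothing was extracted
    required: bool = False          # drop the row if this column ends up empty


def apply_rules(raw_row, rules):
    """Return the cleaned row, or None if a required column is empty."""
    cleaned = {}
    for rule in rules:
        value = raw_row.get(rule.name)
        if value and rule.pattern:
            match = re.search(rule.pattern, value)
            if match:
                # Keep the first capture group if there is one, else the whole match.
                value = match.group(1) if match.groups() else match.group(0)
            else:
                value = None
        if not value:
            value = rule.default
        if rule.required and not value:
            return None
        cleaned[rule.name] = value
    return cleaned


# Assumed example rules and raw rows, made up for illustration.
rules = [
    ColumnRule("name", required=True),
    ColumnRule("price", pattern=r"(\d+\.\d{2})"),
    ColumnRule("currency", default="GBP"),
]

rows = [
    {"name": "Blue mug", "price": "Now only £4.99!"},
    {"name": "", "price": "£2.50"},
]
cleaned_rows = [apply_rules(row, rules) for row in rows]
print(cleaned_rows)
```

In this sketch the first row keeps only the numeric price and gains the default currency, while the second row is dropped because its required `name` column is empty.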
In Website view you can tell the Extractor where the data you want is on a page, enabling you to see the data appearing in a column as you click elements on a webpage.
You can add, remove, and rename columns in this view - see columns. It is also possible to write a manual XPath or toggle CSS off here.
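For readers unfamiliar with XPath, the sketch below shows what a manually written XPath selects on a page with several items. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset, on a made-up, well-formed page. The markup, class names, and expressions are assumptions for illustration, not output from import.io.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page with two items (assumed markup).
PAGE = """
<html><body>
  <div class="product"><h2>Blue mug</h2><span class="price">4.99</span></div>
  <div class="product"><h2>Red mug</h2><span class="price">5.49</span></div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# One XPath per column; each expression matches one element per item on the page.
NAME_XPATH = ".//div[@class='product']/h2"
PRICE_XPATH = ".//div[@class='product']/span[@class='price']"

names = [element.text for element in root.findall(NAME_XPATH)]
prices = [element.text for element in root.findall(PRICE_XPATH)]

# Pair the column values up into rows, one row per item.
table = list(zip(names, prices))
print(table)
```

Each expression here returns one value per item, so the two columns line up into two rows: one for "Blue mug" and one for "Red mug".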
When you create a column, you will be taken to the Website view where you can add data by simply clicking on it. You will notice that as you hover over the page, elements will be highlighted to help you specify the data that you want.
Single item pages
If you’re working with only one item (single item page), there is only one row to capture, so things are a little simpler. When you click on the element that represents the data you want in a particular column, it will automatically be added to that column and you can simply move on to the next column.
For a full list of column functions see Column functions in Website view.
Multiple item pages
If the page contains more than one item (multiple item page), as you click on the data for one element (e.g. “name”) the Extractor will automatically add the corresponding elements for each item and you will see them appear in the column as new rows.
If the Extractor picks elements that are incorrect, you can simply reject them. Hover the mouse over a selected element and click.
Equally, if the Extractor misses items, you can continue to add them by clicking on them.
Once you are happy with the data you have added to a column you can save it and move on to the next!
Before you configure your Extractor with your complete URL list, you can use this option to check that your training picks up the correct data on similar URLs.
NB: This is just to test your training. If you have a list of URLs to use with the Extractor, you can configure this from the dashboard. See Adding URLs to your Extractor.
You can choose which URL to view using the dropdown selection. You can do this in both the Data view and Website view.
If there are any data points missing, you can apply further training in the Website view.
If you mis-clicked and accidentally deleted a column, or want to undo some training, it's easy: just hit Undo, or Redo if you change your mind!
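Conceptually, this check runs the same column selectors against each test page and notes which columns come back empty. The sketch below is a minimal illustration of that idea, not of how import.io does it internally. The column XPaths are hypothetical, the pages are inline stand-ins for similarly structured URLs, and the example.com addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical column selectors, as might result from training on the first URL.
COLUMNS = {
    "name": ".//h1",
    "price": ".//span[@class='price']",
}

# Inline stand-ins for pages at similarly structured URLs (placeholders, not real pages).
PAGES = {
    "https://example.com/item/1": "<html><body><h1>Blue mug</h1><span class='price'>4.99</span></body></html>",
    "https://example.com/item/2": "<html><body><h1>Red mug</h1><span class='cost'>5.49</span></body></html>",
}


def missing_columns(page_source, columns):
    """Return the names of columns whose selector matches no text on the page."""
    root = ET.fromstring(page_source)
    missing = []
    for name, xpath in columns.items():
        element = root.find(xpath)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


report = {url: missing_columns(source, COLUMNS) for url, source in PAGES.items()}
for url, missing in report.items():
    status = "ok" if not missing else "needs more training for: " + ", ".join(missing)
    print(url, "->", status)
```

In this made-up data the second page marks its price with a different class, so the `price` column comes back empty there. That is the kind of gap that further training in the Website view is meant to close.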
When you have finished training your Extractor, saving it will add it to your dashboard, where you can configure your Extractor to run with multiple URLs, set up a schedule, or download your data!
For advanced training options, see the Advanced article list.
Great, let's get started!
Any questions? Ask the community!