While modules like Feeds can handle simple import or syncing needs, I often find that a client needs more fine-grained control. Maybe they’re grabbing data from an API that uses complex authentication, or from a CSV file over FTP. Maybe they need special logic that you can’t achieve through Feeds Tamper. These tools are great for standard operations, but it doesn’t take much for them to become more trouble than they’re worth. For those special cases, here’s a quick guide to rolling your own custom importer using the Drupal 7 Batch API.

.info File

Like any custom module, you’ll start by creating a new folder under sites/all/modules/custom with the desired machine name of your custom module. In this case, we’ll call it custom_import. In that folder, create a file with the same machine name and the .info extension. Here’s what custom_import.info looks like:

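name = Custom Import
description = Demonstration of custom importing with the Drupal 7 Batch API.
core = 7.x
; The Entity API module provides entity_metadata_wrapper(), used by the batch operation.
dependencies[] = entity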

Configuration Form

Create a .module file using the same naming convention as the .info file (i.e., custom_import.module in this case). Everything else will go here. We need a way to trigger the batch import, so we’ll create a simple Drupal configuration form. For this, we’ll implement hook_permission() to register a permission controlling who can see the form, hook_menu() to register the menu item and page callback, and finally the form builder itself. In this case, I decided to call it custom_import_settings(), but you can use whatever name you like so long as it matches the value supplied to ‘page arguments’ in hook_menu().

<?php

/* Implements hook_permission(). */
function custom_import_permission(){
  $permissions = array(
    'administer custom import' => array(
      'title' => t('Administer custom import settings.')
    )
  );

  return $permissions;
}

function custom_import_menu(){
  $items = array();

  $items['admin/config/system/custom-import'] = array(
    'title' => 'Custom Import',
    'description' => 'Settings page for a Drupal 7 Batch API custom import.',
    'access arguments' => array('administer custom import'),
    'page callback' => 'drupal_get_form',
    'page arguments' => array('custom_import_settings')
  );

  return $items;
}

function custom_import_settings($form, &$form_state){
  $form['custom_import_button'] = array(
    '#type' => 'submit',
    '#name' => 'custom_import_button',
    '#value' => t('Execute Custom Import')
  );

  return $form;
}

If you have additional settings that may affect the import, you can add them as fields in the form builder. I often do this when I need a configurable URL or when the import is intended to work from an uploaded CSV file.
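
As a rough sketch of that idea, here is a variant of custom_import_settings() with a configurable URL field. The variable name custom_import_url and the field label are assumptions made up for this example, and the code relies on Drupal 7's t() and variable_get(), so it only runs inside a Drupal site.

```php
<?php

/* Variant of the form builder with one extra setting (hypothetical name: custom_import_url). */
function custom_import_settings($form, &$form_state){
  $form['custom_import_url'] = array(
    '#type' => 'textfield',
    '#title' => t('Import URL'),
    /* Pre-fill with the last saved value, if any. */
    '#default_value' => variable_get('custom_import_url', ''),
    '#required' => TRUE
  );

  $form['custom_import_button'] = array(
    '#type' => 'submit',
    '#name' => 'custom_import_button',
    '#value' => t('Execute Custom Import')
  );

  return $form;
}
```

In the submit handler, the entered value would then be available as $form_state['values']['custom_import_url'], and could be stored with variable_set() if it should persist between runs.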

Batch Process

Until now, we haven’t done anything terribly complex. We’ve created a custom module, registered a permission, and created a configuration page to launch the import. Now we need to show what happens when the button gets pressed. To do this, we create a _submit() handler for our previous settings form. This will generate an array of batch operations that feed into the batch_set() function. The batch_set() function, in turn, feeds each individual record through our custom batch operation. Finally, we’ll need a function to run when the batch process is finished.

function custom_import_settings_submit($form, &$form_state){
  /* Access the file or API endpoint and use it to generate the $operations array. Each item should be an array, with the first element as the name of the op function (e.g., "custom_import_op") and the second element as an array of arguments to pass. This will vary significantly based on the data source and format. For example, here's how we might access a public API endpoint formatted in XML: */

  $import_contents = file_get_contents('https://someapi.com/endpoint/xml');
  if (!$import_contents){
    drupal_set_message(t('Unable to fetch import data.'), 'error');
    return;
  }
  $import_data = simplexml_load_string($import_contents);
  if ($import_data === FALSE){
    drupal_set_message(t('Unable to parse import data.'), 'error');
    return;
  }
  $operations = array();
  foreach ($import_data as $item){
    /* Pass each record as an XML string, since SimpleXMLElement objects cannot be serialized into the batch. */
    $operations[] = array('custom_import_op', array($item->asXML()));
  }

  batch_set(array(
    'title' => t('Performing Import'),
    'finished' => 'custom_import_finished',
    'operations' => $operations
  ));
}

function custom_import_op($item, &$context){
  if (!isset($context['sandbox']['progress'])){
    $context['sandbox']['progress'] = 0;
  }
  $item = simplexml_load_string($item);

  try {

    /* Most import operations involve nodes, but there's no hard rule that says they must. Here is where you will build and save the data you've imported using built-in Drupal functions. If the remote system uses some form of ID, it often helps to make a field to contain it and reference it to load an existing node back in for updating. This way, you avoid duplication and can refine and re-run your import as often as necessary without cluttering up your content. For example: */

    $existing_nid = db_select('field_data_field_remote_id', 'i')
      ->fields('i', array('entity_id'))
      ->condition('i.entity_type', 'node')
      ->condition('i.bundle', 'custom_content_type')
      ->condition('i.field_remote_id_value', (string) $item->id)
      ->execute()
      ->fetchField();
    if (!empty($existing_nid) && is_numeric($existing_nid) && $existing_nid > 0){
      $node = node_load($existing_nid);
    }
    else {
      $node = new stdClass();
      $node->type = 'custom_content_type';
      $node->uid = 1;
      $node->status = 1;
      /* LANGUAGE_NONE marks the node and the field value as language-neutral. */
      $node->language = LANGUAGE_NONE;
      $node->field_remote_id[LANGUAGE_NONE][0]['value'] = (string) $item->id;
    }

    /* Map each field in $item to the fields in $node before saving. */
    $node_wrapper = entity_metadata_wrapper('node', $node);
    $node_wrapper->field_remote_id->set((string) $item->id);
    $node_wrapper->save();
  }
  catch (Exception $e){
    drupal_set_message(t('Error encountered while importing: @error_message', array('@error_message' => $e->getMessage())), 'error');
  }

  $context['message'] = t('Syncing "@item_name".', array('@item_name' => (string) $item->name));
  $context['sandbox']['progress'] += 1;
}

function custom_import_finished($success, $results, $operations){
  if ($success){
    drupal_set_message(t('Operation complete.'));
  }
  else {
    /* $operations holds the operations that were not processed; the first one is where the batch stopped. */
    $error_operation = reset($operations);
    drupal_set_message(t('An error occurred while processing @operation with arguments: @args', array('@operation' => $error_operation[0], '@args' => print_r($error_operation[1], TRUE))), 'error');
  }
}
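
The $operations array does not have to come from XML. As a minimal sketch, assuming a CSV file whose first row holds the column names, the helper below builds the same structure with plain PHP. The names custom_import_csv_operations() and custom_import_csv_op are hypothetical, and the op function would need to accept an associative array instead of the XML string used above.

```php
<?php

/* Hypothetical helper: turns a CSV file into a batch $operations array. */
function custom_import_csv_operations($path, $op_callback = 'custom_import_csv_op'){
  $operations = array();
  if (!is_readable($path)){
    return $operations;
  }
  $handle = fopen($path, 'r');
  if ($handle === FALSE){
    return $operations;
  }

  /* Treat the first row as the column headers. */
  $header = fgetcsv($handle, 0, ',', '"', '\\');
  if (!is_array($header)){
    fclose($handle);
    return $operations;
  }

  while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== FALSE){
    /* Skip blank lines and rows whose column count does not match the header. */
    if ($row === array(NULL) || count($row) !== count($header)){
      continue;
    }
    /* One operation per record: callback name, then its arguments. */
    $operations[] = array($op_callback, array(array_combine($header, $row)));
  }

  fclose($handle);
  return $operations;
}
```

The result can be passed to batch_set() as the 'operations' value, the same way as the XML-based array in custom_import_settings_submit().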

Result

If you’ve done everything correctly, you should end up with a configuration page that contains a button. When pressed, that button fetches the data you specify and processes it one record at a time using the exact logic you need. It’ll even show you a handy progress bar and any errors that may be encountered along the way.

It’s worth noting that this is just scaffolding. I’ve used the above code in different contexts. Sometimes I separate the try/catch portion of the batch op into its own function so I can also call it during a cron run, creating a process that runs automatically as well as on demand. Sometimes the data is in an uploaded CSV file or JSON feed. Whatever your need, this should help get you started. Please share any questions or improvements in the comments below.
