Ranking field values

At the Esri International User Conference this year, an attendee came to the analysis island to ask “how do I create a rank field?”. They had run the Generate Near Table geoprocessing tool (see illustration of table below) and were looking for a way to further rank the distances. Ideally, the table would be updated to include a RANK field starting at ‘1’ for the smallest distance for each PATIENT and increasing sequentially based on the distance. The rank could then be used to facilitate further review and reporting. We were able to come up with the addRanks function below in a few minutes automating a key missing piece of the user’s workflow.

Original table
pre_ranking

Table after running addRanks function.
post_ranking

import arcpy

def addRanks(table, sort_fields, category_field, rank_field='RANK'):
    """Use sort_fields and category_field to apply a ranking to the table.

    Parameters:
        table: string
        sort_fields: list | tuple of strings
            The field(s) on which the table will be sorted.
        category_field: string
            All records with a common value in the category_field
            will be ranked independently.
        rank_field: string
            The new rank field name to be added.
    """

    # add rank field if it does not already exist
    if not arcpy.ListFields(table, rank_field):
        arcpy.AddField_management(table, rank_field, "SHORT")

    sort_sql = ', '.join(['ORDER BY ' + category_field] + sort_fields)
    query_fields = [category_field, rank_field] + sort_fields

    with arcpy.da.UpdateCursor(table, query_fields,
                               sql_clause=(None, sort_sql)) as cur:
        category_field_val = None
        i = 0
        for row in cur:
            if category_field_val == row[0]:
                i += 1
            else:
                category_field_val = row[0]
                i = 1
            row[1] = i
            cur.updateRow(row)

if __name__ == '__main__':
    addRanks(r'C:\Data\f.gdb\gen_near_table_patients2hosp',
             ['distance'], 'patient', 'rank')

Note: dBase (and shapefiles) does not support ORDER BY as used above by arcpy.da.UpdateCursor’s sql_clause argument.

Inventorying data: a new approach

ArcPy list functions give you the options to list out a particular data type for a given workspace, but expanding that out to a directory tree meant cobbling together those list functions with Python’s os.walk and lots of updates to the workspace environment. It can be done (as shown here), but in my experience it is a process which is easy to get wrong.

Python’s os.walk is noteworthy and useful, because it does all of this, but is limited to file types. It can’t peer into a geodatabase to identify feature classes for example.

At 10.1 service pack 1, we added arcpy.da.Walk, Walk takes care of all that workspace handling for you and mimics os.walk in arguments and behaviors.

The below code wraps arcpy.da.Walk in a generator function to return a full path to all appropriate datatypes under a given workspace.

import os
import arcpy

def inventory_data(workspace, datatypes):
    """
    Generates full path names under a catalog tree for all requested
    datatype(s).

    Parameters:
    workspace: string
        The top-level workspace that will be used.
    datatypes: string | list | tuple
        Keyword(s) representing the desired datatypes. A single
        datatype can be expressed as a string, otherwise use
        a list or tuple. See arcpy.da.Walk documentation 
        for a full list.
    """
    for path, path_names, data_names in arcpy.da.Walk(
            workspace, datatype=datatypes):
        for data_name in data_names:
            yield os.path.join(path, data_name)


for feature_class in inventory_data(r"c:\data", "FeatureClass"):
    do_something(feature_class)

Shifting features

Shifting (or moving) features is a snap using the arcpy.da module’s UpdateCursor. By modifying the SHAPE@XY token, it modifies the centroid of the feature and shifts the rest of the feature to match. This approach will hold for point, polyline or polygon features.

To modify only a single or subset of features in a feature layer, apply a selection to that layer and pass the layer in as the input to shift_features.

Word of caution, this is using UpdateCursor, so features will be permanently modified. So, back up your data if you may potentially want to reverse the updates.

import arcpy

def shift_features(in_features, x_shift=None, y_shift=None):
    """
    Shifts features by an x and/or y value. The shift values are in
    the units of the in_features coordinate system.

    Parameters:
    in_features: string
        An existing feature class or feature layer.  If using a
        feature layer with a selection, only the selected features
        will be modified.

    x_shift: float
        The distance the x coordinates will be shifted.

    y_shift: float
        The distance the y coordinates will be shifted.
    """

    with arcpy.da.UpdateCursor(in_features, ['SHAPE@XY']) as cursor:
        for row in cursor:
            cursor.updateRow([[row[0][0] + (x_shift or 0),
                               row[0][1] + (y_shift or 0)]])

    return

Building feature classes from NumPy arrays

With ArcGIS 10.1, a NumPy array can be easily converted into a point feature class using the arcpy.da.NumPyArrayToFeatureClass function.

Notable is that other geometry types such as Polygon, Polyline or Multipoint are not supported by NumPyArrayToFeatureClass. However, all the tools needed to create other geometry types from NumPy are there. The numpy_array_to_features function below combines the new arcpy.da.InsertCursor and NumPy methods to turn a NumPy array into features.

import numpy
import arcpy


def numpy_array_to_features(in_fc, in_array, geom_fields, id_field):
    """
    Insert new features into an existing feature class (polygon,
    polyline or multipoint) based on a NumPy array.

    Parameters
    ----------
    in_fc : string
        An existing feature class to which new features will be added.

    in_array : structured NumPy array
        Array must include fields representing x and y coordinates, and
        an ID field.

    geom_fields: list of strings | string
        Field(s) representing x- and y-coordinates.
        If only a single numpy field is required (such as a field that
        has x,y coordinates included in a tuple) the field name can be
        passed in within a list or as a string.

    id_field: string
        The field that identifies how coordinates are grouped.  All
        coordinates with a common id value will be combined (in order
        of occurrence) into an output feature.
        The id_field is used in both the array and the feature class
        (i.e., the field name must exist in the feature class)

    """
    # Establish minimum number of x,y pairs to create proper geometry
    min_xy_dict = {'Polygon': 3, 'Polyline': 2, 'Multipoint': 1}
    min_xy_pairs = min_xy_dict[arcpy.Describe(in_fc).shapeType]

    if isinstance(geom_fields, list) and len(geom_fields) == 1:
        # Can't access a single field via a list later, extract the
        # only value
        geom_fields = geom_fields[0]

    with arcpy.da.InsertCursor(in_fc, ['SHAPE@', id_field]) as cursor:
        unique_array = numpy.unique(in_array[id_field])  # unique ids

        # Iterate through unique sets, get array that matches unique
        # value, convert coordinates to a list and insert via cursor.
        for unique_value in unique_array:
            a = in_array[in_array[id_field] == unique_value]
            if len(a) >= min_xy_pairs:  # skip if not enough x,y pairs
                cursor.insertRow([a[geom_fields].tolist(), unique_value])
            else:
                pass  # skip if not enough x,y pairs

    return

For example, you have an array that looks like this small one below:

>>> my_array
array([(21239.1, 51531.6, u'A'), (18648.5, 51551.1, u'A'),
       (18638.7, 54173.9, u'A'), (15989.5, 54153.5, u'B'),
       (13365.4, 51510.9, u'B'), (16004.5, 51524.3, u'B')], 
      dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8'), ('ID', '<U1')])

To convert this array into features, pass in an existing feature class, the array, a list of the x,y coordinate fields, and another field which controls how those coordinates are grouped. With the array listed above, 2 triangle-shaped polygon features will get added, one for the ID of ‘A’ and one for ‘B’.


numpy_array_to_features(polygon_fc, my_array, ['SHAPE@X', 'SHAPE@Y'], 'ID')

Also see Working with NumPy in ArcGIS.

Getting arcpy.da rows back as dictionaries

Though arcpy.da‘s cursors return rows as lists, you can easily transform these on-the-fly with just a little code on your part:

Or if you’d like to be able to use a syntax similar to the old arcpy cursors and do row.COLUMN_NAME to fetch a value, you could use namedtuples:

And then to get a little more sophisticated, what about an update cursor that lets you use dictionaries? Note that in this example the row will ALWAYS update without any intervention on your part once you go to the next one, so be careful:

All the pieces are there in the Python standard library and arcpy.da to customize how you get your data in and out. The reason that arcpy.da returns lists and tuples is because they act as a sort of lowest-common-denominator of data structures, and in large datasets things like a dictionary key lookup benchmarks much, much slower than a simple list item assignment.