Get coded-value descriptions when accessing data with cursors

This question was recently asked on Twitter:

“@arcpy Can you point me to an example python script where a SearchCursor returns coded-value descriptions instead of the codes?”

With ArcGIS 10.1, the data access module (arcpy.da) includes functions for listing subtypes and listing domains. Here are recipes for getting the descriptions, rather than just the codes, for either subtypes or domains:

1. Get subtype descriptions

import arcpy

fc = r'c:\data\Portland.gdb\roads'
# Create a dictionary of subtype codes and descriptions.
# items() works in both Python 2 and Python 3; iteritems() is Python 2 only.
desc_lu = {key: value['Name'] for (key, value) in arcpy.da.ListSubtypes(fc).items()}
with arcpy.da.SearchCursor(fc, "SUBTYPEFIELD") as cur:
    for row in cur:
        print(desc_lu[row[0]])

2. Get domain descriptions

import os
import arcpy

gdb = r'c:\data\Portland.gdb'
# Get the dictionary of codes and descriptions for the Floodplain_Rules domain.
desc_lu = [d.codedValues for d in arcpy.da.ListDomains(gdb) if d.name == 'Floodplain_Rules'][0]
with arcpy.da.SearchCursor(os.path.join(gdb, "floodplain"), 'RuleID') as cur:
    for row in cur:
        print(desc_lu[row[0]])

* NOTE: ListSubtypes was enhanced at ArcGIS 10.1 SP1. If you are running ArcGIS 10.1 with no service packs, use the following instead to get a dictionary of subtype codes and descriptions:

desc_lu = arcpy.da.ListSubtypes(fc)
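
Both recipes index the lookup dictionary directly, so a null field value (returned by the cursor as None) or a code that is missing from the lookup raises a KeyError. A minimal sketch of a more forgiving lookup follows. The describe_code helper and the sample codes are hypothetical and not part of arcpy.

```python
def describe_code(code, lookup, missing=None):
    """Return the description for code, or a fallback when there is none."""
    if code is None:
        # Null field values come back from the cursor as None.
        return missing
    # Fall back to the raw code when it is not listed in the lookup.
    return lookup.get(code, code)


# Hypothetical lookup, shaped like the desc_lu dictionaries above.
desc_lu = {1: 'Floodway', 2: '100-year floodplain'}
labels = [describe_code(c, desc_lu, missing='<Null>') for c in (1, 2, None, 9)]
print(labels)
```

Inside either cursor loop, print(describe_code(row[0], desc_lu)) could then stand in for print(desc_lu[row[0]]).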

Ranking field values

At the Esri International User Conference this year, an attendee came to the analysis island to ask, “How do I create a rank field?” They had run the Generate Near Table geoprocessing tool (see illustration of table below) and were looking for a way to further rank the distances. Ideally, the table would be updated to include a RANK field starting at ‘1’ for the smallest distance for each PATIENT and increasing sequentially with distance. The rank could then be used to facilitate further review and reporting. We were able to come up with the addRanks function below in a few minutes, automating a key missing piece of the user’s workflow.

Original table (image: pre_ranking)

Table after running the addRanks function (image: post_ranking)

import arcpy

def addRanks(table, sort_fields, category_field, rank_field='RANK'):
    """Use sort_fields and category_field to apply a ranking to the table.

    Parameters:
        table: string
        sort_fields: list | tuple of strings
            The field(s) on which the table will be sorted.
        category_field: string
            All records with a common value in the category_field
            will be ranked independently.
        rank_field: string
            The new rank field name to be added.
    """

    # add rank field if it does not already exist
    if not arcpy.ListFields(table, rank_field):
        arcpy.AddField_management(table, rank_field, "SHORT")

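    # Accept a list or a tuple for sort_fields, as the docstring allows.
    sort_fields = list(sort_fields)
    sort_sql = 'ORDER BY ' + ', '.join([category_field] + sort_fields)
    query_fields = [category_field, rank_field] + sort_fields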

    with arcpy.da.UpdateCursor(table, query_fields,
                               sql_clause=(None, sort_sql)) as cur:
        category_field_val = None
        i = 0
        for row in cur:
            if category_field_val == row[0]:
                i += 1
            else:
                category_field_val = row[0]
                i = 1
            row[1] = i
            cur.updateRow(row)

if __name__ == '__main__':
    addRanks(r'C:\Data\f.gdb\gen_near_table_patients2hosp',
             ['distance'], 'patient', 'rank')

Note: dBase tables (and shapefiles) do not support ORDER BY, which the function above passes to arcpy.da.UpdateCursor through the sql_clause argument.
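
For data sources without ORDER BY support, one possible workaround is to do the sorting in Python. The sketch below is an illustration rather than part of the original function: compute_ranks is a hypothetical helper and the sample rows are made up. It takes (record id, category, sort value) tuples and returns a dictionary mapping each record id to its rank within its category.

```python
from itertools import groupby
from operator import itemgetter


def compute_ranks(records):
    """Map each record id to its rank within its category.

    records: iterable of (record_id, category, sort_value) tuples.
    """
    ranks = {}
    # Sort by category, then by sort value, mirroring the ORDER BY clause.
    ordered = sorted(records, key=itemgetter(1, 2))
    for _, group in groupby(ordered, key=itemgetter(1)):
        # Restart the numbering at 1 for every category.
        for rank, record in enumerate(group, 1):
            ranks[record[0]] = rank
    return ranks


# Hypothetical rows: (OBJECTID, patient, distance)
rows = [(1, 'P1', 40.0), (2, 'P1', 12.5), (3, 'P2', 7.0),
        (4, 'P1', 30.1), (5, 'P2', 3.2)]
ranks = compute_ranks(rows)
print(ranks[2], ranks[4], ranks[1])
```

Under this approach, the tuples would presumably be read with an arcpy.da.SearchCursor on the OID@ token plus the category and distance fields, and the ranks written back with an arcpy.da.UpdateCursor on OID@ and the rank field.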

Shifting features

Shifting (or moving) features is a snap using the arcpy.da module’s UpdateCursor. Modifying the SHAPE@XY token moves the centroid of the feature and shifts the rest of the feature to match. This approach holds for point, polyline or polygon features.

To modify only a single feature or a subset of features in a feature layer, apply a selection to that layer and pass the layer in as the input to shift_features.

A word of caution: this uses UpdateCursor, so features are permanently modified. Back up your data if you might want to reverse the updates.

import arcpy

def shift_features(in_features, x_shift=None, y_shift=None):
    """
    Shifts features by an x and/or y value. The shift values are in
    the units of the in_features coordinate system.

    Parameters:
    in_features: string
        An existing feature class or feature layer.  If using a
        feature layer with a selection, only the selected features
        will be modified.

    x_shift: float
        The distance the x coordinates will be shifted.

    y_shift: float
        The distance the y coordinates will be shifted.
    """

    with arcpy.da.UpdateCursor(in_features, ['SHAPE@XY']) as cursor:
        for row in cursor:
            cursor.updateRow([[row[0][0] + (x_shift or 0),
                               row[0][1] + (y_shift or 0)]])

    return

Building feature classes from NumPy arrays

With ArcGIS 10.1, a NumPy array can be easily converted into a point feature class using the arcpy.da.NumPyArrayToFeatureClass function.

Note, however, that NumPyArrayToFeatureClass does not support other geometry types such as Polygon, Polyline or Multipoint. All the tools needed to create those geometry types from NumPy are there, though. The numpy_array_to_features function below combines the new arcpy.da.InsertCursor and NumPy methods to turn a NumPy array into features.

import numpy
import arcpy


def numpy_array_to_features(in_fc, in_array, geom_fields, id_field):
    """
    Insert new features into an existing feature class (polygon,
    polyline or multipoint) based on a NumPy array.

    Parameters
    ----------
    in_fc : string
        An existing feature class to which new features will be added.

    in_array : structured NumPy array
        Array must include fields representing x and y coordinates, and
        an ID field.

    geom_fields: list of strings | string
        Field(s) representing x- and y-coordinates.
        If only a single numpy field is required (such as a field that
        has x,y coordinates included in a tuple) the field name can be
        passed in within a list or as a string.

    id_field: string
        The field that identifies how coordinates are grouped.  All
        coordinates with a common id value will be combined (in order
        of occurrence) into an output feature.
        The id_field is used in both the array and the feature class
        (i.e., the field name must exist in the feature class)

    """
    # Establish minimum number of x,y pairs to create proper geometry
    min_xy_dict = {'Polygon': 3, 'Polyline': 2, 'Multipoint': 1}
    min_xy_pairs = min_xy_dict[arcpy.Describe(in_fc).shapeType]

    if isinstance(geom_fields, list) and len(geom_fields) == 1:
        # Can't access a single field via a list later, extract the
        # only value
        geom_fields = geom_fields[0]

    with arcpy.da.InsertCursor(in_fc, ['SHAPE@', id_field]) as cursor:
        unique_array = numpy.unique(in_array[id_field])  # unique ids

        # Iterate through unique sets, get array that matches unique
        # value, convert coordinates to a list and insert via cursor.
        for unique_value in unique_array:
            a = in_array[in_array[id_field] == unique_value]
            if len(a) >= min_xy_pairs:  # skip if not enough x,y pairs
                cursor.insertRow([a[geom_fields].tolist(), unique_value])

    return

For example, suppose you have a small array like this one:

>>> my_array
array([(21239.1, 51531.6, u'A'), (18648.5, 51551.1, u'A'),
       (18638.7, 54173.9, u'A'), (15989.5, 54153.5, u'B'),
       (13365.4, 51510.9, u'B'), (16004.5, 51524.3, u'B')], 
      dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8'), ('ID', '<U1')])

To convert this array into features, pass in an existing feature class, the array, a list of the x,y coordinate fields, and another field that controls how those coordinates are grouped. With the array listed above, two triangle-shaped polygon features are added, one for the ID ‘A’ and one for ‘B’.


numpy_array_to_features(polygon_fc, my_array, ['SHAPE@X', 'SHAPE@Y'], 'ID')
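
To see what the function hands to the insert cursor, the grouping step can be run on its own without arcpy. The following is a minimal sketch: group_coordinates is a hypothetical helper that mirrors the loop in numpy_array_to_features, and the array is rebuilt from the values shown above.

```python
import numpy

my_array = numpy.array(
    [(21239.1, 51531.6, 'A'), (18648.5, 51551.1, 'A'),
     (18638.7, 54173.9, 'A'), (15989.5, 54153.5, 'B'),
     (13365.4, 51510.9, 'B'), (16004.5, 51524.3, 'B')],
    dtype=[('SHAPE@X', '<f8'), ('SHAPE@Y', '<f8'), ('ID', '<U1')])


def group_coordinates(in_array, geom_fields, id_field):
    """Return {id: [(x, y), ...]} with coordinates in order of occurrence."""
    groups = {}
    for unique_value in numpy.unique(in_array[id_field]):
        # A boolean mask keeps only the records that share this id.
        subset = in_array[in_array[id_field] == unique_value]
        groups[str(unique_value)] = subset[geom_fields].tolist()
    return groups


groups = group_coordinates(my_array, ['SHAPE@X', 'SHAPE@Y'], 'ID')
print(groups['A'])
```

Each value in the dictionary is a plain list of (x, y) tuples, which is the form that the function above passes to the SHAPE@ field.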

Also see Working with NumPy in ArcGIS.

Getting arcpy.da rows back as dictionaries

Though arcpy.da’s cursors return rows as lists, you can easily transform these on-the-fly with just a little code on your part:
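
The listing that accompanied this paragraph is not reproduced here, so what follows is a minimal sketch of one way to do it, not necessarily the original code. It relies on the fields property of arcpy.da cursors. FakeCursor is a hypothetical stand-in so that the example runs without arcpy.

```python
def rows_as_dicts(cursor):
    """Yield each row of an arcpy.da-style cursor as a dictionary."""
    colnames = cursor.fields
    for row in cursor:
        yield dict(zip(colnames, row))


class FakeCursor(object):
    """Hypothetical stand-in for arcpy.da.SearchCursor (illustration only)."""

    def __init__(self, fields, rows):
        self.fields = tuple(fields)
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


cursor = FakeCursor(['NAME', 'POP'], [('Portland', 583776), ('Salem', 154637)])
dict_rows = list(rows_as_dicts(cursor))
print(dict_rows[0]['NAME'])
```

With real data, the generator would wrap a cursor opened in a with statement, for example arcpy.da.SearchCursor(fc, ['NAME', 'POP']), where fc and the field names are placeholders.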

Or if you’d like to be able to use a syntax similar to the old arcpy cursors and do row.COLUMN_NAME to fetch a value, you could use namedtuples:
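
A sketch of the namedtuple variant follows, again as an illustration rather than the original listing. FakeCursor is a hypothetical stand-in for an arcpy.da cursor. One assumption worth flagging: rename=True is used because tokens such as SHAPE@XY are not valid Python identifiers, so they become positional names like _1.

```python
import collections


def rows_as_namedtuples(cursor):
    """Yield each row of an arcpy.da-style cursor as a namedtuple."""
    # rename=True swaps invalid identifiers (e.g. SHAPE@XY) for _0, _1, ...
    row_type = collections.namedtuple('Row', cursor.fields, rename=True)
    for row in cursor:
        yield row_type(*row)


class FakeCursor(object):
    """Hypothetical stand-in for arcpy.da.SearchCursor (illustration only)."""

    def __init__(self, fields, rows):
        self.fields = tuple(fields)
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)


cursor = FakeCursor(['NAME', 'SHAPE@XY'],
                    [('Portland', (10.0, 20.0)), ('Salem', (30.0, 40.0))])
tuple_rows = list(rows_as_namedtuples(cursor))
print(tuple_rows[0].NAME)
```

Namedtuples are immutable, so this form suits read-only access with a search cursor.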

And then to get a little more sophisticated, what about an update cursor that lets you use dictionaries? Note that in this example the row will ALWAYS update without any intervention on your part once you go to the next one, so be careful:
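
A sketch of such a generator, under the same caveat that it is an illustration and not the original listing. FakeUpdateCursor is a hypothetical stand-in for arcpy.da.UpdateCursor. The write-back happens when the caller asks for the next row, which is why every row gets updated whether or not it changed.

```python
def rows_as_update_dicts(cursor):
    """Yield rows as dictionaries and write each one back afterwards."""
    colnames = cursor.fields
    for row in cursor:
        row_object = dict(zip(colnames, row))
        yield row_object
        # Resumes when the caller requests the next row: write back
        # whatever the dictionary now holds, changed or not.
        cursor.updateRow([row_object[name] for name in colnames])


class FakeUpdateCursor(object):
    """Hypothetical stand-in for arcpy.da.UpdateCursor (illustration only)."""

    def __init__(self, fields, rows):
        self.fields = tuple(fields)
        self.rows = [list(r) for r in rows]
        self._current = None

    def __iter__(self):
        for index, row in enumerate(self.rows):
            self._current = index
            yield list(row)

    def updateRow(self, row):
        self.rows[self._current] = list(row)


cursor = FakeUpdateCursor(['NAME', 'POP'], [['Portland', 100], ['Salem', 50]])
for row in rows_as_update_dicts(cursor):
    row['POP'] *= 2
print(cursor.rows)
```

If the loop is left early with break, the row being processed at that point is not written back, because the generator never resumes past its yield.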

All the pieces are there in the Python standard library and arcpy.da to customize how you get your data in and out. arcpy.da returns lists and tuples because they act as a sort of lowest common denominator of data structures, and in large datasets an operation like a dictionary key lookup benchmarks much, much slower than a simple list item assignment.

Create a list of unique field values

This function can be used to return a unique list of field values. It takes advantage of the new data access module in ArcGIS 10.1.

import arcpy

def unique_values(table, field):
    with arcpy.da.SearchCursor(table, [field]) as cursor:
        return sorted({row[0] for row in cursor})

Here is an example of the return values:

>>> unique_values(r'C:\boston\city.gdb\crime', 'DAY')
[u'Mon', u'Sun', u'Tues', u'Wed']

Calculate a unique value for each record

In this example, a unique value is calculated for each record starting with 1. The result is 1, 2, 3 … to the number of records. This is often necessary when you have to do some spatial operation and perform a join or relate using that field. It can also be useful when your table doesn’t have an OBJECTID field.

This code uses the data access module available with ArcGIS 10.1.

import arcpy
with arcpy.da.UpdateCursor(r"C:\Data\city.gdb\crime", "UID") as rows:
    for i, row in enumerate(rows, 1):
        row[0] = i
        rows.updateRow(row)