Inventorying data: a new approach

ArcPy list functions give you the options to list out a particular data type for a given workspace, but expanding that out to a directory tree meant cobbling together those list functions with Python’s os.walk and lots of updates to the workspace environment. It can be done (as shown here), but in my experience it is a process which is easy to get wrong.

Python’s os.walk is noteworthy and useful, because it does all of this, but is limited to file types. It can’t peer into a geodatabase to identify feature classes for example.

At 10.1 service pack 1, we added arcpy.da.Walk, Walk takes care of all that workspace handling for you and mimics os.walk in arguments and behaviors.

The below code wraps arcpy.da.Walk in a generator function to return a full path to all appropriate datatypes under a given workspace.

import os
import arcpy

def inventory_data(workspace, datatypes):
    """
    Generates full path names under a catalog tree for all requested
    datatype(s).

    Parameters:
    workspace: string
        The top-level workspace that will be used.
    datatypes: string | list | tuple
        Keyword(s) representing the desired datatypes. A single
        datatype can be expressed as a string, otherwise use
        a list or tuple. See arcpy.da.Walk documentation 
        for a full list.
    """
    for path, path_names, data_names in arcpy.da.Walk(
            workspace, datatype=datatypes):
        for data_name in data_names:
            yield os.path.join(path, data_name)


for feature_class in inventory_data(r"c:\data", "FeatureClass"):
    do_something(feature_class)

Advertisements

Recursive list feature classes

UPDATE: At 10.1 service pack 1, this can be more easily achieved using arcpy.da.Walk. Also see Inventorying data.

The following function (recursive_list_fcs) will return a list of all feature classes that fall under the input workspace, including those not seen by the file system such as in a file geodatabase.

import os
import arcpy


def recursive_list_fcs(workspace, wild_card=None, feature_type=None):
    """Returns a list of all feature classes in a tree.  Returned
    list can be limited by a wildcard, and feature type.
    """
    preexisting_wks = arcpy.env.workspace
    arcpy.env.workspace = workspace

    try:
        list_fcs = []
        for root, dirs, files in os.walk(workspace):
            arcpy.env.workspace = root
            fcs = arcpy.ListFeatureClasses(wild_card, feature_type)
            if fcs:
                list_fcs += [os.path.join(root, fc) for fc in fcs]

            # Pick up workspace types that don't have a folder
            #  structure (coverages, file geodatabase do)
            workspaces = set(arcpy.ListWorkspaces()) - \
                         set(arcpy.ListWorkspaces('', 'FILEGDB')) -\
                         set(arcpy.ListWorkspaces('', 'COVERAGE'))

            for workspace in workspaces:
                arcpy.env.workspace = os.path.join(root, workspace)
                fcs = arcpy.ListFeatureClasses(wild_card,
                                               feature_type)

                if fcs:
                    list_fcs += [os.path.join(root, workspace, fc)
                                 for fc in fcs]

            for dataset in arcpy.ListDatasets('', 'FEATURE'):
                ds_fcs = arcpy.ListFeatureClasses(wild_card,
                                                  feature_type,
                                                  dataset)
                if ds_fcs:
                    list_fcs += [os.path.join(
                        root, workspace, dataset, fc)
                                 for fc in ds_fcs]

    except Exception as err:
        raise err
    finally:
        arcpy.env.workspace = preexisting_wks

    return list_fcs