Tuesday, October 22, 2013

Updating data sources in ArcGIS via ArcPy when implementing a new filing layout.

Information:

To better understand the problem and solution, it helps to know a little about the old and new filing structures. Previously, all geological and GIS data were stored on one drive and engineering data on another, and both drives used a specialized filing structure. The old filing system was cumbersome and difficult to understand, which made deleting or retrieving data for a particular job extremely difficult. The other problem was that company-owned or perpetual data was stored on either drive, depending on what the data related to.

So a new filing system has since been implemented. Now there is a knowledge drive and a project drive. The project drive holds all data related to a specific project whether it be engineering, GIS, or geological. The knowledge drive holds data that is owned by our company, and again the data could be administrative, GIS, geological or engineering.


The new filing structure makes it extremely easy to retrieve and look up data based on a project. It is also many times easier to purge all data related to a particular project. It also ensures that any data that is considered perpetual, and needs to stay on our system for long periods of time, will not be accidentally removed when project data is purged.

The Problem:

For the most part our GIS personnel update existing ArcGIS map files year after year, overwriting the previous year's work. As a result, I.T. is forced to retrieve data from tape whenever data corruption or a legal issue arises. This is contrary to policy, which mandates that any data from the past three years be instantly retrievable. Policy therefore now requires our personnel to copy old maps and GIS data to a new project location for returning clients. The issue with this approach is that the links to the data sources are not relative and are not updated automatically by a copy and paste. An added difficulty is that various data sources are being moved to different locations.

The Solution:

Please note that all of the code below is example code, more a proof of concept than an actual plug-and-play solution. Every company implements a different filing system, so the code has been adjusted to apply to a hypothetical scenario.

First, let's fix any data sources that were moved along with the map when the current map data was copied from the existing location to the new location.

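import arcpy

# Open the map document currently loaded in ArcMap.
mxd = arcpy.mapping.MapDocument("CURRENT")

# Old and new workspace paths, supplied as script tool parameters.
OldWrkSpace = arcpy.GetParameterAsText(0)
NewWrkSpace = arcpy.GetParameterAsText(1)

# Repoint every data source under the old workspace to the new workspace.
mxd.findAndReplaceWorkspacePaths(OldWrkSpace, NewWrkSpace)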
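
Before running the replacement on a real map, it can help to preview which data source paths would change. The following is a minimal sketch in plain Python, with no ArcPy involved. The function name `preview_workspace_swap` and the sample paths are hypothetical, and the whole-folder matching rule is an assumption made for this preview; it is not a model of how ArcPy itself matches paths.

```python
import ntpath  # Windows path rules, usable on any platform


def preview_workspace_swap(data_sources, old_workspace, new_workspace):
    """Return (old_path, new_path) pairs for paths under old_workspace."""
    old_norm = ntpath.normcase(ntpath.normpath(old_workspace)).rstrip("\\")
    new_norm = ntpath.normpath(new_workspace).rstrip("\\")
    changes = []
    for path in data_sources:
        clean = ntpath.normpath(path)
        norm = ntpath.normcase(clean)
        # Match whole folder names only, so C:\GIS does not match C:\GIS2.
        if norm == old_norm or norm.startswith(old_norm + "\\"):
            remainder = clean[len(old_norm):]
            changes.append((path, new_norm + remainder))
    return changes


# Hypothetical example paths, for illustration only.
sources = [
    "G:\\Jobs\\2012\\ClientA\\roads.shp",
    "G:\\Jobs\\2012\\ClientAB\\rivers.shp",
    "E:\\Eng\\plan.dwg",
]
for old, new in preview_workspace_swap(
        sources, "G:\\Jobs\\2012\\ClientA", "P:\\ClientA\\2013"):
    print("{0} -> {1}".format(old, new))
```

In a script tool, the list of paths would come from `lyr.dataSource` for each layer that supports that property.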

Next, let's fix any data sources that can now be found on the knowledge drive.


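import os
import arcpy

filesInNewDrives = []
# Site-specific placeholders; fill these in before running.
oldDriveLetter = ""
knowledgeDriveLetter = ""
# Root folder on the knowledge drive to search.
knowledgeDir = ""

# Within a string, search for a character and return a list of its indexes.
def FindChar(s, ch):
    return [i for i, ltr in enumerate(s) if ltr == ch]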

# Builds a list of all files within a particular directory, this is done recursively.
def GetAllFilesInDir(dirToSearch, dirSearchResults):
    for root, subFolders, files in os.walk(dirToSearch):
        for file in files:
            dirSearchResults.append(root + "\\" + file)


# Search for a file name, in a list of file paths.
def SearchForFileInList(dataSourceFile, listOfFiles):
    for afile in listOfFiles:         
        if (dataSourceFile in afile):
            return afile
    return ""

# Return the data source file, and parent directory.
def GetDataSourceFileName(currentDataSource):
    indexOfSlashes = FindChar(currentDataSource, "\\")
    return currentDataSource[(indexOfSlashes[-2]+1):]

# Return the data source file name only.
def GetDataSourceShortName(currentDataSource):
    indexOfSlashes = FindChar(currentDataSource, "\\")
    return currentDataSource[(indexOfSlashes[-1]+1):]

def GetDataSourceWorkSpace(dataSource):
    indexOfSlashes = FindChar(dataSource, "\\")
    return dataSource[0:indexOfSlashes[-1]]

mxd = arcpy.mapping.MapDocument("CURRENT")

GetAllFilesInDir(knowledgeDir, filesInNewDrives)

for lyr in arcpy.mapping.ListLayers(mxd):
    # Group layers and some service layers have no dataSource property.
    if lyr.supports("DATASOURCE") and lyr.dataSource.startswith(oldDriveLetter):
        # First try to match on "parent folder\file name".
        dataSourceFileName = GetDataSourceFileName(lyr.dataSource)
        arcpy.AddMessage("Searching for data source file name: " + dataSourceFileName)
        dataSourceNewFileLocation = SearchForFileInList(dataSourceFileName, filesInNewDrives)
        if dataSourceNewFileLocation == "":
            # Fall back to matching on the file name alone.
            dataSourceFileName = GetDataSourceShortName(lyr.dataSource)
            dataSourceNewFileLocation = SearchForFileInList(dataSourceFileName, filesInNewDrives)
        if dataSourceNewFileLocation != "":
            arcpy.AddMessage("Found data source file at: " + dataSourceNewFileLocation)
            lyr.findAndReplaceWorkspacePath(GetDataSourceWorkSpace(lyr.dataSource), GetDataSourceWorkSpace(dataSourceNewFileLocation))
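
The listing above scans the whole file list for every layer and matches by substring, so "roads.shp" would also match "old_roads.shp". One possible alternative, sketched below under my own assumptions, is to index the knowledge drive once by exact file name and return every candidate location. The names `build_file_index` and `find_relocated` are hypothetical and are not part of the original solution. Like the original search, this only suits file-based sources such as shapefiles.

```python
import os
from collections import defaultdict


def build_file_index(root_dir):
    """Map lower-cased file name -> list of full paths found under root_dir."""
    index = defaultdict(list)
    for folder, _subfolders, files in os.walk(root_dir):
        for name in files:
            index[name.lower()].append(os.path.join(folder, name))
    return index


def find_relocated(data_source, index):
    """Return candidate new locations for data_source, matched by file name."""
    # Split on either separator so Windows paths work on any platform.
    name = data_source.replace("/", "\\").rsplit("\\", 1)[-1]
    return index.get(name.lower(), [])
```

Returning a list rather than the first hit makes duplicates visible, so the script can warn instead of silently picking one when the same file name exists in several folders.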


Finally, copy any data source that was not found in the new project directory or on the knowledge drive into the new project directory.

import errno
import shutil

# Copy a directory tree, skipping lock files; fall back to a single-file copy.
def copy(src, dest):
    try:
        shutil.copytree(src, dest, ignore=shutil.ignore_patterns('*.lock', 'UTM_ZONES.shp'))
    except OSError as e:
        # If the error was caused because the source wasn't a directory
        if e.errno == errno.ENOTDIR:
            shutil.copy(src, dest)
        else:
            print('Directory not copied. Error: %s' % e)


mxd = arcpy.mapping.MapDocument("CURRENT")

destinationDir = NewWrkSpace + "\\DS"

for lyr in arcpy.mapping.ListLayers(mxd):
    # Skip group layers and others without a dataSource property.
    if not lyr.supports("DATASOURCE"):
        continue
    workspace = GetDataSourceWorkSpace(lyr.dataSource)
    indexOfSlashes = FindChar(workspace, "\\")
    # Assumed intent: keep only the workspace folder name, with its leading backslash.
    dataSourceName = workspace[indexOfSlashes[-1]:]
    arcpy.AddMessage("Fixing by copying datasource " + workspace)
    # Ensure data source doesn't already exist on project directory
    if not os.path.exists(destinationDir + dataSourceName):
        arcpy.AddMessage("Copying " + workspace + " to " + destinationDir + dataSourceName)
        copy(workspace, destinationDir + dataSourceName)
        lyr.findAndReplaceWorkspacePath(workspace, destinationDir + dataSourceName, False)
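
The copy step can be tried out on its own, away from a live map document. The sketch below uses only the standard library. The function name `copy_workspace_once` and its return value are my own assumptions, not part of the original solution. It copies a workspace folder into a destination root under its own folder name, skips lock files, and reports whether a copy actually happened.

```python
import os
import shutil


def copy_workspace_once(src_dir, dest_root):
    """Copy src_dir into dest_root under its own folder name.

    Returns (target_path, copied), where copied is False if the
    target already existed and nothing was done.
    """
    folder_name = os.path.basename(os.path.normpath(src_dir))
    target = os.path.join(dest_root, folder_name)
    if os.path.exists(target):
        return target, False
    # copytree creates any missing parent folders of target.
    shutil.copytree(src_dir, target, ignore=shutil.ignore_patterns("*.lock"))
    return target, True
```

Returning the target path even when nothing was copied lets the caller repoint the layer in both cases.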
        
    
Please remember that this is only a skeleton of the final solution. The code is not meant to be used straight out of the box: modifications are required, and the final solution included more logic to prevent some errors from occurring.





