Effective Cartography

Mapping and Analysis with Categorical Data

Our purpose in mapping and exploring geographic data sets may be to discover useful distinctions and associations among things represented by data and then to portray these on a map. Most of the time, we will be using data that were collected for purposes different from ours. Quite often, the category schemes we find incorporate very fine categorical distinctions, most of which are irrelevant for our purpose. This tutorial examines some typical problems of using categorical referencing systems and works through some techniques for transforming one category scheme into another. Along the way, it covers some of the fundamental principles and operations pertaining to tables and feature classes in ArcMap, including graphical, spatial, and attribute selections, and table joins.

Begin With a Question

As always, it is useful to begin with a question. In our case, we are interested in comparing quantities and proportions of Residential, Commercial, and Industrial land in the cities of Somerville and Cambridge. This is a fairly simple conceptual model, and it happens that we have data from MassGIS that can represent land use in 1999. One of our problems is that the MassGIS dataset uses a much finer scheme of categories to characterize "Use". So the task of developing the data model for this experiment will require us to recategorize the data. Once we have done this, we should be able to use Spatial Selections and Summaries to generate the information to answer the question. To translate this into the terms introduced in our sketch of Models for Research and Decision Support: we are going to build a data model that uses the observations made by MassGIS as a representation of Land Use in these two cities in 1999. We are then going to transform these data with a couple of associative procedures: first, by associating polygons according to generalized semantic classes of use, and then by associating them according to the spatial categories known as "Somerville" and "Cambridge".


Download the sample dataset

Deeper Reading

These pages describe in more detail the relationships between intentions, conceptual models, data, referencing systems, metadata, and purposeful transformation and portrayal of data.


Explore the Land Use Data

Let's get to know the data we are going to use to represent land use. It is very important to understand the spatial and categorical granularity of these data so we can evaluate how well they suit our purposes. So read the metadata, and explore the methodology by which the data were collected. In particular, examine what is said about the Minimum Mapping Unit, and check out the story of the 21-class and 37-class coding schemes for land use. Add the 1999 Land Use layer to your map and explore the attribute table.


A Quick Tour of Table Selections and Summaries

Before we get into recategorizing our data, let's take a look at the tools we have for selecting and summarizing information according to spatial association. In this demonstration, we will use a spatial selection and a table summary to see how we can tabulate land uses per town.

Summarize Land Uses in Somerville

  1. Select the town of Somerville from the Towns layer
  2. Use Select by Location to select the Land Use polygons that have their centroids in the selected Town polygon
  3. Use a Table Summary to tabulate the land uses in Somerville as observed by the MassGIS in 1999
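
To make the logic of these three steps concrete, here is a minimal sketch in plain Python rather than ArcMap. It is only an illustration of the idea: the town outline, the land use polygons, and the category names are all invented, and the helper functions are hypothetical stand-ins for what Select by Location (centroid within) and Summarize do for us.

```python
# Toy version of "select land use polygons whose centroid falls in the town,
# then summarize by land use". All shapes, codes, and areas are invented.

def ring_area_and_centroid(ring):
    """Return (area, (cx, cy)) for a simple polygon [(x, y), ...] (shoelace formula)."""
    twice_area = 0.0
    sx = sy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        sx += (x0 + x1) * cross
        sy += (y0 + y1) * cross
    centroid = (sx / (3.0 * twice_area), sy / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid


def point_in_ring(pt, ring):
    """Ray-casting test: is the point inside the polygon?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal line through pt
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def summarize_by_code(town_ring, land_use):
    """Map each land use code to (polygon count, total area) for selected polygons."""
    summary = {}
    for code, ring in land_use:
        area, centroid = ring_area_and_centroid(ring)
        if point_in_ring(centroid, town_ring):  # the "spatial selection"
            count, total = summary.get(code, (0, 0.0))
            summary[code] = (count + 1, total + area)
    return summary


# Hypothetical town outline and land use polygons (arbitrary map units).
town = [(0, 0), (10, 0), (10, 10), (0, 10)]
land_use = [
    ("Industrial", [(1, 1), (4, 1), (4, 3), (1, 3)]),
    ("Residential", [(5, 5), (9, 5), (9, 9), (5, 9)]),
    ("Residential", [(8, 0), (14, 0), (14, 2), (8, 2)]),  # straddles the border
    ("Commercial", [(0, 6), (3, 6), (3, 8), (0, 8)]),
    ("Residential", [(1, 4), (3, 4), (3, 5), (1, 5)]),
]

summary = summarize_by_code(town, land_use)
for code, (count, area) in sorted(summary.items()):
    print(code, count, area)
```

Notice that the polygon straddling the town border is left out entirely because its centroid falls outside. The centroid rule counts each polygon exactly once, at the cost of assigning the whole polygon to one town or the other.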

Discussion

What do you think about these numbers? Let's take Industrial, for example. Do you think that the number we get is exactly correct? Of course, this judgement requires us to establish an ideal conceptual description of what we mean by Industrial Land. If what we mean is areas larger than one acre that are identifiable from 1999 aerial photographs, then this estimate is likely to be very good. This is a very useful thing to remember: the data are usually a very good representation of what they are intended to represent. Therefore, we can have Very Good Data if we can somehow match our conceptual identification to the reality of the data. If we are interested in small bits of industrial land that may be tucked into neighborhoods, which is one of the more interesting aspects of Somerville, we may expect that many of these will be omitted from the estimates generated from this particular dataset.

What if, rather than thinking about the estimates for individual categories, we were more interested in comparing the relative proportions among our four land use categories? Could it be that our numbers might be systematically incorrect, and yet the general trends might still be useful in a comparison of the Land Use and Zoning maps? Do you think that the errors of omission and commission affecting Industrial, Residential and Commercial might be random, and just as likely to omit as to commit? If so, then the law of averages might tell us that comparisons made with these data might be useful.
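
A small arithmetic experiment may help with this question. The numbers below are made up and do not come from the MassGIS data. The sketch simply contrasts an error that affects every category by the same factor with an error that affects only one category.

```python
def proportions(areas):
    """Share of the total area held by each category."""
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


# Invented "true" areas, in acres.
true_areas = {"Residential": 600.0, "Commercial": 200.0, "Industrial": 200.0}

# Case 1: every category is undercounted by the same 10 percent.
uniform = {name: area * 0.9 for name, area in true_areas.items()}

# Case 2: only Industrial is undercounted (small parcels are missed).
biased = dict(true_areas, Industrial=true_areas["Industrial"] * 0.7)

for label, areas in [("true", true_areas), ("uniform", uniform), ("biased", biased)]:
    shares = proportions(areas)
    print(label, {name: round(share, 3) for name, share in shares.items()})
```

In the first case every individual estimate is wrong, yet the proportions are unchanged. In the second case the Industrial share is understated and the other shares are inflated to match. So the comparison is only as trustworthy as our belief that the errors fall evenly across categories.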


Transformation of Categorical Data with Lookup Tables

When mapping categorical data you should ordinarily use no more than 7 categories. It is very difficult for a user to keep track of more than this as they look from the legend to the map. The categories we choose should be tailored to the specific question we are asking. Our conceptual model of Land Use has just three meaningful distinctions, plus a catch-all category for everything else:

  1. Residential
  2. Commercial
  3. Industrial
  4. Other

We could use the metadata and the legend editor to name and re-group the land uses, but this would be fairly tedious. So we will explore a more systematic method known as a Lookup Table, which lets us create a simple table that maps the land use codes to categories. As a start, we can cut and paste information from the metadata into a simple text file that we can open in ArcMap. This table can be used to look up supplementary information about each land use code. The lookup process can be automated through a process known as a Join. In our case, the 21 Class Category Lookup Table can be joined with the attribute table of the Land Use polygons. Through this join, the various category names from the lookup table are joined to the appropriate rows in the polygon attribute table.

We can then edit the lookup table to add our own category scheme to more closely match the concepts in our own model. A couple of niggling technicalities we will encounter in this endeavor include the fact that ArcMap won't let us edit tables based on text files or Excel spreadsheets. If we used either of these techniques to create a lookup table, we will need to export the table to the sort of table that ArcMap is more comfortable editing, such as a dBase format table or a Geodatabase table. The differences between these and other formats are discussed in more depth on the page An Overview of Spatial Data Formats.
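
For readers who like to see the mechanics, a join can be sketched in a few lines of Python. This is not how ArcMap implements it. The field names, land use codes, and category names below are invented for illustration and are not copied from the MassGIS data dictionary. Rows without a match are kept with an empty value, which is an assumption about the kind of join we want.

```python
def join_rows(rows, key_field, lookup_rows, lookup_key, fields):
    """Attach the named lookup fields to every row whose key matches."""
    index = {row[lookup_key]: row for row in lookup_rows}
    joined = []
    for row in rows:
        match = index.get(row[key_field])  # None when the code is not in the lookup
        extra = {f: (match[f] if match else None) for f in fields}
        joined.append({**row, **extra})
    return joined


# Hypothetical lookup table: one row per land use code.
lookup_table = [
    {"lu_code": 10, "lu_name": "Multi-family residential"},
    {"lu_code": 15, "lu_name": "Commercial"},
    {"lu_code": 16, "lu_name": "Industrial"},
]

# Hypothetical polygon attribute table: many rows per code.
polygons = [
    {"poly_id": 1, "lu_code": 15, "acres": 2.5},
    {"poly_id": 2, "lu_code": 10, "acres": 4.0},
    {"poly_id": 3, "lu_code": 15, "acres": 1.0},
    {"poly_id": 4, "lu_code": 99, "acres": 0.5},  # code missing from the lookup
]

joined = join_rows(polygons, "lu_code", lookup_table, "lu_code", ["lu_name"])
for row in joined:
    print(row)
```

The point to notice is the many-to-one relationship: the lookup table has one row per code, and each of its values is copied to every polygon that carries that code.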

Some Notes on Exchanging Tables

  1. The most flexible way of dealing with tables in ArcMap is the dBase (.dbf) format. This format can be read and written by OpenOffice. Excel can read .dbf files, but the latest version won't write them.
  2. ArcMap can open Excel's .xlsx format directly, but these tables cannot be modified either in Excel or in ArcMap while they are open in ArcMap. So if you create tables in Excel, it is best to convert them to DBF right away by right-clicking them in the ArcMap table of contents and choosing Data > Export Data.
  3. A dBase table cannot have more than 10 characters in its field names, nor can these names contain spaces or special characters that aren't simple letters, numbers, and underscore characters. Nor can a field name begin with a numeral. So keep this in mind if you are creating tables in Excel.
  4. A comma-delimited text file is an easy way to encode and exchange information between programs, although some semantic information about data types, such as dates and numbers, can get lost in the translation.
  5. The problem of data types not being maintained emerges most often when data values consist of numerals but are actually character strings (like zip codes). In this case, leading zeros are chopped off, and can sometimes be difficult to add back on.
  6. You can also save tables in an ArcGIS Geodatabase. This format has fewer restrictions on field names, etc., but is less portable.
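
Two of these notes are easy to check before a table ever reaches ArcMap. The sketch below is a hypothetical helper, not part of any ArcGIS tool. It assumes the commonly cited 10-character limit for dBase field names (adjust MAX_LEN if your software is stricter) and assumes 5-digit zip codes.

```python
import re

MAX_LEN = 10  # assumed dBase field-name limit; change if your software differs


def is_valid_dbf_field_name(name, max_len=MAX_LEN):
    """Letters, digits, and underscores only; no leading digit; short enough."""
    if len(name) > max_len:
        return False
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) is not None


def restore_zip(value, width=5):
    """Turn a zip code that was read as a number back into padded text."""
    return str(value).zfill(width)


# Hypothetical column headings typed into a spreadsheet.
for heading in ["Simple_lu", "Land Use", "2010pop", "description_long"]:
    print(heading, is_valid_dbf_field_name(heading))

# A Cambridge zip code after a program has treated it as a number.
print(restore_zip(2138))
```

Padding the zeros back on only works when we know how long the code should have been, which is why it is safer to keep such values as text from the start.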

Play with Lookup Tables

  1. Explore the Data Dictionary for the 21 Class land use code.
  2. Consider how you might use the legend editor to reclassify these codes into a concise categorization that emphasizes the distinctions that are critical for our question.
  3. Copy and paste the data dictionary from the metadata into Excel. Today we are lucky because we can paste this into Excel and it simply fills in the rows and columns of a table that requires few modifications. Often, making a lookup table is more of a headache than this.
  4. Save your Excel spreadsheet as an .xlsx file in the work_pbc/arcmap/data folder. You will also see several other handy lookup tables in there.
  5. Use the Add Data button to add the Excel table in ArcMap. Note that you have to double-click the Excel icon in the Add Data dialog to see the various worksheets in the Excel table.
  6. Join the new lookup table with the land use layer. In our case, we would fill out the Join Dialog
  7. Before we can add a column to this lookup table we need to export it to a DBF table.
  8. Now we will add a new field to hold descriptions for our special land use category system. Let's name our new field Simple_lu
  9. We can now select sets of rows in the lookup table and create a higher-order classification by calculating values of the field for selected records. It is easy to update the values for selected rows by right-clicking the field name and choosing Field Calculator. Remember that text values should be surrounded by double quotes.
  10. Now you can use the symbology editor to easily make a map showing your new classes!
  11. Note that when you update values in this lookup table, the land use feature-class that is joined with the table automatically "looks up" the new class values, and these are accessible in the legend editor!
  12. Now, repeat your summary of land uses in Somerville using the new table categories.
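
The regrouping in steps 8 through 12 can also be sketched outside ArcMap. In the Python below, the codes, names, and acreages are invented, and the grouping of codes into classes is only an example. The field name Simple_lu is the one we chose above. Selecting rows and calculating a field value becomes a loop over the lookup table, and the summary then reads the new class through the join.

```python
# Hypothetical lookup table, one row per detailed land use code.
lookup_table = [
    {"lu_code": 3, "lu_name": "Forest"},
    {"lu_code": 10, "lu_name": "Multi-family residential"},
    {"lu_code": 11, "lu_name": "High-density residential"},
    {"lu_code": 15, "lu_name": "Commercial"},
    {"lu_code": 16, "lu_name": "Industrial"},
    {"lu_code": 18, "lu_name": "Transportation"},
]

# Our own grouping of detailed codes into the simple classes (example only).
groups = {
    "Residential": {10, 11},
    "Commercial": {15},
    "Industrial": {16},
}

# Steps 8 and 9: add the new field, then "calculate" it for each selected set.
for row in lookup_table:
    row["Simple_lu"] = "Other"  # default for every code we do not name
for simple_name, codes in groups.items():
    for row in lookup_table:
        if row["lu_code"] in codes:
            row["Simple_lu"] = simple_name

# Hypothetical land use polygons already selected for one town: (code, acres).
selected_polygons = [(10, 4.0), (11, 6.0), (15, 3.0), (16, 2.0),
                     (3, 1.5), (18, 2.5), (11, 1.0)]

# Step 12: summarize acres by the joined Simple_lu value.
simple_by_code = {row["lu_code"]: row["Simple_lu"] for row in lookup_table}
acres_by_class = {}
for code, acres in selected_polygons:
    simple = simple_by_code.get(code, "Other")
    acres_by_class[simple] = acres_by_class.get(simple, 0.0) + acres

print(acres_by_class)
```

Because the polygons only store the detailed code, changing the grouping means editing a handful of rows in the lookup table, not thousands of polygon records.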

The Build-Out Study

Now you should be able to take what we have learned above to recategorize the zoning layer and perform a similar summary of the amount of land in Somerville that is zoned for various land uses. Note that my zoning lookup table (in the work_pbc/arcmap/data folder) includes an estimate of the Floor Area Ratio for each zoning district. Challenge yourself to figure out how to estimate how many square feet of building might be buildable according to zoning in Somerville. After that, challenge yourself to critique this model in terms of its conception, and the fitness of the data and procedures that you have applied.
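
If you want to check your reasoning, one simple way to frame the estimate is that buildable floor area equals land area multiplied by the Floor Area Ratio, summed by district. The sketch below assumes exactly that and nothing more. The district names, FAR values, and land areas are invented and are not taken from the zoning lookup table.

```python
# Hypothetical lookup: zoning district -> estimated Floor Area Ratio (FAR).
far_by_district = {"RA": 0.75, "CBD": 2.0, "IA": 1.5}

# Hypothetical zoning polygons in one town: (district, land area in square feet).
zoning_polygons = [
    ("RA", 100000.0),
    ("CBD", 40000.0),
    ("IA", 20000.0),
    ("RA", 50000.0),
]


def buildable_floor_area(polygons, far_lookup):
    """Sum land area times FAR for each district (a deliberately crude model)."""
    totals = {}
    for district, land_sqft in polygons:
        floor_sqft = land_sqft * far_lookup[district]
        totals[district] = totals.get(district, 0.0) + floor_sqft
    return totals


buildable = buildable_floor_area(zoning_polygons, far_by_district)
print(buildable)
print("total", sum(buildable.values()))
```

This model treats every square foot of zoned land as developable and ignores streets, existing buildings, height limits, and setbacks, which is a good place to start your critique.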


Nationwide Business Data

A useful source of categorical data with a very useful lookup table

In your studies of neighborhoods and their contexts you may find it useful to look at a very fine-grained representation of commercial activity that has been taken from the ESRI Business Analyst. A sample for Cambridge and Somerville is included in the Sources/InfoUSA folder. Note that the metadata for this layer can be accessed by right-clicking the table in the table of contents or in ArcCatalog and choosing the Data > View Item Description option. Note that if you are using an unpatched version of ArcMap 10, this option only works through ArcCatalog. The metadata for these layers is lacking in explanation of the methodology used to collect the data on entities, their locations, and their other attributes. This lack of documentation makes it very difficult to talk precisely about what these data mean and what they are useful for. Nevertheless, they are very interesting data to look at.

If we are interested in comparing neighborhoods with regard to the presence of coffee shops and laundromats, it would not be too difficult to check these data to get a notion of their accuracy with regard to this particular application. It is decidedly more difficult to check the 1999 data, particularly for a place for which we don't have first-hand memory.

The Sources/InfoUSA folder includes one point file taken from the 2010 ESRI Business Analyst software and another taken from a 1999 CD product published by InfoUSA. The Business Analyst data for the entire US can be accessed at the GSD by going to Goliath/geo/esri_business analyst. The 1999 data for the entire US can be extracted by GSD users by following instructions at .