Thursday, October 18, 2018

Shapefiles: Long attribute names

Attribute names used in shapefiles are limited to a ridiculous 10 (in words: ten) characters, and there is no official means to map these shortened names to longer, more telling ones. YAML to the rescue.

In order to provide a mapping for our customers we ship an additional, dead-simple YAML file that makes it possible to at least look up what the original, unabridged name was. The format is as follows:
shortName0: "A_rather_lengthy_attribute_name"
shortName1: "Another_rather_lengthy_attribute_name"
shortName2: "An_attribute_name_no_sane_person_would_come_up_with"
So essentially it is a table, but since it is valid YAML there is no reason why it should not become a common sight and be supported by GIS software. Here's an example of what the generated shapefiles may look like:
linestring.cpg        point.cpg        polygon.cpg
linestring.dbf        point.dbf        polygon.dbf
linestring.prj        point.prj        polygon.prj
linestring.shp        point.shp        polygon.shp
linestring.shx        point.shx        polygon.shx
linestring.yml        point.yml        polygon.yml
Mandatory files are:

  • .shp — shape format; the feature geometry itself
  • .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
  • .dbf — attribute format; columnar attributes for each shape, in dBase IV format
Additional standard files:
  • .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
  • .cpg — used to specify the code page (only for .dbf) for identifying the character encoding to be used
Additional non-standard file:
  • .yml — attribute name map (for .dbf); maps each shortened 10-character attribute name to a long name that can have an (in principle) arbitrary number of characters, as a YAML file in the format described above
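
Because the name map is just a flat list of key/value pairs, looking up the original names takes only a few lines. The following is a minimal sketch, not the code we ship: the function names parse_name_map and long_name are made up for illustration, and the hand-rolled parser only understands the flat short: "long" form shown above (no nesting, no escape sequences). A real YAML library would be the more robust choice.

```python
MAX_DBF_NAME_LENGTH = 10  # the shapefile/.dbf attribute name limit discussed above


def parse_name_map(text):
    """Parse flat `short: "long"` lines into a dict.

    Assumption: one entry per line, optional double quotes around the value,
    blank lines and `#` comment lines are skipped. Anything else is rejected.
    """
    mapping = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        short, separator, rest = stripped.partition(":")
        if not separator:
            raise ValueError(f"not a key/value line: {line!r}")
        short = short.strip()
        if len(short) > MAX_DBF_NAME_LENGTH:
            raise ValueError(f"short name exceeds 10 characters: {short!r}")
        mapping[short] = rest.strip().strip('"')
    return mapping


def long_name(mapping, short):
    """Return the unabridged name, or the short name itself if it is unmapped."""
    return mapping.get(short, short)


# Usage with the sample entries from this post; for a real file you would
# read the text of e.g. point.yml first.
sample = '''
shortName0: "A_rather_lengthy_attribute_name"
shortName1: "Another_rather_lengthy_attribute_name"
'''
name_map = parse_name_map(sample)
print(long_name(name_map, "shortName0"))
```

Falling back to the short name for unmapped keys means attributes that never needed shortening do not have to appear in the .yml file; whether that fits your data is a design choice.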
