Thursday, October 18, 2018

Shapefiles: Long attribute names

Attribute names used in shapefiles are limited to a ridiculous 10 (in words: ten) characters, and there is no official means to map these shortened names to longer, more telling ones. YAML to the rescue.

In order to provide a mapping for our customers we ship an additional, dead-simple YAML file that makes it possible to at least look up what the original, unabridged name was. The format is as follows:
shortName0: "A_rather_lengthy_attribute_name"
shortName1: "Another_rather_lengthy_attribute_name"
shortName2: "An_attribute_name_no_sane_person_would_come_up_with"
So essentially it is a table, but since it is valid YAML there is no reason why it should not become a common sight and be supported by GIS software. Here's an example of what the generated shapefiles may look like:
linestring.cpg        point.cpg        polygon.cpg
linestring.dbf        point.dbf        polygon.dbf
linestring.prj        point.prj        polygon.prj
linestring.shp        point.shp        polygon.shp
linestring.shx        point.shx        polygon.shx
linestring.yml        point.yml        polygon.yml
Mandatory files are:

  • .shp — shape format; the feature geometry itself
  • .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
  • .dbf — attribute format; columnar attributes for each shape, in dBase IV format
Additional standard files:
  • .prj — projection format; the coordinate system and projection information, a plain text file describing the projection using well-known text format
  • .cpg — used to specify the code page (only for .dbf) for identifying the character encoding to be used
Additional non-standard file:
  • .yml — attribute name map (for .dbf); assigns each shortened 10-character attribute name a long name that can have an (in principle) arbitrary number of characters. The file is in YAML format as described above.
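
Because the map holds one `shortName: "long name"` pair per line, the long name can be looked up with ordinary shell tools. The following is only a minimal sketch and not a real YAML parser: the function name `long_name` is made up for illustration, and it assumes exactly the flat, double-quoted layout shown above.

```bash
# Hypothetical helper: print the long attribute name for a short one.
# Assumes the flat layout shown above: shortName: "Long_name", one per line.
long_name () {
  local short=$1 yml_file=$2
  awk -v key="$short" '
    # Split each line at the first colon and compare the part before it.
    {
      pos = index($0, ":")
      if (pos == 0 || substr($0, 1, pos - 1) != key) next
      value = substr($0, pos + 1)
      gsub(/^[ \t]*"|"[ \t\r]*$/, "", value)   # strip blanks and quotes
      print value
      found = 1
      exit
    }
    # Return a non-zero status if the short name is not in the map.
    END { exit found ? 0 : 1 }
  ' "$yml_file"
}
```

With the example map stored as polygon.yml, `long_name shortName1 polygon.yml` prints `Another_rather_lengthy_attribute_name`. For anything beyond this simple layout a proper YAML library is the safer choice.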

Shapefiles: Specification of character encoding

CPG files are part of the multi-file shapefile (sic!) format and identify the encoding used by the DBF. The issue with this file is that it is incredibly hard to find any document on how the encoding is actually identified. At the very best you find that it contains a single string and that you use UTF-8 to identify UTF-8 encoded Unicode. That doesn't help a lot if you use code page 1252. It took me quite some effort to find a blog entry that has a translation of what was originally found on a Belarusian website (which seems to have crossed the river Styx).

Encoding ID | Encoding name | Additional ID | Other names
1252 | Western | iso-8859-1 (*) | iso8859-1, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646.irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819, Windows-1252
20105 | ASCII | us-ascii | us-ascii, ascii
28592 | Central European (ISO) | iso-8859-2 | iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2
1250 | Central European (Windows) | Windows-1250 | Windows-1250, x-cp1250
1251 | Cyrillic (Windows) | Windows-1251 | Windows-1251, x-cp1251
1253 | Greek (Windows) | Windows-1253 | Windows-1253
1254 | Turkish (Windows) | Windows-1254 | Windows-1254
932 | Japanese (Shift-JIS) | shift_jis | shift_jis, x-sjis, ms_Kanji, csShiftJIS, x-ms-cp932
51932 | Japanese (EUC) | x-euc-jp | Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp, x-euc
50220 | Japanese (JIS) | iso-2022-jp | csISO2022JP, iso-2022-jp
1257 | Baltic (Windows) | Windows-1257 | windows-1257
950 | Traditional Chinese (BIG5) | big5 | big5, csbig5, x-x-big5
936 | Simplified Chinese (GB2312) | gb2312 | GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312
20866 | Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi8-r
949 | Korean (KSC5601) | ks_c_5601 | ks_c_5601, ks_c_5601-1987, korean, csKSC56011987
1255 (logical) | Hebrew (ISO-logical) | Windows-1255 | iso-8859-8i
1255 (visual) | Hebrew (ISO-Visual) | iso-8859-8 | ISO-8859-8 Visual, ISO-8859-8, ISO_8859-8, visual
862 | Hebrew (DOS) | dos-862 | dos-862
1256 | Arabic (Windows) | Windows-1256 | Windows-1256
720 | Arabic (DOS) | dos-720 | dos-720
874 | Thai | Windows-874 | Windows-874
1258 | Vietnamese | Windows-1258 | Windows-1258
65001 | Unicode UTF-8 | UTF-8 | UTF-8, unicode-1-1-utf-8, unicode-2-0-utf-8
65000 | Unicode UTF-7 | UNICODE-1-1-UTF-7 | utf-7, UNICODE-1-1-UTF-7, csUnicode11UTF7, utf-7
50225 | Korean (ISO) | ISO-2022-KR | ISO-2022-KR, csISO2022KR
52936 | Simplified Chinese (HZ) | HZ-GB-2312 | HZ-GB-2312
28594 | Baltic (ISO) | iso-8859-4 | ISO_8859-4:1988, iso-ir-110, ISO_8859-4, ISO-8859-4, latin4, l4, csISOLatin4
28595 | Cyrillic (ISO) | iso_8859-5 | ISO_8859-5:1988, iso-ir-144, ISO_8859-5, ISO-8859-5, cyrillic, csISOLatinCyrillic, csISOLatin5
28597 | Greek (ISO) | iso-8859-7 | ISO_8859-7:1987, iso-ir-126, ISO_8859-7, ISO-8859-7, ELOT_928, ECMA-118, greek, greek8, csISOLatinGreek
28599 | Turkish (ISO) | iso-8859-9 | ISO_8859-9:1989, iso-ir-148, ISO_8859-9, ISO-8859-9, latin5, l5, csISOLatin5

(*) Use iso-8859-1 unless the byte values 128-159 are used; in that case use Windows-1252.
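
To use such an identifier with a tool like iconv, the single string in the .cpg file has to be read and, where needed, translated into a name the tool understands. The sketch below is an illustration only: the function name `cpg_encoding` is hypothetical, and turning the numeric IDs 1250 through 1258 into `CP125x` and 65001 into `UTF-8` is my assumption about iconv's naming, not something the shapefile format prescribes. Everything else is passed through unchanged.

```bash
# Hypothetical helper: print an iconv-style encoding name for a .cpg file.
# The mapping of numeric IDs to iconv names below is an assumption.
cpg_encoding () {
  local id
  [ -r "$1" ] || return 1
  # The file holds a single string; drop carriage returns and blanks.
  id=$(head -n 1 "$1" | tr -d '\r \t')
  case $id in
    125[0-8]) echo "CP$id" ;;   # Windows code pages 1250 through 1258
    65001)    echo "UTF-8" ;;   # Windows code page ID for UTF-8
    *)        echo "$id" ;;     # e.g. UTF-8 or iso-8859-1, passed through
  esac
}
```

The result could then be handed to iconv, for example `iconv -f "$(cpg_encoding point.cpg)" -t UTF-8`, applied to text that has already been extracted from the DBF (the DBF itself is a binary file).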

Monday, September 24, 2018

Debian under Windows 10: Mount/Umount

Debian GNU/Linux is available for Windows users through the Windows Store as an app for the Windows Subsystem for Linux (WSL). That works quite nicely, but USB drives can be a bit annoying to use: there is no automatic mounting.

To facilitate mounting and unmounting these drives I wrote two Bash functions that take a drive letter (it does not matter whether you use uppercase or lowercase) as an argument.

  • wmount mounts the drive (and generates a mount point if necessary).
  • wumount unmounts the drive (and keeps the mount point for later use).
Maybe you have some use for these functions:

wmount () {
  # Expect exactly one argument: a single drive letter.
  if [[ $# -ne 1 || ! $1 =~ ^[a-zA-Z]$ ]]; then
    echo "Usage: ${FUNCNAME[0]} <drive_letter>" >&2
    return 1
  fi

  local drive_letter=${1^^}      # Windows side, e.g. E
  local mount_dir="/mnt/${1,,}"  # Linux side, e.g. /mnt/e

  # Create the mount point if it does not exist yet.
  if [[ ! -d $mount_dir ]]; then
    sudo mkdir -p "$mount_dir"
  fi

  sudo mount -t drvfs "$drive_letter:" "$mount_dir"
}

wumount () {
  if [[ $# -ne 1 || ! $1 =~ ^[a-zA-Z]$ ]]; then
    echo "Usage: ${FUNCNAME[0]} <drive_letter>" >&2
    return 1
  fi

  local mount_dir="/mnt/${1,,}"

  # Unmount only if the directory is listed in /proc/mounts;
  # the mount point itself is kept for later use.
  if grep -qs "$mount_dir " /proc/mounts; then
    sudo umount "$mount_dir"
  fi
}

Friday, February 10, 2017

GNU/Linux shell: A bit of random

From a number of different tasks of equal priority listed in a file (for the sake of argument, in lines 123 through 321) I wanted to choose the next one to perform at random using ordinary shell tools. I came up with this solution:
    seq 123 321 | sort -R | head -1
This generates a sequence from 123 through 321, sorts it in a random manner and prints the first number of the randomized list.

This solution neither generates cryptographic random numbers nor saves memory or CPU time but it is straightforward and does the job at hand without any fancy tools or tricks. Perhaps you like it.
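
If the task itself is wanted rather than just its line number, the same idea can be applied to the lines of the file directly. The following is a sketch under assumptions: the function name `random_task` is made up, and it relies on `shuf` from GNU coreutils being installed.

```bash
# Hypothetical helper: print one randomly chosen line out of the
# lines first..last of a file. Requires shuf (GNU coreutils).
random_task () {
  local file=$1 first=$2 last=$3
  # sed prints only the requested line range, shuf picks one of those lines.
  sed -n "${first},${last}p" "$file" | shuf -n 1
}
```

Called as `random_task tasks.txt 123 321` (with tasks.txt standing in for the actual file), it prints one of the tasks from lines 123 through 321. Like the pipeline above, it is not meant for cryptographic purposes.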

Thursday, January 26, 2017

My Polar Sea Ice Page may soon be dysfunctional thanks to Trump

If Donald Trump goes on with his War on Truth, he will likely drain the source I tap with my page. I use data provided by the US National Snow & Ice Data Center, and it is very likely that Trump will do anything he can to stop them from providing current data about the sea ice coverage, for the very simple reason that the plain data alone already clearly show that the world's climate is going haywire. This is because the Arctic and Antarctic regions are hit hardest by climate change.

Sunday, December 11, 2016

Interactive Maps and JavaScript

There are two major libraries for displaying interactive maps using JavaScript.

  • Leaflet is quite easy to use, yet powerful enough for many if not most applications.
  • OpenLayers 3 has more features but requires you to learn more before you can use it.

OpenLayers is used at our company to provide an application that is deployed in a number of places, for example for the map of Cologne (at www.cologne.de). It seems they don't update their English pages that regularly; koeln.de is updated considerably more frequently.

Anyway, for a private project I use Leaflet instead. It visualizes the location of the Stolpersteins in Bonn and the nearby region. You can find it at stolpersteine.codeforbonn.de. It is part of the Code for Bonn project which in turn is part of the Code for Germany project, the German twin of Code for America.

The stolpersteins shown are taken directly from the OpenStreetMap database so adding a new stolperstein to the database means improving OpenStreetMap.

Math in the World of HTML and JavaScript

At work I use a lot of JavaScript, as our company (among other products) offers a map client that runs in browsers. As a side effect I frequently come across interesting JavaScript libraries that I find worth sharing. Here is an example of using JavaScript to display math formulas in a nicely formatted manner. To give an example, Einstein's famous formula will look like this:

$E=mc^2$

The library used for this display is MathJax. It should be sufficient for every formula you can reasonably expect on a website that does not dig deep into natural sciences or mathematics. In most cases it should even suffice for these fields of application. Here is a slightly odder example:

$U^{ik} = \frac{c_g^2}{4\pi G} \left(-g^{im} \Phi_{mr}\Phi^{rk} + \frac{1}{4} g^{ik}\Phi_{rm}\Phi^{mr} \right), \quad -\nabla_\beta U^{\alpha\beta} = \Phi_k^\alpha J^k$

Well, maybe we had better say it should be sufficient for almost any application ;-)