Chapter 2 Basic Knowledge on GIS

2.1 Geographic Information Systems

Geographic Information Systems (GIS) store, analyze, and visualize data for geographic positions on Earth’s surface. The four major features of GIS are listed below.

  • Create geographic data.
  • Manage spatial data in a database.
  • Analyze and find patterns.
  • Visualize data on maps.

In this notebook, we focus on spatial analysis and on making maps.

The detailed concepts of GIS can be found in the textbook “Introduction to Geographic Information Systems” (Kang-tsung Chang).

2.2 Well-Known Text (WKT)

Well-known text (WKT) is a text markup language for representing vector geometry objects. The common types of geometric objects are written as shown in the table below.

Type        WKT
Point       POINT (3 3)
LineString  LINESTRING (1 4, 3 3, 5 5)
Polygon     POLYGON ((1 4, 2 2, 4 1, 5 5, 1 4))

The WKT types illustrated above are single geometries. Sometimes, however, we want to group several geometries that share the same attribute, which a single geometry cannot express. For instance, NCTU has five campuses. If we want to combine all five points into one geometry that represents “NCTU”, a POINT cannot express it. We need a multi geometry instead.

Multi geometries represent more than one geometry of the same dimension in a single object; they include MultiPoint, MultiLineString, and MultiPolygon. A GeometryCollection can combine geometries of different dimensions.
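As a minimal sketch of how these WKT strings can be handled in R, the example below parses the three geometries from the table with the sf package and then builds one MULTIPOINT. The coordinates in the MULTIPOINT are made up for illustration and are not real campus locations.

```r
library(sf)

# Parse the three WKT strings from the table into sf geometries.
wkt <- c(
  "POINT (3 3)",
  "LINESTRING (1 4, 3 3, 5 5)",
  "POLYGON ((1 4, 2 2, 4 1, 5 5, 1 4))"
)
geoms <- st_as_sfc(wkt)

# Check which geometry type each string was parsed into.
geom_types <- as.character(st_geometry_type(geoms))
print(geom_types)

# A multi geometry: several points stored as ONE object.
# These coordinates are hypothetical placeholders, not real campuses.
campuses <- st_as_sfc("MULTIPOINT ((1 1), (2 3), (4 2))")
campus_type <- as.character(st_geometry_type(campuses))
print(campus_type)

# Convert the geometry back to WKT text.
print(st_as_text(campuses))
```

The whole set of points is a single geometry here, so it can carry one set of attributes (for example, one name) rather than one per point.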

2.3 Shapefile

The shapefile format is a widely used geospatial vector data format. It is developed and regulated by Esri, an international supplier of GIS software. The format can describe vector features: points, lines, and polygons. A shapefile is in fact not a single file but a collection of files. Three of them (.shp, .shx, and .dbf) are mandatory, and the .prj file, although optional in the specification, is needed to know the coordinate reference system. Each file is briefly introduced below.

File  Features
.shp  Shape format: the feature geometry
.dbf  Attribute format: attributes for each shape, stored as a two-dimensional table
.prj  Projection description, using a well-known text representation of the coordinate reference system
.shx  Shape index format: a positional index of the feature geometry that allows seeking forwards and backwards quickly
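To make the file collection concrete, the sketch below lists the component files of a demo shapefile and reads it with sf. It uses the North Carolina example (nc.shp) that ships with the sf package, which is an assumption made for illustration and is not part of this book's data.

```r
library(sf)

# Folder inside the installed sf package that holds the demo shapefile.
shape_dir <- system.file("shape", package = "sf")

# List every file that belongs to the "nc" shapefile.
nc_files <- list.files(shape_dir, pattern = "^nc\\.")
print(nc_files)

# Reading the .shp file is enough: the matching .dbf, .shx, and .prj
# files in the same folder are picked up automatically.
nc <- st_read(file.path(shape_dir, "nc.shp"), quiet = TRUE)

# Geometry comes from .shp, attributes from .dbf, and the CRS from .prj.
print(class(nc))
print(st_crs(nc)$input)
```

Because the files are looked up by their shared base name, they should always be copied or moved together.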

2.4 Coordinate Reference System (CRS)

Geometric data is not geospatial unless it is accompanied by coordinate reference system (CRS) information, which allows a GIS to display and operate on the data accurately. A CRS has two major components: a datum and a projection. The datum is a model of the shape of the Earth. It uses angular units (degrees) and defines the origin (0, 0), so that a coordinate pair identifies a specific spot on the Earth. The projection is a mathematical transformation of angular measurements on the round Earth onto a flat surface. The units associated with a projection are usually linear (feet, meters, etc.).

There are two common types of coordinate systems:

Type                          Example                                              Features
Geographic coordinate system  EPSG:4326 (World Geodetic System 1984)               A global or spherical coordinate system such as latitude–longitude.
Projected coordinate system   EPSG:3826 (TWD97 / TM2 zone 121, Taiwan Datum 1997)  Based on a map projection such as transverse Mercator, Albers equal area, or Robinson, which projects the spherical surface onto a two-dimensional coordinate plane.

To see the difference between the two CRS types, take NCTU as an example. Its location can be recorded as (120.999645, 24.789071) in EPSG:4326, where the x coordinate is the longitude and the y coordinate is the latitude. The same location can be recorded as (249964.105, 2742415.017) in EPSG:3826. In this projected coordinate system, 121°E is the central meridian and is assigned a false easting of 250,000 meters, so x = 0 lies 250 kilometers west of it; the false northing is 0, so y = 0 lies on the equator. The x coordinate therefore says that the spot is 249964.105 meters east of the line 250 kilometers west of 121°E, and the y coordinate says that it is 2742415.017 meters north of the equator.
Something interesting… The center of NCTU (Guangfu Campus) lies very close to the central meridian: 249964.105 m is only about 36 m short of 250 km. In fact, 121°E passes through the campus!
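The conversion between the two systems can be sketched with sf as follows. This is an illustrative example rather than the book's own code; the exact digits returned depend on the installed PROJ version, so the result should be close to, but not necessarily identical to, the projected coordinates quoted above.

```r
library(sf)

# NCTU as a point in EPSG:4326 (x = longitude, y = latitude).
nctu_wgs84 <- st_sfc(st_point(c(120.999645, 24.789071)), crs = 4326)

# Re-project to EPSG:3826 (TWD97 / TM2 zone 121); units become meters.
nctu_twd97 <- st_transform(nctu_wgs84, crs = 3826)
xy <- st_coordinates(nctu_twd97)
print(xy)

# A geographic CRS uses degrees, a projected CRS does not.
is_geographic <- c(st_is_longlat(nctu_wgs84), st_is_longlat(nctu_twd97))
print(is_geographic)

# Distance west of the central meridian (121 degrees E), whose x value
# is fixed at 250,000 m by the false easting.
offset_m <- 250000 - unname(xy[1, "X"])
print(offset_m)
```

The small positive offset confirms that the point lies only a few tens of meters west of the central meridian.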

2.5 Why Use R?

R is a programming language mainly for statistical computing and graphics. With a wide range of packages, R supports advanced geospatial statistics, modeling and visualization. In addition, integrated development environments such as RStudio have made it more user-friendly, allowing us to easily analyze spatial data and make maps.

To use R for analysis, we first need to download R itself and an integrated development environment (RStudio).

2.6 Required Packages

To conduct geocomputation in R, the sf package is required.

To analyze data more efficiently, the dplyr package is suggested.

To display maps, the ggplot2 package is strongly recommended. (Other packages can make maps as well; nonetheless, ggplot2 is more powerful and easier to learn.)

# Install the packages (only needed once)
install.packages("sf")
install.packages("dplyr")
install.packages("ggplot2")
# Load the packages (needed in every R session)
library(sf)
library(dplyr)
library(ggplot2)
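Once the packages are loaded, a quick check that they work together might look like the sketch below. It uses the North Carolina demo shapefile bundled with sf (an assumption for illustration, not this book's data): dplyr filters the attribute table and ggplot2 draws the result.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Demo data bundled with sf (North Carolina counties).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# dplyr verbs work on sf objects; the geometry column is kept automatically.
large_counties <- nc %>%
  filter(AREA > 0.2) %>%
  select(NAME, AREA)

# geom_sf() draws the geometries stored in the sf object.
county_map <- ggplot(large_counties) +
  geom_sf(aes(fill = AREA)) +
  theme_minimal()

print(county_map)
```

If a map of a few counties appears, all three packages are installed and loaded correctly.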

Data
The data required in this book are provided here. Please download the data and place them in the same directory as the R script.

Besides downloading the data directly from the link above, we can install the package TWspdata from GitHub to obtain the same data set. The code is shown below.

install.packages("devtools")
devtools::install_github("ChiaJung-Yeh/TWspdata")
library(TWspdata)