PostGIS

PostGIS is a spatial extension for the PostgreSQL relational database that was created by Refractions Research Inc, as a spatial database technology research project. Refractions is a GIS and database consulting company in Victoria, British Columbia, Canada, specializing in data integration and custom software development.

PostGIS is now a project of the OSGeo Foundation and is developed and funded by many FOSS4G developers and organizations all over the world that gain great benefit from its functionality and versatility.

The PostGIS project development group plans on supporting and enhancing PostGIS to better support a range of important GIS functionality in the areas of OpenGIS and SQL/MM spatial standards, advanced topological constructs (coverages, surfaces, networks), data source for desktop user interface tools for viewing and editing GIS data, and web-based access tools.

SELECT superhero.name
FROM city, superhero
WHERE ST_Contains(city.geom, superhero.geom)
AND city.name = 'Gotham';

Docker

docker run --name some-postgis -e POSTGRES_PASSWORD=mysecretpassword -d postgis/postgis

docker run -it --link some-postgis:postgres --rm postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'

TOC

PostGIS Installation
Using PostGIS: Data Management and Queries
Raster Data Management, Queries, and Applications
Using PostGIS Geometry: Building Applications
Performance tips
PostGIS Reference
Raster Reference
Topology
Address Standardizer
PostGIS Extras
PostGIS Special Functions Index
Appendix
Resources

Cheatsheet

psql
PostGIS

PostGIS Installation

Link

Using PostGIS: Data Management and Queries

Link

GIS Objects
PostGIS Geography Type
Using OpenGIS Standards
Loading GIS (Vector) Data
Retrieving GIS Data
Building Indexes
Complex Queries

Raster Data Management, Queries, and Applications

Link

Loading and Creating Rasters
Raster Catalogs
Building Custom Applications with PostGIS Raster

Using PostGIS Geometry: Building Applications

Link

Performance tips

Link

Small tables of large geometries
CLUSTERing on geometry indices
Avoiding dimension conversion
Tuning your configuration

PostGIS Reference

Link

PostGIS Geometry/Geography/Box Data Types
Table Management Functions
Geometry Constructors
Geometry Accessors
Geometry Editors
Geometry Validation
Spatial Reference System Functions
Geometry Input
Geometry Output
Operators
Spatial Relationships
Measurement Functions
Geometry Processing
Affine Transformations
Clustering Functions
Bounding Box Functions
Linear Referencing
Trajectory Functions
SFCGAL Functions
Long Transaction Support
Version Functions
Grand Unified Custom Variables (GUCs)
Troubleshooting Functions

Raster Reference

Link

Raster Support Data types
Raster Management
Raster Constructors
Raster Accessors
Raster Band Accessors
Raster Pixel Accessors and Setters
Raster Editors
Raster Band Editors
Raster Band Statistics and Analytics
Raster Inputs
Raster Outputs
Raster Processing
Raster Operators
Raster and Raster Band Spatial Relationships
Raster Tips

Topology

Link

Topology Types
Topology Domains
Topology and TopoGeometry Management
Topology Constructors
Topology Editors
Topology Accessors
Topology Processing
TopoGeometry Constructors
TopoGeometry Editors
TopoGeometry Accessors
TopoGeometry Outputs
Topology Spatial Relationships

Address Standardizer

Link

This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer).

PostGIS Extras

Link

PostGIS Special Functions Index

Link

PostGIS Aggregate Functions
PostGIS Window Functions
PostGIS SQL-MM Compliant Functions
PostGIS Geography Support Functions
PostGIS Raster Support Functions
PostGIS Geometry / Geography / Raster Dump Functions
PostGIS Box Functions
PostGIS Functions that support 3D
PostGIS Curved Geometry Support Functions
PostGIS Polyhedral Surface Support Functions
PostGIS Function Support Matrix
New, Enhanced or changed PostGIS Functions

Appendix

Link

Resources

psql

psql is the interactive unix command line tool for interacting with Postgres/PostGIS.

Common Commands

log-in / connect to a database name by doing psql -d database_name
for doing admin type things such as managing db users, log in as the postgres user: psql postgres;
to create a database: CREATE DATABASE database_name;
to connect to a database: \c database_name;
to delete a database DROP DATABASE database_name;
to connect when starting psql use the -d flag like: psql -d nyc_noise
to list all databases: \l
to list all the table in a database: \dt
to list extensions installed in a database: \dx
to quit psql: \q
to grant privileges to a user (requires logging in as postgres ):

GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;
to enable the hstore extension ( for key : value pairs, useful when working with OpenStreetMap data) do: CREATE EXTENSION hstore
to view columns of a table: \d table_name
to list all columns in a table (helpful when you have a lot of columns!):
select column_name from information_schema.columns where table_name = 'my_table' order by column_name asc;
to rename a column:
alter table noise.hoods rename column noise_sqkm to complaints_sqkm;
to change a column’s data type:
alter table noise.hoods alter column noise_area type float;
to compute values from two columns and assign them to another column: update noise.hoods set noise_area = noise/(area/1000);
to search by wildcard use the like (case sensitive) or ilike (treats everything as lowercase) command:
SELECT count(*) from violations where inspection_date::text ilike '2014%';

to insert data into a table:

INSERT INTO table_name (column1, column2)
VALUES
	(value1, value2);

to insert data from another table:

INSERT INTO table_name (value1, value2)
SELECT column1, column2
FROM other_table_name

to remove rows using a where clause:
DELETE FROM table_name WHERE some_column = some_value

list all column names from a table in alphabetical order:

select column_name
from information_schema.columns
where table_schema = 'public'
and table_name = 'bk_pluto'
order by column_name;

List data from a column as a single row, comma separated:
1. SELECT array_to_string( array( SELECT id FROM table ), ',' )
2. SELECT string_agg(id, ',') FROM table
rename an existing table:
ALTER TABLE table_name RENAME TO table_name_new;
rename an existing column of a table:
ALTER TABLE table_name RENAME COLUMN column_name TO column_new_name;

Find duplicate rows in a table based on values from two fields:

  select * from (
    SELECT id,
    ROW_NUMBER() OVER(PARTITION BY merchant_Id, url ORDER BY id asc) AS Row
    FROM Photos
  ) dups
  where
  dups.Row > 1

credit: MatthewJ on stack-exchange

Bulk Queries are efficient when doing multiple inserts or updates of different values. For example,

--- update some rows with new values
  UPDATE election_results o
  SET votes=n.votes, pro=n.pro
  FROM (VALUES (1,11,9),
               (2,44,28),
               (3,25,4)
        ) n(county_id,votes,pro)
  WHERE o.county_id = n.county_id;

--- insert new values
  INSERT INTO election_results (county_id,voters,pro)
  VALUES  (1, 11,8),
          (12,21,10),
          (78,31,27);    

The INSERT and UPDATE queries can be combined to what is often referred to as an UPSERT query:

  WITH
  -- make a temporary (as in for this query only) table of values
  n(ip,visits,clicks) AS (
    VALUES ('192.168.1.1',2,12),
           ('192.168.1.2',6,18),
           ('192.168.1.3',3,4)
  ),
  -- update existing rows
  upsert AS (
    UPDATE page_views o
    SET visits=n.visits, clicks=n.clicks
    FROM n WHERE o.ip = n.ip
    RETURNING o.ip
  )
  -- insert missing rows
  INSERT INTO page_views (ip,visits,clicks)
  SELECT n.ip, n.visits, n.clicks FROM n
  WHERE n.ip NOT IN (
    SELECT ip FROM upsert
  );    

credit: FASTER DATA UPDATES WITH CARTODB

Importing Data

import data from a CSV file using the COPY command:

  COPY noise.locations (name, complaint, descript, boro, lat, lon)
  FROM '/Users/chrislhenrick/tutorials/postgresql/data/noise.csv' WITH CSV HEADER;

import a CSV file “AS IS” using csvkit’s csvsql (requires python, pip, csvkit, psycopg2):
```
  csvsql --db postgresql:///nyc_pluto --insert 2012_DHCR_Bldg.csv
```

Exporting Data

export data as a CSV with Headers using COPY:

  COPY dob_jobs_2014 to '/Users/chrislhenrick/development/nyc_dob_jobs/data/2014/dob_jobs_2014.csv' DELIMITER ',' CSV Header;

to the current workspace without saving to a file:

  COPY (SELECT foo FROM bar) TO STDOUT CSV HEADER;

from the command line w/o connecting to postgres:

  psql -d dbname -t -A -F"," -c "select * from table_name" > output.csv

Joining Tables Using a Shared Key

From CartoDB’s tutorial Join data from two tables using SQL

Join two tables that share a key using an INNER JOIN(Postgresql’s default join type):

  SELECT table_1.the_geom,table_1.iso_code,table_2.population
  FROM table_1, table_2
  WHERE table_1.iso_code = table_2.iso

To update a table’s data based on that of a join:

  UPDATE table_1 as t1
  SET population = (
    SELECT population
    FROM table_2
    WHERE iso = t1.iso_code
    LIMIT 1
  )

aggregate data on a join (if table 2 has multiple rows for a unique identifier):

  SELECT
    table_1.the_geom,
    table_1.iso_code,
    SUM(table_2.total) as total
  FROM table_1, table_2
  WHERE table_1.iso_code = table_2.iso
  GROUP BY table_1.iso_code, table_2.iso

update the value of a column based on the aggregate join:

  UPDATE table_1 as t1
  SET total =  (
    SELECT SUM(total)
    FROM table_2
    WHERE iso = t1.iso_code
    GROUP BY iso
  )

Upgrading Postgres

This Tutorial was very helpful for upgrading on Mac OS X via homebrew.

WARNING: Back up your data before doing this incase you screw up like I did!

Basically the steps are:

Shut down Postgresql:
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
Create a new Postgresql9.x data directory:
initdb /usr/local/var/postgres9.4 -E utf8

Run the pg_upgrade command:

 pg_upgrade \
 -d /usr/local/var/postgres \
 -D /usr/local/var/postgres9.4 \
 -b /usr/local/Cellar/postgresql/9.3.5_1/bin/ \
 -B /usr/local/Cellar/postgresql/9.4.0/bin/ \
 -v

Change kernel settings if necessary:
```
 sudo sysctl -w kern.sysv.shmall=65536
 sudo sysctl -w kern.sysv.shmmax=16777216
```
- I also ran sudo vi /etc/sysctl.conf and entered the same values:
```
 kern.sysv.shmall=65536
 kern.sysv.shmmax=16777216
```
- re-run the pg_upgrade command in step 3

Move the new data directory into place:

 cd /usr/local/var
 mv postgres postgres9.2.4
 mv postgres9.3 postgres

Start the new version of PostgreSQL: launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
- check to make sure it worked:
```
psql postgres -c "select version()"
psql -l
```
Cleanup:
- vacuumdb --all --analyze-only
- analyze_new_cluster.sh*
- delete_old_cluster.sh*
- brew cleanup postgresql
  (* scripts were generated in same the directory where pg_upgrade was ran)

PostGIS

PostGIS is the extension for Postgres that allows for working with geometry data types and doing GIS operations in Postgres.

Common Commands

to enable PostGIS in a Postgres database do: CREATE EXTENSION postgis;
to enable PostGIS topology do: CREATE EXTENSION postgis_topology;
to support OSM tags do: CREATE EXTENSION hstore;

create a new table for data from a CSV that has lat and lon columns:

  create table noise.locations
  (                                     
	name varchar(100),
	complaint varchar(100), descript varchar(100),
	boro varchar(50),
	lat float8,
	lon float8,
	geom geometry(POINT, 4326)
  );

inputing values for the geometry type after loading data from a CSV:
update noise.locations set the_geom = ST_SetSRID(ST_MakePoint(lon, lat), 4326);
adding a geometry column in a non-spatial table:
select addgeometryColumn('table_name', 'geom', 4326, 'POINT', 2);
calculating area in EPSG 4326:
alter table noise.hoods set area = (select ST_Area(geom::geography));

Common Spatial Queries

You may view more of these in my intro to Visualizing Geospatial Data with CartoDB.

Find all polygons from dataset A that intersect points from dataset B:

SELECT a.*
FROM table_a_polygons a, table_b_points b
WHERE ST_Intersects(a.the_geom, b.the_geom);

Find all rows in a polygon dataset that intersect a given point:

-- note: geometry for point must be in the order lon, lat (x, y)
SELECT * FROM nyc_tenants_rights_service_areas
where
ST_Intersects(
  ST_GeomFromText(
   'Point(-73.982557 40.724435)', 4326
  ),
  nyc_tenants_rights_service_areas.the_geom    
);

Or using ST_Contains:

SELECT * FROM nyc_tenants_rights_service_areas
where
st_contains(
  nyc_tenants_rights_service_areas.the_geom,
  ST_GeomFromText(
   'Point(-73.917104 40.694827)', 4326
  )      
);

Counting points inside a polygon:

With ST_Containts():

SELECT us_counties.the_geom_webmercator,us_counties.cartodb_id,
count(quakes.the_geom)
AS total
FROM us_counties JOIN quakes
ON st_contains(us_counties.the_geom,quakes.the_geom)
GROUP BY us_counties.cartodb_id;

To update a column from table A with the number of points from table B that intersect table A’s polygons:

update noise.hoods set num_complaints = (
	select count(*)
	from noise.locations
	where
	ST_Intersects(
		noise.locations.geom,
		noise.hoods.geom
	)
);

Select data within a bounding box
Using ST_MakeEnvelope

HINT: You can use bboxfinder.com to easily grab coordinates of a bounding box for a given area.

SELECT * FROM some_table
where geom && ST_MakeEnvelope(-73.913891, 40.873781, -73.907229, 40.878251, 4326)

Select points from table a that do not fall within any polygons in table b
This method makes use of spatial indexes and the indexes on gid for better performance

SELECT
  a.gid,
  a.st_address,
  a.city,
  a.st_num,
  a.the_geom
FROM
  points AS a LEFT JOIN
  polygons AS b ON
  ST_Intersects(a.the_geom, b.the_geom)
WHERE b.gid IS NULL;

credit: Nicklas Avén

Make a line from a series of points

SELECT ST_MakeLine (the_geom ORDER BY id ASC)
AS the_geom, route
FROM points_table
GROUP BY route;

Order points in a table by distance to a given lat lon
This one uses CartoDB’s built-in function CDB_LatLng which is short hand for doing:
SELECT ST_Transform( ST_GeomFromText( 'Point(-73.982557 40.724435)',),4326)

SELECT * FROM table
ORDER BY the_geom <->
CDB_LatLng(42.5,-73) LIMIT 10;

Access the previous row of data and get value (time, value, number, etc) difference

WITH calc_duration AS (
 SELECT
 cartodb_id,
 extract(epoch FROM (date_time - lag(date_time, 1) OVER(ORDER BY date_time))) AS duration_in_seconds
 FROM tracking_eric
 ORDER BY date_time
)
UPDATE tracking_eric
SET duration_in_seconds = calc_duration.duration_in_seconds
FROM calc_duration
WHERE calc_duration.cartodb_id = tracking_eric.cartodb_id

Select population density

In this query we cast the geometry data type to the geography data type to get units of measure in meters.

SELECT pop_sqkm,
 round( pop / (ST_Area(the_geom::geography)/1000000))
 as psqkm
 FROM us_counties

Repair Invalid Geometries
Sometimes when data is imported into PostGIS geometries get screwed up. If you get an error message like:

ERROR:  GEOSIntersects: TopologyException: side location conflict at -116.03227135270012 33.309736898054787

You can try doing:

UPDATE tablename SET geom=ST_MAKEVALID(geom) WHERE NOT ST_ISVALID(geom);

Spatial Indexing

Makes queries hella fast. OSGeo has a good tutorial.

Basically the steps are:
CREATE INDEX table_name_gix ON table_name USING GIST (geom);
VACUUM ANALYZE table_name
CLUSTER table_name USING table_name_gix;
Do this every time after making changes to your dataset or importing new data.

Importing Spatial Data to PostGIS

Using shp2pgsql

Create an SQL file with a CREATE TABLE statement:

 shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods > noise.sql

Or for using the geography data type do:

 shp2pgsql -G -I nyc-pediacities-hoods-v3-edit.shp noise.nyc-pediacities-hoods-v3-edit_geographic > nyc_pediacities-hoods-v3-edit.sql

If you’d like to transform your data from one SRID to another, you may pass two EPSG codes separated by a colon to the -s flag:

 shp2pgsql -I -s 2236:4326 shapefile_in_2236.shp schema.table-name > table-name.sql

Then run the SQL using psql:

 psql -d nyc_noise -f noise.sql

Or for the geography type above:

 psql -d nyc_noise -f nyc_pediacities-hoods-v3-edit.sql

Alternatively do both of the above in a single command:

 shp2pgsql -I -s 4326 nyc-pediacities-hoods-v3-edit.shp noise.hoods | psql -d nyc_noise

Using osm2pgsql

To import an OpenStreetMap extract in PBF format do:
osm2pgsql -H localhost --hstore-all -d nyc_from_osm ~/Downloads/newyorkcity.osm.pbf

Using ogr2ogr

Example importing a GeoJSON file into a database called nyc_pluto:

ogr2ogr -f PostgreSQL \
PG:"host='localhost' user='chrislhenrick' port='5432' \
dbname='nyc_pluto' password=''" \
bk_map_pluto_4326.json -nln bk_pluto

Exporting Spatial Data from PostGIS

The two main tools used to export spatial data with more complex geometries from Postgres/PostGIS than points are pgsql2shp and ogr2ogr.

Using pgsql2shp

pgsql2shp is a tool that comes installed with PostGIS that allows for exporting data from a PostGIS database to a shapefile format. To use it you need to specify a file path to the output shapefile (just stating the basename with no extension will output in the current working directory), a host name (usually this is localhost), a user name, a password for the user, a database name, and an SQL query.

pgsql2shp -f <path to output shapefile> -h <hostname> -u <username> -P <password> databasename "<query>"

A sample export of a shapefile called my_data from a database called my_db looks like this:

pgsql2shp -f my_data -h localhost -u clhenrick -P 'mypassword' my_db "SELECT * FROM my_data "

Using ogr2ogr

Note: You may need to set the GDAL_DATA path if you git this error:

ERROR 4: Unable to open EPSG support file gcs.csv.
Try setting the GDAL_DATA environment variable to point to the
directory containing EPSG csv files.

If on Linux / Mac OS do this: export GDAL_DATA=/usr/local/share/gdal
If on Windows do this: C:\> set GDAL_DATA=C:\GDAL\data

To Export Data
Use ogr2ogr as follows to export a table (in this case a table called dob_jobs_2014) to a GeoJSON file (in this case a file called dob_jobs_2014_geocoded.geojson):

ogr2ogr -f GeoJSON -t_srs EPSG:4326 dob_jobs_2014_geocoded.geojson \
PG:"host='localhost' dbname='dob_jobs' user='chrislhenrick' password='' port='5432'" \
-sql "SELECT bbl, house, streetname, borough, jobtype, jobstatus, existheight, proposedheight, \
existoccupancy, proposedoccupany, horizontalenlrgmt, verticalenlrgmt, ownerbusinessname, \
ownerhousestreet, ownercitystatezip, ownerphone, jobdescription, geom \
FROM dob_jobs_2014 WHERE geom IS NOT NULL"

note: you must select the column containing the geometry (usually geom or wkb_geometry) for your exported layer to have geometry data.