Revision as of 07:29, 10 June 2016

The following database rules explain the usage of the database called oedb, which will become a part of the openmod internet presence. For further details, see also the Database page.

Data Documentation

All data included in the database has to be documented! This wiki page explains how to do so.
All abbreviations have to be documented in the Glossary!

Naming of Data

The data in the database is organised in schemata and tables, whose names are important for finding your way around.

Database Name

The name of the database is oedb.

Database Schema

The structure of the database is realised via the naming of the schemata, which follows these rules:

always lower case
no points, no commas
no spaces
no dates
use underscores

The name starts with the type of the schema:
1. orig for original data
2. calc for processed data

The name includes the distinct subject area or source.

Example: orig_vg250
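
The schema naming rules above can be checked mechanically. The following is a minimal sketch, not an official oedb tool; the function name `is_valid_schema_name` and the regular expression are assumptions made for illustration. The "no dates" rule is not checked, because digits are legitimate in names such as orig_vg250.

```python
import re

# Assumption: "orig" and "calc" are the only schema types, as listed above.
# Allowed characters: lower-case letters, digits and underscores.
_SCHEMA_PATTERN = re.compile(r"(orig|calc)_[a-z0-9]+(_[a-z0-9]+)*")


def is_valid_schema_name(name: str) -> bool:
    """Return True if `name` follows the schema naming rules listed above.

    The "no dates" rule is not enforced here; it needs human judgement.
    """
    return _SCHEMA_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("orig_vg250", "Orig_vg250", "orig vg250", "raw_vg250"):
        print(candidate, is_valid_schema_name(candidate))
```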

Database Table

always lower case
no points, no commas
no spaces
no dates
use underscores

The name starts with the source (e.g. zensus),
followed by the main value (e.g. population),
then, if the data is separated by an attribute, by_[attribute] (e.g. by_gender),
and finally the resolution as per_[tuple] (e.g. per_mun).

Example: zensus_population_by_gender_per_mun
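
As an illustration of how the parts of a table name fit together, here is a small sketch. The helper `build_table_name` is hypothetical, and the `by_` and `per_` prefixes are inferred from the example above rather than stated as a formal rule.

```python
from typing import Optional


def build_table_name(
    source: str,
    value: str,
    attribute: Optional[str] = None,
    resolution: Optional[str] = None,
) -> str:
    """Compose a table name as source_value[_by_attribute][_per_resolution]."""
    parts = [source, value]
    if attribute:
        parts.append(f"by_{attribute}")
    if resolution:
        parts.append(f"per_{resolution}")
    # Enforce the "always lower case" and "no spaces" rules.
    return "_".join(parts).lower().replace(" ", "_")


if __name__ == "__main__":
    print(build_table_name("zensus", "population", "gender", "mun"))
```

The attribute and resolution are optional in this sketch, so a table without a breakdown would simply be named source_value.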

Data Integrity

Data integrity is one aspect of ensuring data quality in the oedb.

General

Primary Key [PK]
Grants to oeuser

Geographic Data

Attribute name is geom.
Data type is geometry (or raster).
One of the Geometric Types is defined (https://www.postgresql.org/docs/current/static/datatype-geometric.html)
The CRS is defined as EPSG (http://spatialreference.org/ref/epsg/)
- Original data stays with the original CRS!
- Preferred CRS of the oedb are:

WGS84 - EPSG: 4326 (http://spatialreference.org/ref/epsg/4326/)
ETRS89 / ETRS-LAEA - EPSG: 3035 (http://spatialreference.org/ref/epsg/3035/)

Spatial index GIST
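
The General and Geographic Data requirements could be expressed as SQL statements generated from Python. This is only a sketch under several assumptions: a PostgreSQL database with the PostGIS extension (for the `geometry(type, srid)` column syntax), a primary key column called `id`, an index named `<table>_geom_idx`, and a `SELECT` privilege for oeuser. The function name and the table name `vg250_mun` are hypothetical, and none of these choices are prescribed by the rules above.

```python
from typing import List


def integrity_statements(
    schema: str,
    table: str,
    geom_type: str = "MultiPolygon",  # assumption: any PostGIS geometry type
    srid: int = 3035,                 # ETRS89 / ETRS-LAEA, one of the preferred CRS
    pk_column: str = "id",            # assumption: primary key column name
    role: str = "oeuser",
    privilege: str = "SELECT",        # assumption: the rules only say "Grants to oeuser"
) -> List[str]:
    """Return SQL statements covering primary key, geom column, GIST index and grant.

    Identifiers are inserted unquoted, so they are expected to follow
    the naming rules (lower case, underscores, no spaces).
    """
    qualified = f"{schema}.{table}"
    return [
        f"ALTER TABLE {qualified} ADD PRIMARY KEY ({pk_column});",
        f"ALTER TABLE {qualified} ADD COLUMN geom geometry({geom_type}, {srid});",
        f"CREATE INDEX {table}_geom_idx ON {qualified} USING GIST (geom);",
        f"GRANT {privilege} ON TABLE {qualified} TO {role};",
    ]


if __name__ == "__main__":
    for statement in integrity_statements("orig_vg250", "vg250_mun"):
        print(statement)
```

For original data, `srid` would be set to the CRS of the source, since original data stays with its original CRS.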

Data Referencing

Original Data (orig)

Tables are annotated by a comment in the form of a JSON string:

{"Name": "The Full Name",
"Source": ["Name", "www.website.com / registration required"],
"Reference date": ["2013"],
"Date of collection": ["01.08.2013"],
"Original file": ["346-22-5.xls"],
"Spatial resolution": ["Germany"],
"Description": ["Example Data (annual totals)", "Regional level: national"],

"Table fields": [
{"name":"id",
"description":"Unique identifier",
"description_german":"",
"unit":"" },

{"name":"year",
"description":"Reference Year",
"description_german":"",
"unit":"" },

{"name":"example_value",
"description":"Some important value",
"description_german":"",
"unit":"EUR" }],

"Changes":[
 { "name":"Joe Nobody",
 "mail":"joe.nobody@gmail.com (fake)",
 "date":"16.06.2014",
 "comment":"Created table" },

 { "name":"Joana Anybody",
 "mail":"joana.anybody@gmail.com (fake)",
 "date":"17.07.2014",
 "comment":"Translated field names"}],

"ToDo": ["Some datasets are odd -> Check numbers against another data"],
"Licence": ["Licence – Version 2.0 (dl-de/by-2-0; http://www.govdata.de/dl-de/by-2-0)"],
"Instructions for proper use": ["Always state licence"]}

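One possible way to attach such a JSON string to a table is to serialise a Python dictionary and wrap it in a PostgreSQL `COMMENT ON TABLE` statement. The sketch below assumes that workflow; the function name `comment_on_table_sql` and the table name in the usage example are hypothetical. Building the string with `json.dumps` avoids hand-written syntax errors such as missing colons or commas.

```python
import json


def comment_on_table_sql(schema: str, table: str, metadata: dict) -> str:
    """Return a COMMENT ON TABLE statement carrying `metadata` as a JSON string."""
    # ensure_ascii=False keeps characters such as umlauts readable in the comment.
    payload = json.dumps(metadata, ensure_ascii=False, indent=1)
    # Single quotes inside an SQL string literal are escaped by doubling them.
    escaped = payload.replace("'", "''")
    return f"COMMENT ON TABLE {schema}.{table} IS '{escaped}';"


if __name__ == "__main__":
    metadata = {
        "Name": "The Full Name",
        "Reference date": ["2013"],
        "Table fields": [
            {"name": "id", "description": "Unique identifier",
             "description_german": "", "unit": ""},
        ],
        "Instructions for proper use": ["Always state licence"],
    }
    print(comment_on_table_sql("orig_vg250", "vg250_mun", metadata))
```
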
Processed Data (calc)

{"Name": "Results",
"Date of collection": ["01.08.2013"],
"Spatial resolution": ["Germany"],
"Description": ["Financial key figures of German municipalities (annual totals)", "Regional level: municipalities, association of municipalities"],

"Table fields": [
{"name":"id",
"description":"Unique identifier",
"description_german":"",
"unit":"" },

{"name":"year",
"description":"Reference Year",
"description_german":"",
"unit":"" },

{"name":"example_value",
"description":"Some important value",
"description_german":"",
"unit":"EUR" }],

"Changes":[
 { "name":"Autor1",
 "mail":"Autor1@e-mail.com",
 "date":"16.06.2014",
 "comment":"Created table" },

 { "name":"Autor2",
 "mail":"Autor2@e-mail.com",
 "date":"17.07.2014",
 "comment":"Translated field names"}],

"ToDo": ["Some datasets are odd -> Check numbers against another data"],
"Licence": ["Datenlizenz Deutschland – Namensnennung – Version 2.0 (dl-de/by-2-0; http://www.govdata.de/dl-de/by-2-0)"],
"Instructions for proper use": ["Always state licence"]}

Processed Data (calc) - Row Annotation

Each row has to be annotated with a JSON dictionary that must contain the following fields:

origin: Link to or textual description of the data set this row originates from.
method: Method used to calculate this row from the above origin (e.g. link to a Python script).
assumption: A list of dictionaries. Each dictionary describes an assumption and annotates the affected rows.
- begin: First column affected by the assumption
- end: Last column affected by the assumption
- type: Type of the problem that had to be solved. Each type requires one or more additional keys in this dictionary. Possible types and their required additional keys are:
  - gap: Not all fields could be calculated and/or filled.
    - solution: Method that was used to generate data to fill this gap (e.g. linear interpolation)
  - multiplicity: A field could be filled by several values.
    - values: Possible values that could have been used
    - solution: Method that was used to select one value (e.g. minimum)

An example dictionary:

{
"origin":"https://data.openmod-initiative.org/data/oedb/orig_db/table",
"method":"https://github.com/openego/data_processing/blob/master/calc_ego_substation/Voronoi_ehv.sql",
"assumptions":
 [
 {
 "type": "gap",
 "begin": "step_15",
 "end": "step_34",
 "solution": "linear_interpolation"
 }
 ]
}
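
A row annotation could be checked against the field list above before it is written to the database. The following sketch is one way to do that, not an official validator. It assumes the list key is spelled `assumptions`, as in the example dictionary, and the function name `validate_row_annotation` is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: key spelling follows the example dictionary ("assumptions").
REQUIRED_KEYS: Tuple[str, ...] = ("origin", "method", "assumptions")
ASSUMPTION_KEYS: Tuple[str, ...] = ("type", "begin", "end")
# Additional keys required per assumption type, as listed above.
TYPE_KEYS: Dict[str, Tuple[str, ...]] = {
    "gap": ("solution",),
    "multiplicity": ("values", "solution"),
}


def validate_row_annotation(annotation: dict) -> List[str]:
    """Return a list of problems; an empty list means the annotation is complete."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in annotation]
    for index, assumption in enumerate(annotation.get("assumptions", [])):
        where = f"assumptions[{index}]"
        for key in ASSUMPTION_KEYS:
            if key not in assumption:
                problems.append(f"{where}: missing key: {key}")
        kind = assumption.get("type")
        if kind is None:
            continue
        if kind not in TYPE_KEYS:
            problems.append(f"{where}: unknown type: {kind}")
            continue
        for key in TYPE_KEYS[kind]:
            if key not in assumption:
                problems.append(f"{where}: missing key: {key}")
    return problems


if __name__ == "__main__":
    example = {
        "origin": "https://data.openmod-initiative.org/data/oedb/orig_db/table",
        "method": "https://github.com/openego/data_processing/blob/master/calc_ego_substation/Voronoi_ehv.sql",
        "assumptions": [
            {"type": "gap", "begin": "step_15", "end": "step_34",
             "solution": "linear_interpolation"},
        ],
    }
    print(validate_row_annotation(example))
```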

