|
|
Line 1: |
Line 1: |
| | | |
− | The following database rules explain the usage of the database called oedb, which will become a part of the openmod internet presence. For further details see also [[Database|here]].<br/> | + | The following database rules explain the usage of the database called oedb, which will become a part of the openmod internet presence. For further details see also [[Database|here]].<br/><br/><br/> |
| | | |
| = Data Documentation<br/> = | | = Data Documentation<br/> = |
| | | |
| *All data included in the databases has be documented! On this wiki page you learn, how to do so.<br/> | | *All data included in the databases has be documented! On this wiki page you learn, how to do so.<br/> |
− | *All abreviations have to be documentated in the [[Category:Glossary|Glossary]]!<br/> | + | *All abreviations have to be documentated in the [[Category:Glossary|Glossary]]!<br/><br/> |
| | | |
| = Naming of Data<br/> = | | = Naming of Data<br/> = |
| | | |
− | The data in the database is organised in schemata and tables, which names are important to find around. | + | The data in the database is organised in schemata and tables, which names are important to find around.<br/><br/> |
| | | |
| == Database Name<br/> == | | == Database Name<br/> == |
Line 15: |
Line 15: |
| The name of the database is '''oedb.'''<br/> | | The name of the database is '''oedb.'''<br/> |
| | | |
− | <br/> | + | <br/><br/> |
| | | |
| == Database Schema == | | == Database Schema == |
Line 30: |
Line 30: |
| | | |
| *The name starts with the type of the schema: | | *The name starts with the type of the schema: |
− | *#'''orig''' for original data | + | *#'''orig''' for original data: Data that is directly available in other sources and underwent no editing other than transfering it to a data base friendly format |
− | *#'''calc''' for processed data<br/> | + | *#'''calc''' for processed data: Data that origin from original data or processed data via some tool, energy model or other modifications. The specific data and methods used for this computations must be mentioned.<br/> |
| | | |
| <br/> | | <br/> |
Line 43: |
Line 43: |
| <br/> | | <br/> |
| | | |
− | == <br/>Database Table == | + | == Database Table == |
| | | |
| *always lower case<br/> | | *always lower case<br/> |
Line 52: |
Line 52: |
| | | |
| <br/> | | <br/> |
| + | |
| + | The table name consists of |
| | | |
| #name starts with the source (e.g. zensus) | | #name starts with the source (e.g. zensus) |
Line 58: |
Line 60: |
| #with resolution [tupel] (e.g. per_mun) | | #with resolution [tupel] (e.g. per_mun) |
| | | |
− | Example: ''zensus_population_by_gender_per_mun''<br/> | + | Example: ''zensus_population_by_gender_per_mun<br/>''<br/> |
| | | |
| = Data Integrity<br/> = | | = Data Integrity<br/> = |
| | | |
− | Data Integrity is one aspect of ensuring [[Data_quality|data quality]] the in the oedb. | + | Data Integrity is one aspect of ensuring [[Data quality|data quality]] the in the oedb.<br/> |
| | | |
| == General == | | == General == |
| | | |
− | *Primary Key [PK]<br/> | + | *Each table should have a single auto-incremented integer column as Primary Key with the corresponding table constraint.<br/> |
− | *Grants to oeuser | + | *Grant all privilegs to oeuser<br/><br/> |
− | | + | |
− | <br/> | + | |
− | | + | |
− | <br/>
| + | |
− | | + | |
− | <br/> | + | |
| | | |
| == Geografic Data<br/> == | | == Geografic Data<br/> == |
Line 91: |
Line 87: |
| <br/> | | <br/> |
| | | |
− | = Data Referencing = | + | = Data Referencing = |
| + | |
| + | There are several usefull attributes on a table users have to know in order to work with the contained data efficiently. These informations are separated in two major categories: |
| + | |
| + | #Comments on tables: Contains metadata that define several aspects for the whole table |
| + | #Comments on rows: Store important informations on a specific row and the stored data |
| | | |
| === Original Data (orig)<br/> === | | === Original Data (orig)<br/> === |
Line 136: |
Line 137: |
| "Instructions for proper use": ["Always state licence"]}</pre><br/> | | "Instructions for proper use": ["Always state licence"]}</pre><br/> |
| | | |
− | <br/>
| |
| | | |
− | <br/>
| |
| | | |
| === Processed Data (calc)<br/> === | | === Processed Data (calc)<br/> === |
Revision as of 08:19, 10 June 2016
The following database rules explain the usage of the database called oedb, which will become a part of the openmod internet presence. For further details see also here.
Data Documentation
- All data included in the databases has be documented! On this wiki page you learn, how to do so.
- All abreviations have to be documentated in the [[Category:Glossary|Glossary]]!
Naming of Data
The data in the database is organised in schemata and tables, which names are important to find around.
Database Name
The name of the database is oedb.
Database Schema
The structure of the database is realised via the naming of the schemata, which follows the following rules:
- always lower case
- no points, no commas
- no spaces
- no dates
- use underscores
- The name starts with the type of the schema:
- orig for original data: Data that is directly available in other sources and underwent no editing other than transfering it to a data base friendly format
- calc for processed data: Data that origin from original data or processed data via some tool, energy model or other modifications. The specific data and methods used for this computations must be mentioned.
- The name includes the distinct subject area or source.
Example: orig_vg250
Database Table
- always lower case
- no points, no commas
- no spaces
- no dates
- use underscores
The table name consists of
- name starts with the source (e.g. zensus)
- main value (e.g. population)
- if separated by [attribute] (e.g. by_gender)
- with resolution [tupel] (e.g. per_mun)
Example: zensus_population_by_gender_per_mun
Data Integrity
Data Integrity is one aspect of ensuring data quality the in the oedb.
General
- Each table should have a single auto-incremented integer column as Primary Key with the corresponding table constraint.
- Grant all privilegs to oeuser
Geografic Data
- WGS84 - EPSG: 4326 (http://spatialreference.org/ref/epsg/4326/)
- ETRS89 / ETRS-LAEA - EPSG: 3035 (http://spatialreference.org/ref/epsg/3035/)
Data Referencing
There are several usefull attributes on a table users have to know in order to work with the contained data efficiently. These informations are separated in two major categories:
- Comments on tables: Contains metadata that define several aspects for the whole table
- Comments on rows: Store important informations on a specific row and the stored data
Original Data (orig)
Tables are annotated by a comment in form of a json string:
{"Name": "The Full Name",
"Source": ["Name", "www.website.com / registation required"],
"Reference date": ["2013"],
"Date of collection": ["01.08.2013"],
"Original file": ["346-22-5.xls"],
"Spatial resolution": ["Germany"],
"Description": ["Example Data (annual totals)", "Regional level: national"],
"Table fields": [
{"name":"id",
"description"Unique identifier"",
"description_german":"",
"unit":"" },
{"name":"year",
"description"Reference Year"",
"description_german":"",
"unit":"" },
{"name":"example_value",
"description"Some important value"",
"description_german":"",
"unit":"EUR" }],
"Changes":[
{ "name":"Joe Nobody",
"mail":"joe.nobody@gmail.com (fake)",
"date":"16.06.2014",
"comment":"Created table" },
{ "name":"Joana Anybody",
"mail":"joana.anybody@gmail.com (fake)",
"date":"17.07.2014",
"comment":"Translated field names"}],
"ToDo": ["Some datasets are odd -> Check numbers against another data"],
"Licence": ["Licence – Version 2.0 (dl-de/by-2-0; [http://www.govdata.de/dl-de/by-2-0])"],
"Instructions for proper use": ["Always state licence"]}
Processed Data (calc)
{"Name": "Results",<br/>"Date of collection": ["01.08.2013"],<br/>"Spatial resolution": ["Germany"],<br/>"Description": ["Financial key figures of German municipalities (annual totals)", "Regional level: municipalities, association of municipalities"],<br/>
"Table fields": [<br/>{"name":"id",<br/>"description"Unique identifier"",<br/>"description_german":"",<br/>"unit":"" },
{"name":"year",<br/>"description"Reference Year"",<br/>"description_german":"",<br/>"unit":"" },
{"name":"example_value",<br/>"description"Some important value"",<br/>"description_german":"",<br/>"unit":"EUR" }],
"Changes":[<br/> { "name":"Autor1",<br/> "mail":"Autor1@e-mail.com",<br/> "date":"16.06.2014",<br/> "comment":"Created table" },
{ "name":"Autor2",<br/> "mail":"Autor2@e-mail.com",<br/> "date":"17.07.2014",<br/> "comment":"Translated field names"}],
"ToDo": ["Some datasets are odd -> Check numbers against another data"],<br/>"Licence": ["Datenlizenz Deutschland – Namensnennung – Version 2.0 (dl-de/by-2-0; [http://www.govdata.de/dl-de/by-2-0 http://www.govdata.de/dl-de/by-2-0])"],<br/>"Instructions for proper use": ["Always state licence"]}<br/>
Processed Data (calc) - Row Annotation
Each row has to be annotated by a json dictionary that must contain the following fields:
- origin: Link or textual description of the data set this row origins from.
- method: Method used to calculate this row from above origin (e.g. Link to a python script)
- assumption: A list of dictionaries. Each dictionary describes an assumption and annotates the affected rows.
- begin: First column affected by the assumption
- end: Last column affected by the assumption
- type: Type of the problem that had to be solved. Each type requires one or more additional keys in this dictionary. Possible types and their required additional keys are:
- gap: A not all fields could be calculated and/or filled,
- solution: Method that was used to generate date to fill this gap (e.g. linear interpolation)
- multiplicity: A field could be filled by several values
- values: Possible Values that could have been used
- solution: Method that was used to select one value (e.g. Minimum)
An examplatory dictionary:
{
"origin":"https://data.openmod-initiative.org/data/oedb/orig_db/table
"method":"https://github.com/openego/data_processing/blob/master/calc_ego_substation/Voronoi_ehv.sql"
"assumptions":
[
{
"type": "gap"
"begin": "step_15"
"end": "step_34"
"solution": "linear_interpolation"
}
]
}