Closed
Description
What is the bug?
Datetime columns are not always treated correctly/consistently:
- For GeoPackage, when a file is "translated" to a GeoPackage file:
  - Naive datetimes (no timezone information) are interpreted as UTC and written as such to the destination file, which changes the datetime information significantly. This only happens when the data is handled internally via arrow (more info). With the classic internal handling (e.g. when explodecollection,... is specified) this does not occur.
  - If the source GeoPackage has a column with a local timezone offset (e.g. +4), the datetimes in that column are converted to UTC. This is not ideal as the offset information is lost, but the time stays ~correct, so this problem isn't as significant.
- For FlatGeobuf, the timezone information is dropped when this is the destination file format. In the output below, this only occurs when arrow is used.
This issue is already being discussed in the context of the integration in pyogrio via the C API here. Just posting it here as well for completeness' sake and to document that the same behaviour occurs in ogr2ogr when arrow is used internally.
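The difference in severity between the two GeoPackage cases can be illustrated with the standard library alone. This is a minimal sketch, not GDAL code: it assumes that "interpreted as UTC" means the naive wall-clock value is kept and a UTC offset is attached, and the variable names are made up for the illustration.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime and a +04:00 datetime, matching the values in the script.
naive = datetime(2021, 1, 1, 0, 0, 0)
local = datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone(timedelta(hours=4)))

# Case 1: the naive value is labelled as UTC. The wall-clock time is kept,
# but the value now claims to be a specific instant it never was.
naive_as_utc = naive.replace(tzinfo=timezone.utc)

# Case 2: the +04:00 value is converted to UTC. The offset is lost,
# but it still refers to the same instant.
local_as_utc = local.astimezone(timezone.utc)

print(naive_as_utc.isoformat())  # 2021-01-01T00:00:00+00:00
print(local_as_utc.isoformat())  # 2020-12-31T20:00:00+00:00
print(local == local_as_utc)     # True: same instant, different offset
```

In case 1 information is invented (a timezone that the source never had), while in case 2 only the original offset is lost.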
Steps to reproduce the issue
UPDATE: added FlatGeobuf to the script, and a comparison with versus without using arrow
import warnings
from pathlib import Path

import geopandas as gpd
import pandas as pd
from osgeo import gdal
from shapely import Point

warnings.filterwarnings("ignore")
gdal.UseExceptions()

# Test data: one naive, one UTC and one +04:00 datetime column.
input_gdf = gpd.GeoDataFrame(
    data={
        "datetime_naive": pd.to_datetime(["2021-01-01 00:00:00", "2021-01-01 00:00:00", "2021-01-01 00:00:00"]),
        "datetime_utc": pd.to_datetime(["2021-01-01 00:00:00+00:00", "2021-01-01 00:00:00+00:00", "2021-01-01 00:00:00+00:00"]),
        "datetime_tz_local": pd.to_datetime(["2021-01-01 00:00:00+04:00", "2021-01-01 00:00:00+04:00", "2021-01-01 00:00:00+04:00"]),
    },
    geometry=[Point(0, 0), Point(0, 0), Point(0, 0)],
    crs=31370,
)

for suffix in [".gpkg", ".fgb"]:
    for arrow in ["YES", "NO"]:
        # Switch between the arrow and the classic code path in ogr2ogr.
        gdal.SetConfigOption("OGR2OGR_USE_ARROW_API", arrow)

        # Write the source file with geopandas.
        src = Path(f"C:/temp/src_arrow-{arrow}{suffix}")
        src.unlink(missing_ok=True)
        input_gdf.to_file(src)

        # Translate it to a destination file of the same format.
        dst = Path(f"C:/temp/dst_arrow-{arrow}{suffix}")
        dst.unlink(missing_ok=True)
        ds_output = gdal.VectorTranslate(srcDS=src, destNameOrDestDS=dst)
        ds_output = None  # close the dataset so it is flushed to disk

        # Print the source first, then the destination.
        src_gdf = gpd.read_file(src)
        dst_gdf = gpd.read_file(dst)
        print(f"=== result for {suffix}, {arrow=} ===")
        print(src_gdf.drop(columns=["geometry"]))
        print(dst_gdf.drop(columns=["geometry"]))
Output:
=== result for .gpkg, arrow='YES' ===
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
             datetime_naive              datetime_utc         datetime_tz_local
0 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+00:00 2020-12-31 20:00:00+00:00
1 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+00:00 2020-12-31 20:00:00+00:00
2 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+00:00 2020-12-31 20:00:00+00:00
=== result for .gpkg, arrow='NO' ===
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
=== result for .fgb, arrow='YES' ===
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
  datetime_naive datetime_utc datetime_tz_local
0     2021-01-01   2021-01-01        2021-01-01
1     2021-01-01   2021-01-01        2021-01-01
2     2021-01-01   2021-01-01        2021-01-01
=== result for .fgb, arrow='NO' ===
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
  datetime_naive              datetime_utc         datetime_tz_local
0     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
1     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
2     2021-01-01 2021-01-01 00:00:00+00:00 2021-01-01 00:00:00+04:00
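The differences between the source and destination values above can also be checked programmatically. The following is a rough sketch using only the standard library; `classify_change` is a hypothetical helper written for this illustration (it is not part of GDAL, pyogrio or geopandas), and the sample values are copied from the arrow='YES' results.

```python
from datetime import datetime


def classify_change(src: str, dst: str) -> str:
    """Describe how a datetime changed between source and destination."""
    a = datetime.fromisoformat(src)
    b = datetime.fromisoformat(dst)
    if a.tzinfo is None and b.tzinfo is None:
        return "unchanged" if a == b else "wall-clock time changed"
    if a.tzinfo is None:
        return "timezone added"
    if b.tzinfo is None:
        return "timezone dropped"
    if a != b:  # comparison of aware datetimes compares instants
        return "instant changed"
    if a.utcoffset() != b.utcoffset():
        return "offset changed, instant kept"
    return "unchanged"


# (source value, destination value) pairs taken from the output above.
cases = {
    "gpkg naive": ("2021-01-01", "2021-01-01 00:00:00+00:00"),
    "gpkg utc": ("2021-01-01 00:00:00+00:00", "2021-01-01 00:00:00+00:00"),
    "gpkg tz_local": ("2021-01-01 00:00:00+04:00", "2020-12-31 20:00:00+00:00"),
    "fgb utc": ("2021-01-01 00:00:00+00:00", "2021-01-01"),
    "fgb tz_local": ("2021-01-01 00:00:00+04:00", "2021-01-01"),
}
for name, (src, dst) in cases.items():
    print(f"{name}: {classify_change(src, dst)}")
```

With arrow='NO' every source and destination pair in the output is identical, so this helper would report "unchanged" for all of them.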
Versions and provenance
- OS: Windows 11
- gdal version: 3.9.2, installed from conda-forge
Additional context
No response