Commit 2199885

EdAbati, martinhohoff, andyjessen, davidggphy, and J535D165 authored
Update with upstream/master (#4)
* Fix usage examples (J535D165#190)
* Fix broken links (J535D165#186)
  This commit fixes broken links in readme.
* Add threshold None and label docstrings for String (J535D165#189)
* Add support for pandas==2 (J535D165#192)
* Replace setup.py by pyproject.toml (J535D165#195)
* Lint with Ruff and format with Black (J535D165#196)
* Lint with Ruff and format with Black
* Fix more lint issues
* Fix datasets submodule
* Fix all lint errors
* Fix importerror
* Replace flake8 in github action by ruff
* Fix linter
* Fix abstractmethod errors
* Fix test with incorrect error
* Update ci-workflow.yml
* Update CI docs generation and CI pipeline (J535D165#197)
* Bump minimal versions of dependencies
* Update the docs CI pipeline (J535D165#198)
* Add requirements to .readthedocs.yaml
* Bump minimal Python version in documentation
* Add pre-commit hooks (J535D165#199)
* Update CI pipeline for publishing package
* disable docs and publish GH actions
* only trigger on PR
* fixed linting
* updated to latest ruff
* Update GitHub Actions workflows

---------

Co-authored-by: Martinho Hoffman <[email protected]>
Co-authored-by: andyjessen <[email protected]>
Co-authored-by: David GG <[email protected]>
Co-authored-by: Jonathan de Bruin <[email protected]>
1 parent d3cdb24 commit 2199885

59 files changed, +425 -3293 lines changed

.gitattributes

-1
This file was deleted.

.github/workflows/ci-workflow.yml

+21 -14
@@ -1,7 +1,7 @@
 name: tests

-on: [push, pull_request]
-
+# on: [push, pull_request]
+on: [pull_request]
 jobs:
   build:

@@ -10,33 +10,40 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ["3.8", "3.9", "3.10", "3.11"]
-
+        pandas-version: ["1.0", "2.0"]
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
+    - name: Install pandas
+      run: |
+        pip install pandas~=${{ matrix.pandas-version }}
     - name: Package recordlinkage
       run: |
         pip install --upgrade pip
-        pip install wheel
-        python setup.py bdist_wheel sdist
+        pip install build
+        python -m build
     - name: Install recordlinkage
       run: |
         pip install networkx>=2
         pip install ./dist/recordlinkage-*.whl
-    # - name: Lint with flake8
-    #   run: |
-    #     pip install flake8
-    #     # stop the build if there are Python syntax errors or undefined names
-    #     flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-    #     # exit-zero treats all errors as warnings
-    #     flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
     - name: Test with pytest
       run: |
         pip install pytest
         # remove recordlinkage to prevent relative imports (use installed package)
         # this is like wrapping stuff in a src folder
         rm -r recordlinkage/
         pytest
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - uses: actions/setup-python@v5
+    - name: Install ruff
+      run: |
+        pip install ruff
+    - name: Lint with ruff
+      run: |
+        ruff .
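
Note on the build matrix: GitHub Actions runs one job per combination of matrix values, so adding `pandas-version` doubles the number of test jobs. The sketch below is only an illustration of that expansion, not part of the commit; `expand_matrix` is a hypothetical helper name, and the version lists are copied from the diff above. The `~=` operator in the install step is pip's "compatible release" specifier, so `pandas~=1.0` accepts any 1.x release and `pandas~=2.0` any 2.x release.

```python
from itertools import product

# Values copied from the workflow's matrix in the diff above.
python_versions = ["3.8", "3.9", "3.10", "3.11"]
pandas_versions = ["1.0", "2.0"]


def expand_matrix(pythons, pandas_specs):
    """Hypothetical helper: one (python, pandas) pair per CI job."""
    return list(product(pythons, pandas_specs))


jobs = expand_matrix(python_versions, pandas_versions)

# Print the pandas install command each job would run.
for py, pandas_version in jobs:
    print(f"python {py}: pip install pandas~={pandas_version}")
```

With four Python versions and two pandas versions, this yields eight test jobs, alongside the separate `lint` job.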

.github/workflows/python-package.yml

+11 -44
@@ -1,8 +1,11 @@
-# name: deploy-and-release
+# name: Upload Python Package
+
 # on:
-#   push:
-#     tags:
-#       - 'v*' # Push events to matching v*, i.e. v1.0, v20.15.10
+#   release:
+#     types: [published]
+
+# permissions:
+#   contents: read

 # jobs:
 #   deploy:
@@ -13,50 +16,14 @@
 #       uses: actions/setup-python@v4
 #       with:
 #         python-version: '3.x'
-#     - name: Get the version (git tag)
-#       id: get_version
-#       run: |
-#         echo ${GITHUB_REF/refs\/tags\/v/}
-#         echo ::set-output name=VERSION::${GITHUB_REF/refs\/tags\/v/}
 #     - name: Install dependencies
 #       run: |
 #         python -m pip install --upgrade pip
-#         pip install setuptools wheel
-#     - name: Build
-#       run: |
-#         python setup.py sdist bdist_wheel
-#     - name: Create Release
-#       id: create_release
-#       uses: actions/[email protected]
-#       env:
-#         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-#       with:
-#         tag_name: ${{ github.ref }}
-#         release_name: Release ${{ github.ref }}
-#         draft: false
-#         prerelease: false
-#     - name: Upload Release Asset (Wheel)
-#       id: upload-release-asset-whl
-#       uses: actions/[email protected]
-#       env:
-#         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-#       with:
-#         upload_url: ${{ steps.create_release.outputs.upload_url }}
-#         asset_path: ./dist/recordlinkage-${{ steps.get_version.outputs.VERSION }}-py3-none-any.whl
-#         asset_name: recordlinkage-${{ steps.get_version.outputs.VERSION }}-py3-none-any.whl
-#         asset_content_type: application/x-wheel+zip
-#     - name: Upload Release Asset (Sdist)
-#       id: upload-release-asset-sdist
-#       uses: actions/[email protected]
-#       env:
-#         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-#       with:
-#         upload_url: ${{ steps.create_release.outputs.upload_url }}
-#         asset_path: ./dist/recordlinkage-${{ steps.get_version.outputs.VERSION }}.tar.gz
-#         asset_name: recordlinkage-${{ steps.get_version.outputs.VERSION }}.tar.gz
-#         asset_content_type: application/zip
+#         pip install build
+#     - name: Build package
+#       run: python -m build
 #     - name: Publish package
-#       uses: pypa/gh-action-pypi-publish@master
+#       uses: pypa/gh-action-pypi-publish@release/v1
 #       with:
 #         user: __token__
 #         password: ${{ secrets.pypi_password }}

.github/workflows/render-docs.yml

+9 -16
@@ -1,26 +1,19 @@
-# name: Build HTML on macOS
+# name: Build HTML with Sphinx
 # on: [push, pull_request]
 # jobs:
-#   html-macos:
-#     runs-on: macos-latest
+#   html-sphinx:
+#     runs-on: ubuntu-latest
 #     steps:
 #     - name: Clone repo
-#       uses: actions/checkout@v3
-#       with:
-#         fetch-depth: 0
-#     - name: Install pandoc
-#       run: |
-#         brew install pandoc
+#       uses: actions/checkout@v2
 #     - name: Set up Python
-#       uses: actions/setup-python@v4
+#       uses: actions/setup-python@v2
 #       with:
-#         python-version: '3.8'
-#     - name: Install recordlinkage
-#       run: |
-#         python -m pip install .[all]
-#     - name: Install docs dependencies
+#         python-version: '3.10'
+#     - name: Install recordlinkage and docs tools
 #       run: |
-#         python -m pip install -r docs/requirements.txt
+#         sudo apt install pandoc
+#         python -m pip install .[docs]
 #     - name: Build HTML
 #       run: |
 #         python -m sphinx -W --keep-going --color docs/ _build/html/

.gitignore

+2
@@ -1,6 +1,8 @@

 recordlinkage/datasets/krebsregister/*

+recordlinkage/_version.py
+

 .DS_Store
 */.DS_Store

.pre-commit-config.yaml

+4 -5
@@ -13,14 +13,13 @@ repos:
     rev: 24.1.1
     hooks:
       - id: black
-        exclude: versioneer.py
   - repo: https://github.com/asottile/pyupgrade
     rev: v3.15.0
     hooks:
       - id: pyupgrade
         args: [--py38-plus]
-        exclude: versioneer.py
-  - repo: https://github.com/PyCQA/flake8
-    rev: 7.0.0
+  - repo: https://github.com/charliermarsh/ruff-pre-commit
+    rev: v0.2.1
     hooks:
-      - id: flake8
+      - id: ruff
+        args: [--fix]

.readthedocs.yaml

+16
@@ -0,0 +1,16 @@
+version: 2
+
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+sphinx:
+  configuration: docs/conf.py
+
+python:
+  install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - docs

MANIFEST.in

-4
@@ -1,9 +1,5 @@
-include versioneer.py
-include recordlinkage/_version.py
-
 recursive-include recordlinkage/datasets/febrl *.csv
 recursive-include recordlinkage/datasets/krebsregister *.csv

-
 global-exclude test_*.py
 global-exclude *_test.py

README.md

+3 -6
@@ -120,24 +120,21 @@ The main features of this Python record linkage toolkit are:
 The most recent documentation and API reference can be found at
 [recordlinkage.readthedocs.org](http://recordlinkage.readthedocs.org/en/latest/).
 The documentation provides some basic usage examples like
-[deduplication](http://recordlinkage.readthedocs.io/en/latest/notebooks/data_deduplication.html)
+[deduplication](http://recordlinkage.readthedocs.io/en/latest/guides/data_deduplication.html)
 and
-[linking](http://recordlinkage.readthedocs.io/en/latest/notebooks/link_two_dataframes.html)
+[linking](http://recordlinkage.readthedocs.io/en/latest/guides/link_two_dataframes.html)
 census data. More examples are coming soon. If you do have interesting
 examples to share, let us know.

 ## Installation

-The Python Record linkage Toolkit requires Python 3.6 or higher. Install the
+The Python Record linkage Toolkit requires Python 3.8 or higher. Install the
 package easily with pip

 ``` sh
 pip install recordlinkage
 ```

-Python 2.7 users can use version \<= 0.13, but it is advised to use
-Python \>= 3.5.
-
 The toolkit depends on popular packages like
 [Pandas](https://github.com/pydata/pandas),
 [Numpy](http://www.numpy.org), [Scipy](https://www.scipy.org/) and,

benchmarks/bench_comparing.py

+2 -1
@@ -1,5 +1,6 @@
 import recordlinkage as rl
-from recordlinkage.datasets import load_febrl1, load_febrl4
+from recordlinkage.datasets import load_febrl1
+from recordlinkage.datasets import load_febrl4


 class CompareRecordLinkage:

benchmarks/bench_indexing.py

+2 -1
@@ -1,5 +1,6 @@
 import recordlinkage as rl
-from recordlinkage.datasets import load_febrl1, load_febrl4
+from recordlinkage.datasets import load_febrl1
+from recordlinkage.datasets import load_febrl4


 class PairsRecordLinkage:
