Python: Text partitioning module #427

JTremb · 2023-04-12T13:22:34Z

Motivation and Context

Porting the text partitioning module to Python.

Description

This adds the text partitioning module in semantic_kernel/semantic_functions/semantic_text_partitioner.py
and the function extension in semantic_kernel/semantic_functions/function_extension.py.
Compared to the C# version, the files were added directly to the semantic_functions directory instead of semantic_functions/partitioning, to avoid too many nested directories.
Added the pytest configuration to the VS Code settings.json.

Unit tests :

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows SK Contribution Guidelines (https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
The code follows the .NET coding conventions (https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions) verified with dotnet format
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

…rosoft#144)

### Motivation and Context

This PR simplifies the `@sk_*` decorators while porting the core TextSkill.

### Description

This PR is a first step toward making the Python codebase more *pythonic*. It contains the following modifications:

1. Merged the decorators `@sk_function_context_parameter`, `@sk_function_input`, and `@sk_function_name` into `@sk_function`. The decorators were replaced with new kwargs on `sk_function`: `name`, `input_description`, `input_default_value`.
2. The `name` kwarg is optional; the name of the method is used if none is provided.
3. Ported the core skill TextSkill.
4. Added pytest unit tests for the SK decorators and TextSkill.
5. Changed how skills are imported into the kernel: an instance of the class is passed, and discovery no longer relies on static methods, e.g.

```
kernel.import_skill(TextSkill())
```
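The merged decorator described above can be illustrated with a minimal sketch. This is not the Semantic Kernel implementation: the positional `description` argument, the `sk_*` attribute names stored on the function, and the `TextSkillSketch` class are all assumptions made for illustration. Only the kwargs `name`, `input_description`, and `input_default_value`, and the fallback to the method name, come from the PR description.

```python
from typing import Callable, Optional


def sk_function(
    description: str = "",  # assumption: a positional description argument
    *,
    name: Optional[str] = None,
    input_description: Optional[str] = None,
    input_default_value: Optional[str] = None,
) -> Callable:
    """Attach skill metadata to a method (hypothetical attribute names)."""

    def decorator(func: Callable) -> Callable:
        # Fall back to the method's own name when no name is given.
        func.sk_name = name or func.__name__
        func.sk_description = description
        func.sk_input_description = input_description
        func.sk_input_default_value = input_default_value
        return func

    return decorator


class TextSkillSketch:
    """Hypothetical stand-in for a skill class; not the real TextSkill."""

    @sk_function("Trim whitespace from both ends of the input", name="trim")
    def trim_text(self, text: str) -> str:
        return text.strip()

    @sk_function("Convert the input to uppercase")
    def uppercase(self, text: str) -> str:
        return text.upper()
```

Because the metadata lives on the function object, a kernel that receives an instance such as `TextSkillSketch()` could discover the decorated methods by inspecting the instance, which is consistent with the instance-based import in item 5.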

### Motivation and Context

This PR adds a Lint GitHub workflow so that code style rules can be enforced for PRs.

1. This forms a baseline workflow that can be used as a template for future workflows that will be added.
2. It helps enforce code style rules. Currently it only checks PyCodeStyle and PyFlakes. Others will be added in the future.
3. Contributes to automated testing.
4. Does not fix any open issue.

### Description

Added Ruff as a dev dependency. Added a GitHub workflow which runs Ruff.

---------

Co-authored-by: Aditya Gudimella <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>

### Motivation and Context

Fixes a formatting issue so that the Lint GitHub workflow passes.

### Description

Contains only reformatting changes.

### Motivation and Context

1. Why is this change required? Compatibility with Python 3.9.
2. What problem does it solve? If a user uses PromptTemplateEngine with skills inside, it does not work.
3. What scenario does it contribute to? PromptTemplateEngine will work.
4. If it fixes an open issue, please link to the issue here. microsoft#182

### Description

Detailed in microsoft#182. Similar in concept to microsoft#169.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows SK Contribution Guidelines (https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
- [x] The code follows the .NET coding conventions (https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions) verified with `dotnet format`
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

### Motivation and Context

1. Why is this change required? `pytest .` raises an error.
2. What problem does it solve? All tests pass.
3. What scenario does it contribute to? `infer_delegate_type` will not raise an error.
4. If it fixes an open issue, please link to the issue here. microsoft#168

### Description

A `staticmethod` object has no `__wrapped__` attribute, and this raises an error; `__func__` should be used instead. I think `wrapped = getattr(value, "__wrapped__", getattr(value, "__func__", None))` is reasonable for this issue.

```
    @staticmethod
    def infer_delegate_type(function) -> DelegateTypes:
        # Get the function signature
        function_signature = signature(function)
        awaitable = iscoroutinefunction(function)

        for name, value in DelegateInference.__dict__.items():
            if name.startswith("infer_") and hasattr(
>               value.__wrapped__, "_delegate_type"
            ):
E           AttributeError: 'staticmethod' object has no attribute '__wrapped__'

../semantic_kernel/orchestration/delegate_inference.py:240: AttributeError
=========================== short test summary info ===========================
FAILED test_text_skill.py::test_can_be_imported - AttributeError: 'staticmethod' object has no attribute '__wrapped__'
FAILED test_text_skill.py::test_can_be_imported_with_name - AttributeError: 'staticmethod' object has no attribute '__wrapped__'
========================= 2 failed, 9 passed in 0.23s =========================
```
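The fallback proposed above can be tried in isolation. The sketch below is a hypothetical reduction, not the code in `delegate_inference.py`: the `marks` decorator, the `InferenceSketch` class, and the string delegate types are made up for illustration. `staticmethod` objects only gained a `__wrapped__` attribute in Python 3.10, so on Python 3.9 the lookup falls through to `__func__`; on either version the result is the underlying function.

```python
def marks(delegate_type):
    """Hypothetical decorator that tags a function with a delegate type."""

    def decorator(func):
        func._delegate_type = delegate_type
        return func

    return decorator


def unwrap_static(value):
    # Prefer __wrapped__ (present on staticmethod since Python 3.10),
    # fall back to __func__ (present on older versions), else None.
    return getattr(value, "__wrapped__", getattr(value, "__func__", None))


class InferenceSketch:
    @staticmethod
    @marks("void")
    def infer_void(signature):
        return signature == "void"

    @staticmethod
    def helper():
        return None


# Iterating over the class __dict__ yields raw staticmethod objects,
# which is why the unwrapping step is needed before hasattr().
found = {}
for name, value in InferenceSketch.__dict__.items():
    wrapped = unwrap_static(value)
    if name.startswith("infer_") and hasattr(wrapped, "_delegate_type"):
        found[name] = wrapped._delegate_type
```

Values in the class `__dict__` that are neither static methods nor wrapped functions (for example `__module__`) unwrap to `None`, and `hasattr(None, "_delegate_type")` is simply false, so they are skipped without raising.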

### Motivation and Context

Porting the FileIOSkill to Python.

### Description

This is the port of the FileIOSkill to Python, with unit tests.

```
kernel = sk.create_kernel()
kernel.import_skill(FileIOSkill(), "file")
context = kernel.create_new_context()
context["path"] = "test_file_io_skill.txt"
context["content"] = "Hello, world!"
```

It uses the same function names as the C# version:

```
{{file.readAsync $path}}
{{file.writeAsync}}
```

Modifications to the dependencies:

- Adding the package: aiofiles
- Adding the dev package: pytest-asyncio

…nb` (microsoft#166)

### Motivation and Context

Please help reviewers and future users, providing the following information:

1. Why is this change required? `SemanticTextMemory.save_reference_async` does nothing with its storage.
2. What problem does it solve? `save_reference_async` will work appropriately.
3. What scenario does it contribute to? Embeddings.
4. If it fixes an open issue, please link to the issue here. microsoft#165

### Description

I added the missing code, using `SemanticTextMemory.save_information_async` and the C# version of `SemanticTextMemory.save_reference_async` as references.

### Motivation and Context

This PR provides a path to using the ChatGPT API in the Python Preview of Semantic Kernel. In addition, this provides `Azure*` versions of many existing models (so Python users can leverage Azure OpenAI).

I think that, in general, it may be worth considering how best to work with models that have different modalities: now we have text completions, embeddings, chat completions (and I'd imagine images/etc. may be nice to support someday too).

Regardless, this PR provides a fast path to using the exciting new Chat APIs from OpenAI with SK! See the new `python/tests/chat_gpt_api.py` for a usage example.

### Motivation and Context

Port of the core TimeSkill.

### Description

This PR adds the core TimeSkill with unit tests.

```
kernel = sk.create_kernel()
kernel.import_skill(TimeSkill(), "time")
```

```
sk_prompt = """
{{time.now}}
"""
```

…rsion) (microsoft#200)

### Motivation and Context

The C# Semantic Kernel has recently undergone an upgrade to its `PromptTemplateEngine`. This brings Python back in line with the semantics of Prompt Templates in C#.

### Description

Here, unlike the original port, I've tried to make things more pythonic/idiomatic (instead of trying to directly mirror the C# codebase). I've also brought over the corresponding unit tests (and added some of my own as I was building/validating).

### Motivation and Context

We recently merged an upgrade to the `PromptTemplateEngine`; let's make the rest of the tests consistent with the directory structure used in that PR.

### Description

This PR does three (small) things:

1. Makes the `./tests` directory have a consistent structure
2. Moves some of the "tests" that were at the top level of the `./tests` dir to `./tests/end-to-end` (which better describes their purpose: things like `basics.py` and `memory.py` are end-to-end examples of using SK and ways to verify nothing is horribly broken)
3. Applies `isort` (which we plan to add to our linting workflow here on GitHub soon)

### Motivation and Context

This PR fixes microsoft#235 and adds a test for the functionality. This is ahead of the planning skill.

### Description

1. Fix the typo in the `from_dict()` method in `prompt_template_config.py`
2. Add the assignment back to the skill config so results from JSON aren't discarded in `import_semantic_skill_from_directory.py`
3. Add tests

microsoft#203) Added logger warning and error for cosine similarity computation for zero vectors
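As a minimal sketch of guarding a cosine similarity computation against zero vectors: the function below is hypothetical and pure Python, not the Semantic Kernel code. The commit does not say which case logs a warning and which logs an error, so this sketch makes its own choice: it logs a warning and returns `0.0` when either vector has zero norm, since the similarity is undefined there.

```python
import logging
import math
from typing import Sequence

logger = logging.getLogger(__name__)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with an explicit guard for zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")

    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0.0 or norm_b == 0.0:
        # Dividing by a zero norm is undefined; warn and return a neutral
        # value (an assumption of this sketch) instead of raising.
        logger.warning("Cosine similarity is undefined for a zero vector")
        return 0.0

    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
```

Returning a neutral score keeps a batch comparison running when one embedding is degenerate, while the log entry still makes the problem visible.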

### Motivation and Context

Right now, there are lots of Config classes in SK. This doesn't feel very "pythonic" and can be a bit confusing at times. To reduce the amount of indirection, this PR removes several config classes from the Python port.

NOTE: this PR is also critical preparation for a large change to re-sync with how C# SK handles multi-modality. I have changes ready for adding a `./connectors` dir and syncing with the Text/Chat/Image/Embedding support, but there are a few "prep" PRs I need to get through first.

### Description

This PR makes the following changes:

1. We remove the `(Azure)OpenAIConfig` classes (simplifying the code related to creating/managing backends)
2. We remove the `./configuration` sub-directory and move the `KernelConfig` class to the module's root dir (this mirrors the C# version)
3. We re-tool the `KernelConfig` class to behave similarly to the current C# version (again, this is prep for later changes RE multi-modality)
4. We make corresponding updates across the code/tests now that we've removed some config classes

In future PRs, it'd be great to also simplify:

1. `(Chat|Completion)RequestSettings`
2. `(ReadOnly)SkillsCollection` and related classes
3. Take a hard look at `ContextVariables` and `SKContext` (maybe `ContextVariables` is just a `dict` in SK python)

…crosoft#317)

### Motivation and Context

The `python-preview` branch is under active development and many docs/tests/examples are getting a bit out of sync. There are also a few small cleanup chores/refactors to make _usage_ easier (e.g., to make examples cleaner). I've grouped these small fixes (plus a few bug fixes) into this "cleanup" PR.

### Description

This PR does the following:

1. Updates the out-of-date example in `README.md`
2. Fixes all 5 of the notebooks we support right now (1, 2, 3, 4, and 6; notebook 5 is planner, I've blanked out most of the code there so we don't confuse people, especially as the planner API is changing)
3. Fixes a bug w/ `stop_sequences` in `(Azure)OpenAITextCompletion`
4. Fixes a bug introduced in our upgrades to the embeddings cosine-sim check (we should add more thorough tests here)
5. Cleans up tests
6. Ensures end-to-end tests are all runnable
7. Fixes up the `kernel_extensions` to work more naturally (so that end users can use them directly on the `kernel` object and get good type-hinting)

### Motivation and Context

In an effort to make the Python port more idiomatic, let's remove the `Verify` class and the `./diagnostics` sub-module. There are many more follow-on tasks to do here, but this is a good start (and already a large enough change).

### Description

This PR does the following:

1. Removes the `Verify` class, and re-writes all instances of `Verify.*`
2. Adds a `validation.py` in the `./utils` sub-module to handle some of the more complex cases from `Verify` (checking that skills, functions, and function params have valid names)
3. Removes the rest of the `./diagnostics` sub-module (w/ a longer-term goal of removing all/most of our custom exception classes and, instead, using appropriate built-in error classes)

### Motivation and Context

microsoft#309

### Description

Missing default value in `stop_sequences`.

### Motivation and Context

AAD tokens offer greater authentication security and are used by several products.

### Description

Add support for Azure Active Directory auth for the `Azure*` backends.

)

### Motivation and Context

READMEs and examples are out of date and showing incorrect code. There are also a few bugs in the SKFunction blocking simpler syntax. Extend SKFunction to allow synchronous calls and have simpler syntax when async is not required.

### Description

* Update homepage README, moving all Python examples under python/README
* Make SKFunction callable as per v0
* Fix bugs in SKFunction
* Fix examples using realistic code
* Allow functions to be used synchronously

### Motivation and Context

Porting from C#.

### Description

The `semantic_text_partitioner` class and `SKFunctionBase.aggregate_partitioned_results_async(...)` method still need to be implemented for the skill to be operational, but for the sake of modularity and PR granularity, I will leave these implementations outside the scope of this particular PR.

---------

Co-authored-by: Kit (Hong Long Nguyen) <[email protected]>
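To illustrate what aggregating partitioned results could look like, here is a minimal sketch. It is not the `SKFunctionBase.aggregate_partitioned_results_async(...)` method: the signature is an assumption, a plain coroutine stands in for an SK function, and joining the per-partition results with newlines is a choice made for this example.

```python
import asyncio
from typing import Awaitable, Callable, List


async def aggregate_partitioned_results(
    func: Callable[[str], Awaitable[str]],
    partitions: List[str],
) -> str:
    """Run func on each partition in order and join the results."""
    results = []
    for part in partitions:
        # Partitions are processed sequentially so the output order
        # matches the input order.
        results.append(await func(part))
    return "\n".join(results)


async def shout(text: str) -> str:
    """Hypothetical stand-in for a semantic function call."""
    return text.upper()


combined = asyncio.run(
    aggregate_partitioned_results(shout, ["first part", "second part"])
)
```

A skill that summarizes long input could first split the text into partitions that fit the model's limits and then pass them through a helper of this shape.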

### Motivation and Context

The pip package was uploaded without a LICENSE file and without a license mentioned in `pyproject.toml`. I tried to reupload, but the filenames are the same. This updates the version number so we can upload a new version of the package to pip with a LICENSE.

### Description

- Update package version to 0.2.1dev in pyproject.toml.
- Add `license = "MIT"` to pyproject.toml
- Ran `poetry build` and saw the built packages with the new version:

```
Building semantic-kernel (0.2.1.dev)
  - Building sdist
  - Built semantic_kernel-0.2.1.dev0.tar.gz
  - Building wheel
  - Built semantic_kernel-0.2.1.dev0-py3-none-any.whl
```

- Uploaded a test to https://test.pypi.org/project/semantic-kernel/#description. Note that this version is 0.2.0 because I did not include `license = "MIT"` the first time with 0.2.1 and couldn't reupload to testpypi with 0.2.1 again, so I had to go back down to 0.2.0, which I had not uploaded yet.

### Motivation and Context

Unit tests were failing due to a missing setter. Added setter in SKContext.

---------

Co-authored-by: Jerome Tremblay <[email protected]>

dluc · 2023-04-14T00:39:47Z

@JTremb we've merged the python branch into main, and GitHub doesn't allow pointing this PR at main because of some rebase steps. Could you send the PR again?

### Motivation and Context

Porting the text partitioning module to Python (reopening of PR #427).

### Description

- This adds the text partitioning module in `semantic_kernel/semantic_functions/semantic_text_partitioner.py` and the function extension in `semantic_kernel/semantic_functions/function_extension.py`.
- Compared to the C# version, the files were added directly to the `semantic_functions` directory instead of `semantic_functions/partitioning`, to avoid too many nested directories.
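For readers unfamiliar with text partitioning, the following is a minimal sketch of separator-priority splitting. It is not the code in `semantic_text_partitioner.py`: the function name, the character-based limit, and the default separator list are assumptions. The trailing `None` entry mirrors the idea in one of this PR's commits of keeping `None` in the separator list so that text can still be split when it contains no delimiter.

```python
from typing import List, Optional, Sequence


def split_text(
    text: str,
    max_chars: int,
    separators: Sequence[Optional[str]] = ("\n", ". ", " ", None),
) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Separators are tried in order; None means "cut at fixed offsets".
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")

    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]

    for i, sep in enumerate(separators):
        if sep is None:
            # No delimiter available: fall back to fixed-size cuts.
            return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]
        if sep in text:
            chunks: List[str] = []
            for piece in text.split(sep):
                # Pieces that are still too long use the remaining separators.
                chunks.extend(split_text(piece, max_chars, separators[i + 1:]))
            return chunks

    # No separator matched and there is no None fallback: give up on splitting.
    return [text]
```

This sketch drops the separators themselves and never merges small neighbouring pieces back together; a fuller partitioner would likely pack pieces up to the limit and measure length in tokens rather than characters.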

### Motivation and Context

This PR prepares the frontend for microsoft#377 and removes the need for any AAD configuration environment variables.

### Description

- removes `REACT_APP_AUTH_TYPE` and all variables starting with `REACT_APP_AAD_`
- calls the `/authConfig` endpoint when the app first loads and, if needed, renders the `MsalProvider` using the fetched config.
- updates workflows and deployment scripts accordingly

### Contribution Checklist

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [Contribution Guidelines](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/chat-copilot/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄

dluc and others added 30 commits March 17, 2023 01:50

Initial commit

8590ec5

Python: Github action fixes - formatting (microsoft#151)

b3ed63b

python: Porting TimeSkill (microsoft#209)

dbbb082

added logger warning and error for cosine similarity computation for … (

d05d29f

microsoft#203) Added logger warning and error for cosine similarity computation for zero vectors

adding text partitioner module

7ca62da

Adding __init__ in semantic function and partitioning functions in __…

8546047

…all__

fix: add missing default value (microsoft#310)

77ce80b

porting function_extensiont that aggregates partitioned results

9f8f38c

adds the test for function_extension

2ac6d77

Adding the None in separator list to be able to split when there is n…

3c8a643

…o delimiter

test partitioner tests

6529e6f

formatting

40839c9

Merge branch 'python-preview' into feature/textPartitioning

f561d98

formatting

6d04193

enabling pytest in vscode

fa2f57b

Python: support Azure AD auth (microsoft#340)

e42cc76

mkarle and others added 14 commits April 10, 2023 13:24

Fixing unit tests for Windows. Adding LICENSE to python folder

1d500dd

Style checks

b5fca2e

Create python-build-package.yml

65f3084

Update and rename python-build-package.yml to python-build-wheel.yml

852a94f

Update python-build-wheel.yml

16a2a04

Update python-build-wheel.yml

b250f4a

Update python-build-wheel.yml

2dc2dad

python: fix unit tests - sk_function skill_collection (microsoft#405)

54d44fd

renaming variable to be more consistent

cce593f

Merge branch 'python-preview' into feature/textPartitioning

16f2011

formatting

073c834

adding aggregate_partionned_results_async to __all__

2f15740

alexchaomander added the python Pull requests for the Python Semantic Kernel label Apr 12, 2023

dluc changed the base branch from python-preview-archived-dont-delete to main April 14, 2023 00:37

dluc closed this Apr 14, 2023

JTremb mentioned this pull request Apr 14, 2023

Python: Text partitioning module #450

Merged

5 tasks

JTremb deleted the feature/textPartitioning branch June 29, 2023 17:13
