Commit 012327b

Merge pull request #2297 from myhloli/dev
feat: add support for JPEG images and update documentation
2 parents a47b17c + fcb5660 commit 012327b

5 files changed, +6 -6 lines changed

.github/ISSUE_TEMPLATE/bug_report.yml

+1 -1
@@ -17,7 +17,7 @@ body:
       description: >
         Please search the MinerU [Readme](https://github.com/opendatalab/MinerU), [Issues](https://github.com/opendatalab/MinerU/issues) and [Discussions](https://github.com/opendatalab/MinerU/discussions) to see if a similar bug report already exists.
       options:
-        - label: I have searched the MinerU [Docs](https://github.com/opendatalab/MinerU) and found no similar bug report.
+        - label: I have searched the MinerU [Readme](https://github.com/opendatalab/MinerU) and found no similar bug report.
           required: true
         - label: I have searched the MinerU [Issues](https://github.com/opendatalab/MinerU/issues) and found no similar bug report.
           required: true

magic_pdf/data/read_api.py

+1 -1
@@ -116,7 +116,7 @@ def read_local_office(path: str) -> list[PymuDocDataset]:
     shutil.rmtree(temp_dir)
     return ret

-def read_local_images(path: str, suffixes: list[str]=['.png', '.jpg']) -> list[ImageDataset]:
+def read_local_images(path: str, suffixes: list[str]=['.png', '.jpg', '.jpeg']) -> list[ImageDataset]:
     """Read images from path or directory.

     Args:
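
The diff shows only the signature of `read_local_images`, not its body, so how the suffix list is applied is not visible here. As a rough illustration of what extending the default list changes, the sketch below filters a file or directory by suffix. The helper name `list_image_paths`, the case-insensitive comparison, and the tuple default (chosen to avoid a mutable default argument) are assumptions made for this example, not MinerU's implementation.

```python
from pathlib import Path

# Hypothetical helper for illustration only; not part of MinerU.
# A tuple is used as the default so the default value cannot be mutated.
DEFAULT_SUFFIXES = (".png", ".jpg", ".jpeg")


def list_image_paths(path, suffixes=DEFAULT_SUFFIXES):
    """Return image files found at `path` (a file or a directory), sorted."""
    root = Path(path)
    # Assumption: suffixes are compared case-insensitively.
    wanted = {s.lower() for s in suffixes}
    # A single file is treated as a one-element listing.
    candidates = [root] if root.is_file() else sorted(root.iterdir())
    return [p for p in candidates if p.is_file() and p.suffix.lower() in wanted]
```

With the previous default of `.png` and `.jpg` only, a file named `scan.jpeg` would be skipped by a filter like this one; with the extended default it is picked up.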

projects/README.md

+1 -1
@@ -4,6 +4,6 @@

 - [llama_index_rag](./llama_index_rag/README.md): Build a lightweight RAG system based on llama_index
 - [gradio_app](./gradio_app/README.md): Build a web app based on gradio
-- [web_demo](./web_demo/README.md): MinerU online [demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) localized deployment version
+- ~~[web_demo](./web_demo/README.md): MinerU online [demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/) localized deployment version~~(Deprecated)
 - [web_api](./web_api/README.md): Web API Based on FastAPI
 - [multi_gpu](./multi_gpu/README.md): Multi-GPU parallel processing based on LitServe

projects/README_zh-CN.md

+1 -1
@@ -4,6 +4,6 @@

 - [llama_index_rag](./llama_index_rag/README_zh-CN.md): 基于 llama_index 构建轻量级 RAG 系统
 - [gradio_app](./gradio_app/README_zh-CN.md): 基于 Gradio 的 Web 应用
-- [web_demo](./web_demo/README_zh-CN.md): MinerU在线[demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/)本地化部署版本
+- ~~[web_demo](./web_demo/README_zh-CN.md): MinerU在线[demo](https://opendatalab.com/OpenSourceTools/Extractor/PDF/)本地化部署版本~~(已过时)
 - [web_api](./web_api/README.md): 基于 FastAPI 的 Web API
 - [multi_gpu](./multi_gpu/README.md): 基于 LitServe 的多 GPU 并行处理

projects/web_api/app.py

+2 -2
@@ -28,7 +28,7 @@

 pdf_extensions = [".pdf"]
 office_extensions = [".ppt", ".pptx", ".doc", ".docx"]
-image_extensions = [".png", ".jpg"]
+image_extensions = [".png", ".jpg", ".jpeg"]

 class MemoryDataWriter(DataWriter):
     def __init__(self):
@@ -128,7 +128,7 @@ def process_file(
         Tuple[InferenceResult, PipeResult]: Returns inference result and pipeline result
     """

-    ds = Union[PymuDocDataset, ImageDataset]
+    ds: Union[PymuDocDataset, ImageDataset] = None
     if file_extension in pdf_extensions:
         ds = PymuDocDataset(file_bytes)
     elif file_extension in office_extensions:
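
The second hunk replaces an assignment with an annotation. `ds = Union[...]` binds `ds` to the typing construct itself, so if no branch matched, `ds` would still hold that object rather than an empty value. `ds: Union[...] = None` instead declares the expected type and starts from `None`. The sketch below illustrates the difference with stand-in classes. `PdfStub`, `ImageStub`, `pick_dataset`, the explicit `Optional[...]` wrapper, and the `ValueError` are assumptions for this example and do not appear in the commit.

```python
from typing import Optional, Union


class PdfStub:
    """Hypothetical stand-in for PymuDocDataset."""

    def __init__(self, data: bytes):
        self.data = data


class ImageStub:
    """Hypothetical stand-in for ImageDataset."""

    def __init__(self, data: bytes):
        self.data = data


PDF_EXTENSIONS = [".pdf"]
IMAGE_EXTENSIONS = [".png", ".jpg", ".jpeg"]


def pick_dataset(file_bytes: bytes, file_extension: str):
    # Annotated variable with a None default: a value slot, not a type alias.
    ds: Optional[Union[PdfStub, ImageStub]] = None
    if file_extension in PDF_EXTENSIONS:
        ds = PdfStub(file_bytes)
    elif file_extension in IMAGE_EXTENSIONS:
        ds = ImageStub(file_bytes)
    if ds is None:
        # Assumption: unsupported extensions are rejected explicitly.
        raise ValueError(f"unsupported extension: {file_extension}")
    return ds


# The pre-change form is a plain assignment: the name now refers to the
# typing object, so a later `is None` check on it would never be true.
old_style = Union[PdfStub, ImageStub]
```

Calling `pick_dataset(b"...", ".jpeg")` returns an `ImageStub` under these assumptions, while `old_style is None` evaluates to `False`, which is why the assignment form cannot serve as an "unset" marker.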
