zuri-training/Pro_Team_41_Chunk-File

Deployment Instructions

THE DEPLOYMENT BRANCH IS - backend/main

  1. Clone the branch backend/main with the command below
git clone -b backend/main https://github.com/zuri-training/Pro_Team_41_Chunk-File.git
  2. Install the requirements for the project with the command below
pip install -r requirements.txt
  3. Make migrations for the project folder - chunk_41
python manage.py makemigrations chunkit
python manage.py migrate chunkit
  4. The architecture is Monolith (Django Templating)
  5. Static and media files are configured
  6. When specifying the static path on the server, remember to specify the media path too (see the settings sketch after this list).
  7. The project folder name is chunk_41. It is where settings.py and wsgi.py can be found.
  8. The app folder name is chunkit
  9. The static folder name is static
  10. The media folder name is media
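
Below is a minimal sketch of the static and media configuration referred to in step 6, assuming the standard Django setting names; the exact values in chunk_41/settings.py and the server configuration may differ.

# chunk_41/settings.py (sketch only; folder names follow steps 9 and 10)
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

STATIC_URL = "/static/"
STATIC_ROOT = os.path.join(BASE_DIR, "static")   # static path to point the server at

MEDIA_URL = "/media/"
MEDIA_ROOT = os.path.join(BASE_DIR, "media")     # remember to expose this path as well

# chunk_41/urls.py (sketch) - serve uploaded media while developing locally
from django.conf import settings
from django.conf.urls.static import static
urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)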

Background

  • A platform that accepts CSV or JSON large files and chunks them into smaller bits.

Our Vision

  • Helping clients manage their large files easily.

Our Client

  • Zuri

Our Target Market

  • Large CSV and JSON file users (Developers, companies)

Our User Personas

  • Developers, Data Analysts, Multinationals

Project Description

ChunkIT is a project initiative introduced by the Zuri team. It is a platform that accepts large CSV or JSON files and chunks them into smaller bits. It also allows users to save or download their files in zipped format at any time. Chunking is simply the process of splitting large files into smaller files, called chunks, without losing their content or quality.

ChunkIT Objectives

  • Solve the problems that come with having large CSV and JSON files
  • Allow users to split large CSV and JSON files
  • Allow users to save files with the option of downloading them in the future

ChunkIT Solutions

  • Create a means of chunking CSV and JSON files seamlessly
  • Easily import, save and download your CSV and JSON files anytime
  • Chunk as much as 250 MB of CSV and JSON data without a fee
  • Split files into different sizes and numbers of chunks

ChunkIT Key features

Landing Page, FAQ/ContactUs Page, About Us page & Documentation page

Accessible to all users, authenticated and unauthenticated

Sign In and Sign Up page

To authenticate users to access the chunking platform

Dashboard

Where the user can chunk their files and optionally save them on the platform for future downloading

Dashboard - Library Page

User can upload a file and chunk it according to their preference. The user's saved files are also listed in the library page.

Dashboard - Account settings Page

User can view their account details and change their authentication details

User Flow

Unauthenticated User:

  1. The user visits the Landing page and can view the platform's features.

Landing page

  2. The user can access the platform's documentation and other pages from the header section.

Platform page

  3. The user can create an account by navigating to the sign up page. When they successfully create an account, they are authorised to access the dashboard where they can chunk and save their files.

Register

Authenticated User:

  1. The user now has full access to all our services and can chunk any CSV or JSON file they want.

Chunking

  2. The user has a dashboard where they can upload, chunk, save or download files at any time.

Downloading

  3. The user also has an account settings page that they can access anytime they log in.

Account

What to expect in future versions

  • Allow users to view statistics of splits done previously
  • Allow users to sort split CSV and JSON files easily
  • Merge CSV and JSON files
  • Allow users to chunk more file formats

Technologies Used

HTML, CSS

HTML & CSS add structure and style to the webpages.

Bootstrap

The Bootstrap framework was used to quickly design the front end of the platform.

Javascript (Vanilla)

Vanilla JS was used to create rich interfaces, speed up the client side and add functionality to the platform's authentication.

Python (Django)

Django was used for the rapid development of a secure and maintainable platform at the backend.

MySQL

MySQL was used to provide comprehensive support for the applications developed in Django and to store user data.
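
A minimal sketch of how the MySQL connection might be declared in chunk_41/settings.py, using Django's built-in MySQL backend; the database name and credentials below are placeholders, not the project's actual values.

# chunk_41/settings.py (sketch) - placeholder credentials
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "chunkit_db",      # hypothetical database name
        "USER": "chunkit_user",    # hypothetical user
        "PASSWORD": "change-me",   # placeholder
        "HOST": "localhost",
        "PORT": "3306",
    }
}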

Challenges we faced

Cracking the chunking functionality

We planned to implement 3 methods of chunking:

  1. Chunking by size (which was the primary method to implement)
    • The user chooses the size of each chunk to be generated from the original file, then the original file is divided according to the specified size.
    • e.g. A user uploads a 100 MB file and chooses a chunk size of 5 MB. 20 files of roughly 5 MB each would be generated from the 100 MB file.
  2. Chunking by number of rows (see the sketch after this list)
    • The user chooses the number of rows each file chunk should have.
    • e.g. With a file of 20 rows and 10 rows per chunk, 2 files will be generated.
  3. Chunking by number of files/chunks
    • The user selects the number of files they would like to be generated from the original file.
    • e.g. From an 80 MB file, a user requires 5 files. Approximately 5 files of 16 MB will be generated from the original file.
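
The following is a minimal sketch of method 2 (chunking by number of rows) using pandas; the helper name and output file naming are illustrative rather than the code used on the platform.

import pandas as pd

def chunk_by_rows(path, rows_per_chunk):
    # Hypothetical helper: read the whole CSV, slice it into frames of
    # rows_per_chunk rows each, and write every slice to its own file.
    df = pd.read_csv(path)
    for start in range(0, len(df), rows_per_chunk):
        chunk = df.iloc[start:start + rows_per_chunk]
        chunk.to_csv(f"chunk_{start // rows_per_chunk}.csv", index=False)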

Chunking by size is possible; however, we discovered that the end product is not always usable. For example, when a user uploads a 20 MB JSON file and requires chunks of 2 MB, cutting files at exactly 2 MB means the resulting files may not be valid JSON.

Solution (Cracking the chunking functionality)

We decided to attempt chunking by the number of files, which was easier to implement for the team.

We used the Pandas library to implement the algorithm to chunk the files. The following is a snippet of our code:

import os
import time

import pandas as pd
from django.conf import settings

if url.split(".")[-1] == 'csv':
    df = pd.read_csv(url)
    # At least one row per chunk, even if the user asks for more chunks than there are rows
    rows_per_file = max(df.shape[0] // file_count, 1)
    # Write the chunks into a unique temp folder named after the current timestamp
    folder_name = os.path.join(str(settings.BASE_DIR), "temp", str(int(time.time() * 1000)))
    os.makedirs(folder_name)
    for row_start in range(0, df.shape[0], rows_per_file):
        new_file = df[row_start:row_start + rows_per_file]
        new_file.to_csv(f"{folder_name}/chunk_{row_start}.csv")

N.B. Although the user may select the number of chunks/files they require, they may find that the generated files number one more than requested.

This is inevitable because, as the algorithm chunks the files, it prioritises the usability of each file. This means that if the required number of chunks/files is generated but the files are not usable, the algorithm will increase the number of chunks so that each file meets the CSV or JSON format standard and remains usable.
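
The snippet above shows only the CSV branch. A corresponding JSON branch might look like the sketch below, assuming pandas is also used to read and re-serialise the JSON; this is illustrative and not necessarily the exact code running on the platform.

elif url.split(".")[-1] == 'json':
    # Sketch of a JSON branch: each chunk is written back out as a
    # self-contained list of records so it stays valid JSON on its own.
    df = pd.read_json(url)
    rows_per_file = max(df.shape[0] // file_count, 1)
    folder_name = os.path.join(str(settings.BASE_DIR), "temp", str(int(time.time() * 1000)))
    os.makedirs(folder_name)
    for row_start in range(0, df.shape[0], rows_per_file):
        new_file = df[row_start:row_start + rows_per_file]
        new_file.to_json(f"{folder_name}/chunk_{row_start}.json", orient="records")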

Deploying to the server

Implementing the platform according to UI/UX design specifications

Our team consists of Product Designers, Frontend Developers and Backend Developers. The Product Designers carried out the necessary research to inform the UI designs, and shared the hi-fidelity screens and relevant resources with the Frontend Developers for platform implementation. The Frontend Developers then communicated with the Backend Developers about any changes made to the UI designs that would influence the backend features.

Although the Frontend Developers managed to implement the UI design changes as accurately as possible, in some instances the user flow between screens was not developed according to the Product Design team's expectations. The Frontend Developers interpreted the flow differently from the Designers, whilst the Backend Developers also had their own interpretation.

Solution (Implementing the platform according to UI/UX design specifications)

Towards the end of the project, some designers started joining developer meetings to find out how the developers interpreted the UI designs. This uncovered the areas where there were misunderstandings, and we were able to rectify most of the inconsistencies on our platform. Communication between all subgroups of the team was important to ensure a seamless end product.

Product Specialization

Mobile Phones and Laptops

Responsive

Project Status

First Phase completed, Next Phase yet to start

Github Collaboration

  1. Fork this repository and create a project folder on your local machine

  2. Navigate to the terminal (pointing to your project folder/directory), clone the repository and then open it up in your preferred code editor

git clone https://github.com/<your github username>/Pro_Team_41_Chunk-File.git
  3. Open the terminal and set the upstream branch:
git remote add upstream https://github.com/zuri-training/Pro_Team_41_Chunk-File.git
  4. Pull upstream to get the latest update from the original repo (https://github.com/zuri-training/Pro_Team_41_Chunk-File.git)
git pull upstream main
  5. Create a new branch for the task you are doing, e.g.:
git checkout -b support-module
  6. After making changes, do
git add .
  7. Commit your changes with a descriptive commit message
git commit -m "commit message"
  8. To make sure there are no conflicts:
git pull upstream main
  9. Push changes to your new branch:
git push origin your-current-branch-name
  10. Create a pull request to the main branch

Project Links

Our testable link

Our frontend implementation link

Credits

Team members

Name Github Username Role
Peter Felix @thejourneybeginsng Product Designer (Team Lead)
Efosa Ero @Efoxa Full Stack Developer (Assistant Team Lead)
Nicole Moyo @beverly-m Product Designer (Assistant Team Lead)
Ohayi James Chukwuka @Sanctogiacomo Product Designer
Daniel Ukoha @Superfly101 Frontend Developer
Louis Binah @BINAH25 Full Stack Developer
John Ojibo @jkull247 Frontend Developer
Emmanuel Osaite @Vixxena Product Designer
Chukwuebuka Joshua Ezeokechukwu @Ebuka500 Product Designer
Marthar Nderitu @MNderi Frontend Developer
Metu Jane @MetuJane Product Designer
Esther Oyebode @EstherOyebode Product Designer
Azeez Olayinka Bankole @Olabanky Frontend Developer
Adedamola Alausa @Theadedamola Product Designer
Francis Udeh @UgoKing Product Designer
Augusta Okwor @AugustaOkwor1 Product Designer
Oyindamola Aina @Dammina001 Frontend Developer
Oyetoke Anu @Oyetokeanu Product Designer
Queen Iheanacho @Preshtyrace Product Designer
Chukwudebere Emmanuel Onyinyechi @Daberetech Product Designer
Adewole Abdulazeez @TechFlow247 Product Designer
Omonigho Seth @nigho-seth Product Designer
Judah Ndukwu @Cleverley1 Product Designer
Joseph Igbekoyi @Jaay06 Frontend Developer

Resources

These are platforms that helped us build the project:

DOCUMENTATION

Understanding the process of chunking files

Chunking is the process of splitting large files into smaller files called chunks. In some applications, such as remote data compression, data synchronisation and data deduplication, chunking is important because it determines the duplicate detection performance of the system. Chunk File is a small and handy application designed to help you split large files into pieces of a set size, so you can easily transfer them without losing any data. A web application that splits large CSV or JSON files into smaller files makes them easier to open and archive.

What is ChunkIT?

ChunkIt is a platform that accepts CSV or JSON large files and breaks them into smaller bits. For this application to work, the file being uploaded must be in the right format and within the acceptable size range. Successfully chunked files can either be downloaded soon after the chunking process or saved on the platform for future downloads.

How does it work?

ChunkIt is a web-based platform that splits large or heavy CSV and JSON files. When a user uploads a file of up to 250 MB to our platform, our Python pandas modules validate the file to determine whether it is CSV or JSON. If the file is neither of these formats, or is bigger than the size limit, the platform will not accept it; if the input is valid, the file is accepted and the splitting process begins with the help of the pandas modules. The data is written into smaller files according to the number of parts, or the size, that the user wants to split it into. When the splitting is complete, the result is zipped using the shutil module, ready for the user to download.
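
For illustration, the zipping step can be done with a single shutil call; a minimal sketch, assuming the chunks have already been written into a folder_name directory as in the earlier snippet:

import shutil

# Create <folder_name>.zip containing every chunk in folder_name,
# ready to be served to the user for download.
archive_path = shutil.make_archive(folder_name, "zip", root_dir=folder_name)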

Getting Started

For a user to get started with using the chunking feature of the platform, they need to create an account by registering with their email address to become an authenticated user. This means that an unauthenticated user can not use the chunking feature. However, the unauthenticated user can interact with the platform’s documentation by accessing the resources tab in the header section. They can also go through the platform’s landing page and FAQ section to learn more about its features.

Uploading and chunking a file on ChunkIT

Upon creating an account, the user is redirected to the user dashboard, where they can start uploading the files they want to process. They choose the option to upload a new file, and a screen appears that allows them to upload it. The platform currently supports chunking JSON and CSV files; more file formats will be supported in future versions. The user then uploads their file either by dragging and dropping it on the screen or by browsing through their device's file system. The platform only accepts files that are up to 250 MB in size and in the correct format, CSV or JSON. Once the uploaded file satisfies these requirements, the user can choose the size of the chunk files they require. The chunk size should not exceed the original file size; if it does, the file will not be chunked. If the chunk size is within the acceptable range, the user is directed to a screen where they can download a zipped file containing the chunks. The user can also choose to download the files later, in which case their files are saved on the dashboard.
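
A minimal sketch of the kind of validation described above, assuming a Django view receives the file as an UploadedFile object; the helper name, limits and error messages are illustrative.

MAX_UPLOAD_BYTES = 250 * 1024 * 1024          # 250 MB limit
ALLOWED_EXTENSIONS = {"csv", "json"}

def validate_upload(uploaded_file):
    # uploaded_file is a Django UploadedFile; .name and .size are standard attributes
    extension = uploaded_file.name.rsplit(".", 1)[-1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, "Only CSV and JSON files are supported."
    if uploaded_file.size > MAX_UPLOAD_BYTES:
        return False, "Files larger than 250 MB are not accepted."
    return True, ""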

What to do after the file is chunked

When chunking is over, the user can save or download the file. The user can also come back later to continue the process without the risk of losing files. Downloaded files are automatically zipped for easy transfer. Past chunks can be accessed on the user dashboard whenever needed.

List of available features

  • Analyse files
  • Split JSON
  • Split CSV
  • Rename File
  • Split by number of chunks
  • Save chunked CSV files
  • Save chunked JSON files
  • Download chunked CSV files
  • Download chunked JSON files
  • Delete CSV chunks
  • Delete JSON chunks
  • View chunk history
