# Investigating Execution-Aware Language Models for Code Optimization

Repository: `SpencerLabAQ/exec-aware-code-opt`

## Install dependencies

```shell
python3 -m venv execaware
source execaware/bin/activate

pip install -r requirements.txt
```

## Configuration

Most scripts read the `config/config.ini` file, which defines all paths required for the experiments. We provide a template to fill in with your specific paths. The following paths must be defined:

  • gem5_path: Path to the gem5 simulator [1].
  • all_input_output_path: Directory containing all test cases provided by CodeNet [2]. The data for these cases is also extracted by the traces collection pipeline discussed in the subsequent section.
  • eval_sandbox_path: Directory designated for storing the evaluation metrics.
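As a minimal sketch of how these settings can be consumed, the config can be read with Python's `configparser`. The section name `[paths]` and the sample values below are assumptions; match whatever the repo's actual template uses:

```python
import configparser

# Hypothetical config.ini contents; the real template ships with the repo.
SAMPLE = """
[paths]
gem5_path = /opt/gem5
all_input_output_path = /data/codenet/all_input_output
eval_sandbox_path = /data/eval_sandbox
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

gem5_path = config["paths"]["gem5_path"]
eval_sandbox = config["paths"]["eval_sandbox_path"]
```

In the actual scripts, `config.read("config/config.ini")` would replace `read_string(SAMPLE)`.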

## Traces collection

We employed the pipeline presented in TRACED [3] to collect execution traces.

## Datasets

After collecting the execution traces, proceed by executing the following commands.

The default tracing paths are `./tracing/pretraining` for the pre-training traces and `./tracing/pie` for the fine-tuning traces. If you change these paths, you must modify and thoroughly verify all relevant dataset scripts to ensure compatibility.

In addition, the original version of the PIE [4] dataset is required to build the datasets. Its default path is assumed to be `./pie/`.

```shell
# Build all the datasets required for the experiments
bash dataset.sh
```
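Before running `dataset.sh`, it can help to confirm that the expected directories are in place. A small sanity-check sketch, covering only the paths named above (anything deeper in the layout is repo-specific):

```python
from pathlib import Path

# Directories named in this README; the scripts may expect more structure inside.
REQUIRED = ["tracing/pretraining", "tracing/pie", "pie"]

def missing_dirs(root: str, required=REQUIRED) -> list[str]:
    """Return the required trace/dataset directories absent under `root`."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).is_dir()]
```

Running `missing_dirs(".")` from the repo root before `bash dataset.sh` reports any directory that still needs to be populated.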

## Variable states quantization

Here we outline the quantization strategy adopted for the variable states, as described in the related paper.

(Figure: variable states quantization strategy)
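The exact scheme is defined in the paper; purely as an illustrative sketch (the bucket boundaries and labels below are assumptions, not the repo's actual strategy), quantizing a concrete numeric variable value into coarse categories could look like:

```python
def quantize_state(value: int) -> str:
    """Map a concrete variable value to a coarse bucket.

    The buckets below are hypothetical; the actual quantization
    strategy is defined in the accompanying paper.
    """
    if value == 0:
        return "zero"
    if value < 0:
        return "negative"
    if value <= 100:
        return "small_positive"
    return "large_positive"
```

Coarse buckets like these keep the vocabulary of variable-state labels small enough for a language model to predict.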

## Run Experiments - Model Training and Inference

```shell
# Execute all the experiments
bash run_experiment_line_executions.sh
bash run_experiment_line_coverage.sh
bash run_experiment_branch_coverage.sh
bash run_experiment_variable_states.sh
```

## Evaluation

Specify the predictions path and the sandbox name when running the evaluation, and run the script once for each prediction file produced by inference.

```shell
# Create the evaluation folder
mkdir -p ./eval

# Perform the evaluation (example: strategy S1 with line executions, LE)
bash eval_finetuning.sh "./s1_ft_LE" "./eval/LE_s1"
```
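Since the script must be run once per prediction file, a small helper can derive the sandbox path from a prediction directory name, mirroring the `s1_ft_LE` → `./eval/LE_s1` pattern in the example above. Note the naming convention is inferred from that single example and may not hold for every run:

```python
def sandbox_path(pred_dir: str, eval_root: str = "./eval") -> str:
    """Derive the eval sandbox path from a prediction directory name.

    Assumes names like "s1_ft_LE" (strategy, "ft", task), mirroring the
    single example in this README; other layouts may differ.
    """
    name = pred_dir.rstrip("/").split("/")[-1]
    strategy, _, task = name.split("_", 2)
    return f"{eval_root}/{task}_{strategy}"
```

Both the prediction path and the derived sandbox path are then passed to `eval_finetuning.sh` as in the example above.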

## References

[1] gem5 Simulator

[2] CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

[3] TRACED: Execution-aware Pre-training for Source Code

[4] PIE: Learning Performance-Improving Code Edits (pie4perf)
