#!/usr/bin/env python
# coding: utf-8

# ---
# sidebar_label: 'Video Processing'
# sidebar_position: 6
# description: "Parallel Video Resizing via File Sharding"
# ---

# # Video Processing
#
# [![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
#
# Many data engineering workloads are embarrassingly parallel: you want to run the same simple operation on a large number of files. In this tutorial, we will apply a simple video filter to a large number of video files.
#
# ## TL;DR
# Processing video files with Bacalhau

# ## Prerequisite
#
# To get started, you need to install the Bacalhau client. See the installation instructions [here](https://docs.bacalhau.org/getting-started/installation).

# ## Submit the workload
#
# To submit a workload to Bacalhau, we will use the `bacalhau docker run` command.

# In[ ]:


get_ipython().run_cell_magic('bash', '--out job_id', 'bacalhau docker run \\\n --wait \\\n --wait-timeout-secs 100 \\\n --id-only \\\n -i ipfs://Qmd9CBYpdgCLuCKRtKRRggu24H72ZUrGax5A9EYvrbC72j:/inputs \\\n linuxserver/ffmpeg -- \\\n bash -c \'find /inputs -iname "*.mp4" -printf "%f\\n" | xargs -I{} ffmpeg -y -i /inputs/{} -vf "scale=-1:72,setsar=1:1" /outputs/scaled_{}\'\n')


# The job has been submitted and Bacalhau has printed out the related job ID. We store it in an environment variable so that we can reuse it later on.

# In[ ]:


get_ipython().run_line_magic('env', 'JOB_ID={job_id}')


# The `bacalhau docker run` command lets you pass an input data volume with a `-i ipfs://CID:path` argument, just like Docker, except that the left-hand side of the argument is a [content identifier (CID)](https://github.com/multiformats/cid). Bacalhau then mounts a *data volume* inside the container. By default, Bacalhau mounts the input volume at the path `/inputs` inside the container.
#
# The `scale=-1:72` filter resizes each video to a height of 72px while preserving its aspect ratio, so this job creates 72px-tall thumbnails for all the videos in the `inputs` directory.
# The `outputs` directory will contain the thumbnails for each video. We will shard by 1 video per job, and use the `linuxserver/ffmpeg` container to resize the videos.

# :::tip
# [Bacalhau overwrites the default entrypoint](https://github.com/filecoin-project/bacalhau/blob/v0.2.3/cmd/bacalhau/docker_run.go#L64), so we must pass the full command after the `--` argument. That command lists all of the mp4 files in the `/inputs` directory and runs `ffmpeg` on each one.
# :::
#
# ## Checking the State of your Jobs
#
# - **Job status**: You can check the status of the job using `bacalhau list`.

# In[ ]:


get_ipython().run_cell_magic('bash', '', 'bacalhau list --id-filter=${JOB_ID} --no-style\n')


# When the status is `Published` or `Completed`, the job is done and we can get the results.

# - **Job information**: You can find out more information about your job by using `bacalhau describe`.

# In[ ]:


get_ipython().run_cell_magic('bash', '', 'bacalhau describe ${JOB_ID}\n')


# - **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can create a directory to store your results. In the command below, we create a directory and download the job output into it.

# In[ ]:


get_ipython().run_cell_magic('bash', '', 'mkdir -p ./results # Temporary directory to store the results\nbacalhau get --output-dir ./results ${JOB_ID} # Download the results\n')


# After the download has finished, the job output will be in the `results` directory.
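
# As a minimal sketch (not part of the original tutorial), the downloaded files can also be listed from Python to confirm what `bacalhau get` wrote. The helper name `list_results` is hypothetical, and the exact layout under `results` depends on the Bacalhau version; the only assumption is that the files were written somewhere under `./results`.

```python
from pathlib import Path


def list_results(root="results"):
    """Return the sorted relative paths of all files under `root`.

    `list_results` is a hypothetical helper for illustration only.
    Returns an empty list if `root` does not exist.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())


# Print every downloaded file, one per line (prints nothing if ./results is missing).
for path in list_results("results"):
    print(path)
```

# If the job succeeded, the listing should include the scaled videos under `outputs`, which is the directory the copy step later in this tutorial reads from.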
# ## Viewing your Job Output
#
# To prepare the output files for viewing, run the following command. It copies them to the current directory and replaces any spaces in their filenames with underscores:

# In[ ]:


get_ipython().run_cell_magic('bash', '', '# Copy the files to the local directory, to allow the documentation scripts to copy them to the right place\ncp results/outputs/* ./ && rm -rf results/outputs/*\n# Remove any spaces from the filenames\nfor f in *\\ *; do mv "$f" "${f// /_}"; done\n')


# ### Display the videos
#
# To view the videos, we will use **glob** to return all file paths that match a specific pattern.

# In[ ]:


import glob
from IPython.display import Video, display

# Display every mp4 file in the current directory inline in the notebook
for file in glob.glob('*.mp4'):
    display(Video(filename=file))


# ## Need Support?
#
# For questions or feedback, please reach out in our [forum](https://github.com/filecoin-project/bacalhau/discussions)