#!/usr/bin/env python
# coding: utf-8
---
sidebar_label: 'Video Processing'
sidebar_position: 6
description: "Parallel Video Resizing via File Sharding"
---
# # Video Processing
#
#
# [![stars - badge-generator](https://img.shields.io/github/stars/bacalhau-project/bacalhau?style=social)](https://github.com/bacalhau-project/bacalhau)
#
# Many data engineering workloads are embarrassingly parallel: you want to run the same simple operation on a large number of files. In this tutorial, we will run a simple video filter on a large number of video files.
#
# ## TL;DR
# Resizing video files with Bacalhau
# ## Prerequisite
#
# To get started, you need to install the Bacalhau client. See the [installation instructions](https://docs.bacalhau.org/getting-started/installation) for more information.
# ## Submit the workload
#
# To submit a workload to Bacalhau, we will use the `bacalhau docker run` command.
# In[ ]:
get_ipython().run_cell_magic('bash', '--out job_id', 'bacalhau docker run \\\n --wait \\\n --wait-timeout-secs 100 \\\n --id-only \\\n -i ipfs://Qmd9CBYpdgCLuCKRtKRRggu24H72ZUrGax5A9EYvrbC72j:/inputs \\\n linuxserver/ffmpeg -- \\\n bash -c \'find /inputs -iname "*.mp4" -printf "%f\\n" | xargs -I{} ffmpeg -y -i /inputs/{} -vf "scale=-1:72,setsar=1:1" /outputs/scaled_{}\'\n')
# The job has been submitted and Bacalhau has printed the job ID. We store it in an environment variable so that we can reuse it later.
# In[ ]:
get_ipython().run_line_magic('env', 'JOB_ID={job_id}')
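# The submission can also be assembled from plain Python, which may be convenient outside a notebook. The sketch below only rebuilds the argument list of the command shown above; `build_submit_command`, `INPUT_CID`, and `CONTAINER_SCRIPT` are hypothetical names introduced for illustration, and every flag is copied from the original cell.

```python
import shlex

# CID copied from the command in this tutorial.
INPUT_CID = "Qmd9CBYpdgCLuCKRtKRRggu24H72ZUrGax5A9EYvrbC72j"

# Shell pipeline executed inside the container (same text as the cell above).
# The doubled backslash keeps a literal "\n" for find's -printf to interpret.
CONTAINER_SCRIPT = (
    'find /inputs -iname "*.mp4" -printf "%f\\n" | '
    'xargs -I{} ffmpeg -y -i /inputs/{} '
    '-vf "scale=-1:72,setsar=1:1" /outputs/scaled_{}'
)


def build_submit_command(cid, mount_path="/inputs", timeout_secs=100):
    """Hypothetical helper: return the argv list for `bacalhau docker run`."""
    return [
        "bacalhau", "docker", "run",
        "--wait",
        "--wait-timeout-secs", str(timeout_secs),
        "--id-only",
        "-i", f"ipfs://{cid}:{mount_path}",
        "linuxserver/ffmpeg", "--",
        "bash", "-c", CONTAINER_SCRIPT,
    ]


command = build_submit_command(INPUT_CID)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

# To actually submit the job, the list could be passed to `subprocess.run(command, capture_output=True, text=True, check=True)`; because of `--id-only`, the job ID should then be the content of `stdout`.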
# The `bacalhau docker run` command lets you pass an input data volume with the `-i ipfs://CID:path` argument, just like Docker, except that the left-hand side of the argument is a [content identifier (CID)](https://github.com/multiformats/cid). This results in Bacalhau mounting a *data volume* inside the container. By default, Bacalhau mounts the input volume at the path `/inputs` inside the container.
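# To make the shape of the `-i` argument concrete, here is a minimal sketch that splits an `ipfs://CID:path` string into its two parts. `InputSpec` and `parse_input_spec` are hypothetical names, not part of the Bacalhau client, and the fallback to `/inputs` simply mirrors the default described above.

```python
from typing import NamedTuple


class InputSpec(NamedTuple):
    cid: str         # content identifier of the data on IPFS
    mount_path: str  # where the volume appears inside the container


def parse_input_spec(spec: str) -> InputSpec:
    """Split 'ipfs://CID:path' into its CID and mount path."""
    prefix = "ipfs://"
    if not spec.startswith(prefix):
        raise ValueError(f"expected an ipfs:// URI, got {spec!r}")
    cid, sep, mount_path = spec[len(prefix):].partition(":")
    if not cid:
        raise ValueError("missing CID")
    # Fall back to the default mount point when no path is given.
    return InputSpec(cid, mount_path if sep and mount_path else "/inputs")


spec = parse_input_spec(
    "ipfs://Qmd9CBYpdgCLuCKRtKRRggu24H72ZUrGax5A9EYvrbC72j:/inputs"
)
print(spec.cid, spec.mount_path)
```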
#
# The command creates a 72px-high thumbnail for every video in the `inputs` directory: the `scale=-1:72` filter fixes the height at 72 pixels and picks the width that preserves the aspect ratio. The `outputs` directory will contain the thumbnail for each video. We shard by one video per job and use the `linuxserver/ffmpeg` container to resize the videos.
# :::tip
# [Bacalhau overwrites the default entrypoint](https://github.com/filecoin-project/bacalhau/blob/v0.2.3/cmd/bacalhau/docker_run.go#L64) so we must run the full command after the `--` argument. This line lists all of the mp4 files in the `/inputs` directory and runs `ffmpeg` against each one.
# :::
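# For readers less familiar with `find | xargs`, the following sketch is a rough Python mirror of the shell pipeline: it builds one `ffmpeg` argument list per mp4 file, and it estimates the output width that `scale=-1:72` would choose. `ffmpeg_commands` and `scaled_width` are hypothetical helpers; the width is only an approximation, since ffmpeg's own rounding may differ by a pixel.

```python
from pathlib import Path


def scaled_width(width, height, target_height=72):
    """Approximate the width picked by scale=-1:<target_height>."""
    return round(width * target_height / height)


def ffmpeg_commands(input_dir, output_dir, height=72):
    """Return one ffmpeg argv list per .mp4 file found under input_dir."""
    commands = []
    for video in sorted(Path(input_dir).rglob("*")):
        # Case-insensitive match, like `find -iname "*.mp4"`.
        if video.is_file() and video.suffix.lower() == ".mp4":
            target = Path(output_dir) / f"scaled_{video.name}"
            commands.append([
                "ffmpeg", "-y",
                "-i", str(video),
                "-vf", f"scale=-1:{height},setsar=1:1",
                str(target),
            ])
    return commands


# A 1920x1080 source keeps its 16:9 aspect ratio at 72px high.
print(scaled_width(1920, 1080))  # 128
```

# Calling `ffmpeg_commands("/inputs", "/outputs")` inside the container would yield the same kind of invocations that `xargs` generates, one per video.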
#
# ## Checking the State of your Jobs
#
# - **Job status**: You can check the status of the job using `bacalhau list`.
# In[ ]:
get_ipython().run_cell_magic('bash', '', 'bacalhau list --id-filter=${JOB_ID} --no-style\n')
# When it says `Published` or `Completed`, that means the job is done, and we can get the results.
# - **Job information**: You can find out more information about your job by using `bacalhau describe`.
# In[ ]:
get_ipython().run_cell_magic('bash', '', 'bacalhau describe ${JOB_ID}\n')
# - **Job download**: You can download your job results directly by using `bacalhau get`. Alternatively, you can choose to create a directory to store your results. In the command below, we created a directory and downloaded our job output to be stored in that directory.
# In[ ]:
get_ipython().run_cell_magic('bash', '', 'mkdir -p ./results # Temporary directory to store the results\nbacalhau get --output-dir ./results ${JOB_ID} # Download the results\n')
# After the download has finished, the results directory should contain the job output; the scaled videos are in `results/outputs`.
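# The three steps above (list, describe, get) can be sketched as small Python helpers that only build the argument lists used in the cells. All function names are hypothetical, and `looks_finished` rests on the assumption that the state name appears verbatim in the text printed by `bacalhau list`.

```python
FINISHED_STATES = ("Published", "Completed")


def list_command(job_id):
    """argv for checking the job status."""
    return ["bacalhau", "list", f"--id-filter={job_id}", "--no-style"]


def describe_command(job_id):
    """argv for printing detailed job information."""
    return ["bacalhau", "describe", job_id]


def get_command(job_id, output_dir="./results"):
    """argv for downloading the job results into output_dir."""
    return ["bacalhau", "get", "--output-dir", output_dir, job_id]


def looks_finished(list_output):
    """Assumption: a finished job shows one of FINISHED_STATES in the output."""
    return any(state in list_output for state in FINISHED_STATES)
```

# Each list could be handed to `subprocess.run(..., capture_output=True, text=True)`; before running `get_command`, `pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)` plays the role of `mkdir -p`.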
# ## Viewing your Job Output
#
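# To prepare the files for viewing, run the following command, which copies the videos into the current directory and replaces any spaces in their filenames with underscores: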
# In[ ]:
get_ipython().run_cell_magic('bash', '', '# Copy the files to the local directory, to allow the documentation scripts to copy them to the right place\ncp results/outputs/* ./ && rm -rf results/outputs/*\n# Remove any spaces from the filenames\nfor f in *\\ *; do mv "$f" "${f// /_}"; done\n')
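# The renaming loop in the cell above can also be written in Python. This is a minimal sketch under one assumption that differs from the shell loop: it renames regular files only and skips directories. `replace_spaces` is a hypothetical helper name.

```python
from pathlib import Path


def replace_spaces(directory="."):
    """Rename files in `directory` so that spaces become underscores."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and " " in path.name:
            target = path.with_name(path.name.replace(" ", "_"))
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

# Calling `replace_spaces(".")` after copying the outputs returns the list of new filenames, which makes it easy to check what was changed.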
# ### Display the videos
#
# To view the videos, we will use **glob** to return all file paths that match a specific pattern.
# In[ ]:
import glob
from IPython.display import Video, display

# Display every mp4 file found in the current directory
for file in glob.glob('*.mp4'):
    display(Video(filename=file))
#
# ## Need Support?
#
# For questions or feedback, please reach out in our [forum](https://github.com/filecoin-project/bacalhau/discussions)