Frequently Asked Questions
Q: How do I reset my password?
A: Contact support@ace-bioinformatics.org with your username and request a password reset.
Q: Why is my job pending?
A: Your job is waiting for resources. Common reasons include:
- No available compute nodes
- Waiting for other higher-priority jobs
- Resource limits like CPU or memory requests are too high
Use squeue -u $USER and scontrol show job <jobid> for more details.
Q: Can I increase my storage quota?
A: Yes. Contact support@ace-bioinformatics.org to request a quota increase and explain your needs.
Q: How do I know which modules are available?
A: Use module avail to see all available modules. If nothing shows, make sure you're using the correct environment or login node.
Q: My script works on my laptop but fails here. Why?
A: HPC systems use different environments. You need to load modules and run jobs via Slurm (sbatch) instead of running them interactively.
Q: What is a Slurm script and how do I create one?
A: A Slurm script is a shell script with special #SBATCH directives that tell the scheduler how to run your job. Example:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=result.out
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
module load python
python myscript.py
Q: Should I run everything on the login node?
A: No. Login nodes are for preparing jobs only. Use sbatch or srun to run on compute nodes.
Q: What is the difference between sbatch, srun, and salloc?
A:
- sbatch: Submits a job script to the queue
- srun: Runs a job directly, often inside a script
- salloc: Starts an interactive session with allocated resources
Q: How do I monitor my job?
A: Use squeue -u $USER for job status, sacct for historical data, and scontrol show job <jobid> for detailed info.
Q: My job was killed or failed. How do I find out why?
A: Check your .out or .err files and run sacct -j <jobid> --format=JobID,State,ExitCode.
Q: Can I run graphical applications?
A: Yes, if the cluster supports X11 forwarding. You need to SSH with -X or -Y and load the necessary modules.
Q: What’s the best way to transfer files to/from the HPC?
A: Use scp, rsync, or a file transfer service if provided. rsync is preferred for large or resumable transfers. Examples:
scp myfile.txt user@hpc.example.com:/path/to/dir
rsync -avP mydir/ user@hpc.example.com:/path/to/dir/
Q: Why does my Python/R script fail to import libraries?
A: Make sure the right module is loaded or use a virtual environment/conda environment installed in your home directory.
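As a minimal sketch for Python (the environment name "myproject" is just an example; on the cluster you would first run module load python, with the module name varying by site), a per-user virtual environment can be set up like this:

```shell
# Create a virtual environment inside your home directory
# ($HOME/envs/myproject is a hypothetical example path).
python3 -m venv "$HOME/envs/myproject"

# Activate it; "python" and "pip" now point inside the environment,
# so packages install into your home directory, not the system.
. "$HOME/envs/myproject/bin/activate"

# Packages can then be installed with, e.g., pip install numpy
python -c "import sys; print(sys.prefix)"
```

Remember to activate the same environment inside your Slurm script, since batch jobs do not inherit your interactive shell's state.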
Q: Can I install custom packages?
A: Yes. Use conda, virtualenv, or build from source in your user space.
Q: How do I cancel a job?
A: Use scancel <jobid> to stop a running or pending job.
Q: How do I use more than one CPU or node?
A: Add #SBATCH --ntasks=<N> or #SBATCH --nodes=<N> in your script. Match this with parallel code or tools like mpirun.
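As a sketch of a multi-node MPI job (the module name "openmpi" and the program name are placeholders; check module avail and your site's documentation for the real names):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --nodes=2            # number of nodes
#SBATCH --ntasks=8           # total MPI ranks across all nodes
#SBATCH --time=00:30:00

module load openmpi          # placeholder module name; varies by site

# srun launches one copy of the program per task; mpirun works similarly
srun ./my_mpi_program
```

Requesting more tasks or nodes only helps if your program is actually parallel; a serial program will simply leave the extra CPUs idle.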
Q: What is a module and why do I need it?
A: A module configures your environment (PATH, library paths) so a specific software package and version becomes available. Use module load <name> to make a tool available.
Q: My job exceeds memory and crashes. How do I fix this?
A: Increase your memory request using #SBATCH --mem=4G (or --mem-per-cpu for multi-task jobs), check actual usage with sacct -j <jobid> --format=JobID,MaxRSS, or reduce your script's memory footprint.
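As a sketch, the relevant directives look like this (the values are examples; stay within your cluster's per-node limits):

```shell
#SBATCH --mem=4G           # total memory per node
# or, for multi-task jobs:
#SBATCH --mem-per-cpu=2G   # memory per allocated CPU
```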
Q: How do I run a job array?
A: Add #SBATCH --array=1-10 in your script. Each task will run independently with its own $SLURM_ARRAY_TASK_ID.
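A minimal sketch of an array job script (the input-file naming scheme input_<N>.txt is hypothetical; the task ID is defaulted to 1 so the body can also be dry-run outside Slurm):

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --array=1-10
#SBATCH --output=array_%A_%a.out   # %A = job ID, %a = array task ID

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 1 for a dry run
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
INPUT="input_${TASK_ID}.txt"       # hypothetical per-task input file
echo "processing ${INPUT}"
```

Each of the 10 tasks runs this same script with a different task ID, so a single sbatch submission covers all inputs.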
Q: What is the best way to test a script before full submission?
A: Run an interactive session using salloc or test with smaller inputs and shorter time limits.
Q: Can I automate job submissions?
A: Yes, using loops or shell scripts that call sbatch. Example:
for i in {1..10}; do
sbatch myjob.sh $i
done
Q: How do I check cluster usage?
A: Use sinfo to see partitions and node states. Some clusters may also have a web dashboard or monitoring tools.