
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes every weekday Monday through Friday.


hpr1430 :: thebestofyoutube.com download script

A hacked script to download youtube videos

Hosted by Ken Fallon on 2014-01-24 is flagged as Explicit and is released under a CC-BY-SA license.
Tags: Bash, YouTube, download. Comments: 9.
The show is available on the Internet Archive at: https://archive.org/details/hpr1430

Listen in ogg, spx, or mp3 format.

Duration: 00:38:32

Bash Scripting.

This is an open series in which Hacker Public Radio Listeners can share their Bash scripting knowledge and experience with the community. General programming topics and Bash commands are explored along with some tutorials for the complete novice.

In episode "Thu 2013-12-19: hpr1404 Editing pre-recorded audio in Audacity" I walked you through editing a podcast. By the magic of editing, this show is being posted after that one has aired. The plan here is to get people to share their useful hacks to show how elegant, or in my case ugly, code can be. As Knightwise says, "Getting technology to work for you."™
Feel free to share your own hacks with us.

https://hackerpublicradio.org/eps.php?id=1404
https://hackerpublicradio.org/eps/hpr1430-downloader.bash.txt


#!/bin/bash
# Downloads videos from youtube based on selection from https://thebestofyoutube.com
# (c) Ken Fallon https://kenfallon.com
# Released under the CC-0

maxtodownload=10
savepath="/mnt/media/Videos/tv/youtube/bestofyoutube"
savedir="${savepath}/$(\date -u +%Y-%m-%d_%H-%M-%SZ_%A)"
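mkdir -p "${savedir}"
logfile="${savepath}/downloaded.log"
# The log must exist before the first grep below, so create it if it is missing
touch "${logfile}"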

# Gather the list
seq 1 ${maxtodownload} | while read videopage;
do 
  thisvideolist=$(wget --quiet "https://bestofyoutube.com/index.php?page=${videopage}" -O - | 
  grep 'www.youtube.com/embed/' | 
  sed 's#^.*www.youtube.com/embed/##' | 
  awk -F '["?]' '{print "https://www.youtube.com/watch?v="$1}')
  for thisvideo in ${thisvideolist};
  do
    # -F matches the URL as a fixed string rather than as a regular expression
    if ! grep -qF "${thisvideo}" "${logfile}";
    then
      echo "Found the new video ${thisvideo}"
      echo "${thisvideo}" >> "${logfile}_todo"
    else
      echo "Already downloaded ${thisvideo}"
    fi
  done
done

# Download the list
if [ -e "${logfile}_todo" ];
then
  # --max-quality is left out: it expects a format argument (see comment #5)
  tac "${logfile}_todo" | youtube-dl --batch-file - --ignore-errors --no-mtime --restrict-filenames \
    --format mp4 --write-auto-sub -o "${savedir}"'/%(autonumber)s-%(title)s-%(id)s.%(ext)s'
  cat "${logfile}_todo" >> "${logfile}"
  rm "${logfile}_todo"
fi
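
The "have I seen this one before?" check in the middle of the script can be tried on its own, without touching the network. This is only a sketch: the function name filter_new and the video IDs are made up for illustration, and it uses grep -qxF (fixed string, whole line) rather than the script's grep piped into wc -l.

```bash
#!/bin/bash
# Hypothetical helper, not part of the original script:
# print only those URLs from stdin that are not already in the log file.
filter_new() {
  local logfile="$1" url
  touch "${logfile}"            # make sure the log exists before grep reads it
  while read -r url; do
    if ! grep -qxF "${url}" "${logfile}"; then
      echo "${url}"
    fi
  done
}

# Demo with made-up IDs and a temporary log
demo_log="$(mktemp)"
echo "https://www.youtube.com/watch?v=AAAAAAAAAAA" > "${demo_log}"
printf '%s\n' \
  "https://www.youtube.com/watch?v=AAAAAAAAAAA" \
  "https://www.youtube.com/watch?v=BBBBBBBBBBB" | filter_new "${demo_log}"
rm -f "${demo_log}"
```

Only the second URL should be printed, because the first is already in the log.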


Comments


Comment #1 posted on 2014-02-04 15:27:52 by Roan

seq will do descending counts

seq 100 -1 1

seq FIRST INCREMENT LAST
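
A quick way to try the three-argument form. The wrapper name countdown_pages is made up for illustration and the number 3 is arbitrary.

```bash
#!/bin/bash
# Hypothetical wrapper: list page numbers from $1 down to 1
# using seq FIRST INCREMENT LAST with a negative increment.
countdown_pages() {
  seq "$1" -1 1
}

countdown_pages 3
```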

Comment #2 posted on 2014-02-08 17:22:17 by Cloud

Great podcast and brilliant idea for a series, but...

now I need to upgrade my broadband to allow for all these great videos that I wasn't getting before!

Comment #3 posted on 2014-02-09 11:14:28 by Dave Morriss

The power of modern Bash

I wondered why you used:

seq 1 ${maxtodownload} | while read videopage; do

as opposed to:

for (( videopage=1; videopage<=${maxtodownload}; videopage++ )); do

or (if you don't like repeating 'videopage' three times):

for videopage in {1..10}; do

You can even do more fancy stuff like:

for i in {001..0010}; do

for i in {0010..001}; do

for c in {a..h}; do

I find I almost never use the 'seq' command in today's version of Bash.
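
For comparison, here is a small sketch that runs all three loop forms side by side with an arbitrary bound of 3. One caveat worth knowing: Bash expands braces before variables, so {1..${maxtodownload}} does not turn into a range, which may be a practical reason to keep seq or the arithmetic loop when the upper bound lives in a variable.

```bash
#!/bin/bash
maxtodownload=3

# 1. seq piped into while read (the form used in the script)
via_seq="$(seq 1 ${maxtodownload} | while read -r videopage; do printf '%s ' "${videopage}"; done)"

# 2. C-style arithmetic loop; works with a variable upper bound
via_arith="$(for (( videopage=1; videopage<=maxtodownload; videopage++ )); do printf '%s ' "${videopage}"; done)"

# 3. Brace expansion; the bounds must be literals
via_brace="$(for videopage in {1..3}; do printf '%s ' "${videopage}"; done)"

# Braces are expanded before variables, so this stays one literal word
not_a_range="$(echo {1..${maxtodownload}})"

echo "seq:   ${via_seq}"
echo "arith: ${via_arith}"
echo "brace: ${via_brace}"
echo "var:   ${not_a_range}"
```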

Comment #4 posted on 2014-02-16 10:51:39 by Ken Fallon

Because ...

I got into the habit of using while loops because they deal with spaces in input better, or so I've found. But mostly it is because I can work in "blocks": everything up to the pipe "|" is one block. Test. Debug. Then on to the next block. That makes it easier to debug on the command line, where most of these start.

Not using seq makes the script too bashey :) but that argument holds little water I know.
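
The point about spaces can be seen with a small sketch. The two input lines are invented, and a here-string is used instead of a pipe only so that the counters are still set after the loop ends.

```bash
#!/bin/bash
# Two made-up input lines, one of which contains a space
input=$'first title\nsecond'

# for splits on every space and newline (default IFS)
for_count=0
for item in ${input}; do
  for_count=$((for_count + 1))
done

# while read takes one whole line at a time
while_count=0
while IFS= read -r line; do
  while_count=$((while_count + 1))
done <<< "${input}"

echo "for saw ${for_count} items, while read saw ${while_count} lines"
```

The for loop counts three words, while the while loop counts the two lines that were actually supplied.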

Comment #5 posted on 2014-05-23 15:07:11 by Jim Zatorski

Extra video downloaded

Is anyone else having an extra video appear EVERY DAY (usually the same one)?

I have tracked it down to the "--max-quality" switch. The man page shows an expected "=FMT" clause.

Comment #6 posted on 2014-06-10 08:10:42 by APCR

Thanks for the script. It works great. Can anyone tell me how to run it as a cron job? I have copied the file to /etc/cron.daily but it does not run. Do I have to run a script that actions this script?

Comment #7 posted on 2014-06-10 14:34:22 by Ken Fallon

cron

To get this to work in cron you need to make sure that the script is executable. Assuming the script is called "boyt.bash" and is in your own bin directory, "/home/apcr/bin/boyt.bash", run:

chmod +x /home/apcr/bin/boyt.bash

Check that it runs by just typing: /home/apcr/bin/boyt.bash

After that it should run in cron.
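
One other possible reason for a script in /etc/cron.daily being skipped: on Debian-style systems that directory is processed by run-parts, which by default only considers file names made up of letters, digits, underscores and hyphens, so a name containing a dot such as "boyt.bash" would be ignored. The check below is a rough sketch of that naming rule. The function name is made up, and the rule is an assumption about your distribution, so where your run-parts supports it, confirm with "run-parts --test /etc/cron.daily".

```bash
#!/bin/bash
# Hypothetical check: would this file name pass the default
# Debian run-parts naming rule (letters, digits, "_" and "-" only)?
runparts_name_ok() {
  [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}

for name in boyt.bash boyt; do
  if runparts_name_ok "${name}"; then
    echo "${name}: would be considered"
  else
    echo "${name}: would be skipped"
  fi
done
```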

Comment #8 posted on 2015-03-03 04:59:39 by Ian

I have copied the script but when I try to run it it says:

toshy@toshy-Satellite-A300:~/Desktop$ ./boyt.sh
awk: line 0: regular expression compile failed (missing operand)
"|?

Comment #9 posted on 2015-03-05 09:29:31 by Ken Fallon

One thing I missed is that the logfile needs to exist the first time you run the script; if it doesn't, the script may produce errors.

@Ian I just tried it on another computer and it didn't complain. It could be that copying and pasting from the web page is causing problems. Try downloading it with wget

wget -O ./boyt.sh https://hackerpublicradio.org/eps/hpr1430-downloader.bash.txt

then running it

sh +x ./boyt.sh
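
The awk error quoted in comment #8 appears to come from the field separator "|? being read as a regular expression in which the ? has nothing to repeat; some awk implementations reject that while others accept it. A bracket expression avoids the ambiguity. The sketch below runs the sed and awk stages on a made-up embed line. The helper name, the video ID and the surrounding HTML are all invented for illustration.

```bash
#!/bin/bash
# Invented sample of the kind of line the script greps for
sample='<iframe src="https://www.youtube.com/embed/abc123XYZ_-?rel=0" width="560"></iframe>'

# Hypothetical helper: same sed stage as the script, but the awk
# field separator is the bracket expression ["?] instead of "|?
embed_to_watch_url() {
  sed 's#^.*www.youtube.com/embed/##' |
  awk -F '["?]' '{print "https://www.youtube.com/watch?v="$1}'
}

echo "${sample}" | embed_to_watch_url
```

Everything up to the first double quote or question mark after "embed/" is taken as the video ID.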
