Published 2014-07-05 · Updated 2014-07-06 · Python

A Python Script for Downloading NetEase Open Courses

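I'm hopeless at algorithms, so when I saw that 163's open courses include "Introduction to Algorithms", I put together a small script to download it.

It turns out that on OS X it is actually faster than Xunlei (Thunder)! A lot faster! Of course, that is entirely my own fault for being naive back then and signing up for broadband from the cable-TV operator (广电)… Also, Xunlei on OS X sometimes makes the machine run quite hot!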

Talk is cheap…

It uses the requests library.
It uses the wget download tool.

import requests
import re
import os
from subprocess import call


def urls_iter(origin_url='http://v.163.com/special/opencourse/algorithms.html', begin_from=1):
    """
    *origin_url* is the course page on 163 OpenCourse; it defaults to MIT's algorithms course.
    *begin_from* makes the function skip ahead and start from item No. *begin_from*.
    """
    download_request = requests.get(origin_url)

    course_list_pattern = re.compile(r'<table class="m-clist" id="list2".*?>(.*?)</table>', flags=re.DOTALL)
    course_pattern = re.compile(r'<tr class="(?:u-odd|u-even)">.*?</tr>', flags=re.DOTALL)
    course_text = course_list_pattern.search(download_request.text).group(1)
    course_list = course_pattern.findall(course_text)

    course_name_pattern = re.compile(r'<a href=.*?>(.*?)</a>', flags=re.DOTALL)
    course_video_pattern = re.compile(r'<a class="downbtn" href=[\'"](.*?)[\'"].*?>.*?</a>', flags=re.DOTALL)

    for index, course in enumerate(course_list, 1):
        if index < begin_from:
            continue

        index = '{:0>2}_'.format(index)
        video_title = index + course_name_pattern.search(course).group(1) + '.mp4'
        video_address = course_video_pattern.search(course).group(1)
        yield (video_title, video_address)


def download_course(download_list, download_dir='/Users/zealot/Downloads/algorithms'):
    """
    *download_list* is an iterable of tuples whose first element is the video's filename
        and whose second element is the video's download URL.
    *download_dir* is the directory where the files are stored.

    Uses the common download tool `wget` to fetch the videos one at a time.
    Raises OSError if `wget` is not found.
    """
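    # Open the null device for writing so that `which` can send its output there.
    with open(os.devnull, 'w') as black_hole:
        # `which` exits with a non-zero status when wget is not on PATH.
        if call(['which', 'wget'], stdout=black_hole):
            raise OSError('command not found: wget')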

    if not os.path.exists(download_dir):
        os.makedirs(download_dir)

    for video_title, video_address in download_list:
        call(['wget', '-c', video_address, '-O', os.path.join(download_dir, video_title)])

if __name__ == '__main__':
    download_course(urls_iter())

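All the links are found with regular expressions.
By default, the functions download the "Introduction to Algorithms" videos from http://v.163.com/special/opencourse/algorithms.html and save them to /Users/zealot/Downloads/algorithms.
wget -c supports resuming interrupted downloads, which is a godsend on a flaky cable-TV connection.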
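To see how the regular expressions pick out titles and download links, here is a minimal sketch that runs the same patterns against a small HTML fragment. The fragment is hypothetical: it only imitates the structure the script expects (a `m-clist` table with `u-odd`/`u-even` rows), and the real page markup may differ. The `extract` helper and the example.com URLs are likewise made up for illustration.

```python
import re

# Hypothetical fragment imitating the structure the script expects;
# the real 163 page markup may differ.
SAMPLE_HTML = '''
<table class="m-clist" id="list2" cellspacing="0">
<tr class="u-odd">
  <td><a href="http://v.163.com/movie/example1.html">Lecture One</a></td>
  <td><a class="downbtn" href="http://example.com/video1.mp4" id="d1"></a></td>
</tr>
<tr class="u-even">
  <td><a href="http://v.163.com/movie/example2.html">Lecture Two</a></td>
  <td><a class="downbtn" href='http://example.com/video2.mp4'></a></td>
</tr>
</table>
'''

# The same patterns as in the script.
course_list_pattern = re.compile(r'<table class="m-clist" id="list2".*?>(.*?)</table>', flags=re.DOTALL)
course_pattern = re.compile(r'<tr class="(?:u-odd|u-even)">.*?</tr>', flags=re.DOTALL)
course_name_pattern = re.compile(r'<a href=.*?>(.*?)</a>', flags=re.DOTALL)
course_video_pattern = re.compile(r'<a class="downbtn" href=[\'"](.*?)[\'"].*?>.*?</a>', flags=re.DOTALL)


def extract(html):
    """Return a list of (filename, url) tuples found in *html* (illustrative helper)."""
    table = course_list_pattern.search(html).group(1)
    results = []
    for index, row in enumerate(course_pattern.findall(table), 1):
        # Zero-padded index keeps the files sorted in lecture order.
        title = '{:0>2}_'.format(index) + course_name_pattern.search(row).group(1) + '.mp4'
        address = course_video_pattern.search(row).group(1)
        results.append((title, address))
    return results


for title, address in extract(SAMPLE_HTML):
    print(title, address)
```

Because everything hangs on the exact class names and attribute order, a change to the page layout would make `search` return `None` and the script fail with an `AttributeError`; an HTML parser would be more forgiving, but regular expressions keep the script short.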

Of course, if you would rather download "The Fourier Transform and Its Applications", change the final call to:

if __name__ == '__main__':
    download_course(urls_iter('http://v.163.com/special/opencourse/fouriertransforms.html'),
                    '/Users/zealot/downloads/fouriertransforms')
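Editing the final call for every course gets tedious, so one option is to read the URL and target directory from the command line. The following is only a sketch of that idea using the standard `argparse` module; the `parse_args` helper and the option names (`-d/--dir`, `-b/--begin-from`) are my own assumptions and not part of the original script.

```python
import argparse

# Defaults copied from the script above.
DEFAULT_URL = 'http://v.163.com/special/opencourse/algorithms.html'
DEFAULT_DIR = '/Users/zealot/Downloads/algorithms'


def parse_args(argv=None):
    """Parse the course URL, target directory and starting index (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='Download a NetEase open course.')
    parser.add_argument('url', nargs='?', default=DEFAULT_URL, help='course page URL')
    parser.add_argument('-d', '--dir', default=DEFAULT_DIR, help='directory to save videos in')
    parser.add_argument('-b', '--begin-from', type=int, default=1,
                        help='number of the first lecture to download')
    return parser.parse_args(argv)


args = parse_args(['http://v.163.com/special/opencourse/fouriertransforms.html',
                   '-d', '/Users/zealot/downloads/fouriertransforms'])
print(args.url, args.dir, args.begin_from)

# With the script's functions in scope, the final call would then become:
# download_course(urls_iter(args.url, args.begin_from), args.dir)
```

Calling `parse_args()` with no argument reads `sys.argv`, so running the script without any options would still fall back to the algorithms course.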

It works nicely~

-EOF-
