1. How resumable downloads work:

    An HTTP request can carry a Range header, which specifies the requested range in bytes so that only the byte stream within that range is downloaded. In other words, the file to be downloaded can be split into many blocks and each block fetched separately; if the download fails partway, the progress of each block is recorded, and the next run simply continues from where it left off.

    req.headers['Range'] = 'bytes=%s-%s' % (block[1], block[2])
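    A minimal sketch of the bookkeeping this implies (not the original code: the [start, current, end] block layout, the block_size parameter and the state-file handling are assumptions): the file is cut into fixed-size blocks, and a restart reloads the saved progress and only requests the bytes that are still missing.

    import json
    import os

    def split_blocks(size, block_size):
        # each block is [start, current, end] (end inclusive); current advances as bytes arrive
        return [[i, i, min(i + block_size, size) - 1]
                for i in range(0, size, block_size)]

    def load_blocks(infopath, size, block_size):
        # resume: reuse the saved progress if a previous run left a state file behind
        if os.path.exists(infopath):
            with open(infopath) as f:
                return json.load(f)
        return split_blocks(size, block_size)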

     

    2. The core download code:

    # url, block, fobj, buffer_size and lock are provided by the enclosing worker
    req = request.Request(url)
    # ask the server for just this block's remaining bytes
    req.headers['Range'] = 'bytes=%s-%s' % (block[1], block[2])
    res = request.urlopen(req)
    while 1:
        chunk = res.read(buffer_size)
        if not chunk:
            break
        with lock:
            # write the chunk at the block's current offset and advance the progress marker
            fobj.seek(block[1])
            fobj.write(chunk)
            block[1] += len(chunk)
            fobj.flush()

     

    We urlopen the URL to get a response object, read() the file content from it in chunks, and write() each chunk into the locally downloaded file at the block's current offset.
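    The loop above presumably lives inside the per-block worker that the thread pool calls later on. A minimal sketch of that wrapper, assuming a module-level lock and a single (url, block, fobj, buffer_size) tuple argument, neither of which is shown in the original excerpt:

    import threading
    from urllib import request

    lock = threading.Lock()  # serializes seek()/write() on the shared file object

    def _worker(arg):
        url, block, fobj, buffer_size = arg  # block is assumed to be [start, current, end]
        req = request.Request(url)
        req.headers['Range'] = 'bytes=%s-%s' % (block[1], block[2])
        res = request.urlopen(req)
        while 1:
            chunk = res.read(buffer_size)
            if not chunk:
                break
            with lock:
                fobj.seek(block[1])
                fobj.write(chunk)
                block[1] += len(chunk)
                fobj.flush()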

     

    3. Parsing the URL

    def get_file_info(url):
        # plain GET to read the response headers: file size, last-modified time
        # and (if provided) the server-suggested file name
        res = request.urlopen(url)
        headers = dict(res.headers)
        size = int(headers.get('Content-Length', 0))
        lastmodified = headers.get('Last-Modified', '')  # dict keys are case-sensitive
        name = None
        if 'Content-Disposition' in headers:
            # prefer the name the server suggests, stripping surrounding quotes
            name = headers['Content-Disposition'].split('filename=')[1]
            if name[0] == '"' or name[0] == "'":
                name = name[1:-1]
        else:
            # fall back to the last path component of the URL
            name = url.split('/')[-1]
        print(name)
        return FileInfo(url, name, size, lastmodified)

    The urllib module is used to open the URL and collect the information we need about the file behind it.
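    FileInfo itself does not appear in the excerpt; a minimal sketch, assuming it is just a plain container for the four fields returned above:

    from collections import namedtuple

    # assumed container for the metadata gathered by get_file_info()
    FileInfo = namedtuple('FileInfo', ['url', 'name', 'size', 'lastmodified'])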

     

    4. Multi-threaded downloading with a thread pool:

    # start monitor
    threading.Thread(target=_monitor, args=(infopath, file_info, blocks)).start()
    #pool = multiprocessing.Pool(processes = 3)

    # start downloading
    with open(workpath, 'wb') as fobj:
        # only schedule blocks that still have bytes left to fetch
        args = [(url, blocks[i], fobj, buffer_size)
                for i in range(len(blocks)) if blocks[i][1] < blocks[i][2]]
        # never start more threads than there are remaining blocks
        if thread_count > len(args):
            thread_count = len(args)
        pool = ThreadPool(thread_count)
        pool.map(_worker, args)
        pool.close()
        pool.join()

    The URL information obtained earlier, the block list, the file object of the download target, and the buffer size are bundled into arguments for the worker function, and a thread pool of a fixed size is created. If the pool has, say, 10 threads and there are 20 blocks to download, the pool fills up first, and each time a thread finishes a block it picks up the next one, until all 20 blocks are done.
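    The _monitor thread started at the top of the block is not shown in the excerpt. A rough sketch of what it might do, judging from the percentage lines in the sample run below (the once-per-second interval and the JSON state file are assumptions):

    import json
    import time

    def _monitor(infopath, file_info, blocks):
        # every second, persist the block state so an interrupted run can resume,
        # and print the overall progress until every block is finished
        while True:
            with open(infopath, 'w') as f:
                json.dump(blocks, f)
            done = sum(b[1] - b[0] for b in blocks)
            print(done * 100.0 / file_info.size)
            if all(b[1] >= b[2] for b in blocks):
                break
            time.sleep(1)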

     

    5. Setting up command-line arguments

    import argparse

    print("hello")  # leftover debug print; it shows up in the sample run below

    parser = argparse.ArgumentParser(description='Download file by multi-threads.')
    parser.add_argument('url', type=str, help='url of the download file')
    parser.add_argument('-o', type=str, default=None, dest="output",
                        help='output file')
    parser.add_argument('-t', type=int, default=defaults['thread_count'],
                        dest="thread_count", help='thread counts to downloading')
    parser.add_argument('-b', type=int, default=defaults['buffer_size'],
                        dest="buffer_size", help='buffer size')
    parser.add_argument('-s', type=int, default=defaults['block_size'],
                        dest="block_size", help='block size')
    argv = sys.argv[1:]

    The argparse module is used here: as the example shows, command-line options such as the URL, the thread count, and the block size can be supplied when the script is run.
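    For completeness, the parsed values would then be handed to the downloader roughly like this (a sketch only: the download() call and its signature are assumptions, since the excerpt stops at argv):

    args = parser.parse_args(argv)  # parser and argv come from the block above

    # hypothetical hand-off to the downloader; the real entry point is not shown here
    # download(args.url, output=args.output, thread_count=args.thread_count,
    #          buffer_size=args.buffer_size, block_size=args.block_size)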

    6. The end result:

    python download.py http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz -t 20

     

     

    hello

    mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz

    mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz.ing

    Downloading  http://dev.mysql.com/get/Downloads/MySQL-5.6/mysql-5.6.33-linux-glibc2.5-x86_64.tar.gz

    0.0

    0.0

    1.4988513123524485

    2.6066979345259975

    3.812295729244271

    5.148228420688845

    99.93483255163684

    99.96741627581842

    times: 180s

     

    The code is a Python 3 adaptation of a Python 2 resumable-download script found online.

     

     



