Python: Scraping Seat Codes from "我去图书馆"

Background

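Once upon a time, getting into the library meant swiping your campus card at a terminal, so you had to show up in person. Of course, some people swiped cards on behalf of others, which was never really right. I hadn't been to the library in a long time, and these days it uses the WeChat official account "我去图书馆": you reserve a seat there, then scan a code at a library terminal within the allotted time after reserving. You can also reserve for the next day, which spares a lot of people from getting up early to queue. Yet this seemingly great feature has attracted seat-grabbing programs, and next-day reservations in particular are snapped up instantly. It's a bit disgusting, but as the saying goes, technology itself is innocent; the "guilt" lies with the people who use it. As it happens, a classmate introduced me to another classmate who had gotten hold of a seat-grabbing program. It is written in Python, but it was handed down from a university in Nanjing and doesn't work for WHUT, so I took a careful look at it. The first problem to solve is obtaining the WeChat sessionID, which can already be done by packet capture; see https://fanjiajia.cn/2018/11/21/Mac%E4%B8%8B%E4%BD%BF%E7%94%A8Charles%E6%8A%93%E5%8C%85Android/
The next step is to obtain the library's seat codes, that is, to work out how the seats are numbered.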

Scraping the Seat Page

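First, of course, we need the HTML of the seat page. The same packet-capture tool gives us the URL, but the link can't be opened directly in a browser: the browser says to open it in the WeChat client. If the crawler called requests directly, the headers it sends would have to identify the same client that WeChat uses. So what to do? No rush: in the capture tool, grab the response Text directly, which is the returned HTML; copy it out, save it locally, and then read and parse the local file.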

  • An excerpt of the seat-layout part of the HTML file:
    <div class="grid_cell grid_1" data-key="5,7" style="left:280px;top:210px;">
    <em>3</em>
    </div>
    <div class="grid_cell grid_active" data-key="5,8" style="left:315px;top:210px;">
    <em>2</em>
    </div>
    <div class="grid_cell grid_8" data-key="5,11" style="left:420px;top:210px;">
    <em></em>
    </div>
    <div class="grid_cell grid_8" data-key="5,12" style="left:455px;top:210px;">
    <em></em>
    </div>
    <div class="grid_cell grid_8" data-key="5,13" style="left:490px;top:210px;">
    <em></em>
    </div>
    <div class="grid_cell grid_8" data-key="5,14" style="left:525px;top:210px;">
    <em></em>
    </div>
    <div class="grid_cell grid_active" data-key="5,17" style="left:630px;top:210px;">
    <em>29</em>
    </div>
  • Looking at the code above, each div is one position, and the seat number is the text of its em. What we mainly need is the position's data-key.
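As a side note, the seven sample cells suggest that data-key is a "row,column" grid coordinate: in every cell shown, left equals (column + 1) × 35 px and top equals (row + 1) × 35 px. The sketch below only checks that observation against the excerpt; the 35 px step and the helper names are inferred for illustration and are not documented behavior of the page.

```python
import re

# (data-key, style) pairs copied from the sample cells above
CELLS = [
    ("5,7", "left:280px;top:210px;"),
    ("5,8", "left:315px;top:210px;"),
    ("5,11", "left:420px;top:210px;"),
    ("5,12", "left:455px;top:210px;"),
    ("5,13", "left:490px;top:210px;"),
    ("5,14", "left:525px;top:210px;"),
    ("5,17", "left:630px;top:210px;"),
]

CELL_SIZE = 35  # pixels per grid step; inferred from the sample only


def grid_position(data_key):
    """Split a data-key such as '5,17' into (row, column) integers."""
    row, col = (int(part) for part in data_key.split(","))
    return row, col


def pixel_offset(style):
    """Extract (left, top) in pixels from an inline style string."""
    left = int(re.search(r"left:(\d+)px", style).group(1))
    top = int(re.search(r"top:(\d+)px", style).group(1))
    return left, top


def matches_grid(data_key, style):
    """True if the pixel offset agrees with the assumed grid formula."""
    row, col = grid_position(data_key)
    return pixel_offset(style) == ((col + 1) * CELL_SIZE, (row + 1) * CELL_SIZE)


for key, style in CELLS:
    print(key, matches_grid(key, style))
```

If the relation holds for a whole page, the gaps in the column numbers (for example 8 to 11) would correspond to aisles or other non-seat cells.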

Parsing the Page

The code above makes the task clear: for every em tag whose text is not empty, get the data-key attribute of its parent tag.
  • Step 1: import the required library:
    from bs4 import BeautifulSoup

BeautifulSoup is all we need for parsing (the 'lxml' parser used below also requires the lxml package to be installed).

  • Step 2: define a function that gets the seats of one room (one HTML file):
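    def getOneRoomSeats(url):
        # `url` is the path of a locally saved HTML file, not a web address
        seats = {}
        with open(url, 'r') as wb_data:
            soup = BeautifulSoup(wb_data, 'lxml')
            # print(soup.prettify())

        # Walk every <em> tag; only those with text are real seats
        for tag in soup.find_all('em'):
            if tag.string is not None:
                # seat number -> data-key of the enclosing <div>
                seats[tag.string] = tag.parent.get('data-key')
        return seats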

A dictionary is used for storage; for example, {'29': '5,17'} describes seat 29. In the end, all the seats of a room are turned into a dictionary of this form.
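For readers who don't have bs4 or lxml installed, the same extraction can be sketched with the standard library's html.parser. This is not the approach used above, only an equivalent under the assumption that every seat is a div carrying a data-key with a single em inside it; SeatParser and parse_seats are hypothetical names, and SAMPLE is a shortened copy of the excerpt shown earlier.

```python
from html.parser import HTMLParser

# Three cells from the excerpt above: two seats and one empty cell
SAMPLE = """
<div class="grid_cell grid_1" data-key="5,7" style="left:280px;top:210px;">
<em>3</em>
</div>
<div class="grid_cell grid_8" data-key="5,11" style="left:420px;top:210px;">
<em></em>
</div>
<div class="grid_cell grid_active" data-key="5,17" style="left:630px;top:210px;">
<em>29</em>
</div>
"""


class SeatParser(HTMLParser):
    """Collects {seat number: data-key} from grid cells (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.seats = {}
        self._current_key = None  # data-key of the most recent <div>
        self._in_em = False       # True while inside an <em> element

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._current_key = dict(attrs).get("data-key")
        elif tag == "em":
            self._in_em = True

    def handle_endtag(self, tag):
        if tag == "em":
            self._in_em = False

    def handle_data(self, data):
        text = data.strip()
        # Only non-empty <em> text counts as a seat number
        if self._in_em and text and self._current_key is not None:
            self.seats[text] = self._current_key


def parse_seats(html):
    """Parse an HTML string and return its seat dictionary."""
    parser = SeatParser()
    parser.feed(html)
    parser.close()
    return parser.seats


print(parse_seats(SAMPLE))  # {'3': '5,7', '29': '5,17'}
```

The empty cell with data-key "5,11" is skipped, which matches the rule of keeping only em tags whose text is not empty.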

  • Step 3: get the seat information for all rooms:
    def getAllRoomSeats(urls):
        AllLibSeats = {}
        # Rooms are keyed 1, 2, 3, ... in the order their files appear in urls
        for a, url in enumerate(urls, start=1):
            AllLibSeats[a] = getOneRoomSeats(url)
        return AllLibSeats

urls is a list of the paths of all the locally saved files captured above. Finally, a single dictionary stores the seat information of every room.
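To make the shape of that result concrete, here is a minimal sketch of querying the nested dictionary. The excerpt uses a few values from the R1 and R2 tables listed at the end of this post, assuming the files were passed in that order; lookup_data_key is a hypothetical helper, not part of the original program.

```python
import json

# A small excerpt of the nested structure: {room index: {seat number: data-key}}
all_seats = {
    1: {'3': '5,7', '2': '5,8', '29': '5,17'},
    2: {'32': '5,11', '33': '5,12'},
}


def lookup_data_key(all_seats, room, seat_number):
    """Return the data-key for a seat, or None if the room or seat is unknown."""
    return all_seats.get(room, {}).get(str(seat_number))


print(lookup_data_key(all_seats, 1, 29))  # 5,17
print(lookup_data_key(all_seats, 2, 99))  # None

# If the result is saved as JSON, note that object keys are always strings,
# so the integer room indexes come back as '1', '2'
restored = json.loads(json.dumps(all_seats))
print(sorted(restored))  # ['1', '2']
```

Seat numbers are stored as strings because they come from the text of the em tags, which is why the helper converts seat_number with str() before the lookup.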

Finally

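  1. That's it: all the seat information has been collected. Here is the seat information for our school WHUT's library in "我去图书馆", although the seat-grabbing program I looked at no longer seems to work, because the server returns "reservation successful" for a URL with any ending.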

    R1_SEATTABLE={'3': '5,7', '2': '5,8', '29': '5,17', '30': '5,18', '4': '7,7', '1': '7,8', '31': '7,17', '32': '7,18', '5': '9,7', '6': '9,8', '7': '11,7', '8': '11,8', '10': '13,7', '9': '13,8', '33': '13,17', '34': '13,18', '11': '15,7', '12': '15,8', '35': '15,17', '36': '15,18', '13': '17,7', '14': '17,8', '37': '17,17', '38': '17,18', '15': '19,7', '16': '19,8', '39': '19,17', '40': '19,18', '17': '21,7', '18': '21,8', '41': '21,17', '42': '21,18', '19': '23,7', '20': '23,8', '43': '23,17', '44': '23,18', '21': '25,7', '22': '25,8', '23': '27,7', '24': '27,8', '25': '29,7', '26': '29,8', '45': '29,17', '46': '29,18', '27': '31,7', '28': '31,8', '47': '31,17', '48': '31,18'}
    # 2nd floor (electronic reading room)
    R2_SEATTABLE={'32': '5,11', '33': '5,12', '26': '5,17', '27': '5,18', '31': '6,10', '34': '6,13', '25': '6,16', '28': '6,19', '36': '7,11', '35': '7,12', '30': '7,17', '29': '7,18', '38': '9,11', '39': '9,12', '20': '9,17', '21': '9,18', '37': '10,10', '40': '10,13', '19': '10,16', '22': '10,19', '42': '11,11', '41': '11,12', '24': '11,17', '23': '11,18', '44': '13,11', '45': '13,12', '14': '13,17', '15': '13,18', '43': '14,10', '46': '14,13', '13': '14,16', '16': '14,19', '48': '15,11', '47': '15,12', '18': '15,17', '17': '15,18', '50': '17,11', '51': '17,12', '8': '17,17', '9': '17,18', '49': '18,10', '52': '18,13', '7': '18,16', '10': '18,19', '54': '19,11', '53': '19,12', '12': '19,17', '11': '19,18', '56': '21,11', '57': '21,12', '2': '21,17', '3': '21,18', '55': '22,10', '58': '22,13', '1': '22,16', '4': '22,19', '60': '23,11', '59': '23,12', '6': '23,17', '5': '23,18'}
    # 3rd floor
    R3_SEATTABLE={'1': '5,8', '2': '5,9', '5': '5,11', '6': '5,12', '9': '5,14', '10': '5,15', '3': '7,8', '4': '7,9', '7': '7,11', '8': '7,12', '11': '7,14', '12': '7,15', '13': '9,8', '14': '9,9', '17': '9,11', '18': '9,12', '21': '9,14', '22': '9,15', '15': '11,8', '16': '11,9', '19': '11,11', '20': '11,12', '23': '11,14', '24': '11,15', '25': '13,8', '26': '13,9', '29': '13,11', '30': '13,12', '33': '13,14', '34': '13,15', '27': '15,8', '28': '15,9', '31': '15,11', '32': '15,12', '35': '15,14', '36': '15,15', '37': '17,8', '38': '17,9', '41': '17,11', '42': '17,12', '45': '17,14', '46': '17,15', '39': '19,8', '40': '19,9', '43': '19,11', '44': '19,12', '47': '19,14', '48': '19,15', '49': '21,8', '50': '21,9', '53': '21,11', '54': '21,12', '57': '21,14', '58': '21,15', '51': '23,8', '52': '23,9', '55': '23,11', '56': '23,12', '59': '23,14', '60': '23,15', '61': '25,8', '62': '25,9', '65': '25,11', '66': '25,12', '69': '25,14', '70': '25,15', '63': '27,8', '64': '27,9', '67': '27,11', '68': '27,12', '71': '27,14', '72': '27,15', '73': '29,8', '74': '29,9', '77': '29,11', '78': '29,12', '81': '29,14', '82': '29,15', '75': '31,8', '76': '31,9', '79': '31,11', '80': '31,12', '83': '31,14', '84': '31,15', '85': '33,8', '86': '33,9', '89': '33,11', '90': '33,12', '93': '33,14', '94': '33,15', '87': '35,8', '88': '35,9', '91': '35,11', '92': '35,12', '95': '35,14', '96': '35,15', '97': '37,8', '98': '37,9', '101': '37,11', '102': '37,12', '105': '37,14', '106': '37,15', '99': '39,8', '100': '39,9', '103': '39,11', '104': '39,12', '107': '39,14', '108': '39,15', '109': '41,8', '110': '41,9', '113': '41,11', '114': '41,12', '117': '41,14', '118': '41,15', '111': '43,8', '112': '43,9', '115': '43,11', '116': '43,12', '119': '43,14', '120': '43,15'}
    # 4th floor
    R4_SEATTABLE={'1': '5,8', '2': '5,9', '5': '5,12', '6': '5,13', '9': '5,16', '10': '5,17', '3': '7,8', '4': '7,9', '7': '7,12', '8': '7,13', '11': '7,16', '12': '7,17', '13': '9,8', '14': '9,9', '17': '9,12', '18': '9,13', '21': '9,16', '22': '9,17', '15': '11,8', '16': '11,9', '19': '11,12', '20': '11,13', '23': '11,16', '24': '11,17', '25': '13,8', '26': '13,9', '29': '13,12', '30': '13,13', '33': '13,16', '34': '13,17', '27': '15,8', '28': '15,9', '31': '15,12', '32': '15,13', '35': '15,16', '36': '15,17', '37': '17,8', '38': '17,9', '41': '17,12', '42': '17,13', '45': '17,16', '46': '17,17', '39': '19,8', '40': '19,9', '43': '19,12', '44': '19,13', '47': '19,16', '48': '19,17', '49': '21,8', '50': '21,9', '53': '21,12', '54': '21,13', '57': '21,16', '58': '21,17', '51': '23,8', '52': '23,9', '55': '23,12', '56': '23,13', '59': '23,16', '60': '23,17', '61': '25,8', '62': '25,9', '65': '25,12', '66': '25,13', '69': '25,16', '70': '25,17', '63': '27,8', '64': '27,9', '67': '27,12', '68': '27,13', '71': '27,16', '72': '27,17', '73': '29,8', '74': '29,9', '77': '29,12', '78': '29,13', '81': '29,16', '82': '29,17', '75': '31,8', '76': '31,9', '79': '31,12', '80': '31,13', '83': '31,16', '84': '31,17', '85': '33,8', '86': '33,9', '89': '33,12', '90': '33,13', '93': '33,16', '94': '33,17', '87': '35,8', '88': '35,9', '91': '35,12', '92': '35,13', '95': '35,16', '96': '35,17', '97': '37,8', '98': '37,9', '101': '37,12', '102': '37,13', '105': '37,16', '106': '37,17', '99': '39,8', '100': '39,9', '103': '39,12', '104': '39,13', '107': '39,16', '108': '39,17', '109': '41,8', '110': '41,9', '113': '41,12', '114': '41,13', '117': '41,16', '118': '41,17', '111': '43,8', '112': '43,9', '115': '43,12', '116': '43,13', '119': '43,16', '120': '43,17'}
    # 5th floor
    R5_SEATTABLE={'1': '5,7', '2': '5,8', '5': '5,9', '6': '5,10', '9': '5,12', '10': '5,13', '3': '7,7', '4': '7,8', '7': '7,9', '8': '7,10', '11': '7,12', '12': '7,13', '13': '9,7', '14': '9,8', '17': '9,9', '18': '9,10', '21': '9,12', '22': '9,13', '15': '11,7', '16': '11,8', '19': '11,9', '20': '11,10', '23': '11,12', '24': '11,13', '25': '13,7', '26': '13,8', '29': '13,9', '30': '13,10', '33': '13,12', '34': '13,13', '27': '15,7', '28': '15,8', '31': '15,9', '32': '15,10', '35': '15,12', '36': '15,13', '37': '17,7', '38': '17,8', '41': '17,9', '42': '17,10', '45': '17,12', '46': '17,13', '39': '19,7', '40': '19,8', '43': '19,9', '44': '19,10', '47': '19,12', '48': '19,13', '49': '21,7', '50': '21,8', '53': '21,9', '54': '21,10', '57': '21,12', '58': '21,13', '51': '23,7', '52': '23,8', '55': '23,9', '56': '23,10', '59': '23,12', '60': '23,13', '61': '25,7', '62': '25,8', '65': '25,9', '66': '25,10', '69': '25,12', '70': '25,13', '63': '27,7', '64': '27,8', '67': '27,9', '68': '27,10', '71': '27,12', '72': '27,13', '73': '29,7', '74': '29,8', '77': '29,9', '78': '29,10', '81': '29,12', '82': '29,13', '75': '31,7', '76': '31,8', '79': '31,9', '80': '31,10', '83': '31,12', '84': '31,13', '85': '33,7', '86': '33,8', '89': '33,9', '90': '33,10', '93': '33,12', '94': '33,13', '87': '35,7', '88': '35,8', '91': '35,9', '92': '35,10', '95': '35,12', '96': '35,13', '97': '37,7', '98': '37,8', '101': '37,9', '102': '37,10', '105': '37,12', '106': '37,13', '99': '39,7', '100': '39,8', '103': '39,9', '104': '39,10', '107': '39,12', '108': '39,13', '109': '41,7', '110': '41,8', '113': '41,9', '114': '41,10', '117': '41,12', '118': '41,13', '111': '43,7', '112': '43,8', '115': '43,9', '116': '43,10', '119': '43,12', '120': '43,13'}
    # 6th floor
    R6_SEATTABLE={'1': '5,7', '2': '5,8', '5': '5,9', '6': '5,10', '9': '5,12', '10': '5,13', '3': '7,7', '4': '7,8', '7': '7,9', '8': '7,10', '11': '7,12', '12': '7,13', '13': '9,7', '14': '9,8', '17': '9,9', '18': '9,10', '21': '9,12', '22': '9,13', '15': '11,7', '16': '11,8', '19': '11,9', '20': '11,10', '23': '11,12', '24': '11,13', '25': '13,7', '26': '13,8', '29': '13,9', '30': '13,10', '33': '13,12', '34': '13,13', '27': '15,7', '28': '15,8', '31': '15,9', '32': '15,10', '35': '15,12', '36': '15,13', '37': '17,7', '38': '17,8', '41': '17,9', '42': '17,10', '45': '17,12', '46': '17,13', '39': '19,7', '40': '19,8', '43': '19,9', '44': '19,10', '47': '19,12', '48': '19,13', '49': '21,7', '50': '21,8', '53': '21,9', '54': '21,10', '57': '21,12', '58': '21,13', '51': '23,7', '52': '23,8', '55': '23,9', '56': '23,10', '59': '23,12', '60': '23,13', '61': '25,7', '62': '25,8', '65': '25,9', '66': '25,10', '69': '25,12', '70': '25,13', '63': '27,7', '64': '27,8', '67': '27,9', '68': '27,10', '71': '27,12', '72': '27,13', '73': '29,7', '74': '29,8', '77': '29,9', '78': '29,10', '81': '29,12', '82': '29,13', '75': '31,7', '76': '31,8', '79': '31,9', '80': '31,10', '83': '31,12', '84': '31,13', '85': '33,7', '86': '33,8', '89': '33,9', '90': '33,10', '93': '33,12', '94': '33,13', '87': '35,7', '88': '35,8', '91': '35,9', '92': '35,10', '95': '35,12', '96': '35,13', '97': '37,7', '98': '37,8', '101': '37,9', '102': '37,10', '105': '37,12', '106': '37,13', '99': '39,7', '100': '39,8', '103': '39,9', '104': '39,10', '107': '39,12', '108': '39,13', '109': '41,7', '110': '41,8', '113': '41,9', '114': '41,10', '117': '41,12', '118': '41,13', '111': '43,7', '112': '43,8', '115': '43,9', '116': '43,10', '119': '43,12', '120': '43,13'}
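  2. I still hope software like this can be designed better. After all, it is a commercial product, yet its source was easy to obtain and a lot of information is unencrypted; that is overconfident.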

  3. Finally, I hope everyone makes good use of the library's resources: don't hog a seat you aren't going to use.

Respectfully yours
