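Title: [Original code] Downloading 科学文库 (Science Library) books with PowerShell
Author: went Time: 2023-9-28 21:32 Title: Downloading 科学文库 books with PowerShell
Downloads 科学文库 books as PNG images, which you can convert to PDF yourself; this gets around the online-only reading restriction.
科学文库 official site: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
Book link format: https://book.sciencereading.cn/s ... E053020B0A0A1666000
If this infringes anyone's rights, please let me know and I will remove it. Be careful about sharing it further.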
An updated version that sorts the pages into folders by table of contents is in post #7.

    Clear-Host
    <#
    科学文库 official site: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
    Book link format: https://book.sciencereading.cn/shop/book/Booksimple/show.do?id=B970CE8AEE3531D1DE053020B0A0A1666000
    #>
    # Book link
    $book_url = Read-Host -Prompt 'Enter the book link'
    # ParsedHtml is only available in Windows PowerShell 5.1, not in PowerShell 6+
    $resp = Invoke-WebRequest -Uri $book_url
    # Book title
    $book_name = $null
    $title_node = $resp.ParsedHtml.querySelector('.book_detail_title')
    if($null -ne $title_node){
        $book_name = $title_node.innerText.Trim()
    }
    if([string]::IsNullOrEmpty($book_name)){
        Write-Host 'Failed to parse the book title' -ForegroundColor Red
        pause
        exit
    }
    # Replace characters that are not allowed in Windows file names
    $book_name = $book_name -replace '[<>?*:|/\\"]',' '
    # Book id (the last "="-separated part of the link)
    $book_id = $book_url -split '=' | Select-Object -Last 1
    # Server IP and port
    $server_ip = '159.226.241.32'
    $server_port = 81
    # Default user id
    $default_user = '825ae171eb514934b1ed2374976f4a9f'
    # Document number
    $doc_num_api = 'https://wkobwp.sciencereading.cn/api/file/add'
    $resp = Invoke-WebRequest -UseBasicParsing -Method Post -Uri $doc_num_api -Headers @{
        'accessToken' = 'accessToken'
        'Content-Type' = 'application/x-www-form-urlencoded; charset=UTF-8'
    } -Body (
        'params=%7B%22params%22%3A%7B%22userName%22%3A%22Guest%22%2C%22userId%22%3A%22{0}%22%2C%22file%22%3A%22http%3A%2F%2F{1}%3A{2}%2F{3}.pdf%22%7D%7D&type=http' -f $default_user,$server_ip,$server_port,$book_id
    )
    $book_number = ($resp.Content -split '"')[3]
    # Get the total number of page images; retry until the manifest reports a page count
    $img_count = $null
    while($null -eq $img_count){
        $book_number
        $resp = Invoke-WebRequest -UseBasicParsing -Uri ('https://wkobwp.sciencereading.cn/asserts/{0}/manifest?language=zh-CN' -f $book_number)
        $json = [System.Text.UTF8Encoding]::UTF8.GetString($resp.Content) | ConvertFrom-Json
        $json = $json.docinfo | ConvertFrom-Json
        $img_count = $json.PageCount
        if($null -eq $img_count){ Start-Sleep -Seconds 1 }
    }
    # Download every page image
    [void][System.IO.Directory]::CreateDirectory($book_name)
    $img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true'
    0..($img_count-1) | ForEach-Object {
        # Keep the page index in its own variable; inside catch, $_ is the error record
        $page = $_
        $url = $img_url -f $book_number,$page
        $png = '.\{0}\{0}-{1:000}.png' -f $book_name,$page
        while($true){
            Write-Host ('{0}/{1}' -f $page,$img_count) -ForegroundColor Yellow
            try{
                $resp = Invoke-WebRequest -UseBasicParsing -Uri $url
                [System.IO.File]::WriteAllBytes($png,$resp.Content)
                [System.IO.Path]::GetFileName($png)
                break
            } catch {
                # Wait a second, then retry the same page
                Start-Sleep -Seconds 1
            }
        }
    }
    Write-Host 'All downloads complete' -ForegroundColor Green
    pause
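The script cleans the book title with a hand-written list of forbidden characters. As a rough alternative, the sketch below asks .NET for the characters that are invalid in file names on the current platform. `ConvertTo-SafeFileName` is a hypothetical helper name, not something from the original post, and the result differs between Windows and other systems because the invalid-character set differs.

```powershell
# Hypothetical helper (not part of the original script): replace every
# character that .NET reports as invalid in a file name with a space.
function ConvertTo-SafeFileName {
    param([Parameter(Mandatory)][string]$Name)
    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    $chars = foreach ($c in $Name.ToCharArray()) {
        if ($invalid -contains $c) { ' ' } else { $c }
    }
    # Join the characters back into one string and trim the ends
    return (-join $chars).Trim()
}

# Made-up title for illustration
ConvertTo-SafeFileName -Name 'Algebra: Vol. 1/2?'
```

The advantage over a fixed regex is that control characters and any other platform-specific invalid characters are covered without listing them by hand.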
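$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true';
Increase the 100 in this URL for a sharper image. I have heard from others that there are nine levels.
The nine resolution levels are 50, 75, 100, 125, 150, 200, 400, 800 and 1000.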
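If the level is going to be a setting rather than a number typed into the URL, one possible way to keep it within the reported values is sketched below. The nine levels are taken from the post as hearsay and are not verified; `Get-NearestZoomLevel` is a hypothetical helper name, and the `{2}` placeholder is an assumed extension of the original template.

```powershell
# The nine levels as reported in the post (hearsay, not verified).
$zoom_levels = 50,75,100,125,150,200,400,800,1000

# Hypothetical helper: return the listed level closest to the requested one.
# On a tie, the lower level wins because it is encountered first.
function Get-NearestZoomLevel {
    param([int]$Requested, [int[]]$Levels = $zoom_levels)
    $best = $Levels[0]
    foreach ($l in $Levels) {
        if ([math]::Abs($l - $Requested) -lt [math]::Abs($best - $Requested)) {
            $best = $l
        }
    }
    return $best
}

$level = Get-NearestZoomLevel -Requested 180

# Same template as in the script, with the level as a third placeholder ({2})
$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/{2}?accessToken=accessToken&formMode=true'
```

With this template the format call takes three arguments, for example `$img_url -f $book_number,$page,$level`.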
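Author: yyz219 Time: 2023-9-28 22:07
Thanks for sharing.
Author: hlzj88 Time: 2023-9-28 22:27
Thanks for sharing. How many GB would it come to if you downloaded everything?
Author: pd1 Time: 2023-9-28 23:09
$img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true';
Increase the 100 in this URL for a sharper image. I have heard from others that there are nine levels.
The nine resolution levels are 50, 75, 100, 125, 150, 200, 400, 800 and 1000.
Author: went Time: 2023-9-28 23:20
Reply to 3# hlzj88
Just download what you actually need; scraping like this is not really ethical to begin with.
Author: went Time: 2023-9-28 23:21
Reply to 4# pd1
Thanks for the tip; I have added it to the top post.
Author: went Time: 2023-9-29 00:50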
Here is an updated version that sorts the pages into folders by table of contents.

    Clear-Host
    <#
    科学文库 official site: https://book.sciencereading.cn/shop/main/Login/shopFrame.do
    Book link format: https://book.sciencereading.cn/shop/book/Booksimple/show.do?id=B970CE8AEE3531D1DE053020B0A0A1666000
    #>
    # Book link
    $book_url = Read-Host -Prompt 'Enter the book link'
    # ParsedHtml is only available in Windows PowerShell 5.1, not in PowerShell 6+
    $resp = Invoke-WebRequest -Uri $book_url
    # Book title
    $book_name = $null
    $title_node = $resp.ParsedHtml.querySelector('.book_detail_title')
    if($null -ne $title_node){
        $book_name = $title_node.innerText.Trim()
    }
    if([string]::IsNullOrEmpty($book_name)){
        Write-Host 'Failed to parse the book title' -ForegroundColor Red
        pause
        exit
    }
    # Replace characters that are not allowed in Windows file names
    $book_name = $book_name -replace '[<>?*:|/\\"]',' '

    # Table of contents: the page embeds it as a JSON array assigned to "var zNodes"
    $menus = $null
    if($resp.Content -match '(?s)var zNodes=(\[{.*}\]);'){
        $menus = $Matches[1] | ConvertFrom-Json
        # Sort the entries by start page, which is the last "="-separated part of each url
        $menus = $menus | Sort-Object { [int](($_.url -split '=')[-1]) }
    }

    # Book id (the last "="-separated part of the link)
    $book_id = $book_url -split '=' | Select-Object -Last 1
    # Server IP and port
    $server_ip = '159.226.241.32'
    $server_port = 81
    # Default user id
    $default_user = '825ae171eb514934b1ed2374976f4a9f'
    # Document number
    $doc_num_api = 'https://wkobwp.sciencereading.cn/api/file/add'
    $resp = Invoke-WebRequest -UseBasicParsing -Method Post -Uri $doc_num_api -Headers @{
        'accessToken' = 'accessToken'
        'Content-Type' = 'application/x-www-form-urlencoded; charset=UTF-8'
    } -Body (
        'params=%7B%22params%22%3A%7B%22userName%22%3A%22Guest%22%2C%22userId%22%3A%22{0}%22%2C%22file%22%3A%22http%3A%2F%2F{1}%3A{2}%2F{3}.pdf%22%7D%7D&type=http' -f $default_user,$server_ip,$server_port,$book_id
    )
    $book_number = ($resp.Content -split '"')[3]
    # Get the total number of page images; retry until the manifest reports a page count
    $img_count = $null
    while($null -eq $img_count){
        $resp = Invoke-WebRequest -UseBasicParsing -Uri ('https://wkobwp.sciencereading.cn/asserts/{0}/manifest?language=zh-CN' -f $book_number)
        $json = [System.Text.UTF8Encoding]::UTF8.GetString($resp.Content) | ConvertFrom-Json
        $json = $json.docinfo | ConvertFrom-Json
        $img_count = $json.PageCount
        if($null -eq $img_count){ Start-Sleep -Seconds 1 }
    }

    # Download helpers
    # Build the folder path of a table-of-contents entry by walking up its parents (pId '0' = top level)
    function Get-FullDir($menus,$menu){
        if($menu.pId -eq '0') {
            return ('.\{0}\{1}' -f $book_name,$menu.name)
        }
        $p_menu = $menus | Where-Object { $_.id -eq $menu.pId }
        $p_dir = Get-FullDir -menus $menus -menu $p_menu
        return ('{0}\{1}' -f $p_dir,$menu.name)
    }
    # Download pages $start_page..$end_page (1-based, inclusive) into $save_dir
    function Download-Image($start_page,$end_page,$save_dir){
        [void][System.IO.Directory]::CreateDirectory($save_dir)
        $img_url = 'https://wkobwp.sciencereading.cn/asserts/{0}/image/{1}/100?accessToken=accessToken&formMode=true'
        for($i=$start_page; $i -le $end_page; $i++){
            # The image index in the URL is 0-based, so page $i is image $i-1
            $url = $img_url -f $book_number,($i-1)
            $png = '{0}\{1}-{2:000}.png' -f $save_dir,[System.IO.Path]::GetFileName($save_dir),$i
            $png
            while($true){
                Write-Host ('{0}/{1}' -f $i,$img_count) -ForegroundColor Yellow
                try{
                    $resp = Invoke-WebRequest -UseBasicParsing -Uri $url
                    [System.IO.File]::WriteAllBytes($png,$resp.Content)
                    [System.IO.Path]::GetFileName($png)
                    break
                } catch {
                    # Wait a second, then retry the same page
                    Start-Sleep -Seconds 1
                }
            }
        }
    }
    if($null -ne $menus){
        # Create the main folder; pages before the first entry go into "Cover"
        [void][System.IO.Directory]::CreateDirectory($book_name)
        $last_dir = '.\' + $book_name + '\Cover'
        # Walk through the table-of-contents entries
        $last_page = 1
        $menus | ForEach-Object {
            # Folder of the current entry
            $cur_dir = Get-FullDir -menus $menus -menu $_
            $cur_dir = $cur_dir -replace '\:|\?|\*|\"|\<|\>|\|','' -replace "'",''
            [void][System.IO.Directory]::CreateDirectory($cur_dir)
            # Start page of the current entry
            $arr = $_.url -split '='
            $cur_page = [int]$arr[$arr.length-1]
            # Download the pages of the previous entry; the boundary page is saved in both folders
            Write-Host $last_dir
            Write-Host ('pages: [{0}-{1}]' -f $last_page,$cur_page) -ForegroundColor Yellow
            Download-Image -start_page $last_page -end_page $cur_page -save_dir $last_dir
            # Previous entry done; the current entry becomes the one to download next
            $last_page = $cur_page
            $last_dir = $cur_dir
            '-------------'
        }
        # Download all remaining pages into the last folder
        Write-Host $last_dir
        Write-Host ('pages: [{0}-{1}]' -f $last_page,$img_count) -ForegroundColor Yellow
        Download-Image -start_page $last_page -end_page $img_count -save_dir $last_dir
    } else {
        Write-Host 'Could not parse the table of contents; nothing was downloaded' -ForegroundColor Red
        pause
        exit
    }
    Write-Host 'All downloads complete' -ForegroundColor Green
    pause
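The folder logic in the script above works on page ranges: each table-of-contents entry only gives a start page, so the pages of one entry run from its own start page to the start page of the next entry, and whatever comes before the first entry goes into a cover folder. Here is a small offline sketch of just that range calculation. The section names, page numbers, and the `Get-PageRanges` helper are made up for illustration and do not come from the original post.

```powershell
# Made-up sample data: section name and the page it starts on.
$sections = @(
    [pscustomobject]@{ name = 'Preface';   start = 3  },
    [pscustomobject]@{ name = 'Chapter 1'; start = 7  },
    [pscustomobject]@{ name = 'Chapter 2'; start = 20 }
)
$total_pages = 25

# Hypothetical helper: turn start pages into inclusive page ranges.
# As in the script, the boundary page belongs to both neighbouring ranges.
function Get-PageRanges {
    param($Sections, [int]$TotalPages, [string]$FirstName = 'Cover')
    $last_name = $FirstName
    $last_page = 1
    foreach ($s in ($Sections | Sort-Object start)) {
        [pscustomobject]@{ name = $last_name; first = $last_page; last = $s.start }
        $last_name = $s.name
        $last_page = $s.start
    }
    # Everything that remains goes to the last section
    [pscustomobject]@{ name = $last_name; first = $last_page; last = $TotalPages }
}

$ranges = Get-PageRanges -Sections $sections -TotalPages $total_pages
$ranges | ForEach-Object { '{0}: {1}-{2}' -f $_.name,$_.first,$_.last }
```

Keeping the boundary page in both ranges means a page on which one section ends and the next begins appears in both folders, at the cost of one duplicate image per section.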
Author: 18311622661 Time: 2024-3-9 20:46
6666666666666666666666666666666
Welcome to 批处理之家 (http://bbs.bathome.net/)