返回列表 发帖

[技术讨论] PowerShell下载 双色球 分布图数据【直接写入EXCEL】

本帖最后由 xczxczxcz 于 2019-1-10 13:08 编辑

纯为下载数据。非为购彩而写。天朝彩票都是坑。世界所有程序猿一起写,都中不了天朝彩票。
一句话,想让你中就能中,不让你中只要不买完所有注数你就中不了。除非你的运势很强悍。
===非 HTML 节点法====PS3.0以上.
Function DownLotteryData {
Param ( [Array]$DATA, [string]$year, [int]$n )
$Ball=$RedBlue=$arr=$array= New-Object "System.Collections.ArrayList"
(1..52) |ForEach-Object {$Ball += 'A'} #期号+日期+33+16
for ( $i = 0; $i -lt $DATA.count; $i++ ) {
if ( $n -le 8 ) {
$arr += $DATA[$i] #开奖数据
if ($n -le 2) { $num = $n } else { $num = [int](([string]($DATA[$i])).TrimStart('0')) + 2 } #分布图
$Ball[$num] = $DATA[$i] #分布图
$n++
} else {
$arr += $DATA[$i] #开奖数据
$num = [int](([string]($DATA[$i])).TrimStart('0')) + 35 #分布图
$Ball[$num] = $DATA[$i] #分布图
$RedBlue += ,$Ball #33+16分布图
$array += ,$arr #开奖数据
$arr = $Ball = New-Object "System.Collections.ArrayList" #开奖数据 分布图
(1..52) |ForEach-Object {$Ball += 'A'} #33+16+日期
$n = 1
}
}
$content = $AllBall = New-Object "System.Collections.ArrayList"
for ( $i = 0; $i -lt ( $DATA.count /9 ); $i++ ) {
$Content += ( $array[$i] -join ' ' ).Replace('.','/') #开奖数据
$AllBall += ( $RedBlue[$i] -join ',' ).Replace('.','_').Replace('A',' ') -replace '^(\s+)?,?','' #33+16分布图
}
$Content | Set-Content .\开奖数据\简略数据\$year.txt -enc Default -force #开奖数据
$AllBall | Set-Content .\开奖数据\分布图\$($year + '.csv') -enc Default -force #分布图 EXCEL
}
#下载数据
Remove-Item .\开奖数据 -Rec -force -ea 0
new-item .\开奖数据\简略数据 -type Directory -force
new-item .\开奖数据\分布图 -type Directory -force
for ( $year = 2003; $year -le 2019; $year++ ) {
$url = "https://kjh.55128.cn/ssq-history-$year.htm"
Invoke-WebRequest -uri $url -Outfile "$env:temp\Downh.log"
$PageData = (( Get-Content "$env:temp\Downh.log" -ReadCount 0 -enc utf8 ) `
-match '^(\s+)?<td>(\d{7}|\d{4}(\.\d{2}){2})</td>$|<li(\s+)?class.*>\d{2}</li>$' ) `
-replace "(\s+)?</?(td)?(li)?((\s+)class='ball.*-24')?>",''
DownLotteryData $PageData $year 1
}
Remove-Item "$env:temp\Downh.log" -force
pauseCOPY
1

评分人数

回复 2# 523066680
谢谢! 补充一个 直接写 EXCEL 的。*.xlsx格式。

从2003 -2019 最新约 180秒。写入EXCEL 速度感觉也可以。优化一下会更好。基本上是比较完整的 EXCEL操作。
所有的数据保存在 一个文档内,每年一个标签。闲时自用。如果只用分布图可以删除掉简略数据。
$time=get-date
# EXCEL 对象
$excel = New-Object -ComObject Excel.Application
Start-Sleep -Seconds 2
$Excel.Visible = $true
if ( [Io.File]::Exists("$Pwd\Lottery.xlsx") ) { $Exist = $true } else { $Exist =$Null }
if ( $Exist ) {
$workbook = $excel.Workbooks.Open("$Pwd\Lottery.xlsx")
$NewestName = $workbook.worksheets.Item(1).name
$workbook.worksheets.Application.DisplayAlerts = $false
$workbook.worksheets.Item(1).delete() #删除最近一年 追加下载
$sheet = $workbook.worksheets.add()
$sheet = $workbook.worksheets.Item(1)
} else {
$NewestName = 2003 #找不到数据则全新下载
$workbook = $excel.Workbooks.add()
$sheet = $workbook.worksheets.Item(1)
}
$borderWeight = "microsoft.office.interop.excel.xlBorderWeight" -as [type]
Function ExcelMerge {
Param ( [Array]$CSVData, [string]$year )
$workbook.worksheets.Item(1).Name = $year
$sheet = $workbook.worksheets.Item($year)
#全局
$sheet.Rows.HorizontalAlignment = 3
$sheet.Rows.VerticalAlignment = 2
$sheet.Columns.RowHeight = 16
$RangeHeight = $sheet.Range("A1:B1")
$RangeHeight.RowHeight = 24
#合并单元格
$CELL1 = $sheet.Cells.item(1,3)
$CELL2 = $sheet.Cells.item(1,35)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Merge()
$CELL1 = $sheet.Cells.item(1,36)
$CELL2 = $sheet.Cells.item(1,51)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Merge()
#标题
$CELL1 = $sheet.Cells.item(1,1)
$CELL2 = $sheet.Cells.item(1,51)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Borders.Weight = $borderWeight::xlThin
$CELLRange.Font.bold = $true
$CELLRange.Font.Size = 13
$CELLRange.Font.ColorIndex = 1
$sheet.cells.item(1,1) = '期号'
$sheet.cells.item(1,2) = '开奖日期'
$sheet.cells.item(1,3) = '红球'
$sheet.cells.item(1,36) = '兰球'
#红球
$CELL1 = $sheet.Cells.item(2,3)
$CELL2 = $sheet.Cells.item($($CSVData.Count + 1),35)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Borders.Weight=$borderWeight::xlHairLine
$CELLRange.Font.ColorIndex = 3
$CELLRange.Font.Size = 12
$CELLRange.Columns.ColumnWidth = 2.5
#兰球
$CELL1 = $sheet.Cells.item(2,36)
$CELL2 = $sheet.Cells.item($($CSVData.Count + 1),51)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Borders.Weight=$borderWeight::xlHairLine
$CELLRange.Font.ColorIndex = 5
$CELLRange.Font.Size = 12
$CELLRange.Font.bold = $true
$CELLRange.Columns.ColumnWidth = 2.5
#日期
$CELL1 = $sheet.Cells.item(2,1)
$CELL2 = $sheet.Cells.item($($CSVData.Count + 1),2)
$CELLRange = $sheet.Range( $CELL1 , $CELL2 )
$CELLRange.Borders.Weight = $borderWeight::xlThin
$CELLRange.Font.Size = 12
$CELLRange.Font.ColorIndex = 1
$RangeWidth = $sheet.Range("A1:A2")
$RangeWidth.EntireColumn.ColumnWidth = 8
$RangeWidth = $sheet.Range("B1:B2")
$RangeWidth.EntireColumn.ColumnWidth = 11
#处理开奖数据
$StartLine = 2
for ( $line =0; $line -lt $CSVData.Count; $line++ ) {
$EachCell = @($CSVData[$line].Split(','))
for ( $v=1; $v -le $EachCell.Count; $v++ ) { $sheet.Cells.Item($StartLine,$v) = $EachCell[$($v -1)] }
$StartLine++
}
if ( [int]$year -ne 2019 ) { $sheet = $workbook.worksheets.add() }
}
# 提取开奖数据
Function DownLotteryData {
Param ( [Array]$DATA, [string]$year, [int]$n )
$Ball=$RedBlue=$arr=$array= New-Object "System.Collections.ArrayList"
(1..52) |ForEach-Object {$Ball += 'A'} #期号+日期+33+16
for ( $i = 0; $i -lt $DATA.count; $i++ ) {
if ( $n -le 8 ) {
$arr += $DATA[$i] #开奖数据
if ($n -le 2) { $num = $n } else { $num = [int](([string]($DATA[$i])).TrimStart('0')) + 2 } #分布图
$Ball[$num] = $DATA[$i] #分布图
$n++
} else {
$arr += $DATA[$i] #开奖数据
$num = [int](([string]($DATA[$i])).TrimStart('0')) + 35 #分布图
$Ball[$num] = $DATA[$i] #分布图
$RedBlue += ,$Ball #33+16分布图
$array += ,$arr #开奖数据
$arr = $Ball = New-Object "System.Collections.ArrayList" #开奖数据 分布图
(1..52) |ForEach-Object {$Ball += 'A'} #33+16+日期
$n = 1
}
}
$content = $AllBall = New-Object "System.Collections.ArrayList"
for ( $i = 0; $i -lt ( $DATA.count /9 ); $i++ ) {
$Content += ( $array[$i] -join ' ' ).Replace('.','/') #开奖数据
$AllBall += ( $RedBlue[$i] -join ',' ).Replace('.','_').Replace('A',' ') -replace '^(\s+)?,?','' #33+16分布图
}
$Content | Set-Content .\开奖数据\简略数据\$year.txt -enc Default -force #开奖数据
ExcelMerge $AllBall $year
}
#下载数据
new-item .\开奖数据\简略数据 -type Directory -force
for ( $year = [int]$NewestName; $year -le 2019; $year++ ) {
$url = "https://kjh.55128.cn/ssq-history-$year.htm"
Invoke-WebRequest -uri $url -Outfile "$env:temp\Downh.log"
$PageData = (( Get-Content "$env:temp\Downh.log" -ReadCount 0 -enc utf8 ) `
-match '^(\s+)?<td>(\d{7}|\d{4}(\.\d{2}){2})</td>$|<li(\s+)?class.*>\d{2}</li>$' ) `
-replace "(\s+)?</?(td)?(li)?((\s+)class='ball.*-24')?>",''
DownLotteryData $PageData $year 1
}
Remove-Item "$env:temp\Downh.log" -force
if ( $Exist ) { $workbook.Save() } else { $workbook.SaveAs("$Pwd\Lottery.xlsx") }
$workbook.Close()
$Excel.quit()
$excel = $null
[GC]::Collect()
([datetime]::Now -$time).totalmilliseconds
pauseCOPY
1

评分人数

TOP

本帖最后由 523066680 于 2019-1-9 22:16 编辑

写过抓双色球往期结果的,多线程,以前写的丑(发的代码不完整,好像还有另一个脚本负责导出EXCEL,一时没找着)
use Encode;
use Modern::Perl;
use Time::HiRes qw/time sleep/;
use threads;
use threads::shared;
use Try::Tiny;
use Mojo::UserAgent;
use File::Basename;
use File::Path qw/make_path/;
use File::Slurp;
STDOUT->autoflush(1);
our $workdir = "D:\\Temp\\Double_Ball_Lottery";
make_path $workdir unless -e $workdir;
#chdir $workdir;
our $ua;
our $main = "http://kaijiang.500.com";
our @links :shared;
our @ths;
$ua = Mojo::UserAgent->new();
$ua = $ua->max_redirects(5);
print "Getting Links ... ";
get_links( \@links );
say "Done";
#创建线程
grep { push @ths, threads->create( \&thread_func, $_ ) } ( 0 .. 3 );
#等待运行结束
while ( threads->list(threads::running) ) { sleep 0.2 };
#线程分离/结束
grep { $_->detach() } threads->list(threads::all);
quit();
sub thread_func
{
    our (@links, $workdir);
    my ( $id ) = @_;
    my $ua = Mojo::UserAgent->new();
    my ($link, $file, $res, $times);
    while ( $#links > 0 )
    {
        $link = shift @links;
        $file = $workdir ."\\". basename($link);
        
        if ( -e $file ) { say "$id - $link already exists"; next };
        say "$id - $link";
        $times = 0;
        while (1)
        {
            try { $res = $ua->get($link)->result  }
            catch
            {
                printf "[%d] getting %s, retry: %d\n", $id, basename($link), $times++;
                sleep 3.0;
            };
            last if ( defined $res and $res->is_success );
            return if ( $times > 10 );
        }
        write_file( $workdir ."\\". basename($link), $res->body );
    }
}
sub get_links
{
    my ($aref) = @_;
    my $html = read_file("simple.htm");
    my $dom = Mojo::DOM->new( $html );
    for my $e ( $dom->at(".iSelectList")->find("a")->each )
    {
        push @$aref, $e->attr("href");
    }
}
sub quit { system("pause"); }COPY
原来是18年2月写的,之前抓了03年到18年1月的html,导出excel(没显示日期,就是搞个分布图),
https://share.weiyun.com/5V1MvmE
[url=][/url]

TOP

返回列表