Channel: PowerShell.com – PowerShell Scripts, Tips, Forums, and Resources: Active Threads

Overwhelmed. Can someone help me with this?


I downloaded PowerShell 4 (for Windows 7) yesterday and spent a few hours using Google to try to learn from what other people have done. I got a script working to list a set of URLs that had a certain word on the page, so I thought it wouldn't be too hard to get a script working to gather URLs from a set of search results.

Unfortunately, I'm totally confused. I've seen a few different functions used for similar purposes, and can't make sense of where to start.

Here's what I think I understand, based on the scripts I've seen...

  • I probably need to use [ $browserAgent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36' ].

My understanding is that this makes the pages render for PowerShell the way they would for a regular browser. If I just use [ Invoke-WebRequest -URI $Site ] from PowerShell, the number of links is smaller than what shows up if I browse the site in Firefox.

  • I need a loop that adds a page number to the end of a base url ( e.g. [ for ($num = 1; $num -le 748; $num++) ] for a search that has 748 pages of results ).

I haven't seen a way to get all results from a search as if the results weren't split into pages.
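To make sure I'm describing that clearly, here's the kind of loop I have in mind ( the base url is a placeholder, and I'm not sure my syntax is right ):

```powershell
# Build one search-results URL per page; $urlstart is a made-up base URL
$urlstart = "http://search.thesite.com/search.php?type=long&content=submissions&page="
for ($num = 1; $num -le 748; $num++) {
    $url = "$urlstart$num"    # e.g. ...&page=1, ...&page=2, and so on
    # ...fetch and parse $url here...
}
```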

  • I need to send the urls individually to an $outarray so that I can use [ $outarray += $i + "`r`n" ] in my script with [ $outarray | Out-File D:\powershellurlsresults.txt -width 220 ] to save the url list.

$i would be the urls collected from each page, each gathered one-by-one as part of a loop that runs per-page.

Ideally, there would be a way to grab links whose <a title="..."> attribute includes the word "submission(s)". I know that [ if ($output -like "*submission(s)*") ] can be used to operate on pages that contain a certain word, so I'm hoping there's a way to grab a link by html title.

For the purpose of this script, maybe [ { $_.href -like '*submissions*' } ]  would do well enough, since each link to be collected ends in "page=submissions" ( e.g. http://www.thesite.com/content/grouppage.php?uid=49281203&page=submissions ).
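If I understand the examples I've seen, that filtering step would look something like this ( untested, and assuming $page holds the result of an Invoke-WebRequest call ):

```powershell
# Keep only the links whose href ends in page=submissions
$page.Links |
    Where-Object { $_.href -like '*page=submissions*' } |
    ForEach-Object { $_.href }
```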

 

 

If I start with the base...

# PowerShell Invoke-WebRequest Search Example
$Site = "http://search.thesite.com/search.php?type=long&content=submissions&page=1"
$Test = Invoke-WebRequest -URI $Site
$Test.Links | ForEach-Object { $_.href }

I can get a list of urls, but it converts & to [ &amp; ]. I'll need a string replace to fix that.
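I'm guessing a -replace would handle it, something like this ( untested, using $Test from the base example above ):

```powershell
# Undo the HTML-encoding of & in each collected href
$Test.Links | ForEach-Object { $_.href -replace '&amp;', '&' }
```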

 

Here's my current scratchpad:

$url = "$urlstart$num"
$urlstart = "http://search.thesite.com/search.php?type=long&content=submissions&page="
$num = "1 -le 750; $num++"
$browserAgent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36'
$page = Invoke-WebRequest -Uri $url -UserAgent $browserAgent
$page.Links |
    Where-Object { $_.href -like '*submissions*' } |
    ForEach-Object { $_.href }
$outarray += $page.links + "`r`n"
$outarray | Out-File D:\powershellurlsresults.txt -width 220
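If I've understood the pieces correctly, I think the finished script would look roughly like this, though it's completely untested and the loop bound and urls are just my guesses:

```powershell
# Untested sketch: walk every result page, collect matching hrefs, save to a file
$urlstart     = "http://search.thesite.com/search.php?type=long&content=submissions&page="
$browserAgent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36'
$outarray     = @()

for ($num = 1; $num -le 748; $num++) {
    # Fetch page $num of the search results with a browser-like user agent
    $page = Invoke-WebRequest -Uri "$urlstart$num" -UserAgent $browserAgent

    # Keep the submissions links, and undo the &amp; encoding as we go
    $outarray += $page.Links |
        Where-Object { $_.href -like '*page=submissions*' } |
        ForEach-Object { $_.href -replace '&amp;', '&' }
}

$outarray | Out-File D:\powershellurlsresults.txt -Width 220
```

Is that anywhere close?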

 

For all I know, I'm completely off-base with the script so far. Can anyone explain what's wrong or what I need to add, and why the changes need to be made?

 

I could probably modify my other working script to accomplish what I'm trying to do, but I'd like to get this one working. ( The other script takes a list of URLs in a text file, scans each URL for a word, and returns all URLs that contain that word. I could probably modify it so that it extracts urls from each page when the urls include "submission". However, it seems silly to generate several hundred urls in a text file if there's a way to automate grabbing urls from every page of a search. )

