Scraping 101 - Learning the basics
2014-05-04, 07:47 PM,
#1
I've decided to finally open a thread regarding this topic and this topic alone. Usually when I start asking about this in already ongoing discussions I never get the answers I hope for.
In my opinion there isn't nearly enough information about URL scraping in general to be well informed and make a rational decision. I've searched through this forum, the GSA forum and a lot of others, but couldn't really get to the bottom of this.



1. What are the minimum requirements for (proper) scraping?
- Proxies (Private/Public/How Many/..)?
- Internet Connection Speed requirements (Home Line/VPS/Dedicated/..)?

2. What are the major differences between GScraper and Scrapebox?
- Different minimum amount of proxies for each tool?
- Better results with GS/SB?
- Proxy consumption - I've read that GScraper burns through proxies like there's no tomorrow, while SB seems to treat them much less aggressively?
- Is one more suited for beginners/low-medium scraping needs than the other?


These are the main questions that I'd like answers to. There are a few more, but they're rather follow-up questions.


IMPORTANT
Please, I appreciate it if you're willing to help, but if you're a beginner/novice or someone who doesn't absolutely know what he's talking about, then please refrain from posting. I've read enough confusing threads and posts, I don't need to read more of this.
However, if you're (very) experienced and know your way around scraping then please do post.


Thanks in advance guys.
2014-05-04, 07:55 PM,
#2
If you're new to this and only need some 20-50k links, buy a list instead; it will work out cheaper.
If you're going for 100k+ links, use Scrapebox with free HMA proxies.
2014-05-04, 08:03 PM,
(This post was last modified: 2014-05-04, 08:05 PM by Launder.)
#3
(2014-05-04, 07:47 PM)tixpf Wrote: I've decided to finally open a thread regarding this topic and this topic alone. Usually when I start asking about this in already ongoing discussions I never get the answers I hope for.

1. It depends on your current setup. If you have enough private proxies (100+), you can scrape with them. The catch with private proxies is that you don't want Google to block them, so keep the ratio at 1 thread per 10 proxies to stay safe. With so few threads, expect to scrape at a snail's pace. On the other hand, private proxies are reliable, so you can scrape 24/7.
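
To put numbers on the 1 thread per 10 proxies rule of thumb, here is a minimal Python sketch. The function name `safe_thread_count` and the floor of one thread are assumptions made for illustration; they are not settings taken from GScraper or Scrapebox.

```python
def safe_thread_count(proxy_count: int, proxies_per_thread: int = 10) -> int:
    """Return a conservative thread count for private proxies.

    Applies the 1 thread : 10 proxies ratio mentioned in the post.
    Returning at least 1 thread whenever any proxy exists is an assumption.
    """
    if proxy_count <= 0:
        return 0
    return max(1, proxy_count // proxies_per_thread)


if __name__ == "__main__":
    # Print the suggested thread count for a few proxy pool sizes.
    for count in (30, 100, 250):
        print(count, "proxies ->", safe_thread_count(count), "threads")
```

By this rule, 100 private proxies allow about 10 threads, and a pool of 30 allows about 3.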

Public proxies are great if you have large amounts of them. Think of 300 Google-passed public proxies at any one time. That would let you crank the threads up to more than a thousand. Unless you have a good supplier, though, you won't get far. You need port-scanned proxies: these are public proxies that you can't scrape from the internet, which keeps them alive for longer. GScraper's proxies were great until everyone subscribed to them.
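
One way to read "Google passed" is that a plain request to Google still succeeds through the proxy. The sketch below filters a proxy list on that basis using only the Python standard library. The test URL, the timeout, and the helper names `passes_google` and `filter_proxies` are assumptions for illustration; the scraping tools' own proxy checkers may apply different criteria.

```python
import urllib.request
from typing import Callable, Iterable, List


def passes_google(proxy: str, timeout: float = 10.0) -> bool:
    """Return True if a simple Google request succeeds through the proxy.

    `proxy` is a "host:port" string. Any error, timeout, or non-200
    status is treated as a failure (an assumed definition of "passed").
    """
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(
            "https://www.google.com/search?q=test", timeout=timeout
        ) as resp:
            return resp.status == 200
    except Exception:
        return False


def filter_proxies(
    proxies: Iterable[str],
    check: Callable[[str], bool] = passes_google,
) -> List[str]:
    """Keep only the proxies for which `check` returns True."""
    return [p for p in proxies if check(p)]
```

The `check` parameter is injectable so the filter can be tried out with a stub before pointing it at real proxies.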

For the connection, it depends on the number of threads you're going to use. VPSes are cheap nowadays, though, so it won't be difficult or expensive to rent one just for scraping.

2. GScraper only scrapes from Google, while Scrapebox can scrape from multiple search engines. GScraper is faster at scraping than Scrapebox and does not have Scrapebox's limit of 1,000,000 URLs. However, every SEO player should have Scrapebox because of all its add-ons. It's your Swiss Army knife. Get them both if you can; if you only need the scraper, go for GScraper. If you think you've outgrown GScraper, go for Hrefer, which is a lot faster than GScraper. If you still need more power, there are other ways to scrape, but that would be a trade secret.
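
If a harvested list is larger than the 1,000,000 URL figure quoted for Scrapebox, one workaround is to deduplicate it and split it into smaller batches first. This is a rough Python sketch of that idea; the constant and the function name `dedupe_and_chunk` are hypothetical, and the limit is simply the number given in the post.

```python
from typing import Iterable, Iterator, List

# Limit as quoted in the post; treat it as an assumption, not a verified spec.
SCRAPEBOX_URL_LIMIT = 1_000_000


def dedupe_and_chunk(
    urls: Iterable[str], limit: int = SCRAPEBOX_URL_LIMIT
) -> Iterator[List[str]]:
    """Yield lists of unique, non-empty URLs, each at most `limit` long.

    Order of first appearance is preserved; later duplicates are dropped.
    """
    seen = set()
    chunk: List[str] = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        chunk.append(url)
        if len(chunk) == limit:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, batch.
        yield chunk
```

Each yielded batch can then be written to its own file and loaded separately.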

For a quick tutorial on scraping, search for sertips.com; there should be a scraping tutorial there. I commend you for planning to scrape, as that is a whole different level from just buying ready-made lists.
2014-05-04, 11:04 PM,
#4
(2014-05-04, 07:55 PM)Crusader Wrote: If you're new to this and only need some 20-50k links, buy a list instead; it will work out cheaper.
If you're going for 100k+ links, use Scrapebox with free HMA proxies.

What exactly are free HMA proxies?

(2014-05-04, 08:03 PM)Launder Wrote:

That was very helpful.
So in summary you could say that GScraper is the better scraper, but it only scrapes URLs from Google. Scrapebox may be slower, but it's more of an all-rounder. This is pretty much what I expected.

Right now I only have 30 semi-dedicated proxies and they're used for submissions in GSA SER. Upgrading to 100 proxies is just not possible with my current budget.
Do you think there's a way to make it work with just 30 private proxies? If not, are there ways to scrape with public proxies? If so, where would I get somewhat reliable anon public proxies from?

And thanks for the website. I'll definitely look into the tutorial.
2014-05-04, 11:18 PM,
#5
I scraped 23k forum links from Google and not even one was added to GSA.
What should I do?
© 2013-2017 CPA Elites Ltd