No robots.txt file in my top level directory

PG always includes a robots.txt in added subdirectories.

But I didn’t have one for my https://mcgraphics.us.

Here is my new https://mcgraphics.us/robots.html. Does the code look OK?
```
User-agent: *
Allow: /

Sitemap: https://www.mcgraphics.us/sitemap.xml
```

Hi @kat,

Your robots.txt should look like this:

User-agent: *
Disallow:
Sitemap: https://www.mcgraphics.us/sitemap.xml

Be sure that the robots file ends with .txt! And place it in the root of your site:
robots.txt

The sitemap.xml should be in the root as well! Subdirectories don’t matter; you only use one sitemap.xml and one robots.txt. A well-made sitemap directs Google to the proper pages anyway.
Notice that the sitemap ends with .xml: sitemap.xml
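In case it helps, a minimal sitemap.xml is just a list of page URLs. Here is a rough sketch of one (the page paths below, like /about.html, are placeholders I made up, not your actual pages):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want Google to crawl -->
  <url>
    <loc>https://www.mcgraphics.us/</loc>
  </url>
  <url>
    <loc>https://www.mcgraphics.us/about.html</loc>
  </url>
</urlset>
```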


This is very helpful. Thanks for taking the time to answer.

Why Disallow: instead of Allow: /?
Interesting that only one sitemap and one robots file are needed, and only in the root.

It basically says:

User-agent: * (all sniffing bots, crawlers, etc.)
Disallow: (allow all files, because nothing is stated after it)
Sitemap: https://www.mcgraphics.us/sitemap.xml (from this sitemap.xml file)

Google sees the links to the pages in the sitemap.xml; that’s why it’s there, no matter what directory the pages are in!

In my country we have websites in the country’s three languages. I make each language in its own folder with its own index, and one index in the root as the landing page. Only one sitemap.xml is needed, and one robots.txt pointing to it.
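To sketch the idea (the domain and the folder names /en/, /fr/, /nl/ below are hypothetical, just for illustration): the single sitemap.xml in the root simply lists the pages from every language folder, for example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Landing page in the root -->
  <url><loc>https://www.example.com/</loc></url>
  <!-- Each language has its own folder and index, all listed in this one sitemap -->
  <url><loc>https://www.example.com/en/index.html</loc></url>
  <url><loc>https://www.example.com/fr/index.html</loc></url>
  <url><loc>https://www.example.com/nl/index.html</loc></url>
</urlset>
```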

You can test both files for your website with https://search.google.com/search-console, where you get your site indexed for Google search.

I had a look at your site, and most importantly I would take care of all the HTML errors on the pages.

Disallow: means allow all, funny.
https://search.google.com/search-console won’t fetch my sitemap.xml. This is why I started checking my robots.txt.
A Google FAQ says a sitemap isn’t needed for small sites like mine.

Any suggestions on how to find my HTML errors?

Thanks for your help and the info on how you set up your multi-language site.

To exclude all robots from the entire server:

User-agent: *
Disallow: /

To allow all robots complete access:

User-agent: *
Disallow:
Sitemap: https://www.mcgraphics.us/sitemap.xml

Of course you can use Allow, but my example does the same! It’s robots.txt syntax, not normal English :wink:

For HTML testing in Pinegrow you can go to Page > Check for HTML errors with a web page opened.

Or install this plugin in the Chrome browser, click the small silver wheel at the top right of the browser, and choose Validate HTML: Web Developer - Chrome Web Store


[quote=“AllMediaLab, post:6, topic:4854”]
For HTML testing in Pinegrow you can go to Page > Check for HTML errors with a web page opened.

Or install this plugin in the Chrome browser, click the small silver wheel at the top right of the browser, and choose Validate HTML: Web Developer - Chrome
[/quote]

This is great. I’m on it. Thanks again for your help.
Kat

This is new to me. As far as I know PG does not create or add a robots.txt itself?

True, Pinegrow does not create any robots.txt

But I think that would be a great new feature. Anyone who wants to create this (and the sitemap) themselves should still be able to do so. But it would save a lot of time if you could just say which parts of the site you want to block and let PG do the rest.

Great!!
Thanks, I had no idea that the FF dev tools were kicking about.
Fab!

I now use the Brave browser and have added it to that.
This browser is pretty cool.
A sort of tightened-up, security-focused version of Chrome.
I’ve been using it for months now. I recommend it, and this is my first extension :slight_smile: Cheers @AllMediaLab