Helmet not working with Facebook scraper. #26

Closed
sampurcell93 opened this issue Aug 17, 2015 · 14 comments

@sampurcell93

Hi there, I've just started using Helmet to implement meta tags, so apps like Facebook and Twitter can get some useful information about my page and put it into previews. However, I notice that there is a delay between when I render a page, and when the meta tags/title get set on the page. This seems to be causing Facebook and Twitter to miss the tags, as they are not present at the instant of page load. I'm wondering if I am doing something wrong, or there is a fix I am not aware of.

I am currently using the recommended server-side technique, i.e.:

str = React.renderToString(componentwithhelmet);
Helmet.rewind();
res.send(str)

I'm using Express, and I have confirmed that when I hard-code the description (that is, independently of Helmet), FB picks up the data.

I really like the package, but if it doesn't take care of this kind of thing, I am afraid I'll have to roll my own architecture for tags like these. Any insight would be appreciated!

Thanks,
Sam

@potench
Contributor

potench commented Aug 22, 2015

When you "ViewSource" on your app, you should see the og:meta tags in the source. In this case str should contain the og:meta tags; can you check and see what str contains? Does it contain any og:meta data or perhaps the wrong og:meta data?

@sampurcell93
Author

It actually does not contain any at render time. It takes a second for the tags to be injected - I know this, because I see the Tab title flicker from the default to the one set by Helmet after a split second.

@doctyper
Contributor

@sampurcell93 This may be an implementation issue. We use Helmet in production and Helmet does successfully pre-render data (view source).

In your example, you are calling Helmet.rewind() but are not using its return value. As the documented example shows, the call returns the stringified payload you need to prerender your data:

React.renderToString(<Handler />);
let head = Helmet.rewind();
// head = { title, meta, link }

Which you can then include in the response the server sends (pseudo-code):

res.send(`
<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8" />
        ${head.meta}
        ${head.link}
        <title>${head.title}</title>
    </head>
    <body>
    </body>
</html>
`);
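
A minimal runnable sketch of the same idea, with the rendered markup placed in the body as well. The helper name `renderDocument`, the `root` container id, and the stand-in values are assumptions for illustration; it also assumes `head.title` is plain text and `head.meta` / `head.link` are already-serialized tag strings, as in the pseudo-code above.

```javascript
// Hypothetical helper: assembles the full HTML response from the head
// payload and the rendered app markup, so the tags are present in the
// very first bytes a crawler receives.
function renderDocument(head, markup) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<meta charset="utf-8" />',
    head.meta,
    head.link,
    `<title>${head.title}</title>`,
    '</head>',
    '<body>',
    `<div id="root">${markup}</div>`,
    '</body>',
    '</html>',
  ].join('\n');
}

// Stand-in values; in a real handler these would come from
// React.renderToString(...) and Helmet.rewind() respectively.
const markup = '<h1>Hello</h1>';
const head = {
  title: 'My page',
  meta: '<meta property="og:title" content="My page" />',
  link: '',
};

const html = renderDocument(head, markup);
console.log(html);
```

In an Express handler, the string returned by `renderDocument` is what would be passed to `res.send`, instead of the bare markup string.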

@PaulieScanlon

Hi, I'm also seeing problems with the Facebook scraper not picking up the og tags. For the moment I have just hard-coded them in my index, but my posts use Helmet to inject the image and title. Is this an issue with Helmet or React?

The site in question is here: pauliescanlon.io. My current problem is the posts page you reach after viewing a portfolio item.

@rus-yurchenko

+1

@PaulieScanlon

@RuslanYurchenko The issue is not with Helmet. Crawlers can only read what’s in the meta tag that’s hard coded. Using helmet to inject this data at run time doesn’t mean it’s crawlable... unless server side rendered.

If like me you’re doing this on the client it won’t work. Shame but never mind ay!
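
One way to see the difference is to scan the raw HTML the server returns, before any JavaScript runs. This is only a sketch: `findOgProperties` and both sample documents are made up for illustration, and the regex is a rough stand-in for a real HTML parser.

```javascript
// Collects the og:* property names found in a raw HTML string, which is
// roughly all a crawler that does not execute JavaScript has to work with.
function findOgProperties(html) {
  const pattern = /<meta\s+[^>]*property=["'](og:[^"']+)["'][^>]*>/gi;
  return [...html.matchAll(pattern)].map((match) => match[1]);
}

// Client-only app: the server sends an empty shell, and Helmet adds the
// tags later in the browser.
const clientShell = `<!DOCTYPE html>
<html>
<head><title>My app</title></head>
<body><div id="root"></div></body>
</html>`;

// Server-rendered app: the tags are already in the response.
const serverRendered = `<!DOCTYPE html>
<html>
<head>
<meta property="og:title" content="My post" />
<meta property="og:image" content="https://example.com/cover.png" />
<title>My post</title>
</head>
<body><div id="root"><h1>My post</h1></div></body>
</html>`;

console.log(findOgProperties(clientShell));
console.log(findOgProperties(serverRendered));
```

The first call finds nothing and the second finds both properties, which matches what "View Source" shows in each case.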

@rus-yurchenko

@PaulieScanlon yeah, I realized this a little later. Sorry!

@lipenco

lipenco commented Oct 18, 2017

Is there any other solution I can consider to solve the FB problem when using react-helmet on the front end?

@cjimmy

cjimmy commented Oct 24, 2017

@lipenco
For those coming here from Google, there seem to be a few options (after much research). This assumes you've built a client-side React app (with Create React App, for example), and social media crawlers (Open Graph, Twitter Cards) can't see your metadata because the server returns the static index.html as it is before any JavaScript has run.

Convert your app to server-side rendering
This is the most obvious solution but the most onerous. You won't be able to use browser-only globals like window in code that also runs on the server. If you're using React Router, you'll have to find a way to mirror the routes between server and client. If you're like me, you might be serverless, and running a server would be a lot more work. On the other hand, your page will likely load faster, and crawlers will see what your users would. This is a non-exhaustive list of tradeoffs.

Use a pre-rendering service
Prerender.io, Rendertron, and Prerender.cloud, to name a few, give you a way to server-side render when the user-agent is a bot. Some CDNs like Netlify and Roast.io do this for you so you don't have to run your own server.
The downside is that this is yet another service to pay for. EDIT: Netlify is free, and prerendering is one-click, albeit in beta.

Pre-render on your own
A couple of packages exist for rendering your React app statically. Graphcool's Prep, react-snap, and react-snapshot are the ones I found; they all essentially run a local server to render the site and save the HTML files. The files won't be pretty, but if all you're looking for is the <head> generated by React Helmet, this will do.
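
For the pre-rendering-service option, the core decision is a user-agent check. A rough sketch, assuming a short, non-exhaustive list of crawler user-agent substrings; `isSocialCrawler` and `chooseResponse` are hypothetical names, not part of any of the services mentioned:

```javascript
// User-agent substrings for a few well-known link-preview crawlers.
// Illustrative only; real services maintain much longer lists.
const CRAWLER_PATTERN = /facebookexternalhit|Twitterbot|LinkedInBot|Slackbot/i;

function isSocialCrawler(userAgent) {
  // A missing header is treated as a regular visitor.
  return CRAWLER_PATTERN.test(userAgent || '');
}

// Hypothetical routing decision: bots get prerendered HTML, everyone
// else gets the normal client-side shell.
function chooseResponse(userAgent) {
  return isSocialCrawler(userAgent) ? 'prerendered' : 'client-shell';
}

console.log(chooseResponse('facebookexternalhit/1.1'));
console.log(chooseResponse('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)'));
```

In an Express app this check would sit in middleware that reads the `User-Agent` request header and either proxies to the prerenderer or falls through to the static files.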

In the end, I used react-snap to render static files, and it hardly changed my build workflow. This was sufficient for me!

Any others I'm missing?

Edit 07/20/18: I've recently started hosting on Netlify (which has an option to turn on prerendering) so that I could remove react-snap, which was causing an unsightly flash of unstyled content when loading the page.

@riccardolardi

riccardolardi commented May 2, 2018

Be sure to check the Facebook Crawler Docs & the debugger to get more insight on how it crawls the content. Helped me better understand what's going on: https://developers.facebook.com/docs/sharing/webmasters/crawler

@ChristiaanScheermeijer

ChristiaanScheermeijer commented Mar 20, 2019

In our case, a GTM trigger caused an <iframe> to be inserted directly after the <html> tag. This caused the Netlify prerender or the Facebook crawler to move the body tag before the iframe, which put all of our meta tags inside the body tag. Apparently, the Facebook crawler ignores some meta tags when they appear in the body.

This didn't scrape all og tags: 😭

<!DOCTYPE html>
<html lang="en" class="wf-montserrat-n5-inactive wf-montserrat-n7-inactive wf-averiasanslibre-n5-inactive wf-averiasanslibre-n7-inactive wf-sansserif-n4-inactive wf-inactive">
<body>
<iframe height="0" width="0" style="display: none; visibility: hidden;" src=""></iframe>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style>
...
</style>
<meta property="og:image" content="http://...." data-react-helmet="true">

After removing the trigger that added the iframe, the prerendered HTML was valid again. All og tags were being scraped again. 🚀

<!DOCTYPE html>
<html lang="en" class="wf-montserrat-n5-inactive wf-montserrat-n7-inactive wf-averiasanslibre-n5-inactive wf-averiasanslibre-n7-inactive wf-sansserif-n4-inactive wf-inactive">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style>
...
</style>
<meta property="og:image" content="http://...." data-react-helmet="true">
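
A quick way to catch this in prerendered output is to look for og tags that fall outside the head section. This is a sketch only: `ogTagsOutsideHead` and the two sample documents are hypothetical, and a regex is a rough substitute for a real HTML parser.

```javascript
// Returns the og:* property names that appear outside <head>...</head>.
// If the document has no head section at all, every og tag is reported.
function ogTagsOutsideHead(html) {
  const withoutHead = html.replace(/<head[\s>][\s\S]*?<\/head>/i, '');
  const pattern = /<meta\s+[^>]*property=["'](og:[^"']+)["'][^>]*>/gi;
  return [...withoutHead.matchAll(pattern)].map((match) => match[1]);
}

// Simplified version of the broken output: the injected iframe forces
// the body to open early, so the meta tag ends up inside it.
const broken = `<!DOCTYPE html>
<html lang="en">
<body>
<iframe height="0" width="0" src=""></iframe>
<meta property="og:image" content="https://example.com/cover.png">
</body>
</html>`;

// Simplified version of the valid output.
const valid = `<!DOCTYPE html>
<html lang="en">
<head>
<meta property="og:image" content="https://example.com/cover.png">
</head>
<body></body>
</html>`;

console.log(ogTagsOutsideHead(broken));
console.log(ogTagsOutsideHead(valid));
```

An empty result means every og tag found sits inside the head, which is where the valid example above has them.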

@ghost

ghost commented Jun 6, 2019

@ChristiaanScheermeijer We are experiencing the same issue. Can I ask what the GTM trigger was and if you modified your implementation of it or just completely removed it? I can't see an option where I can just remove our page view events from happening as that is really the only thing being triggered. Any insight would be greatly appreciated.

@ChristiaanScheermeijer

Hi @smschick,

At first, we removed the trigger completely. However, since we really needed the trigger to work, we enabled it again and temporarily (until Netlify fixes the bug) removed the <!DOCTYPE html> part.

But beware of unpredictable side effects when doing so, especially in older browsers.

In our case, it was a Floodlight trigger. But it could be caused by any trigger that adds an iframe to the page.

@sophylee

Hi, I can confirm that this is still an issue. Facebook developer tools only sees the following when it crawls our home page:

<!DOCTYPE html>
<html lang="en">
<head>
<script async src="...">
<title>Pledge Ukraine</title>
<script ...>
</head>
<body>
<noscript>You need to enable JavaScript to run this app.</noscript>
<div id="root"></div>
</body>
</html>

This is even though the inspector in a browser shows all of the meta tags present. Our less-than-ideal workaround was to hard-code our og and twitter meta tags into index.html.
