CI statuses not posting #1045

Closed
maclover7 opened this issue Dec 14, 2017 · 21 comments


@maclover7
Contributor

CI statuses are currently not posting properly via the post-build-status-update job. Something seems to be wrong with github-bot, as the jobs are now throwing an error. I tried to pull the logs the usual way (curl-ing them down), but I'm getting an "operation timed out" error. My next best option would be to SSH directly into the server and view the logs there, but I don't think I have access to it.

Example of build failing: https://ci.nodejs.org/job/post-build-status-update/78609/console

cc @phillipj

@maclover7 maclover7 added this to untriaged in action items via automation Dec 14, 2017
@maclover7 maclover7 moved this from untriaged to Incidents in action items Dec 14, 2017
@maclover7
Contributor Author

It appears the entire github-bot is now down; it's not responding to the /ping endpoint...

@phillipj
Member

➜ ping 23.253.100.79
PING 23.253.100.79 (23.253.100.79): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
...

Looks like the server is unavailable. Rackspace's status page says everything should be okay, although a "cloud servers degradation" happened yesterday: https://rackspace.service-now.com/system_status/

@nodejs/build anyone have a moment to help debug what's up here?

Refs ansible/inventory.yml#L22

@mhdawson
Member

Hmm, the console says it is suspended.

@mhdawson
Member

Looking for a button to "un-suspend" it.

@mhdawson
Member

mhdawson commented Dec 14, 2017

All of the options to reboot etc. are greyed out. Googling shows:

If the server becomes unresponsive due to heavy swapping activity, action will need to be done as the server may be placed in a suspended state by Rackspace due to the detrimental affect the server operations has on other customers which are located on the same host.

@mhdawson
Member

Seems more serious than that. In the support section I see an email asking for information and warning that otherwise our machines would be suspended. It looks like all of our machines on Rackspace are suspended.

@mhdawson
Member

Responded to the ticket asking for info and asked that they reinstate the machines. @rvagg, if you have personal contacts at Rackspace, it may also be good to reach out to them.

@mhdawson
Member

On the phone with Rackspace support to see if we can expedite this.

@mhdawson
Member

Talked to a rep; they are going to follow up with the account managers and give me a call back.

@mhdawson
Member

Still waiting for a callback. I have the Community Committee meeting now, but if I don't hear back after that I'll try calling them again.

@mhdawson
Member

@rvagg, in case you come online and have any direct contacts at Rackspace.

@mhdawson
Member

Looks like all but 2 of our machines are back online. "iojs-dfw-win2008r2-release-1" and "iojs-iad-win2008r2-release-2" are still paused. They did mention it might take 30 mins, which has not yet passed.

While they are back online, it seems we'll need to continue the discussion with them, as per:

'We are still evaluating the status of this account internally and will update you as soon as possible.'

@maclover7
Contributor Author

@mhdawson Thank you for talking with Rackspace and getting the servers back up again! Is there any kind of email notification that could be enabled in case the servers get suspended or shut off in the future? If volunteers are needed, I volunteer to be on the alias.

@mhdawson
Member

I think the problem was that the email on our account belonged to an employee at Rackspace. I've added a new user whose email is build@iojs.org, so those with access to the Rackspace UI should now get emails. It would be good to discuss in the next build meeting whether it makes sense to extend notifications beyond that group.

@mhdawson
Member

We now show as "Pending Customer", so it looks to me like the evaluation of our status is complete and looking good :)

@phillipj
Member

@maclover7 this is resolved now, right? curl-ing the bot logs works, and the log contents look like it's running as expected.

@maclover7
Contributor Author

@phillipj yep, this is resolved. Keeping it open to try and discuss the Rackspace issues during the next working group meeting.

@rvagg
Member

rvagg commented Dec 19, 2017

FYI, one of the problems with Rackspace is that most of the Node people there who had some involvement have moved on. They were one of our first infra donors and got involved in the node-forward efforts (pre-dating io.js), but the key individual who did that has moved on. Now we're dangling without a champion, which is a concern. I've reached out to a remaining individual there who should be able to give us firmer standing and a bit more assurance of their long-term support, but I haven't heard back yet.

We still have two machines locked in a paused state that we can't override (Michael did ask them about this but we haven't got a response yet). They are release machines, used only for Node v4 I think. I took a snapshot of one and made a new server based on the image and it's back in ci-release so we have coverage there. If I can't get the other out of this paused state soon then I'll do the same for it too.

We do use Rackspace infra pretty heavily, they're easily on par with DigitalOcean in terms of the spend we'd have with them if we were paying, mainly because we get Windows there. We don't have clear guidance from them on how much we can use but we should try and be as sensitive to spend as we can be.

@maclover7
Contributor Author

maclover7 commented Dec 19, 2017

@rvagg Ah, ok -- thank you for the additional information. Would you be willing to open up a new issue with the concerns about Rackspace / managing that relationship? It sounds like that's more what would need to be on the agenda.

EDIT: Maybe the nodejs/build-private repo is a better spot for this, if there are sensitive details?

@rvagg
Member

rvagg commented Dec 19, 2017

Yeah, build-private it is; tbh I'd forgotten that existed: https://github.com/nodejs/build-private/issues/8

@mhdawson
Member

mhdawson commented Feb 6, 2018

I believe the last thing is to make sure we have individual emails. Another issue is open to cover that, and @maclover7 will add a link. Closing.

@mhdawson mhdawson closed this as completed Feb 6, 2018
action items automation moved this from Incidents to done Feb 6, 2018
4 participants