Open edX devstack and Ocim sandbox issues

@tikr, unfortunately I don’t have too much free FF capacity at the moment. I can take a look at this on Friday if nobody else has time before then.

@tikr Since we now have a public ML where the build logs are announced, would it make sense to post messages like this there, with the firefighters in CC, so that people outside of OpenCraft know about the detected (and manually confirmed) breakage? I discussed this with Ned during last week’s contributors meetup, and he mentioned that he would need help confirming the build errors that arrive there, and that he (and other people in the community) would need help understanding which errors to investigate. This way it doesn’t have to be just us investigating, and other community members could also react if they see a similar issue on their side.

@tikr I suspect it has to do with: https://github.com/edx/cs_comments_service/pull/327. I’ll investigate it further tomorrow.

@antoviaque Sure, posting these updates to the public ML sounds good :slight_smile: How can I join it?

A few additional notes/questions:

  • To make sure that everyone on the team is in the loop about these breakages, would it make sense to still post them here as well (in the form of a link to the corresponding ML posts, perhaps)?
  • Re: helping Ned confirm build errors and understand which errors to investigate: What info would be good to include in the ML posts to address this? (The Trello card that you linked to didn’t mention this and I didn’t get a chance to watch the recording of the meeting this week.)
  • Once we’re clear on the items above, I’ll have to adjust the changes from https://gitlab.com/opencraft/documentation/public/-/merge_requests/174 so that they match the new process.

Thanks @usman for investigating this one :raised_hands:

When there’s a task for this incident please post it here…

@tikr See:

Sure - maybe just link to the public place internally then, asking to comment/collaborate with everyone on the public thread rather than just between us internally.

The best would likely be to ask Ned, though there is already a PR to get a lot of that info directly into the automated email. You can probably start with what you would want to have access to in order to debug/reproduce the issue.

Note that Diana is working on a PR to resolve the issue that is causing the periodic master builds to fail: https://github.com/edx/configuration/pull/6093.

Tracking this internally in SE-3586.

A new error in periodic builds: https://manage.opencraft.com/instance/17120/edx-appserver/15241/

2020-11-09 08:36:39+0200 INFO
TASK [nginx : Write out htpasswd file] *********************************************
2020-11-09 08:36:39+0200 INFO
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named 'passlib'
2020-11-09 08:36:39+0200 ERROR
away and you might need to add |bool to the expression in the future. Also see

Passlib is mentioned somewhere in our playbooks, and it may be where we set the HTTP basic auth password for the LB←→edxapp communication.
Can the firefighters take a look and create a task if needed? @mtyaka @guruprasad @jill @giovannicimolin
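
For whoever picks this up: Ansible's htpasswd module needs the passlib Python library on the host where the task runs, which would match the ImportError above. A minimal sketch for checking whether a given interpreter can import it (the helper name is hypothetical, and this assumes you run it with the same Python that Ansible uses on that host):

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    status = "installed" if module_available("passlib") else "missing"
    print(f"passlib is {status} for this interpreter")
```

If it reports "missing", the fix is probably a matter of installing passlib for that interpreter, but that is an assumption until someone confirms where the task actually runs.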

@daniel Thanks! Could you do this though:

We should work on those issues with the community and edX, i.e. publicly, so we can eventually share some of that maintenance burden with them.

This issue may not be in Open edX but in our playbooks. I’m not sure and can’t debug it right now, so I’ll let a firefighter decide and create a task upstream (maybe a CRI- task… or a forum post…).

Yep, I’ll take a look.

If that’s the case, simply mention that it might be an OpenCraft-specific issue, but that doesn’t remove the need to work publicly on this.


To all – please don’t post any more messages or discuss directly in this thread - instead, use the public mailing list, replying on the thread of the failure report if you’re posting about a specific failure, and only link to important messages here for awareness.


Understood.

There isn’t a message on the mailing list yet for this periodic master build failure, apparently because (according to the logs) the build was initiated by an OpenCraft team member. So I’ve temporarily modified the periodic build instance’s build frequency to automatically trigger another build soon, and will comment on the mailing list when the failure is posted there.

Ticket is SE-3613.

Ocim PR sandboxes are failing to provision, e.g. PR#25965: [BD-04], PR#25955: [BD-04], PR#23109: SE-2010.

2021-01-13 14:25:54+1030 INFO
TASK [supervisor : Install supervisor in its venv] **************************************************************************************************************************************************************
2021-01-13 14:25:54+1030 INFO
fatal: [149.202.183.65]: FAILED! => {"changed": false, "msg": "Could not get output from /usr/local/bin/virtualenv --help: AttributeError: module 'os' has no attribute 'PathLike'\n"}

It feels like an Ansible/Python version incompatibility… SO ref. I’m trying to resolve it by making my sandbox config look like that of the working PR#25978: [SE-3440], which used the latest watched fork config. Specifically, I’ve updated:

  • Ansible appserver version: ansible2.8.17 (was master)
  • Openstack server base image: {"name_or_id": "focal-20.04-unmodified"} (was {"name_or_id":"xenial-16.04-unmodified"})
  • Removed the pinned FORUM_VERSION variables, so using latest master.

These changes worked, along with configuration#6244, which @sid recently merged to master. :tada:
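
Some background on the error above: `os.PathLike` was only added in Python 3.6, and Ubuntu 16.04 (xenial) ships Python 3.5 by default, which would be consistent with the switch to the focal base image helping. A minimal sketch for checking an interpreter (the function name is hypothetical, and this was not part of the actual fix):

```python
import os
import sys


def supports_pathlike(version_info=sys.version_info):
    """Return True if this Python version should provide os.PathLike (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("os.PathLike available:", hasattr(os, "PathLike"))
    print("Expected from version:", supports_pathlike())
```

Running this with the appserver's system Python should show whether a recent virtualenv release can work there at all.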

I neglected to remove this from the watched fork config, so new sandboxes were still failing. Fixed now.

Since edx-platform#27757 was merged last month to enable the Courseware MFE by default in master, the jump_to links that the Course Outline and the Studio course and unit pages use to jump to specific units have stopped working on Ocim master sandboxes.

To work around this issue, you can disable the Courseware MFE using a waffle flag:

  1. Log in as a superuser to the Django Admin.
  2. Locate the Waffle > Flags section, and add a new flag.
  3. Set name = courseware.use_legacy_frontend, and select Everyone: Yes before saving.

This feature is off by default in Lilac, and the BTR working group is working on getting it to run in Tutor for Maple, cf. build-test-release-wg/issues/81.

Open edX master now enables these MFEs by default:

So to keep our PR sandboxes working, I’ve added the following config to Ocim’s open-craft watched fork:

## MFEs
# Deploy MFEs to the same host as the LMS/Studio
MFE_DEPLOY_STANDALONE_NGINX: false
# Note: borrowing the existing `discovery` subdomain here, to avoid having to make a code change to add a new one for MFEs.
MFE_BASE: discovery.{{ EDXAPP_LMS_BASE }}
MFE_DEPLOY_NGINX_PORT: 80
MFE_DEPLOY_COMMON_HOSTNAME: '{{ MFE_BASE }}'
MFES:
  - name: profile
    repo: frontend-app-profile
    public_path: "/profile/"
  - name: gradebook
    repo: frontend-app-gradebook
    public_path: "/gradebook/"
  - name: account
    repo: frontend-app-account
    public_path: "/account/"
  - name: learning
    repo: frontend-app-learning
    public_path: "/learning/"
## edxapp Configurations
EDXAPP_SESSION_COOKIE_DOMAIN: ".{{ EDXAPP_LMS_BASE }}"
EDXAPP_CSRF_COOKIE_SECURE: true
EDXAPP_SESSION_COOKIE_SECURE: true
EDXAPP_ENABLE_CORS_HEADERS: true
EDXAPP_ENABLE_CROSS_DOMAIN_CSRF_COOKIE: true
EDXAPP_CROSS_DOMAIN_CSRF_COOKIE_DOMAIN: ".{{ EDXAPP_LMS_BASE }}"
EDXAPP_CROSS_DOMAIN_CSRF_COOKIE_NAME: "cross-domain-cookie-mfe"
EDXAPP_CORS_ORIGIN_WHITELIST:
  - "{{ EDXAPP_CMS_BASE }}"
  - "{{ MFE_BASE }}"
EDXAPP_CSRF_TRUSTED_ORIGINS:
  - "{{ MFE_BASE }}"
EDXAPP_LOGIN_REDIRECT_WHITELIST:
  - "{{ EDXAPP_CMS_BASE }}"
  - "{{ MFE_BASE }}"
EDXAPP_SITE_CONFIGURATION:
  - values:
      ENABLE_ORDER_HISTORY_MICROFRONTEND: "{{ SANDBOX_ENABLE_ECOMMERCE }}"
## MFE Links
EDXAPP_LMS_WRITABLE_GRADEBOOK_URL: 'https://{{ MFE_BASE }}/gradebook/'
EDXAPP_LEARNING_MICROFRONTEND_URL: 'https://{{ MFE_BASE }}/learning'
EDXAPP_PROFILE_MICROFRONTEND_URL: 'https://{{ MFE_BASE }}/profile/u'
EDXAPP_ACCOUNT_MICROFRONTEND_URL: 'https://{{ MFE_BASE }}/account'
EDXAPP_ORDER_HISTORY_MICROFRONTEND_URL: 'https://{{ MFE_BASE }}/ecommerce/orders'

Note that I’ve borrowed the discovery extra subdomain for the MFE root, but it would be better to add an “Extra custom domains” entry to Ocim for this purpose.

If you want to use the legacy courseware frontend, you can set this waffle flag in Django Admin (until it gets removed from the platform too): courseware.use_legacy_frontend.

And there’s a bunch of gradebook-related waffle flags too.

It’s great to see the learning MFE getting enabled by default. I think it’s a big improvement over the old courseware frontend.

I added an app.* subdomain to the “Extra custom domains” field for my MFE sandbox. Since Lilac already has some MFEs enabled by default, I think it makes sense to add a dedicated subdomain for MFEs to Ocim. I’ll add a task to the Ocim Tech Debt & Bugs epic to add an “app.*” subdomain to the list of automatically managed domains.

The MFE role from edx/configuration has a task for setting up waffle flags automatically. By default it currently only sets these two, but you can add more to the list to avoid having to deal with waffle flags manually.

Thanks for your quick reply @mtyaka! But I think @anon46505572 already jumped on this for the Lilac upgrade, see FAL-2259.

FYI, due to the breaking change 36df86d that was recently merged to master, we now have to prefix our EDXAPP_CORS_ORIGIN_WHITELIST hostnames with the URI scheme. I’ve updated the above config in Ocim’s open-craft watched fork to be:

EDXAPP_CORS_ORIGIN_WHITELIST:
  - "https://{{ EDXAPP_CMS_BASE }}"
  - "https://{{ MFE_BASE }}"

But you’ll need to do this manually on your PR sandboxes if you’re having trouble viewing the courseware or other MFE pages.
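
If you have several sandboxes to update, the change amounts to prefixing each bare hostname with a scheme. A minimal sketch of that transformation, where the helper names and the example hostnames are hypothetical (nothing here is part of edx/configuration or Ocim):

```python
def with_scheme(origin, scheme="https"):
    """Prefix a bare hostname with a URI scheme; leave full origins untouched."""
    origin = origin.strip()
    if "://" in origin:
        return origin
    return f"{scheme}://{origin}"


def normalize_origins(origins, scheme="https"):
    """Apply with_scheme to every entry, preserving order."""
    return [with_scheme(origin, scheme) for origin in origins]


if __name__ == "__main__":
    # Hypothetical sandbox hostnames, for illustration only.
    old_whitelist = ["studio.sandbox.example.com", "discovery.sandbox.example.com"]
    for origin in normalize_origins(old_whitelist):
        print(f'  - "{origin}"')
```

Entries that already include a scheme are left alone, so it should be safe to run over a list that has been partly updated.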

cf. Mattermost discussion.