Open edX devstack and Ocim sandbox issues

I’m confused… on the “periodic build master” instance, the last 615 builds have failed, and the most recent successful build was on 2019-12-05 (appserver 95).

I expected based on the above comment that recent instances with those settings ^ would be building successfully. Am I missing something?

I need to figure out what to change to deploy a new instance of LabXchange CI btw, which is why I’m asking.

That is the idea, yes. The ‘periodic build master’ instance doesn’t have those settings; that is intentionally running on master with vanilla “should work if master isn’t broken” config. See SE-1544 for context.

Oh I see, I was confusing the watched fork with the periodic build.

The problem with that is that once one bug is introduced on master, even if we find a workaround for it, dozens more bugs could arrive on master that we’ll never know about until edX fixes the first bug, or until someone happens to open a master PR or deploy a master instance using our known workaround and hits the new bug. So perhaps we also need a “known working master” watched fork with settings that include our temporary workarounds?

Yes please, this would be good. :+1: We can have the existing ‘periodic build master’ which is vanilla - if that is green, then everything is back to normal, and a ‘periodic keep-this-green master’ which can be our reference master instance that we need to keep working. I can create a ticket to create this if you agree? CC @antoviaque
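
To make the two-instance idea concrete, here is a minimal sketch of how the two build results could be read together. The names `Diagnosis` and `diagnose` are hypothetical and are not part of Ocim; the sketch only encodes the reasoning from the posts above.

```python
from enum import Enum


class Diagnosis(Enum):
    # Vanilla master builds: "everything is back to normal".
    ALL_CLEAR = "vanilla master builds without workarounds"
    # Vanilla is red, but the reference instance with workarounds is green.
    KNOWN_ISSUE = "master is broken, our workarounds still cover it"
    # Both are red: something new that the workarounds do not cover.
    NEW_BREAKAGE = "failure not covered by our current workarounds"


def diagnose(vanilla_green: bool, reference_green: bool) -> Diagnosis:
    """Combine the status of the vanilla and the keep-this-green builds."""
    if vanilla_green:
        # Per the thread, a green vanilla build is treated as all clear.
        return Diagnosis.ALL_CLEAR
    if reference_green:
        return Diagnosis.KNOWN_ISSUE
    return Diagnosis.NEW_BREAKAGE


print(diagnose(vanilla_green=False, reference_green=False).value)
```

The useful case is the last one: with only the vanilla instance, a new bug arriving while an older one is still unfixed would look identical to the old failure.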

Also while I remember, would it be worth reducing the build frequency for the periodic builds? Maybe to once a day? once every 2-3 days? This hasn’t proven to be something time sensitive.

Yes please, unless anyone objects.

It’s definitely too frequent now I think - the Ocim UI is just not usable when there are 715+ appservers in one instance, and it’s only going to get worse. I’d say 1-2x per day for the vanilla build and once/day for the “known working” build would be fine?

Just linking: there’s a discussion about VM cleanup of periodic builds on this ticket.

I decreased the frequency of the periodic build master instance to 12h (twice a day) as suggested.

@braden sorry, I missed your comment to go ahead with creating a ticket to make a reference working periodic builds master. Since then though, it looks like work has been done on our existing ‘vanilla’ instance (periodic build master) - it now has working builds and some custom config. Do you have context around this? Do we still want two periodic build masters (vanilla and reference working)?

I don’t know anything about those changes, no. Sounds like we need more input from the team. I’m still in favor of having both a vanilla and a reference working deployment.

@anon46505572, @braden, I don’t have any context about other custom config, but I helped fix CRI-206 which @pooja had reported and was blocking the Juniper release. So the custom edxapp_media_dir override which we had to add for fixing the issue has been removed now.

Periodic builds for upstream master are currently failing at the following step (cf. AppServer 798):

TASK [certs : Install python requirements] ************************************************************************************************************************************************************************
fatal: [149.202.166.165]: FAILED! => {"changed": false, "cmd": ["/edx/app/certs/venvs/certs/bin/pip2", "install", "-i", "https://pypi.python.org/simple", "-r", "/edx/app/certs/certificates/requirements/base.txt"], "msg": "stdout: New python executable in /edx/app/certs/venvs/certs/bin/python\nInstalling setuptools, pip, wheel...\ndone.\nLooking in indexes: https://pypi.python.org/simple\nRequirement already satisfied: argparse==1.2.1 in /usr/lib/python2.7 (from -r /edx/app/certs/certificates/requirements/base.txt (line 7)) (1.2.1)\nCollecting boto==2.39.0\n  Downloading boto-2.39.0-py2.py3-none-any.whl (1.3 MB)\nCollecting certifi==2020.4.5.2\n  Downloading certifi-2020.4.5.2-py2.py3-none-any.whl (157 kB)\nCollecting chardet==3.0.4\n  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)\nCollecting edx-opaque-keys==2.1.0\n  Downloading edx-opaque-keys-2.1.0.tar.gz (61 kB)\nCollecting gnupg==2.3.1\n  Downloading gnupg-2.3.1.tar.gz (100 kB)\nCollecting idna==2.8\n  Downloading idna-2.8-py2.py3-none-any.whl (58 kB)\nCollecting nose==1.2.1\n  Downloading nose-1.2.1.tar.gz (400 kB)\nCollecting path.py==2.4.1\n  Downloading path.py-2.4.1.zip (15 kB)\nCollecting pbr==5.4.5\n  Downloading pbr-5.4.5-py2.py3-none-any.whl (110 kB)\n\n:stderr: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. 
More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support\nERROR: Could not find a version that satisfies the requirement pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17)) (from versions: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 2.0.0, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.7.0, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 3.0.0, 3.1.0rc1, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1, 3.4.2, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.2.1, 4.3.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 5.4.1, 6.0.0, 6.1.0, 6.2.0, 6.2.1, 6.2.2)\nERROR: No matching distribution found for pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17))\n"}

Sanitized stack trace:

{
...
"cmd": ["/edx/app/certs/venvs/certs/bin/pip2", "install", "-i", "https://pypi.python.org/simple", "-r", "/edx/app/certs/certificates/requirements/base.txt"],
"msg":
"stdout: New python executable in /edx/app/certs/venvs/certs/bin/python
Installing setuptools, pip, wheel...
done.
Looking in indexes: https://pypi.python.org/simple
Requirement already satisfied: argparse==1.2.1 in /usr/lib/python2.7 (from -r /edx/app/certs/certificates/requirements/base.txt (line 7)) (1.2.1)
Collecting boto==2.39.0
  Downloading boto-2.39.0-py2.py3-none-any.whl (1.3 MB)
Collecting certifi==2020.4.5.2
  Downloading certifi-2020.4.5.2-py2.py3-none-any.whl (157 kB)
Collecting chardet==3.0.4
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting edx-opaque-keys==2.1.0
  Downloading edx-opaque-keys-2.1.0.tar.gz (61 kB)
Collecting gnupg==2.3.1
  Downloading gnupg-2.3.1.tar.gz (100 kB)
Collecting idna==2.8
  Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting nose==1.2.1
  Downloading nose-1.2.1.tar.gz (400 kB)
Collecting path.py==2.4.1
  Downloading path.py-2.4.1.zip (15 kB)
Collecting pbr==5.4.5
  Downloading pbr-5.4.5-py2.py3-none-any.whl (110 kB)

...
ERROR: Could not find a version that satisfies the requirement pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17)) (from versions: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 2.0.0, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.7.0, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 3.0.0, 3.1.0rc1, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1, 3.4.2, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.2.1, 4.3.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 5.4.1, 6.0.0, 6.1.0, 6.2.0, 6.2.1, 6.2.2)
ERROR: No matching distribution found for pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17))"}

@tikr Is that something to report upstream, now that we have a process in place with Ned?

What’s the process? (Sorry, I missed that discussion…)

It looks like edx-certificates has python3.8 support, but the requirements are still being installed by ansible’s old pip 2 (by comparison, other services (and edxapp) use the pip 3 from their virtualenv instead).
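
As far as I know, Pillow dropped Python 2.7 support in 7.0, which would explain why a pip running under Python 2 only offers versions up to 6.2.2 in the log above. The following is a rough sketch for pulling the relevant facts out of such a pip error; `summarize_pip_failure` is a hypothetical helper, and it assumes pip lists the available versions in ascending order, as it does in the log.

```python
import re

# Matches pip's "Could not find a version" error for a pinned requirement.
PIP_ERROR = re.compile(
    r"Could not find a version that satisfies the requirement "
    r"(?P<name>[A-Za-z0-9_.\-]+)==(?P<wanted>\S+)"
    r".*?\(from versions: (?P<versions>[^)]*)\)"
)


def summarize_pip_failure(output):
    """Return the pinned package, the wanted version and the newest one offered."""
    match = PIP_ERROR.search(output)
    if match is None:
        return None
    versions = [v.strip() for v in match.group("versions").split(",") if v.strip()]
    return {
        "package": match.group("name"),
        "wanted": match.group("wanted"),
        # Assumption: the last listed version is the newest one pip could see.
        "newest_offered": versions[-1] if versions else None,
    }


sample = (
    "ERROR: Could not find a version that satisfies the requirement "
    "pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt "
    "(line 17)) (from versions: 1.0, 6.2.1, 6.2.2)"
)
print(summarize_pip_failure(sample))
```

A wanted version that is newer than the newest one offered is a hint that the interpreter, not the package index, is the problem.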

There’s a task about this issue, SE-2810.
It affects master, not client instances.

If we clarify the new process to report upstream I can continue it as firefighter.

Yep, we should do that, I just ran out of time yesterday :+1:

The process is simply to report it as a CRI issue (and ping Ned). See CRI-206 for an example (reported via SE-2587).

Any knowledge we might already have gained about the issue should be included in the ticket description.

@daniel If you could take care of this as SF, that would be great :slightly_smiling_face:

I just created CRI-215 and pinged Ned. I’ll continue at Log in - OpenCraft

Determined a workaround for this issue, and so have updated Ocim’s watched fork configuration for new PR sandboxes to add the following:

# Update 2020-06-23, issues with certs package using pip2
# SE-2810, CRI-215
CERTS_VERSION: open-release/juniper.1
certs_version: open-release/juniper.1

I went to update and redeploy the failed PR sandboxes we have on Ocim and found there’s 22(!) of them that have never had a running appserver, so didn’t bother. This means that we either don’t need these sandboxes to demonstrate the PR, or that we haven’t finished prepping our sandboxes for our OSPRs?

In addition to adding these settings, I also had to use the upstream version of the configuration repo to get my PR sandbox to build. Should we replace the open-craft/configuration fork with the upstream master version of configuration in the watched fork configuration, or should we update our fork instead?

Good to know, thanks @mtyaka! I’ve updated the watched fork to use upstream master… it was really flaky for a while, so if it becomes that way again, we can update our branch and go back to using that.

@serenity @bebop A quick heads-up that the process for reporting and following up on periodic build failures has now been formalized in the handbook. See the updates to the ops reviewer and firefighter roles from https://gitlab.com/opencraft/documentation/public/-/merge_requests/174 for details.

Periodic builds for master are currently failing with the following error:

TASK [forum : initialize elasticsearch] *****************************************************************************************************************************************
fatal: [149.202.187.64]: FAILED! => {"changed": true, "cmd": ["/edx/app/forum/cs_comments_service/bin/rake", "search:initialize"], "delta": "0:00:04.117051", "end": "2020-10-25 21:57:34.776740", "msg": "non-zero return code", "rc": 1, "start": "2020-10-25 21:57:30.659689", "stderr": "/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: already initialized constant ROOT\n/edx/app/forum/cs_comments_service/lib/tasks/kpis.rake:7: warning: previous definition of ROOT was here\n/edx/app/forum/cs_comments_service/models/constants.rb:2: warning: already initialized constant COURSE_ID\n/edx/app/forum/cs_comments_service/lib/tasks/db.rake:28: warning: previous definition of COURSE_ID was here\n/edx/app/forum/cs_comments_service/lib/tasks/flags.rake:6: warning: already initialized constant ROOT\n/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: previous definition of ROOT was here\nrake aborted!\nElasticsearch::Transport::Transport::Errors::InternalServerError: [500] {\"error\":\"ClassCastException[java.lang.String cannot be cast to java.util.Map]\",\"status\":500}\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:346:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/http/faraday.rb:37:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/client.rb:176:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/namespace/common.rb:38:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/actions/indices/create.rb:48:in `create'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:92:in `block in 
create_indices'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `each'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `create_indices'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:198:in `initialize_indices'\n/edx/app/forum/cs_comments_service/lib/tasks/search.rake:30:in `block (2 levels) in <top (required)>'\n/edx/app/forum/.gem/ruby/2.5.0/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'\nTasks: TOP => search:initialize\n(See full trace by running task with --trace)", "stderr_lines": ["/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: already initialized constant ROOT", "/edx/app/forum/cs_comments_service/lib/tasks/kpis.rake:7: warning: previous definition of ROOT was here", "/edx/app/forum/cs_comments_service/models/constants.rb:2: warning: already initialized constant COURSE_ID", "/edx/app/forum/cs_comments_service/lib/tasks/db.rake:28: warning: previous definition of COURSE_ID was here", "/edx/app/forum/cs_comments_service/lib/tasks/flags.rake:6: warning: already initialized constant ROOT", "/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: previous definition of ROOT was here", "rake aborted!", "Elasticsearch::Transport::Transport::Errors::InternalServerError: [500] {\"error\":\"ClassCastException[java.lang.String cannot be cast to java.util.Map]\",\"status\":500}", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:346:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/http/faraday.rb:37:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/client.rb:176:in `perform_request'", 
"/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/namespace/common.rb:38:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/actions/indices/create.rb:48:in `create'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:92:in `block in create_indices'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `each'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `create_indices'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:198:in `initialize_indices'", "/edx/app/forum/cs_comments_service/lib/tasks/search.rake:30:in `block (2 levels) in <top (required)>'", "/edx/app/forum/.gem/ruby/2.5.0/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'", "Tasks: TOP => search:initialize", "(See full trace by running task with --trace)"], "stdout": "W, [2020-10-25T21:57:34.341082 #28234]  WARN -- : Overwriting existing field _id in class User.\nW, [2020-10-25T21:57:34.390196 #28234]  WARN -- : MONGODB | Unsupported client option 'max_retries'. It will be ignored.\nW, [2020-10-25T21:57:34.390303 #28234]  WARN -- : MONGODB | Unsupported client option 'retry_interval'. It will be ignored.\nW, [2020-10-25T21:57:34.390326 #28234]  WARN -- : MONGODB | Unsupported client option 'timeout'. It will be ignored.", "stdout_lines": ["W, [2020-10-25T21:57:34.341082 #28234]  WARN -- : Overwriting existing field _id in class User.", "W, [2020-10-25T21:57:34.390196 #28234]  WARN -- : MONGODB | Unsupported client option 'max_retries'. It will be ignored.", "W, [2020-10-25T21:57:34.390303 #28234]  WARN -- : MONGODB | Unsupported client option 'retry_interval'. It will be ignored.", "W, [2020-10-25T21:57:34.390326 #28234]  WARN -- : MONGODB | Unsupported client option 'timeout'. It will be ignored."]}
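
Since the payload above is hard to scan, here is a small sketch for extracting the line that follows `rake aborted!` from an Ansible failure line. The helper names are hypothetical; the sketch assumes the text after `FAILED! => ` is valid JSON with a `stderr_lines` list, as in the log above.

```python
import json


def parse_failure(line):
    """Parse the JSON payload of a 'fatal: [...]: FAILED! => {...}' line."""
    _, found, body = line.partition("FAILED! => ")
    # Accept a bare JSON payload as well, when the prefix is missing.
    return json.loads(body if found else line)


def error_after_rake_abort(line):
    """Return the stderr line right after 'rake aborted!', or None."""
    lines = parse_failure(line).get("stderr_lines", [])
    for index, text in enumerate(lines):
        if text.strip() == "rake aborted!" and index + 1 < len(lines):
            return lines[index + 1]
    return None


# Shortened stand-in for the real log line, built here for illustration.
payload = {
    "rc": 1,
    "stderr_lines": [
        "warning: already initialized constant ROOT",
        "rake aborted!",
        "Elasticsearch::Transport::Transport::Errors::InternalServerError: [500]",
        "Tasks: TOP => search:initialize",
    ],
}
sample_line = "fatal: [149.202.187.64]: FAILED! => " + json.dumps(payload)
print(error_after_rake_abort(sample_line))
```

Run against the real line, this should surface the `InternalServerError` with the `ClassCastException` message, which is the part worth quoting in the upstream report.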

@Agrendalath @pooja @demid @usman Could you please follow up on this as SFs for the current Sprint 232 (following the process mentioned above)?