Open edX devstack and Ocim sandbox issues

@anon46505572, @braden, I don’t have any context about other custom config, but I helped fix CRI-206, which @pooja had reported and which was blocking the Juniper release. The custom edxapp_media_dir override that we had to add to fix the issue has now been removed.

Periodic builds for upstream master are currently failing at the following step (cf. AppServer 798):

TASK [certs : Install python requirements] ************************************************************************************************************************************************************************
fatal: [149.202.166.165]: FAILED! => {"changed": false, "cmd": ["/edx/app/certs/venvs/certs/bin/pip2", "install", "-i", "https://pypi.python.org/simple", "-r", "/edx/app/certs/certificates/requirements/base.txt"], "msg": "stdout: New python executable in /edx/app/certs/venvs/certs/bin/python\nInstalling setuptools, pip, wheel...\ndone.\nLooking in indexes: https://pypi.python.org/simple\nRequirement already satisfied: argparse==1.2.1 in /usr/lib/python2.7 (from -r /edx/app/certs/certificates/requirements/base.txt (line 7)) (1.2.1)\nCollecting boto==2.39.0\n  Downloading boto-2.39.0-py2.py3-none-any.whl (1.3 MB)\nCollecting certifi==2020.4.5.2\n  Downloading certifi-2020.4.5.2-py2.py3-none-any.whl (157 kB)\nCollecting chardet==3.0.4\n  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)\nCollecting edx-opaque-keys==2.1.0\n  Downloading edx-opaque-keys-2.1.0.tar.gz (61 kB)\nCollecting gnupg==2.3.1\n  Downloading gnupg-2.3.1.tar.gz (100 kB)\nCollecting idna==2.8\n  Downloading idna-2.8-py2.py3-none-any.whl (58 kB)\nCollecting nose==1.2.1\n  Downloading nose-1.2.1.tar.gz (400 kB)\nCollecting path.py==2.4.1\n  Downloading path.py-2.4.1.zip (15 kB)\nCollecting pbr==5.4.5\n  Downloading pbr-5.4.5-py2.py3-none-any.whl (110 kB)\n\n:stderr: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. 
More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support\nERROR: Could not find a version that satisfies the requirement pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17)) (from versions: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 2.0.0, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.7.0, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 3.0.0, 3.1.0rc1, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1, 3.4.2, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.2.1, 4.3.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 5.4.1, 6.0.0, 6.1.0, 6.2.0, 6.2.1, 6.2.2)\nERROR: No matching distribution found for pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17))\n"}

Sanitized stack trace:

{
...
"cmd": ["/edx/app/certs/venvs/certs/bin/pip2", "install", "-i", "https://pypi.python.org/simple", "-r", "/edx/app/certs/certificates/requirements/base.txt"],
"msg":
"stdout: New python executable in /edx/app/certs/venvs/certs/bin/python
Installing setuptools, pip, wheel...
done.
Looking in indexes: https://pypi.python.org/simple
Requirement already satisfied: argparse==1.2.1 in /usr/lib/python2.7 (from -r /edx/app/certs/certificates/requirements/base.txt (line 7)) (1.2.1)
Collecting boto==2.39.0
  Downloading boto-2.39.0-py2.py3-none-any.whl (1.3 MB)
Collecting certifi==2020.4.5.2
  Downloading certifi-2020.4.5.2-py2.py3-none-any.whl (157 kB)
Collecting chardet==3.0.4
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting edx-opaque-keys==2.1.0
  Downloading edx-opaque-keys-2.1.0.tar.gz (61 kB)
Collecting gnupg==2.3.1
  Downloading gnupg-2.3.1.tar.gz (100 kB)
Collecting idna==2.8
  Downloading idna-2.8-py2.py3-none-any.whl (58 kB)
Collecting nose==1.2.1
  Downloading nose-1.2.1.tar.gz (400 kB)
Collecting path.py==2.4.1
  Downloading path.py-2.4.1.zip (15 kB)
Collecting pbr==5.4.5
  Downloading pbr-5.4.5-py2.py3-none-any.whl (110 kB)

...
ERROR: Could not find a version that satisfies the requirement pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17)) (from versions: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.7.8, 2.0.0, 2.1.0, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.7.0, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 3.0.0, 3.1.0rc1, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.4.0, 3.4.1, 3.4.2, 4.0.0, 4.1.0, 4.1.1, 4.2.0, 4.2.1, 4.3.0, 5.0.0, 5.1.0, 5.2.0, 5.3.0, 5.4.0, 5.4.1, 6.0.0, 6.1.0, 6.2.0, 6.2.1, 6.2.2)
ERROR: No matching distribution found for pillow==7.1.2 (from -r /edx/app/certs/certificates/requirements/base.txt (line 17))"}

@tikr Is that something to report upstream, now that we have a process in place with Ned?

What’s the process? (Sorry, I missed that discussion…)

It looks like edx-certificates has Python 3.8 support, but its requirements are still being installed by Ansible’s old pip 2. By comparison, other services (including edxapp) use the pip 3 from their virtualenv instead.
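To double-check that diagnosis from the log alone, the pip error can be parsed to compare the requested pin with the versions pip actually offered. This is a minimal sketch, assuming the error line format shown in the log above; the function name and the shortened sample line are made up for illustration. pip only lists releases compatible with the interpreter it runs under, which would be consistent with the list stopping at 6.2.2 under Python 2.7.

```python
import re

def parse_pip_no_version_error(line):
    """Return (package, requested_version, offered_versions) parsed from a pip
    'Could not find a version' error line, or None if the line does not match."""
    match = re.search(
        r"satisfies the requirement (?P<name>[A-Za-z0-9_.\-]+)==(?P<wanted>\S+)"
        r".*\(from versions: (?P<versions>[^)]*)\)",
        line,
    )
    if match is None:
        return None
    versions = [v.strip() for v in match.group("versions").split(",") if v.strip()]
    return match.group("name"), match.group("wanted"), versions

# Shortened copy of the error from the log (version list truncated for brevity).
SAMPLE_ERROR = (
    "ERROR: Could not find a version that satisfies the requirement pillow==7.1.2 "
    "(from -r /edx/app/certs/certificates/requirements/base.txt (line 17)) "
    "(from versions: 6.2.0, 6.2.1, 6.2.2)"
)

name, wanted, offered = parse_pip_no_version_error(SAMPLE_ERROR)
print(f"{name}: pinned to {wanted}, pinned version offered: {wanted in offered}")
print(f"last version listed by pip: {offered[-1]}")
```

If the pinned version is missing from the list while it exists on PyPI, the interpreter behind that pip is the first thing to check.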

There’s a task about this issue, SE-2810.
It affects master, not client instances.

If we clarify the new process to report upstream I can continue it as firefighter.

Yep, we should do that, I just ran out of time yesterday :+1:

The process is simply to report it as a CRI issue (and ping Ned). See CRI-206 for an example (reported via SE-2587).

Any knowledge we might already have gained about the issue should be included in the ticket description.

@daniel If you could take care of this as SF, that would be great :slightly_smiling_face:

I just created CRI-215 in JIRA and pinged Ned. I’ll continue the follow-up in the OpenCraft tracker (login required).

I determined a workaround for this issue, and have updated Ocim’s watched fork configuration for new PR sandboxes to add the following:

# Update 2020-06-23, issues with certs package using pip2
# SE-2810, CRI-215
CERTS_VERSION: open-release/juniper.1
certs_version: open-release/juniper.1

I went to update and redeploy the failed PR sandboxes we have on Ocim and found there are 22(!) of them that have never had a running appserver, so I didn’t bother. Does this mean that we either don’t need these sandboxes to demonstrate the PR, or that we haven’t finished prepping our sandboxes for our OSPRs?

In addition to adding these settings, I also had to use the upstream version of the configuration repo to get my PR sandbox to build. Should we replace the open-craft/configuration fork with the upstream master version of configuration in the watched fork configuration, or should we update our fork instead?

Good to know, thanks @mtyaka! I’ve updated the watched fork to use upstream master… it was really flaky for a while, so if it becomes that way again, we can update our branch and go back to using that.

@serenity @bebop A quick heads-up that the process for reporting and following up on periodic build failures has now been formalized in the handbook. See the updates to the ops reviewer and firefighter roles from https://gitlab.com/opencraft/documentation/public/-/merge_requests/174 for details.

Periodic builds for master are currently failing with the following error:

TASK [forum : initialize elasticsearch] *****************************************************************************************************************************************
fatal: [149.202.187.64]: FAILED! => {"changed": true, "cmd": ["/edx/app/forum/cs_comments_service/bin/rake", "search:initialize"], "delta": "0:00:04.117051", "end": "2020-10-25 21:57:34.776740", "msg": "non-zero return code", "rc": 1, "start": "2020-10-25 21:57:30.659689", "stderr": "/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: already initialized constant ROOT\n/edx/app/forum/cs_comments_service/lib/tasks/kpis.rake:7: warning: previous definition of ROOT was here\n/edx/app/forum/cs_comments_service/models/constants.rb:2: warning: already initialized constant COURSE_ID\n/edx/app/forum/cs_comments_service/lib/tasks/db.rake:28: warning: previous definition of COURSE_ID was here\n/edx/app/forum/cs_comments_service/lib/tasks/flags.rake:6: warning: already initialized constant ROOT\n/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: previous definition of ROOT was here\nrake aborted!\nElasticsearch::Transport::Transport::Errors::InternalServerError: [500] {\"error\":\"ClassCastException[java.lang.String cannot be cast to java.util.Map]\",\"status\":500}\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:346:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/http/faraday.rb:37:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/client.rb:176:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/namespace/common.rb:38:in `perform_request'\n/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/actions/indices/create.rb:48:in `create'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:92:in `block in 
create_indices'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `each'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `create_indices'\n/edx/app/forum/cs_comments_service/lib/task_helpers.rb:198:in `initialize_indices'\n/edx/app/forum/cs_comments_service/lib/tasks/search.rake:30:in `block (2 levels) in <top (required)>'\n/edx/app/forum/.gem/ruby/2.5.0/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'\nTasks: TOP => search:initialize\n(See full trace by running task with --trace)", "stderr_lines": ["/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: already initialized constant ROOT", "/edx/app/forum/cs_comments_service/lib/tasks/kpis.rake:7: warning: previous definition of ROOT was here", "/edx/app/forum/cs_comments_service/models/constants.rb:2: warning: already initialized constant COURSE_ID", "/edx/app/forum/cs_comments_service/lib/tasks/db.rake:28: warning: previous definition of COURSE_ID was here", "/edx/app/forum/cs_comments_service/lib/tasks/flags.rake:6: warning: already initialized constant ROOT", "/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: previous definition of ROOT was here", "rake aborted!", "Elasticsearch::Transport::Transport::Errors::InternalServerError: [500] {\"error\":\"ClassCastException[java.lang.String cannot be cast to java.util.Map]\",\"status\":500}", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:218:in `__raise_transport_error'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/base.rb:346:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/transport/http/faraday.rb:37:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-transport-7.8.0/lib/elasticsearch/transport/client.rb:176:in `perform_request'", 
"/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/namespace/common.rb:38:in `perform_request'", "/edx/app/forum/.gem/ruby/2.5.0/gems/elasticsearch-api-7.8.0/lib/elasticsearch/api/actions/indices/create.rb:48:in `create'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:92:in `block in create_indices'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `each'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:89:in `create_indices'", "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:198:in `initialize_indices'", "/edx/app/forum/cs_comments_service/lib/tasks/search.rake:30:in `block (2 levels) in <top (required)>'", "/edx/app/forum/.gem/ruby/2.5.0/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'", "Tasks: TOP => search:initialize", "(See full trace by running task with --trace)"], "stdout": "W, [2020-10-25T21:57:34.341082 #28234]  WARN -- : Overwriting existing field _id in class User.\nW, [2020-10-25T21:57:34.390196 #28234]  WARN -- : MONGODB | Unsupported client option 'max_retries'. It will be ignored.\nW, [2020-10-25T21:57:34.390303 #28234]  WARN -- : MONGODB | Unsupported client option 'retry_interval'. It will be ignored.\nW, [2020-10-25T21:57:34.390326 #28234]  WARN -- : MONGODB | Unsupported client option 'timeout'. It will be ignored.", "stdout_lines": ["W, [2020-10-25T21:57:34.341082 #28234]  WARN -- : Overwriting existing field _id in class User.", "W, [2020-10-25T21:57:34.390196 #28234]  WARN -- : MONGODB | Unsupported client option 'max_retries'. It will be ignored.", "W, [2020-10-25T21:57:34.390303 #28234]  WARN -- : MONGODB | Unsupported client option 'retry_interval'. It will be ignored.", "W, [2020-10-25T21:57:34.390326 #28234]  WARN -- : MONGODB | Unsupported client option 'timeout'. It will be ignored."]}
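Since the raw Ansible output is hard to read, here is a minimal sketch of how the relevant lines could be pulled out of a failure like the one above. It assumes the whole `fatal: [...]: FAILED! => {...}` record has been joined back into a single line (it is wrapped in this post), and that the payload is valid JSON with a `stderr_lines` key as in the log. The function name, the placeholder IP and the shortened sample are hypothetical.

```python
import json

def summarize_ansible_failure(fatal_line):
    """Extract the command, return code and the stderr lines that are not
    file-path lines from an Ansible 'fatal: [...]: FAILED! => {...}' record."""
    _, marker, payload = fatal_line.partition("FAILED! => ")
    if not marker:
        raise ValueError("not an Ansible failure line")
    result = json.loads(payload)
    stderr_lines = result.get("stderr_lines") or result.get("stderr", "").splitlines()
    # Stack frames and Ruby warnings start with a file path; drop them.
    highlights = [line for line in stderr_lines if not line.startswith("/")]
    return {"cmd": result.get("cmd"), "rc": result.get("rc"), "highlights": highlights}

# Shortened sample built from the log above; the IP is a placeholder.
SAMPLE = "fatal: [192.0.2.1]: FAILED! => " + json.dumps({
    "cmd": ["/edx/app/forum/cs_comments_service/bin/rake", "search:initialize"],
    "rc": 1,
    "stderr_lines": [
        "/edx/app/forum/cs_comments_service/lib/tasks/deep_search.rake:7: warning: already initialized constant ROOT",
        "rake aborted!",
        'Elasticsearch::Transport::Transport::Errors::InternalServerError: [500] {"error":"ClassCastException[java.lang.String cannot be cast to java.util.Map]","status":500}',
        "/edx/app/forum/cs_comments_service/lib/task_helpers.rb:92:in `block in create_indices'",
    ],
})

summary = summarize_ansible_failure(SAMPLE)
print(summary["rc"], summary["cmd"])
for line in summary["highlights"]:
    print(line)
```

Filtering on a leading `/` is a crude heuristic that happens to fit this log; other tasks may need a different filter.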

@Agrendalath @pooja @demid @usman Could you please follow up on this as SFs for the current Sprint 232 (following the process mentioned above)?

@tikr, unfortunately I don’t have too much free FF capacity at the moment. I can take a look at this on Friday if nobody else has time before then.

@tikr Since we now have a public ML where the build logs are announced, would it make sense to post messages like this there, with the firefighters in CC, so that people outside of OpenCraft know about the detected (and manually confirmed) breakage? I discussed this with Ned during last week’s contributors meetup, and he mentioned that he (and other people in the community) would need help confirming the build errors that arrive there and understanding which errors to investigate. This way it doesn’t have to be just us investigating, and other community members could also react if they see a similar issue on their side.

@tikr I suspect it has to do with: https://github.com/edx/cs_comments_service/pull/327. I’ll investigate it further tomorrow.

@antoviaque Sure, posting these updates to the public ML sounds good :slight_smile: How can I join it?

A few additional notes/questions:

  • To make sure that everyone on the team is in the loop about these breakages, would it make sense to still post them here as well (in the form of a link to the corresponding ML posts, perhaps)?
  • Re: helping Ned confirm build errors and understand which errors to investigate: What info would be good to include in the ML posts to address this? (The Trello card that you linked to didn’t mention this and I didn’t get a chance to watch the recording of the meeting this week.)
  • Once we’re clear on the items above, I’ll have to adjust the changes from https://gitlab.com/opencraft/documentation/public/-/merge_requests/174 so that they match the new process.

Thanks @usman for investigating this one :raised_hands:

When there’s a task for this incident please post it here…

@tikr See:

Sure - maybe just link to the public place internally then, asking to comment/collaborate with everyone on the public thread rather than just between us internally.

The best would likely be to ask Ned, though there is already a PR to get a lot of that info directly into the automated email. You can probably start with what you would want to have access to in order to debug/reproduce the issue.

A note that there is a PR being worked on by Diana to resolve the issue which is causing the periodic master builds to fail: https://github.com/edx/configuration/pull/6093.

Tracking this internally in SE-3586.

A new error in periodic builds: https://manage.opencraft.com/instance/17120/edx-appserver/15241/

2020-11-09 08:36:39+0200INFO
TASK [nginx : Write out htpasswd file] *********************************************
2020-11-09 08:36:39+0200INFO
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named 'passlib'
2020-11-09 08:36:39+0200ERROR
away and you might need to add |bool to the expression in the future. Also see

Passlib is mentioned somewhere in our playbooks, and it may be where we set the HTTP basic auth password for the LB←→edxapp communication.
Can the firefighters take a look and create a task if needed? @mtyaka @guruprasad @jill @giovannicimolin
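For whoever picks this up: the quoted module name in the message (`No module named 'passlib'`) is the Python 3 form of ImportError, so the module is probably missing from the Python 3 environment that runs the task. A minimal sketch for checking that, assuming it is run with the same interpreter Ansible uses on the appserver (which one that is, I have not verified); the helper name is hypothetical.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    status = "available" if module_available("passlib") else "missing"
    print(f"{sys.executable}: passlib is {status}")
```

Running it with each candidate interpreter on the host shows which environment lacks the package.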