Azure RM model updates

* Make the resource group configurable
* Update SDK to 0.9.4
* Fix: Token expiration was incorrectly calculated (API is named a bit oddly).
* Fix: Token expiration is in seconds.
* Change VM/deployment names to ones that are valid
* Fix location + VM sizes
  It appears that the location and VM size APIs do not currently support AAD-based authentication; a certificate is required. Rather than reintroduce the certificate for just this limited UI scenario, I have decided to hard-code the list based on the information currently returned by Azure. This is a decent short-term solution, since a move to the 1.0.0 API (when released) will require this code to change anyway, and presumably the problem will be fixed for good at that point.
* Proper handling for custom image URIs
* Asynchronous verification of the subscription info
* Asynchronous verification of the azure templates
* Asynchronous provisioning
* Clean up resources after unsuccessful provisioning
* Lots of logging updates
* Fix: SSH launcher - Get channels before connection for exec channels
  Getting the channels after connection introduces a race where we could potentially fail to read from the input streams if they were obtained from the channel after the connection had completed.
* Add diagnostics to image verification messages
* Asynchronous deletion via UI
* Update version of JSch
* Temporarily disable image verification for reference images
* Allow for initialization scripts to be run as root
* Cleanup of slaves now works properly
    * Doesn't clean up when node is taken offline by user
    * Post build tasks won't kill other runs or show errors in the job logging
    * Retention strategy now for online nodes, cleanup for offline nodes
* Nodes that are marked to shut down on idle can now be restarted properly (previously, too many could be started)
* Add option to treat failures of the initialization script as a reason to discard the VM (currently Linux only)
* Reenable Windows custom script extension for startup
* Update documentation with new sample startup scripts

Add setAcceptingTasks appropriately

doc fixup
Matt Mitchell 2016-09-12 16:20:16 -07:00
parent 10a5b12b42
commit 6b53a4e3a3
44 changed files with 2786 additions and 1451 deletions


@ -74,8 +74,59 @@ Refer to
12. For the Init script, provide a script to install at least a Java runtime if the image does not have Java
pre-installed.
For the JNLP launch method, the init script must be in PowerShell.
If the init script is expected to take a long time to execute, it is recommended to prepare custom images with the necessary software pre-installed.<br>
For the Windows JNLP launch method, the init script must be in PowerShell.
The following arguments are passed to this script automatically:
First argument - Jenkins server URL
Second argument - VMName
Third argument - JNLP secret, required if the server has security enabled.
You need to install Java and download the slave jar file from '[server url]jnlpJars/slave.jar'.
The server url should already have a trailing slash. Then execute the following to connect:
`java.exe -jar [slave jar location] [-secret [client secret if required]] [server url]computer/[vm name]/slave-agent.jnlp`
Example script
```
Set-ExecutionPolicy Unrestricted
$jenkinsServerUrl = $args[0]
$vmName = $args[1]
$secret = $args[2]
$baseDir = 'C:\Jenkins'
mkdir $baseDir
# Download the JDK
$source = "http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-windows-x64.exe"
$destination = "$baseDir\jdk.exe"
$client = new-object System.Net.WebClient
$cookie = "oraclelicense=accept-securebackup-cookie"
$client.Headers.Add([System.Net.HttpRequestHeader]::Cookie, $cookie)
$client.downloadFile([string]$source, [string]$destination)
# Execute the unattended install
$jdkInstallDir=$baseDir + '\jdk\'
$jreInstallDir=$baseDir + '\jre\'
C:\Jenkins\jdk.exe /s INSTALLDIR=$jdkInstallDir /INSTALLDIRPUBJRE=$jreInstallDir
$javaExe=$jdkInstallDir + '\bin\java.exe'
$jenkinsSlaveJarUrl = $jenkinsServerUrl + "jnlpJars/slave.jar"
$destinationSlaveJarPath = $baseDir + '\slave.jar'
# Download the jar file
$client = new-object System.Net.WebClient
$client.DownloadFile($jenkinsSlaveJarUrl, $destinationSlaveJarPath)
# Calculate the jnlpURL
$jnlpUrl = $jenkinsServerUrl + 'computer/' + $vmName + '/slave-agent.jnlp'
while ($true) {
try {
# Launch
& $javaExe -jar $destinationSlaveJarPath -secret $secret -jnlpUrl $jnlpUrl -noReconnect
}
catch [System.Exception] {
Write-Output $_.Exception.ToString()
}
sleep 10
}
```
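As a rough illustration of the URL composition described above, the following Java sketch builds the two URLs that the init script needs. The `AgentUrls` class and its method names are hypothetical and not part of the plugin; only the path suffixes `jnlpJars/slave.jar` and `computer/[vm name]/slave-agent.jnlp` come from the text above, and the host name is a placeholder.

```java
// Hypothetical helper for illustration only; not part of the plugin.
final class AgentUrls {
    private AgentUrls() {
    }

    // Appends a trailing slash when the server URL does not already end with one.
    static String normalize(String serverUrl) {
        return serverUrl.endsWith("/") ? serverUrl : serverUrl + "/";
    }

    // Location of the slave jar: [server url]jnlpJars/slave.jar
    static String slaveJarUrl(String serverUrl) {
        return normalize(serverUrl) + "jnlpJars/slave.jar";
    }

    // JNLP endpoint: [server url]computer/[vm name]/slave-agent.jnlp
    static String jnlpUrl(String serverUrl, String vmName) {
        return normalize(serverUrl) + "computer/" + vmName + "/slave-agent.jnlp";
    }

    public static void main(String[] args) {
        // Placeholder server URL and VM name.
        String server = "http://jenkins.example.com:8080/";
        System.out.println(slaveJarUrl(server));
        System.out.println(jnlpUrl(server, "myvm0"));
    }
}
```

The `normalize` step is a defensive assumption of this sketch: the server URL is expected to arrive with its trailing slash already in place, so the PowerShell example concatenates the paths directly.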
For more details about how to prepare custom images, refer to the below links:
* [Capture Windows Image](http://azure.microsoft.com/en-us/documentation/articles/virtual-machines-capture-image-windows-server/)
@ -89,22 +140,12 @@ Refer to
1. Configure an Azure profile and Template as per the above instructions.
2. If the init script is expected to take a long time to complete, it is recommended to use a custom-prepared Ubuntu
image that has the required software pre-installed, including a Java runtime
3. For platform images, you may specify an Init script as below to install Java, Git and Ant:
3. For platform images, you may specify an Init script as below to install Java (may vary based on OS):
```
#Install Java
sudo apt-get -y update
sudo apt-get install -y openjdk-7-jdk
sudo apt-get -y update --fix-missing
sudo apt-get install -y openjdk-7-jdk
# Install Git
sudo apt-get install -y git
#Install Ant
sudo apt-get install -y ant
sudo apt-get -y update --fix-missing
sudo apt-get install -y ant
sudo apt-get install -y openjdk-8-jre
```
## Template configuration for Windows images with launch method JNLP.


@ -3,7 +3,7 @@
<parent>
<groupId>org.jenkins-ci.plugins</groupId>
<artifactId>plugin</artifactId>
<version>1.620</version>
<version>1.642.1</version>
</parent>
<artifactId>azure-slave-plugin</artifactId>
@ -108,7 +108,7 @@
<dependency>
<groupId>com.jcraft</groupId>
<artifactId>jsch</artifactId>
<version>0.1.53</version>
<version>0.1.54</version>
</dependency>
</dependencies>


@ -18,9 +18,13 @@ package com.microsoftopentechnologies.azure;
import com.microsoft.azure.management.compute.models.VirtualMachineGetResponse;
import com.microsoft.azure.management.resources.ResourceManagementClient;
import com.microsoft.azure.management.resources.ResourceManagementService;
import com.microsoft.azure.management.resources.models.DeploymentGetResult;
import com.microsoft.azure.management.resources.models.DeploymentOperation;
import com.microsoft.azure.management.resources.models.ProvisioningState;
import com.microsoft.windowsazure.Configuration;
import com.microsoftopentechnologies.azure.exceptions.AzureCloudException;
import com.microsoftopentechnologies.azure.util.AzureUtil;
import com.microsoftopentechnologies.azure.util.CleanUpAction;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
@ -69,8 +73,22 @@ public class AzureCloud extends Cloud {
private final String serviceManagementURL;
private final int maxVirtualMachinesLimit;
private final String resourceGroupName;
private final List<AzureSlaveTemplate> instTemplates;
// True if the subscription has been verified.
// False otherwise.
private boolean configurationValid;
// True if initial verification was queued for this cloud.
// Set on either: construction or initial canProvision if
// not already set.
private transient boolean initialVerificationQueued;
// Approximate virtual machine count. Updated periodically.
private int approximateVirtualMachineCount;
@DataBoundConstructor
public AzureCloud(
@ -81,16 +99,16 @@ public class AzureCloud extends Cloud {
final String oauth2TokenEndpoint,
final String serviceManagementURL,
final String maxVirtualMachinesLimit,
final List<AzureSlaveTemplate> instTemplates,
final String fileName,
final String fileData) {
final String resourceGroupName,
final List<AzureSlaveTemplate> instTemplates) {
super(Constants.AZURE_CLOUD_PREFIX + subscriptionId);
super(AzureUtil.getCloudName(subscriptionId));
this.subscriptionId = subscriptionId;
this.clientId = clientId;
this.clientSecret = clientSecret;
this.oauth2TokenEndpoint = oauth2TokenEndpoint;
this.resourceGroupName = resourceGroupName;
this.serviceManagementURL = StringUtils.isBlank(serviceManagementURL)
? Constants.DEFAULT_MANAGEMENT_URL
@ -101,11 +119,37 @@ public class AzureCloud extends Cloud {
} else {
this.maxVirtualMachinesLimit = Integer.parseInt(maxVirtualMachinesLimit);
}
this.configurationValid = false;
this.instTemplates = instTemplates == null
? Collections.<AzureSlaveTemplate>emptyList()
: instTemplates;
readResolve();
registerInitialVerificationIfNeeded();
}
/**
* Register the initial verification if required
*/
private void registerInitialVerificationIfNeeded() {
if (this.initialVerificationQueued) {
return;
}
// Register the cloud and the templates for verification
AzureVerificationTask.registerCloud(this.name);
// Register all templates. We don't know what happened with them
// when save was hit.
AzureVerificationTask.registerTemplates(this.getInstTemplates());
// Force the verification task to run if it's not already running.
// Note that early in startup this could return null
if (AzureVerificationTask.get() != null) {
AzureVerificationTask.get().doRun();
// Set the initial verification as being queued and ready to go.
this.initialVerificationQueued = true;
}
}
private Object readResolve() {
@ -117,16 +161,33 @@ public class AzureCloud extends Cloud {
@Override
public boolean canProvision(final Label label) {
final AzureSlaveTemplate template = getAzureSlaveTemplate(label);
// return false if there is no template
if (template == null) {
LOGGER.log(Level.INFO, "Azurecloud: canProvision: template not found for label {0}", label);
if (!configurationValid) {
// The subscription is not verified or is not valid,
// so we can't provision any nodes.
LOGGER.log(Level.INFO, "Azurecloud: canProvision: Subscription not verified, or is invalid, cannot provision");
registerInitialVerificationIfNeeded();
return false;
} else if (template.getTemplateStatus().equalsIgnoreCase(Constants.TEMPLATE_STATUS_DISBALED)) {
}
final AzureSlaveTemplate template = getAzureSlaveTemplate(label);
// return false if there is no template for this label.
if (template == null) {
// Avoid logging this, it happens a lot and is just noisy in logs.
return false;
} else if (template.isTemplateDisabled()) {
// Log this. It's not terribly noisy and can be useful
LOGGER.log(Level.INFO,
"Azurecloud: canProvision: template {0} is marked as disabled, cannot provision slaves",
template.getTemplateName());
return false;
} else if (!template.isTemplateVerified()) {
// The template is available, but not verified. It may be queued for
// verification, but ensure that it's added.
LOGGER.log(Level.INFO,
"Azurecloud: canProvision: template {0} is awaiting verification or has failed verification",
template.getTemplateName());
AzureVerificationTask.registerTemplate(template);
return false;
} else {
return true;
}
@ -155,7 +216,88 @@ public class AzureCloud extends Cloud {
public int getMaxVirtualMachinesLimit() {
return maxVirtualMachinesLimit;
}
public String getResourceGroupName() {
return resourceGroupName;
}
/**
* Returns the current set of templates.
* Required for config.jelly
* @return
*/
public List<AzureSlaveTemplate> getInstTemplates() {
return Collections.unmodifiableList(instTemplates);
}
/**
* Is the configuration set up and verified?
* @return True if the configuration set up and verified, false otherwise.
*/
public boolean isConfigurationValid() {
return configurationValid;
}
/**
* Set the configuration verification status
* @param isValid True for verified + valid, false otherwise.
*/
public void setConfigurationValid(boolean isValid) {
configurationValid = isValid;
}
/**
* Retrieves the current approximate virtual machine count
* @return
*/
public int getApproximateVirtualMachineCount() {
synchronized (this) {
return approximateVirtualMachineCount;
}
}
/**
* Given the number of VMs that are desired, returns the number
* of VMs that can be allocated.
* @param quantityDesired Number that are desired
* @return Number that can be allocated
*/
public int getAvailableVirtualMachineCount(int quantityDesired) {
synchronized (this) {
if (approximateVirtualMachineCount + quantityDesired <= getMaxVirtualMachinesLimit()) {
// Enough available, return the desired quantity
return quantityDesired;
}
else {
// Not enough available, return what we have. Remember we could
// go negative (if, for instance, another Jenkins instance had
// a higher limit).
return Math.max(0, getMaxVirtualMachinesLimit() - approximateVirtualMachineCount);
}
}
}
/**
* Adjust the number of currently allocated VMs
* @param delta Number to adjust by.
*/
public void adjustVirtualMachineCount(int delta) {
synchronized (this) {
approximateVirtualMachineCount = Math.max(0, approximateVirtualMachineCount + delta);
}
}
/**
* Sets the new approximate virtual machine count. This is run by
* the verification task to update the VM count periodically.
* @param newCount
*/
public void setVirtualMachineCount(int newCount) {
synchronized (this) {
approximateVirtualMachineCount = newCount;
}
}
/**
* Returns slave template associated with the label.
*
@ -163,17 +305,17 @@ public class AzureCloud extends Cloud {
* @return
*/
public AzureSlaveTemplate getAzureSlaveTemplate(final Label label) {
LOGGER.log(Level.INFO, "Retrieving slave template with label {0}", label);
LOGGER.log(Level.FINE, "AzureCloud: getAzureSlaveTemplate: Retrieving slave template with label {0}", label);
for (AzureSlaveTemplate slaveTemplate : instTemplates) {
LOGGER.log(Level.INFO, "Found slave template {0}", slaveTemplate.getTemplateName());
LOGGER.log(Level.FINE, "AzureCloud: getAzureSlaveTemplate: Found slave template {0}", slaveTemplate.getTemplateName());
if (slaveTemplate.getUseSlaveAlwaysIfAvail() == Node.Mode.NORMAL) {
if (label == null || label.matches(slaveTemplate.getLabelDataSet())) {
LOGGER.log(Level.INFO, "{0} matches!", slaveTemplate.getTemplateName());
LOGGER.log(Level.FINE, "AzureCloud: getAzureSlaveTemplate: {0} matches!", slaveTemplate.getTemplateName());
return slaveTemplate;
}
} else if (slaveTemplate.getUseSlaveAlwaysIfAvail() == Node.Mode.EXCLUSIVE) {
if (label != null && label.matches(slaveTemplate.getLabelDataSet())) {
LOGGER.log(Level.INFO, "{0} matches!", slaveTemplate.getTemplateName());
LOGGER.log(Level.FINE, "AzureCloud: getAzureSlaveTemplate: {0} matches!", slaveTemplate.getTemplateName());
return slaveTemplate;
}
}
@ -200,148 +342,105 @@ public class AzureCloud extends Cloud {
return null;
}
public List<AzureSlaveTemplate> getInstTemplates() {
return Collections.unmodifiableList(instTemplates);
}
private boolean verifyTemplate(final AzureSlaveTemplate template) {
boolean isVerified;
try {
LOGGER.log(Level.INFO, "Azure Cloud: provision: Verifying template {0}", template.getTemplateName());
final List<String> errors = template.verifyTemplate();
isVerified = errors.isEmpty();
if (isVerified) {
LOGGER.log(Level.INFO,
"Azure Cloud: provision: template {0} has no validation errors", template.getTemplateName());
} else {
LOGGER.log(Level.INFO, "Azure Cloud: provision: template {0}"
+ " has validation errors , cannot provision slaves with this configuration {1}",
new Object[] { template.getTemplateName(), errors });
template.handleTemplateStatus("Validation Error: Validation errors in template \n"
+ " Root cause: " + errors, FailureStage.VALIDATION, null);
// Register template for periodic check so that jenkins can make template active if
// validation errors are corrected
if (!Constants.TEMPLATE_STATUS_ACTIVE_ALWAYS.equals(template.getTemplateStatus())) {
AzureTemplateMonitorTask.registerTemplate(template);
}
}
} catch (Exception e) {
LOGGER.log(Level.SEVERE, "Azure Cloud: provision: Exception occured while validating template", e);
template.handleTemplateStatus("Validation Error: Exception occured while validating template "
+ e.getMessage(), FailureStage.VALIDATION, null);
// Register template for periodic check so that jenkins can make template active if validation errors
// are corrected
if (!Constants.TEMPLATE_STATUS_ACTIVE_ALWAYS.equals(template.getTemplateStatus())) {
AzureTemplateMonitorTask.registerTemplate(template);
}
isVerified = false;
}
return isVerified;
}
private AzureSlave provisionedSlave(
/**
* Once a new deployment is created, construct a new AzureSlave object
* given information about the template
* @param template Template used to create the new slave
* @param vmName Name of the created VM
* @param deploymentName Name of the deployment containing the VM
* @param config Azure configuration.
* @return New slave. Throws otherwise.
* @throws Exception
*/
private AzureSlave createProvisionedSlave(
final AzureSlaveTemplate template,
final String prefix,
final int index,
final int expectedVMs,
final String vmName,
final String deploymentName,
final Configuration config) throws Exception {
final ResourceManagementClient rmc = ResourceManagementService.create(config);
final String vmName = String.format("%s%s%d", template.getTemplateName(), prefix, index);
int completed = 0;
AzureSlave slave = null;
LOGGER.log(Level.INFO, "AzureCloud: createProvisionedSlave: Waiting for deployment to be completed");
int triesLeft = 20;
do {
triesLeft--;
try {
Thread.sleep(30 * 1000);
} catch (InterruptedException ex) {
// ignore
}
final List<DeploymentOperation> ops = rmc.getDeploymentOperationsOperations().
list(Constants.RESOURCE_GROUP_NAME, prefix, null).getOperations();
completed = 0;
list(resourceGroupName, deploymentName, null).getOperations();
for (DeploymentOperation op : ops) {
final String resource = op.getProperties().getTargetResource().getResourceName();
final String type = op.getProperties().getTargetResource().getResourceType();
final String state = op.getProperties().getProvisioningState();
if (ProvisioningState.CANCELED.equals(state)
|| ProvisioningState.FAILED.equals(state)
|| ProvisioningState.NOTSPECIFIED.equals(state)) {
LOGGER.log(Level.INFO, "Failed({0}): {1}:{2}", new Object[] { state, type, resource });
if (op.getProperties().getTargetResource().getResourceType().contains("virtualMachine")) {
if (resource.equalsIgnoreCase(vmName)) {
if (ProvisioningState.CANCELED.equals(state)
|| ProvisioningState.FAILED.equals(state)
|| ProvisioningState.NOTSPECIFIED.equals(state)) {
final String statusCode = op.getProperties().getStatusCode();
final String statusMessage = op.getProperties().getStatusMessage();
String finalStatusMessage = statusCode;
if (statusMessage != null) {
finalStatusMessage += " - " + statusMessage;
}
slave = AzureManagementServiceDelegate.parseResponse(
vmName, prefix, template, template.getOsType());
} else if (ProvisioningState.SUCCEEDED.equals(state)) {
if (op.getProperties().getTargetResource().getResourceType().contains("virtualMachine")) {
if (resource.equalsIgnoreCase(vmName)) {
LOGGER.log(Level.INFO, "VM available: {0}", resource);
throw new AzureCloudException(String.format("AzureCloud: createProvisionedSlave: Deployment %s: %s:%s - %s", new Object[] { state, type, resource, finalStatusMessage }));
} else if (ProvisioningState.SUCCEEDED.equals(state)) {
LOGGER.log(Level.INFO, "AzureCloud: createProvisionedSlave: VM available: {0}", resource);
final VirtualMachineGetResponse vm
= ServiceDelegateHelper.getComputeManagementClient(config).
getVirtualMachinesOperations().
getWithInstanceView(Constants.RESOURCE_GROUP_NAME, resource);
getWithInstanceView(resourceGroupName, resource);
final String osType = vm.getVirtualMachine().getStorageProfile().getOSDisk().
getOperatingSystemType();
slave = AzureManagementServiceDelegate.parseResponse(vmName, prefix, template, osType);
AzureSlave newSlave = AzureManagementServiceDelegate.parseResponse(vmName, deploymentName, template, osType);
// Set the virtual machine details
AzureManagementServiceDelegate.setVirtualMachineDetails(newSlave, template);
return newSlave;
}
else {
LOGGER.log(Level.INFO, "AzureCloud: createProvisionedSlave: Deployment not yet finished ({0}): {1}:{2}", new Object[] { state, type, resource });
}
completed++;
}
} else {
LOGGER.log(Level.INFO, "To Be Completed({0}): {1}:{2}", new Object[] { state, type, resource });
}
}
} while (slave == null && completed < expectedVMs);
} while (triesLeft > 0);
if (slave == null) {
throw new IllegalStateException(String.format("Slave machine '%s' not found into '%s'", vmName, prefix));
}
return slave;
throw new AzureCloudException(String.format("AzureCloud: createProvisionedSlave: Deployment failed, max tries reached for %s", deploymentName));
}
@Override
public Collection<PlannedNode> provision(final Label label, int workLoad) {
LOGGER.log(Level.INFO,
"Azure Cloud: provision: start for label {0} workLoad {1}", new Object[] { label, workLoad });
"AzureCloud: provision: start for label {0} workLoad {1}", new Object[] { label, workLoad });
final AzureSlaveTemplate template = getAzureSlaveTemplate(label);
// verify template
if (!verifyTemplate(template)) {
return Collections.<PlannedNode>emptyList();
}
// round up the number of required machine
int numberOfSlaves = (workLoad + template.getNoOfParallelJobs() - 1) / template.getNoOfParallelJobs();
final List<PlannedNode> plannedNodes = new ArrayList<PlannedNode>(numberOfSlaves);
// reuse existing nodes if available
LOGGER.log(Level.INFO, "AzureCloud: provision: checking for node reuse options");
for (Computer slaveComputer : Jenkins.getInstance().getComputers()) {
LOGGER.log(Level.INFO, "Azure Cloud: provision: got slave computer {0}", slaveComputer.getName());
if (numberOfSlaves == 0) {
break;
}
if (slaveComputer instanceof AzureComputer && slaveComputer.isOffline()) {
final AzureComputer azureComputer = AzureComputer.class.cast(slaveComputer);
final AzureSlave slaveNode = azureComputer.getNode();
if (isNodeEligibleForReuse(slaveNode, template)) {
LOGGER.log(Level.INFO,
"Azure Cloud: provision: \n - slave node {0}\n - slave template {1}",
new Object[] { slaveNode.getLabelString(), template.getLabels() });
LOGGER.log(Level.INFO, "AzureCloud: provision: slave computer eligible for reuse {0}", slaveComputer.getName());
try {
if (AzureManagementServiceDelegate.virtualMachineExists(slaveNode)) {
numberOfSlaves--;
@ -361,19 +460,21 @@ public class AzureCloud extends Cloud {
Jenkins.getInstance().addNode(slaveNode);
if (slaveNode.getSlaveLaunchMethod().equalsIgnoreCase("SSH")) {
slaveNode.toComputer().connect(false).get();
} else // Wait until node is online
{
waitUntilOnline(slaveNode);
} else { // Wait until node is online
waitUntilJNLPNodeIsOnline(slaveNode);
}
azureComputer.setAcceptingTasks(true);
slaveNode.clearCleanUpAction();
slaveNode.setEligibleForReuse(false);
return slaveNode;
}
}), template.getNoOfParallelJobs()));
} else {
slaveNode.setDeleteSlave(true);
}
} catch (Exception e) {
// ignore
// Couldn't bring the node back online. Mark it
// as needing deletion
azureComputer.setAcceptingTasks(false);
slaveNode.setCleanUpAction(CleanUpAction.DEFAULT, Messages._Shutdown_Slave_Failed_To_Revive());
}
}
}
@ -382,10 +483,30 @@ public class AzureCloud extends Cloud {
// provision new nodes if required
if (numberOfSlaves > 0) {
try {
final String deployment = template.provisionSlaves(
new StreamTaskListener(System.out, Charset.defaultCharset()), numberOfSlaves);
final int count = numberOfSlaves;
// Determine how many slaves we can actually provision from here and
// adjust our count (before deployment to avoid races)
int adjustedNumberOfSlaves = getAvailableVirtualMachineCount(numberOfSlaves);
if (adjustedNumberOfSlaves == 0) {
LOGGER.log(Level.INFO, "Not able to create any new nodes, at or above maximum VM count of {0}",
getMaxVirtualMachinesLimit());
}
else if (adjustedNumberOfSlaves < numberOfSlaves) {
LOGGER.log(Level.INFO, "Able to create new nodes, but can only create {0} (desired {1})",
new Object[] { adjustedNumberOfSlaves, numberOfSlaves } );
}
final int numberOfNewSlaves = adjustedNumberOfSlaves;
// Adjust number of nodes available by the number of created nodes.
// Negative to reduce number available.
this.adjustVirtualMachineCount(-adjustedNumberOfSlaves);
ExecutorService executorService = Executors.newCachedThreadPool();
Callable<AzureDeploymentInfo> callableTask = new Callable<AzureDeploymentInfo>() {
@Override
public AzureDeploymentInfo call() throws Exception {
return template.provisionSlaves(new StreamTaskListener(System.out, Charset.defaultCharset()), numberOfNewSlaves);
}
};
final Future<AzureDeploymentInfo> deploymentFuture = executorService.submit(callableTask);
for (int i = 0; i < numberOfSlaves; i++) {
final int index = i;
@ -395,35 +516,69 @@ public class AzureCloud extends Cloud {
@Override
public Node call() throws Exception {
final AzureSlave slave = provisionedSlave(
// Wait for the future to complete
AzureDeploymentInfo info = deploymentFuture.get();
final String deploymentName = info.getDeploymentName();
final String vmBaseName = info.getVmBaseName();
final String vmName = String.format("%s%d", vmBaseName, index);
AzureSlave slave = null;
try {
slave = createProvisionedSlave(
template,
deployment,
index,
count,
vmName,
deploymentName,
ServiceDelegateHelper.getConfiguration(template));
// Get virtual machine properties
LOGGER.log(Level.INFO,
"Azure Cloud: provision: Getting slave {0} ({1}) properties",
new Object[] { slave.getNodeName(), slave.getOsType() });
}
catch (Exception e) {
LOGGER.log(
Level.SEVERE,
String.format("Failure creating provisioned slave '%s'", vmName),
e);
// Attempt to terminate whatever was created
AzureManagementServiceDelegate.terminateVirtualMachine(
ServiceDelegateHelper.getConfiguration(template), vmName,
template.getResourceGroupName());
template.getAzureCloud().adjustVirtualMachineCount(1);
// Update the template status given this new issue.
template.handleTemplateProvisioningFailure(e.getMessage(), FailureStage.PROVISIONING);
throw e;
}
try {
template.setVirtualMachineDetails(slave);
LOGGER.log(Level.INFO, "Azure Cloud: provision: Adding slave {0} to Jenkins nodes", slave.getNodeName());
// Place the node in blocked state while it starts.
slave.blockCleanUpAction();
Jenkins.getInstance().addNode(slave);
if (slave.getSlaveLaunchMethod().equalsIgnoreCase("SSH")) {
LOGGER.info("Azure Cloud: provision: Adding slave to azure nodes ");
Jenkins.getInstance().addNode(slave);
slave.toComputer().connect(false).get();
} else if (slave.getSlaveLaunchMethod().equalsIgnoreCase("JNLP")) {
LOGGER.info("Azure Cloud: provision: Checking for slave status");
// slaveTemplate.waitForReadyRole(slave);
LOGGER.info("Azure Cloud: provision: Adding slave to azure nodes ");
Jenkins.getInstance().addNode(slave);
// Wait until node is online
waitUntilOnline(slave);
waitUntilJNLPNodeIsOnline(slave);
}
// Place node in default state, now can be
// dealt with by the cleanup task.
slave.clearCleanUpAction();
} catch (Exception e) {
template.handleTemplateStatus(
e.getMessage(), FailureStage.POSTPROVISIONING, slave);
LOGGER.log(
Level.SEVERE,
String.format("Failure in post-provisioning for '%s'", vmName),
e);
// Attempt to terminate whatever was created
AzureManagementServiceDelegate.terminateVirtualMachine(
ServiceDelegateHelper.getConfiguration(template), vmName,
template.getResourceGroupName());
template.getAzureCloud().adjustVirtualMachineCount(1);
// Update the template status
template.handleTemplateProvisioningFailure(vmName, FailureStage.POSTPROVISIONING);
// Remove the node from jenkins
Jenkins.getInstance().removeNode(slave);
throw e;
}
return slave;
@ -439,11 +594,17 @@ public class AzureCloud extends Cloud {
}
}
LOGGER.log(Level.INFO,
"AzureCloud: provision: asynchronous provision finished, returning {0} planned node(s)", plannedNodes.size());
return plannedNodes;
}
/** this methods wait for node to be available */
private void waitUntilOnline(final AzureSlave slave) {
/**
* Wait till a node that connects through JNLP comes online and connects to Jenkins.
* @param slave Node to wait for
* @throws Exception Throws if the wait time expires or other exception happens.
*/
private void waitUntilJNLPNodeIsOnline(final AzureSlave slave) throws Exception {
LOGGER.log(Level.INFO, "Azure Cloud: waitUntilOnline: for slave {0}", slave.getDisplayName());
ExecutorService executorService = Executors.newCachedThreadPool();
Callable<String> callableTask = new Callable<String>() {
@ -465,8 +626,7 @@ public class AzureCloud extends Cloud {
String result = future.get(30, TimeUnit.MINUTES);
LOGGER.log(Level.INFO, "Azure Cloud: waitUntilOnline: node is alive , result {0}", result);
} catch (Exception ex) {
LOGGER.log(Level.INFO, "Azure Cloud: waitUntilOnline: Failure waiting till online", ex);
markSlaveForDeletion(slave, Constants.JNLP_POST_PROV_LAUNCH_FAIL);
throw new AzureCloudException("Azure Cloud: waitUntilOnline: Failure waiting till online", ex);
} finally {
future.cancel(true);
executorService.shutdown();
@ -477,8 +637,7 @@ public class AzureCloud extends Cloud {
* Checks if node configuration matches with template definition.
*/
private static boolean isNodeEligibleForReuse(AzureSlave slaveNode, AzureSlaveTemplate slaveTemplate) {
// Do not reuse slave if it is marked for deletion.
if (slaveNode.isDeleteSlave()) {
if (!slaveNode.isEligibleForReuse()) {
return false;
}
@ -495,14 +654,6 @@ public class AzureCloud extends Cloud {
return false;
}
private static void markSlaveForDeletion(AzureSlave slave, String message) {
slave.setTemplateStatus(Constants.TEMPLATE_STATUS_DISBALED, message);
if (slave.toComputer() != null) {
slave.toComputer().setTemporarilyOffline(true, OfflineCause.create(Messages._Slave_Failed_To_Connect()));
}
slave.setDeleteSlave(true);
}
@Extension
public static class DescriptorImpl extends Descriptor<Cloud> {
@ -518,13 +669,18 @@ public class AzureCloud extends Cloud {
public int getDefaultMaxVMLimit() {
return Constants.DEFAULT_MAX_VM_LIMIT;
}
public String getDefaultResourceGroupName() {
return Constants.DEFAULT_RESOURCE_GROUP_NAME;
}
public FormValidation doVerifyConfiguration(
@QueryParameter String subscriptionId,
@QueryParameter String clientId,
@QueryParameter String clientSecret,
@QueryParameter String oauth2TokenEndpoint,
@QueryParameter String serviceManagementURL) {
@QueryParameter String serviceManagementURL,
@QueryParameter String resourceGroupName) {
if (StringUtils.isBlank(subscriptionId)) {
return FormValidation.error("Error: Subscription ID is missing");
@ -541,13 +697,17 @@ public class AzureCloud extends Cloud {
if (StringUtils.isBlank(serviceManagementURL)) {
serviceManagementURL = Constants.DEFAULT_MANAGEMENT_URL;
}
if (StringUtils.isBlank(resourceGroupName)) {
resourceGroupName = Constants.DEFAULT_RESOURCE_GROUP_NAME;
}
String response = AzureManagementServiceDelegate.verifyConfiguration(
subscriptionId,
clientId,
clientSecret,
oauth2TokenEndpoint,
serviceManagementURL);
serviceManagementURL,
resourceGroupName);
if (Constants.OP_SUCCESS.equalsIgnoreCase(response)) {
return FormValidation.ok(Messages.Azure_Config_Success());
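
The VM-count bookkeeping in the AzureCloud.java changes above (rounding the workload up to a number of slaves, then clamping the request against the configured maximum) can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the plugin's code: the `VmQuota` class and the `reserve`/`release` names are hypothetical, and the counter here tracks VMs in use, so reserving adds to it and releasing subtracts from it.

```java
// Hypothetical, simplified model of the VM-count bookkeeping; not the plugin's class.
final class VmQuota {
    private final int maxVms;
    // Approximate number of VMs currently in use (assumption of this sketch).
    private int inUse;

    VmQuota(int maxVms, int initiallyInUse) {
        this.maxVms = maxVms;
        this.inUse = Math.max(0, initiallyInUse);
    }

    // Ceiling division: number of slaves needed for a workload when each
    // slave runs executorsPerSlave jobs in parallel.
    static int slavesNeeded(int workLoad, int executorsPerSlave) {
        return (workLoad + executorsPerSlave - 1) / executorsPerSlave;
    }

    // Grants as many of the desired VMs as the limit allows and records them
    // as in use. Never grants a negative number, even if the count already
    // exceeds the limit.
    synchronized int reserve(int desired) {
        int granted = (inUse + desired <= maxVms)
                ? desired
                : Math.max(0, maxVms - inUse);
        inUse += granted;
        return granted;
    }

    // Returns VMs to the pool, for example after a failed provisioning.
    synchronized void release(int count) {
        inUse = Math.max(0, inUse - count);
    }
}
```

Doing the clamp and the counter update inside one synchronized method means two concurrent provisioning requests cannot both be granted the same remaining capacity, which is the race the "before deployment" comment in the diff refers to.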


@ -21,6 +21,7 @@ import org.kohsuke.stapler.DataBoundConstructor;
import com.microsoftopentechnologies.azure.exceptions.AzureCloudException;
import com.microsoftopentechnologies.azure.retry.LinearRetryForAllExceptions;
import com.microsoftopentechnologies.azure.util.CleanUpAction;
import com.microsoftopentechnologies.azure.util.Constants;
import com.microsoftopentechnologies.azure.util.ExecutionEngine;
@ -31,67 +32,85 @@ import java.util.logging.Level;
public class AzureCloudRetensionStrategy extends RetentionStrategy<AzureComputer> {
public final long idleTerminationMillis;
// Configured idle termination
private final long idleTerminationMillis;
private static final Logger LOGGER = Logger.getLogger(AzureManagementServiceDelegate.class.getName());
@DataBoundConstructor
public AzureCloudRetensionStrategy(int idleTerminationMinutes) {
this.idleTerminationMillis = TimeUnit2.MINUTES.toMillis(idleTerminationMinutes);
}
/**
* Called by Jenkins to determine what to do with a particular node.
* Node could be shut down, deleted, etc.
* @param slaveNode Node to check
* @return Number of minutes before node will be checked again.
*/
@Override
public long check(final AzureComputer slaveNode) {
// If idleTerminationMinutes is zero, the slave instance is never terminated.
// An active node, or one that is not yet up and running, is ignored as well.
if (idleTerminationMillis > 0 && slaveNode.isIdle() && slaveNode.isProvisioned()
&& idleTerminationMillis < (System.currentTimeMillis() - slaveNode.getIdleStartMilliseconds())) {
// block node for further tasks
slaveNode.setAcceptingTasks(false);
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: check: Idle timeout reached for slave: {0}",
slaveNode.getName());
// Determine whether we can recycle this machine.
// The CRS is the way that nodes that are currently operating "correctly"
// can be retained/reclaimed. Any failure modes need to be dealt with through
// the clean up task.
boolean canRecycle = true;
// Node must be idle
canRecycle &= slaveNode.isIdle();
// The node must also be online. This also implies not temporarily disconnected
// (like by a user).
canRecycle &= slaveNode.isOnline();
// The configured idle time must be > 0 (a value of 0 means the node is kept forever)
canRecycle &= idleTerminationMillis > 0;
// The number of ms it's been idle must be greater than the configured idle timeout.
canRecycle &= idleTerminationMillis < (System.currentTimeMillis() - slaveNode.getIdleStartMilliseconds());
if (slaveNode.getNode() == null) {
return 1;
}
final AzureSlave slave = slaveNode.getNode();
if (canRecycle) {
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: check: Idle timeout reached for slave: {0}, action: {1}",
new Object [] {slaveNode.getName(), slave.isShutdownOnIdle() ? "shutdown" : "delete"} );
java.util.concurrent.Callable<Void> task = new java.util.concurrent.Callable<Void>() {
@Override
public Void call() throws Exception {
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: going to idleTimeout slave: {0}",
// Block cleanup while we execute so the cleanup task doesn't try to take it
// away (node will go offline). Also blocks cleanup in case of shutdown.
slave.blockCleanUpAction();
if (slave.isShutdownOnIdle()) {
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: going to idleTimeout slave: {0}",
slaveNode.getName());
slaveNode.getNode().idleTimeout();
slave.shutdown(Messages._Idle_Timeout_Shutdown());
} else {
slave.deprovision(Messages._Idle_Timeout_Delete());
}
return null;
}
};
try {
ExecutionEngine.executeWithRetry(task,
ExecutionEngine.executeAsync(task,
new LinearRetryForAllExceptions(
30, // maxRetries
30, // waitinterval
30 * 60 // timeout
));
} catch (AzureCloudException ae) {
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: check: could not terminate or shutdown {0}",
slaveNode.getName());
LOGGER.log(Level.INFO, "AzureCloudRetensionStrategy: check: could not terminate or shutdown {0}: {1}",
new Object [] { slaveNode.getName(), ae });
// If we have an exception, set the slave for deletion. It's unlikely we'll be able to shut it down properly ever.
slaveNode.getNode().setCleanUpAction(CleanUpAction.DELETE, Messages._Failed_Initial_Shutdown_Or_Delete());
} catch (Exception e) {
LOGGER.log(Level.INFO,
"AzureCloudRetensionStrategy: execute: Exception occurred while calling timeout on node", e);
// We won't get exception for RNF , so for other exception types we can retry
if (e.getMessage().contains("not found in the currently deployed service")) {
LOGGER.info("AzureCloudRetensionStrategy: execute: Slave does not exist "
+ "in the subscription anymore, setting shutdownOnIdle to True");
slaveNode.getNode().setShutdownOnIdle(true);
}
}
// close channel
try {
slaveNode.setProvisioned(false);
if (slaveNode.getChannel() != null) {
slaveNode.getChannel().close();
}
} catch (Exception e) {
LOGGER.log(Level.INFO,
"AzureCloudRetensionStrategy: check: exception occurred while closing channel for: {0}",
slaveNode.getName());
"AzureCloudRetensionStrategy: check: Exception occurred while calling timeout on node {0}: {1}",
new Object [] { slaveNode.getName(), e });
// If we have an exception, set the slave for deletion. It's unlikely we'll be able to shut it down properly ever.
slaveNode.getNode().setCleanUpAction(CleanUpAction.DELETE, Messages._Failed_Initial_Shutdown_Or_Delete());
}
}
return 1;
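
The recycle decision made in check() above can be read as a pure predicate over four inputs. The following is a minimal sketch under stated assumptions, not the plugin's code: the class name IdleRecyclePolicy and its parameters are hypothetical stand-ins for slaveNode.isIdle(), slaveNode.isOnline(), idleTerminationMillis, slaveNode.getIdleStartMilliseconds() and System.currentTimeMillis().

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the plugin: mirrors the canRecycle checks in check().
final class IdleRecyclePolicy {
    private IdleRecyclePolicy() {
    }

    /**
     * @param idle                  whether the node currently runs no builds
     * @param online                whether the node is online (not disconnected, not taken offline)
     * @param idleTerminationMillis configured idle timeout; 0 means keep the node forever
     * @param idleStartMillis       timestamp at which the node became idle
     * @param nowMillis             current timestamp
     * @return true only when every condition for recycling holds
     */
    static boolean canRecycle(boolean idle, boolean online, long idleTerminationMillis,
            long idleStartMillis, long nowMillis) {
        boolean canRecycle = true;
        // Node must be idle.
        canRecycle &= idle;
        // Node must be online; failure modes are left to the clean up task.
        canRecycle &= online;
        // A timeout of 0 disables idle termination.
        canRecycle &= idleTerminationMillis > 0;
        // The node must have been idle for strictly longer than the timeout.
        canRecycle &= idleTerminationMillis < (nowMillis - idleStartMillis);
        return canRecycle;
    }

    public static void main(String[] args) {
        long timeout = TimeUnit.MINUTES.toMillis(10);
        long now = System.currentTimeMillis();
        // Idle for one millisecond longer than the timeout: eligible.
        System.out.println(canRecycle(true, true, timeout, now - timeout - 1, now));
        // Same idle time, but the node is offline: left to the clean up task.
        System.out.println(canRecycle(true, false, timeout, now - timeout - 1, now));
    }
}
```

Passing the clock in as a parameter keeps the predicate deterministic, which makes the boundary case (idle for exactly the timeout) easy to check.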


@ -15,6 +15,10 @@
*/
package com.microsoftopentechnologies.azure;
import com.microsoftopentechnologies.azure.exceptions.AzureCloudException;
import com.microsoftopentechnologies.azure.retry.NoRetryStrategy;
import com.microsoftopentechnologies.azure.util.CleanUpAction;
import com.microsoftopentechnologies.azure.util.ExecutionEngine;
import java.io.IOException;
import java.util.logging.Logger;
@ -23,13 +27,14 @@ import org.kohsuke.stapler.HttpResponse;
import hudson.slaves.AbstractCloudComputer;
import hudson.slaves.OfflineCause;
import java.util.concurrent.Callable;
import java.util.logging.Level;
public class AzureComputer extends AbstractCloudComputer<AzureSlave> {
private static final Logger LOGGER = Logger.getLogger(AzureComputer.class.getName());
private boolean provisioned = false;
private boolean setOfflineByUser = false;
public AzureComputer(final AzureSlave slave) {
super(slave);
@ -38,56 +43,80 @@ public class AzureComputer extends AbstractCloudComputer<AzureSlave> {
@Override
public HttpResponse doDoDelete() throws IOException {
checkPermission(DELETE);
AzureSlave slave = getNode();
this.setAcceptingTasks(false);
final AzureSlave slave = getNode();
if (slave != null) {
LOGGER.log(Level.INFO, "AzureComputer: doDoDelete called for slave {0}", slave.getNodeName());
setTemporarilyOffline(true, OfflineCause.create(Messages._Delete_Slave()));
slave.setDeleteSlave(true);
Callable<Void> task = new Callable<Void>() {
@Override
public Void call() throws Exception {
LOGGER.log(Level.INFO, "AzureComputer: doDoDelete called for slave {0}", slave.getNodeName());
try {
// Deprovision
slave.deprovision(Messages._User_Delete());
} catch (Exception e) {
LOGGER.log(Level.INFO, "AzureComputer: doDoDelete: Exception occurred while deleting slave", e);
throw new AzureCloudException("AzureComputer: doDoDelete: Exception occurred while deleting slave", e);
}
return null;
}
};
try {
deleteSlave();
} catch (Exception e) {
LOGGER.log(Level.INFO, "AzureComputer: doDoDelete: Exception occurred while deleting slave", e);
throw new IOException(
"Error deleting node, jenkins will try to clean up node automatically after some time. ", e);
ExecutionEngine.executeAsync(task, new NoRetryStrategy());
} catch (AzureCloudException exception) {
// No need to throw exception back, just log and move on.
LOGGER.log(Level.INFO,
"AzureComputer: doDoDelete: failed to shutdown/delete " + slave.getDisplayName(),
exception);
}
}
return new HttpRedirect("..");
}
public void deleteSlave() throws Exception, InterruptedException {
LOGGER.log(Level.INFO, "AzureComputer : deleteSlave: Deleting {0} slave", getName());
AzureSlave slave = getNode();
if (slave != null) {
if (slave.getChannel() != null) {
slave.getChannel().close();
}
try {
slave.deprovision();
} catch (Exception e) {
LOGGER.log(Level.SEVERE, "AzureComputer : Exception occurred while deleting {0} slave", getName());
LOGGER.log(Level.SEVERE, "Root cause", e);
throw e;
}
}
public boolean isSetOfflineByUser() {
return setOfflineByUser;
}
public void setProvisioned(boolean provisioned) {
this.provisioned = provisioned;
public void setSetOfflineByUser(boolean setOfflineByUser) {
this.setOfflineByUser = setOfflineByUser;
}
public boolean isProvisioned() {
return this.provisioned;
}
/**
* Wait until the node is online
* @throws InterruptedException
*/
@Override
public void waitUntilOnline() throws InterruptedException {
super.waitUntilOnline();
setProvisioned(true);
}
/**
* We use temporary offline settings to do investigation of machines.
* To avoid deletion, we assume this came through a user call and set a bit. Where
* this plugin might set things temp-offline (vs. disconnect), we'll reset the bit
* after calling setTemporarilyOffline
* @param setOffline
* @param oc
*/
@Override
public void setTemporarilyOffline(boolean setOffline, OfflineCause oc) {
setSetOfflineByUser(setOffline);
super.setTemporarilyOffline(setOffline, oc);
}
/**
* We use temporary offline settings to do investigation of machines.
* To avoid deletion, we assume this came through a user call and set a bit. Where
* this plugin might set things temp-offline (vs. disconnect), we'll reset the bit
* after calling setTemporarilyOffline
* @param setOffline
* @param oc
*/
@Override
public void setTemporarilyOffline(boolean setOffline) {
setSetOfflineByUser(setOffline);
super.setTemporarilyOffline(setOffline);
}
}
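
The Javadoc on setTemporarilyOffline describes a flag that records whether a temporary-offline request came from a user, with the plugin resetting the flag after its own calls. The following is a rough sketch of that pattern under stated assumptions: SketchComputer is a stand-in for the Jenkins base class, and setTemporarilyOfflineByPlugin and isEligibleForCleanup are hypothetical names that do not appear in the change.

```java
// Stand-in for the Jenkins computer base class; only the state this sketch needs.
class SketchComputer {
    private boolean temporarilyOffline;

    public void setTemporarilyOffline(boolean offline) {
        this.temporarilyOffline = offline;
    }

    public boolean isTemporarilyOffline() {
        return temporarilyOffline;
    }
}

class TrackedComputer extends SketchComputer {
    private boolean setOfflineByUser;

    // Any call through the normal entry point is assumed to come from a user.
    @Override
    public void setTemporarilyOffline(boolean offline) {
        setOfflineByUser = offline;
        super.setTemporarilyOffline(offline);
    }

    // Hypothetical plugin-side helper: change the offline state, then reset the
    // bit so the request is not mistaken for a user action.
    public void setTemporarilyOfflineByPlugin(boolean offline) {
        setTemporarilyOffline(offline);
        setOfflineByUser = false;
    }

    public boolean isSetOfflineByUser() {
        return setOfflineByUser;
    }

    // Hypothetical check (an assumption about how a cleanup task might use the bit):
    // nodes a user took offline for investigation are skipped.
    public boolean isEligibleForCleanup() {
        return isTemporarilyOffline() && !setOfflineByUser;
    }
}
```

With this shape, a node taken offline through the UI reports isSetOfflineByUser() as true and is skipped, while a node taken offline by the plugin-side helper remains a cleanup candidate.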


@ -0,0 +1,46 @@
/*
* Copyright 2016 mmitche.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.microsoftopentechnologies.azure;
/**
* Simple class with info from a new Azure deployment
* @author mmitche
*/
public class AzureDeploymentInfo {
private String deploymentName;
private String vmBaseName;
private int vmCount;
public AzureDeploymentInfo(String deploymentName, String vmBaseName, int vmCount) {
this.deploymentName = deploymentName;
this.vmBaseName = vmBaseName;
this.vmCount = vmCount;
}
public String getDeploymentName() {
return deploymentName;
}
public String getVmBaseName() {
return vmBaseName;
}
public int getVmCount() {
return vmCount;
}
}


@ -25,6 +25,7 @@ import jenkins.model.Jenkins;
import org.kohsuke.stapler.DataBoundConstructor;
import com.microsoftopentechnologies.azure.util.Constants;
import com.microsoftopentechnologies.azure.util.CleanUpAction;
import com.microsoftopentechnologies.azure.util.FailureStage;
import com.microsoftopentechnologies.azure.remote.AzureSSHLauncher;
@ -39,6 +40,7 @@ import hudson.slaves.ComputerLauncher;
import hudson.slaves.OfflineCause;
import hudson.slaves.RetentionStrategy;
import java.util.logging.Level;
import org.jvnet.localizer.Localizable;
public class AzureSlave extends AbstractCloudSlave {
@ -87,10 +89,20 @@ public class AzureSlave extends AbstractCloudSlave {
private String templateName;
private boolean deleteSlave;
private CleanUpAction cleanUpAction;
private Localizable cleanUpReason;
private String resourceGroupName;
private static final Logger LOGGER = Logger.getLogger(AzureSlave.class.getName());
private final boolean executeInitScriptAsRoot;
private final boolean doNotUseMachineIfInitFails;
private boolean eligibleForReuse;
@DataBoundConstructor
public AzureSlave(
final String name,
@ -111,6 +123,7 @@ public class AzureSlave extends AbstractCloudSlave {
final String adminPassword,
final String jvmOptions,
final boolean shutdownOnIdle,
final boolean eligibleForReuse,
final String deploymentName,
final int retentionTimeInMin,
final String initScript,
@ -120,7 +133,11 @@ public class AzureSlave extends AbstractCloudSlave {
final String oauth2TokenEndpoint,
final String managementURL,
final String slaveLaunchMethod,
final boolean deleteSlave) throws FormException, IOException {
final CleanUpAction cleanUpAction,
final Localizable cleanUpReason,
final String resourceGroupName,
final boolean executeInitScriptAsRoot,
final boolean doNotUseMachineIfInitFails) throws FormException, IOException {
super(name, nodeDescription, remoteFS, numExecutors, mode, label, launcher, retentionStrategy, nodeProperties);
@ -132,6 +149,7 @@ public class AzureSlave extends AbstractCloudSlave {
this.adminPassword = adminPassword;
this.jvmOptions = jvmOptions;
this.shutdownOnIdle = shutdownOnIdle;
this.eligibleForReuse = eligibleForReuse;
this.deploymentName = deploymentName;
this.retentionTimeInMin = retentionTimeInMin;
this.initScript = initScript;
@ -143,7 +161,11 @@ public class AzureSlave extends AbstractCloudSlave {
this.oauth2TokenEndpoint = oauth2TokenEndpoint;
this.managementURL = managementURL;
this.slaveLaunchMethod = slaveLaunchMethod;
this.deleteSlave = deleteSlave;
this.setCleanUpAction(cleanUpAction);
this.setCleanupReason(cleanUpReason);
this.resourceGroupName = resourceGroupName;
this.executeInitScriptAsRoot = executeInitScriptAsRoot;
this.doNotUseMachineIfInitFails = doNotUseMachineIfInitFails;
}
public AzureSlave(
@ -162,6 +184,7 @@ public class AzureSlave extends AbstractCloudSlave {
final String adminPassword,
final String jvmOptions,
final boolean shutdownOnIdle,
final boolean eligibleForReuse,
final String deploymentName,
final int retentionTimeInMin,
final String initScript,
@ -171,7 +194,11 @@ public class AzureSlave extends AbstractCloudSlave {
final String oauth2TokenEndpoint,
final String managementURL,
final String slaveLaunchMethod,
final boolean deleteSlave) throws FormException, IOException {
final CleanUpAction cleanUpAction,
final Localizable cleanUpReason,
final String resourceGroupName,
final boolean executeInitScriptAsRoot,
final boolean doNotUseMachineIfInitFails) throws FormException, IOException {
this(name,
templateName,
@ -181,11 +208,8 @@ public class AzureSlave extends AbstractCloudSlave {
numExecutors,
mode,
label,
slaveLaunchMethod.equalsIgnoreCase("SSH")
? osType.equalsIgnoreCase("Windows")
? new AzureSSHLauncher()
: new AzureSSHLauncher()
: new JNLPLauncher(),
slaveLaunchMethod.equalsIgnoreCase("SSH") ?
new AzureSSHLauncher() : new JNLPLauncher(),
new AzureCloudRetensionStrategy(retentionTimeInMin),
Collections.<NodeProperty<?>>emptyList(),